Natural history of rare diseases using Natural Language Processing of narrative unstructured electronic health records: the example of Dravet syndrome

Abstract

Objective

The increasing implementation of electronic health records enables the use of advanced text-mining methods to establish new patient phenotypes and stratifications and to reveal outcome correlations. In this study, we aimed to explore the narrative electronic clinical reports of a cohort of patients with Dravet syndrome (DS) followed longitudinally at our center, in order to assess the capacity of this methodology to retrace the natural history of DS during the early years.

Methods

We used a document-based clinical data warehouse employing Natural Language Processing to recognize the phenotype concepts in the narrative medical reports.
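The abstract does not describe the warehouse's NLP pipeline. Purely as an illustration of what "recognizing phenotype concepts" in free text involves, the sketch below does dictionary-based concept matching with a crude, sentence-scoped negation check. The lexicon, the negation cues and the functions `extract_concepts` and `positive_concepts` are hypothetical; a production system would rely on a full terminology and more robust negation and context handling.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon mapping surface forms to concept labels
# (illustrative only; a real warehouse would use a large terminology).
LEXICON = {
    "febrile seizure": "Febrile seizure",
    "status epilepticus": "Status epilepticus",
    "myoclonic": "Myoclonic seizure",
    "ataxia": "Ataxia",
}

# Hypothetical negation cues; real systems use richer rule sets.
NEGATION_CUES = ("no ", "without ", "absence of ", "denies ")


@dataclass(frozen=True)
class Mention:
    concept: str
    negated: bool


def extract_concepts(report: str) -> list[Mention]:
    """Return every lexicon concept found in the report, flagging negated ones."""
    mentions = []
    # Split into sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]", report.lower()):
        for surface, concept in LEXICON.items():
            idx = sentence.find(surface)
            if idx == -1:
                continue
            # A cue anywhere before the term in the same sentence negates it.
            prefix = sentence[:idx]
            negated = any(cue in prefix for cue in NEGATION_CUES)
            mentions.append(Mention(concept, negated))
    return mentions


def positive_concepts(report: str) -> set[str]:
    """Concepts asserted (not negated) in the report."""
    return {m.concept for m in extract_concepts(report) if not m.negated}
```

For example, `positive_concepts("Prolonged febrile seizure at 6 months. No ataxia on examination.")` keeps only the febrile seizure, because the ataxia mention is preceded by a negation cue in its sentence.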

We included patients with DS who had at least one medical report produced before the age of 2 years and follow-up beyond the age of 3 years (“DS cohort”, 56 individuals). We selected two control populations, a “general control cohort” (275 individuals) and a “neurological control cohort” (281 individuals), with characteristics similar to those of the DS cohort in terms of gender, number of reports and age at last report.
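The inclusion rule above can be expressed as a simple filter on the ages at which each patient's reports were written. The sketch below assumes a hypothetical `Patient` record holding those ages; it reads "follow-up after the age of 3 years" as "at least one report after age 3" and uses strict inequalities at both cut-offs, which is an assumption. It also computes the three characteristics on which controls were compared (gender, number of reports, age at last report).

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    sex: str                  # hypothetical field, e.g. "F" or "M"
    report_ages: list[float]  # age in years at each narrative report


def meets_ds_inclusion(p: Patient, early_cutoff: float = 2.0,
                       followup_min: float = 3.0) -> bool:
    """True if the patient has a report before `early_cutoff` years
    and at least one report after `followup_min` years (assumed reading)."""
    if not p.report_ages:
        return False
    return min(p.report_ages) < early_cutoff and max(p.report_ages) > followup_min


def matching_profile(p: Patient) -> tuple:
    """Characteristics used to compare controls with cases:
    gender, number of reports and age at last report."""
    return p.sex, len(p.report_ages), max(p.report_ages, default=None)
```

A patient with reports at 0.8, 1.5 and 4.2 years qualifies, whereas one whose first report is at 2.5 years, or whose last report is at 2.9 years, does not.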

To find concepts specifically associated with DS, we performed a phenome-wide association study using Cox regression, comparing the reports of the DS cohort with those of the two control cohorts. We then performed a qualitative analysis of the concepts that remained significantly associated with DS, based on their median age of first appearance.
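The abstract does not specify how the Cox models were set up. One plausible reading, sketched here as an assumption, fits one model per concept: the time scale is age, the event is the first mention of the concept in a patient's reports, patients without a mention are censored at their age at last report, and the single covariate indicates DS versus control (run once per control cohort). The sketch maximizes the Breslow partial likelihood by Newton-Raphson, applies a Wald test and a Bonferroni correction, and reports the median age of first appearance among DS patients for retained concepts. The function names, the input format and the choice of Bonferroni are hypothetical, not the study's documented method.

```python
import math
from statistics import median


def _score_and_info(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Breslow
    partial log-likelihood for a Cox model with a single covariate."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at time t_i.
        risk = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in risk)
        s1 = sum(w * x_j for w, x_j in risk)
        s2 = sum(w * x_j * x_j for w, x_j in risk)
        mean = s1 / s0
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_cox(times, events, x, max_iter=100, tol=1e-10):
    """Newton-Raphson fit; returns (beta, se, two-sided Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_info(beta, times, events, x)
        if info <= 0:
            break  # no events or no covariate variation in any risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_info(beta, times, events, x)
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    p = math.erfc(abs(beta / se) / math.sqrt(2)) if math.isfinite(se) else 1.0
    return beta, se, p


def phewas(patients, concepts, alpha=0.05):
    """patients: dicts with 'is_ds' (bool), 'last_age' (float) and
    'first_mention' (concept -> age at first mention).
    Keeps concepts with beta > 0 and Bonferroni-corrected p < alpha,
    mapped to (hazard ratio, median age of first appearance in DS)."""
    results = {}
    m = len(concepts)
    x = [1.0 if p["is_ds"] else 0.0 for p in patients]
    for c in concepts:
        times = [p["first_mention"].get(c, p["last_age"]) for p in patients]
        events = [c in p["first_mention"] for p in patients]
        beta, _, pval = fit_cox(times, events, x)
        if beta > 0 and min(1.0, pval * m) < alpha:
            ds_ages = [p["first_mention"][c] for p in patients
                       if p["is_ds"] and c in p["first_mention"]]
            results[c] = (math.exp(beta), median(ds_ages) if ds_ages else None)
    return results
```

With a single covariate each fit is cheap, but if a concept appears only in DS reports before any control event at which DS patients remain at risk, the partial likelihood has no finite maximum (complete separation) and the coefficient diverges; a real analysis would need to handle that case, for instance with a penalized (Firth-type) likelihood.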

Results

A total of 76 concepts were over-represented in the reports of children with DS. Concepts appearing during the first 2 years were mostly related to the epilepsy features at the onset of DS (convulsive and prolonged seizures triggered by fever, often requiring in-hospital care). Subsequently, concepts related to new seizure types and to drug resistance appeared. A series of concepts unrelated to seizures emerged after the age of 2-3 years, reflecting the non-seizure comorbidities classically associated with DS.

Significance

Extracting clinical terms from the narrative reports of children with DS makes it possible to outline the known natural history of this rare disease in early childhood. This original model of “longitudinal phenotyping” could be applied to other rare and very rare conditions whose natural history is poorly described.
