A big data approach to the development of mixed-effects models for seizure count data



Our objective was to develop a generalized linear mixed model for predicting seizure count that is useful in the design and analysis of clinical trials. This model also may benefit the design and interpretation of seizure-recording paradigms. Most existing seizure count models do not include children, and there is currently no consensus regarding the most suitable model that can be applied to children and adults. Therefore, an additional objective was to develop a model that accounts for both adult and pediatric epilepsy.


Using data from SeizureTracker.com, a patient-reported seizure diary tool with >1.2 million recorded seizures across 8 years, we evaluated the appropriateness of Poisson, negative binomial, zero-inflated negative binomial, and modified negative binomial models for seizure count data based on minimization of the Bayesian information criterion. Generalized linear mixed-effects models were used to account for demographic and etiologic covariates and for autocorrelation structure. Holdout cross-validation was used to evaluate predictive accuracy in simulating seizure frequencies.


For both adults and children, we found that a negative binomial model with autocorrelation over 1 day was optimal. Using holdout cross-validation, the proposed model was found to provide accurate simulation of seizure counts for patients with up to four seizures per day.


The optimal model can be used to generate more realistic simulated patient data with very few input parameters. The availability of a parsimonious, realistic virtual patient model can be of great utility in simulations of phase II/III clinical trials, epilepsy monitoring units, outpatient biosensors, and mobile Health (mHealth) applications.