Investigation of bias in an epilepsy machine learning algorithm trained on physician notes

Abstract

Racial disparities in the utilization of epilepsy surgery are well documented, but it is unknown whether a natural language processing (NLP) algorithm trained on physician notes would produce biased recommendations for epilepsy presurgical evaluations. To assess this, an NLP algorithm was trained to identify potential surgical candidates using 1097 notes from 175 epilepsy patients with a history of resective epilepsy surgery and 268 patients who achieved seizure freedom without surgery (total N = 443 patients). The model was tested on 8340 notes from 3776 patients with epilepsy whose surgical candidacy status was unknown (2029 male, 1747 female, median age = 9 years, age range = 0-60 years). Multiple linear regression using demographic variables as covariates was used to test for associations between patient race and surgical candidacy scores. After accounting for other demographic and socioeconomic variables, patient race, gender, and primary language did not influence surgical candidacy scores (P > .35 for all). Higher scores were given to patients older than 18 years, patients who traveled farther to receive care, and those with higher family income or public insurance (P < .001, .001, .001, and .01, respectively). Demographic effects on surgical candidacy scores appeared to reflect patterns in patient referrals.
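The regression analysis described above can be sketched as follows. This is a minimal illustration with synthetic data only: the `ols_fit` helper, the variable names, the coefficients, and the simulated score are all assumptions for demonstration, not the study's code or data. It mirrors the study's design in that the simulated candidacy score depends on age, travel distance, income, and insurance, but not on race.

```python
import math
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns coefficients and
    approximate two-sided p-values (normal approximation, adequate for large n)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = n - Xd.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    z = beta / se
    # two-sided p-value under the standard normal: P(|Z| > z) = erfc(z / sqrt(2))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in z])
    return beta, pvals

rng = np.random.default_rng(0)
n = 500
# hypothetical synthetic covariates standing in for the study's demographics
race = rng.integers(0, 2, n).astype(float)        # illustrative binary indicator
adult = rng.integers(0, 2, n).astype(float)       # age > 18 years
distance = rng.exponential(50.0, n)               # travel distance to care
income = rng.normal(60.0, 15.0, n)                # family income (arbitrary units)
public_ins = rng.integers(0, 2, n).astype(float)  # public insurance indicator

# synthetic candidacy score: driven by age, distance, income, and insurance,
# but independent of race, mirroring the reported result
score = (0.5 * adult + 0.01 * distance + 0.02 * income
         + 0.5 * public_ins + rng.normal(0.0, 1.0, n))

covariates = np.column_stack([race, adult, distance, income, public_ins])
coef, pvals = ols_fit(covariates, score)
names = ["intercept", "race", "adult", "distance", "income", "public_ins"]
for name, p in zip(names, pvals):
    print(f"{name:11s} p = {p:.4f}")
```

Because the simulated score has no race term, the race coefficient is expected to be non-significant while the other covariates are, which is the pattern the abstract reports.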
