Refining Prediction of a Rare Disorder through Patient Medical History and Machine Learning

Seo Ho (Michael) Song, MD, PhD

Resident – BIDMC Harvard Psychiatry Residency Training Program

Scientific Abstract

Background: Stiff person syndrome (SPS) is a rare disorder characterized by muscle spasms and painful rigidity. Its low prevalence and the resulting statistical underpowering of studies have stymied efforts to explore its pathology and comorbidities, and thus to optimize diagnosis and therapy. Leveraging underutilized data embedded in the electronic medical records of individuals diagnosed with SPS, the current work developed and validated a framework to identify items in an individual's medical history that contribute toward a diagnosis of SPS.

Methods: This retrospective cohort study assessed 23 patients with an SPS diagnosis and 25 controls, all of whom were positive for anti-glutamic acid decarboxylase (anti-GAD) antibodies and registered at Dartmouth Health. Binarizing the clinical problems listed in their electronic medical records yielded 319 unique features. These features were entered into a feature selection analysis (contribution selection algorithm, CSA) that used support vector machines (SVMs) to identify the medical history items that best discriminate patients with SPS from anti-GAD-positive controls.
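The binarization step can be pictured as follows: each distinct problem-list entry becomes one 0/1 column, set to 1 when a patient's record lists it. This is a minimal sketch, not the study's pipeline; the function name `binarize_problem_lists`, the trim/lowercase normalization, and the toy records are hypothetical assumptions, and the abstract does not say how entries were normalized before counting unique features.

```python
def binarize_problem_lists(problem_lists):
    """Turn each patient's problem list into a 0/1 feature vector.

    problem_lists: dict mapping patient ID -> iterable of problem strings.
    Returns (patient_ids, feature_names, matrix), where matrix[i][j] == 1
    if patient_ids[i] has feature_names[j] on their problem list.
    """
    # Hypothetical normalization: trim whitespace and lowercase so trivially
    # different spellings of the same entry collapse into one feature.
    normalized = {
        pid: {p.strip().lower() for p in problems}
        for pid, problems in problem_lists.items()
    }
    # One feature per unique problem across the whole cohort
    # (sorted for a stable column order).
    feature_names = sorted(set().union(*normalized.values()))
    column = {name: j for j, name in enumerate(feature_names)}

    patient_ids = sorted(normalized)
    matrix = []
    for pid in patient_ids:
        row = [0] * len(feature_names)
        for problem in normalized[pid]:
            row[column[problem]] = 1  # listed -> 1; unlisted stays 0
        matrix.append(row)
    return patient_ids, feature_names, matrix


# Toy, made-up records for illustration only.
records = {
    "pt1": ["Depression", "GERD"],
    "pt2": ["Hypothyroidism", "depression "],
    "pt3": ["Joint pain"],
}
ids, names, X = binarize_problem_lists(records)
print(names)  # ['depression', 'gerd', 'hypothyroidism', 'joint pain']
print(X)      # [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
```

The resulting matrix has one row per patient and one column per unique problem, which is the shape an SVM-based feature selection expects as input.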

Each iteration of CSA generated SHapley Additive exPlanations (SHAP) values for each binarized feature. Model performance in correctly classifying SPS was evaluated using precision, recall, F-score, AUC, classification accuracy, and the Matthews correlation coefficient (MCC), calculated for each SVM model via repeated stratified 4-fold cross-validation.
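SHAP values are built on Shapley values from cooperative game theory: a feature's value is its marginal contribution, averaged over all orders in which features could be added. The sketch below computes exact Shapley values for a hypothetical set function `toy_accuracy` (a made-up stand-in for classifier performance on a feature subset) by enumerating every subset. It illustrates the concept only; it is not the `shap` library's API or the CSA implementation used in the study, and exact enumeration grows as 2^n, so it is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature under a set function `value`.

    phi_i = sum over subsets S of the other features of
            |S|! (n - |S| - 1)! / n! * (value(S + {i}) - value(S))
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition S
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_accuracy(subset):
    """Hypothetical 'accuracy' of a classifier restricted to `subset`."""
    gains = {"depression": 0.10, "hypothyroidism": 0.05, "gerd": 0.02}
    score = 0.5 + sum(gains[f] for f in subset)
    if {"depression", "hypothyroidism"} <= subset:
        score += 0.04  # toy interaction effect between two features
    return score


features = ["depression", "hypothyroidism", "gerd"]
phi = shapley_values(features, toy_accuracy)
print({f: round(v, 3) for f, v in phi.items()})
# {'depression': 0.12, 'hypothyroidism': 0.07, 'gerd': 0.02}
```

The toy interaction bonus is split equally between the two interacting features, and the values sum to `toy_accuracy(all features) - toy_accuracy(empty set)` (the efficiency property), which is what lets Shapley-based scores be read as a fair division of total performance among features.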

Results: Our algorithm identified depression, hypothyroidism, gastroesophageal reflux disease (GERD), and joint pain as the top four predictors of SPS, because SVM models using these features maximized predictive performance: precision 0.817 (95% CI: 0.795, 0.840), recall 0.766 (95% CI: 0.743, 0.790), F-score 0.761 (95% CI: 0.744, 0.778), AUC 0.822 (95% CI: 0.806, 0.839), classification accuracy 0.775 (95% CI: 0.759, 0.790), and MCC 0.565 (95% CI: 0.534, 0.597).
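The reported metrics can all be computed from a model's predictions on each held-out fold. The sketch below derives precision, recall, F-score (taken here as F1), classification accuracy, and MCC from confusion-matrix counts, and AUC as the share of (SPS, control) pairs the model ranks correctly, with ties counting half. The abstract does not state how its 95% CIs were obtained; `mean_ci95` assumes a normal approximation over per-fold scores, which is only a rough summary because folds from repeated cross-validation overlap and are not independent. All function names, labels, scores, and fold values here are hypothetical.

```python
import math
from statistics import mean, stdev


def classification_metrics(y_true, y_pred):
    """Binary metrics with 1 = SPS (positive class) and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)  # F1
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score,
            "accuracy": accuracy, "mcc": mcc}


def auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean with a normal-approximation 95% CI (assumed method)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, (m - half_width, m + half_width)


# Toy held-out fold: labels, hypothetical classifier scores, 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
fold_auc = auc(y_true, scores)
print({k: round(v, 3) for k, v in metrics.items()})
# {'precision': 0.667, 'recall': 0.667, 'f_score': 0.667, 'accuracy': 0.667, 'mcc': 0.333}
print(round(fold_auc, 3))  # 0.889

# Made-up per-fold accuracies pooled across CV repeats.
m, (lo, hi) = mean_ci95([0.70, 0.80, 0.75, 0.75])
print(round(m, 3), round(lo, 3), round(hi, 3))  # 0.75 0.71 0.79
```

MCC is included alongside accuracy because it uses all four confusion-matrix cells, so it stays informative even when a classifier favors one class.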

Conclusion: Our machine learning-based approach identified depression among the top predictors of SPS. This strategy offers a powerful inductive tool for hypothesis generation, forming a framework that complements hypothesis-driven investigations toward a closed-loop platform to help address diagnostic challenges in neuropsychiatry.

Live Zoom Session – March 1st
