Electronic Phenotyping of Health Outcomes of Interest Using a Linked Claims-Electronic Health Record Database: Findings from a Machine Learning Pilot Project

    Basic Details

    Claims-based algorithms are used in the Food and Drug Administration Sentinel Active Risk Identification and Analysis System to identify occurrences of health outcomes of interest (HOIs) for medical product safety assessment. This project aimed to apply machine learning classification techniques to demonstrate the feasibility of developing a claims-based algorithm to predict an HOI in structured electronic health record (EHR) data. We used the 2015-2019 IBM MarketScan Explorys Claims-EMR Data Set, which links administrative claims and EHR data at the patient level. We focused on a single HOI, rhabdomyolysis, defined by EHR laboratory test results. Using claims-based predictors, we applied the following machine learning techniques to predict the HOI: logistic regression, LASSO (least absolute shrinkage and selection operator), random forests, support vector machines, artificial neural networks, and an ensemble method (Super Learner).


    Teresa B. Gibson, Michael D. Nguyen, Timothy Burrell, Frank Yoon, Jenna Wong, Sai Dharmarajan, Rita Ouellet-Hellstrom, Wei Hua, Yong Ma, Elande Baro, Sarah Bloemers, Cory Pack, Adee Kennedy, Sengwee Toh, Robert Ball

    Corresponding Author

    Teresa B. Gibson, Government Health and Human Services, IBM Watson Health, Bethesda, Maryland, USA