
Scalable Incident Detection Via Natural Language Processing and Probabilistic Language Models

    Basic Details
    Date
    Type
    Publication
    Description

    Postmarketing safety surveillance depends in part on the ability to detect concerning clinical events at scale. Spontaneous reporting can be an effective component of safety surveillance, but it requires awareness and understanding among healthcare professionals to achieve its potential. Reliance on readily available structured data, such as diagnostic codes, risks under-coding and imprecision. Clinical textual data might bridge these gaps, and natural language processing (NLP) has been shown to aid in scalable phenotyping across healthcare records in multiple clinical domains.

    In this study, a novel incident phenotyping approach was developed and validated using unstructured clinical textual data, agnostic to Electronic Health Record (EHR) and note type. It is based on a published, validated approach (PheRe) used to ascertain social determinants of health and suicidality across entire healthcare records. To demonstrate generalizability, this approach was validated on two separate phenotypes that share common challenges with respect to accurate ascertainment: (1) suicide attempt; (2) sleep-related behaviors. With samples of 89,428 records for suicide attempt and 35,863 records for sleep-related behaviors, silver-standard (diagnostic coding) and gold-standard (manual chart review) validations were conducted.

    The study showed an Area Under the Precision-Recall Curve (AUPR) of ~0.77 (95% CI 0.75–0.78) for suicide attempt and ~0.31 (95% CI 0.28–0.34) for sleep-related behaviors. Performance was also evaluated by coded race and differed by race across both phenotypes. Scalable phenotyping models, like most healthcare AI, require algorithm vigilance and debiasing prior to implementation.
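    As an illustration of the kind of evaluation summarized above, the following minimal sketch computes a step-wise AUPR (average precision), a percentile bootstrap confidence interval, and AUPR stratified by a grouping variable such as coded race. It is not the study's code: the abstract does not state how the confidence intervals were obtained, so the percentile bootstrap is an assumption, and the function names (`average_precision`, `bootstrap_ci`, `ap_by_group`) and the toy data are hypothetical.

```python
import random


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: list of 0/1 ground-truth values; scores: model scores.
    Tied scores are kept in input order (an assumption of this sketch).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each newly recalled positive
    return ap / total_pos


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for average precision (assumed method)."""
    if sum(labels) == 0:
        raise ValueError("need at least one positive label")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if sum(sample_labels) == 0:
            continue  # redraw: the metric is undefined without positives
        stats.append(average_precision(sample_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def ap_by_group(labels, scores, groups):
    """Average precision computed separately within each group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = average_precision(
            [labels[i] for i in idx], [scores[i] for i in idx]
        )
    return result


# Toy data, for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_groups = ["a", "b", "b", "a"]
overall = average_precision(toy_labels, toy_scores)
interval = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
per_group = ap_by_group(toy_labels, toy_scores, toy_groups)
```

    Because the chance-level AUPR equals the prevalence of the phenotype, values are most informative when read alongside the event rate in each sample or subgroup.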

    Author(s)

    Colin G. Walsh, Drew Wilimitis, Qingxia Chen, Aileen Wright, Jhansi Kolli, Katelyn Robinson, Michael A. Ripperger, Kevin B. Johnson, David Carrell, Rishi J. Desai, Andrew Mosholder, Sai Dharmarajan, Sruthi Adimadhyam, Daniel Fabbri, Danijela Stojanovic, Michael E. Matheny and Cosmin A. Bejan

    Corresponding Author

    Colin G. Walsh; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN

    Email: Colin.walsh@vumc.org