Advancing Scalable Natural Language Processing Approaches for Unstructured Electronic Health Record Data

    Basic Details
    Date Posted
    Status
    Complete
    Health Outcome(s)
    COVID-19
    Description

    Administrative claims data, which make up most of the data in the current Sentinel Distributed Database, are of limited use for studying severely ill hospitalized patients, including those in the intensive care unit. For example, data on duration of mechanical ventilation, symptom severity, oxygen saturation, supplemental oxygen requirements, ventilator settings, and lung imaging features are not readily available in claims data.

    Information-rich unstructured electronic health record (EHR) data may address these challenges and enhance medical product safety surveillance. COVID-19 is a compelling use case for unstructured EHR data in Sentinel’s medical product safety assessment for two reasons: studying COVID-19 requires accurate information about its symptoms, severity, comorbidities, and treatment outcomes, and patients with COVID-19 often present to emergency departments and receive care in inpatient settings.

    This project will yield algorithms and methods that better position Sentinel to address future pandemics and that advance Sentinel’s long-term objective of enhancing medical product safety surveillance by incorporating EHR data into surveillance methods. Specifically, we will use natural language processing (NLP) and text mining methods to identify patients with COVID-19 and to extract clinical features from unstructured EHR data. These features will complement structured data already available in the Sentinel Common Data Model (SCDM). This project will also develop processes for more rapidly prototyping EHR-based information extraction algorithms for new health outcomes of interest (HOIs) during future pandemics or as other needs arise.
     

    Workgroup Leader(s)

    David Carrell, PhD; Kaiser Permanente Washington Health Research Institute, Seattle, WA

    Joshua C. Smith, PhD; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN

    Danijela Stojanovic, PharmD, PhD; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD

    Yueqin Zhao, PhD, MS; Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD
     

    Workgroup Member(s)

    David Aronoff, MD; Kevin Johnson, MD, MS; Steven Johnson, DBA, MS; Michael Matheny, MD, MS, MPH; Dax Westerman, MS; Robert Winter; Vanderbilt University Medical Center, Nashville, TN

    David Cronkite, MS; Eric Johnson; Linda Kiel; Arvind Ramaprasan, MS; Kaiser Permanente Washington Health Research Institute, Seattle, WA

    Keith Marsolo, PhD; Duke Clinical Research Institute, Durham, NC

    William B. Feldman, MD, DPhil; Shamika More, MS; Rishi Desai, MS, PhD; Shirley Wang, PhD; Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA

    Adee Kennedy, MS, MPH; Darren Toh, ScD; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA