Medical product safety surveillance efforts, whether they use electronic health record (EHR) or claims data, typically rely on structured codes. Unstructured EHR data, particularly information extracted from clinical text through natural language processing (NLP), enrich the information available for data mining, phenotyping, and surveillance. To assess the overlapping and distinct information in structured and unstructured EHR data, we mapped both to a common vocabulary, the Medical Dictionary for Regulatory Activities (MedDRA). We assessed the feasibility of implementing such a mapping and explored similarities and differences at multiple levels of the concept hierarchy.
We randomly sampled 15,000 encounters (5,000 each of ambulatory, emergency department, and inpatient encounters). For each encounter, we extracted MedDRA concepts from clinical notes using MetaMap and mapped structured International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnoses to MedDRA. We evaluated corroboration between the two data sources at each level of the MedDRA hierarchy, as well as the unique information contributed by each source.
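To illustrate how corroboration can change across levels of the hierarchy, the following is a minimal sketch, not the authors' implementation. It assumes that both sources have already been reduced to sets of MedDRA preferred terms (PTs) per encounter, so the MetaMap and ICD-10-CM mapping steps are not shown. The names `PT_TO_PARENTS`, `roll_up`, and `compare_sources` are hypothetical. The term labels are simplified placeholders rather than official MedDRA terms, and the sketch omits the high-level group term (HLGT) level and the fact that a real PT can link to more than one system organ class (SOC).

```python
# Hypothetical toy hierarchy: each preferred term (PT) rolls up to one
# high-level term (HLT) and one system organ class (SOC). Labels are
# illustrative placeholders, not official MedDRA terms.
PT_TO_PARENTS = {
    "nausea": {"HLT": "nausea and vomiting symptoms", "SOC": "gastrointestinal disorders"},
    "vomiting": {"HLT": "nausea and vomiting symptoms", "SOC": "gastrointestinal disorders"},
    "headache": {"HLT": "headaches", "SOC": "nervous system disorders"},
    "migraine": {"HLT": "migraine headaches", "SOC": "nervous system disorders"},
    "rash": {"HLT": "rashes", "SOC": "skin disorders"},
}


def roll_up(pts, level):
    """Map a set of PTs to the distinct concepts at the requested level."""
    if level == "PT":
        return set(pts)
    return {PT_TO_PARENTS[pt][level] for pt in pts}


def compare_sources(nlp_pts, icd_pts, level):
    """Split one encounter's concepts into corroborated and source-specific sets."""
    nlp = roll_up(nlp_pts, level)
    icd = roll_up(icd_pts, level)
    return {"both": nlp & icd, "nlp_only": nlp - icd, "icd_only": icd - nlp}


# One hypothetical encounter: terms extracted from notes vs. coded diagnoses.
nlp_terms = {"nausea", "headache"}
icd_terms = {"vomiting", "migraine", "rash"}
for level in ("PT", "HLT", "SOC"):
    print(level, compare_sources(nlp_terms, icd_terms, level))
```

In this toy encounter the two sources share no PTs, share one HLT, and share two of three SOCs. This shows how rolling concepts up the hierarchy can surface agreement that exact term matching misses, while some information (here, the skin-related diagnosis) remains unique to one source at every level.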