Details
Missing data in confounding variables present a frequent challenge in generating evidence from real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and, based on the toolkit's results regarding likely missingness mechanisms, to illustrate the decision-making process for analyses in an empirical case example. We used the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in a pharmacoepidemiology study comparing cardiovascular outcomes among older adults initiating sodium-glucose cotransporter-2 inhibitors (SGLT2i) versus dipeptidyl peptidase‐4 inhibitors (DPP‐4i). The study used a linked EHR-Medicare claims dataset of Duke Health patients (2015–2017) and focused on partially observed confounders derived from EHR data (HbA1c laboratory values and body mass index [BMI]). Our analysis incorporated SMDI's descriptive functions and diagnostic tests to explore missingness patterns and to select approaches for mitigating missingness. We then used the findings from these investigations to inform the estimation of adjusted hazard ratios comparing the two medication classes.
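One kind of missingness diagnostic can be illustrated with a small example: comparing the distribution of fully observed covariates between patients whose partially observed confounder (e.g., BMI) is missing and those for whom it is recorded, using the absolute standardized mean difference (ASMD). The Python below is a minimal stand-alone sketch under stated assumptions; it does not reproduce SMDI's own interface. The function names (`asmd`, `missingness_asmd`) and the toy records are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance


def asmd(x_missing, x_observed):
    """Absolute standardized mean difference of one covariate between
    patients with the confounder missing vs. observed (pooled SD)."""
    m1, m0 = mean(x_missing), mean(x_observed)
    pooled_sd = math.sqrt((variance(x_missing) + variance(x_observed)) / 2)
    return abs(m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0


def missingness_asmd(records, target, covariates):
    """For each fully observed covariate, compare patients whose `target`
    value is missing (None) with those whose `target` value is observed.
    Each group needs at least two patients for the sample variance."""
    miss = [r for r in records if r[target] is None]
    obs = [r for r in records if r[target] is not None]
    return {c: asmd([r[c] for r in miss], [r[c] for r in obs]) for c in covariates}


# Hypothetical toy records: BMI is the partially observed confounder.
records = [
    {"bmi": None, "age": 80, "female": 1},
    {"bmi": None, "age": 82, "female": 0},
    {"bmi": None, "age": 78, "female": 1},
    {"bmi": 31.0, "age": 70, "female": 0},
    {"bmi": 28.5, "age": 72, "female": 1},
    {"bmi": 35.2, "age": 68, "female": 0},
]

print(missingness_asmd(records, "bmi", ["age", "female"]))
# age ≈ 5.0, female ≈ 0.577
```

Large ASMDs (a commonly used heuristic flags values above 0.1) indicate that patients with and without a recorded value differ systematically on observed characteristics, which argues against assuming the data are missing completely at random.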