A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records

    Basic Details

    This study aims to develop a principled approach to empirically characterize missing data processes for partially observed Electronic Health Record (EHR) confounders and to investigate the comparative performance of several analytic methods under various missingness scenarios using the plasmode simulation framework. This will help improve the statistical analysis of partially observed confounder information in pharmacoepidemiologic research. Using three empirical sub-cohorts of diabetic sodium-glucose cotransporter 2 (SGLT2) inhibitor or dipeptidyl peptidase-4 inhibitor (DPP4i) initiators with complete information on hemoglobin A1c (HbA1c), body mass index (BMI), and smoking as confounders of interest (COI), the study conducted data simulation under a plasmode framework. Four missingness mechanisms for the COI were simulated: missing completely at random (MCAR), missing at random (MAR), and two missing not at random (MNAR) mechanisms. The study evaluated the ability of three groups of diagnostics to differentiate between mechanisms: differences in characteristics between patients with and without an observed COI, the ability of observed covariates to predict the missingness indicator, and the association of the missingness indicator with the outcome. Four analytic methods (complete-case analysis, inverse probability weighting, single imputation, and multiple imputation) were compared in their ability to recover true treatment effects.
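
The missingness mechanisms and the first diagnostic group described above can be illustrated with a minimal Python sketch. This is not the study's code: the data are fully synthetic rather than plasmode-based, and the variable names, coefficients, the 30% MCAR rate, and the two MNAR variants shown (missingness driven by the COI's own value, and by an unmeasured variable) are assumptions made for illustration only.

```python
import numpy as np


def expit(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def missing_indicator(mechanism, x_obs, coi, u, rng):
    """Return a boolean array that is True where the COI is set to missing.

    All coefficients below are illustrative assumptions.
    """
    if mechanism == "MCAR":
        # Constant probability, unrelated to any variable.
        p = np.full(x_obs.shape, 0.3)
    elif mechanism == "MAR":
        # Depends only on a fully observed covariate.
        p = expit(-1.0 + 1.0 * x_obs)
    elif mechanism == "MNAR_value":
        # Depends on the value of the COI itself.
        p = expit(-1.0 + 1.0 * coi)
    elif mechanism == "MNAR_unmeasured":
        # Depends on a variable the analyst never observes.
        p = expit(-1.0 + 1.0 * u)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random(x_obs.shape) < p


def abs_smd(x, missing):
    """Absolute standardized mean difference of x between patients
    with a missing COI and patients with an observed COI."""
    a, b = x[missing], x[~missing]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd


rng = np.random.default_rng(0)
n = 20_000
x_obs = rng.normal(size=n)  # observed covariate (hypothetical)
u = rng.normal(size=n)      # unmeasured variable (hypothetical)
coi = 0.5 * x_obs + 0.5 * u + rng.normal(size=n)  # toy confounder of interest

mechanisms = ["MCAR", "MAR", "MNAR_value", "MNAR_unmeasured"]
smd = {
    m: abs_smd(x_obs, missing_indicator(m, x_obs, coi, u, rng))
    for m in mechanisms
}

for m in mechanisms:
    print(f"{m:16s} |SMD| of observed covariate = {smd[m]:.3f}")
```

In this toy setup the standardized difference should be close to zero under MCAR and under the unmeasured-variable MNAR mechanism, and clearly larger under MAR. That suggests why a single diagnostic may not separate all mechanisms and why the study considers three groups of diagnostics together.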


    Janick Weberpals, Sudha R Raman, Pamela A Shaw, Hana Lee, Massimiliano Russo, Bradley G Hammill, Sengwee Toh, John G Connolly, Kimberly J Dandreo, Fang Tian, Wei Liu, Jie Li, José J Hernández-Muñoz, Robert J Glynn, Rishi J Desai


    Corresponding Author

    Janick Weberpals; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA