Among the numerous methodological issues researchers encounter when using electronic health record (EHR) data, selection bias due to incomplete or missing data has received relatively little attention. Unfortunately, standard missing-data approaches (e.g., inverse-probability weighting and multiple imputation) generally fail to acknowledge the complex interplay of heterogeneous decisions, made by patients, providers, and health systems, that govern whether specific data elements in the EHR are observed. Building on a recently proposed framework for modularizing data provenance, we develop a general and scalable inverse-probability-weighting framework for estimation and inference in regression models. The framework allows for a hierarchy of missingness mechanisms, so that the analysis better aligns with the complex way in which EHR data come to be observed.
Sebastien Haneuse, PhD