Targeted Learning with an Undersmoothed Lasso Propensity Score Model for Large-Scale Covariate Adjustment in Healthcare Database Studies

    Basic Details

    Healthcare data from routine-care delivery, such as electronic health records (EHRs) and administrative claims, can provide real-world evidence (RWE) on medical product effects. However, estimating causal effects can be challenging due to confounding and poorly measured information on comorbidities. To improve confounding control, data-driven algorithms can be used to identify and adjust for large numbers of variables that indirectly capture information on unmeasured or unspecified confounding factors. Lasso regression is a widely used tool for dimension reduction, but undersmoothing can improve confounding control in sparse high-dimensional datasets. In this study, we evaluate the effectiveness of collaborative-controlled targeted learning for data-adaptive undersmoothing when fitting large-scale propensity score (PS) models. We find that cross-fitting was crucial for avoiding non-overlap in covariate distributions and reducing bias in causal estimates.


    Richard Wyss, Mark van der Laan, Susan Gruber, Xu Shi, Hana Lee, Sarah K. Dutcher, Jennifer C. Nelson, Sengwee Toh, Massimiliano Russo, Shirley V. Wang, Rishi J. Desai, Kueiyu Joshua Lin

    Corresponding Author

    Dr. Richard Wyss; Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School