The U.S. Food and Drug Administration’s Sentinel System is a national medical product safety surveillance system consisting of a large, multi-site distributed database of administrative claims supplemented by electronic health record (EHR) data. The program seeks to improve the capture of race and ethnicity data for pharmacoepidemiology studies.
We conducted a narrative literature review of published research on data augmentation and imputation methods to improve race and ethnicity capture in U.S. health care system databases. We focused on methods applicable when limited patient identifiers (5-digit ZIP codes only) or full patient identifiers are available for linkage to external sources of self-reported data. We organized the literature into three themes: 1) variation in the capture of self-reported data, 2) data augmentation from external sources of self-reported data, and 3) imputation methods, including Bayesian analysis and multiple regression.