An Expedited Chart Review Process for Large Database Studies Using Natural Language Processing and Multi-Wave Adaptive Sampling

Details

Basic Details

Date

Tuesday, April 7, 2026

Type

Publication

Description

One way to strengthen analyses conducted with large claims databases is to validate the measurement characteristics of the code-based algorithms used to identify health outcomes or other key study parameters. These metrics can then be used in quantitative bias analyses to assess how robust the results of an inferential study are to potential bias from outcome misclassification. However, performing this validation through manual chart review of free-text notes from linked electronic health records requires extensive time and resources.

We describe an expedited process for validating code-based algorithms that gains efficiency through two distinct mechanisms: 1) natural language processing (NLP) to reduce the time human reviewers spend on each chart, and 2) a multi-wave adaptive sampling approach with pre-defined criteria for stopping the validation study once performance characteristics have been estimated with sufficient precision. We illustrate this process in a case study that validates the performance of a claims-based outcome algorithm for intentional self-harm in patients with obesity.
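The multi-wave idea can be sketched as a loop that reviews charts in batches and stops once a precision target is met. This is a minimal illustration, not the authors' procedure: the metric (positive predictive value), the Wilson score interval, the wave size, the half-width threshold, and the names `wilson_interval`, `multi_wave_validation`, and `review_chart` are all assumptions made for the example. `review_chart` is a hypothetical callback standing in for a human reviewer's (possibly NLP-assisted) adjudication of one algorithm-flagged chart.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed precision measure)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def multi_wave_validation(review_chart, candidates, wave_size=50,
                          max_half_width=0.10, max_waves=10, seed=0):
    """Review algorithm-flagged charts wave by wave until the PPV estimate
    is precise enough (hypothetical stopping rule) or the budget runs out."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # random review order, so each wave is a random sample

    confirmed = reviewed = waves = 0
    ci = (0.0, 1.0)
    while pool and waves < max_waves:
        batch, pool = pool[:wave_size], pool[wave_size:]
        waves += 1
        confirmed += sum(1 for chart in batch if review_chart(chart))
        reviewed += len(batch)
        ci = wilson_interval(confirmed, reviewed)
        # Pre-defined stopping criterion: interval half-width at or below target.
        if (ci[1] - ci[0]) / 2 <= max_half_width:
            break

    ppv = confirmed / reviewed if reviewed else float("nan")
    return {"waves": waves, "reviewed": reviewed, "ppv": ppv, "ci": ci}
```

In this sketch the stopping rule is fixed before any chart is reviewed, which mirrors the "pre-defined criteria" described above. A real study would also need to consider how repeated interim looks affect interval coverage, which this example does not address.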

Materials

Epidemiology. 2026 Apr 7. doi: 10.1097/EDE.0000000000001978

Additional Information

Contributors

Author(s)

Shirley V. Wang, Georg Hahn, Sushama Kattinakere Sreedhara, Mufaddal Mahesri, Haritha S. Pillai, Rajendra Aldis, Joyce Lii, Sarah K. Dutcher, Rhoda Eniafe, Jamal T. Jones, Keewan Kim, Jiwei He, Hana Lee, Sengwee Toh, Rishi J. Desai, Jie Yang

Corresponding Author

Shirley V. Wang; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.

Email: swang1@bwh.harvard.edu