Details
This project focused on developing a stable, feasible approach for secure distributed linear, logistic, and Cox regression analysis within a distributed data network, without requiring participating data partners to share any patient-level datasets. Distributed regression analysis (DRA) lets data partners keep control of their patient-level data while still generating valid regression estimates across the network.
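One common way to obtain pooled linear regression estimates without moving patient-level rows is to have each Data Partner share only aggregate intermediate statistics. The Python sketch below illustrates that idea under stated assumptions: each partner returns its local X'X and X'y, and the analysis center sums them and solves the normal equations. The function names (`local_stats`, `solve`, `analysis_center`) and the toy data are hypothetical; the SAS-based DRA packages listed on this page may use a different protocol, so their documentation remains the authority on the actual algorithms.

```python
"""Hypothetical sketch of distributed linear regression via summed X'X and X'y.

Illustration only; not the protocol implemented in the SAS-based DRA packages.
"""


def local_stats(X, y):
    """Data Partner side: compute X'X and X'y from this partner's rows only."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A, b):
    """Solve A @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        beta[i] = (M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))) / M[i][i]
    return beta


def analysis_center(partner_stats):
    """Analysis center: sum each partner's X'X and X'y, then solve the normal equations."""
    p = len(partner_stats[0][1])
    xtx = [[sum(s[0][i][j] for s in partner_stats) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in partner_stats) for i in range(p)]
    return solve(xtx, xty)


# Toy example: y = 1 + 2x, rows split across three hypothetical Data Partners.
# Each design row is [intercept, x].
partners = [
    ([[1, 0], [1, 1]], [1, 3]),  # Data Partner A
    ([[1, 2], [1, 3]], [5, 7]),  # Data Partner B
    ([[1, 4]], [9]),             # Data Partner C
]
stats = [local_stats(X, y) for X, y in partners]
beta = analysis_center(stats)
print(beta)  # [1.0, 2.0]
```

Because the summed X'X and X'y equal those of the stacked data, this reproduces the pooled ordinary least squares estimate up to floating-point rounding. Aggregates like X'X can still carry some information about individuals, a consideration this sketch does not address.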
This page includes the following:
- Final Report: Final project report detailing methods, results, and conclusions.
- SAS-based DRA Application: Two SAS packages used to run DRA, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression.
- SAS-based DRA Application Documentation: Documentation of the DRA algorithms and of the setup of our SAS-based DRA application for execution in a horizontally partitioned distributed data network.
- SAS-based DRA Application (for testing): Two SAS packages used to test the SAS-based DRA Application, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression, as well as a macro that mimics the actions of data-sharing software for internal testing.
- Test Data: Zip file containing the Boston Housing [1] and Maryland State Prison [2] datasets, plus the three partitioned datasets used to test distributed linear, logistic, and Cox proportional hazards regression analysis with the SAS-based DRA application. The original Boston Housing dataset can be found here and the original Maryland State Prison data can be found here.
- Linear DRA Sample Report: Report generated by %create_grep_rpt for distributed linear regression analysis with the partitioned Boston Housing dataset.
- Logistic DRA Sample Report: Report generated by %create_grep_rpt for distributed logistic regression analysis with the partitioned Boston Housing dataset.
- Cox DRA Sample Report 1: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis with the partitioned Maryland State Prison dataset.
- Cox DRA Sample Report 2: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis, stratified by Data Partner site identifier, with the partitioned Maryland State Prison dataset.
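Stratifying by Data Partner site, as in Cox DRA Sample Report 2, has a convenient property: the stratified partial likelihood is the product of site-specific partial likelihoods, so each site's score and information can be computed locally and simply summed. The Python sketch below illustrates this for a single covariate, assuming each partner returns only its score and information at the current coefficient and the analysis center runs Newton-Raphson on the sums. The function names and toy data are hypothetical, tied event times use the Breslow approximation, and this is not presented as the algorithm implemented in the SAS packages. Unstratified Cox regression needs risk sets that span sites and is not covered here.

```python
"""Hypothetical sketch of site-stratified Cox regression (one covariate).

Illustration only; not the protocol implemented in the SAS-based DRA packages.
"""
import math


def site_score_info(records, beta):
    """Data Partner side: score and information for this site's stratum only.

    records: list of (time, event, x) tuples held by one site.
    Risk sets are formed within the site; ties use the Breslow approximation.
    """
    score = 0.0
    info = 0.0
    for t_i, event_i, x_i in records:
        if not event_i:
            continue  # censored records contribute only through risk sets
        risk = [x for t, _, x in records if t >= t_i]
        s0 = sum(math.exp(beta * x) for x in risk)
        s1 = sum(x * math.exp(beta * x) for x in risk)
        s2 = sum(x * x * math.exp(beta * x) for x in risk)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def fit_stratified_cox(sites, tol=1e-10, max_iter=50):
    """Analysis center: Newton-Raphson on summed site-level score and information."""
    beta = 0.0
    for _ in range(max_iter):
        # In a real network, each (score, info) pair would be computed by and
        # returned from a Data Partner; here they are computed in one process.
        pieces = [site_score_info(records, beta) for records in sites]
        score = sum(s for s, _ in pieces)
        info = sum(i for _, i in pieces)
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Toy data for two hypothetical sites: (time, event indicator, covariate).
site_a = [(1, 1, 0), (2, 1, 1), (3, 0, 0), (4, 1, 1)]
site_b = [(1, 1, 1), (2, 0, 0), (3, 1, 0), (5, 1, 1)]
beta_hat = fit_stratified_cox([site_a, site_b])
print(beta_hat)
```

Only two numbers per site leave each Data Partner at every iteration, which is why stratification by site fits naturally into a distributed setting.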
___________________
[1] Harrison D, Rubinfeld DL. Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 1978;5(1):81-102.
[2] Rossi PH, Henry JP. Seriousness: A measure for all purposes. In: Handbook of Criminal Justice Evaluation. 1980:489-505.
Additional Information
Contributors
Darren Toh, ScD; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA
Michael Nguyen, MD; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD
Qoua Her, PharmD; Jessica Malenfant, MPH; Yury Vilk, PhD; Jessica Young, PhD; Zilu Zhang, MSc; Sarah Malek, MPPA; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA