
Functional Clustering Methods for Longitudinal Data with Application to Electronic Health Records

    Basic Details

    We develop a method to estimate subject-level trajectory functions from longitudinal data. The approach can be used for patient phenotyping, feature extraction, or, as in our motivating example, outcome identification, which refers to the process of identifying disease status through patient laboratory tests rather than through diagnosis codes or prescription information. We model the joint distribution of a continuous longitudinal outcome and baseline covariates using an enriched Dirichlet process prior. This joint model decomposes into (local) semiparametric linear mixed models for the outcome given the covariates and simple (local) marginals for the covariates. The nonparametric enriched Dirichlet process prior is placed on the regression and spline coefficients, the error variance, and the parameters governing the predictor space. This leads to clustering of patients based on their outcomes and covariates. We predict the outcome at unobserved time points for subjects with data at other time points as well as for new subjects with only baseline covariates.
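The clustering structure described above can be illustrated with a small simulation. The sketch below is not the authors' implementation and performs no posterior inference. It only draws a nested partition of subjects from the prior, assuming the usual nested Chinese-restaurant-process representation of an enriched Dirichlet process: subjects are first assigned to outcome-level clusters (sharing regression and spline coefficients and error variance), then to covariate-level subclusters within each outcome cluster. The function names and the concentration parameters `alpha_theta` and `alpha_psi` are hypothetical labels chosen for illustration.

```python
import random


def crp_assign(counts, alpha, rng):
    """Chinese restaurant process step: return an existing cluster index with
    probability proportional to its size, or len(counts) (a new cluster)
    with probability proportional to alpha."""
    total = sum(counts) + alpha
    u = rng.random() * total
    acc = 0.0
    for k, c in enumerate(counts):
        acc += c
        if u < acc:
            return k
    return len(counts)


def sample_nested_partition(n, alpha_theta, alpha_psi, seed=0):
    """Draw a nested partition of n subjects from the prior.

    Returns a list of (outcome_cluster, covariate_subcluster) labels.
    alpha_theta and alpha_psi are assumed concentration parameters for the
    outcome-level and covariate-level clustering, respectively.
    """
    rng = random.Random(seed)
    y_counts = []  # size of each outcome-level cluster
    x_counts = []  # x_counts[j]: subcluster sizes within outcome cluster j
    labels = []
    for _ in range(n):
        # Outcome-level assignment.
        j = crp_assign(y_counts, alpha_theta, rng)
        if j == len(y_counts):
            y_counts.append(0)
            x_counts.append([])
        y_counts[j] += 1
        # Covariate-level assignment, nested within outcome cluster j.
        l = crp_assign(x_counts[j], alpha_psi, rng)
        if l == len(x_counts[j]):
            x_counts[j].append(0)
        x_counts[j][l] += 1
        labels.append((j, l))
    return labels


if __name__ == "__main__":
    for subject, (j, l) in enumerate(sample_nested_partition(10, 1.0, 1.0)):
        print(f"subject {subject}: outcome cluster {j}, covariate subcluster {l}")
```

In a full analysis, each outcome cluster would carry its own mixed-model parameters and each subcluster its own covariate parameters, and the assignments would be updated from the data rather than drawn from the prior. The nesting is the point of the sketch: subjects can share an outcome trajectory shape while differing in their covariate distributions.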


    Bret Zeldow, James Flory, Alisa Stephens-Shields, Marsha Raebel, Jason A. Roy

    Corresponding Author

    Bret Zeldow, Department of Mathematics and Statistics, Colby College, Waterville, ME