The Centers for Medicare and Medicaid Services (CMS): Medicaid and Children’s Health Insurance Program (CHIP) Claims in Sentinel Common Data Model (SCDM) Format

    Basic Details
    Date Posted
    Status
    In progress
    Description

    Sentinel has converted the Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files (TAF) Research Identifiable Files (RIF) to the Sentinel Common Data Model (SCDM). These files contain Medicaid and Children’s Health Insurance Program (CHIP) data housed in the Centers for Medicare and Medicaid Services (CMS) Virtual Research Data Center (VRDC). Duke University Department of Population Health Sciences (DPHS) serves as the Sentinel Data Partner: it accesses the source data in the CMS VRDC, transforms it into an SCDM-compliant database that includes the Mother Infant Linkage (MIL) Table, executes queries, and returns program package results to the Sentinel Operations Center (SOC).

    Based on existing Sentinel workflows, we describe this work in two phases: 

    • The Phase A ETL process involves the transformation of source data into the core tables (ENROLLMENT, DEMOGRAPHIC, DISPENSING, ENCOUNTER, DIAGNOSIS, PROCEDURE, DEATH, FACILITY, and PROVIDER) of the SCDM; and
    • The Phase B ETL process involves the linking of live birth deliveries to infants in the data to create the MIL table.

    The Phase A ETL and Phase B ETL are separate, sequential processes. The Phase A ETL transforms the Medicaid/CHIP RIFs into the core tables of the SCDM. The Phase B ETL links live birth deliveries to infants to create the MIL table. The Phase B ETL can start only after the Phase A ETL programming has been run and the Phase A Quality Assurance (QA) package has been completed. For more information on the MIL table, see Mother-Infant Linkage Table.
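
    The sequencing rule above can be illustrated with a minimal Python sketch. It is not part of the Sentinel code pack (which is written in SAS); the `BuildStatus` class and both function names are hypothetical and exist only to show the gating logic.

```python
from dataclasses import dataclass


@dataclass
class BuildStatus:
    """Hypothetical record of where one ETL build stands."""
    phase_a_etl_done: bool = False  # core SCDM tables have been built
    phase_a_qa_done: bool = False   # Phase A QA package has completed


def can_start_phase_b(status: BuildStatus) -> bool:
    """Phase B (MIL table) may start only after Phase A ETL and Phase A QA."""
    return status.phase_a_etl_done and status.phase_a_qa_done


def next_step(status: BuildStatus) -> str:
    """Return the next step in the Phase A -> QA -> Phase B sequence."""
    if not status.phase_a_etl_done:
        return "run Phase A ETL"
    if not status.phase_a_qa_done:
        return "complete Phase A QA package"
    return "run Phase B ETL (MIL table)"
```

    For example, `next_step(BuildStatus(phase_a_etl_done=True))` returns `"complete Phase A QA package"`, because the QA package has not yet been completed.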

    Three components are made available to the public: (1) Technical Specifications; (2) Code Pack; and (3) User Guide.

    1. Technical Specifications: Describes the required extraction, transformation, and loading (ETL) processes and mappings specific to Medicaid/CHIP source data. This document consists of the following sections: 

    • Medicaid/CHIP Source Data: This section describes the content, structure, and update schedule of the 100% Medicaid/CHIP data stored in the VRDC
    • VRDC Environment: This section describes the relevant particulars of the VRDC computing environment
    • ETL Specifications: This section describes the different types of information required before starting a new ETL and the different build types that need to be supported
    • Source Data Mapping: This section describes the table-specific and field-specific mappings necessary to transform the Medicaid/CHIP data into SCDM-compliant intermediate tables
    • Final Tables: This section describes the process of combining intermediate tables to create the final tables that will be used for analyses

    Except as related to the implementation of the Medicaid/CHIP data, the specification document does not otherwise discuss the rationale or content of the SCDM.

    As guiding principles, the processes and programs created to accomplish this ETL should be flexible and extensible. This includes the ability to handle different kinds of ETLs (e.g., incremental build vs. full rebuild), the ability to create intermediate files that can be easily reused in a subsequent ETL, and the ability to easily add new Medicaid/CHIP data sources to the process.
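
    As a rough illustration of the difference between the two build types, the Python sketch below picks which source partitions need to be transformed. It is an assumption-laden example, not the code pack's logic: the function name, the `"full"` and `"incremental"` labels, and the idea of partitioning by year are all hypothetical.

```python
def partitions_to_process(build_type, available, already_built):
    """Return the source partitions that need transforming, sorted.

    build_type    -- "full" or "incremental" (labels are assumptions)
    available     -- partitions present in the source data, e.g. years
    already_built -- partitions that already have reusable intermediate files
    """
    if build_type == "full":
        # Full rebuild: ignore existing intermediates and redo everything.
        return sorted(available)
    if build_type == "incremental":
        # Incremental build: reuse intermediates, process only what is new.
        return sorted(set(available) - set(already_built))
    raise ValueError(f"unknown build type: {build_type!r}")
```

    With source years 2019 through 2021 available and intermediates already built for 2019 and 2020, an incremental build would process only 2021, while a full rebuild would process all three years.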

    2. Code Pack: Includes the following features:

    • All parameters relating to each type of source file accessed (e.g., Demographic and Eligibility (DE) files, Pharmacy (RX) files, and Inpatient (IP), Long-Term Care (LT), and Other Services (OT) claims files)
    • All parameters relating to the use of source data that have already been transformed into SCDM-formatted data files and/or how to bypass their availability
    • Establishment of SAS data libraries (i.e., LIBNAMEs) for source, intermediate and permanent files
    • Highlights of code that is unique to the CMS VRDC environment, including but not limited to security settings, remote submits, use of SAS Grid© for simultaneous processing, and standard data libraries, which may not be applicable to public users of the code pack
    • Sequencing of any program execution
    • A list of included programs and macros to serve as a “packing list,” so that users of the code pack can be sure that their pack is complete
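
    A completeness check against such a packing list could look like the Python sketch below. It is illustrative only: the function name is hypothetical, and the packing list is assumed to be a plain list of file names relative to the unpacked code pack directory.

```python
from pathlib import Path


def missing_from_pack(packing_list, pack_dir):
    """Return packing-list entries that are not present as files in pack_dir."""
    root = Path(pack_dir)
    return [name for name in packing_list if not (root / name).is_file()]
```

    An empty result means every listed program and macro was found; any names returned are files the user still needs to obtain.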

    3. User Guide: Includes information on how to use the code pack, along with guidance for researchers who may have different source files and/or programming environments available. The target audience for this document is researchers who wish to create SCDM-compliant tables using Medicaid/CHIP data. While the programs in the Code Pack documented above are specific to the processing of the Medicaid and CHIP data within the VRDC, we anticipate that the mapping information, specifically, will be of use to all researchers.

    The associated files on this site are for Sentinel TAF RIF (CMS Medicaid and CHIP source files) for the most recently approved ETL, utilizing the SCDM version as described in the Technical Specifications.

    Minimum Requirements

    • SAS version 9.4 or later

    Disclaimer

    • The content on this page is technical and intended for use by scientists, analysts, and programmers in various areas of expertise.
    • This SAS program package uses source data from the Centers for Medicare and Medicaid Services (CMS) 100% Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files Research Identifiable Files (TAF RIF). The SAS program package was designed for execution within CMS’s Virtual Research Data Center (VRDC) environment administered by the Research Data Assistance Center (ResDAC) with the following technical resources:
        • VRDC access provisioned with 32 GB of RAM
        • SAS version 9.4 or later
        • Sufficient disk storage resources for source datasets, SCDM datasets, WORK data library space, and results of program packages
        • A SAS Grid© of multiple computers enabling simultaneous processing
    • Source data files obtained from CMS by other researchers may have different file names, different partitioning schemes (e.g., annual RIF), different samples of the data, and possibly different variables and/or variable names. Users are responsible for making any adjustments to the SAS program package to be compatible with source data they receive from CMS for implementation in their technical environment.
    • There is no mechanism for technical support by Duke University, the Sentinel Operations Center, ResDAC, CMS, or the U.S. Food and Drug Administration (FDA) for use of this SAS program package.
    • The SAS program package is distributed “as is” and with no warranties of any kind, whether express or implied, including and without limitation, any warranty of merchantability or fitness for a particular purpose.
    • In no event shall any individual, the Duke University Department of Population Health Sciences, the Sentinel Operations Center located at Harvard Pilgrim Health Care Institute, nor the FDA be liable for any damages whatsoever relating to the use, misuse, or inability to use this SAS program package (including, without limitation, damages for loss of profits or revenue, business interruption, loss of information, or any other loss).

    The information contained on this website is provided as part of FDA's commitment to place knowledge acquired from the Sentinel System in the public domain.

    Information
    Time Period
    January 1, 2014 – December 31, 2021
    Population / Cohort
    Low-income younger adults and children; mother-infant data
    Data Source(s)
    SCDM-formatted Medicaid/CHIP Claims Data
    Workgroup Leader(s)

    Bradley Hammill, DrPH; Department of Population Health Sciences, Duke University School of Medicine, Durham, NC

    Judith C. Maro, PhD; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA

    Sarah Dutcher, PhD, MS; David Money, RPh, MPH; Efe Eworuke, PhD, MSc; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD

    Workgroup Member(s)

    Patricia Bright, MSPH, PhD; Jamila Mwidau, RN, BSN, MPH; Sanae Cherkaoui, MS, MPH, CPH; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD

    Steven J. Lippmann, PhD; Michael Stagner; Jessica E. Pritchard, PhD; Pratap Adhikari, MS; Department of Population Health Sciences, Duke University School of Medicine, Durham, NC

    Christine Halbig, MPH; Laura Shockro; Katie Shapiro; Daniel Kiernan; Alexander Mai; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA

    Robert Rosofsky, MA; Health Information Systems Consulting LLC, Milton, MA