Use of TreeScan by Non-Sentinel Investigators

Many non-Sentinel investigators have used TreeScan software, including investigators in academia, industry, and other regulators.

The Sentinel System's analytic tools support dataset creation for signal identification analyses, which you can execute with TreeScan software. Additionally, the Sentinel Initiative has supported the addition of new analytic models within the TreeScan software, e.g., Bernoulli Scan Statistics and Tree-Temporal Scan Statistics.

The publications listed below describe how others in the scientific community have used TreeScan. They showcase how the community has further advanced these methods or applied them in novel ways beyond medical product safety.


Signal Detection Statistics of Adverse Drug Events in Hierarchical Structure for Matched Case-Control Data

Seok-Jae Heo, Sohee Jeong, Dagyeom Jung, Inkyung Jung
October 1, 2024

The tree-based scan statistic is a data mining method used to identify signals of adverse drug reactions in a database of spontaneous reporting systems. It is particularly beneficial when dealing with hierarchical data structures. One may use a retrospective case-control study design from spontaneous reporting systems (SRS) to investigate whether a specific adverse event of interest is associated with certain drugs. However, the existing Bernoulli model of the tree-based scan statistic may not be suitable as it fails to adequately account for dependencies within matched pairs. This study proposes signal detection statistics for matched case-control data based on McNemar’s test, the Wald test for conditional logistic regression, and the likelihood ratio test for a multinomial distribution. Simulation studies demonstrate that the proposed methods outperform the existing approach in terms of the type I error rate, power, sensitivity, and false detection rate. To illustrate the proposed approach, the three methods and the existing method were applied to detect drug signals for dizziness-related adverse events related to antihypertensive drugs using the database of the Korea Adverse Event Reporting System.
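As a rough illustration of the matched-pair idea, the classic McNemar statistic uses only the discordant pairs. The sketch below is not the authors' tree-based statistic; it evaluates a single hypothetical drug node, and the function name and counts are illustrative assumptions.

```python
def mcnemar_statistic(case_only: int, control_only: int) -> float:
    """Classic McNemar chi-square statistic (no continuity correction).

    case_only:    matched pairs where only the case was exposed to the drug
    control_only: matched pairs where only the control was exposed
    Concordant pairs carry no information and are ignored.
    """
    discordant = case_only + control_only
    if discordant == 0:
        return 0.0
    return (case_only - control_only) ** 2 / discordant


# Hypothetical counts for one drug node (not taken from the study).
stat = mcnemar_statistic(case_only=30, control_only=12)
print(round(stat, 3))
```

Under the null hypothesis of no association, this statistic is approximately chi-square distributed with one degree of freedom.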


Tree-Based Scan Statistics to Generate Drug Repurposing Hypotheses: A Test Case Using Sodium-Glucose Cotransporter-2 Inhibitors

George S Q Tan, Judith C Maro, Shirley V Wang, Sengwee Toh, Jedidiah I Morton, Jenni Ilomäki, Jenna Wong, Xiaojuan Li
September 11, 2024

Most drug repurposing studies using real-world data focused on validating, instead of generating, hypotheses. This study used tree-based scan statistics to generate repurposing hypotheses for sodium-glucose cotransporter-2 inhibitors (SGLT2i). An active-comparator, new-user design was used to create a 1:1 propensity-score matched cohort of SGLT2i and dipeptidyl peptidase-4 inhibitors (DPP4i) initiators in the Merative™ MarketScan® Research Databases. Tree-based scan statistics were estimated across an ICD-10-CM-based hierarchical outcome tree using incident outcomes identified from hospital and outpatient diagnoses. An adjusted P≤0.01 was used as the threshold for statistical alert to prioritize associations for evaluation as repurposing signals. The analyses varied by tree size, scanning level, and clinical settings for outcomes. There were 80,510 matched SGLT2i-DPP4i initiator pairs with 215,333 outcomes among SGLT2i initiators and 223,428 outcomes among DPP4i initiators. There were 18 prioritized associations, which included chronic kidney disease (P=0.0001), an expected signal, and anemia (P=0.0001). Heart failure (P=0.0167), another expected signal, was identified slightly beyond the statistical alert threshold. Narrowing the outcome tree, scanning at different tree levels, and including outcomes from different clinical settings influenced the scan statistics. This study identified signals aligning with recently approved indications of SGLT2i, plus potential repurposing signals supported by existing evidence but requiring future validation.


Considerations for Practical Use of Tree-Based Scan Statistics for Signal Detection Using Electronic Healthcare Data: a Case Study with Insulin Glargine

Lockwood G Taylor, Marie-Laure Kürzinger, Ruben Hermans, Shirin Enshaeifar, Bernadette Dwan, Priyanka Chhikara, Xinyu Li, Sreenivas Thummisetti, Sandrine Colas, Marielle Duverne, Juhaeri Juhaeri
August 23, 2024

Hypothesis-free signal detection (HFSD) methods such as tree-based scan statistics (TBSS) applied to longitudinal electronic healthcare data (EHD) are increasingly used in safety monitoring. However, challenges may arise in interpreting HFSD results alongside results from disproportionality analysis of spontaneous reporting data (SRD). Using the anti-diabetes drug insulin glargine (Lantus®), this study applies two different tree-based scan designs using TreeScan™ software on retrospective EHD and compares the results to one another as well as to results from a disproportionality analysis using SRD.


Hierarchical Clustering Analysis to Inform Classification of Congenital Malformations for Surveillance of Medication Safety in Pregnancy

Loreen Straub, Shirley V Wang, Sonia Hernandez-Diaz, Kathryn J Gray, Seanna M Vine, Massimiliano Russo, Leena Mittal, Brian T Bateman, Yanmin Zhu, Krista F Huybrechts
August 9, 2024

There is growing interest in the secondary use of healthcare data to evaluate medication safety in pregnancy. Tree-based scan statistics (TBSS) offer an innovative approach to help identify potential safety signals. TBSS utilize hierarchically organized outcomes, generally based on existing clinical coding systems that group outcomes by organ system. When assessing teratogenicity, such groupings often lack a sound embryologic basis given the etiologic heterogeneity of congenital malformations. The study objective was to enhance the grouping of congenital malformations to be used in scanning approaches through implementation of hierarchical clustering analysis (HCA) and to pilot test an HCA-enhanced TBSS approach for medication safety surveillance in pregnancy in two test cases using >4.2 million mother-child dyads from two US-nationwide databases. HCA identified (1) malformation combinations belonging to the same organ system already grouped in existing classifications, (2) known combinations across different organ systems not previously grouped, (3) unknown combinations not previously grouped, and (4) malformations seemingly standing on their own. Testing the approach with valproate and topiramate identified expected signals, and a signal for an HCA-cluster missed by traditional classification. Augmenting existing classifications with clusters identified through large data exploration may be promising when defining phenotypes for surveillance and causal inference studies.


Mining Clinical Data for Novel Medications to Treat Alcohol Use Disorder

Luke Rozema, Jessica E. Hoyt, Bradley V. Watts, Brian Shiner
April 25, 2024

Alcohol use disorder (AUD) is a highly prevalent and often debilitating condition associated with high morbidity and mortality. Current AUD medications have limited efficacy and uptake. Alternative pharmacological options are needed. This study constructed a mechanistic tree of all US Food and Drug Administration approved medications and used a tree-based scan statistic, TreeScan, to identify medications associated with greater than expected improvements in alcohol consumption. The study cohort included all United States (US) Department of Veterans Affairs (VA) patients with a diagnosis of AUD between 10/1/1999 and 9/30/2019 with multiple Alcohol Use Disorders Identification Test-Consumption Module scores within the VA electronic health record data.


A Machine Learning-Based Phenotype for Long COVID in Children: An EHR-Based Study from the RECOVER Program

Vitaly Lorman, Hanieh Razzaghi, Xing Song, Keith Morse, Levon Utidjian, Andrea J. Allen, Suchitra Rao, Colin Rogerson, Tellen D. Bennett, Hiroki Morizono, Daniel Eckrich, Ravi Jhaveri, Yungui Huang, Daksha Ranade, Nathan Pajor, Grace M. Lee, Christopher B. Forrest, L. Charles Bailey
August 10, 2023

As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. This study developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. An XGBoost model, with hyperparameters selected through cross-validated grid search, was used, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using Shapley Additive exPlanation (SHAP) values.


Drug Safety Signal Detection in a Regional Healthcare Database Using the Tree-Based Scan Statistic and Comparison to 3 Other Mining Methods

Li Hailong, Zhao Houyu, Lin Hongbo, Shen Peng, Zhan Siyan
June 1, 2023

The aim of this study was to evaluate and compare the relative performance of the tree-based scan statistic (TreeScan) with the crude cohort study, Bayesian confidence propagation neural network (BCPNN) and Gamma Poisson Shrinker (GPS) in detecting statin-related adverse events (AEs) in an electronic healthcare database. Data from a Chinese healthcare database from 2010 to 2016 were evaluated. Statin users were identified based on prescription information in their out-/in-patient records, and AEs were defined according to the ICD-10 codes in patients' diagnosis records. TreeScan was applied to detect AE signals related to statin use and was compared with 3 other methods based on sensitivity, specificity, positive predictive value, negative predictive value, accuracy, the Youden index, area under the precision-recall curve and the area under the receiver operating characteristic curve.
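The count-based performance measures named above follow directly from a confusion table of detected signals against a reference set of known adverse events. The following is a minimal sketch under that assumption; the function name and the counts are hypothetical and are not results from the study.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Performance of a signal detection method against a reference standard.

    tp: known AEs flagged as signals      fp: non-AEs flagged as signals
    fn: known AEs not flagged             tn: non-AEs not flagged
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Youden index summarizes sensitivity and specificity in one number.
        "youden": sensitivity + specificity - 1,
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=8, fp=2, fn=4, tn=36)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The two area-under-curve measures need a ranked score per candidate event rather than a single confusion table, so they are left out of this sketch.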


Understanding Pediatric Long COVID Using a Tree-Based Scan Statistic Approach: An EHR-Based Cohort Study from the RECOVER Program

Vitaly Lorman, Suchitra Rao, Ravi Jhaveri, Abigail Case, Asuncion Mejias, Nathan M Pajor, Payal Patel, Deepika Thacker, Seuli Bose-Brill, Jason Block, Patrick C Hanley, Priya Prahalad, Yong Chen, Christopher B Forrest, L Charles Bailey, Grace M Lee, Hanieh Razzaghi
March 14, 2023

Post-acute sequelae of SARS-CoV-2 infection (PASC) is not well defined in pediatrics given its heterogeneity of presentation and severity in this population. The aim of this study was to use novel methods that rely on data mining approaches rather than clinical experience to detect conditions and symptoms associated with pediatric PASC. This study used a propensity-matched cohort design comparing children identified using the new PASC ICD-10-CM diagnosis code (U09.9) (N = 1309) to children with (N = 6545) and without (N = 6545) SARS-CoV-2 infection. A tree-based scan statistic was used to identify potential condition clusters co-occurring more frequently in cases than controls.


A New Drug Safety Signal Detection and Triage System Integrating Sequence Symmetry Analysis and Tree-Based Scan Statistics with Longitudinal Data

Miyuki Hsing-Chun Hsieh, Hsun-Yin Liang, Chih-Ying Tsai, Yu-Ting Tseng, Pi-Hui Chao, Wei-I Huang, Wen-Wen Chen, Swu-Jane Lin, Edward Chia-Cheng Lai 
January 18, 2023

Development and evaluation of a drug-safety signal detection system integrating data-mining tools in longitudinal data is essential. This study aimed to construct a new triage system using longitudinal data for drug-safety signal detection, integrating data-mining tools, and to evaluate the adaptability of such a system. Based on relevant guidelines and structural frameworks in Taiwan’s pharmacovigilance system, a triage system integrating sequence symmetry analysis (SSA) and tree-based scan statistics (TreeScan) as data-mining tools for detecting safety signals was constructed. An exploratory analysis was conducted using Taiwan’s National Health Insurance Database, with two drug classes (sodium-glucose co-transporter-2 inhibitors (SGLT2i) and non-fluorinated quinolones (NFQ)) selected as examples of chronic and episodic treatment, respectively, to test the feasibility of the system.


A Broad Assessment of Covid-19 Vaccine Safety Using Tree-Based Data-Mining in the Vaccine Safety Datalink

W. Katherine Yih, Matthew F. Daley, Jonathan Duffy, Bruce Fireman, David McClure, Jennifer Nelson, Lei Qian, Ning Smith, Gabriela Vazquez-Benitez, Eric Weintraub, Joshua T.B. Williams, Stanley Xu, Judith C. Maro
January 16, 2023

Except for spontaneous reporting systems, vaccine safety monitoring generally involves pre-specifying health outcomes and post-vaccination risk windows of concern. This study used tree-based data-mining to look more broadly for possible adverse events after Pfizer-BioNTech, Moderna, and Janssen COVID-19 vaccination. Vaccine Safety Datalink enrollees receiving ≥1 dose of COVID-19 vaccine in 2020–2021 were followed for 70 days after Pfizer-BioNTech or Moderna and 56 days after Janssen vaccination. Incident diagnoses in inpatient or emergency department settings were analyzed for clustering within both the hierarchical ICD-10-CM code structure and the post-vaccination follow-up period. The self-controlled tree-temporal scan statistic and TreeScan software were used. Monte Carlo simulation was used to estimate p-values; p = 0.01 was the pre-specified cut-off for statistical significance of a cluster.


Tree-Based Data Mining for Safety Assessment of First COVID-19 Booster Doses in the Vaccine Safety Datalink

W. Katherine Yih, Matthew F. Daley, Jonathan Duffy, Bruce Fireman, David McClure, Jennifer Nelson, Lei Qian, Ning Smith, Gabriela Vazquez-Benitez, Eric Weintraub, Joshua T.B. Williams, Stanley Xu, Judith C. Maro 
January 9, 2023

The Centers for Disease Control and Prevention’s Vaccine Safety Datalink (VSD) has been performing safety surveillance for COVID-19 vaccines since their earliest authorization in the United States. Complementing its real-time surveillance for pre-specified health outcomes using pre-specified risk intervals, the VSD conducts tree-based data-mining to look for clustering of a broad range of health outcomes after COVID-19 vaccination. This study’s objective was to use this untargeted, hypothesis-generating approach to assess the safety of first booster doses of Pfizer-BioNTech (BNT162b2), Moderna (mRNA-1273), and Janssen (Ad26.COV2.S) COVID-19 vaccines. VSD enrollees receiving a first booster of COVID-19 vaccine through April 2, 2022 were followed for 56 days. Incident diagnoses in inpatient or emergency department settings were analyzed for clustering within both the hierarchical ICD-10-CM code structure and the follow-up period. The self-controlled tree-temporal scan statistic was used, conditioning on the total number of cases for each diagnosis. P-values were estimated by Monte Carlo simulation; p = 0.01 was pre-specified as the cut-off for statistical significance of clusters.
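The Monte Carlo step can be sketched as a rank-based p-value: the observed test statistic is compared with statistics from data sets simulated under the null hypothesis. This is the standard Monte Carlo hypothesis-testing formula, shown here as an assumption about the general approach rather than the exact procedure in TreeScan; the function name and numbers are illustrative.

```python
def monte_carlo_p_value(observed: float, simulated: list) -> float:
    """Rank-based Monte Carlo p-value.

    observed:  test statistic from the real data
    simulated: test statistics from data sets generated under the null
    The observed data set counts as one of the (R + 1) replicates.
    """
    n_at_least = sum(1 for s in simulated if s >= observed)
    return (n_at_least + 1) / (len(simulated) + 1)


# Hypothetical example: 999 null replicates, none reaching the observed value.
null_stats = [i / 100 for i in range(999)]  # stand-in values from 0.00 to 9.98
p = monte_carlo_p_value(observed=12.0, simulated=null_stats)
print(p, p <= 0.01)
```

With 999 replicates the smallest attainable p-value is 0.001, so the number of replicates must be large enough to resolve a pre-specified cut-off such as p = 0.01.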


Surveillance of Antidepressant Safety (SADS): Comparison of Signal Detection of Serious Medical Events Using Tree-Based Scan Statistics and Repeated Cohort Studies

Mia Aakjaer, Murat Kulahci, Marie Louise De Bruin, Abdul Rauf Khan, Morten Andersen
December 16, 2022

The evidence-generating process in pharmacovigilance has well-known limitations, and the availability of electronic healthcare data is increasing. Therefore, new methods for signal detection, such as tree-based scan statistics, are emerging. This study aimed to detect potential safety signals following the initiation of selective serotonin reuptake inhibitors (SSRIs) and serotonin-norepinephrine reuptake inhibitors (SNRIs) using tree-based scan statistics. It was investigated whether signals were unknown or listed in the Danish Summary of Product Characteristics (SmPCs). Signals were compared to those found in a previous study using active surveillance with repeated cohorts. 


Post-Marketing Surveillance Study on Influenza Vaccine in South Korea Using a Nationwide Spontaneous Reporting Database with Multiple Data Mining Methods

Hyesung Lee, Bin Hong, SangHee Kim, Ju Hwan Kim, Nam-Kyong Choi, Sun-Young Jung, Ju-Young Shin 
November 24, 2022

Safety profiles of the influenza vaccine and its subtypes are still limited. This study aimed to address this knowledge gap using multiple data mining methods and calculated performance measurements to evaluate the precision of different detection methods. This post-marketing surveillance study was conducted between 2005 and 2019 using the Korea Adverse Event Reporting System database. Three data mining methods were applied: (a) proportional reporting ratio, (b) information component, and (c) tree-based scan statistics. The performance of each method was evaluated in comparison with the known adverse events (AEs) described in the labeling information. Compared to other vaccines, 36 safety signals were identified for the influenza vaccine, and 7 safety signals were unlabeled. In subtype-stratified analyses, application site disorders were reported more frequently with quadrivalent and cell-based vaccines, while a wide range of AEs were noted for trivalent and egg-based vaccines. Tree-based scan statistics showed well-balanced performance. Among the detected signals of influenza vaccines, narcolepsy requires special attention. A wider range of AEs were detected as signals for trivalent and egg-based vaccines. Although tree-based scan statistics showed balanced performance, complementary use of other techniques would be beneficial when large noise due to false positives is expected.


Prospective Validation of a Dynamic Prognostic Model for Identifying COVID‐19 Patients at High Risk of Rapid Deterioration

Kueiyu Joshua Lin, Elvira D'Andrea, Rishi J. Desai, Joshua J. Gagne, Jun Liu, Shirley V. Wang 
December 19, 2022

This study sought to develop and prospectively validate a dynamic model that incorporates changes in biomarkers to predict rapid clinical deterioration in patients hospitalized for COVID-19. A retrospective cohort of hospitalized patients aged ≥18 years with laboratory-confirmed COVID-19 was established using electronic health records (EHR) from a large integrated care delivery network in Massachusetts including >40 facilities from March to November 2020. A total of 71 factors, including time-varying vital signs and laboratory findings during hospitalization were screened. Elastic net regression and tree-based scan statistics were used for variable selection to predict rapid deterioration, defined as progression by two levels of a published severity scale in the next 24 h. The development cohort included the first 70% of patients identified chronologically in calendar time; the latter 30% served as the validation cohort. A cut-off point was estimated to alert clinicians of high risk of imminent clinical deterioration.


Sequential Data-Mining for Adverse Events after Recombinant Herpes Zoster Vaccination Using the Tree-Based Scan Statistic

W. Katherine Yih, Martin Kulldorff, Inna Dashevsky, Judith C. Maro
October 13, 2022

Tree-based scan statistics have been successfully used to study the safety of several vaccines without prespecifying health outcomes of concern. In this study, the binomial tree-based scan statistic was applied sequentially to detect adverse events in Days 1-28 compared with Days 29-56 after recombinant herpes zoster (RZV) vaccination, with 5 looks at the data and formal adjustment for the repeated analyses over time. IBM MarketScan data on commercially insured persons 50+ years of age receiving RZV during January 1, 2018–May 5, 2020 were used. With 999,876 doses of RZV included, statistically significant signals were detected only for unspecified adverse effects/complications following immunization, with attributable risks as low as 2 excess cases per 100,000 vaccinations. Ninety percent of cases in the signals occurred in the week after vaccination and, based on previous studies, likely represent non-serious events like fever, fatigue, and headache. Strengths of our study include its untargeted nature, self-controlled design, and formal adjustment for repeated testing. Although the method requires prespecification of the risk window of interest and may miss some true signals detectable using the tree-temporal variant of the method, it allows for early detection of potential safety problems through early initiation of ongoing monitoring.
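For a self-controlled comparison of two equal-length windows, each case at a node falls in the risk window with probability 0.5 under the null hypothesis. The sketch below computes a binomial log-likelihood ratio for one node under that assumption. It uses the usual fixed-probability Bernoulli form, ignores the sequential adjustment described above, and its names and counts are hypothetical.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention that 0 * log(0) equals 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_llr(risk_cases: int, total_cases: int, p0: float = 0.5) -> float:
    """Log-likelihood ratio for an excess of cases in the risk window.

    risk_cases:  cases at this node in the risk window (e.g., Days 1-28)
    total_cases: cases at this node in risk plus comparison windows
    p0:          null probability that a case falls in the risk window
    Returns 0 when there is no excess (one-sided test).
    """
    c, n = risk_cases, total_cases
    if n == 0 or c / n <= p0:
        return 0.0
    alternative = _xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
    null = _xlogy(c, p0) + _xlogy(n - c, 1 - p0)
    return alternative - null


# Hypothetical node: 15 of 20 cases occurred in the risk window.
print(round(binomial_llr(15, 20), 4))
```

In a tree-based scan, a statistic of this kind would be evaluated at every node of the outcome tree, with the maximum over nodes serving as the test statistic.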


Summary of the Use of Tree Scan Statistics for Drug and Vaccine Safety Monitoring

Sun Yixin, Wang Miao, Yang Mingfang, Zhan Siyan
November 3, 2021

The purpose of this review is to summarize the development and application of the tree-based scan statistic (TreeScan), explain the methodology, and provide a reference for future use of this method by reviewing the original pharmacoepidemiological and vaccine studies that used TreeScan. Medline, Embase and Web of Science databases were searched for eligible studies using keywords related to TreeScan. A total of 15 eligible studies were included, of which 9 explored the adverse events of drugs and 6 focused on the safety of vaccines. Three types of TreeScan models (Poisson probability model, Bernoulli probability model and tree-temporal scan statistic model) were used. The major differences among the three models were (1) whether a predefined control was used, depending on the research question, and (2) whether the time from exposure to onset of adverse events was considered. Several studies evaluated the method's performance by comparing it with other methods for adverse event detection or by using known adverse events. This review shows that TreeScan is an effective method for safety signal detection of drugs and vaccines, and that its use is growing rapidly worldwide. Its use in drug safety monitoring and other related fields in China should be promoted.

Translated from Chinese.


Trends in Adverse Event Reports and Signal Detection of Adverse Event Following Vaccination

Bora Kim, Dongwon Yoon, Ju-Young Shin
October 31, 2021

As vaccines are administered to many people, the management of adverse reactions to vaccines should be a priority. This study aims to detect signals of adverse events following immunization to provide information on the events requiring attention. The Korean Adverse Event Reporting System database from 2010 to 2019 was used. Time series, status, and signal analyses were performed. The number of adverse events following immunization from 2010 to 2019 was determined. A comparison group was established considering the main vaccination targets for each vaccine. Signals were identified only when they met the criteria of all four signal detection methods: proportional reporting ratio, reporting odds ratio, information component, and TreeScan. From 2010 to 2019, 37,688 adverse events following immunization were reported. The vaccine with the most reported adverse events was influenza (17,290 cases, 45.9%). Signal detection for the 10 vaccines with the most reported adverse events identified 74 adverse events as signals across 6 vaccines. The results are expected to contribute to the prevention of adverse events by providing information on adverse events following immunization requiring attention.


A Tree-Based Scan Statistic for Zero-Inflated Count Data in Post-Market Drug Safety Surveillance

Goeun Park and Inkyung Jung 
September 29, 2022

After new drugs enter the market, adverse events (AE) induced by their use must be tracked; rare AEs may not be detected during clinical trials. Some organizations have been collecting information on suspected drugs and AEs via a spontaneous reporting system to conduct post-market drug safety surveillance. These organizations use the information to detect a signal representing potential causality between drugs and AEs. The drug and AE data are often hierarchically structured. Accordingly, the tree-based scan statistic can be used as a statistical data mining method for signal detection. Most of the AE databases contain a large number of zero-count cells. Notably, not only an observational zero from the Poisson distribution, but also a true zero exists in zero-count cells. True zeros represent theoretically impossible observations or possible but unreported observations. The existing tree-based scan statistic assumes that all zeros are zero-valued observations from the Poisson distribution. Therefore, true zeros are not considered in the modeling, which can lead to bias in the inferences. In this study, we propose a tree-based scan statistic for zero-inflated count data in a hierarchical structure. According to our simulation study, in the presence of excess zeros, our proposed tree-based scan statistic provides better performance than the existing tree-based scan statistic. The two methods were illustrated using Korea Adverse Event Reporting System data from the Korea Institute of Drug Safety and Risk Management.
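The distinction between true zeros and Poisson zeros is what a zero-inflated Poisson (ZIP) distribution encodes. The sketch below shows the standard ZIP probability mass function only. It is not the proposed tree-based scan statistic, and the parameter names `lam` and `pi` are illustrative assumptions rather than the paper's notation.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Probability of observing count k under a zero-inflated Poisson model.

    lam: mean of the Poisson component
    pi:  probability that a cell is a true (structural) zero
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero arises either as a true zero or as a Poisson zero.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Hypothetical parameters: 30% true zeros, Poisson mean of 2.
print(round(zip_pmf(0, lam=2.0, pi=0.3), 4))
print(round(math.exp(-2.0), 4))  # plain Poisson P(0), for comparison
```

Treating every zero as a Poisson zero, as in the second printed value, understates how often zero-count cells occur when true zeros are present.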


Safety Surveillance of Varicella Vaccine Using Tree-Temporal Scan Analysis

Chia-Hung Liu, Wan-Ting Huang, Wei-Chu Chie, K. Arnold Chan
September 21, 2021

Passive surveillance systems are susceptible to the under-reporting of adverse events (AE) and a lack of information pertaining to vaccinated populations. Conventional active surveillance focuses on predefined AEs. Advanced data mining tools could be used to identify unusual clusters of potential AEs after vaccination. The objective of this study was to assess the feasibility of a novel tree-based statistical approach to the identification of AE clustering following the implementation of a varicella vaccination program among one-year-olds.


A Novel Data Mining Application to Detect Safety Signals for Newly Approved Medications in Routine Care of Patients With Diabetes

Michael Fralick, Martin Kulldorff, Donald Redelmeier, Shirley V. Wang, Seanna Vine, Sebastian Schneeweiss, Elisabetta Patorno
April 6, 2021

Clinical trials are often underpowered to detect serious but rare adverse events of a new medication. We applied a novel data mining tool to detect potential adverse events of canagliflozin, the first sodium glucose co-transporter 2 (SGLT2) inhibitor in the United States, using real-world data from shortly after its market entry and before public awareness of its potential safety concerns. In a U.S. commercial claims dataset (29 March 2013-30 Sept 2015), two pairwise cohorts of patients over 18 years of age with type 2 diabetes (T2D) who were newly dispensed canagliflozin or an active comparator, that is, a dipeptidyl peptidase 4 inhibitor (DPP4) or a glucagon-like peptide 1 receptor agonist (GLP1), were identified and propensity score-matched. We used variable ratio matching with up to four people receiving a DPP4 or GLP1 for each person receiving canagliflozin. We identified potential safety signals using a hierarchical tree-based scan statistic data mining method with the hierarchical outcome tree constructed based on international classification of disease coding. We screened for incident adverse events where there were more outcomes observed among canagliflozin vs. comparator initiators than expected by chance, after adjusting for multiple testing.


A Broad Safety Assessment of the 9-Valent Human Papillomavirus Vaccine

W. Katherine Yih, Martin Kulldorff, Inna Dashevsky, Judith C. Maro
February 9, 2021

Surveys of parents indicate safety is their top concern about human papillomavirus (HPV) vaccination. A data-mining method not requiring pre-specification of health outcome(s) of interest or post-exposure period(s) of potentially increased risk can check for associations between an exposure and any of thousands of medically attended health outcomes. The method was applied to the 9-valent HPV vaccine (HPV9) to detect potential safety problems. Data on 9-26-year-olds who had received HPV9 vaccine between November 4, 2016 and August 5, 2018, inclusive, were extracted from MarketScan and analyzed for statistically significant clustering of incident diagnoses within the hierarchy of ICD-10-CM coded diagnoses and temporally within the 1 year after vaccination, using the self-controlled tree-temporal scan statistic and TreeScan software. Only 56 days of post-vaccination enrollment was required; subsequent follow-up was censored at disenrollment. Multiple testing was adjusted for. The analysis included 493,089 doses of HPV9. Almost all signals resulted from temporal confounding, not unexpected with a 1-year follow-up period. The only plausible signals were for non-specific adverse events (e.g., injection-site reactions and headache) on Days 1-2 after vaccination, with attributable risks as low as 1 per 100,000 vaccinees. Considering the broad scope of the evaluation and the high statistical power, the findings of no specific serious adverse events should provide reassurance about this vaccine's safety.


Active Surveillance of the Safety of Medications Used in Pregnancy

Krista F. Huybrechts, Martin Kulldorff, Sonia Hernández-Díaz, Brian T. Bateman, Yanmin Zhu, Helen Mogun, Shirley V. Wang
January 11, 2021

We rely on post-marketing approaches to define the risk of medications in pregnancy because information at the time of drug approval is limited. Most studies in pregnancy focus on a single or selected outcomes. However, women must balance the benefit of treatment against all possible adverse effects. Our objective was to apply and evaluate a tree-based scan statistic data mining method (TreeScan) as a safety surveillance approach that allows for simultaneous evaluation of a comprehensive range of adverse pregnancy outcomes, while preserving the overall false positive rate. We evaluated TreeScan with a cohort design and adjustment via propensity score techniques using two test cases: (1) opioids and neonatal opioid withdrawal syndrome, and (2) valproate and congenital malformations, implemented in pregnancy cohorts nested in the Medicaid Analytic eXtract (1/1/2000 - 12/31/2014) and IBM MarketScan Research Database (1/1/2003 - 9/30/2015). In both cases, we identified known safety concerns, with only one previously unreported alert at the preset statistical alerting threshold. This evaluation shows the promise of TreeScan-based approaches for systematic drug safety monitoring in pregnancy. A targeted screening approach followed by deeper investigation to refine understanding of potential signals will ensure pregnant women and their physicians have access to the best available evidence to inform treatment decisions.


Data Mining for Adverse Events of Tumor Necrosis Factor-Alpha Inhibitors in Pediatric Patients: Tree-Based Scan Statistic Analyses of Danish Nationwide Health Data

Viktor Wintzell, Henrik Svanström, Mads Melbye, Jonas F. Ludvigsson, Björn Pasternak, Martin Kulldorff
October 26, 2020

Tumor necrosis factor-alpha (TNF-α) inhibitors are efficacious and considered generally safe in adults. However, pediatric-specific safety evidence is scarce. The aim of this study was to screen for signals of previously unknown adverse events of TNF-α inhibitors in pediatric patients. We conducted a data-mining study based on routinely collected, nationwide Danish healthcare data for 2004-2016. Using tree-based scan statistics to identify events with unexpectedly high incidence during TNF-α inhibitor use among patients with inflammatory bowel disease or juvenile idiopathic arthritis, two analyses were performed: comparison with episodes of no use and with other time periods from the same patient. Based on incident physician-assigned diagnosis codes from outpatient and inpatient visits in specialist care, we screened thousands of potential adverse events while adjusting for multiple testing. We identified 1310 episodes of new TNF-α inhibitor use that met the eligibility criteria. Two signals of adverse events of TNF-α inhibitors, as compared with no use, were detected. First, there were excess events of dermatologic complications (ICD-10: L00-L99, 87 vs. 44 events, risk difference [RD] 3.3%), which have been described previously in adults and children. Second, there were excess events of psychiatric diagnosis adjustment disorders (ICD-10: F432, 33 vs. 7 events, RD 2.0%), which was likely associated with the underlying disease and its severity, rather than with the treatment. The self-controlled analysis generated no signal. No signals of previously unknown adverse events of TNF-α inhibitors in pediatric patients were detected. The study showed that real-world data and newly developed methods for adverse events data mining can play a particularly important role in pediatrics where pre-approval drug safety data are scarce.


Safety Surveillance of Pneumococcal Vaccine Using Three Algorithms: Disproportionality Methods, Empirical Bayes Geometric Mean, and Tree-Based Scan Statistic

Hyesung Lee, Ju Hwan Kim, Young June Choe, Ju-Young Shin
May 22, 2020

Diverse algorithms for signal detection exist. However, inconsistent results are often encountered among the algorithms due to different levels of specificity used in defining the adverse events (AEs) and signal threshold. We aimed to explore potential safety signals for two pneumococcal vaccines in a spontaneous reporting database and compare the results and performances among the algorithms. Safety surveillance was conducted using the Korea national spontaneous reporting database from 1988 to 2017. Safety signals for pneumococcal vaccine and its subtypes were detected using the following algorithms: disproportionality methods comprising proportional reporting ratio (PRR), reporting odds ratio (ROR), and information component (IC); empirical Bayes geometric mean (EBGM); and tree-based scan statistics (TSS). Moreover, the performances of these algorithms were measured by comparing detected signals with the known AEs of pneumococcal vaccines (reference standard). Among 10,380 vaccine-related AEs, 1135 reports and 101 AE terms were reported following pneumococcal vaccine. IC generated the most safety signals for pneumococcal vaccine (40/101), followed by PRR and ROR (19/101 each), TSS (15/101), and EBGM (1/101). Similar results were observed for its subtypes. Cellulitis was the only AE detected by all algorithms for pneumococcal vaccine. TSS showed the best balance in performance: the highest accuracy, negative predictive value, and area under the curve (70.3%, 67.4%, and 64.2%). Discrepancy in the number of detected signals was observed between algorithms. EBGM and TSS calibrated noise better than disproportionality methods, and TSS showed balanced performance. Nonetheless, these results should be interpreted with caution due to a lack of a gold standard for signal detection.
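
As a rough illustration of two of the disproportionality methods named in the abstract above, the sketch below computes the proportional reporting ratio (PRR) and reporting odds ratio (ROR) from a 2x2 table of spontaneous reports, using their standard textbook definitions. The class name, function names, and counts are hypothetical and are not taken from the study; the study's signal thresholds are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportTable:
    """Hypothetical 2x2 table of spontaneous report counts."""
    a: int  # target vaccine, AE of interest
    b: int  # target vaccine, all other AEs
    c: int  # all other vaccines, AE of interest
    d: int  # all other vaccines, all other AEs


def prr(t: ReportTable) -> float:
    """PRR: share of reports with the AE for the target vaccine,
    divided by the same share for all other vaccines."""
    if t.a + t.b == 0 or t.c + t.d == 0 or t.c == 0:
        raise ValueError("PRR is undefined for this table")
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: ReportTable) -> float:
    """ROR: odds of the AE for the target vaccine,
    divided by the odds for all other vaccines (a*d / b*c)."""
    if t.b == 0 or t.c == 0:
        raise ValueError("ROR is undefined for this table")
    return (t.a * t.d) / (t.b * t.c)


# Made-up counts, for illustration only.
example = ReportTable(a=20, b=80, c=10, d=190)
example_prr = prr(example)
example_ror = ror(example)
```

Both measures compare reporting of one AE for one product against the rest of the database. Unlike the tree-based scan statistic, they evaluate each AE term separately and do not by themselves adjust for multiple testing across a hierarchy of terms.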


Bacillus Calmette-Guérin (BCG) Vaccine Safety Surveillance in the Korea Adverse Event Reporting System Using the Tree-Based Scan Statistic and Conventional Disproportionality-Based Algorithms

Ju Hwan Kim, Hyesung Lee, Ju-Young Shin
May 6, 2020

Substantial variations in the safety profiles of different formulations of the bacillus Calmette-Guérin (BCG) vaccine exist. Therefore, we aimed to detect safety signals of BCG vaccine for intradermal injection (BCG-ID) and percutaneous injection (BCG-PC) in the Korea Adverse Event Reporting System (KAERS). We conducted a vaccine safety surveillance study from the adverse events (AEs) reported following BCG vaccine in the Korea Institute of Drug Safety and Risk Management KAERS Database (KIDS-KD) between 2005 and 2017. We used the tree-based scan statistic (TSS) and four disproportionality-based algorithms for signal detection: empirical Bayesian geometric mean; proportional reporting ratio; reporting odds ratio; and information component. The detected signals from each algorithm were compared with the known AEs of BCG vaccine (reference standard) to present positive predictive value (PPV) and area under the receiver operating characteristic curve (AUC).


Using the Self-Controlled Tree-Temporal Scan Statistic to Assess the Safety of Live Attenuated Herpes Zoster Vaccine

W. Katherine Yih, Martin Kulldorff, Inna Dashevsky, Judith C. Maro
May 7, 2019

The self-controlled tree-temporal scan statistic allows detection of potential vaccine- or drug-associated adverse events without pre-specifying the specific events or post-exposure risk intervals of concern. It thus opens a promising new avenue for safety studies. The method has been successfully used to evaluate the safety of two vaccines for adolescents and young adults, but its suitability to study vaccines for older adults had not been established. The current study applied the method to assess the safety of live attenuated herpes zoster vaccination during 2011-2017 in U.S. adults ≥ 60 years old, using claims data from Truven Health MarketScan® Research Databases. Counts of International Classification of Diseases diagnosis codes recorded in emergency department or hospital settings were scanned for any statistically unusual clustering within a hierarchical tree structure of diagnoses and within 42 days after vaccination. Among 1.24 million vaccinations, four clusters were found: cellulitis on Days 1-3, non-specific erythematous condition on Days 2-4, "other complications…" on Days 1-3, and non-specific allergy on Days 1-6. These results are consistent with local injection-site reactions and other known, generally mild vaccine-associated adverse events and a favorable safety profile. This method may be useful for assessing the safety of other vaccines for older adults. 


An Implementation and Visualization of the Tree-Based Scan Statistic for Safety Event Monitoring in Longitudinal Electronic Health Data

Stephen E. Schachterle, Sharon Hurley, Qing Liu, Kenneth R. Petronis, Andrew Bate
January 8, 2019

Longitudinal electronic healthcare data hold great potential for drug safety surveillance. The tree-based scan statistic (TBSS), as implemented by the TreeScan® software, allows for hypothesis-free signal detection in longitudinal data by grouping safety events according to branching, hierarchical data coding systems, and then identifying signals of disproportionate recording (SDRs) among the singular events or event groups. The objective of this analysis was to identify and visualize SDRs with the TBSS in historical data from patients using two antifungal drugs, itraconazole or terbinafine. By examining patients who used either itraconazole or terbinafine, we provide a conceptual replication of a previous TBSS analysis by varying methodological choices and using a data source that had not been previously used with the TBSS, i.e., the Optum Clinformatics™ claims database. With this analysis, we aimed to test a parsimonious design that could be the basis of a broadly applicable method for multiple drug and safety event pairs.


Tree-Based Scan Statistic - Application in Manufacturing-Related Safety Signal Detection

Olivia Mahaux, Vincent Bauchau, Ziad Zeinoun, Lionel Van Holle
January 3, 2019

Over the last decades, medicinal regulations have been put into place and have considerably improved manufacturing practices. Nevertheless, safety issues may still arise. Using the simulation described in this manuscript, our aim is to develop adequate detection methods for manufacturing-related safety signals, especially in the context of biological products. Pharmaceutical companies record the entire batch genealogies, from seed batches through intermediates to final product (FP) batches. We constructed a hierarchical tree based on this genealogy information and linked it to the spontaneous safety data available for the FP batch numbers. The tree-based scan statistic (TBSS) was used on simulated data as a proof of concept to locate the source that may have subsequently generated an excess of specific adverse events (AEs) within the manufacturing steps, and to evaluate the method's adjustment for multiple testing.


Meningococcal Conjugate Vaccine Safety Surveillance in the Vaccine Safety Datalink Using a Tree-Temporal Scan Data Mining Method

Rongxia Li, Eric Weintraub, Michael M. McNeil, Martin Kulldorff, Edwin M. Lewis, Jennifer Nelson, Stanley Xu, Lei Qian, Nicola P. Klein, Frank Destefano
February 18, 2018

The objective of this study was to conduct a data mining analysis to identify potential adverse events (AEs) following MenACWY-D using the tree-temporal scan statistic in the Vaccine Safety Datalink population and demonstrate the feasibility of this method in a large distributed safety data setting. Traditional pharmacovigilance techniques used in vaccine safety are generally geared to detecting AEs based on pre-defined sets of conditions or diagnoses. Using a newly developed tree-temporal scan statistic data mining method, a pilot study was performed to evaluate the safety profile of the meningococcal conjugate vaccine Menactra® (MenACWY-D), screening thousands of potential AE diagnoses and diagnosis groupings. The study cohort included enrolled participants in the Vaccine Safety Datalink aged 11 to 18 years who had received MenACWY-D vaccination(s) between 2005 and 2014. The tree-temporal scan statistic was employed to identify statistical associations (signals) of AEs following MenACWY-D at a 0.05 level of significance, adjusted for multiple testing.

 

If you have used Sentinel's tools or TreeScan to support signal identification, contact us to add your publication to this page.