Description

Background: The US Food and Drug Administration (FDA) actively monitors the safety of drugs and therapeutic biologics using multiple data sources, including the FDA Adverse Event Reporting System (FAERS), a database of adverse event reports, medication error reports, and product quality complaints resulting in adverse events submitted to FDA. With an ever-increasing number of reports (more than 2 million in 2018), an important goal is to make the review process more efficient so that human safety evaluators can focus their attention on the reports containing the most useful information for assessing causality between a drug and an adverse event.
Objective: To use machine learning and text analytics to accurately predict which adverse event reports in FAERS are more likely to provide information useful for causality assessment by safety evaluators. We hypothesized that certain subsets of reports (industry reports (IR), direct reports (DR), and literature reports (LR)) would have different linguistic patterns in their narratives that might help distinguish reports more likely to contain information useful for causality assessment.
Data: Extending previous work, a set of 925 FAERS reports, classified for causality by FDA safety evaluators using a modified version of the World Health Organization–Uppsala Monitoring Centre (WHO-UMC) criteria for drug causality assessment, was coded as a binary classification task by aggregating the causality categories into two groups: group 1 (Certain, Probable, Possible) and group 2 (Unlikely, Unassessable). Industry and published literature reports are submitted by manufacturers; direct reports are submitted by consumers and healthcare professionals directly to FDA.
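The binary coding described above can be sketched as a simple mapping from the modified WHO-UMC categories to a 0/1 label; the category names come from the abstract, while the function and its representation are assumptions for illustration:

```python
# Hypothetical sketch: collapsing the modified WHO-UMC causality
# categories into the binary target described in the abstract.
GROUP_1 = {"Certain", "Probable", "Possible"}   # more useful for causality
GROUP_2 = {"Unlikely", "Unassessable"}          # less useful

def binarize(category: str) -> int:
    """Return 1 for a group 1 category, 0 for a group 2 category."""
    if category in GROUP_1:
        return 1
    if category in GROUP_2:
        return 0
    raise ValueError(f"Unknown causality category: {category}")
```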
Methods: The data were divided into training and test sets, maintaining the proportions of the causality categories. We developed two machine learning models, classification and regression trees (CART) and random forest, to predict reports less likely to provide useful information regarding causality. We pre-processed the narratives in FAERS reports and used textual features (text length, term frequency, term frequency-inverse document frequency (TF-IDF), N-grams, singular value decomposition (SVD), and cosine similarity for terms in the report narratives among reports) in the models to see which features might predict reports containing useful information regarding causality. No other features were evaluated. We evaluated the accuracy (proportion of true results among the total number of cases examined) of the trained classification models, used 10-fold cross-validation, and compared overall model performance.
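A minimal sketch of this kind of pipeline, using scikit-learn as one plausible implementation (the abstract does not name the tooling): TF-IDF with N-grams feeds an SVD dimensionality reduction, which feeds a random forest evaluated by 10-fold cross-validated accuracy. The toy narratives and labels below are invented placeholders, not FAERS data, and the parameter choices are assumptions:

```python
# Hypothetical sketch of the described feature + model pipeline.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-ins for report narratives and binary causality labels.
narratives = (["patient reported adverse event case"] * 10
              + ["no useful narrative information report"] * 10)
labels = [1] * 10 + [0] * 10

pipe = Pipeline([
    # TF-IDF over unigrams and bigrams (term frequency, TF-IDF, N-grams)
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # SVD for dimensionality reduction of the sparse term matrix
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
    # Random forest classifier, the better-performing model per the abstract
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 10-fold cross-validated accuracy, as in the abstract's evaluation
scores = cross_val_score(pipe, narratives, labels, cv=10, scoring="accuracy")
print(len(scores), round(scores.mean(), 2))
```

Text length and per-report cosine similarities, the two features the study found most predictive, would be computed separately and appended to the feature matrix rather than handled inside this single vectorizer step.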
Conclusions: Our research using FAERS narratives shows that model accuracy improves when carefully selected linguistic features are used in machine learning models. The random forest model was more predictive, and two features, text length and cosine similarity for terms in literature reports, showed higher predictive power in discriminating between FAERS reports with more and less useful information for causality assessment. Further evaluation of this approach with a larger data set and deep learning techniques might improve performance over current machine learning approaches, enabling better identification of discriminating narrative features and better predictive power for identifying FAERS reports with more useful information for causality assessment.

Learning Objective: Apply text-processing techniques to demonstrate causality classification of adverse event reporting narratives submitted to FDA.

Authors:

Abhivyakti Sawarkar (Presenter)
FDA

Robert Ball, FDA

Presentation Materials:
