Adverse event report (AER) data are a key source of signal for post-marketing drug surveillance. The standard methodology for analyzing AER data applies disproportionality metrics, which estimate the strength of drug/side-effect associations from discrete counts of their occurrence at the report level. However, in other domains, improvements in predictive modeling accuracy have been obtained through representation learning, where discrete features are replaced by distributed representations learned from unlabeled data. This paper describes aer2vec, a novel representational approach for AER data in which concept embeddings emerge from neural networks trained to predict drug/side-effect co-occurrence. Trained models are evaluated for their utility in identifying drug/side-effect relationships, and they improve on disproportionality metrics in most cases. In addition, we evaluate the utility of an otherwise-untapped resource in the Food and Drug Administration (FDA) AER system – reporter designations of suspected causality – and find that incorporating this information enhances the performance of all models evaluated.
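
As an illustration of what a count-based disproportionality metric looks like, the sketch below computes two widely used examples, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2x2 table of report counts. The abstract does not say which metrics the paper uses, so the choice of PRR and ROR is an assumption, and the function names and counts are hypothetical values chosen only for illustration.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports mentioning both the drug and the side effect
    b: reports mentioning the drug but not the side effect
    c: reports mentioning the side effect but not the drug
    d: reports mentioning neither
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts, not taken from FDA data or from the paper.
a, b, c, d = 20, 80, 100, 9800

print(round(prr(a, b, c, d), 2))  # 19.8
print(round(ror(a, b, c, d), 2))  # 24.5
```

Values well above 1 indicate that the side effect is reported disproportionately often with the drug. Both metrics depend only on discrete report-level counts, which is the property the representation learning approach described above sets out to move beyond.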

Learning Objective: After reading this paper, the learner should be able to:

Understand current pharmacovigilance methodology for processing FDA AERS data
Understand distributional semantics methods
Comprehend how distributional semantics methods can be applied to FDA AERS data


Jake Portanova, University of Washington
Nathan Murray, Hofstra University
Justin Mower, Rice University
Devika Subramanian, Rice University
Trevor Cohen (Presenter), University of Washington
