AMIA 2019 Annual Symposium - Session Details

Word embeddings have been studied extensively in recent work. Their quality depends on many factors, including the size of the input corpus, the model architecture, and the hyper-parameter settings. In this paper, we investigate whether a larger corpus generates better word embeddings. In particular, we learn embeddings of ICD-9 codes from two different datasets at an academic medical center and compare them intrinsically and extrinsically.

Learning Objective: Understand how corpus size influences word embeddings of ICD-9 codes

Authors:

Cheng Gao (Presenter), Vanderbilt University Medical Center
Chao Yan, Vanderbilt University
Bradley Malin, Vanderbilt University Medical Center
You Chen, Vanderbilt University Medical Center

Presentation Materials:

Abstract/Manuscript - 1

Board 102 - Corpus Size Influences Clinical Concept Embeddings
