Description

De-identification of clinical text, a prerequisite for reusing electronic clinical data, is typically framed as a named entity recognition (NER) problem. Several state-of-the-art deep learning methods for NER, such as Bi-LSTM-CRF (bidirectional long short-term memory with conditional random fields), have been applied to de-identification. Neural language models used for language representation have brought substantial improvements on many NLP tasks when integrated with other deep learning methods. In this paper, we introduce Bi-LSTM-CRF with neural language models for de-identification of clinical text and evaluate it on the de-identification datasets of the i2b2 2014 and CEGS N-GRID 2016 challenges. We compare four neural language models of three types, each integrated individually with Bi-LSTM-CRF. Bi-LSTM-CRF with neural language models achieves the highest “strict” micro-averaged F1-scores of 95.50% on the i2b2 2014 dataset and 91.82% on the CEGS N-GRID 2016 dataset, setting new benchmark results on both datasets.
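
The "strict" micro-averaged F1-score reported above can be illustrated with a minimal sketch. It assumes the usual reading of the terms: an entity counts as correct only when its boundaries and its type both match the gold annotation exactly, and counts are pooled over all documents and entity types before precision and recall are computed. The tuple layout, the function name `strict_micro_f1`, and the toy annotations are hypothetical and are not taken from the paper or from the challenge evaluation scripts.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (doc_id, start_offset, end_offset, phi_type)
Entity = Tuple[str, int, int, str]


def strict_micro_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict matching.

    An entity is a true positive only if document, both offsets, and type
    are identical in gold and prediction. Pooling all entities into one set
    before counting is what makes the average "micro".
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy annotations, invented for illustration only.
gold = {
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "HOSPITAL"),
    ("doc2", 40, 48, "DATE"),
}
pred = {
    ("doc1", 0, 10, "NAME"),      # exact match
    ("doc1", 25, 33, "DATE"),     # boundary error: no credit under strict matching
    ("doc2", 5, 12, "HOSPITAL"),  # exact match
    ("doc2", 40, 48, "AGE"),      # right span, wrong type: no credit
}

precision, recall, f1 = strict_micro_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case two of the four predictions match exactly, so precision, recall, and F1 are all 0.50. A relaxed criterion that gave credit for overlapping spans would score the same predictions higher.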

Learning Objective: Understand the latest deep learning methods for de-identification of clinical text.

Authors:

Buzhou Tang, Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China
Dehuan Jiang (Presenter), Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China
Qingcai Chen, Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China
Xiaolong Wang, Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China
Jun Yan, Yidu Cloud (Beijing) Technology Co., Ltd, Beijing, China
Ying Shen, Peking University, Shenzhen Graduate School, Shenzhen, China
