Recent advances in NLP methods have been dominated by supervised machine learning and deep learning approaches, both of which rely on annotated clinical text. De-identification is used to allow this text to be used and shared, but its impact on the subsequent use of de-identified text for machine learning or deep learning has not been assessed. This work addresses that gap.

Learning Objective: Learn about the impact of text de-identification on subsequent uses of the text for machine learning applications.


Gary Underwood, Clinacuity, Inc.
Andrew Trice, Clinacuity, Inc.
Youngjun Kim, Medical University of South Carolina
Jean-Karlo Accetta, Clinacuity, Inc.
Stephane Meystre (Presenter), Medical University of South Carolina
