
Historic problem list data from the Allscripts ambulatory EHR data set contains a high volume of unstructured free text. Structuring problem list data into ICD10 codes facilitates downstream analysis [1]. However, manually mapping free text to ICD10 codes is a labor-intensive and time-consuming process because of the number of ways a concept can be expressed in free text. Natural language processing (NLP) techniques have been applied to similar mapping problems that require evaluating text similarity [2]. Web search engines are also a useful tool for mapping free text. Here, we develop an automated mapping approach that combines NLP techniques with web scraping of a search engine. We obtained 83% accuracy on ICD10 mapping and discuss future work to improve the precision and accuracy of our mapping.

Learning Objectives: Learn how to use web scraping to map free text to ICD10 codes.
Learn how cosine similarity can be used to match similar text.
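
As a rough illustration of the second objective, the sketch below scores a free-text problem entry against a few candidate code descriptions using cosine similarity over bag-of-words counts. It is a minimal example under stated assumptions, not the pipeline described in the abstract: the three-entry CANDIDATES table, the tokenizer, and the function names are all hypothetical, and a real system would likely use a fuller code set and a weighting scheme such as TF-IDF.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative candidate table (not the authors' data).
CANDIDATES = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential primary hypertension",
    "J45.909": "unspecified asthma uncomplicated",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the token-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the tokens the two strings share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)


def best_match(free_text, candidates=CANDIDATES):
    """Return (code, score) for the candidate description most similar to free_text."""
    scored = [(cosine_similarity(free_text, desc), code)
              for code, desc in candidates.items()]
    score, code = max(scored)
    return code, score


if __name__ == "__main__":
    print(best_match("diabetes type 2"))
```

Because the score depends only on shared tokens, word order does not matter, but spelling variants and abbreviations (for example "DM2") would score zero without further normalization.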


Theresa Sudaria (Presenter)

Nam Nguyen, Veradigm

