We consider the task of producing a useful clustering of healthcare providers from their clinical action signature: their drug, procedure, and billing codes. Because high-dimensional sparse count vectors are challenging to cluster, we develop a novel autoencoder framework to address this task. Our solution creates a low-dimensional embedded representation of the high-dimensional space that preserves angular relationships, and it assigns examples to clusters while optimizing the quality of this clustering. Our method finds a better clustering than a two-step alternative, e.g., projected K-means/medoids, in which a representation is learned first and clustering is then applied to that representation. We demonstrate our method's characteristics through quantitative and qualitative analysis of real and simulated data, including several real-world healthcare case studies. Finally, we develop a tool to enhance exploratory analysis of providers based on their clinical behaviors.

Learning Objectives: After reading this article, you will be able to:
- discern among clustering approaches that model high-dimensional sparse data
- explain why angular representations are appropriate for such data
- identify clusters of providers based on their procedures and prescriptions in Medicare claims data
- characterize these clusters to assess alignment of specialty to clinical activities and identify surprising findings for further investigation
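
To illustrate why an angular comparison can suit sparse count data, the following is a minimal sketch and not the authors' method. The count vectors are hypothetical, made-up numbers over five billing codes, and the variable names are chosen only for this example. It compares cosine similarity with Euclidean distance for two providers who use the same mix of codes at very different volumes.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical counts over five billing codes (illustrative values only).
low_volume = np.array([2, 0, 1, 0, 0], dtype=float)
high_volume = 50 * low_volume                        # same code mix, 50x the volume
other_mix = np.array([0, 3, 0, 0, 1], dtype=float)   # similar volume, different codes

# Angular comparison: depends on the mix of codes, not on total volume.
cos_same_mix = cosine_similarity(low_volume, high_volume)
cos_other_mix = cosine_similarity(low_volume, other_mix)

# Euclidean comparison: dominated by the difference in total volume.
dist_same_mix = float(np.linalg.norm(low_volume - high_volume))
dist_other_mix = float(np.linalg.norm(low_volume - other_mix))

print(cos_same_mix, cos_other_mix)
print(dist_same_mix, dist_other_mix)
```

In this toy case the cosine similarity treats the low- and high-volume providers as pointing in the same direction, while the Euclidean distance places the low-volume provider closer to a provider with a completely different code mix. That contrast is one reason a representation that preserves angles may be a reasonable choice for count data of this kind.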


Nathanael Fillmore, Veterans Affairs
Sergey Goryachev, Veterans Affairs
Jeremy Weiss (Presenter), Carnegie Mellon University
