Although finding similar cohorts of patients to understand clinical outcomes or shared molecular biomarkers seems intuitive, a concrete definition of patient similarity remains elusive. While one-dimensional identifiers of patient similarity have been implemented successfully in a limited number of clinical circumstances, multi-dimensional patient matching has been less successful, partly because of the unique challenges of characterizing multi-dimensional data. This workshop aims to develop a standardized “similarity nomenclature” and definitions. Currently, the term “similar patients” is used by multiple endeavors, from clinical trial matching, cohort matching, and “patients like mine” to simple clustering of “like” patients. Particularly in the oncology domain, cohort matching initiatives are being used to identify patients for clinical trials, by pharmaceutical companies to segregate patients into good and poor responder cohorts, and by medical researchers to identify patients who share clinico-genomic features. Yet each of these similarity definitions is subtly different.

A unifying factor is that all patient similarity definitions draw on a heterogeneous combination of (1) shared molecular traits, (2) shared phenotypic traits, or (3) shared outcomes. To complicate matters, these disparate types of data require algorithmically diverse computational methods to compute interpatient similarity. Simple overlap analysis, various clustering algorithms, and deep learning methodologies have all been brought to bear, yet none of these methodologies has become the standard for cohort matching. To provide clarity in defining patient similarity, this collaborative preconference workshop aims to bring together medical informaticians from different backgrounds to develop a unified nomenclature and to catalog the methods used for different types of patient similarity. The goals of the workshop are to publish the conference discussions in an associated journal and to (1) propose concrete definitions for common types of patient similarity, with instructive examples, (2) identify and describe the data elements needed for each similarity type, and (3) enumerate the commonly used algorithms needed to process those data elements.
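
As a purely illustrative sketch of the simplest method named above, overlap analysis, the following computes a Jaccard index over sets of patient traits. The choice of the Jaccard index, the function name, and the trait labels are assumptions made for illustration; they are not definitions proposed by the workshop, and the example patients are invented.

```python
def jaccard_similarity(traits_a, traits_b):
    """Overlap of two trait sets: |A intersect B| / |A union B|, from 0.0 to 1.0."""
    a, b = set(traits_a), set(traits_b)
    if not a and not b:
        # Convention chosen here (an assumption): two empty profiles
        # give no evidence of similarity, so score them 0.0.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical patients mixing molecular and phenotypic traits.
patient_1 = {"KRAS_G12C", "TP53_loss", "smoker"}
patient_2 = {"KRAS_G12C", "TP53_loss", "never_smoker"}
patient_3 = {"EGFR_L858R", "never_smoker"}

print(jaccard_similarity(patient_1, patient_2))  # 2 shared of 4 distinct traits
print(jaccard_similarity(patient_1, patient_3))  # no shared traits
print(jaccard_similarity(patient_2, patient_3))  # 1 shared of 4 distinct traits
```

Even this toy case shows why definitions matter: the score weights a shared mutation and a shared smoking status equally, a choice that a clinical trial matcher and a responder-cohort analysis might make differently.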

Learning Objective: n/a


James Chen (Presenter)
The Ohio State University

Nathan Seligson (Presenter)
The Ohio State University

Jeremy Warner (Presenter)
Vanderbilt University Medical Center

William Dalton (Presenter)
