A challenge in health informatics and health data analysis programs is providing students with datasets that are realistic but contain no identifiable patient information. In the presented work, we algorithmically generated synthetic data from real data; the synthetic data maintain the characteristics of the real data but do not contain any sensitive information. These data will be used to show students data integration methods and challenges.

Learning Objectives: After participating in this session, the learner should be better able to:
1. Describe how synthetic data can be generated from real data without violating patient privacy
2. Design data integration exercises as part of a health informatics curriculum
3. Explain the importance of data integration and how the generated data can be used to demonstrate data integration challenges and methods


Hedyeh Mobahi (Presenter)
George Mason University

Hua Min, George Mason University
Janusz Wojtusiak, George Mason University
