Description

Project abstract (1000 Characters): Increasingly, health research involves combining data gathered by different communities. Heterogeneity in data access mechanisms and data models makes this resource intensive. Standardising the approach used to manage and share research data could significantly reduce the level of resource required to run these studies in the future. The UK’s MRC funded Health eResearch Centre studied challenges experienced across research projects and developed the eLab software. This software can be used to create Virtual Research Environments (eLabs) for projects, offering researchers an integrated set of tools to meet common requirements. Here we present an evolving innovative approach and tools that can be used by researchers to address these challenges by making it simpler to standardise data management based on HL7 FHIR. The tools have been evaluated and successfully applied to harmonise and integrate data across multiple data centres as part of a large clinical study.

Project rationale, impact and innovation (3500 Characters): Health research increasingly requires large-scale collaboration across multi-institutional teams and disparate datasets. This can present a range of challenges that are complex to solve, and consume significant resource and therefore reduce research outputs. The Health eResearch Centre (www.herc.ac.uk) developed the eLab software in response to these recurring issues across multiple projects. This software can be used to create Virtual Research Environments (eLabs) for projects, offering researchers an integrated set of tools to meet common requirements.

Data: A key challenge in using big data for health research is pooling the data gathered by different communities. The source data are often managed using different IT systems and the mechanisms to query and extract data are different. The underlying data models (usually highly heterogeneous) also need to be understood; combining data requires data source expertise and consensus across many stakeholders. Research datasets are often difficult to share across project teams because only free-text descriptions are used to convey the meaning of the data. Standardising the approach used to integrate and share historic and prospective research data could significantly reduce the complexities and level of resource required to run these studies. There are existing standards and tools that can be used to address some of these challenges, but their adoption by the research community is hindered by the technical expertise required for their application. Here we present an innovative approach and tools to help researchers address these challenges by making it simpler to use and manage data based on HL7 FHIR.

Methods: In order to meet the information governance requirements on health data, it is desirable to minimise the movement and replication of datasets across different systems. Many data management systems are not integrated with computational infrastructure, and therefore analysts frequently transfer data out of repositories to remote resources to perform analysis. Here we present an integrated set of tools for performing analyses without the need to export the data; analysis code can be described, shared and executed within a single environment.

People: Many health projects work with data that must be managed according to well-defined policies and procedures, and records must be kept to document conformance. There are existing tools that might be used to manage business processes and that provide auditing capability, but they are often loosely coupled to the data management infrastructure. Here we present an integrated and extensible set of tools and workflows to manage common data management processes, such as requesting and approving data exports from FHIR databases for analysis. An eLab can bring geographically dispersed people into a single virtual environment for collaboration, supporting a shift to Team Science with inter-institutional and inter-disciplinary research models.

Infrastructure: Setting up secure IT infrastructure for the management of health data requires specialist staff with skills that can be difficult to find. It is simple for organisations to set up their own eLab, because many difficult tasks are fully automated. An automated deployment process that gathers, deploys and configures every component of a complex health data management system would significantly reduce the need for specialist staff and promote the development of best practice in information security.

Project design and implementation (7000 characters): Requirements were gathered across a range of data intensive projects that were part of the HeRC programme over a five-year period. A few key requirements were established:
- An eLab should not impose unnecessary complexity on projects
- An eLab should align with current working practices
- An eLab should be extensible
- An eLab should be simple to install and configure
- An eLab should be designed with careful consideration of data security
- The final eLab software should be available for other projects to use

These key requirements and the challenge areas in Section 3 were used to inform architecture decisions. Existing open source software was reviewed to see whether it could be used to meet some of the requirements. Key considerations were license implications and the availability of APIs to enable integration with other components. Reusing existing software significantly reduced the effort required to develop the software and documentation.

An initial priority was that the design should allow projects to use only the features of the eLab that they require, without needing to consider other features that may add complexity. This was achieved by grouping features into three ‘User Levels’. To use an eLab, users are only required to understand the features within their level and those within lower, less complex levels. Level 1 includes features such as content management, user access controls and business process management. Level 2 includes metadata cataloguing tools and Level 3 includes tools for importing, querying and exporting research data using FHIR.

A review was undertaken of existing software that could be used to provide a feature set that matched our requirements and that provided APIs and SDKs that enabled extensions to be developed. The Alfresco Content Management System was adopted as a core component for providing the main user interface to an eLab (through Alfresco Share).

The installation and configuration of an eLab has been designed to be fully automated using configuration management tools including Ansible and recent developments in DevOps including Docker Swarm.

The eLab has been developed in partnership with the Trustworthy Research Environment team [1] at the University of Manchester to ensure best practice in secure software development.

Data
We reviewed existing standards and tools that might be used to address the data management challenges. We concluded that FHIR was the most appropriate match for the use cases, with rapidly increasing adoption and better support for questionnaire data. Data would be moved between eLab components using the standard FHIR REST API and would be exchanged as FHIR XML bundles. Integrating data across datasets often requires the creation of standard data models that must be mapped to existing models. We initially worked with datasets that included types of data that were common across many of the other datasets. We reviewed the existing FHIR standard (STU3) for Structure Definitions that might be used for these standard models. Where this was not possible, we created Profiles on existing Structure Definitions that included extensions and restrictions as required. We used this as an opportunity to formalise some of the text based descriptions of the variables and codes. This was achieved by creating FHIR Code Systems and Value Sets that mapped to SNOMED and LOINC. Collaboration is a key part of developing these models and we use SIMPLIFIER.NET as a tool to visualise and discuss these FHIR components. It is our plan to further develop a community of researchers that can refine and extend these models, to increase adoption and therefore the benefits of standardisation.

Many of our research projects work with historic datasets and we needed to ensure that an eLab could work with these data. Data are often stored and transferred in tabular structures, commonly as statistics package files or comma separated value files. We needed to develop the eLab to manage these files in their existing formats, but also to enable transformation to a more standardised format for future use to address some of the issues discussed in Section 3. Tools were created that can be used to transform tabular data to FHIR XML for import into existing FHIR systems. A mapping language was developed for this purpose, alongside tooling that can be used to read maps and CSV data files and transform them to FHIR XML. This tool will also validate against the standard FHIR Structure Definitions and eLab FHIR Profiles.
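As a minimal sketch of this kind of tabular-to-FHIR transformation, the Python below turns one CSV row into FHIR XML Observation resources. It is not the eLab mapping language or tooling, which are not shown here. The column names (participant_id, height_cm), the COLUMN_MAP structure and the helper functions are hypothetical and chosen only for illustration, and no validation against Structure Definitions or Profiles is attempted.

```python
import csv
import io
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"
ET.register_namespace("", FHIR_NS)  # serialise with FHIR as the default namespace

# Hypothetical map from CSV column to a LOINC code and UCUM unit (assumption).
COLUMN_MAP = {
    "height_cm": {"code": "8302-2", "display": "Body height", "unit": "cm"},
}


def el(parent, tag, value=None):
    """Add a FHIR child element, optionally with a 'value' attribute."""
    child = ET.SubElement(parent, f"{{{FHIR_NS}}}{tag}")
    if value is not None:
        child.set("value", str(value))
    return child


def row_to_observations(row):
    """Build one Observation element per mapped, non-empty column in a row."""
    observations = []
    for column, mapping in COLUMN_MAP.items():
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip missing measurements rather than emit empty values
        obs = ET.Element(f"{{{FHIR_NS}}}Observation")
        el(obs, "status", "final")
        coding = el(el(obs, "code"), "coding")
        el(coding, "system", "http://loinc.org")
        el(coding, "code", mapping["code"])
        el(coding, "display", mapping["display"])
        el(el(obs, "subject"), "reference", f"Patient/{row['participant_id']}")
        quantity = el(obs, "valueQuantity")
        el(quantity, "value", raw)
        el(quantity, "unit", mapping["unit"])
        el(quantity, "system", "http://unitsofmeasure.org")
        el(quantity, "code", mapping["unit"])
        observations.append(obs)
    return observations


if __name__ == "__main__":
    data = io.StringIO("participant_id,height_cm\nP1,101.5\nP2,\n")
    for row in csv.DictReader(data):
        for obs in row_to_observations(row):
            print(ET.tostring(obs, encoding="unicode"))
```

Keeping the column-to-code map as data rather than code mirrors the idea of a separate mapping description, so the same transformation routine could be reused across datasets.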

We did not have an existing database for standardised data and therefore decided not to create a FHIR façade, but to use an existing FHIR database. We have used both the HAPI FHIR database [2] and the Vonk FHIR Server [3] for storage. We currently use Vonk for our production eLabs due to the availability of a support package.

We investigated the data import process, carefully considering the process for large datasets using the REST API, including transaction management, splitting FHIR XML bundles and managing cross-bundle resource references. We are currently exploring the changes required for an OAuth2 approach for managing authorisation.
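One way to approach bundle splitting is sketched below in Python, under stated assumptions; it is not the eLab import code. The sketch assumes that every resource already carries a client-assigned id and that the server accepts PUT entries as update-as-create, so that a reference such as Patient/P1 stays valid even when the referencing resource lands in a different bundle. The function name and the max_entries default are hypothetical.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"
ET.register_namespace("", FHIR_NS)


def fhir(tag):
    """Return a namespace-qualified FHIR tag name for ElementTree."""
    return f"{{{FHIR_NS}}}{tag}"


def make_transaction_bundles(resources, max_entries=500):
    """Split a list of FHIR resource elements into transaction Bundles.

    Each entry uses PUT <Type>/<id>, so references that use the same ids
    do not depend on which bundle the target resource was sent in.
    """
    bundles = []
    for start in range(0, len(resources), max_entries):
        bundle = ET.Element(fhir("Bundle"))
        ET.SubElement(bundle, fhir("type")).set("value", "transaction")
        for resource in resources[start:start + max_entries]:
            resource_type = resource.tag.split("}")[1]
            resource_id = resource.find(fhir("id")).get("value")
            entry = ET.SubElement(bundle, fhir("entry"))
            ET.SubElement(entry, fhir("resource")).append(resource)
            request = ET.SubElement(entry, fhir("request"))
            ET.SubElement(request, fhir("method")).set("value", "PUT")
            ET.SubElement(request, fhir("url")).set(
                "value", f"{resource_type}/{resource_id}"
            )
        bundles.append(bundle)
    return bundles
```

If the server enforces referential integrity, the input list would need to be ordered so that referenced resources (for example Patients) are sent before the resources that point to them.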

The FHIR data are made available to researchers through a tool that we developed called the FHIR Explorer. This tool can be used to construct a FHIR query and build FHIRPath expressions that can be used to export data from the eLab. The user is presented with the number of FHIR resources that match different Structure Definitions, including the Profiles that we have developed. Data can be selected based on the Structure Definitions and conditions can be placed using Search Parameters. Users are also able to select the resource elements that they wish to export and map those elements to columns in a data table. This data table is currently exported as a CSV file, and we are adding options to export as R and SPSS files.
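The query-then-flatten pattern described above can be illustrated with a short Python sketch. It is not the FHIR Explorer itself: the base URL is hypothetical, and a simple dotted element path stands in for FHIRPath, which is considerably richer. The code and _count parameters used in the example are standard FHIR search parameters.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {"f": "http://hl7.org/fhir"}


def build_search_url(base_url, resource_type, params):
    """Compose a FHIR REST search URL from a dict of search parameters."""
    return f"{base_url.rstrip('/')}/{resource_type}?{urlencode(params)}"


def extract(resource, dotted_path):
    """Read the 'value' attribute of the first element on a dotted path.

    A simplified stand-in for a FHIRPath expression (assumption).
    """
    xpath = "/".join(f"f:{part}" for part in dotted_path.split("."))
    node = resource.find(xpath, NS)
    return node.get("value", "") if node is not None else ""


def bundle_to_csv(bundle_xml, resource_type, columns):
    """Flatten matching resources in a searchset Bundle into CSV text.

    'columns' maps an output column name to a dotted element path.
    """
    root = ET.fromstring(bundle_xml)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns.keys())
    for resource in root.findall(f"f:entry/f:resource/f:{resource_type}", NS):
        writer.writerow(extract(resource, path) for path in columns.values())
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical server address; replace with a real FHIR endpoint.
    print(build_search_url(
        "https://fhir.example.org/",
        "Observation",
        {"code": "http://loinc.org|8302-2", "_count": 100},
    ))
```

Calling bundle_to_csv with columns such as {"participant": "subject.reference", "height_cm": "valueQuantity.value"} yields one row per Observation, with empty cells where an element is absent.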

The FHIR Explorer is integrated with an auditable export approval workflow. When a user builds a query and starts an export, they must also acknowledge terms of usage and upload a justification document. Only after access is approved will the query be executed and the results returned to the user.
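The gating behaviour of such a workflow can be sketched as a small state machine in Python. This is an illustration only; in the eLab the workflow is provided through the platform's business process tooling, and the class, state names and audit record layout below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExportRequest:
    """A data export request that may only run once it has been approved."""
    user: str
    query: str
    terms_accepted: bool
    justification: str  # e.g. name of the uploaded justification document
    state: str = "draft"
    audit: list = field(default_factory=list)

    def _log(self, actor, action):
        # Record who did what and when, so the process can be audited later.
        self.audit.append((datetime.now(timezone.utc).isoformat(), actor, action))

    def submit(self):
        if not self.terms_accepted or not self.justification:
            raise ValueError("terms must be accepted and a justification supplied")
        self.state = "pending"
        self._log(self.user, "submitted")

    def decide(self, approver, approved):
        if self.state != "pending":
            raise RuntimeError("only pending requests can be decided")
        self.state = "approved" if approved else "rejected"
        self._log(approver, self.state)

    def execute(self, run_query):
        # The query is never run unless an approver has accepted the request.
        if self.state != "approved":
            raise PermissionError("export has not been approved")
        self._log(self.user, "executed")
        return run_query(self.query)
```

Passing the query runner in as a callable keeps the approval logic independent of how the FHIR server is actually queried.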

Methods
JupyterHub has been integrated into eLabs, enabling researchers to share their analysis code via Jupyter Notebooks. This gives users the ability to share live code alongside equations, visualisations and narrative text, simplifying their co-creation and verification of results. We are currently working on direct access to the FHIR database from Jupyter instances. eLabs also supports analysis using other systems including HTCondor.

People
eLabs make tools available to users through dashboards that can be configured and extended based on the specific needs of a project. Alfresco Share is used to provide the dashboard and Alfresco modules have been developed to support tools including the FHIR Explorer. Users can manage custom business processes through the dashboard, including receiving notifications and assigning tasks.

Project evaluation and sustainability (3500 characters): Our evaluation criteria for the eLab software are derived from the objectives outlined in funding applications and community guidelines for evaluation (https://www.software.ac.uk/resources/guides-everything/software-evaluation-guide). We also consider reported clinical outcomes in assessing impact.

Evaluation against project objectives usually involves regular user feedback and refinement of the software to meet project needs. In some cases, the software outputs have been externally reviewed as deliverables by funders, including for EU projects. This form of assessment usually relates to eLab extensions or modules. We regularly review community guidelines and make decisions to address key criteria. These usually relate to non-functional requirements that may not be specified in funding applications. Some examples are identified in Section 3, including ensuring that the platform is simple to install and configure and that system complexity is reduced by decoupling features.

An initial evaluation of the FHIR tools has been performed by testing their ability to support the integration and management of data across 12 cohorts and 9064 children as part of the NIH-funded ECHO CREW project [5]. The dataset covers 65 variables that relate to demographics, clinical measurements and observations. The dataset also includes 8102 mothers from 11 cohorts covering an additional 58 variables. The investigation proved successful and this dataset will be used further, in combination with other datasets, to support research studies undertaken by CREW investigators. Thus far, 10 manuscripts are in progress and have required data elements from this dataset to support analysis. This was an initial major goal for the FHIR tools and demonstrated successful implementation at this stage in development.

The eLab was originally developed as a research tool that could be used to support HeRC's activities. We are in the process of migrating this to a community-driven open source software project to improve longer term sustainability. We have established a developer base across different teams, and even continents, to try to reduce dependency on a single team. As part of this process, increased priority is being given to other evaluation criteria, including identity, documentation and publication. We will also seek external evaluation of the software.

Twitter project summary (140 characters): eLab: Research environments for multi-site collaboration supported by FHIR data management and analysis! #DataSavesLives #TeamScience

How is FHIR used in the App being demonstrated (500 characters)? : FHIR is used to standardise the interfaces to our data storage systems. All data are accessed using the FHIR REST API and transferred between eLab components as FHIR XML. FHIR has also been used as a way to develop standard data models for integrated datasets. Where standard FHIR resources do not meet requirements, we have developed resources such as Profiles, Extensions, Code Systems and Value Sets. We have used tools such as SIMPLIFIER.NET to support collaboration around standard models.

What FHIR release does your application use? (500 characters): The current eLab software supports any FHIR resource from STU3 with plans to move to R4 in the very near future. The eLab tools also support the use of custom conformance resources and we have developed our own Profiles, Extensions, Code Systems, Value Sets and Search Parameters.

What is the data source for the FHIR resources and how are the FHIR resources accessed? (500 characters): The Vonk FHIR Server [3] is used in our production eLabs to store FHIR data. The FHIR resources are generated by mapping and transforming research data sets into FHIR XML. eLab components interact with a Vonk server using the FHIR REST API and through the exchange of XML bundles.

Any other information about the project we should know about (1500 characters)?:

References
1. The Trustworthy Research Environment, https://www.herc.ac.uk/tre/
2. The HAPI FHIR Server, https://hapifhir.io/
3. The Vonk FHIR Server, https://fire.ly/products/vonk/
4. The Asthma eLab, https://www.herc.ac.uk/case_studies/asthma-elab-stelar/
5. Gern J, Jackson DJ, Lemanske RF, et al. The Children's Respiratory and Environmental Workgroup (CREW) birth cohort consortium: design, methods, and study population. Respiratory Research (2019) 20:115 https://doi.org/10.1186/s12931-019-1088-9

Acknowledgements
We would like to thank the following organisations for their support:

The CREW project, funded by HHS/NIH grant 5UG3OD0232821

The STELAR consortium, funded by Medical Research Council grant MR/K002449/1

The Health e-Research Centre, University of Manchester, funded by Medical Research Council grant MR/K006665/1

The iFAAM project, funded by the European Union through grant agreement no 312147

We would also like to thank the following individuals for their significant contribution to the eLab project.

Prof Iain Buchan, The University of Liverpool, UK
Prof Adnan Custovic, Imperial College London, UK
Prof Clare Mills, The University of Manchester, UK
Mr David Kemp, x2764tech Limited, UK
Ms Ruth Norris, The University of Manchester, UK
Ms Victoria Turner, The University of Manchester, UK
Dr Adam Nunez, The University of Wisconsin, USA
Dr Yiqiang Song, The University of Wisconsin, USA
Ms Laura Ladick, The University of Wisconsin, USA

Authors:

Benjamin Green (Presenter)
The University of Manchester

Philip Couch, The University of Manchester
Eneida Mendonca, Indiana University
Lisa Gress, University of Wisconsin
Andrew Jerrison, The University of Manchester
Stephen Lloyd, The University of Manchester
Umberto Tachinardi, Indiana University
James Gern, University of Wisconsin
John Ainsworth, The University of Manchester
