In this talk, the Personal Health Train concept will be introduced, which enables running personalized medicine workflows as trains visiting data stations (e.g. hospital records, primary care records, clinical studies and registries, and patient-held data from e.g. wearable sensors). The Personal Health Train is a very powerful concept, but it depends on source medical data being annotated with appropriate metadata on consent, license, scope etc., and on the data itself being encoded using biomedical data standards, an ever-growing field in biomedical informatics. To realize the Personal Health Train, biomedical data will need to be FAIR, i.e. adopt the FAIR Guiding Principles. This talk will cover the emerging GO-FAIR international movement and provide examples of how several European health data networks are currently adopting open-standards-based stacks to enable routine health care data to become accessible for research.
Open Insights Harvard DBMI - Personal Health Train - Kees van Bochove - The Hyve
1. Personal Health Train
GO-FAIR ON BIOMEDICAL DATA
OPEN INSIGHTS SEMINAR, Harvard Medical School DBMI, February 15, 2018, BOSTON
Kees van Bochove
Founder & CEO, The Hyve
@keesvanbochove
2. Outline
Introduction & Background
The Personal Health Train concept
Principles & Design Patterns
Implementation Drivers
4. The Hyve
Advance biology and medical research…
… by building and serving thriving open source communities
Services
Professional support for
open source software in
biomedical informatics
Software development
Data engineering
Consultancy
Hosting / SLAs
Core values
Share
Reuse
Specialize
Office Locations
Utrecht, The Netherlands
Cambridge, MA, United States
Customers
Pharma Companies
Academic Medical Centers
Biobanks, registries, patient
organisations
Health Data Networks
Fast-growing
Started in 2012
40+ people by now
5. Interdisciplinary team
software engineers, data scientists, project managers & staff; expertise in
bioinformatics, medical informatics, software engineering, biostatistics etc.
6. 4 Groups at The Hyve
Translational Research
Cancer Genomics
Real World Data
Real World Evidence
Wearable Sensors
Research Data Management
8. FAIR Data
Findable:
F1. (meta)data are assigned a globally unique and persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the data it describes;
F4. (meta)data are registered or indexed in a searchable resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol;
A1.1. the protocol is open, free, and universally implementable;
A1.2. the protocol allows for an authentication and authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation;
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other (meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes;
R1.1. (meta)data are released with a clear and accessible data usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community standards;
http://www.nature.com/articles/sdata201618
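As an illustration of how a few of the principles above could be checked by a machine, here is a minimal Python sketch. The `DatasetMetadata` fields, the `CHECKS` table and the example values are assumptions made for this example only; they are not part of the FAIR Metrics or of any existing tool, and a non-empty field is of course only a crude proxy for real compliance.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""       # F1: globally unique, persistent identifier
    description: str = ""      # F2: descriptive metadata
    catalogue_url: str = ""    # F4: where the record is indexed
    access_protocol: str = ""  # A1: standardized retrieval protocol
    license: str = ""          # R1.1: data usage license
    provenance: str = ""       # R1.2: where the data came from


# One crude presence check per principle (an assumption, not an official metric).
CHECKS = {
    "F1": lambda m: bool(m.identifier),
    "F2": lambda m: bool(m.description),
    "F4": lambda m: bool(m.catalogue_url),
    "A1": lambda m: bool(m.access_protocol),
    "R1.1": lambda m: bool(m.license),
    "R1.2": lambda m: bool(m.provenance),
}


def missing_principles(meta):
    """Return the codes of the principles whose check fails for this record."""
    return [code for code, check in CHECKS.items() if not check(meta)]


# Example record with placeholder values and no license filled in.
record = DatasetMetadata(
    identifier="https://example.org/dataset/123",
    description="Example cohort, baseline measurements",
    catalogue_url="https://example.org/catalogue",
    access_protocol="https",
    provenance="Exported from a hypothetical study database",
)
print(missing_principles(record))
```

For the example record, only the license check (R1.1) fails, which is the kind of gap a catalogue could report back to the data owner.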
14. Barriers to sharing data
[..] the problem is not really technical […]. Rather, the problems are ethical,
political, and administrative.
Lancet Oncol 2011;12:933
1. Administrative (I don’t have the resources)
2. Political (I don’t want to)
3. Ethical (I am not allowed to)
4. Technical (I can’t)
15. Clinical Data Landscape
• Clinical research: 3% of patients, 100% of features, 5% missing, 285 data points
• Clinical registries: 100% of patients, 3% of features, 20% missing, 240 data points
• Clinical routine: 100% of patients, 100% of features, 80% missing, 2000 data points
(Figure axes: data elements vs. patients)
16. A different approach
If sharing is the problem: Don’t share the data
If you can’t bring the data to the learning application
You have to bring the learning application to the data
Consequences
The learning application has to be distributed
The data has to be understandable by an application (i.e. not a human)
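To make "bring the learning application to the data" concrete, here is a minimal Python sketch of one very simple form of distributed learning: each station computes aggregate statistics on its own records, and only those aggregates travel back to be combined into a linear regression fit. This is a simplified stand-in chosen for illustration, not the method used in the studies cited in this talk; the station names, the function names and the toy data are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LocalStats:
    """Aggregates that leave a station; no individual record is included."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def run_at_station(records):
    """Runs inside a data station on its local (x, y) records."""
    return LocalStats(
        n=len(records),
        sum_x=sum(x for x, _ in records),
        sum_y=sum(y for _, y in records),
        sum_xx=sum(x * x for x, _ in records),
        sum_xy=sum(x * y for x, y in records),
    )


def combine(stats_list):
    """Runs centrally: fit y = slope * x + intercept from summed aggregates."""
    n = sum(s.n for s in stats_list)
    sx = sum(s.sum_x for s in stats_list)
    sy = sum(s.sum_y for s in stats_list)
    sxx = sum(s.sum_xx for s in stats_list)
    sxy = sum(s.sum_xy for s in stats_list)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data held at two hypothetical stations (generated from y = 2x + 1).
stations = {
    "hospital_a": [(1.0, 3.0), (2.0, 5.0)],
    "registry_b": [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)],
}
slope, intercept = combine([run_at_station(r) for r in stations.values()])
print(slope, intercept)
```

The fit is identical to what pooling all records would give, because ordinary least squares only needs these sums. Real deployments also have to consider whether small aggregates can leak information about individuals.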
17. Which lung cancer patient is likely to survive?
From Andre Dekker, MAASTRO clinic
18. MDs vs guidelines vs predictive modelling
Radiotherapy and Oncology 2014 July; 112(1): 37–43. DOI:10.1016/j.radonc.2014.04.012
Cary Oberije et al., A prospective study comparing the predictions of doctors versus models for treatment outcome
of lung cancer patients: a step towards individualized care and shared decision making
19. Distributed learning on Electronic Health Records
“bring the analysis to the data”
Radiotherapy and Oncology 2016; 121: 459-467. DOI:10.1016/j.radonc.2016.10.002
Arthur Jochems et al., Andre Dekker, MAASTRO clinic
23. Personal Health Train Principles
Control over data. The PHT empowers citizens, public or private
organisations to manage, safeguard, and share their data for use in
healthcare or scientific research.
Reusable personal health data. The PHT is a shared digital infrastructure
based on standards and protocols adhering to the FAIR principles (i.e.,
digital resources are Findable, Accessible, Interoperable, and Reusable).
Distributed and federated solutions. The PHT architecture relies on
distributed and federated learning and decision support where possible.
Data stay where they are and are processed at their location of origin,
unless distributed solutions are not (yet) available or do not suffice.
Responsible use of personal health data. We will act to enable and
ensure the responsible use of personal data by adopting international
principles and regulations, including the FACT principles (Fairness,
Accuracy, Confidentiality, Transparency), privacy-by-design, privacy-by-
default, and the General Data Protection Regulation (GDPR).
Ethics-by-design. We commit to optimise the facilities to judge the
ethical aspects of research questions and enable blocking and reporting
of studies that abuse personal data.
An open ecosystem for innovation in health and well-being. Everyone
subscribing to these guiding principles can contribute to the development
of the PHT. We will adopt and develop open standards and protocols.
We strive to avoid single-vendor solutions that create single points of
failure for any critical component of the shared infrastructure. The core
infrastructure will be a common and public good.
Registration at the source. Data should be captured only once,
promoting efficiency by avoiding repetitive work. Copying or moving
source data by authorised individuals should be made explicit in the
data provenance and limited as much as possible.
Machine-readability at the core. We focus on creating machine-
readable and interpretable data, metadata, workflows, and services,
aiming for maximal interoperability between diverse systems including
electronic patient records. Machine-readable data will be
accompanied by human-readable versions in different languages for
different audiences (professionals, citizens).
27. Design Patterns
FAIR Principles
Hourglass Model: the standards stack
Microservices Architecture
Privacy by Design
28. Improving Data Findability with FAIR tools
To operationalize the FAIR
principles, scientists need tools
For example, publishing your
data in a catalogue can help
improve Findability
However, even for flagship
scientific data catalogues,
manual work is required to
comply
Preliminary work: a white paper is being prepared by my
colleagues Jarno and Carolyn to score various open
source scientific data catalogues using the new FAIR
Metrics (https://github.com/FAIRMetrics/Metrics)
29. Improving accessibility of data
The easy answer is to publish
scientific data to open
repositories, or e.g. EGA for
controlled access data
However, that is the tip of the
iceberg -- the ‘deep web’ of life
science data is on USB sticks and
in corporate repositories
The open source Podium Request Portal
for requesting data and samples that The
Hyve developed for BBMRI-NL
30. Computable Consent
• Computable Consent is currently still
mostly a research topic, yet it is crucial
for implementing the FAIR principles for life
science data
• There are candidate ontologies such as
DUO (see screenshot on the right),
GA4GH ADA-M etc., but they are not yet
battle-tested (see e.g. the GDPR
rights, information obligations etc.)
• Even if we nail this for research purposes,
there is still the issue of implementing
it in operational healthcare and
clinical trial workflows!
https://doi.org/10.1371/journal.pgen.1005772
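To give a feel for what "computable" means here, the following Python sketch matches a data access request against a machine-readable consent record. The fields and purpose labels are invented for this example and are not DUO or ADA-M terms; a real system would use identifiers from such an ontology and would cover far more conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    """Hypothetical machine-readable consent record for one participant."""
    permitted_purposes: frozenset    # e.g. {"general_research"}
    disease_restriction: str = ""    # if set, use is limited to this disease
    commercial_use_allowed: bool = False


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request describing the intended use of the data."""
    purpose: str
    disease: str = ""
    commercial: bool = False


def is_permitted(consent, request):
    """Return True only if every condition in the consent is satisfied."""
    if request.purpose not in consent.permitted_purposes:
        return False
    if consent.disease_restriction and request.disease != consent.disease_restriction:
        return False
    if request.commercial and not consent.commercial_use_allowed:
        return False
    return True


# A participant who consented to non-commercial lung cancer research only.
consent = Consent(
    permitted_purposes=frozenset({"disease_research"}),
    disease_restriction="lung cancer",
)
academic = AccessRequest(purpose="disease_research", disease="lung cancer")
company = AccessRequest(purpose="disease_research", disease="lung cancer", commercial=True)
print(is_permitted(consent, academic), is_permitted(consent, company))
```

The check is deliberately conservative: a request is refused unless it passes every condition, so a missing or unknown purpose never grants access.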
31. And then there is Interoperability...
https://youtu.be/C95pl11zdAs
Operational: HL7 FHIR,
RIM, SMART on FHIR,
DCMs, OpenEHR etc.
Research & Trials:
i2b2/tranSMART, OMOP,
HPO, ICD, SNOMED-CT,
LOINC, ….
32. Health Research Infrastructure
[Diagram. Layer labels: Apps, Microservices, Implementation ideas. Components: Researchers, Patients, Consents, Samples, Patient Finder, Data Models, Data Catalogue, Research Datasets, Data Access Requests / Grants, Apps/Workflows, Private Analysis Environment, Projects / Billing, Clinical Study Environment, Sample Requests, Live Clinical Data, Servers.]
33. Conclusions
Make your data FAIR
To survive in & contribute to modern academic research
To effectively combine public with in house data
Prepare for the shift towards personalized medicine & health
Data sharing via distributed learning rather than by data copying
Use of observational data in conjunction with clinical trial data
Focus on outcomes and health economics
… and eventually move towards a Learning Health Care System!