Open Insights Harvard DBMI - Personal Health Train - Kees van Bochove - The Hyve

Personal Health Train
GO-FAIR ON BIOMEDICAL DATA
OPEN INSIGHTS SEMINAR, Harvard Medical School DBMI, February 15, 2018, BOSTON
Kees van Bochove
Founder & CEO, The Hyve
@keesvanbochove

Outline
• Introduction & Background
• The Personal Health Train concept
• Principles & Design Patterns
• Implementation Drivers

The Hyve
Advance biology and medical research by building and serving thriving open source communities

Services
Professional support for open source software in biomedical informatics
• Software development
• Data engineering
• Consultancy
• Hosting / SLAs

Core values
• Share
• Reuse
• Specialize

Office locations
• Utrecht, The Netherlands
• Cambridge, MA, United States

Customers
• Pharma companies
• Academic medical centers
• Biobanks, registries, patient organisations
• Health data networks

Fast-growing
• Started in 2012
• 40+ people by now

Interdisciplinary team
Software engineers, data scientists, project managers & staff; expertise in bioinformatics, medical informatics, software engineering, biostatistics etc.

4 Groups at The Hyve
• Translational Research
• Cancer Genomics
• Real World Data
• Real World Evidence
• Wearable Sensors
• Research Data Management

3.
FAIR DATA PRINCIPLES
https://www.dtls.nl/fair-data/fair-principles-explained/

FAIR Data
http://www.nature.com/articles/sdata201618

Findable:
F1. (meta)data are assigned a globally unique and persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the data they describe;
F4. (meta)data are registered or indexed in a searchable resource.

Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol;
A1.1. the protocol is open, free, and universally implementable;
A1.2. the protocol allows for an authentication and authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer available.

Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation;
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other (meta)data.

Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes;
R1.1. (meta)data are released with a clear and accessible data usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community standards.
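
As a rough illustration of how a few of these principles could be checked by a machine, the sketch below inspects a metadata record for a persistent identifier (F1), an explicit data identifier (F3), a license (R1.1) and provenance (R1.2). The field names (`identifier`, `data_identifier`, `license`, `provenance`) and the DOI-prefix test are assumptions made for this example; they are not part of the FAIR principles or of any particular catalogue schema.

```python
from typing import Dict, List

# Hypothetical field names; real catalogues use their own metadata schemas.
CHECKS = {
    "F1": lambda m: str(m.get("identifier", "")).startswith(("https://doi.org/", "doi:")),
    "F3": lambda m: bool(m.get("data_identifier")),
    "R1.1": lambda m: bool(m.get("license")),
    "R1.2": lambda m: bool(m.get("provenance")),
}


def failed_fair_checks(metadata: Dict[str, object]) -> List[str]:
    """Return the principle codes whose (simplified) check fails."""
    return [code for code, check in CHECKS.items() if not check(metadata)]


record = {
    "identifier": "https://doi.org/10.0000/example",  # made-up DOI for illustration
    "data_identifier": "https://doi.org/10.0000/example-data",  # also made up
    "license": "CC-BY-4.0",
}
print(failed_fair_checks(record))  # ['R1.2'] because no provenance is given
```

Checks like these only cover the mechanical part of FAIR; whether metadata are "rich" or meet community standards still needs domain judgement.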

Hourglass model of the internet
• Many different applications
• One standard stack: TCP/IP
• Many different network implementations
E-commerce 2014, © 2014 Pearson Education, Inc.

https://www.go-fair.org/implementation-networks/overview/go-fair-personal-health-train/

Health Data Research Infrastructures

Barriers to sharing data
“[..] the problem is not really technical […]. Rather, the problems are ethical, political, and administrative.” (Lancet Oncol 2011;12:933)
1. Administrative (I don’t have the resources)
2. Political (I don’t want to)
3. Ethical (I am not allowed to)
4. Technical (I can’t)

Clinical Data Landscape
• Clinical research: 3% of patients, 100% of features, 5% missing, 285 data points
• Clinical registries: 100% of patients, 3% of features, 20% missing, 240 data points
• Clinical routine: 100% of patients, 100% of features, 80% missing, 2000 data points
Figure axes: data elements vs. patients

A different approach
• If sharing is the problem: don’t share the data
• If you can’t bring the data to the learning application, you have to bring the learning application to the data
• Consequences
  • The learning application has to be distributed
  • The data has to be understandable by an application (i.e. not a human)

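A minimal sketch of this idea, under stated assumptions: each data holder (a "station") keeps its patient records locally and only runs an analysis function (the "train") that returns aggregate counts. The `Station` class, the record field `survived_2y` and the two-year survival question are hypothetical; a real deployment would add authentication, consent checks and disclosure control.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Aggregate = Dict[str, int]


@dataclass
class Station:
    """A data holder; records stay inside this object (hypothetical design)."""
    name: str
    _records: List[Record]

    def run(self, train: Callable[[List[Record]], Aggregate]) -> Aggregate:
        # The analysis comes to the data; only its aggregate result goes back.
        return train(self._records)


def survival_train(records: List[Record]) -> Aggregate:
    """Count patients and two-year survivors at one station."""
    return {
        "n": len(records),
        "survived": sum(1 for r in records if r["survived_2y"]),
    }


stations = [
    Station("hospital_a", [{"survived_2y": True}, {"survived_2y": False}]),
    Station("hospital_b", [{"survived_2y": True}, {"survived_2y": True}]),
]

results = [s.run(survival_train) for s in stations]
total_n = sum(r["n"] for r in results)
total_survived = sum(r["survived"] for r in results)
print(total_survived / total_n)  # 0.75 for the toy data above
```

The same function can run at every station only because each one exposes the same field with the same meaning, which is the machine-readability consequence in practice.
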
Which lung cancer patient is likely to survive?
From Andre Dekker, MAASTRO clinic

MDs vs guidelines vs predictive modelling
Cary Oberije et al., A prospective study comparing the predictions of doctors versus models for treatment outcome of lung cancer patients: a step towards individualized care and shared decision making.
Radiotherapy and Oncology 2014 July; 112(1): 37–43. DOI: 10.1016/j.radonc.2014.04.012

Radiotherapy and Oncology 2016; 121: 459-467. DOI: 10.1016/j.radonc.2016.10.002
Arthur Jochems et al., Andre Dekker, MAASTRO clinic
Distributed learning on Electronic Health Records
“bring the analysis to the data”
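
One common way to bring the analysis to the data is for each site to compute a model update on its own records and share only that update. The sketch below uses gradient averaging for logistic regression as a stand-in technique; it is not necessarily the method used in the cited study, and the toy data, learning rate and function names are made up for illustration.

```python
import math
from typing import List, Tuple

Site = Tuple[List[List[float]], List[int]]  # (feature rows, binary outcomes)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(weights: List[float], site: Site) -> Tuple[List[float], int]:
    """Runs inside one site: returns the summed gradient and the row count."""
    rows, outcomes = site
    grad = [0.0] * len(weights)
    for x, y in zip(rows, outcomes):
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad, len(rows)


def federated_fit(sites: List[Site], n_features: int,
                  rounds: int = 500, lr: float = 0.5) -> List[float]:
    """Coordinator: sees only gradients and counts, never patient rows."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        updates = [local_gradient(weights, s) for s in sites]
        total = sum(n for _, n in updates)
        for j in range(n_features):
            weights[j] -= lr * sum(g[j] for g, _ in updates) / total
    return weights


# Toy data: first column is an intercept, second a single risk factor.
site_a: Site = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [0, 0, 1])
site_b: Site = ([[1.0, 3.0], [1.0, 0.5], [1.0, 2.5]], [1, 0, 1])

weights = federated_fit([site_a, site_b], n_features=2)
print(weights)
```

Because the per-site gradients are summed and divided by the total row count, each round takes the same step that gradient descent on the pooled data would take, without any site revealing individual records. Gradients can still leak information in some settings, so this is a starting point rather than a privacy guarantee.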

Decision Support Systems
From Andre Dekker, MAASTRO clinic

Learning Health Care System
Thomas M. Maddox et al. Circulation. 2017;135:e826-e857

2.
PERSONAL HEALTH TRAIN
https://www.dtls.nl/fair-data/personal-health-train/

Personal Health Train Principles
• Control over data. The PHT empowers citizens and public or private organisations to manage, safeguard, and share their data for use in healthcare or scientific research.
• Reusable personal health data. The PHT is a shared digital infrastructure based on standards and protocols adhering to the FAIR principles (i.e., digital resources are Findable, Accessible, Interoperable, and Reusable).
• Distributed and federated solutions. The PHT architecture relies on distributed and federated learning and decision support where possible. Data stay where they are and are processed at their location of origin, unless distributed solutions are not (yet) available or do not suffice.
• Responsible use of personal health data. We will act to enable and ensure the responsible use of personal data by adopting international principles and regulations, including the FACT principles (Fairness, Accuracy, Confidentiality, Transparency), privacy-by-design, privacy-by-default, and the General Data Protection Regulation (GDPR).
• Ethics-by-design. We commit to optimise the facilities to judge the ethical aspects of research questions and enable blocking and reporting of studies that abuse personal data.
• An open ecosystem for innovation in health and well-being. Everyone subscribing to these guiding principles can contribute to the development of the PHT. We will adopt and develop open standards and protocols. We strive to avoid single-vendor solutions that create single points of failure for any critical component of the shared infrastructure. The core infrastructure will be a common and public good.
• Registration at the source. Data should be captured only once, promoting efficiency by avoiding repetitive work. Copying or moving source data by authorised individuals should be made explicit in the data provenance and limited as much as possible.
• Machine-readability at the core. We focus on creating machine-readable and interpretable data, metadata, workflows, and services, aiming for maximal interoperability between diverse systems including electronic patient records. Machine-readable data will be accompanied by human-readable versions in different languages for different audiences (professionals, citizens).

Design Patterns
• FAIR Principles
• Hourglass Model: the standards stack
• Microservices Architecture
• Privacy by Design
Improving Data Findability with FAIR tools
• To operationalize the FAIR principles, scientists need tools
• For example, publishing your data in a catalogue can help improve Findability
• However, even for flagship scientific data catalogues, manual work is required to comply
Preliminary work: a white paper is being prepared by my colleagues Jarno and Carolyn to score various open source scientific data catalogues using the new FAIR Metrics (https://github.com/FAIRMetrics/Metrics)

Improving accessibility of data
The easy answer is to publish scientific data to open repositories, or e.g. EGA for controlled access data.
However, that is the tip of the iceberg: the ‘deep web’ of life science data is on USB sticks and in corporate repositories.
Example: the open source Podium Request Portal for requesting data and samples, which The Hyve developed for BBMRI-NL.

Computable Consent
• Computable Consent is currently still mostly a research topic, yet it is crucial for implementing FAIR principles for life science data
• There are candidate ontologies such as DUO (see screenshot on the right), GA4GH ADA-M etc., but they are not yet battle-tested (see e.g. the GDPR rights, information obligations etc.)
• Even if we nail this for research purposes, there is still the issue of implementing it in operational healthcare and clinical trial workflows!
https://doi.org/10.1371/journal.pgen.1005772
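
To make "computable" concrete, here is a small sketch of how a consent could be matched against a data request. The permission levels loosely follow DUO shorthand labels (GRU for general research use, HMB for health/medical/biomedical research, DS for disease-specific research), but the classes and the matching rules are simplifying assumptions for illustration, not the ontology's formal semantics.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified permission levels loosely modelled on DUO shorthand labels.
# The matching rules below are an assumption, not DUO's formal semantics.


@dataclass(frozen=True)
class Consent:
    permission: str                 # "GRU", "HMB" or "DS"
    disease: Optional[str] = None   # only meaningful for "DS"


@dataclass(frozen=True)
class Request:
    purpose: str                    # "general", "biomedical" or "disease"
    disease: Optional[str] = None


def is_permitted(consent: Consent, request: Request) -> bool:
    """Return True if the stated research purpose falls within the consent."""
    if consent.permission == "GRU":
        return True
    if consent.permission == "HMB":
        return request.purpose in ("biomedical", "disease")
    if consent.permission == "DS":
        return request.purpose == "disease" and request.disease == consent.disease
    return False  # unknown permission codes are denied by default


lung_only = Consent("DS", disease="lung cancer")
print(is_permitted(lung_only, Request("disease", "lung cancer")))  # True
print(is_permitted(lung_only, Request("biomedical")))              # False
```

Denying unknown codes by default is a deliberate choice here: a consent the system cannot interpret should not silently grant access.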

And then there is Interoperability...
https://youtu.be/C95pl11zdAs
Operational: HL7 FHIR, RIM, SMART on FHIR, DCMs, OpenEHR etc.
Research & Trials: i2b2/tranSMART, OMOP, HPO, ICD, SNOMED-CT, LOINC, …
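
As one small example of what bridging these worlds involves, the sketch below maps a few fields of an HL7 FHIR Patient resource to a row shaped like the OMOP CDM person table. It is deliberately partial: it assumes a full YYYY-MM-DD `birthDate`, ignores required OMOP fields such as race and ethnicity concepts, and the function name is hypothetical.

```python
from datetime import date
from typing import Dict

# OMOP standard concept IDs for gender; 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def fhir_patient_to_omop_person(patient: Dict[str, str], person_id: int) -> Dict[str, object]:
    """Map a few fields of a FHIR Patient resource to an OMOP person row."""
    birth = date.fromisoformat(patient["birthDate"])  # assumes a full YYYY-MM-DD date
    gender = patient.get("gender", "unknown")
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(gender, 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "person_source_value": patient["id"],
        "gender_source_value": gender,
    }


patient = {"resourceType": "Patient", "id": "example-1",
           "gender": "female", "birthDate": "1950-03-14"}
row = fhir_patient_to_omop_person(patient, person_id=1)
print(row)
```

Even this toy mapping has to decide what to do with values the target model cannot represent, which is where most real interoperability effort goes.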

Health Research Infrastructure
[Figure: architecture diagram. Components shown: Researchers, Patients, Consents, Samples, Patient Finder, Data Models, Data Catalogue, Research Datasets, Data Access Requests / Grants, Sample Requests, Apps / Workflows, Private Analysis Environment, Clinical Study Environment, Projects / Billing, Live Clinical Data. Layer labels: Apps, Microservices, Servers. Annotation: "Implementation ideas".]

Conclusions
• Make your data FAIR
  • To survive in & contribute to modern academic research
  • To effectively combine public with in-house data
• Prepare for the shift towards personalized medicine & health
  • Data sharing via distributed learning rather than by data copying
  • Use of observational data in conjunction with clinical trial data
  • Focus on outcomes and health economics
• … and eventually move towards a Learning Health Care System!

Health data models
OMOP CDM, OpenEHR, i2b2/tranSMART

OMOP Common Data Model v5.0
• OMOP = Observational Medical Outcomes Partnership
• CDM = Common Data Model
OpenEHR Archetype
http://openehr.org/ckm/
