Keynote presentation by Wessel Kraaij at the Dutch Society for Pattern Recognition and Image Processing (NVPHBV), 29/5/2018, Eindhoven.
This talk discusses
1. Trends in health care and responsible data science, and their intersection
2. Secure federated analytics on distributed data repositories
3. Generating clinically relevant hypotheses from patient forum discussions.
Improving health care outcomes with responsible data science
1. Improving health care outcomes with
responsible data science
Wessel Kraaij
Leiden University & TNO
w.kraaij@liacs.leidenuniv.nl
NVPHBV spring meeting 2018
2. Outline
1. Overview of value based
approaches to health care and
data science
2. Secure federated analysis of
health data
3. Patient Empowerment:
Patient Forum Miner
3. Digital Health
• Improving Health care outcomes using data science
(and AI)
• Includes e-health, m-health, etc.
• What do we want, simultaneously?
1. Better outcomes
2. Better access to healthcare (best healthcare for all)
3. Reducing cost
12. Machine readable, reusable data : FAIR
• Findable
• Accessible (under well defined conditions)
• Interoperable: Machine readable
• Reusable
• Secondary goal: combine datasets for
analysis
• FAIR does not imply open!!
Michel Dumontier, Maastricht University
13. GO FAIR
• Internet of FAIR data and services
• FAIR Metrics
• Implementation Networks (IN)
• Example IN: Personal Health Train
14. Responsible
Data Science
• Fairness: avoiding prejudice due to
e.g. biased training data
• Accuracy: quality of information and
predictions; how certain are they?
• Confidentiality: GDPR, also
commercially confidential
• Transparency: Can we explain
conclusions drawn from big data?
(opening the Black Box)
15. FAIR data (reusability)
Fairness (avoiding bias)
Accurate (incl AI)
Confidential
Transparent (incl AI)
HCI (incl AI)
Communicating with
(a diverse group of) patients
Improving outcomes of health
care system (care, cure,
prevention) using different
types of data
Making choices based on
normalized added health value
per cost unit
Relating data science
challenges to health
care trends
16. 1. Some
conclusions
• Combining / comparing personal data across
individuals is needed for personalized,
preventive health
• Digital Health challenges pose quite relevant
use cases for Data Science, AI and HCI
• Existing costing/business models may actually
be a barrier for better patient value
18. Quantified self
source: MIT
A movement of citizens and
‘makers’ who aim to explore the
possibilities of self-tracking.
Gary Wolf (Wired): “Almost
everything we do generates
data”.
source: RescueTime
WHAT COULD WE LEARN IF
we could measure and record activities, social context,
environmental context, physiological parameters, food
intake and sleep across our entire lifetime (sometimes called
the exposome)? We now have sensors for most of these.
WE CAN LEARN:
Personalized health advice based on systems view and
population data
Integrating evidence based medicine and data driven
predictions
19. The mutual dependence of personal and
population health data
Source: cbw.ge and wikimedia.org
From ‘big’ to ‘me’
20. Towards personalized e-advice
Model
Personal health profile
(longitudinal data)
Population health profiles
Health professional: duty of confidentiality
For e-services:
How can we prevent competitors from learning the model?
How can we prevent a health app from copying personal data?
21. Health Data Challenges
• Data is organized around treatments
• Silos
• Heterogeneous data
• Patient generated data not linked
• Need patient/citizen centered data
• Undo fragmentation
• Link person generated data
• Deal with uncertainty and imperfections
• Quantify accuracy
• Missing data
22. Some barriers for statistical analysis and ML
• Data is horizontally partitioned
• Distributed learning
• Personal Health Train
• Data is vertically partitioned
• Existing practice: Trusted 3rd party (TTP)
• Health Data Cooperative (e.g. Midata)
• Secure multiparty computation
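Horizontally partitioned (same columns, different rows):
ID  age  GP visits  gender
1   55   70         M
2   45   60         F

ID  age  GP visits  gender
3   20   25         F
4   22   10         M

Vertically partitioned (same rows, different columns):
ID  age  gender
1   55   M
2   45   F
3   20   F
4   22   M

ID  GP visits
1   70
2   60
3   25
4   10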
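The two partitioning schemes in the example tables can be sketched in a few lines of Python. This is only an illustration on the four toy records from the slide; the function names (`split_horizontally`, `split_vertically`, `union_rows`, `join_on_key`) are made up for this sketch. It shows why horizontal partitions recombine by stacking rows while vertical partitions need a join on a shared identifier, which is the step that privacy-preserving methods try to avoid doing in the clear.

```python
# Toy records copied from the slide's example tables.
FULL = [
    {"id": 1, "age": 55, "gp_visits": 70, "gender": "M"},
    {"id": 2, "age": 45, "gp_visits": 60, "gender": "F"},
    {"id": 3, "age": 20, "gp_visits": 25, "gender": "F"},
    {"id": 4, "age": 22, "gp_visits": 10, "gender": "M"},
]


def split_horizontally(rows, cut):
    """Same columns everywhere; each party holds a different subset of rows."""
    return rows[:cut], rows[cut:]


def split_vertically(rows, columns_a, columns_b, key="id"):
    """Same individuals everywhere; each party holds different columns."""
    part_a = [{key: r[key], **{c: r[c] for c in columns_a}} for r in rows]
    part_b = [{key: r[key], **{c: r[c] for c in columns_b}} for r in rows]
    return part_a, part_b


def union_rows(part_a, part_b, key="id"):
    """Recombine horizontal partitions by stacking their rows."""
    return sorted(part_a + part_b, key=lambda r: r[key])


def join_on_key(part_a, part_b, key="id"):
    """Recombine vertical partitions by joining on the shared identifier."""
    index_b = {r[key]: r for r in part_b}
    return [{**r, **index_b[r[key]]} for r in part_a if r[key] in index_b]
```

In a real setting neither `union_rows` nor `join_on_key` would be run on raw data by a single party; distributed learning and secure multiparty computation aim to obtain the analysis result without that step.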
23. Personal health train
• Distributed data
• FAIR data stations
• Bring the algorithm to the data
• Approximate global analysis
• FAIR data stations include
• Clinical repositories
• Personal lockers (PGO)
• Umbrella stations (e.g. HDC)
• Trains implement secure workflow for
researchers and patients/citizens
Sign the manifesto at: https://www.dtls.nl/phtmanifesto/
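The idea of bringing the algorithm to the data can be illustrated with a minimal sketch. It assumes horizontally partitioned data and a hypothetical `DataStation` class that only ever returns aggregates (count, sum, sum of squares); none of these names come from the Personal Health Train itself. A coordinator combines the aggregates into a global mean and standard deviation without seeing any individual record.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class DataStation:
    """Hypothetical data station; row-level values never leave it."""
    name: str
    values: list  # e.g. number of GP visits per person, kept locally

    def run_train(self):
        """Run the visiting algorithm locally and return aggregates only."""
        n = len(self.values)
        total = sum(self.values)
        total_sq = sum(v * v for v in self.values)
        return n, total, total_sq


def global_mean_and_std(stations):
    """Combine per-station aggregates into a population mean and std."""
    n = total = total_sq = 0
    for station in stations:
        k, t, q = station.run_train()
        n += k
        total += t
        total_sq += q
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return mean, sqrt(max(variance, 0.0))
```

For example, `global_mean_and_std([DataStation("A", [70, 60]), DataStation("B", [25, 10])])` uses the GP-visit counts from the partitioning example. These particular statistics combine exactly; more complex models typically need several rounds of such exchanges, and aggregates from very small stations can still be revealing.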
24. NWA startimpuls
VWDATA
• Responsible Data Science
• FACT, FAIR
• Learning from distributed data
• Develop secure algorithms for
learning from a ‘join’ of
databases with
confidential/personal data
25. Build prognostic model using secure regression
H2020 BigMediLytics
• Developing “secure regression” using
secure multiparty computation
methods
• Aim: improve KPIs by 20% in
large-scale trials
• Erasmus MC Heart Failure pilot
• KPIs
• Increase medication compliance
• Decrease number of hospital
readmissions
Data sources: Ergo, Achmea claims, EMC clinical data
Develop and evaluate intervention informed by
the joint longitudinal dataset (based on risk and
cost analysis)
26. Proposed solution: Multi party computation
• Several data owners want to ‘learn’ from the combination
of their datasets
• However, they cannot release their data
• MPC makes it possible to overcome this barrier:
• learning without disclosing
• Extensive communication protocol between partners
• Proven non-disclosure
• TNO/CWI/Philips are developing and evaluating MPC methods
• homomorphic encryption
• garbled circuits
• secret sharing
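Of the three techniques listed, secret sharing is the easiest to illustrate. The sketch below shows additive secret sharing used to compute a sum across parties. It is a textbook toy, not the protocol developed by TNO/CWI/Philips, and it assumes honest-but-curious parties that do not collude; `MODULUS`, `share`, `reconstruct` and `secure_sum` are names chosen for this example. Inputs are assumed to be non-negative integers smaller than the modulus.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative choice; any modulus larger than the true sum works


def share(secret, n_parties, modulus=MODULUS):
    """Split `secret` into n additive shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=MODULUS):
    """Recover the shared value by adding all shares modulo `modulus`."""
    return sum(shares) % modulus


def secure_sum(private_inputs, modulus=MODULUS):
    """Sum the parties' inputs while each input stays hidden from the others."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    distributed = [share(x, n, modulus) for x in private_inputs]
    # Party j adds the shares it received; each partial sum looks random on its own.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Only the partial sums are published; together they reveal just the total.
    return reconstruct(partial_sums, modulus)
```

Here `secure_sum([70, 60, 25])` returns the total of the three inputs, while any single share or partial sum is uniformly random. Secure regression needs multiplications as well as additions, which is where the communication cost mentioned on the slide comes from.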
28. Assessment
• Cryptographic MPC protocols have been around for decades
• Applicability was hampered by inefficiency of several protocols
• Recent advances in protocols and computing power, and increased
attention to privacy, have renewed interest in the development of MPC
• MPC has the potential to become a crucial enabler for improving
health and health care by learning from data
29. 2.
Conclusions
Challenge is to fight fragmentation
(need collaboration, usual model is
competition)
It is crucial to restore the balance in
data governance, to prevent platform
companies from increasingly controlling
crucial aspects of our civil society
The Netherlands is well positioned to
take the lead in building a public
infrastructure for health data analysis.
30. 3. Patient
Forum Miner
An example of collective “empowered” participation: Patient
Platform Sarcomas
32. Rare cancer challenge
• Low incidence: small patient cohorts for
research
• Limited data in national registration
• Need for international cooperation
• Limited budgets for research
• In most hospitals not enough
experience to treat rare cancers: need
for concentration
36. Discussion forums:
● Information on:
○ Development of disease, treatment, side-effects
○ Strategies to cope with side-effects
○ Adherence to therapy
○ Co-morbidity
○ Quality of life
● Also: lots of emotional discussions, support
● Use data mining, NLP, machine learning to extract valuable information
caregiver for mother dx with GIST 12/31/12
31x19x9 cm primary tumor removed 12/12/12
400 mg gleevec 1/15/13 to 12/6/13
Recurrence and removal of 3 masses from
the omentum (3.5 to 4.4 cm) on 12/6/13
Sutent 25 mg. since 1/15/14 37.5 mg. since
8/14 (failed Sutent 6/15)
Sorefinib (naxavar) 7/3/15
more tumors removed 7/13/15
Votrient 9/4/15 to 12/30/15
50 tumors removed 1/16/16’
Wildtype Gist NO Kit or PDFRA mutation, SDH
intact
37. Analysis method
Public
Facebook
Group
4 years
40,000 posts
UMLS tagger
Word2Vec
Categories
Database
UMLS: thesaurus with medical terms
Word2Vec: semantic relationships
Summarisation: machine learning
Elasticsearch
User interface: relevant posts
Automatic summarisation
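The tagging and indexing steps of such a pipeline can be sketched with a toy dictionary-based tagger. This is a minimal illustration, not the Patient Forum Miner implementation: the `LEXICON` below is a hypothetical hand-made stand-in for a UMLS-backed tagger (which also handles multi-word terms and far more concepts), the inverted index stands in for a search engine, and the Word2Vec and summarisation steps are not shown.

```python
import re

# Hypothetical mini-lexicon: surface form -> (normalised concept, category).
LEXICON = {
    "gleevec": ("imatinib", "drug"),
    "imatinib": ("imatinib", "drug"),
    "sutent": ("sunitinib", "drug"),
    "votrient": ("pazopanib", "drug"),
    "nausea": ("nausea", "side_effect"),
    "fatigue": ("fatigue", "side_effect"),
    "gist": ("gastrointestinal stromal tumor", "disease"),
}

TOKEN = re.compile(r"[a-z0-9]+")


def tag_post(text):
    """Return the sorted (concept, category) pairs mentioned in one post."""
    tokens = TOKEN.findall(text.lower())
    return sorted({LEXICON[t] for t in tokens if t in LEXICON})


def build_index(posts):
    """Inverted index: concept -> ids of the posts that mention it."""
    index = {}
    for post_id, text in posts.items():
        for concept, _category in tag_post(text):
            index.setdefault(concept, []).append(post_id)
    return index
```

Normalising brand names to one concept matters for forum text like the signature shown earlier: `tag_post("400 mg gleevec 1/15/13 to 12/6/13")` yields the concept `imatinib`, so a search for imatinib also finds posts that only mention Gleevec.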
39. Examples of findings
Normal dose of imatinib for GIST patients is 400 mg per day.
In case of progression: increased to 800 mg per day.
In one hospital, policy was to advise patients to take full 800 mg at dinner.
Question: would it be better to split the dose?
Based on forum discussion analysis: yes, most patients experience fewer side
effects when splitting the dose.
Based on this result, the hospital changed its policy.
Something else was found: patients reported that taking the medicine with
dark chocolate greatly diminishes nausea.
40. Example: co-morbidity
○ Searching for patients with a PDGFRA mutation, we found
two patients who mentioned that they also had thyroid
cancer.
○ Searching for this specific combination, we found around
20 cases
○ Coincidence, or is there a relationship?
Platelet-derived growth factor receptor α (PDGFRA) mutation occurs in 10% of GIST cases
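A first, rough way to approach the "coincidence or relationship" question is to compare how often two topics co-occur with what independence would predict. The sketch below counts forum authors rather than posts and uses naive substring matching; the function names and the `lift` measure are choices made for this illustration, not the method used in the project. Self-selection of forum participants means that a high lift can only suggest a hypothesis for clinical follow-up.

```python
def authors_mentioning(posts, terms):
    """Set of authors with at least one post containing any of `terms`."""
    lowered = [t.lower() for t in terms]
    return {
        author
        for author, text in posts
        if any(t in text.lower() for t in lowered)
    }


def cooccurrence(posts, terms_a, terms_b):
    """Compare observed co-mentions with the count expected under independence.

    `posts` is an iterable of (author, text) pairs.
    """
    posts = list(posts)
    n = len({author for author, _ in posts})
    a = authors_mentioning(posts, terms_a)
    b = authors_mentioning(posts, terms_b)
    both = a & b
    expected = len(a) * len(b) / n if n else 0.0
    lift = len(both) / expected if expected else float("nan")
    return {
        "n_authors": n,
        "a": len(a),
        "b": len(b),
        "both": len(both),
        "expected_if_independent": expected,
        "lift": lift,
    }
```

A lift close to 1 is what chance alone would give; a lift well above 1 on enough authors would be a reason to look at the cases more closely.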
41. Example: research agenda
● Patients participate in agenda setting for research in university
hospital
● Longlist of topics was composed on basis of analysis of
(inter)national discussion fora: What is important for patients?
● Longlist was used to consult patients through web survey:
What are your priorities and why?
● 75 patients responded
● 3 topics were selected and are now included in a PhD project
● 1 topic could be addressed on basis of existing knowledge
published in magazine of patient organisation
42. Survey results: total number of votes per theme
OPTION  THEME                                TOTAL # PER THEME
1A      Surgery of metastases                31
1B      RFA                                  11
1C      Embolisation                         9
2A      Cramps and nausea                    8
2B      Skin problems                        5
2C      Fatigue                              22
2D      Long term side effects of imatinib   30
3A      Best way to take imatinib            21
3B      Interaction with food                28
4A      Last phase of disease                17
4B      Micro-gists                          8
4C      Hereditary aspects                   15
4D      Combination with other cancers       6
TOTAL                                        211
43. Further development
● Building experience: project with 4 other cancer patient organisations in
the Netherlands, combine with surveys, build experience and improve
algorithms
● Strengthening scientific basis: detecting patterns, quantification,
statistical analysis, validation, filtering by linking to existing knowledge,
clinical testing of hypotheses (PhD project)
44. 3. CONCLUSIONS
● Patient experience discussion forums can be very
valuable, in particular in case of rare diseases;
worldwide community
● Patient forums can reveal unexpected patterns
and can provide information that would
otherwise never reach doctors
○ Care must be taken to avoid experimenter bias
● This project demonstrates the power of
partnership between patients and medical
professionals
45. Finally
• Improving Health Care affects us all:
• Data, algorithms can support patients and care
professionals.
• Patient value should be more important than
shareholder value
• Health data is sensitive, people should be in
control regarding access.
• GDPR stimulates innovation in data science
• Explainability
• Protection of personal data
46. References
• P4
• Hood, Leroy, and Stephen H. Friend. 2011. “Predictive, Personalized, Preventive, Participatory (P4) Cancer Medicine.” Nature Reviews Clinical Oncology 8 (3): 184–87. doi:10.1038/nrclinonc.2010.227.
• VBHC
• Porter, Michael E. 2010. “What Is Value in Health Care?” New England Journal of Medicine 363 (26): 2477–81. doi:10.1056/NEJMp1011024.
• FAIR
• GO-FAIR: https://www.go-fair.org/
• Wilkinson, Mark D., Michel Dumontier, Ijsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (March): 160018. doi:10.1038/sdata.2016.18.
• FACT
• http://www.responsibledatascience.org/
• NWA VWDATA
• https://wetenschapsagenda.nl/start-vwdata-onderzoeksprogramma/
• Personal Health Train
• Damiani, Andrea, Mauro Vallati, Roberto Gatta, Nicola Dinapoli, Arthur Jochems, Timo Deist, Johan van Soest, Andre Dekker, and Vincenzo Valentini. 2015. “Distributed Learning to Protect Privacy in Multi-Centric Clinical Studies.” In The 15th Conference on Artificial Intelligence in Medicine, edited by J. H. Holmes, R. Bellazzi, L. Sacchi, and N. Peek, 65–75. Pavia, Italy: Springer. http://eprints.hud.ac.uk/23905/.
• Secure multiparty computation
• Veeningen, Meilof, Supriyo Chatterjea, Anna Zsófia Horváth, Gerald Spindler, Eric Boersma, Peter van der Spek, Onno van der Galiën, Job Gutteling, Wessel Kraaij, and Thijs Veugen. 2018. “Enabling Analytics on Sensitive Medical Data with Secure Multi-Party Computation.” In Proceedings of Medical Informatics Europe 2018. Gothenburg.
• Patient forum miner
• Oortmerssen, Gerard van, Stephan Raaijmakers, Maya Sappelli, Erik Boertjes, Suzan Verberne, Nicole Walasek, and Wessel Kraaij. 2017. “Analyzing Cancer Forum Discussions with Text Mining.” In Proceedings of Second International Workshop on Extraction and Processing of Rich Semantics from Medical Texts. Vienna.