Smart devices and wearables are becoming increasingly popular. They allow patients to be monitored continuously and provide a huge amount of health-related data that, if properly analyzed, can be used to improve patients' health by predicting potential future conditions. Advanced machine learning techniques permit such analysis and thus serve to forecast the evolution and health challenges of individual patients. This includes issues as critical as the early detection of heart disease.
Moreover, the whole healthcare sector is currently undergoing a profound transformation. The profusion of digital data is fostering a move from more traditional approaches towards a data-driven prevention model.
In this talk I survey state-of-the-art methods that allow AI-based early diagnosis and risk assessment for individual patients, using information that may include health records, genomic and wearable device data, medical imagery and online physician reviews. I will focus on methods that can be employed to forecast future events affecting a specific patient, evaluate wearable device data, and assist the healthcare industry in adopting a patient-focused, data-driven preventive approach. Additionally, I introduce how machine-learning-based gamification techniques can be employed to motivate individual users to improve their health and achieve personalized challenges.
4. HEAD OF
THE PROJECT
Awarded with Marie Curie
individual fellowship.
Most renowned award in Europe.
Research Data Scientist for
German Government to assimilate
satellite data with a novel
Ensemble Learning Algorithm
Research Data Scientist at RIKEN to
assimilate HIMAWARI big data.
Working with the world-leading K-computer
Research Scientist at CERN,
the European Organization
for Nuclear Research,
one of the world's largest and
most respected centres
for scientific research.
ÁFRICA PERIÁÑEZ
BSc Physics -- 2001
MSc Theoretical Physics -- 2003
MSc String Theory -- 2006
PhD Mathematics -- 2015
Top 5 Data Scientist
across the company.
In charge of 40 countries,
covering Europe, Middle East
and Africa.
6. CHALLENGES
ELECTRONIC HEALTH RECORDS (EHR)
SENSOR DATA
TEXT
RADIOLOGY AND PATHOLOGY IMAGES
LABORATORY RESULTS: BLOOD TESTS AND EKGS
GENOMIC DATA
PATIENT HISTORIES
SPARSE
NOISY
HETEROGENEOUS
TIME-DEPENDENT
UNSTRUCTURED
COMPLEX
HIGH-DIMENSIONAL
BIOMEDICAL DATA
7. ARTERYS IMAGES
ARTIFICIAL INTELLIGENCE IS
TRANSFORMING DIAGNOSTIC IMAGING
Decision support for radiologists
Risk stratification
Automated image interpretation
IMAGE TYPES
Photographs
X-Rays
MRIs
2D
3D
CLINICAL IMAGING
8. CLINICAL IMAGING
AI RESEARCH FOCUSES ON:
Cancer Diagnosis: e.g. skin cancer
“Dermatologist-level Classification of Skin Cancer with Deep Neural Networks”
Esteva et al. 2017 Nature
Neurology: ML uses the discharge timings
of neurons to control upper-limb prostheses
Farina et al. 2017
Cardiology: Diagnosis system based on cardiac imaging
Dilsizian et al. 2017
Other examples:
Ocular image data for cataract disease
Long et al. 2017
Detection of diabetic retinopathy through retinal fundus images
Gulshan et al. 2016
9. RISK PREDICTION
OF DIABETES
CONGESTIVE
HEART FAILURE
DIAGNOSIS
MEDICATIONS
LABORATORY TESTS
FREE-TEXT MEDICAL NOTES
ADMINISTRATIVE DATA
ICD-9 DIAGNOSIS CODES
STATIC DATA: administrative data that remain unchanged during the entire course of a clinical encounter (e.g., demographic data)
TEMPORAL DATA: data updated over time (e.g., diagnoses and procedures)
ELECTRONIC HEALTH RECORDS
EHR GROWTH: 30% (2009) to 90% (2017)
PATTERN
RECOGNITION IN
MULTIVARIATE
TIME SERIES OF
CLINICAL DATA
10. OMICS DATA: INTEGRATIVE ANALYSIS OF MULTIMODAL OMIC DATA
Base Pair
Novel Discovery
of Cancer Subtypes
Genomics
Integrative analysis of multi-omic data leads to an improved understanding of cancer mechanisms,
which in turn enables more precise classification of cancer subtypes
OMICS data are large
and high dimensional
CNNs are applied to raw data to
capture the internal structure of
DNA sequences and RNA
measurements
Molecular profiles:
Genomics
Transcriptomics
Epigenomics
Proteomics
Metabolomics
Classification of cancer from gene expression
profiles Fakoor et al. 2013
Predict chromatin marks from DNA sequences
Zhou et al. 2015
[Figure: DNA double helix (nucleobases, helix of sugar-phosphates), protein, metabolite]
11. HEALTH
MONITORING
DEVICES
Biosensors (healthcare providers)
Respiration rate
Echocardiography
Clinical monitoring devices (vital signs)
Multiscale biological experiments:
- Genomic profiling to reveal
mutational landscapes
- Gene expression analysis
- Metabolomics experiments to find
relevant biomarkers
aptamers,
proteins,
antibodies,
quantum dots,
DNA
Measure Signal
Data processing
12. MOBILE AND WEARABLE
DEVICES
Sales of smartwatches by 2021: 81 million units, representing 16% of total wearable device sales
Global market projection for mobile health services by 2020: $49.12 billion
Wearable device sales in 2017: 310.4 million, $30.5BN in revenue
Number of smartphone users worldwide who have downloaded health apps: 500 million
13. Heart rate
Distance traveled
Speed
Altitude
Calories consumed
Sleep patterns
Blood glucose records
Cardiac monitor data
Breathing rate
Stress level
Brain activity
Human activity recognition
to detect freezing of gait
in Parkinson disease patients
Hammerla et al. (2016)
Predict the quality of sleep
from physical activity
wearable data during
awake time
Sathyanarayana et al. (2016)
Analysis of
electroencephalogram
and local field potentials signals
Nurse et al. (2016)
DEDICATED DEVICES
SMARTWATCHES
WITH SENSORS
MOBILE HEALTH
APPS
MOBILE AND WEARABLE
DEVICES
14. Detection of heart rate anomalies
based on historical heartbeat patterns
Assessment of the risk percentage of
developing future cardiovascular diseases
Sensor data
preprocessing
Heartbeat
Pattern
Recognition
Machine Learning
Models
% Risk
Feature
extraction
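X: feature matrix with entries F_{i,j}; Y: model output
$$
X = \begin{pmatrix}
F_{1,1} & F_{1,2} & F_{1,3} & \cdots & F_{1,n} \\
F_{2,1} & F_{2,2} & F_{2,3} & \cdots & F_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
F_{m,1} & F_{m,2} & F_{m,3} & \cdots & F_{m,n}
\end{pmatrix}
$$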
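The slide does not say how heart rate anomalies are detected. As a minimal sketch of the "historical heartbeat patterns" idea, assuming a simple rolling z-score baseline (the function name, window size, threshold and readings below are all hypothetical, and this is not the speaker's model):

```python
from statistics import mean, stdev

def heart_rate_anomalies(bpm, window=5, threshold=3.0):
    """Flag readings that deviate strongly from the preceding window.

    Hypothetical baseline: a reading is anomalous when it lies more than
    `threshold` standard deviations from the mean of the previous
    `window` readings (the "historical pattern").
    """
    flagged = []
    for i in range(window, len(bpm)):
        history = bpm[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change counts as a deviation.
            if bpm[i] != mu:
                flagged.append(i)
        elif abs(bpm[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up beats-per-minute readings with one obvious spike at index 6.
readings = [62, 64, 63, 65, 64, 63, 120, 64, 65, 63]
print(heart_rate_anomalies(readings))  # -> [6]
```

In a fuller pipeline the flagged indices would be one of many extracted features feeding the risk model, rather than the final output.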
15. HEALTH DATA SCIENCE REVOLUTION
Climate, Allergies
Pollution, Crime
Environment Repository
DIAGNOSTIC ALERTS PREDICTIVE MODELS
ACTIONABLE RECOMMENDATIONS
DATA-DRIVEN CLINICAL TRIALS
Personal Health Repository Clinical Data Repository
Social Networks & Characteristics
Data interoperability using
health and clinical data
application program interface
Data transfer and exchange
according to PHI, HIPAA, HL7
and other regulatory codes
Food, Walkability
Fitness, Sleep and Heart Rate Monitoring
Blood Pressure and Glucose Monitoring
Diet
Current Prescription and OTC Meds
Vital Signs, Lab Tests
Family History and Medications
Imaging Data
Multi-Omic Data
LOCATION SPECIFIC
ELECTRONIC HEALTH RECORD
HEALTH MONITORING DEVICES
16. HEALTH DATA SCIENCE REVOLUTION
INTEGRATING:
Patient-generated healthcare data
Existing EMRs, imaging
Biological and genetic data
ADDITIONALLY:
Real-time health-monitoring device data
Socioeconomic information
Local weather and environmental quality
Assess patient progression
from health to subclinical
diseases to clinically
significant pathological states
Predictive models to identify
causes for wellness or illness
18. MACHINE LEARNING FOR HEALTH:
DEEP LEARNING
Machine learning can learn relationships
from the data without a prior definition
Activation Functions
Hyperbolic tangent (sigmoid): $g(z) = \frac{\exp(2z) - 1}{\exp(2z) + 1}$
Deep learning is different from traditional
machine learning in how features are learnt
from the raw data
Deep learning methods possess multiple
levels of non-linear features, starting
with raw input and converting it
into a higher, more abstract level
Logistic (sigmoid): $g(z) = \frac{1}{1 + \exp(-z)}$
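The two activation functions on this slide can be checked numerically. The sketch below uses only the Python standard library; the function names are illustrative, and the naive formulas are meant for small inputs (they would overflow for very large |z|):

```python
import math

def logistic(z):
    """Logistic sigmoid: g(z) = 1 / (1 + exp(-z)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_from_exp(z):
    """Hyperbolic tangent written as (exp(2z) - 1) / (exp(2z) + 1)."""
    e = math.exp(2.0 * z)
    return (e - 1.0) / (e + 1.0)

for z in (-2.0, 0.0, 0.5, 2.0):
    # The slide formula agrees with the library implementation ...
    assert math.isclose(tanh_from_exp(z), math.tanh(z), abs_tol=1e-12)
    # ... and the two activations are related by tanh(z) = 2 * logistic(2z) - 1.
    assert math.isclose(tanh_from_exp(z), 2.0 * logistic(2.0 * z) - 1.0, abs_tol=1e-12)
    print(z, logistic(z), tanh_from_exp(z))
```

The identity in the loop shows that the hyperbolic tangent is a rescaled logistic function with outputs in (-1, 1) instead of (0, 1).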
19. MACHINE LEARNING FOR HEALTH:
ENSEMBLE LEARNING
Learning&Generalization
OUTPUT
Machine learning technique where
multiple learners are trained to solve
the same problem
Ordinary machine learning approaches
learn one hypothesis from training data.
Ensemble methods try to construct a set
of hypotheses and combine them
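To make "construct a set of hypotheses and combine them" concrete, here is a minimal sketch of one common ensemble scheme, bagging with majority voting. The choice of decision stumps as base learners, the function names and the toy data are assumptions for illustration only:

```python
import random
from collections import Counter

def fit_stump(points):
    """Learn one hypothesis: a threshold on x minimizing training errors."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        for above in (0, 1):  # label predicted when x > threshold
            errors = sum((above if x > threshold else 1 - above) != y
                         for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda x: above if x > threshold else 1 - above

def fit_ensemble(points, n_learners=25, seed=0):
    """Train each learner on a bootstrap resample of the same problem."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points])
            for _ in range(n_learners)]

def predict(ensemble, x):
    """Combine the hypotheses by majority vote."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (made up): label 1 when the reading is high.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = fit_ensemble(data)
print(predict(ensemble, 1.5), predict(ensemble, 8.5))
```

Each stump sees a slightly different resample, so individual thresholds vary; the vote averages out that instability, which is the motivation for ensembles given on the slide.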
20. Diseases progress and change in a nondeterministic way:
temporal healthcare data
Long-term temporal dependencies:
record of illnesses and interventions
Modeling entire illness trajectory is important
EMRs offer precise timing of events
Records are episodic and irregular
Healthcare is non-Markovian due to long-term
dependencies
Survival models, irregular-time Bayesian networks
and LSTM networks can model healthcare
TEMPORAL DATA
AND
SEQUENTIAL
MODELING
21. IMPROVEMENT OF HEALTH OUTCOMES:
PERSONALIZED GAMIFICATION
Motivating behaviour change
for health and well-being
GAMIFICATION: the use of game design elements in non-game contexts
“Three quarters of all healthcare
costs in the US are attributable
to chronic diseases caused
by poor health behaviours.”
Johnson et al. 2008
Poor health
behaviour
INTRINSIC MOTIVATION
Satisfy basic psychological needs
EXTRINSIC MOTIVATION
Rewards or punishments
Behaviour change is crucial
to prevent disease
A main factor driving behaviour
change is the individual's motivation
22. IMPROVEMENT OF HEALTH OUTCOMES:
PERSONALIZED GAMIFICATION
The underlying idea of gamification
is to use the specific design features
or “motivation affordances” of
entertainment games in other systems
to make engagement with these more
motivating.
Johnson et al. 2016
Gamification Market: US$ 2.8 Billion
People Using Mobile Health Apps Worldwide: +500 Million
Existing Mobile Health Apps: +100k
23. KATE SMITH
London, 27
IMPROVEMENT OF
HEALTH OUTCOMES:
PERSONALIZED
GAMIFICATION
POINTS
SCORES
BADGES
LEVELS
CHALLENGES
COMPETITIONS
SOCIAL FEEDBACK
RECOGNITION
LEADERBOARDS
TEAMS
CUSTOMIZABLE AVATARS
CUSTOMIZABLE ENVIRONMENT
HEALTH STATS
MY BADGES
WEEK’S CHALLENGES
WELLNESS SCORE
92%
Level
6
96% COMPLETE
25. INDIVIDUAL BEHAVIORAL PREDICTION:
- Personalized gamification to improve health outcomes
- Individual health challenges
TIME-DEPENDENT DATA:
- Individual identification and forecasting of future events
- Early detection of disease (sensor-data: e.g. heart disease)
DISEASE RISK ASSESSMENT
YOKOZUNA DATA
FOR HEALTH
WEARABLE
DATA
MONITORING
DEVICE
DATA
ENVIRONMENT
DATA
EHR
26. YOKOZUNA DATA PEER-REVIEWED ARTICLES
CHURN PREDICTION IN
MOBILE SOCIAL GAMES:
TOWARDS A COMPLETE
ASSESSMENT USING
SURVIVAL ENSEMBLES
Africa Perianez, Alain Saas,
Anna Guitart and Colin Magne
IEEE DSAA 2016 Montreal
DISCOVERING PLAYING
PATTERNS:
TIME SERIES CLUSTERING
OF FREE-TO-PLAY
GAME DATA
Anna Guitart,
Africa Perianez and Alain Saas,
IEEE CIG 2016 Santorini
GAMES AND BIG DATA:
A SCALABLE
MULTI-DIMENSIONAL
CHURN PREDICTION
MODEL
Paul Bertens, Anna Guitart,
Africa Perianez and Alain Saas,
IEEE CIG 2017 New York
FORECASTING PLAYER
BEHAVIORAL DATA
AND SIMULATING
IN-GAME EVENTS
Anna Guitart, Pei Pei Chen,
Paul Bertens and Africa Perianez
IEEE FICC 2018 Singapore
27. CHALLENGE: MODEL TIME-TO-EVENT
Survival analysis focuses on
predicting time-to-event
SURVIVAL ANALYSIS is used in biology
and medicine to deal with this problem
ENSEMBLE LEARNING techniques
provide high accuracy in prediction results
Classical methods, like regression techniques,
are appropriate when all individuals have experienced the event
Censoring Problem: dataset with incomplete information
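The censoring problem can be illustrated with the Kaplan-Meier estimator, a standard survival analysis tool that uses censored records instead of discarding them. This is a minimal sketch with made-up follow-up times; the function name and data are illustrative only:

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each observed event time.

    durations: time until the event or until follow-up ended
    observed:  True if the event happened, False if the record is censored
    """
    survival, curve = 1.0, []
    for t in sorted(set(d for d, e in zip(durations, observed) if e)):
        # Censored individuals still count as "at risk" until they drop out.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Made-up follow-up times (e.g. days); False marks a censored individual.
durations = [2, 3, 3, 5, 8, 8, 9, 12]
observed = [True, True, False, True, True, False, True, False]
for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 3))
```

A plain regression on `durations` would treat the three censored values as if the event had occurred at those times, which biases the estimate; the estimator above only uses them to size the at-risk set.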
28. CHALLENGE: MODEL TIME-TO-EVENT
TWO APPROACHES:
Time-to-event as a binary classification
Time-to-event as a censored data problem
Survival analysis methods (e.g. Cox regression [3])
do not assume any particular statistical distribution: they are fitted from data
Fixed link between output and features:
significant efforts in terms of model selection and evaluation
SURVIVAL ANALYSIS
ONE MODEL: CONDITIONAL INFERENCE SURVIVAL ENSEMBLES [2]
Deals with censoring
High accuracy due to ensemble learning
[2] Hothorn T. et al., 2006. Unbiased recursive partitioning: A conditional inference framework.
[3] Cox D.R., 1972. Regression Models and Life-Tables.
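For reference, the "fixed link between output and features" in Cox regression can be written explicitly. This is the standard proportional hazards form, not a formula taken from the slides; $h_0$, $\beta$ and $x$ are the conventional symbols for the baseline hazard, the coefficients and the feature vector:

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
```

The baseline hazard $h_0(t)$ is left unspecified and estimated from data, while the features enter only through the log-linear term $\beta^{\top} x$. That term is the fixed link, and choosing which features and interactions go into it is where the model selection effort arises.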
29. CONDITIONAL INFERENCE
SURVIVAL ENSEMBLES
Split the feature space
recursively
Based on a survival statistical
criterion the root node is divided
into two daughter nodes
Maximize the survival difference
between nodes
A single tree produces
unstable predictions
SURVIVAL TREES
Make use of hundreds of trees
Outstanding predictions
Conditional inference survival
ensembles use a Kaplan-Meier
function as splitting criterion
Robust information about
variable importance
Overfitting is not present
Unbiased approach
CONDITIONAL SURVIVAL ENSEMBLES
30. CONDITIONAL INFERENCE
SURVIVAL ENSEMBLES
TWO STEP ALGORITHM:
1) The optimal split variable is selected:
relationship between covariates and response
2) The optimal split point is determined by
comparing two-sample linear statistics for all
possible partitions of the split variable
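Step 2 of the algorithm above can be sketched for a single, already selected split variable. The example assumes the log-rank chi-square as the two-sample statistic, which is a simplification: the conditional inference framework of Hothorn et al. uses permutation-test-based linear statistics, and step 1 (variable selection) is not shown. Names and data are made up:

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square between the left group and the rest."""
    diff, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)            # at risk overall
        n1 = sum(1 for ti, g in zip(times, in_left)
                 if ti >= t and g)                       # at risk, left group
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, in_left)
                 if ti == t and e and g)
        diff += d1 - d * n1 / n                          # observed - expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return diff * diff / var if var > 0 else 0.0

def best_split(x, times, events):
    """Scan all candidate cut points; keep the largest survival difference."""
    best_cut, best_stat = None, 0.0
    for cut in sorted(set(x))[:-1]:        # x <= cut goes to the left node
        stat = logrank_statistic(times, events, [xi <= cut for xi in x])
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat

# Made-up data: individuals with a high covariate value have short times.
x = [55, 58, 60, 62, 80, 84, 88, 90]
times = [20, 18, 22, 19, 5, 7, 4, 6]
events = [True, False, True, True, True, True, True, False]
print(best_split(x, times, events))
```

On this toy data the scan picks the cut at 62, which separates the short-lived group from the long-lived one. Separating variable selection from this exhaustive scan is what avoids the bias towards variables with many possible split points.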
RANDOM SURVIVAL FORESTS [4]
RSF are based on the original random forest algorithm [5]
RSF favor variables with many possible split points
[4] Ishwaran H. et al., 2008. Random Survival Forests.
[5] Breiman L., 2001. Random Forests.
32. INTERDISCIPLINARITY
All fields need Data Science but
Data Science also needs all scientific fields
E.g. Computer Science,
Biological & Medical Research,
Neuroscience, Statistical Learning,
Numerical Weather Prediction, Physics,
Mathematics, Epidemiology, Economics,
Climate, etc.
CREATIVE PROBLEM-SOLVERS
TO LEAD THE DATA
SCIENCE REVOLUTION
YOKOZUNA data
33. YOKOZUNA DATA IN THE NEWS
“Game Makers Are Profiling
Players to Keep Them Hooked”
“The Gaming World Is
About to Change With
Artificial Intelligence”
“An algorithm that knows
when you’ll get bored with
your favourite mobile game”