Machine learning challenges and opportunities in biomedical research

Challenges and opportunities for machine
learning in biomedical research
Francisco Azuaje, PhD.
Luxembourg Institute of Health (LIH)
Presentation for the Data Science Luxembourg Meetup
12 September 2018

The Bioinformatics and Modelling Research Group (BIOMOD) @ LIH
F. Azuaje (PI)
P. Nazarov (Scientist)
L.C. Tranchevent (Postdoc. Fellow)
T. Kaoma (Bioinformatician)
S.Y. Kim (Bioinformatician)
A. Muller (Bioinformatician)
K. Baum (Postdoc. Fellow)
Y. Zhang (PhD. Candidate)
Our key funders:
Mission
To enable patient-oriented research and biological understanding
through advanced computational research

Today’s presentation:
• Biomedical Data Science and ML: data landscape, trends in
approaches, crucial challenges.
• A selection of recent, attractive examples.
• The synergy between ML and network analysis, examples.
• Takeaways.

(Topol, 2014, Cell)
(Eisenstein, 2015, Nature)
Biomedical research: larger and diverse datasets
• High inter-individual variability
• Datasets change in time and space
• High intra-individual variability

(Hutter and Zenklusen, Cell, 2018)
Key example of “big data” in cancer research

Typical questions answered by such datasets
“Fundamental” research
• What is the “behavior”, “mechanism”…?
• Within a data layer, how are samples or features related?
• How are different data layers interrelated…?
• What if…?
• Why…?
“Applied” research
• Risk assessment
• Diagnosis
• Prognosis
• Other clinical outcome prediction
• Prevention
• Drug targets
• Therapeutic strategies
Statistics and machine learning

Koohy, 2018, F1000 Research
ML in biomedical research
[Figure panels: global usage of ML techniques; trends of ML techniques, excluding PCA and LRM; trends for SVM, RF and DNN]

ML in biomedical research: Examples of model diversity and applications (1)
Large-scale phenotypic image analysis
Novelty/anomaly detection
Prediction of hard-to-discretize states
Image classification according to phenotypes
(e.g., here with Cell Profiler Analyst)
Smith et al., 2018, Cell Systems

Kermany et al., 2018, Cell
Medical diagnoses with image-based deep transfer learning
Retina images → Retinal diseases
• DL system: Low classification errors, comparable to humans (on 1000
images)
• Strategy also successfully applied to analysis of chest X-ray images
• Potential “generalized” platform for image-based diagnoses (?)

Ambale-Venkatesh et al., Circ. Res., 2017
Random survival forests for predicting cardiovascular (CV) events
Variable importance for each of the 735 variables used in analysis
Variable importance is measured using the minimum depth of the maximal subtree
• Accurate prediction of 6 CV outcomes (in
asymptomatic population).
• Subsets of predictive features for each
event.
• 12-year follow-up; multi-center, multi-ethnic cohort with a wide age range.
Imaging, noninvasive tests, questionnaires, biomarker panels
Top-20 features
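
The minimal depth idea mentioned on this slide can be illustrated with a small sketch. It assumes a single decision tree stored as nested Python dicts, which is a hypothetical toy representation and not the format of any survival forest library. For each variable it reports the shallowest depth at which the variable is used for a split; smaller values suggest greater importance. In a forest, this value would typically be averaged over trees. The variable names are invented for illustration and are not taken from the study.

```python
def minimal_depths(tree, depth=0, found=None):
    """Return {variable: shallowest split depth} for a toy tree.

    A node is either None (leaf) or a dict with keys
    'var', 'left' and 'right' (hypothetical representation).
    """
    if found is None:
        found = {}
    if tree is None:  # leaf: nothing to record
        return found
    var = tree["var"]
    # Keep the depth closest to the root for each variable.
    if var not in found or depth < found[var]:
        found[var] = depth
    minimal_depths(tree["left"], depth + 1, found)
    minimal_depths(tree["right"], depth + 1, found)
    return found


# Toy tree; variable names are invented for illustration.
toy_tree = {
    "var": "age",
    "left": {"var": "calcium_score", "left": None, "right": None},
    "right": {
        "var": "troponin",
        "left": {"var": "calcium_score", "left": None, "right": None},
        "right": None,
    },
}

depths = minimal_depths(toy_tree)
ranking = sorted(depths, key=depths.get)  # most important first
```

Here the variable at the root receives depth 0 and is ranked first; variables that only appear deeper in the tree are ranked lower.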

Shared, key challenges in the field
• Heterogeneity: data, events, states, within and between individuals…
• Data not always “big”: relative lack of labelled data, curse of dimensionality
• Data: multi-layered, hierarchical
• For the same data type/layer: multiple measurement platforms

Shared, key challenges in the field (2)
• Interpretability, understandability: global and local, novelty and consistency with prior knowledge
• Reproducibility: crucial requirement
• “Gold standards”/“ground truth”: lacking or limited
• Complexity of pattern recurrence, regularities

Addressing key challenges through combination of ML and
biological network models
Why networks?
• Networks are intuitive and biologically meaningful representations of
biological data
• Networks can be used to encode and visualize data and, more
importantly, to extract features and make predictions about the data
• Network-based models can address several predictive modelling
challenges, including multi-modal or multi-layered data analysis and
model interpretability

A biological network can be represented as a graph that is
biologically meaningful
From: McGillivray et al., 2018, Annu. Rev.
Biomed. Data Sci.

Addressing key challenges through combination of ML and
biological network models (cont.)
Our strategy:
Data → Processed data → Graphs → Features → Prediction models

Combining ML and biological networks:
Two application examples in cancer research
Application examples from SINGALUN project:
New tools for the prediction of influential nodes and links in multi-level cancer-related networks
L.C. Tranchevent
(Postdoc. Fellow)
PI: J.C. Rajapakse
PI: F. Azuaje

Using biological networks and machine learning for multi-omics
patient stratification
Hypothesis: information encoded in graphs is biologically relevant.
Protein-protein network
Jeong et al., Nature (2001)
Patient similarity network
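
As a rough illustration of how a patient similarity network could yield features, the sketch below links two patients when their profiles are strongly correlated and then computes degree centrality for each patient. The similarity measure (Pearson correlation), the threshold of 0.8, the function names and the four toy profiles are all assumptions made for this example; the slides do not specify how the networks were built.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def similarity_network(profiles, threshold=0.8):
    """Link two patients when their profiles correlate above the threshold."""
    ids = list(profiles)
    edges = {p: set() for p in ids}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if pearson(profiles[p], profiles[q]) > threshold:
                edges[p].add(q)
                edges[q].add(p)
    return edges


def degree_centrality(edges):
    """Fraction of the other patients each patient is linked to."""
    n = len(edges)
    return {p: len(nbrs) / (n - 1) for p, nbrs in edges.items()}


# Invented expression profiles for four hypothetical patients.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [1.1, 2.1, 2.9, 4.2],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [1.0, 2.2, 3.1, 3.9],
}
network = similarity_network(profiles)
features = degree_centrality(network)
```

Each patient ends up with a numeric feature derived from the graph rather than directly from the expression values, which is the kind of input a downstream classifier could use.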

Using biological networks and machine learning for multi-omics
patient stratification (cont.)
[Figure panels: global strategy; examples of centrality features]
• 4 categories of topological features: centrality (12 measures), modularity
features (from 7 to 153 features), diffusion features (1000) and
Node2Vec-derived features (256).
• Each category generates a model
• Integrated models (weighted voting) also investigated
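
Weighted voting across the per-category models can be sketched as follows. This is a minimal illustration under assumptions: the weights are hypothetical (for instance, each model's validation performance), and the slides do not state how the weights were actually chosen.

```python
def weighted_vote(predictions, weights):
    """Combine per-model class labels using model weights.

    predictions: {model_name: predicted label}
    weights: {model_name: non-negative weight}
    """
    totals = {}
    for model, label in predictions.items():
        totals[label] = totals.get(label, 0.0) + weights[model]
    # The label with the largest summed weight wins.
    return max(totals, key=totals.get)


# Hypothetical weights and predictions for one patient.
weights = {"centrality": 0.85, "modularity": 0.70,
           "diffusion": 0.78, "node2vec": 0.72}
predictions = {"centrality": 1, "modularity": 0,
               "diffusion": 1, "node2vec": 0}
label = weighted_vote(predictions, weights)
```

With these invented numbers the two models voting for class 1 carry more total weight than the two voting for class 0, so the integrated prediction is class 1 even though the raw vote is tied.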

Application example (1): neuroblastoma multi-omic datasets
from the CAMDA challenge
Dataset 1 (498 patients,
2 omic datasets)
Dataset 2 (142 patients,
3 omic datasets)
Focus on Dataset 1
6,300 classification models
• Models based on graph topology features outperform models based on “classical” approach
• Among topological features, centrality metrics are most predictive (followed by diffusion-based features)

Application example (2): Neuroblastoma multi-omics datasets
from the CAMDA challenge, a deep learning approach*
Global strategy
• Network features from each dataset: centrality (12) and modularity
(30 to 47) features.
• Models based on each feature category, and their combination
• Data: 498 patients (2 omic datasets, gene expression data)
• Training (50% of total data), validation and test datasets
• DNNs: multiple architectures, Rectified Linear Units (ReLU),
Softmax function (2 outputs)

Prediction performance on test dataset (top models)

Algorithm  Parameters                               Balanced accuracy
Death from disease, Fischer-M
DNN        h=[8,8,8,2], o=Adam, lr=1e-3, d=0.3      87.3% *
SVM        t=RBF, c=64, g=0.25                      75.4%
RF         n=100                                    75.1% *
Disease progression, Fischer
DNN        h=[4,2,2,2], o=Adam, lr=1e-3, d=0.3      84.7% *
SVM        t=RBF, c=16, g=0.0625                    81.8%
RF         n=100                                    78.1% *

Top DNN: Input features are graph centrality measures
Fischer-M: 1 dataset only (microarrays)
Fischer: 2 datasets (microarrays + RNA-Seq)
*Article in preparation
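
The performance figures on this slide are balanced accuracies. Assuming the usual definition (the mean of per-class recall), the sketch below shows why this metric is informative when classes are imbalanced. The labels are invented for illustration and are not from the neuroblastoma datasets.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (usual definition, assumed here)."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Invented labels: 8 patients in class 0 and 2 patients in class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a model that always predicts the majority class

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

A classifier that ignores the minority class still reaches a plain accuracy of 0.8 on these toy labels, while its balanced accuracy is only 0.5, the level expected from an uninformative model in a two-class problem.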

Further evaluation using independent datasets
Global strategy
• Additional independent dataset (Versteeg, 88 patients; microarray dataset)
• Network features: centrality and modularity features concatenated
• 3000 DNNs / classification task
• DNNs: Rectified Linear Units (ReLU), Softmax function (2 outputs)

Death from disease, centralities
Train      Test       DNN    SVM    RF
Fischer-M  Fischer-M  87.3%  75.4%  75.1%
Fischer-M  Fischer-R  82.1%  53.5%  66.8%
Fischer-M  Versteeg   75.0%  53.3%  67.5%
Fischer-R  Fischer-R  85.8%  66.0%  62.4%
Fischer-R  Fischer-M  81.5%  75.4%  61.2%
Fischer-R  Versteeg   70.8%  68.3%  67.5%

Deep neural nets using graph centrality-based input features offer the best prediction performance
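
To make the DNN description more concrete, here is a minimal forward pass with ReLU hidden layers and a 2-output softmax, written in plain Python. It rests on several assumptions: that a setting such as h=[8,8,8,2] lists layer widths ending in the 2 softmax outputs, that 12 inputs stand in for the 12 centrality measures, and that weights are random and untrained. Training with Adam and any regularisation are deliberately left out, so this is a structural sketch and not the authors' implementation.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(v, weights, biases):
    """One fully connected layer; weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]


def init_layers(n_inputs, sizes, seed=0):
    """Random, untrained weights; sizes lists the width of each layer."""
    rng = random.Random(seed)
    layers, n_in = [], n_inputs
    for n_out in sizes:
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
        n_in = n_out
    return layers


def forward(v, layers):
    """ReLU on hidden layers, softmax on the final 2-unit layer."""
    for w, b in layers[:-1]:
        v = relu(dense(v, w, b))
    w, b = layers[-1]
    return softmax(dense(v, w, b))


# 12 inputs is an illustrative choice echoing the 12 centrality measures.
layers = init_layers(n_inputs=12, sizes=[8, 8, 8, 2])
probs = forward([0.1] * 12, layers)
```

The output is a pair of class probabilities that sum to 1; the predicted class would be the one with the larger probability.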

Takeaways:
• Many ML challenges in biomedical (BM) research are shared with other application domains, but
this field also poses unique challenges.
• ML in BM research will continue to advance, driven by more data, new expectations
and emerging questions.
• Supervised learning, including deep learning, will meet many of these needs.
However, unbiased exploration, hypothesis generation and interpretation (including
“mechanistic” interpretation) remain crucial.
• Using graphs/networks together with ML to represent data, extract predictive features and
integrate datasets will continue to enable new discoveries and
applications closer to the patient.

Thanks to:
Funding from:
Bioinformatics and Modelling Research
Group (BIOMOD)
Our research partners in Luxembourg and abroad
