Challenges and opportunities for machine learning in biomedical research
Francisco Azuaje, PhD.
Luxembourg Institute of Health (LIH)
Presentation for the Data Science Luxembourg Meetup
12 September 2018
Graphics by M. Fraiture
The Bioinformatics and Modelling Research Group (BIOMOD) @ LIH
F. Azuaje (PI)
P. Nazarov (Scientist)
L.C. Tranchevent (Postdoc. Fellow)
T. Kaoma (Bioinformatician)
S.Y. Kim (Bioinformatician)
A. Muller (Bioinformatician)
K. Baum (Postdoc. Fellow)
Y. Zhang (PhD. Candidate)
Our key funders:
Mission: To enable patient-oriented research and biological understanding through advanced computational research
Today’s presentation:
• Biomedical Data Science and ML: data landscape, trends in
approaches, crucial challenges.
• A selection of recent, attractive examples.
• The synergy between ML and network analysis, examples.
• Takeaways.
Biomedical research: larger and more diverse datasets (Topol, 2014, Cell; Eisenstein, 2015, Nature)
• High inter-individual variability
• Datasets change in time and space
• High intra-individual variability
Key example of “big data” in cancer research (Hutter and Zenklusen, 2018, Cell)
Typical questions answered by such datasets
“Fundamental” research:
• What is the “behavior”, “mechanism”…?
• Within a data layer, how are samples or features related?
• How are different data layers interrelated…?
• What if …?
• Why…?
“Applied” research:
• Risk assessment
• Diagnosis
• Prognosis
• Other clinical outcome prediction
• Prevention
• Drug targets
• Therapeutic strategies
Statistics and machine learning
ML in biomedical research (Koohy, 2018, F1000 Research)
Figures: global usage of ML techniques; trends of ML techniques other than PCA and LRM (SVM, RF, DNN)
ML in biomedical research: Examples of model diversity and applications (1)
Large-scale phenotypic image analysis (Smith et al., 2018, Cell Systems)
• Novelty/anomaly detection
• Prediction of hard-to-discretize states
• Image classification according to phenotypes (e.g., here with CellProfiler Analyst)
ML in biomedical research: Examples of model diversity and applications (2)
Medical diagnoses with image-based deep learning and transfer learning (Kermany et al., 2018, Cell)
Retina images → retinal diseases
• DL system: low classification error, comparable to that of human experts (on 1,000 images)
• The strategy was also successfully applied to the analysis of chest X-ray images
• Potential “generalized” platform for image-based diagnoses (?)
ML in biomedical research: Examples of model diversity and applications (3)
Random survival forests for predicting cardiovascular (CV) events (Ambale-Venkatesh et al., 2017, Circ. Res.)
Data: imaging, noninvasive tests, questionnaires, biomarker panels
• Accurate prediction of 6 CV outcomes (in an asymptomatic population).
• Subsets of predictive features for each event.
• 12-year follow-up, multi-center, multi-ethnic, wide age range.
Figures: variable importance for each of the 735 variables used in the analysis; top-20 features. Variable importance is measured using the minimum depth of the maximal subtree.
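The minimum-depth importance measure can be illustrated on a single tree. A minimal sketch, assuming the usual definition: a node splitting on variable v with no ancestor that also splits on v roots a maximal v-subtree, and v's minimal depth is the depth (root = 0) of the shallowest such node; lower values mean a more important variable. The `Node` class, the toy tree and the `default` depth for trees that never use a variable are hypothetical illustrations, not the study's code.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Binary decision-tree node; leaves have feature=None."""
    feature: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def minimal_depths(root: Node) -> Dict[str, int]:
    """Depth (root = 0) of the shallowest node that splits on each variable.

    A node splitting on v with no ancestor splitting on v is the root of a
    maximal v-subtree; the shallowest such root gives v's minimal depth.
    """
    depths: Dict[str, int] = {}
    queue = deque([(root, 0)])
    # Breadth-first order visits shallower nodes first, so the first
    # depth recorded for a variable is its minimal depth.
    while queue:
        node, depth = queue.popleft()
        if node is None or node.feature is None:
            continue  # leaf or missing child: no split here
        depths.setdefault(node.feature, depth)
        queue.append((node.left, depth + 1))
        queue.append((node.right, depth + 1))
    return depths


def forest_minimal_depth(trees: List[Node], feature: str, default: int) -> float:
    """Average minimal depth over trees; trees never splitting on `feature` count as `default` (hypothetical choice)."""
    return sum(minimal_depths(t).get(feature, default) for t in trees) / len(trees)


# Hypothetical toy tree (variable names are illustrative only).
tree = Node("age",
            left=Node("bmi", left=Node(), right=Node()),
            right=Node("age", left=Node("ldl", left=Node(), right=Node()), right=Node()))
print(minimal_depths(tree))  # {'age': 0, 'bmi': 1, 'ldl': 2}
```

Averaging over all trees of a forest turns the per-tree depths into a single ranking, which is how a top-20 list like the one above could be read off.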
Shared, key challenges in the field
• Heterogeneity: data, events, states, within and between individuals…
• Data not always “big”: relative lack of labelled data, curse of dimensionality
• Data: multi-layered, hierarchical
• For the same data type/layer: multiple measurement platforms
Shared, key challenges in the field (2)
• Interpretability, understandability: global and local, novelty and consistency with prior knowledge
• Reproducibility: crucial requirement
• “Gold standards”/“ground truth”: lack, limitations
• Complexity of pattern recurrence, regularities
Addressing key challenges through the combination of ML and biological network models
Why networks?
• Networks are intuitive and biologically meaningful representations of biological data
• Networks can be used to encode and visualize data and, more importantly, to extract features and make predictions about the data
• Network-based models can address different predictive modelling challenges, including multi-modal/multi-layered data analysis and interpretable models
A biological network can be represented as a graph that is biologically meaningful (from: McGillivray et al., 2018, Annu. Rev. Biomed. Data Sci.)
Addressing key challenges through the combination of ML and biological network models (cont.)
Our strategy: Data → Processed data → Graphs → Features → Prediction models
Combining ML and biological networks: two application examples in cancer research
Application examples from the SINGALUN project: New tools for the prediction of influential nodes and links in multi-level cancer-related networks
L.C. Tranchevent (Postdoc. Fellow)
PI: J.C. Rajapakse; PI: F. Azuaje
Using biological networks and machine learning for multi-omics patient stratification
Hypothesis: information encoded in graphs is biologically relevant.
Examples: protein-protein network (Jeong et al., 2001, Nature); patient similarity network
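The slides do not say how the patient similarity network is built, so the following is only a minimal sketch under stated assumptions: similarity is taken as the Pearson correlation between patients' omic profiles, and each patient is linked to its k most correlated peers (an undirected k-nearest-neighbour graph). The function names, `k` and the toy profiles are hypothetical.

```python
import math
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def patient_similarity_network(profiles: Dict[str, List[float]], k: int) -> Dict[str, Set[str]]:
    """Undirected kNN graph: each patient is linked to its k most correlated peers."""
    graph: Dict[str, Set[str]] = {p: set() for p in profiles}
    for p, xp in profiles.items():
        others = [(pearson(xp, xq), q) for q, xq in profiles.items() if q != p]
        others.sort(reverse=True)  # most similar first
        for _, q in others[:k]:
            graph[p].add(q)
            graph[q].add(p)  # add both directions so the network is undirected
    return graph


# Hypothetical expression profiles for four patients.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [1.1, 2.1, 2.9, 4.2],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [3.9, 3.1, 2.2, 0.8],
}
net = patient_similarity_network(profiles, k=1)
```

Here the graph splits into two pairs of similar patients (P1 with P2, P3 with P4). Other common choices, such as a correlation threshold or a weighted fully connected graph, would fit the same pipeline.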
Using biological networks and machine learning for multi-omics patient stratification (cont.)
Figures: global strategy; examples of centrality features
• 4 categories of topological features: centrality (12 measures), modularity features (from 7 to 153 features), diffusion features (1000), Node2Vec-derived features (256).
• Each category generates a model.
• Integrated models (weighted voting) were also investigated.
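Integrating the per-category models by weighted voting could look like the sketch below. The slides do not state how the weights were chosen; here each model is assumed to carry a hypothetical weight (for instance its validation performance), and the class that collects the largest total weight wins.

```python
from collections import defaultdict
from typing import Dict, Hashable


def weighted_vote(predictions: Dict[str, Hashable], weights: Dict[str, float]) -> Hashable:
    """Combine per-category predictions: each model adds its weight to the class it predicts."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    # The class with the largest accumulated weight is the integrated prediction.
    return max(scores, key=scores.get)


# Hypothetical predictions for one patient and hypothetical model weights.
predictions = {"centrality": "high-risk", "modularity": "low-risk",
               "diffusion": "high-risk", "node2vec": "low-risk"}
weights = {"centrality": 0.80, "modularity": 0.65, "diffusion": 0.75, "node2vec": 0.70}
print(weighted_vote(predictions, weights))  # high-risk (1.55 vs 1.35)
```

With equal weights this reduces to simple majority voting; unequal weights let the more reliable feature categories dominate.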
Application example (1): neuroblastoma multi-omic datasets from the CAMDA challenge
• Dataset 1 (498 patients, 2 omic datasets)
• Dataset 2 (142 patients, 3 omic datasets)
• Focus on Dataset 1: 6,300 classification models
• Models based on graph topology features outperform models based on the “classical” approach.
• Among topological features, centrality metrics are the most predictive (followed by diffusion-based features).
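To make “centrality features” concrete, the sketch below turns each node of a graph (for example, a patient in a patient similarity network) into a feature vector. It computes only two standard measures, degree and closeness centrality, whereas the study used 12; which 12 is not listed here, so this is an illustration rather than the authors' feature set. Closeness is computed over the nodes reachable from each node, one of several variants in use.

```python
from collections import deque
from typing import Dict, List, Set


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: len(nbrs) / max(n - 1, 1) for v, nbrs in graph.items()}


def closeness_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Inverse mean shortest-path distance from each node to the nodes it can reach."""
    result: Dict[str, float] = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[source] = reachable / total if total > 0 else 0.0
    return result


def centrality_features(graph: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """One feature vector per node: [degree centrality, closeness centrality]."""
    deg, clo = degree_centrality(graph), closeness_centrality(graph)
    return {v: [deg[v], clo[v]] for v in graph}


# Hypothetical three-patient path graph A - B - C.
features = centrality_features({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}})
```

The middle node B gets the highest values on both measures ([1.0, 1.0]), while the end nodes get [0.5, 2/3]; such per-node vectors are what a downstream classifier would consume.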
Application example (2): neuroblastoma multi-omics datasets from the CAMDA challenge, a deep learning approach*
Global strategy
• Network features from each dataset: centrality (12) and modularity (30 to 47) features.
• Models based on each feature category, and on their combination.
• Data: 498 patients (2 omic datasets, gene expression data).
• Training (50% of total data), validation and test datasets.
• DNNs: multiple architectures, Rectified Linear Units (ReLU), softmax function (2 outputs).
Prediction performance on the test dataset (top models):
Task, dataset | Algorithm | Parameters | Balanced accuracy
Death from disease, Fischer-M | DNN | h=[8,8,8,2], o=Adam, lr=1e-3, d=0.3 | 87.3% *
Death from disease, Fischer-M | SVM | t=RBF, c=64, g=0.25 | 75.4%
Death from disease, Fischer-M | RF | n=100 | 75.1% *
Disease progression, Fischer | DNN | h=[4,2,2,2], o=Adam, lr=1e-3, d=0.3 | 84.7% *
Disease progression, Fischer | SVM | t=RBF, c=16, g=0.0625 | 81.8%
Disease progression, Fischer | RF | n=100 | 78.1% *
Top DNN: input features are graph centrality measures.
Fischer-M: 1 dataset only (microarrays).
Fischer: 2 datasets (microarrays + RNA-Seq).
*Article in preparation
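The table reports balanced accuracy, which matters here because outcome classes such as death from disease are typically imbalanced. A minimal sketch of the standard definition (mean of the per-class recall, which for two classes equals the average of sensitivity and specificity); the toy labels are hypothetical.

```python
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall; unlike plain accuracy, it is not inflated by a majority class."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical cohort: 8 survivors (0) and 2 deaths (1); a model that always predicts 0
# reaches 80% plain accuracy but only 50% balanced accuracy.
print(balanced_accuracy([0] * 8 + [1] * 2, [0] * 10))  # 0.5
```

This is why a 50% balanced accuracy corresponds to chance level for a two-class task, regardless of class proportions.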
Further evaluation using independent datasets
Global strategy
• Additional independent dataset (Versteeg, 88 patients; microarray dataset).
• Network features: centrality and modularity features, concatenated.
• 3000 DNNs per classification task.
• DNNs: Rectified Linear Units (ReLU), softmax function (2 outputs).
Death from disease, centrality features:
Train | Test | DNN | SVM | RF
Fischer-M | Fischer-M | 87.3% | 75.4% | 75.1%
Fischer-M | Fischer-R | 82.1% | 53.5% | 66.8%
Fischer-M | Versteeg | 75.0% | 53.3% | 67.5%
Fischer-R | Fischer-R | 85.8% | 66.0% | 62.4%
Fischer-R | Fischer-M | 81.5% | 75.4% | 61.2%
Fischer-R | Versteeg | 70.8% | 68.3% | 67.5%
Deep neural nets using graph centrality-based input features offer the best prediction performance.
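The DNNs described above use ReLU hidden layers and a 2-unit softmax output. The sketch below shows only the forward pass of such a network, with random, untrained weights. It reads h=[8,8,8,2] as a list of hidden-layer widths, which is an assumption, and it omits Adam training and dropout (d=0.3), since dropout is only active during training. The 12 inputs correspond to the 12 centrality features; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def dense(x: List[float], layer: Layer) -> List[float]:
    """Fully connected layer: weights[j][i] connects input i to output j."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_mlp(n_inputs: int, hidden: List[int], n_outputs: int = 2, seed: int = 0) -> List[Layer]:
    """Random (untrained) weights for an MLP with the given hidden-layer widths."""
    rng = random.Random(seed)
    sizes = [n_inputs] + hidden + [n_outputs]
    layers: List[Layer] = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)  # He-style scaling, commonly paired with ReLU
        w = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((w, [0.0] * fan_out))
    return layers


def predict_proba(layers: List[Layer], x: List[float]) -> List[float]:
    """Forward pass: ReLU on every hidden layer, softmax on the 2-unit output layer."""
    for layer in layers[:-1]:
        x = relu(dense(x, layer))
    return softmax(dense(x, layers[-1]))


# Hypothetical input: 12 centrality features for one patient.
model = init_mlp(n_inputs=12, hidden=[8, 8, 8, 2])
probs = predict_proba(model, [0.1] * 12)
```

The two softmax outputs sum to 1 and can be read as class probabilities; training (for example with Adam at lr=1e-3, as in the table) would adjust the weights so that these probabilities match the outcome labels.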
Takeaways:
• Many ML challenges in biomedical (BM) research are shared by different application domains, but this field poses its own unique challenges.
• ML in BM research will continue advancing, driven by more data, new expectations and emerging questions.
• Supervised learning, including e.g. deep learning, will meet many of these needs; however, unbiased exploration, hypothesis generation and interpretation (including “mechanistic” interpretation) are crucial.
• The use of graphs/networks to represent data, extract predictive features and integrate datasets, together with ML, will continue enabling new discoveries and applications closer to the patient.
Thanks to:
Funding from:
Bioinformatics and Modelling Research Group (BIOMOD)
Our research partners in Luxembourg and abroad

Weitere ähnliche Inhalte

Was ist angesagt?

AI in translational medicine webinar
AI in translational medicine webinarAI in translational medicine webinar
AI in translational medicine webinarPistoia Alliance
 
Cancer detection using data mining
Cancer detection using data miningCancer detection using data mining
Cancer detection using data miningRishabhKumar283
 
Csi poster
Csi posterCsi poster
Csi posterISSIP
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
 
a novel approach for breast cancer detection using data mining tool weka
a novel approach for breast cancer detection using data mining tool wekaa novel approach for breast cancer detection using data mining tool weka
a novel approach for breast cancer detection using data mining tool wekaahmad abdelhafeez
 
Systems biology for medical students/Systems medicine
Systems biology for medical students/Systems medicineSystems biology for medical students/Systems medicine
Systems biology for medical students/Systems medicineimprovemed
 
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining TechniquesA Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining Techniquesahmad abdelhafeez
 
NetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno SchwikowskiNetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno SchwikowskiAlexander Pico
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET Journal
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
 
The XNAT imaging informatics platform
The XNAT imaging informatics platformThe XNAT imaging informatics platform
The XNAT imaging informatics platformimgcommcall
 
A Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health CareA Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health CareIJCSIS Research Publications
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Don Pellegrino
 
Computational technique
Computational techniqueComputational technique
Computational techniqueNainaKhan28
 
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
 
ICMLDA_poster.doc
ICMLDA_poster.docICMLDA_poster.doc
ICMLDA_poster.docbutest
 

Was ist angesagt? (20)

AI in translational medicine webinar
AI in translational medicine webinarAI in translational medicine webinar
AI in translational medicine webinar
 
Cancer detection using data mining
Cancer detection using data miningCancer detection using data mining
Cancer detection using data mining
 
Csi poster
Csi posterCsi poster
Csi poster
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
 
a novel approach for breast cancer detection using data mining tool weka
a novel approach for breast cancer detection using data mining tool wekaa novel approach for breast cancer detection using data mining tool weka
a novel approach for breast cancer detection using data mining tool weka
 
Systems biology for medical students/Systems medicine
Systems biology for medical students/Systems medicineSystems biology for medical students/Systems medicine
Systems biology for medical students/Systems medicine
 
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining TechniquesA Novel Approach for Breast Cancer Detection using Data Mining Techniques
A Novel Approach for Breast Cancer Detection using Data Mining Techniques
 
NetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno SchwikowskiNetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno Schwikowski
 
nicolau_BioSketch
nicolau_BioSketchnicolau_BioSketch
nicolau_BioSketch
 
BioInformatics Software
BioInformatics SoftwareBioInformatics Software
BioInformatics Software
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
 
The XNAT imaging informatics platform
The XNAT imaging informatics platformThe XNAT imaging informatics platform
The XNAT imaging informatics platform
 
A Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health CareA Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health Care
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...
 
C0344023028
C0344023028C0344023028
C0344023028
 
Classification ANN
Classification ANNClassification ANN
Classification ANN
 
Computational technique
Computational techniqueComputational technique
Computational technique
 
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
 
ICMLDA_poster.doc
ICMLDA_poster.docICMLDA_poster.doc
ICMLDA_poster.doc
 

Ähnlich wie Machine learning challenges and opportunities in biomedical research

Large scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyLarge scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyMaté Ongenaert
 
Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Joel Saltz
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeJoel Saltz
 
American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1Double Check ĆŐNSULTING
 
Pathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer SurveillancePathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer SurveillanceJoel Saltz
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineJoel Saltz
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례mothersafe
 
What Makes Transfer learning Work for Medical Images
What Makes Transfer learning Work for Medical Images What Makes Transfer learning Work for Medical Images
What Makes Transfer learning Work for Medical Images MithunjhaAnandakumar
 
Introduction to systems medicine
Introduction to systems medicineIntroduction to systems medicine
Introduction to systems medicineimprovemed
 
Enabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceEnabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceOla Spjuth
 
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...IJTET Journal
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeJoel Saltz
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management inscit2006
 
Pathomics Based Biomarkers, Tools, and Methods
Pathomics Based Biomarkers, Tools, and MethodsPathomics Based Biomarkers, Tools, and Methods
Pathomics Based Biomarkers, Tools, and Methodsimgcommcall
 
AAPM Foster July 2009
AAPM Foster July 2009AAPM Foster July 2009
AAPM Foster July 2009Ian Foster
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challengesinside-BigData.com
 
Cancer Analytics Poster
Cancer Analytics PosterCancer Analytics Poster
Cancer Analytics PosterMichael Atkins
 
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...Damian R. Mingle, MBA
 

Ähnlich wie Machine learning challenges and opportunities in biomedical research (20)

NTU-2019
NTU-2019NTU-2019
NTU-2019
 
Large scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyLarge scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biology
 
Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase Change
 
American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1
 
Pathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer SurveillancePathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer Surveillance
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision Medicine
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례
 
What Makes Transfer learning Work for Medical Images
What Makes Transfer learning Work for Medical Images What Makes Transfer learning Work for Medical Images
What Makes Transfer learning Work for Medical Images
 
Introduction to systems medicine
Introduction to systems medicineIntroduction to systems medicine
Introduction to systems medicine
 
Enabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceEnabling Translational Medicine with e-Science
Enabling Translational Medicine with e-Science
 
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase Change
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
 
Pathomics Based Biomarkers, Tools, and Methods
Pathomics Based Biomarkers, Tools, and MethodsPathomics Based Biomarkers, Tools, and Methods
Pathomics Based Biomarkers, Tools, and Methods
 
AAPM Foster July 2009
AAPM Foster July 2009AAPM Foster July 2009
AAPM Foster July 2009
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
 
Cancer Analytics Poster
Cancer Analytics PosterCancer Analytics Poster
Cancer Analytics Poster
 
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
 

Kürzlich hochgeladen

GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 

Kürzlich hochgeladen (20)

GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GFP in rDNA Technology (Biotechnology).pptx

Machine learning challenges and opportunities in biomedical research

  • 1. Challenges and opportunities for machine learning in biomedical research. Francisco Azuaje, PhD, Luxembourg Institute of Health (LIH). Presentation for the Data Science Luxembourg Meetup, 12 September 2018.
  • 2. Graphics by M. Fraiture
  • 3. The Bioinformatics and Modelling Research Group (BIOMOD) @ LIH: F. Azuaje (PI), P. Nazarov (Scientist), L.C. Tranchevent (Postdoc. Fellow), T. Kaoma (Bioinformatician), S.Y. Kim (Bioinformatician), A. Muller (Bioinformatician), K. Baum (Postdoc. Fellow), Y. Zhang (PhD Candidate). Our key funders: Mission: to enable patient-oriented research and biological understanding through advanced computational research.
  • 4. Today’s presentation: • Biomedical Data Science and ML: data landscape, trends in approaches, crucial challenges. • A selection of recent, attractive examples. • The synergy between ML and network analysis, examples. • Takeaways.
  • 5. Biomedical research: larger and more diverse datasets (Topol, 2014, Cell; Eisenstein, 2015, Nature). • High inter-individual variability • Datasets change in time and space • High intra-individual variability
  • 6. Key example of “big data” in cancer research (Hutter and Zenklusen, Cell, 2018)
  • 7. Typical questions answered by such datasets. “Fundamental” research: • What is the “behavior”, “mechanism”…? • Within a data layer, how are samples or features related? • How are different data layers interrelated…? • What if…? • Why…? “Applied” research: • Risk assessment • Diagnosis • Prognosis • Other clinical outcome prediction • Prevention • Drug targets • Therapeutic strategies. Approaches: statistics and machine learning.
  • 8. ML in biomedical research (Koohy, 2018, F1000 Research). Figure panels: global usage of ML techniques; trends of ML techniques, excluding PCA and LRM; trends of SVM, RF and DNN.
  • 9. ML in biomedical research: examples of model diversity and applications (1). Large-scale phenotypic image analysis (Smith et al., 2018, Cell Systems): • Novelty/anomaly detection • Prediction of hard-to-discretize states • Image classification according to phenotypes (e.g., here with CellProfiler Analyst)
  • 10. ML in biomedical research: examples of model diversity and applications (2). Medical diagnoses with transfer learning and image-based deep learning (Kermany et al., 2018, Cell). Figure: retina images → retinal diseases. • DL system: low classification errors, comparable to humans (on 1,000 images) • Strategy also successfully applied to the analysis of chest X-ray images • Potential “generalized” platform for image-based diagnoses (?)
  • 11. ML in biomedical research: examples of model diversity and applications (3). Random survival forests for predicting cardiovascular (CV) events (Ambale-Venkatesh et al., Circ. Res., 2017). Data: imaging, noninvasive tests, questionnaires, biomarker panels. Figure: variable importance for each of the 735 variables used in the analysis, measured using the minimum depth of the maximal subtree; top-20 features. • Accurate prediction of 6 CV outcomes (in an asymptomatic population). • Subsets of predictive features for each event. • 12-year follow-up, multi-center, multi-ethnic, wide age range.
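The importance score named on slide 11, minimal depth of the maximal subtree, ranks a variable by how close to the root of each tree it is first used for a split: the shallower that first split, the more important the variable. The following is a simplified illustration only, not code from the cited study. The nested-tuple tree format, the toy variable names (`age`, `bp`, `chol`) and the penalty given to trees that never split on a variable are assumptions made here.

```python
from statistics import mean

# Assumed tree format: a leaf is None; a split node is (variable, left, right).


def minimal_depth(node, variable, depth=0):
    """Depth of the shallowest split on `variable` (root = 0); None if unused."""
    if node is None:
        return None
    name, left, right = node
    if name == variable:
        return depth  # root of the closest maximal subtree for this variable
    found = [d for d in (minimal_depth(left, variable, depth + 1),
                         minimal_depth(right, variable, depth + 1)) if d is not None]
    return min(found) if found else None


def tree_depth(node):
    """Depth of the deepest split node (root = 0); -1 for a bare leaf."""
    if node is None:
        return -1
    _, left, right = node
    return 1 + max(tree_depth(left), tree_depth(right))


def forest_minimal_depth(forest, variable):
    """Average minimal depth over trees; lower values suggest higher importance."""
    values = []
    for tree in forest:
        d = minimal_depth(tree, variable)
        # Assumption: a tree that never splits on `variable` contributes a
        # penalty one level below its deepest split.
        values.append(d if d is not None else tree_depth(tree) + 1)
    return mean(values)


# Hypothetical two-tree forest.
forest = [
    ("age", ("bp", None, None), ("chol", None, None)),
    ("bp", ("age", None, None), None),
]
ranking = sorted(["age", "bp", "chol"], key=lambda v: forest_minimal_depth(forest, v))
print(ranking)  # ['age', 'bp', 'chol']
```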
  • 12. Shared, key challenges in the field: • Heterogeneity: data, events, states, within and between individuals… • Data not always “big”: relative lack of labelled data, curse of dimensionality • Data: multi-layered, hierarchical • For the same data type/layer: multiple measurement platforms
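Distance concentration is one concrete face of the curse of dimensionality listed on slide 12. As the number of features grows, the distances from a query point to its nearest and farthest neighbours become nearly equal, which weakens similarity-based methods when samples are few. The self-contained sketch below measures the relative contrast (max - min) / min of those distances at increasing dimension. The uniform random data are an assumption chosen only for illustration.

```python
import math
import random


def relative_contrast(n_points, dim, rng):
    """(max - min) / min of Euclidean distances from one random query point
    to n_points random points, all drawn uniformly from the unit hypercube."""
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)  # fixed seed so the run is repeatable
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim, rng), 3))
```

The printed contrast should shrink sharply between 2 and 1,000 dimensions. The exact values depend on the random draws.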
  • 13. Shared, key challenges in the field (2): • Interpretability, understandability: global and local; novelty and consistency with prior knowledge • Reproducibility: crucial requirement • “Gold standards”/“ground truth”: lack, limitations • Complexity of pattern recurrence, regularities
  • 14. Addressing key challenges through combination of ML and biological network models. Why networks? • Networks are intuitive and biologically meaningful representations of biological data • Networks can be used to encode and visualize data and, more importantly, to extract features and make predictions about the data • Network-based models can address different predictive modelling challenges, including multi-modal/multi-layered data analysis applications and interpretable models
  • 15. A biological network can be represented as a graph that is biologically meaningful (figure from McGillivray et al., 2018, Annu. Rev. Biomed. Data Sci.)
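Slide 15 makes the point that a biological network, such as a protein-protein interaction network, can be represented as a graph. A minimal sketch stores an undirected network as an adjacency map and computes topological features from it. Degree and closeness centrality are shown as two common examples. The slides do not list which 12 centrality measures were used, and the toy node names are hypothetical placeholders.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as {node: set of neighbours} from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph, source):
    """(reachable nodes - 1) / sum of shortest-path lengths, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical toy network; node names are placeholders, not real proteins.
network = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
print(degree_centrality(network))
print({node: closeness_centrality(network, node) for node in sorted(network)})
```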
  • 16. Addressing key challenges through combination of ML and biological network models (cont.). Our strategy: Data → Processed data → Graphs → Features → Prediction models
  • 17. Combining ML and biological networks: two application examples in cancer research. Application examples from the SINGALUN project: new tools for the prediction of influential nodes and links in multi-level cancer-related networks. L.C. Tranchevent (Postdoc. Fellow); PI: J.C. Rajapakse; PI: F. Azuaje
  • 18. Using biological networks and machine learning for multi-omics patient stratification. Hypothesis: information encoded in graphs is biologically relevant. Figure panels: protein-protein network (Jeong et al., Nature, 2001); patient similarity network.
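Slide 18 does not say how the patient similarity network is built. One common construction, assumed here purely for illustration, links each patient to the k patients whose molecular profiles correlate most strongly with their own (Pearson correlation). The profiles below are made-up numbers. `patient_similarity_network` is a hypothetical helper, not part of any library.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def patient_similarity_network(profiles, k=1):
    """Hypothetical helper: link each patient to its k most correlated peers.

    Edges are stored in both directions, so the result is an undirected kNN graph.
    """
    graph = {p: set() for p in profiles}
    for p, x in profiles.items():
        peers = sorted((q for q in profiles if q != p),
                       key=lambda q: pearson(x, profiles[q]), reverse=True)
        for q in peers[:k]:
            graph[p].add(q)
            graph[q].add(p)
    return graph


# Made-up expression profiles (4 genes per patient), for illustration only.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [1.1, 2.1, 2.9, 4.2],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [3.9, 3.2, 1.8, 1.1],
}
psn = patient_similarity_network(profiles, k=1)
print(psn)
```

The graph-topology features discussed on the following slides could then be computed on a network like `psn` rather than on a molecular interaction network.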
  • 19. Using biological networks and machine learning for multi-omics patient stratification (cont.). Figure panels: global strategy; examples of centrality features. • Four categories of topological features: centrality (12 measures), modularity features (from 7 to 153 features), diffusion features (1,000), Node2Vec-derived features (256). • Each category generates a model. • Integrated models (weighted voting) were also investigated.
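Slide 19 combines one model per feature category by weighted voting. Below is a minimal sketch of such a combiner. It assumes that each category model outputs a class label and that each model is weighted by a validation score. The weights, labels and weighting rule are illustrative assumptions, not values from the study.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight across category models."""
    scores = defaultdict(float)
    for category, label in predictions.items():
        scores[label] += weights[category]
    return max(scores, key=scores.get)


# Illustrative weights, e.g. validation scores of each category's model (assumed).
weights = {"centrality": 0.85, "modularity": 0.70,
           "diffusion": 0.80, "node2vec": 0.72}
# One patient's predicted label from each category model (hypothetical).
predictions = {"centrality": "high_risk", "modularity": "low_risk",
               "diffusion": "high_risk", "node2vec": "low_risk"}
print(weighted_vote(predictions, weights))  # high_risk (1.65 vs 1.42)
```

With equal weights this example would be a 2-2 tie. The weights break it in favour of the better-scoring models.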
  • 20. Application example (1): neuroblastoma multi-omic datasets from the CAMDA challenge. Dataset 1: 498 patients, 2 omic datasets; Dataset 2: 142 patients, 3 omic datasets; focus on Dataset 1; 6,300 classification models. • Models based on graph topology features outperform models based on the “classical” approach • Among topological features, centrality metrics are the most predictive (followed by diffusion-based features)
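The slides do not say how the diffusion-based features ranked second on slide 20 were computed. Random walk with restart is one widely used diffusion process on graphs and is assumed here for illustration. A walker starts at a seed node and, at each step, moves to a random neighbour or jumps back to the seed with probability `restart`. The steady-state visiting probabilities form a diffusion profile that could serve as a feature vector. The sketch assumes that every node has at least one neighbour, and the toy graph is hypothetical.

```python
def random_walk_with_restart(graph, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seed`.

    Iterates p <- restart * p0 + (1 - restart) * W p, where W spreads each
    node's probability equally over its neighbours. Assumes no isolated nodes.
    """
    nodes = sorted(graph)
    p0 = {n: 1.0 if n == seed else 0.0 for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1 - restart) * p[n] / len(graph[n])
            for nbr in graph[n]:
                nxt[nbr] += share
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:  # L1 convergence check
            return nxt
        p = nxt
    return p


# Toy path graph A - B - C (hypothetical); diffusion profile seeded at A.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
profile = random_walk_with_restart(path, "A")
print({n: round(v, 4) for n, v in profile.items()})  # {'A': 0.4441, 'B': 0.4118, 'C': 0.1441}
```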
  • 21. Application example (2): neuroblastoma multi-omics datasets from the CAMDA challenge, a deep learning approach*. Figure: global strategy.
    • Network features from each dataset: centrality (12) and modularity (30 to 47) features
    • Models based on each feature category, and on their combination
    • Data: 498 patients (2 omic datasets, gene expression data)
    • Training (50% of total data), validation and test datasets
    • DNNs: multiple architectures, rectified linear units (ReLU), softmax function (2 outputs)
    Prediction performance on the test dataset (top models; balanced accuracy):
    Death from disease, Fischer-M
      - DNN (h=[8,8,8,2], o=Adam, lr=1e-3, d=0.3): 87.3%
      - SVM (t=RBF, c=64, g=0.25): 75.4%
      - RF (n=100): 75.1%
    Disease progression, Fischer
      - DNN (h=[4,2,2,2], o=Adam, lr=1e-3, d=0.3): 84.7%
      - SVM (t=RBF, c=16, g=0.0625): 81.8%
      - RF (n=100): 78.1%
    Top DNN: input features are graph centrality measures.
    Fischer-M: 1 dataset only (microarrays). Fischer: 2 datasets (microarrays + RNA-Seq).
    *Article in preparation
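The DNNs on slide 21 use ReLU hidden units and a two-output softmax. The forward pass of such a fully connected network can be sketched in plain Python as follows. Two readings here are interpretations, not statements from the slides: `h=[8,8,8,2]` as the layer widths including the final 2-unit softmax layer, and `d=0.3` as a dropout rate. Training with Adam (`lr=1e-3`) and dropout is omitted, and the random weight initialization is an assumption.

```python
import math
import random


def init_layers(n_inputs, sizes, rng):
    """Random weights (He-style Gaussian) and zero biases for dense layers."""
    layers, fan_in = [], n_inputs
    for size in sizes:
        weights = [[rng.gauss(0.0, math.sqrt(2.0 / fan_in)) for _ in range(fan_in)]
                   for _ in range(size)]
        layers.append((weights, [0.0] * size))
        fan_in = size
    return layers


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """ReLU on every hidden layer, softmax on the last (output) layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        x = softmax(z) if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return x


rng = random.Random(42)
# Assumed reading of h=[8,8,8,2]: three hidden layers of 8 units, 2 softmax outputs.
net = init_layers(n_inputs=12, sizes=[8, 8, 8, 2], rng=rng)  # 12 centrality features
probs = forward(net, [rng.random() for _ in range(12)])
print(probs)  # two class probabilities that sum to 1
```

With all-zero input and zero biases, every pre-activation is zero, so the output is [0.5, 0.5]. This gives a quick sanity check of the softmax layer.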
  • 22. Further evaluation using independent datasets. Figure: global strategy.
    • Additional independent dataset (Versteeg, 88 patients; microarray dataset)
    • Network features: centrality and modularity features concatenated
    • 3,000 DNNs per classification task
    • DNNs: rectified linear units (ReLU), softmax function (2 outputs)
    Death from disease, centralities (train → test: DNN / SVM / RF):
      - Fischer-M → Fischer-M: 87.3% / 75.4% / 75.1%
      - Fischer-M → Fischer-R: 82.1% / 53.5% / 66.8%
      - Fischer-M → Versteeg: 75.0% / 53.3% / 67.5%
      - Fischer-R → Fischer-R: 85.8% / 66.0% / 62.4%
      - Fischer-R → Fischer-M: 81.5% / 75.4% / 61.2%
      - Fischer-R → Versteeg: 70.8% / 68.3% / 67.5%
    Deep neural nets using graph centrality-based input features offer the best prediction performance.
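Slide 21 reports balanced accuracy. The slide 22 figures appear to use the same metric, since the Fischer-M to Fischer-M row repeats slide 21's values. Balanced accuracy is the mean of per-class recall. Unlike plain accuracy, it is not inflated by always predicting the majority class on an imbalanced outcome. A minimal sketch with a hypothetical, imbalanced test set:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (sensitivity) over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 8 patients without the event (0), 2 with it (1).
y_true = [0] * 8 + [1] * 2
always_zero = [0] * 10  # a trivial "majority class" predictor
print(sum(t == p for t, p in zip(y_true, always_zero)) / len(y_true))  # 0.8
print(balanced_accuracy(y_true, always_zero))  # 0.5
```

This is the quantity that scikit-learn's `balanced_accuracy_score` computes with its default settings.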
  • 23. Takeaways: • Many ML challenges in biomedical (BM) research are shared with other application domains, but this field also poses its own unique challenges. • ML in BM research will continue advancing, driven by more data, new expectations and emerging questions. • Supervised learning, including, e.g., deep learning, will meet many of these needs; however, unbiased exploration, hypothesis generation and interpretation (incl. “mechanistic”) are crucial. • The use of graphs/networks to represent data, extract predictive features and integrate datasets, together with ML, will continue enabling new discoveries and applications closer to the patient.
  • 24. Thanks to: Bioinformatics and Modelling Research Group (BIOMOD); our research partners in Luxembourg and abroad. Funding from: