MACHINE LEARNING FOR PRECLINICAL RESEARCH
Paul Agapow <p.agapow@imperial.ac.uk>

Data Science Institute, Imperial College London
Adv. Machine Learning & AI for Drug Discovery & Development (Berlin, June 2018)
BACKGROUND & DISCLOSURE
➤ Data Science Institute (Imperial
College London)
➤ Novel & advanced computation over
large rich biomedical datasets for
translational research & precision
medicine
➤ Patient subtype discovery &
mechanistic insight
➤ Scientific Advisor to PangaeaData.ai
➤ Big Data is a problem
➤ Methodology is a problem
➤ Truth is a problem
➤ But maybe we can do something about it
“Nice training set. Where’s your data?”
- An Analyst
BIOMEDICAL BIG DATA IS USUALLY NOT BIG (ENOUGH)
➤ Average trial size on
ClinicalTrials.gov < 100
➤ Average #samples per GEO
dataset < 100
➤ Average GWAS cohort size
~9000 (median ~2500)
➤ 1,064 ICU admissions for flu in
UK 2016/2017 season
➤ Curse of dimensionality
➤ Deep learning requires
“thousands” of samples for
training (at least p²?)
➤ GWAS needs 3K+ for large
effects, 10K or more for small
effects …
➤ Sub-populations & rare diseases
will be smaller
VS
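One way to see the curse of dimensionality on a small cohort is a toy simulation. The sketch below is illustrative only: it draws uniform random points (an assumption, not real biomedical data) and measures how much farther the farthest neighbour is than the nearest one. The function name `relative_contrast` and the sample sizes are made up for this example.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """(farthest - nearest) / nearest distance from one point to all others."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, other) for other in points[1:]]
    return (max(dists) - min(dists)) / min(dists)


# 100 "patients" measured on 2 features versus 500 features.
low_dim = relative_contrast(100, 2)
high_dim = relative_contrast(100, 500)
print(f"contrast with 2 features:   {low_dim:.2f}")
print(f"contrast with 500 features: {high_dim:.2f}")
```

With ~100 samples and hundreds of features, nearest and farthest neighbours end up at almost the same distance, so distance-based methods have little to work with.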
MAKE BIGGER DATASETS
➤ “Allow” reuse & combining not “build”
➤ FAIR
➤ Use standards like CDISC, HPO …
➤ eTRIKS
➤ Europe’s largest public-private
initiative (pharma, academic, SME,
other)
➤ Data intensive translational research
➤ Data catalog of ~70 studies
➤ Sharing data (standards, starter kit)
WE NEED MORE ETL
➤ Too damn slow and expensive
➤ Tools are poor
➤ Humans are inconsistent
➤ Standards are complex
➤ Harmonisation by ML is the only
answer
➤ Learn from data examples
➤ Corrected by humans
➤ “Discover” schema if need be
[Figure from PangaeaData.AI. Recoverable labels: steps 1 to 4 (shown twice); text data; tabular data; data extractor (twice); pre-classified data and master data mappings.]
§ Frequent Pattern Mining-Growth Algorithms to determine schema association rules
§ Word2Vec to condense information of text sequence and context
§ Graph-Theoretical Algorithms to determine logical sequences, followers, associations, matchings
§ Decision Trees, Neural Nets and Support Vector Machines for training the model
§ Custom Algorithms to prepare data and check data quality
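The "suggest, then have a human correct" loop can be sketched with nothing more than string similarity. This is a deliberately simple stand-in, not the PangaeaData.AI approach: the function name `suggest_mappings`, the column headers and the target terms are all hypothetical, and a real pipeline would learn from previously mapped examples rather than rely on spelling alone.

```python
import difflib


def suggest_mappings(source_columns, standard_terms, cutoff=0.8):
    """Propose a standard term per source column; None means 'ask a human'."""
    suggestions = {}
    for column in source_columns:
        normalised = column.strip().lower()
        matches = difflib.get_close_matches(
            normalised, standard_terms, n=1, cutoff=cutoff
        )
        suggestions[column] = matches[0] if matches else None
    return suggestions


# Hypothetical headers from an incoming study and a hypothetical target vocabulary.
incoming = ["Sex ", "birth_date", "zzz"]
standard = ["sex", "birthdate", "age"]
for column, term in suggest_mappings(incoming, standard).items():
    print(repr(column), "->", term if term else "needs human review")
```

Columns that fall below the cutoff are routed to a person, and their corrections become the next round of examples.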
“On Big Data, data collection biases are always
larger than statistical uncertainty”
-Daniel Himmelstein
THE SIGNAL TO NOISE RATIO IS POOR
➤ Sampling bias
➤ P-hacking
➤ Garden of forking paths
➤ Reversion to mean
➤ Multiple hypothesis testing
➤ False discovery
➤ P-values
➤ Which method is best?
➤ Omnigenics (every gene affects every
other gene)
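For the multiple-testing and false-discovery points above, a minimal sketch of the Benjamini-Hochberg procedure shows how far a naive p < 0.05 cut-off can overstate the findings. The p-values are invented for illustration; the function name is an assumption of this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values from ten hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive_hits = sum(p < 0.05 for p in p_values)
fdr_hits = sum(benjamini_hochberg(p_values))
print(f"naive p < 0.05: {naive_hits} hits; Benjamini-Hochberg: {fdr_hits} hits")
```

The correction only addresses the number of tests actually reported. It does nothing for forking paths or sampling bias.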
EXAMPLE: U-BIOPRED
➤ Unbiased BIOmarkers in PREDiction of
respiratory disease outcomes
➤ 900+ patients, 16 clinical centres +
other studies combined via standards
➤ Outputs:
➤ Analyses largely on small subsets
(~100)
➤ Subtyping of asthmatics
➤ 40+ academic publications
THE REALITY OF DEEP LEARNING
➤ Deep learning is still in progress
➤ Usually insufficient (good labelled)
data
➤ Interpretability issues
➤ Legal & ethical issues, federated
analysis
➤ Tells you what you’ve told it
➤ Bias towards images
➤ For now …
DEEP LEARNING WITH LESS DATA
➤ Pre-training (data without labels)
➤ Initial training with mediocre data
➤ Adapt
➤ Transfer learning (labels / output changes)
➤ Domain adaptation (data / input changes)
➤ Data augmentation
➤ Interpretability coming slowly (LIME)
Dielman 2015
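Data augmentation can be sketched without any framework. The example below generates the eight rotations and mirror images of a small 2-D grid. It assumes the label does not depend on orientation, which holds for some image types and not others. The function names are made up for this sketch.

```python
def rotate90(img):
    """Rotate a 2-D list of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2-D list of pixel values left to right."""
    return [row[::-1] for row in img]


def augment(img):
    """Return the 8 rotated and mirrored variants of one image."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


tiny_image = [[1, 2], [3, 4]]
for variant in augment(tiny_image):
    print(variant)
```

One labelled image becomes eight training examples, but no new information is added: the model only learns that orientation should not matter.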
“80% of the time, you can get 80% of the way
with a simple decision tree.”
- Doug McIlwraith (paraphrased)
EXAMPLE: TEXT CLASSIFICATION FOR SYSTEMATIC REVIEWS
➤ Aim: find similar or related
publications within corpus
➤ Actual aim: find which
method of text
classification is
“best” (Validation)
➤ Data: 15 Drug Control
Reviews & Neuropathic
Pain dataset
➤ Classify with random forest,
naive Bayes, SVM & CNNs
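As a reference point for the simplest classifier in that list, here is a minimal multinomial naive Bayes with Laplace smoothing. It is a sketch, not the pipeline used in the study: the four "abstracts", the include/exclude labels and the function names are invented, and tokenisation is a bare whitespace split.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        counts = word_counts[label]
        total = sum(counts.values())
        score = math.log(n_label / n_docs)
        for token in text.lower().split():
            if token in vocab:  # unseen words are ignored
                score += math.log(
                    (counts[token] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy screening corpus (hypothetical).
docs = [
    "randomised trial of opioid dose",
    "opioid trial in chronic pain",
    "survey of hospital staffing",
    "staffing costs in hospital wards",
]
labels = ["include", "include", "exclude", "exclude"]
model = train_nb(docs, labels)
print(predict_nb(model, "opioid pain trial"))
```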
Conclusion
Dataset                     WSS    Classifier
ACE Inhibitors              0.26   SVM
ADHD                        0.35   MNB
Antihistamines              0.19   MNB
Atypical Antipsychotics     0.12   SVM
Beta Blockers               0.13   SVM
CCB                         0.21   SVM
Estrogen                    0.25   SVM
Neuropathic Pain            0.61   CNN
NSAIDS                      0.14   SVM
Opioids                     0.23   SVM
Oral Hypoglycemics          0.21   SVM
PPI                         0.17   SVM
Skeletal Muscle Relaxants   0.21   SVM
Statins                     0.19   SVM
Triptans                    0.22   SVM
Urinary Incontinence        0.25   SVM
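Assuming WSS here is the "work saved over sampling" measure commonly used to evaluate screening for systematic reviews (the slides do not define it), it can be computed from a confusion matrix as below. The counts in the usage example are invented.

```python
def work_saved_over_sampling(tp, fp, tn, fn):
    """WSS = (TN + FN) / N - (1 - recall).

    (TN + FN) / N is the fraction of the corpus nobody has to read;
    (1 - recall) is the saving random sampling would give at the same recall.
    """
    n = tp + fp + tn + fn
    recall = tp / (tp + fn)
    return (tn + fn) / n - (1.0 - recall)


# Hypothetical screening run: 1000 abstracts, 100 truly relevant, 95 of them found.
wss = work_saved_over_sampling(tp=95, fp=305, tn=595, fn=5)
print(f"WSS at recall 0.95: {wss:.2f}")
```

Read this way, a WSS of 0.2 means reviewers skip about a fifth more of the corpus than random sampling would allow at the same recall.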
EXAMPLE: ASTHMA ENDOTYPING
➤ Asthma is highly heterogeneous
➤ Symptoms
➤ Response to interventions
➤ Multiple mechanisms
➤ 3 or 4 or 7 clusters …
➤ Carefully curated data from
U-BIOPRED (~100)
➤ Analyse “smart”: use appropriate
analysis
Wiki Commons
MULTI- OR INTEGRATED OMICS
➤ Why?
➤ One way to get more data
➤ Statistical power
➤ Multiple defects required to drive
endogenous disease
➤ Multiple “views” on condition
➤ How?
➤ Cluster / network individual data
layers
➤ Fuse together for consensus
Nemutlu 2012
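The "cluster each layer, then fuse" idea can be sketched with a co-association matrix: count how often two patients land in the same cluster across layers, then group patients who co-cluster in most of them. This is a simple consensus scheme chosen for illustration, not SNF or any specific published method. The cluster labels and function names are hypothetical.

```python
def co_association(clusterings):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    n = len(clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    k = len(clusterings)
    return [[value / k for value in row] for row in matrix]


def consensus_groups(matrix, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    n = len(matrix)
    group = [None] * n
    next_id = 0
    for start in range(n):
        if group[start] is not None:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] is None and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Hypothetical cluster labels for six patients from three data layers.
layers = [
    [0, 0, 0, 1, 1, 1],  # e.g. transcriptomics
    [0, 0, 1, 1, 1, 1],  # e.g. proteomics
    [1, 1, 1, 0, 0, 0],  # e.g. metabolomics (label names differ, grouping agrees)
]
print(consensus_groups(co_association(layers)))
```

Label names never need to match between layers, because only co-membership is compared.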
ASTHMA ENDOTYPES
➤ (Validate your methods)
➤ Use a variety of clustering approaches
over asthma cohort ‘omics data
(Bayesian, spectral, iCluster)
➤ Use multi-omics approaches (SNF,
NNMF)
➤ Assess agreement / coherence
➤ Validate in pathways, in other cohorts
and in other data types
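"Assess agreement" between two clusterings is often done with the adjusted Rand index, which is 1 for identical partitions and close to 0 for chance-level agreement. The sketch below is a plain standard-library version. The choice of this index and the example labels are assumptions, not something the slides specify.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. everything in one cluster
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Same grouping under different label names, then a conflicting grouping.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Comparing every pair of methods this way gives a quick picture of which endotypes are stable and which are artefacts of one algorithm.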
KNOWLEDGE GRAPHS
➤ Much effort being spent in building
them but:
➤ What are they for?
➤ Facts aren’t just facts
➤ “Relationships” need to be rich but
loose
➤ Schema-less databases need schema
➤ Graphs may not be the right tool
Meng Wang, 2017
KNOWLEDGE GRAPHS NEED CONTEXT
➤ Aim: extract biological relationships from
publications to build asthma knowledge
base
➤ Domain expert time is prohibitive
➤ Use previous efforts as training
➤ OpenBEL (biological expression
language)
➤ Wide range of relationships & entities
➤ Grakn
➤ Allows hyper-relationships &
inheritance
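Why a bare subject-relation-object triple is not enough can be shown with a small data structure. This is a sketch in plain Python, not the OpenBEL or Grakn data model: the `Statement` class, its field names, the placeholder entities and the evidence identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One extracted relationship plus the context it was observed in."""
    subject: str
    relation: str
    obj: str
    context: tuple = ()  # pairs such as (("species", "human"), ("tissue", "lung"))
    evidence: str = ""   # where the claim came from


def in_context(statements, **required):
    """Keep statements whose context contains every required key/value pair."""
    return [
        s for s in statements
        if all(dict(s.context).get(key) == value for key, value in required.items())
    ]


# The same triple asserted under two different contexts (placeholder entities).
facts = [
    Statement("GENE_A", "increases", "PROTEIN_B",
              context=(("species", "human"), ("tissue", "lung")),
              evidence="hypothetical-paper-1"),
    Statement("GENE_A", "increases", "PROTEIN_B",
              context=(("species", "mouse"), ("tissue", "liver")),
              evidence="hypothetical-paper-2"),
]
for fact in in_context(facts, species="human"):
    print(fact.subject, fact.relation, fact.obj, "| source:", fact.evidence)
```

Collapsed to triples, the two statements would be indistinguishable. Kept with context, a query can ask for human lung evidence only.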
CONCLUSIONS
➤ Big biomedical data is often not big, but we can make it bigger
➤ But even big data is not without its problems
➤ Sometimes [Big | Deep | Advanced] approaches are useful, sometimes not: choose
wisely
➤ Trust but verify
“Success in the pre-clinical arena will come from
carefully curated data, melding together disparate
data sources & types, careful building of large
datasets through consortia & alliances followed
by appropriate use of machine learning and
validated at the bench or in the clinic.”
THANKS
➤ Data Science Institute, ICL
➤ Fayzal Ghantiwala (Bloomberg)
➤ Nazanin Zounemat Kermani (ICL)
➤ Mansoor Saqi (ICL / KCL)
➤ Romain Guédon (Nantes)
➤ Yike Guo (ICL)
➤ eTRIKS consortium
➤ U-BIOPRED consortium
MLMH2018 - KDD Workshop on Machine Learning for Medicine and Healthcare
August 20, 2018, London, UK
Topics of interest:
• Data Standards for Translational Medicine Informatics
• Analysis of large scale electronic health records or patient-generated health data records
• Visualisation of complex and dynamic biomedical networks
• Disease Subtype Discovery for Precision Medicine
• Interpretable Machine Learning for biomedicine and healthcare
• Deep learning for biomedicine
Important Dates
• Submission deadline: May 25, 2018
• Notification accept: June 8, 2018
• Workshop date: August 8, 2018
Meet our Panel!
T. Roy (Ph.D), University of Southampton, UK
A. Teredesai (PhD), University of Washington, Tacoma
S. Wagers (MD), CEO/Founder BioSci Consulting, Belgium
Join us during the KDD Health Day!
Win IBM $1,000 travel grant for best selected student paper!
Follow us!
https://mlmhworkshop.github.io/mlmh-2018
Twitter:
Contact us:
mlmhworkshop@googlegroups.com
Organizers:
M. Saqi, Imperial College London, UK
P. Chakraborty, IBM Research, USA
I. Balaur, EISBM, Lyon, France
P. Agapow, Imperial College London, UK
S. Wagers, BioSci Consulting, Belgium
P.Y. S. Hsueh, IBM Research, USA
F. Rahmanian, Geneia, USA
M.A. Ahmad, Kensci Inc. and University of Washington - Tacoma, USA

VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012Call Girls Service Gurgaon
 
Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Vipesco
 
Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...
Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...
Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...gurkirankumar98700
 
VIP Kolkata Call Girl New Town 👉 8250192130 Available With Room
VIP Kolkata Call Girl New Town 👉 8250192130  Available With RoomVIP Kolkata Call Girl New Town 👉 8250192130  Available With Room
VIP Kolkata Call Girl New Town 👉 8250192130 Available With Roomdivyansh0kumar0
 
Dehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service DehradunNiamh verma
 
Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...
Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...
Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...Niamh verma
 
VIP Call Girls Noida Sia 9711199171 High Class Call Girl Near Me
VIP Call Girls Noida Sia 9711199171 High Class Call Girl Near MeVIP Call Girls Noida Sia 9711199171 High Class Call Girl Near Me
VIP Call Girls Noida Sia 9711199171 High Class Call Girl Near Memriyagarg453
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...Gfnyt.com
 
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF ...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF  ...❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF  ...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF ...Gfnyt.com
 
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591adityaroy0215
 

KĂźrzlich hochgeladen (20)

Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
 
Russian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in Lucknow
Russian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in LucknowRussian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in Lucknow
Russian Escorts Aishbagh Road * 9548273370 Naughty Call Girls Service in Lucknow
 
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
 
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
 
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
 
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
 
Hot Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
Hot  Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In LudhianaHot  Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
Hot Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
 
Dehradun Call Girls Service 08854095900 Real Russian Girls Looking Models
Dehradun Call Girls Service 08854095900 Real Russian Girls Looking ModelsDehradun Call Girls Service 08854095900 Real Russian Girls Looking Models
Dehradun Call Girls Service 08854095900 Real Russian Girls Looking Models
 
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In RaipurCall Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
 
Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510
 
Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...
Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...
Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8923113531 ...
 
VIP Kolkata Call Girl New Town 👉 8250192130 Available With Room
VIP Kolkata Call Girl New Town 👉 8250192130  Available With RoomVIP Kolkata Call Girl New Town 👉 8250192130  Available With Room
VIP Kolkata Call Girl New Town 👉 8250192130 Available With Room
 
Dehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 8854095900 👄🫦Independent Escort Service Dehradun
 
Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...
Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...
Call Girls Amritsar 💯Call Us 🔝 8725944379 🔝 💃 Independent Escort Service Amri...
 
VIP Call Girls Noida Sia 9711199171 High Class Call Girl Near Me
VIP Call Girls Noida Sia 9711199171 High Class Call Girl Near MeVIP Call Girls Noida Sia 9711199171 High Class Call Girl Near Me
VIP Call Girls Noida Sia 9711199171 High Class Call Girl Near Me
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
 
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF ...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF  ...❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF  ...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF ...
 
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
 

Machine Learning for Preclinical Research

  • 1. MACHINE LEARNING FOR PRECLINICAL RESEARCH Paul Agapow <p.agapow@imperial.ac.uk>
 Data Science Institute, Imperial College London Adv. Machine Learning & AI for Drug Discovery & Development (Berlin, June 2018)
  • 2. BACKGROUND & DISCLOSURE ➤ Data Science Institute (Imperial College London) ➤ Novel & advanced computation over large rich biomedical datasets for translational research & precision medicine ➤ Patient subtype discovery & mechanistic insight ➤ Scientific Advisor to PangaeaData.ai
  • 3. ➤ Big Data is a problem ➤ Methodology is a problem ➤ Truth is a problem ➤ But maybe we can do something about it
  • 4. “Nice training set. Where’s your data?” - An Analyst
  • 5. BIOMEDICAL BIG DATA IS USUALLY NOT BIG (ENOUGH) ➤ Average trial size on ClinicalTrials.gov < 100 ➤ Average #samples per GEO dataset < 100 ➤ Average GWAS cohort size ~9000 (median ~2500) ➤ 1,064 ICU admissions for flu in UK 2016/2017 season VS ➤ Curse of dimensionality ➤ Deep learning requires “thousands” of samples for training (at least p²?) ➤ GWAS needs 3K+ for large effects, 10K or more for small effects … ➤ Sub-populations & rare diseases will be smaller
  • 6. MAKE BIGGER DATASETS ➤ “Allow” reuse & combining not “build” ➤ FAIR ➤ Use standards like CDISC, HPO … ➤ eTRIKS ➤ Europe’s largest public-private initiative (pharma, academic, SME, other) ➤ Data intensive translational research ➤ Data catalog of ~70 studies ➤ Sharing data (standards, starter kit)
  • 7. WE NEED MORE ETL ➤ Too damn slow and expensive ➤ Tools are poor ➤ Humans are inconsistent ➤ Standards are complex ➤ Harmonisation by ML is the only answer ➤ Learn from data examples ➤ Corrected by humans ➤ “Discover” schema if need be § Frequent Pattern Mining-Growth Algorithms to determine schema association rules § Word2Vec to condense information of text sequence and context § Graph-Theoretical Algorithms to determine logical sequences, followers, associations, matchings § Decision Trees, Neural Nets and Support Vector Machines for training the model § Custom Algorithms to prepare data and check data quality [Figure, from PangaeaData.AI: data extractors for text data and tabular data; pre-classified data and master data mappings]
  • 8. “On Big Data, data collection biases are always larger than statistical uncertainty” - Daniel Himmelstein
  • 9. THE SIGNAL TO NOISE RATIO IS POOR ➤ Sampling bias ➤ P-hacking ➤ Garden of forking paths ➤ Reversion to mean ➤ Multiple hypothesis testing ➤ False discovery ➤ P-values ➤ Which method is best? ➤ Omnigenics (every gene affects every other gene)
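
Slide 9 lists multiple hypothesis testing and false discovery among the sources of noise. As a minimal illustration only (the slides do not say which correction, if any, the author used), the sketch below applies the standard Benjamini-Hochberg procedure to a set of made-up p-values. The function name and the example values are assumptions for this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten independent tests (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive_hits = sum(p < 0.05 for p in p_values)       # uncorrected threshold
bh_hits = sum(benjamini_hochberg(p_values, 0.05))  # FDR-controlled
print(naive_hits, bh_hits)
```

With these illustrative values, five tests pass the uncorrected 0.05 threshold but only two survive the correction, which is the kind of gap the slide warns about.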
  • 10. EXAMPLE: U-BIOPRED ➤ Unbiased BIOmarkers in PREDiction of respiratory disease outcomes ➤ 900+ patients, 16 clinical centres + other studies combined via standards ➤ Outputs: ➤ Analyses largely on small subsets (~100) ➤ Subtyping of asthmatics ➤ 40+ academic publications
  • 11.
  • 12. THE REALITY OF DEEP LEARNING ➤ Deep learning is still in progress ➤ Usually insufficient (good labelled) data ➤ Interpretability issues ➤ Legal & ethical issues, federated analysis ➤ Tells you what you’ve told it ➤ Bias towards images ➤ For now …
  • 13. DEEP LEARNING WITH LESS DATA ➤ Pre-training (data without labels) ➤ Initial training with mediocre data ➤ Adapt ➤ Transfer learning (labels / output changes) ➤ Domain adaptation (data / input changes) ➤ Data augmentation ➤ Interpretability coming slowly (LIME) Dielman 2015
  • 14. “80% of the time, you can get 80% of the way with a simple decision tree.” - Doug Mcilwraith (paraphrased)
  • 15. EXAMPLE: TEXT CLASSIFICATION FOR SYSTEMATIC REVIEWS ➤ Aim: find similar or related publications within corpus ➤ Actual aim: find which method of text classification is “best” (Validation) ➤ Data: 15 Drug Control Reviews & Neuropathic Pain dataset ➤ Classify with random forest, naive Bayes, SVM & CNNs ➤ Conclusion (Dataset: WSS, Classifier): ACE Inhibitors: 0.26, SVM; ADHD: 0.35, MNB; Antihistamines: 0.19, MNB; Atypical Antipsychotics: 0.12, SVM; Beta Blockers: 0.13, SVM; CCB: 0.21, SVM; Estrogen: 0.25, SVM; Neuropathic Pain: 0.61, CNN; NSAIDS: 0.14, SVM; Opioids: 0.23, SVM; Oral Hypoglycemics: 0.21, SVM; PPI: 0.17, SVM; Skeletal Muscle Relaxants: 0.21, SVM; Statins: 0.19, SVM; Triptans: 0.22, SVM; Urinary Incontinence: 0.25, SVM
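
The results on slide 15 are reported as WSS, which the slide does not define. The sketch below assumes it means Work Saved over Sampling at a fixed recall level (commonly 95%), a metric widely used for systematic-review screening: the fraction of documents a reviewer can skip, compared with random screening, while still finding the target share of relevant ones. The function name, the ranking and the labels are hypothetical.

```python
import math


def wss_at_recall(ranked_labels, recall=0.95):
    """Work Saved over Sampling for a ranked screening list.

    ranked_labels: 1 (relevant) / 0 (irrelevant), ordered from the
    classifier's most to least confident prediction of relevance.
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if n == 0 or total_relevant == 0:
        raise ValueError("need at least one document and one relevant label")
    # Number of relevant documents that must be found to hit the recall target.
    needed = math.ceil(recall * total_relevant)
    found = 0
    screened = 0
    for label in ranked_labels:
        screened += 1
        found += label
        if found >= needed:
            break
    # Share of documents left unread, minus what random sampling would save.
    return (n - screened) / n - (1 - recall)


# Hypothetical ranking of 100 abstracts, 10 of them relevant:
# eight at the top, one at position 20 and one at position 40.
ranked = [1] * 8 + [0] * 11 + [1] + [0] * 19 + [1] + [0] * 60
score = wss_at_recall(ranked, recall=0.95)
print(round(score, 2))
```

Under this assumed definition, a value such as 0.61 for Neuropathic Pain would mean far more screening effort saved than the 0.12 to 0.35 range seen for the drug reviews.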
  • 16. EXAMPLE: ASTHMA ENDOTYPING ➤ Asthma is highly heterogeneous ➤ Symptoms ➤ Response to interventions ➤ Multiple mechanisms ➤ 3 or 4 or 7 clusters … ➤ Carefully curated data from U-BIOPRED (~100) ➤ Analyse “smart”: use appropriate analysis Wiki Commons
  • 17. MULTI- OR INTEGRATED OMICS ➤ Why? ➤ One way to get more data ➤ Statistical power ➤ Multiple defects required to drive endogenous disease ➤ Multiple “views” on condition ➤ How? ➤ Cluster / network individual data layers ➤ Fuse together for consensus Nemutlu 2012
  • 18. ASTHMA ENDOTYPES ➤ (Validate your methods) ➤ Use a variety of clustering approaches over asthma cohort ‘omics data (Bayesian, spectral, iCluster) ➤ Use multi-omics approaches (SNF, NNMF) ➤ Assess agreement / coherence ➤ Validate in pathways, in other cohorts and in other data types
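
Slide 18 calls for assessing agreement between different clustering approaches. One simple way to quantify this, shown here purely as an illustration (the slides do not name the agreement measure used), is the Rand index: the fraction of patient pairs that two clusterings treat the same way, either both grouping them together or both separating them. The cluster labels below are invented.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree
    (both place the pair together, or both place it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster assignments for six patients from two methods.
method_one = [0, 0, 0, 1, 1, 1]
method_two = [1, 1, 0, 0, 0, 0]
print(rand_index(method_one, method_two))

# Cluster names are arbitrary: the same partition with different labels agrees fully.
print(rand_index([0, 0, 1, 1], [5, 5, 2, 2]))
```

The plain Rand index does not correct for agreement expected by chance; the adjusted Rand index is the usual choice when that matters.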
  • 19. KNOWLEDGE GRAPHS ➤ Much effort being spent in building them but: ➤ What are they for? ➤ Facts aren’t just facts ➤ “Relationships” need to be rich but loose ➤ Schema-less databases need schema ➤ Graphs may not be the right tool Meng Wang, 2017
  • 20. KNOWLEDGE GRAPHS NEED CONTEXT ➤ Aim: extract biological relationships from publications to build asthma knowledge base ➤ Domain expert time is prohibitive ➤ Use previous efforts as training ➤ OpenBEL (biological expression language) ➤ Wide range of relationships & entities ➤ Grakn ➤ Allows hyper-relationships & inheritance
  • 21. CONCLUSIONS ➤ Big biomedical data is often not big, but we can make it bigger ➤ But even big data is not without its problems ➤ Sometimes [Big | Deep | Advanced] approaches are useful, sometimes not: choose wisely ➤ Trust but verify
  • 22. “Success in the pre-clinical arena will come from carefully curated data, melding together disparate data sources & types, careful building of large datasets through consortia & alliances followed by appropriate use of machine learning and validated at the bench or in the clinic.
  • 23. THANKS ➤ Data Science Institute, ICL ➤ Fayzal Ghantiwala (Bloomberg) ➤ Nazanin Zounemat Kermani (ICL) ➤ Mansoor Saqi (ICL / KCL) ➤ Romain Guédon (Nantes) ➤ Yike Guo (ICL) ➤ eTRIKS consortium ➤ U-BIOPRED consortium
  • 24. MLMH2018 - KDD Workshop on Machine Learning for Medicine and Healthcare August 20, 2018, London, UK Topics of interest: •  Data Standards for Translational Medicine Informatics •  Analysis of large scale electronic health records or patient-generated health data records •  Visualisation of complex and dynamic biomedical networks •  Disease Subtype Discovery for Precision Medicine •  Interpretable Machine Learning for biomedicine and healthcare •  Deep learning for biomedicine Important Dates •  Submission deadline: May 25, 2018 •  Notification accept: June 8, 2018 •  Workshop date: August 8, 2018 Meet our Panel! T. Roy (Ph.D), University of Southampton, UK A. Teredesai (PhD), University of Washington, Tacoma S. Wagers (MD), CEO/Founder BioSci Consulting, Belgium Join us during the KDD Health Day! Win IBM $1,000 travel grant for best selected student paper! Follow us! https://mlmhworkshop.github.io/mlmh-2018 Twitter: Contact us: mlmhworkshop@googlegroups.com Organizers: M. Saqi, Imperial College London, UK P. Chakraborty, IBM Research, USA I. Balaur, EISBM, Lyon, France P. Agapow, Imperial College London, UK S. Wagers, BioSci Consulting, Belgium P.Y. S. Hsueh, IBM Research, USA F. Rahmanian, Geneia, USA M.A. Ahmad, Kensci Inc. and University of Washington - Tacoma, USA