Interpreting Complex Real World Data for
Pharmaceutical Research
Paul Agapow, Health Informatics Director
Data Analytics for Pharma Development, Munich
Public
November 2019
Disclosure
• No conflicts of interest
• Based on experience in current &
previous positions
• Oncology Data Science @AZ
• Data Science Institute @ICL
• Does not reflect official AZ thought
or projects
Thesis
• Use of Real World Data is highly attractive (and
perhaps inevitable) for Pharmaceutical R&D
• However, RWD can be complex and difficult to
interpret
• We need to develop new algorithms and new
approaches to using RWD
What are the over-arching problems in Pharma R&D?
A revolution in drug development?
• Every day we hear of new
advances & developments
• Acceleration in basic
biomedical research
• Constant development of
new molecular technologies
• An age of cheap
computation & powerful
machine learning
Drug development is increasingly unsustainable
• Accelerated biomedical
research not reflected in
drug development
• Eroom’s Law: cost of
developing new drug roughly
doubles every nine years
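As a rough illustration of the arithmetic behind Eroom's Law, the sketch below treats the doubling as a smooth exponential. The function name is hypothetical and no real cost data is used; the nine-year doubling period is the only figure taken from the slide.

```python
def eroom_cost_multiplier(years: float, doubling_period: float = 9.0) -> float:
    """Factor by which development cost grows after `years`,
    assuming it doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# Illustrative horizons only.
for years in (9, 18, 27):
    print(f"after {years} years: x{eroom_cost_multiplier(years):.0f}")
```

Under this assumption, cost grows eightfold over 27 years.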
Pharmacelera (2014)
• It costs ~ $1B and 10 years to
develop & launch a drug
• Each patient in a clinical trial costs
$1-10K
• The “valley of death”: most
candidate drugs will fail
• The later it fails, the more
expensive
• But much of our data focuses on
the early stages
We neglect the tough maths of drug development
• Maybe all the low-hanging fruit has
been picked
• E.g. single gene / single system
diseases
• Most diseases are complex &
systemic
• Many patients are complex
• Lifestyle, exposure, co-
morbidities, co-medications
• A cohort is rarely just a simple table
Simple biology only helps with simple patients
How might Real World Data ameliorate these?
What is Real World Data?
Swift et al. (2018) in Clinical and Translational Science
Therapies & algorithms always under-
perform in the “real world” because:
• Disease is complex
• Patients are complex
• RCT populations are unrealistic /
biased
• Desk drawer problem
RWD (including sensors & monitors) offer
a total intelligence approach to patient
populations
If it’s not Real World Data, it’s not real
• “Free”
• Reusable
• Access scales (numbers, complexity,
time) that may otherwise be impractical
• No lead time for gathering data
• Exploration free of investment
• Surveillance
• Validation of non-RWD studies
Real World Data offers substantial ROI
ML is hungry for data but:
• Not enough labelled data
• Not enough of the right sort of data:
• e.g. adverse events
• Badly imbalanced data
• Not enough data that isn’t WEIRD (Western, Educated, Industrialized, Rich, Democratic)
RWD can give us Big Data
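One common way to cope with badly imbalanced labels is to weight each class inversely to its frequency. The sketch below is a minimal illustration with made-up labels and a hypothetical helper name; it is not a method taken from the talk.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rare classes (e.g. adverse events) count for more in training."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels: 95 patients without an adverse event, 5 with one.
labels = ["none"] * 95 + ["adverse_event"] * 5
weights = inverse_frequency_weights(labels)
print(weights)
```

This is the same heuristic that scikit-learn applies for `class_weight="balanced"`. Reweighting only rebalances the loss; it does not create the missing examples, which is where larger RWD sets may help.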
We need more data
How can we best use Real World Data?
• Strength is weakness: patients are
complex
• Possible confounding
• True randomization is difficult
• Data quality is variable
• Uneven density of sampling
• The data is sometimes just not
there
• Multiple modalities
• Governance / privacy issues
• There is justifiable skepticism
It is not easy to analyse Real World Data
When:
• We have no good idea of the
“model” underlying the data
• Variables may interact in complex
ways
• There’s potentially a lot of variables
and we don’t know which ones are
important
• Lowers barrier to exploration
Machine learning is highly attractive for drug dev
Express RWD as a graph
• But how do you get a graph into a ML model (table world)?
• Preprocess to summarize a graph:
– Careful feature engineering
– Summary statistics (e.g. centrality)
– Use neighborhood
– Kernel functions
• Or make the graph structure part of the training:
– Graph Convolutional Networks:
• Use an adjacency matrix
• Looks at neighbors
– Representation Learning:
• Find an embedding in a low-dimensional space
• Node2vec
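The two routes above can be sketched on a toy graph. This is a minimal pure-Python illustration under stated assumptions: the node names, edges and weight matrix are invented, and in practice the weights would be learned rather than fixed. The first part summarises the graph as per-node features (degree centrality); the second applies one graph-convolution step in the style of the widely used normalised-adjacency propagation rule.

```python
import math

# Toy undirected graph; node names and edges are invented for illustration.
edges = [("drug_A", "protein_1"), ("drug_B", "protein_1"), ("protein_1", "protein_2")]
nodes = sorted({name for edge in edges for name in edge})
index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

# Adjacency matrix: A[i][j] = 1 when nodes i and j share an edge.
A = [[0.0] * n for _ in range(n)]
for u, v in edges:
    A[index[u]][index[v]] = A[index[v]][index[u]] = 1.0

# Route 1: summarise the graph as per-node features for a tabular model.
degree_centrality = {name: sum(A[index[name]]) / (n - 1) for name in nodes}


def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    size = len(A)
    # Add self-loops so each node keeps its own features.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(size)]
             for i in range(size)]
    deg = [sum(row) for row in A_hat]
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(size)]
            for i in range(size)]
    # Aggregate neighbour features, then apply the weight matrix.
    agg = [[sum(norm[i][k] * H[k][f] for k in range(size))
            for f in range(len(H[0]))] for i in range(size)]
    out = [[sum(agg[i][f] * W[f][o] for f in range(len(W)))
            for o in range(len(W[0]))] for i in range(size)]
    return [[max(0.0, x) for x in row] for row in out]


# Route 2: make the structure part of the model.
# One-hot node features and fixed toy weights (4 inputs -> 2 outputs).
H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
W = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
embeddings = gcn_layer(A, H, W)
print(degree_centrality)
print(embeddings)
```

The centrality values can be appended to an ordinary patient or compound table, whereas the GCN output is a learned representation that mixes each node's features with those of its neighbours.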
Use case: predicting polypharmacy side-effects
• Zitnik et al. (2018) Bioinformatics
• Construct a graph of protein-
protein interactions, drug-protein
targets, and drug-drug interactions
• A graph convolutional network
(GCN) that encodes / embeds
nodes
• Decoder uses these embeddings
to model side effects
• Result: predicts the likely side-
effects of a pair of drugs
Analyse data as trajectories
• A patient history or disease course is a
temporal signal
• A time-ordered sequence of:
– Events
– Symptoms
– Treatments
• Noisy, uneven
• A well-studied but computationally
formidable problem
• Use to predict outcomes, progression,
co-morbidities, causality, subtypes,
adverse effects, etc.
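A minimal sketch of turning raw event records into trajectories, assuming each record carries a patient identifier, a date and a diagnosis. The records below are invented; the code only counts consecutive diagnosis pairs, whereas a real analysis would also test whether a pair occurs more often than chance.

```python
from collections import Counter, defaultdict

# Invented event records: (patient_id, ISO date, diagnosis).
events = [
    ("p1", "2015-03-01", "hypertension"),
    ("p1", "2017-06-12", "diabetes"),
    ("p1", "2019-01-20", "COPD"),
    ("p2", "2016-02-02", "diabetes"),
    ("p2", "2014-09-30", "hypertension"),
    ("p3", "2018-11-05", "acute bronchitis"),
    ("p3", "2019-04-17", "COPD"),
]

# Group events by patient, then sort each history by date
# (ISO dates sort correctly as strings).
histories = defaultdict(list)
for patient, date, diagnosis in events:
    histories[patient].append((date, diagnosis))
trajectories = {p: [dx for _, dx in sorted(evts)] for p, evts in histories.items()}

# Count how often one diagnosis is directly followed by another.
transitions = Counter()
for path in trajectories.values():
    transitions.update(zip(path, path[1:]))

print(trajectories)
print(transitions.most_common())
```

Frequent transitions are candidate steps in a disease trajectory; the uneven gaps between dates are exactly the noise the slide warns about.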
Brunak (2014) Paths to COPD
• Stratify population for further study
• Validate / annotate with other data
modalities:
• Medications
• Gender, socio-economic, life events
• Biomarkers
• Use to develop risk scores
• Apply to oncology treatment
Use trajectories in COPD for …
[Figure: trajectory diagram; node labels include hypertension, diabetes, retinal Dx, acute bronchitis, candidiasis, menstruation disorder]
Use of multi- or integrated ‘omics
• Why?
– One way to get more data
– Statistical power
– Multiple defects required to drive
endogenous disease
– Multiple “views” on condition
• How?
– Cluster / network individual data layers
– Fuse together for consensus
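One simple way to fuse per-layer clusterings is a co-association matrix: for each pair of patients, the fraction of data layers in which they land in the same cluster. The sketch below assumes each layer has already been clustered; the labels and layer names are hypothetical, and this is a plain consensus scheme rather than any specific published fusion method.

```python
def co_association(clusterings):
    """Fraction of data layers in which each pair of samples shares a cluster."""
    n = len(clusterings[0])
    return [
        [sum(labels[i] == labels[j] for labels in clusterings) / len(clusterings)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical cluster labels for five patients from three 'omics layers.
transcriptomics = [0, 0, 1, 1, 1]
proteomics = [1, 1, 0, 0, 1]
metabolomics = [0, 0, 0, 1, 1]

consensus = co_association([transcriptomics, proteomics, metabolomics])
for row in consensus:
    print([round(x, 2) for x in row])
```

The resulting matrix can itself be clustered to give consensus groups; pairs with a value near 1 are supported by every layer, so cluster label names need not match across layers.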
Nemutlu 2012
Best practices for integrated ‘omics
• (Validate your methods)
• Use a variety of clustering
approaches over asthma cohort
‘omics data (Bayesian, spectral,
iCluster)
• Use multi-omics approaches (SNF,
NNMF)
• Assess agreement / coherence
• Validate in pathways, in other cohorts
and in other data types
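Agreement between two clusterings can be quantified with the Rand index: the share of sample pairs that both methods treat the same way. This is a minimal sketch with hypothetical endotype labels, chosen as one possible agreement measure rather than the one used in the work described.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of sample pairs on which two clusterings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical endotype assignments from two clustering methods.
spectral = [0, 0, 1, 1, 2, 2]
bayesian = [1, 1, 0, 0, 0, 2]
print(rand_index(spectral, bayesian))
```

The index compares pairings, not label names, so two methods that number their clusters differently can still score 1.0. A chance-corrected variant is usually preferable when cluster sizes are very uneven.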
Hypothesis generation vs. validation or proof
• In complex scenarios, it may be
difficult to robustly model a population
• Perhaps we should not
• Also valuable to:
– Validate or reproduce
– Examine (for comfort)
– Hypothesis generation
– Exploration
• RWD in all its complexities would be
well suited to this
We need to build better Real World datasets
Despite promise, we have little more than
proofs of concept:
• Chen et al. (2019) showed that the DUD-E dataset,
used by many “accurate” CNN models of
drug-target interactions, is actually biased
• AI-radiomics shows incredible performance
in trials but mediocre performance in the
clinic
• Many ML studies are direly underpowered
• Cultural issues
Unfortunately, we are terrible at machine learning
• Kuchenbaecker et al. 2019:
• 78% of genetic studies focus on
those of European descent
• 2/3 of studies are from three
nations: UK, USA, Iceland
• Outside of Europe / North America we
walk off an information cliff
• Populations differ not just by genetics
but lifestyle, diet, exposure …
• A diverse dataset is not just more
valid, it has more information
Build more diverse datasets
• Too expensive & vast for any one
organisation?
• Harvest EHRs & registries
• Collaborate with national centres
• Build small, locally dense datasets
• Will require long-term funding & broad
collaboration to ensure usefulness &
sustainability (FAIR, consortiums, public-
private, IMI ..?)
• Important but not urgent ..?
Where will this data come from?
Wrapping it up
Summary
• Real World Data is highly attractive for Pharma R&D
• More “real”
• Better ROI
• Provides necessary volume of data for ML
• However, RWD is not a table:
• Use longitudinal approaches
• Use graphical approaches
• Not proof but hypothesis generation?
• We need to actively build larger and more diverse Real
World datasets
Challenges
• How do we generate larger and more diverse datasets?
• How do we balance the needs of primary care and
secondary reuse?
• Is privacy / confidentiality possible or workable?
• Which methods “work” and where do they work?
• How do we scale?
Thanks
• Health Informatics / Oncology Biometrics @ AZ
• Michal Krassowski, Jinyi Wu @ ICL
• NHLI, Brompton
• Naheed Kurji @Cyclica
31

Weitere ähnliche Inhalte

Was ist angesagt?

Dichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianDichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianLaure Wynants
 
Data analytics and the power of creating social impact
Data analytics and the power of creating social impactData analytics and the power of creating social impact
Data analytics and the power of creating social impactTA Telecom
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
 
Make clinical prediction models great again
Make clinical prediction models great againMake clinical prediction models great again
Make clinical prediction models great againBenVanCalster
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsMaarten van Smeden
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIMaarten van Smeden
 
Machine learning in medicine: calm down
Machine learning in medicine: calm downMachine learning in medicine: calm down
Machine learning in medicine: calm downBenVanCalster
 
The End of the Drug Development Casino?
The End of the Drug Development Casino?The End of the Drug Development Casino?
The End of the Drug Development Casino?Paul Agapow
 
Machine learning applied in health
Machine learning applied in healthMachine learning applied in health
Machine learning applied in healthBig Data Colombia
 
ML & AI in Drug development: the hidden part of the iceberg
ML & AI in Drug development: the hidden part of the icebergML & AI in Drug development: the hidden part of the iceberg
ML & AI in Drug development: the hidden part of the icebergPaul Agapow
 
Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...BenVanCalster
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyondMaarten van Smeden
 
Bias in covid 19 models
Bias in covid 19 modelsBias in covid 19 models
Bias in covid 19 modelsLaure Wynants
 
Analysis of "A Predictive Analytics Primer" by Tom Davenport
 Analysis of "A Predictive Analytics Primer" by Tom Davenport Analysis of "A Predictive Analytics Primer" by Tom Davenport
Analysis of "A Predictive Analytics Primer" by Tom DavenportEt Hish
 
The absence of a gold standard: a measurement error problem
The absence of a gold standard: a measurement error problemThe absence of a gold standard: a measurement error problem
The absence of a gold standard: a measurement error problemMaarten van Smeden
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Maarten van Smeden
 
DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI...
 DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI... DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI...
DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI...Nexgen Technology
 
Machine Learning for Disease Prediction
Machine Learning for Disease PredictionMachine Learning for Disease Prediction
Machine Learning for Disease PredictionMustafa Oğuz
 

Was ist angesagt? (20)

Dichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianDichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatistician
 
Data analytics and the power of creating social impact
Data analytics and the power of creating social impactData analytics and the power of creating social impact
Data analytics and the power of creating social impact
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Make clinical prediction models great again
Make clinical prediction models great againMake clinical prediction models great again
Make clinical prediction models great again
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part II
 
Machine learning in medicine: calm down
Machine learning in medicine: calm downMachine learning in medicine: calm down
Machine learning in medicine: calm down
 
The End of the Drug Development Casino?
The End of the Drug Development Casino?The End of the Drug Development Casino?
The End of the Drug Development Casino?
 
Machine learning applied in health
Machine learning applied in healthMachine learning applied in health
Machine learning applied in health
 
ML & AI in Drug development: the hidden part of the iceberg
ML & AI in Drug development: the hidden part of the icebergML & AI in Drug development: the hidden part of the iceberg
ML & AI in Drug development: the hidden part of the iceberg
 
Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyond
 
Bias in covid 19 models
Bias in covid 19 modelsBias in covid 19 models
Bias in covid 19 models
 
Analysis of "A Predictive Analytics Primer" by Tom Davenport
 Analysis of "A Predictive Analytics Primer" by Tom Davenport Analysis of "A Predictive Analytics Primer" by Tom Davenport
Analysis of "A Predictive Analytics Primer" by Tom Davenport
 
MPS webinar master deck
MPS webinar master deckMPS webinar master deck
MPS webinar master deck
 
The absence of a gold standard: a measurement error problem
The absence of a gold standard: a measurement error problemThe absence of a gold standard: a measurement error problem
The absence of a gold standard: a measurement error problem
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
 
DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI...
 DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI... DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI...
DISEASE PREDICTION BY MACHINE LEARNING OVER BIG DATA FROM HEALTHCARE COMMUNI...
 
Machine Learning for Disease Prediction
Machine Learning for Disease PredictionMachine Learning for Disease Prediction
Machine Learning for Disease Prediction
 

Ähnlich wie Interpreting Complex Real World Data for Pharmaceutical Research

AI in Healthcare
AI in HealthcareAI in Healthcare
AI in HealthcarePaul Agapow
 
ML, biomedical data & trust
ML, biomedical data & trustML, biomedical data & trust
ML, biomedical data & trustPaul Agapow
 
Data Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare IndustryData Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare IndustryDinesh V
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Philip Bourne
 
openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...
openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...
openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...Koray Atalag
 
Biostatistics and Statistical Bioinformatics
Biostatistics and Statistical BioinformaticsBiostatistics and Statistical Bioinformatics
Biostatistics and Statistical BioinformaticsSetia Pramana
 
Informatics and the merging of research and quality measures with bedside care
Informatics and the merging of research and quality measures with bedside careInformatics and the merging of research and quality measures with bedside care
Informatics and the merging of research and quality measures with bedside careMike Hogarth, MD, FACMI, FACP
 
Pathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John CaiPathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John CaiJohn Cai
 
Social Network Applications
Social Network ApplicationsSocial Network Applications
Social Network ApplicationsJason Riedy
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...Christopher Hart
 
ROADMAP at Lausanne III OECD 28Oct2016
ROADMAP at Lausanne III OECD 28Oct2016ROADMAP at Lausanne III OECD 28Oct2016
ROADMAP at Lausanne III OECD 28Oct2016Martin Pan
 
Connected Health & Me - Matic Meglic - Nov 24th 2014
Connected Health & Me - Matic Meglic - Nov 24th 2014Connected Health & Me - Matic Meglic - Nov 24th 2014
Connected Health & Me - Matic Meglic - Nov 24th 2014ipposi
 
COMPUTERS IN PHARMACEUTICAL DEVELOPMENT
COMPUTERS IN PHARMACEUTICAL DEVELOPMENTCOMPUTERS IN PHARMACEUTICAL DEVELOPMENT
COMPUTERS IN PHARMACEUTICAL DEVELOPMENTArunpandiyan59
 
AstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discoveryAstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discoveryNeo4j
 
Cloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences ResearchCloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences ResearchInterpretOmics
 
Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...
Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...
Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...Life Sciences Network marcus evans
 
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...D3 Consutling
 
Where AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicineWhere AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicinePaul Agapow
 
Medical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareMedical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareNUS-ISS
 

Ähnlich wie Interpreting Complex Real World Data for Pharmaceutical Research (20)

AI in Healthcare
AI in HealthcareAI in Healthcare
AI in Healthcare
 
ML, biomedical data & trust
ML, biomedical data & trustML, biomedical data & trust
ML, biomedical data & trust
 
Data Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare IndustryData Science Deep Roots in Healthcare Industry
Data Science Deep Roots in Healthcare Industry
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?
 
openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...
openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...
openEHR Approach to Detailed Clinical Models (DCM) Development - Lessons Lear...
 
Biostatistics and Statistical Bioinformatics
Biostatistics and Statistical BioinformaticsBiostatistics and Statistical Bioinformatics
Biostatistics and Statistical Bioinformatics
 
Informatics and the merging of research and quality measures with bedside care
Informatics and the merging of research and quality measures with bedside careInformatics and the merging of research and quality measures with bedside care
Informatics and the merging of research and quality measures with bedside care
 
Pathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John CaiPathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John Cai
 
Day 1: Real-World Data Panel
Day 1: Real-World Data Panel Day 1: Real-World Data Panel
Day 1: Real-World Data Panel
 
Social Network Applications
Social Network ApplicationsSocial Network Applications
Social Network Applications
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
ROADMAP at Lausanne III OECD 28Oct2016
ROADMAP at Lausanne III OECD 28Oct2016ROADMAP at Lausanne III OECD 28Oct2016
ROADMAP at Lausanne III OECD 28Oct2016
 
Connected Health & Me - Matic Meglic - Nov 24th 2014
Connected Health & Me - Matic Meglic - Nov 24th 2014Connected Health & Me - Matic Meglic - Nov 24th 2014
Connected Health & Me - Matic Meglic - Nov 24th 2014
 
COMPUTERS IN PHARMACEUTICAL DEVELOPMENT
COMPUTERS IN PHARMACEUTICAL DEVELOPMENTCOMPUTERS IN PHARMACEUTICAL DEVELOPMENT
COMPUTERS IN PHARMACEUTICAL DEVELOPMENT
 
AstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discoveryAstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discovery
 
Cloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences ResearchCloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences Research
 
Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...
Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...
Beyond the Hype: Practical Approaches for Implementing Machine Learning and A...
 
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
 
Where AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicineWhere AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicine
 
Medical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareMedical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in Healthcare
 

Mehr von Paul Agapow

Digital Biomarkers, a (too) brief introduction.pdf
Digital Biomarkers, a (too) brief introduction.pdfDigital Biomarkers, a (too) brief introduction.pdf
Digital Biomarkers, a (too) brief introduction.pdfPaul Agapow
 
How to make every mistake and still have a career, Feb2024.pdf
How to make every mistake and still have a career, Feb2024.pdfHow to make every mistake and still have a career, Feb2024.pdf
How to make every mistake and still have a career, Feb2024.pdfPaul Agapow
 
Multi-omics for drug discovery: what we lose, what we gain
Multi-omics for drug discovery: what we lose, what we gainMulti-omics for drug discovery: what we lose, what we gain
Multi-omics for drug discovery: what we lose, what we gainPaul Agapow
 
Get yourself a better bioinformatics job
Get yourself a better bioinformatics jobGet yourself a better bioinformatics job
Get yourself a better bioinformatics jobPaul Agapow
 
Bioinformatics! (What is it good for?)
Bioinformatics! (What is it good for?)Bioinformatics! (What is it good for?)
Bioinformatics! (What is it good for?)Paul Agapow
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 
AI for Precision Medicine (Pragmatic preclinical data science)
AI for Precision Medicine (Pragmatic preclinical data science)AI for Precision Medicine (Pragmatic preclinical data science)
AI for Precision Medicine (Pragmatic preclinical data science)Paul Agapow
 
Patient subtypes: real or not?
Patient subtypes: real or not?Patient subtypes: real or not?
Patient subtypes: real or not?Paul Agapow
 
Big biomedical data is a lie
Big biomedical data is a lieBig biomedical data is a lie
Big biomedical data is a liePaul Agapow
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondonPaul Agapow
 
Introduction to Snakemake
Introduction to SnakemakeIntroduction to Snakemake
Introduction to SnakemakePaul Agapow
 
Analysing biomedical data (ers october 2017)
Analysing biomedical data (ers  october 2017)Analysing biomedical data (ers  october 2017)
Analysing biomedical data (ers october 2017)Paul Agapow
 
Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)Paul Agapow
 

Mehr von Paul Agapow (13)

Digital Biomarkers, a (too) brief introduction.pdf
Digital Biomarkers, a (too) brief introduction.pdfDigital Biomarkers, a (too) brief introduction.pdf
Digital Biomarkers, a (too) brief introduction.pdf
 
How to make every mistake and still have a career, Feb2024.pdf
How to make every mistake and still have a career, Feb2024.pdfHow to make every mistake and still have a career, Feb2024.pdf
How to make every mistake and still have a career, Feb2024.pdf
 
Multi-omics for drug discovery: what we lose, what we gain
Multi-omics for drug discovery: what we lose, what we gainMulti-omics for drug discovery: what we lose, what we gain
Multi-omics for drug discovery: what we lose, what we gain
 
Get yourself a better bioinformatics job
Get yourself a better bioinformatics jobGet yourself a better bioinformatics job
Get yourself a better bioinformatics job
 
Bioinformatics! (What is it good for?)
Bioinformatics! (What is it good for?)Bioinformatics! (What is it good for?)
Bioinformatics! (What is it good for?)
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
AI for Precision Medicine (Pragmatic preclinical data science)
AI for Precision Medicine (Pragmatic preclinical data science)AI for Precision Medicine (Pragmatic preclinical data science)
AI for Precision Medicine (Pragmatic preclinical data science)
 
Patient subtypes: real or not?
Patient subtypes: real or not?Patient subtypes: real or not?
Patient subtypes: real or not?
 
Big biomedical data is a lie
Big biomedical data is a lieBig biomedical data is a lie
Big biomedical data is a lie
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
Introduction to Snakemake
Introduction to SnakemakeIntroduction to Snakemake
Introduction to Snakemake
 
Analysing biomedical data (ers october 2017)
Analysing biomedical data (ers  october 2017)Analysing biomedical data (ers  october 2017)
Analysing biomedical data (ers october 2017)
 
Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)Interpreting transcriptomics (ers berlin 2017)
Interpreting transcriptomics (ers berlin 2017)
 

Interpreting Complex Real World Data for Pharmaceutical Research

  • 1. Interpreting Complex Real World Data for Pharmaceutical Research Paul Agapow, Health Informatics Director Data Analytics for Pharma Development, Munich Public November 2019
  • 2. Disclosure • No conflicts of interest • Based on experience in current & previous positions • Oncology Data Science @AZ • Data Science Institute @ICL • Does not reflect official AZ thought or projects
  • 3. Thesis • Use of Real World Data is highly attractive (and perhaps inevitable) for Pharmaceutical R&D • However, RWD can be complex and difficult to interpret • We need to develop new algorithms and new approaches to using RWD
  • 4. What are the over-arching problems in Pharma R&D?
  • 5. A revolution in drug development? • Every day we hear of new advances & developments • Acceleration in basic biomedical research • Constant development of new molecular technologies • An age of cheap computation & powerful machine learning
  • 6. Drug development is increasingly unsustainable • Accelerated biomedical research is not reflected in drug development • Eroom’s Law: the cost of developing a new drug roughly doubles every nine years • Source: Pharmacelera (2014)
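Eroom's Law as stated on slide 6 amounts to a simple exponential. The following is a rough sketch only, assuming a constant nine-year doubling time; the function name and the baseline of 1.0 are illustrative and do not come from the talk.

```python
def eroom_cost(baseline_cost, years_elapsed, doubling_years=9.0):
    """Projected cost after `years_elapsed`, assuming cost doubles
    every `doubling_years` (nine years, per the slide)."""
    return baseline_cost * 2 ** (years_elapsed / doubling_years)


# Relative cost multipliers over three doubling periods.
# The baseline of 1.0 is a placeholder, not a real cost figure.
for years in (0, 9, 18, 27):
    print(years, eroom_cost(1.0, years))
```

Under this assumption, costs grow eightfold over 27 years regardless of the starting figure.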
  • 7. We neglect the tough maths of drug development • It costs ~$1B and 10 years to develop & launch a drug • Each patient in a clinical trial costs $1-10K • The “valley of death”: most candidate drugs will fail • The later it fails, the more expensive • But much of our data focuses on the early stages
  • 8. Simple biology only helps with simple patients • Maybe all the low-hanging fruit has been picked • E.g. single gene / single system diseases • Most diseases are complex & systemic • Many patients are complex • Lifestyle, exposure, co-morbidities, co-medications • A cohort is rarely just a simple table
  • 9. How might Real World Data ameliorate these?
  • 10. What is Real World Data? Swift et al. (2018) in Clinical and Translational Science
  • 11. If it’s not Real World Data, it’s not real • Therapies & algorithms always under-perform in the “real world” because: • Disease is complex • Patients are complex • RCT populations are unrealistic / biased • Desk drawer problem • RWD (including sensors & monitors) offer a total intelligence approach to patient populations
  • 12. Real World Data offers substantial ROI • “Free” • Reusable • Access scales (numbers, complexity, time) that may otherwise be impractical • No lead time for gathering data • Exploration free of investment • Surveillance • Validation of non-RWD studies
  • 13. We need more data • ML is hungry for data but: • Not enough labelled data • Not enough of the right sort of data: • e.g. adverse events • Badly imbalanced data • Not enough data that isn’t WEIRD • RWD can give us Big Data
  • 14. How can we best use Real World Data?
  • 15. It is not easy to analyse Real World Data • Strength is weakness: patients are complex • Possible confounding • True randomization is difficult • Data quality is variable • Uneven density of sampling • The data is sometimes just not there • Multiple modalities • Governance / privacy issues • There is justifiable skepticism
  • 16. Machine learning is highly attractive for drug dev • When: • We have no good idea of the “model” underlying the data • Variables may interact in complex ways • There are potentially a lot of variables and we don’t know which ones are important • Lowers barrier to exploration
  • 17. Express RWD as a graph • But how do you get a graph into an ML model (table world)? • Preprocess to summarize a graph: – Careful feature engineering – Summary statistics (e.g. centrality) – Use neighborhood – Kernel functions • Or make the graph structure part of the training: – Graph Convolutional Networks: • Use an adjacency matrix • Looks at neighbors – Representation Learning: • Find an embedding in a low-dimensional space • Node2vec
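Both routes on slide 17 can be illustrated on a toy graph. This is a minimal sketch under stated assumptions: the node names are made up, degree centrality stands in for "summary statistics", and `propagate` is only the neighbourhood-averaging core of a graph convolution (row-normalised, with self-loops), without the learned weights or non-linearity a real GCN layer would have.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        adj[i][j] = adj[j][i] = 1
    return adj


def degree_centrality(adj):
    """Summary statistic per node: fraction of other nodes it touches."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


def propagate(adj, features):
    """One averaging step: each node's new feature vector is the mean of
    its own and its neighbours' vectors (no learned weights)."""
    n = len(adj)
    out = []
    for i in range(n):
        members = [i] + [j for j in range(n) if adj[i][j]]
        dim = len(features[i])
        out.append([sum(features[m][d] for m in members) / len(members)
                    for d in range(dim)])
    return out


# Hypothetical toy graph: two drugs sharing one protein target.
nodes = ["drug_a", "drug_b", "protein_x"]
edges = [("drug_a", "protein_x"), ("drug_b", "protein_x")]
adj = adjacency_matrix(nodes, edges)

print(degree_centrality(adj))                  # tabular feature per node
print(propagate(adj, [[1.0], [0.0], [0.0]]))   # features mixed over neighbours
```

The centrality values can be appended as columns to an ordinary feature table, whereas the propagation step is what lets a model see graph structure during training.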
  • 18. Use case: predicting polypharmacy side-effects • Zitnik et al. (2018) Bioinformatics • Construct a graph of protein-protein interactions, drug-protein targets, and drug-drug interactions • A graph convolutional network (GCN) that encodes / embeds nodes • Decoder uses these embeddings to model side effects • Result: predicts the likely side-effects of a pair of drugs
  • 19. Analyse data as trajectories • A patient history or disease course is a temporal signal • A time-ordered sequence of: – Events – Symptoms – Treatments • Noisy, uneven • A well-studied but computationally formidable problem • Use to predict outcomes, progression, co-morbidities, causality, subtypes, adverse effects, etc. • Brunak (2014) Paths to COPD
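The idea on slide 19 of a patient history as a time-ordered sequence can be sketched as follows. This is an illustration only, not the method of the cited work: the patient records are invented, and counting consecutive event pairs is just one simple way to summarise trajectories across a cohort.

```python
from collections import Counter
from datetime import date


def trajectory(events):
    """Order one patient's (date, code) events by date and return the codes."""
    return [code for _, code in sorted(events, key=lambda e: e[0])]


def count_transitions(patients):
    """Count how often one event directly precedes another across patients."""
    counts = Counter()
    for events in patients.values():
        path = trajectory(events)
        counts.update(zip(path, path[1:]))
    return counts


# Hypothetical records: events arrive unordered and unevenly spaced.
patients = {
    "p1": [(date(2015, 3, 1), "COPD"),
           (date(2010, 1, 5), "hypertension"),
           (date(2012, 6, 1), "diabetes")],
    "p2": [(date(2011, 1, 1), "hypertension"),
           (date(2013, 1, 1), "diabetes")],
}

for pair, n in count_transitions(patients).most_common():
    print(pair, n)
```

Frequent transitions found this way are candidates for further study rather than evidence of causation, since ordering alone does not rule out confounding.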
  • 20. Use trajectories in COPD for … • Stratify population for further study • Validate / annotate with other data modalities: • Medications • Gender, socio-economic, life events • Biomarkers • Use to develop risk scores • Apply to oncology treatment • [Figure: trajectory diagram with nodes Hypertension, Diabetes, Retinal Dx, Acute bronchitis, Candidiasis, Menstruation disorder]
  • 21. Use of multi- or integrated ‘omics • Why? – One way to get more data – Statistical power – Multiple defects required to drive endogenous disease – Multiple “views” on condition • How? – Cluster / network individual data layers – Fuse together for consensus • Nemutlu (2012)
  • 22. Best practices for integrated ‘omics • (Validate your methods) • Use a variety of clustering approaches over asthma cohort ‘omics data (Bayesian, spectral, iCluster) • Use multi-omics approaches (SNF, NNMF) • Assess agreement / coherence • Validate in pathways, in other cohorts and in other data types
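The "assess agreement" step on slide 22 needs some measure of how similarly two clustering methods partition the same patients. The Rand index below is one common choice, used here purely as an illustration; the talk does not say which measure was applied, and the cluster labels are invented.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree, i.e. both
    put the pair in the same cluster or both put it in different clusters."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster assignments for four patients from three methods.
method_1 = [0, 0, 1, 1]
method_2 = [1, 1, 0, 0]   # same partition, different label names
method_3 = [0, 1, 0, 1]   # a different partition

print(rand_index(method_1, method_2))
print(rand_index(method_1, method_3))
```

Because the index compares pairs rather than label values, it is unaffected by methods numbering their clusters differently. It is not corrected for chance, so a chance-adjusted variant may be preferable in practice.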
  • 23. Hypothesis generation vs. validation or proof • In complex scenarios, it may be difficult to robustly model a population • Perhaps we should not • Also valuable to: – Validate or reproduce – Examine (for comfort) – Hypothesis generation – Exploration • RWD in all its complexities would be well suited to this
  • 24. We need to build better Real World datasets
  • 25. Unfortunately, we are terrible at machine learning • Despite promise, we have little more than proofs of concept: • Chen et al. (2019) showed that the DUD-E dataset, used by many “accurate” CNN models of drug-target interactions, is actually biased • AI-radiomics shows incredible performance in trials but mediocre performance in the clinic • Many ML studies are direly underpowered • Cultural issues
  • 26. Build more diverse datasets • Kuchenbaecker et al. 2019: • 78% of genetic studies focus on those of European descent • 2/3 of studies are from three nations: UK, USA, Iceland • Outside of Europe / North America we walk off an information cliff • Populations differ not just by genetics but lifestyle, diet, exposure … • A diverse dataset is not just more valid, it has more information
  • 27. Where will this data come from? • Too expensive & vast for any one organisation? • Harvest EHRs & registries • Collaborate with national centres • Build small, locally dense datasets • Will require long-term funding & broad collaboration to ensure usefulness & sustainability (FAIR, consortiums, public-private, IMI ..?) • Important but not urgent ..?
  • 29. Summary • Real World Data is highly attractive for Pharma R&D • More “real” • Better ROI • Provides necessary volume of data for ML • However, RWD is not a table: • Use longitudinal approaches • Use graphical approaches • Not proof but hypothesis generation? • We need to actively build larger and more diverse Real World datasets
  • 30. Challenges • How do we generate larger and more diverse datasets? • How do we balance the needs of primary care and secondary reuse? • Is privacy / confidentiality possible or workable? • Which methods “work” and where do they work? • How do we scale?
  • 31. Thanks • Health Informatics / Oncology Biometrics @ AZ • Michal Krassowski, Jinyi Wu @ ICL • NHLI, Brompton • Naheed Kurji @Cyclica