Interpreting Complex Real World Data for Pharmaceutical Research
1. Interpreting Complex Real World Data for
Pharmaceutical Research
Paul Agapow, Health Informatics Director
Data Analytics for Pharma Development, Munich
Public
November 2019
2. Disclosure
• No conflicts of interest
• Based on experience in current &
previous positions
• Oncology Data Science @AZ
• Data Science Institute @ICL
• Does not reflect official AZ thought
or projects
3. Thesis
• Use of Real World Data is highly attractive (and
perhaps inevitable) for Pharmaceutical R&D
• However, RWD can be complex and difficult to
interpret
• We need to develop new algorithms and new
approaches to using RWD
4. What are the over-arching problems in Pharma R&D?
5. A revolution in drug development?
• Every day we hear of new
advances & developments
• Acceleration in basic
biomedical research
• Constant development of
new molecular technologies
• An age of cheap
computation & powerful
machine learning
6. Drug development is increasingly unsustainable
• Accelerated biomedical
research not reflected in
drug development
• Eroom’s Law: the cost of
developing a new drug roughly
doubles every nine years
Pharmacelera (2014)
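To make the doubling rule concrete, here is a minimal sketch of the arithmetic. It assumes a clean exponential with a nine-year doubling time, as stated on the slide; the function name and the example horizons are illustrative only, not figures from the talk.

```python
def eroom_cost_multiplier(years, doubling_time=9.0):
    """Relative cost of developing a drug after `years`,
    assuming cost doubles every `doubling_time` years (Eroom's Law)."""
    return 2.0 ** (years / doubling_time)


# Illustrative horizons only (assumed, not taken from the slides).
for horizon in (9, 18, 45):
    print(horizon, "years ->", eroom_cost_multiplier(horizon), "x the starting cost")
```

Under this assumption, five doubling periods (45 years) multiply the cost by 32.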
7. We neglect the tough maths of drug development
• It costs ~$1B and 10 years to develop & launch a drug
• Each patient in a clinical trial costs $1-10K
• The “valley of death”: most candidate drugs will fail
• The later a candidate fails, the more expensive the failure
• But much of our data focuses on the early stages
8. Simple biology only helps with simple patients
• Maybe all the low-hanging fruit has been picked
– E.g. single gene / single system diseases
• Most diseases are complex & systemic
• Many patients are complex
– Lifestyle, exposure, co-morbidities, co-medications
• A cohort is rarely just a simple table
10. What is Real World Data?
Swift et al. (2018) in Clinical and Translational Science
11. If it’s not Real World Data, it’s not real
Therapies & algorithms always underperform in the “real world” because:
• Disease is complex
• Patients are complex
• RCT populations are unrealistic / biased
• Desk drawer problem (negative results go unpublished)
RWD (including sensors & monitors) offers a total intelligence approach to patient populations
12. Real World Data offers substantial ROI
• “Free”
• Reusable
• Access scales (numbers, complexity, time) that may otherwise be impractical
• No lead time for gathering data
• Exploration free of investment
• Surveillance
• Validation of non-RWD studies
13. We need more data
ML is hungry for data but:
• Not enough labelled data
• Not enough of the right sort of data:
– e.g. adverse events
• Badly imbalanced data
• Not enough data that isn’t WEIRD (Western, Educated, Industrialized, Rich, Democratic)
RWD can give us Big Data
15. It is not easy to analyse Real World Data
• Strength is weakness: patients are complex
• Possible confounding
• True randomization is difficult
• Data quality is variable
• Uneven density of sampling
• The data is sometimes just not there
• Multiple modalities
• Governance / privacy issues
• There is justifiable skepticism
16. Machine learning is highly attractive for drug dev
When:
• We have no good idea of the “model” underlying the data
• Variables may interact in complex ways
• There are potentially many variables and we don’t know which ones are important
• It also lowers the barrier to exploration
17. Express RWD as a graph
• But how do you get a graph into an ML model (table world)?
• Preprocess to summarize a graph:
– Careful feature engineering
– Summary statistics (e.g. centrality)
– Use neighborhood
– Kernel functions
• Or make the graph structure part of the training:
– Graph Convolutional Networks:
• Use an adjacency matrix
• Looks at neighbors
– Representation Learning:
• Find an embedding in a low-dimensional space
• Node2vec
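The two routes above can be sketched on a toy graph. This is a minimal illustration in plain Python, not the speaker's implementation: the three-node graph, the feature values and the function names are all assumptions. The first function summarises the graph as a per-node statistic (degree centrality) that could be added as a table column. The second performs one weight-free, GCN-style propagation step with the adjacency matrix; a real GCN layer would additionally multiply by a learned weight matrix and apply a non-linearity.

```python
import math


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


def gcn_propagate(adj, features):
    """One weight-free GCN-style step: mix each node's features with its
    neighbours' using the symmetric normalisation D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    # Add self-loops so a node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_feat = len(features[0])
    return [[sum(norm[i][k] * features[k][f] for k in range(n))
             for f in range(n_feat)]
            for i in range(n)]


# Hypothetical toy graph: a path 0 - 1 - 2, with a signal only on node 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]

centrality = degree_centrality(adj)      # summary statistic per node
smoothed = gcn_propagate(adj, features)  # signal spreads to direct neighbours only
print(centrality)
print(smoothed)
```

After one step the signal on node 0 reaches node 1 but not node 2, which is two hops away; stacking steps widens the neighbourhood each node "looks at".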
18. Use case: predicting polypharmacy side-effects
• Zitnik et al. (2018) Bioinformatics
• Construct a graph of protein-
protein interactions, drug-protein
targets, and drug-drug interactions
• A graph convolutional network
(GCN) that encodes / embeds
nodes
• Decoder uses these embeddings
to model side effects
• Result: predicts the likely side-
effects of a pair of drugs
19. Analyse data as trajectories
• A patient history or disease course is a
temporal signal
• A time-ordered sequence of:
– Events
– Symptoms
– Treatments
• Noisy, uneven
• A well-studied but computationally
formidable problem
• Use to predict outcomes, progression,
co-morbidities, causality, subtypes,
adverse effects, etc.
Brunak (2014) Paths to COPD
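As a rough sketch of what "a time-ordered sequence of events" means in practice, the snippet below sorts each patient's diagnoses by date and counts how often one diagnosis directly follows another. The records, patient IDs and function name are made up for illustration. Published trajectory analyses add statistical testing and correction for confounders, which this sketch omits.

```python
from collections import Counter
from datetime import date


def ordered_pairs(events):
    """Count consecutive diagnosis pairs (A then B) across all patients.

    `events` is an iterable of (patient_id, date, diagnosis) tuples in any order.
    """
    by_patient = {}
    for patient_id, when, code in events:
        by_patient.setdefault(patient_id, []).append((when, code))

    counts = Counter()
    for history in by_patient.values():
        history.sort()  # put each patient's events in time order
        codes = [code for _, code in history]
        for first, second in zip(codes, codes[1:]):
            if first != second:
                counts[(first, second)] += 1
    return counts


# Hypothetical records, deliberately unsorted and unevenly spaced in time.
events = [
    ("p1", date(2010, 1, 5), "hypertension"),
    ("p1", date(2012, 3, 1), "COPD"),
    ("p2", date(2011, 6, 1), "COPD"),
    ("p2", date(2009, 2, 1), "hypertension"),
    ("p2", date(2013, 1, 1), "diabetes"),
    ("p3", date(2010, 1, 1), "diabetes"),
    ("p3", date(2011, 1, 1), "COPD"),
]

pair_counts = ordered_pairs(events)
print(pair_counts.most_common(1))
```

Frequent ordered pairs are candidate trajectory steps; chaining them gives paths of the kind shown in the figure.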
20. Use trajectories in COPD for …
• Stratify population for further study
• Validate / annotate with other data modalities:
– Medications
– Gender, socio-economic, life events
– Biomarkers
• Use to develop risk scores
• Apply to oncology treatment
[Figure: example trajectory with nodes Hypertension, Diabetes, Retinal Dx, Acute bronchitis, Candidiasis, Menstruation disorder]
21. Use of multi- or integrated ‘omics
• Why?
– One way to get more data
– Statistical power
– Multiple defects required to drive
endogenous disease
– Multiple “views” on condition
• How?
– Cluster / network individual data layers
– Fuse together for consensus
Nemutlu 2012
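One simple way to "cluster individual layers, then fuse for consensus" is a co-association matrix: count how often two patients land in the same cluster across layers, then group patients who co-cluster in most layers. The sketch below uses this technique as a stand-in; it is not the method used in the talk, nor SNF. The layer names, cluster labels and threshold are all hypothetical.

```python
def co_association(labelings):
    """Fraction of data layers in which each pair of patients shares a cluster."""
    n = len(labelings[0])
    return [[sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
             for j in range(n)]
            for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Group patients linked by co-association above `threshold`
    (connected components of the thresholded co-association graph)."""
    coassoc = co_association(labelings)
    n = len(coassoc)
    assigned = [None] * n
    clusters = []
    for start in range(n):
        if assigned[start] is not None:
            continue
        assigned[start] = len(clusters)
        members, stack = [], [start]
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if assigned[j] is None and coassoc[i][j] > threshold:
                    assigned[j] = len(clusters)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# Hypothetical per-layer cluster labels for five patients.
# Label values are arbitrary within a layer; only co-membership matters.
transcriptomics = [0, 0, 1, 1, 1]
proteomics = [0, 0, 0, 1, 1]
metabolomics = [1, 1, 0, 0, 0]

consensus = consensus_clusters([transcriptomics, proteomics, metabolomics])
print(consensus)
```

Patient 2 is placed differently by the proteomics layer, but the other two layers outvote it, so the consensus keeps patient 2 with patients 3 and 4.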
22. Best practices for integrated ‘omics
• (Validate your methods)
• Use a variety of clustering
approaches over asthma cohort
‘omics data (Bayesian, spectral,
iCluster)
• Use multi-omics approaches (SNF,
NNMF)
• Assess agreement / coherence
• Validate in pathways, in other cohorts
and in other data types
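"Assess agreement" between two clustering approaches can be quantified with a pair-counting score. A minimal sketch using the plain (unadjusted) Rand index follows; the label vectors are hypothetical, and the choice of this index is an assumption, since the slides do not say which agreement measure was used. In practice a chance-corrected variant such as the adjusted Rand index is usually preferable.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of patient pairs on which two clusterings agree:
    both put the pair together, or both keep it apart."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster assignments for four patients from three methods.
method_1 = [0, 0, 1, 1]
method_2 = [1, 1, 0, 0]   # same grouping as method_1, labels swapped
method_3 = [0, 1, 0, 1]   # a different grouping

print(rand_index(method_1, method_2))
print(rand_index(method_1, method_3))
```

The score ignores the label names themselves, so method_1 and method_2 agree perfectly even though their cluster numbers differ.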
23. Hypothesis generation vs. validation or proof
• In complex scenarios, it may be
difficult to robustly model a population
• Perhaps we should not
• Also valuable to:
– Validate or reproduce
– Examine (for comfort)
– Generate hypotheses
– Explore
• RWD in all its complexity would be
well suited to this
25. Unfortunately, we are terrible at machine learning
Despite promise, we have little more than proofs of concept:
• Chen et al. (2019) showed that the DUD-E dataset, used by many “accurate” CNN models of drug-target interactions, is actually biased
• AI-radiomics shows incredible performance in trials but mediocre performance in the clinic
• Many ML studies are direly underpowered
• Cultural issues
26. Build more diverse datasets
• Kuchenbaecker et al. 2019:
– 78% of genetic studies focus on those of European descent
– 2/3 of studies are from three nations: UK, USA, Iceland
• Outside of Europe / North America we walk off an information cliff
• Populations differ not just by genetics but lifestyle, diet, exposure …
• A diverse dataset is not just more valid, it has more information
27. Where will this data come from?
• Too expensive & vast for any one organisation?
• Harvest EHRs & registries
• Collaborate with national centres
• Build small, locally dense datasets
• Will require long-term funding & broad collaboration to ensure usefulness & sustainability (FAIR, consortia, public-private, IMI ..?)
• Important but not urgent ..?
29. Summary
• Real World Data is highly attractive for Pharma R&D
• More “real”
• Better ROI
• Provides necessary volume of data for ML
• However, RWD is not a table:
• Use longitudinal approaches
• Use graphical approaches
• Not proof but hypothesis generation?
• We need to actively build larger and more diverse Real
World datasets
30. Challenges
• How do we generate larger and more diverse datasets?
• How do we balance the needs of primary care and
secondary reuse?
• Is privacy / confidentiality possible or workable?
• Which methods “work” and where do they work?
• How do we scale?
31. Thanks
• Health Informatics / Oncology Biometrics @ AZ
• Michal Krassowski, Jinyi Wu @ ICL
• NHLI, Brompton
• Naheed Kurji @Cyclica