Filling the gaps in translational research

Paul Agapow, Health Informatics Director
D4, Basel
Public
October 2019

Disclosure
• No conflicts of interest
• Based on experience in current & previous positions
  • Health Informatics @AZ, RWD
  • Data Science Institute @ICL, precision medicine
• Does not reflect official AZ thought or projects

Stated thesis
• Development of new therapies is largely a matter of translating basic research into actionable healthcare
• Too often such research focuses on the wrong problems and approaches
• What needs to be done to close the gap?

A revolution in drug development?
• Every day we hear of new advances & developments
• Acceleration in basic biomedical research
• Constant development of new molecular technologies
• An age of cheap computation & powerful machine learning

Drug development is increasingly unsustainable
• Eroom’s Law: the cost of developing a new drug roughly doubles every nine years
• Accelerated biomedical research is not reflected in drug development
• (Not discussing regulation)
Pharmacelera (2014)
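
As a rough illustration of what a nine-year doubling time implies, the relationship can be written as cost(t) = cost(0) × 2^(t/9). The following is a minimal Python sketch that assumes clean exponential growth; the function name and its parameters are illustrative and do not come from the talk.

```python
def eroom_cost_multiplier(years: float, doubling_period: float = 9.0) -> float:
    """Factor by which development cost grows after `years`,
    assuming cost doubles once per `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# One doubling period doubles the cost, three periods give an eightfold rise.
print(eroom_cost_multiplier(9))   # 2.0
print(eroom_cost_multiplier(27))  # 8.0
```

Under this assumption, a drug programme started 27 years after a reference point would cost about eight times as much, which is the sense in which the trend is unsustainable.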

There is a conflict in priorities
“Academic”
• Interesting problems
• Ideal, clean data
• Isolated, simple biology
• “Proof of concept”
• Focus on early dev
• Often single instances & statistics
“Industrial”
• Needful problems
• Real, messy data
• Real, systemic biology
• Operational
• Need help in late dev
• Usually a numbers game
Thesis: we tend to employ & incentivize approaches that inhibit long-term therapy development

Translational research tends to solve the wrong problems
• Tendency to solve:
  • Easy problems (low-hanging fruit)
  • Interesting problems
  • Publishable problems
  • Problems we have data for
• But:
  • That’s not where the problems are
  • That’s not where the savings are

We neglect the 5 Rs of drug development
• Landmark AZ papers:
  • Cook et al. 2014
  • Morgan et al. 2018
• 5 Rs:
  • right target
  • right patient
  • right tissue
  • right safety
  • right commercial potential
• Translational research focuses on early Rs at expense of later Rs

We neglect the tough maths of drug development
• It costs ~$1B and 10 years to develop & launch a drug
• Each patient in a clinical trial costs $1-10K
• The “valley of death”: most candidate drugs will fail
• The later a candidate fails, the more expensive the failure
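
The arithmetic behind "the later it fails, the more expensive" can be sketched in a few lines of Python. The phase costs and pass probabilities below are hypothetical placeholders chosen only to make the sums easy to follow; they are not figures from the talk or from any real pipeline.

```python
# (phase name, cost of running the phase, probability of passing it).
# All numbers are hypothetical and in arbitrary cost units.
PHASES = [
    ("preclinical", 5.0, 0.5),
    ("phase 1", 20.0, 0.6),
    ("phase 2", 50.0, 0.3),
    ("phase 3", 200.0, 0.6),
]


def sunk_cost_at_failure(phases, failing_phase):
    """Total spent on a single candidate that fails in `failing_phase`."""
    total = 0.0
    for name, cost, _ in phases:
        total += cost
        if name == failing_phase:
            return total
    raise ValueError(f"unknown phase: {failing_phase}")


def expected_cost_per_success(phases):
    """Expected spend on all candidates per candidate that passes every phase."""
    reach = 1.0     # probability that a candidate reaches the current phase
    expected = 0.0  # expected spend per candidate entering the pipeline
    for _, cost, p_pass in phases:
        expected += reach * cost
        reach *= p_pass
    return expected / reach


print(sunk_cost_at_failure(PHASES, "phase 1"))  # 25.0
print(sunk_cost_at_failure(PHASES, "phase 3"))  # 275.0
print(round(expected_cost_per_success(PHASES), 1))
```

With these made-up numbers, a phase 3 failure costs eleven times as much as a phase 1 failure, and the cost per successful drug is dominated by paying for the candidates that failed along the way.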

Problem: Biology isn’t “just the domain”, it’s the problem
• There is a tendency to treat drug development as just a data problem
  • Machine learning
  • Shift from whole-organism to high-throughput methods
  • Simplified view of biology
• As we move later in the development cycle, biology grows more complex & more important

Problem: Simple biology only helps with simple patients
• Maybe all the low-hanging fruit has been picked
  • E.g. single gene / single system diseases
• Most diseases are complex & systemic
• Many patients are complex
  • Lifestyle, exposure, co-morbidities, co-medications
• A cohort is rarely just a simple table

Action: Try to fail early & fail often
• Work in real complex biology early
• More work in phase 2 saves time and money in phase 3
• Work hand-in-hand with chemists, epidemiologists, toxicologists
• Validate functionally early & often
Ferrero (2017) ODSC Europe

Action: Recognise & work with complex biology early
• Incorporate as much biological information as possible as early as possible in the discovery process
  • Integrative analysis
  • Constrain search with biology
• Accept complexity
  • Polypills
  • Polypharmacology
• Look for hostile data
  • Adverse effects
Krassowski (unpub.)

Problem: We need more data
ML is hungry for data but:
• Not enough labelled data
• Not enough of the right sort of data:
  • e.g. adverse events
• Badly imbalanced data (+ve or -ve):
  • e.g. “what is the effect of drug X on cancer Y”
• Not enough data without weird biases
  • e.g. hospital data
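
To make the imbalance point concrete, here is a minimal Python sketch showing how a useless model can look accurate when one class is rare. The 1% event rate and the "always negative" predictor are assumptions made for illustration, not data from the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually finds."""
    positives = sum(1 for t in y_true if t == 1)
    found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return found / positives


# Hypothetical cohort: 10 patients with an adverse event, 990 without.
labels = [1] * 10 + [0] * 990
# A "model" that never predicts an adverse event.
always_negative = [0] * len(labels)

print(accuracy(labels, always_negative))  # 0.99
print(recall(labels, always_negative))    # 0.0
```

The predictor scores 99% accuracy while missing every adverse event, which is why accuracy alone says little on badly imbalanced data.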

Problem: Train or learn from sparse data at your peril
• If you use sparsely sampled data to explore or describe a data space, the apparent shape of that data space has more to do with the data sampling than the actual space.
• And just about everything is sparse
• We need better datasets
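
The sparse-sampling point can be demonstrated with a small simulation. The sketch below draws two variables that are independent by construction and measures the strongest correlation that appears by chance at a given sample size. The sample sizes, repeat count and seed are arbitrary choices for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def largest_apparent_correlation(sample_size, repeats=200, seed=0):
    """Largest |r| seen across repeated samples of two independent variables."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(repeats):
        xs = [rng.random() for _ in range(sample_size)]
        ys = [rng.random() for _ in range(sample_size)]
        worst = max(worst, abs(pearson(xs, ys)))
    return worst


# The true correlation is zero in both cases; only the sampling density differs.
print(largest_apparent_correlation(5))
print(largest_apparent_correlation(500))
```

With only five points per sample, strong correlations turn up regularly even though none exists in the underlying space; with 500 points the apparent structure largely disappears.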

Problem: It is too easy to study the wrong populations
• Solutions are inevitably shaped by the composition of the test cohort, which is usually:
  • WEIRD (Western, educated, industrialised, rich, democratic)
  • Young male Caucasians
  • Worried well
• But:
  • Can models based on these populations generalise?
  • Do drugs behave differently in different populations?

Action: Build more diverse datasets
• Need data with relevant / real populations
• Not just more validity, there is more information in a diverse dataset
• Lowers barrier to exploration
• How do we do this?
  • Harvest EHRs & other RWE
  • Collaborate with national centres
  • Build small, locally dense datasets
• Will require long-term funding & broad collaboration to ensure usefulness & sustainability (FAIR, consortiums, public-private, IMI ..?)

Problem: if it’s not Real World Data, it’s not real
• On top of dataset bias, our data is too simple
• Therapies and algorithms always under-perform in the “real world” because:
  • Disease is complex
  • Patients are complex
  • Co-morbidities
  • RCT populations are unrealistic
  • Desk drawer problem

Action: use RWD, build algorithms for RWD
• Synthetic control arms
• Build tools over RWD to help clinical trials recruitment
• Build new algorithms to exploit RWD
  • Disease trajectories / Brunak
[Figure: disease trajectory diagram with nodes Hypertension, Diabetes, Retinal Dx, Acute bronchitis, Candidiasis, Menstruation disorder]
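
As one possible illustration of a disease-trajectory style of analysis over RWD, the sketch below counts how often one diagnosis precedes another across patients. The patient histories are invented, and the counting step is a simplification: a real analysis would also need to test whether each ordered pair occurs more often than expected by chance.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: patient id -> list of (day of diagnosis, diagnosis).
HISTORIES = {
    "p1": [(0, "hypertension"), (400, "diabetes"), (900, "retinal disorder")],
    "p2": [(10, "hypertension"), (300, "diabetes")],
    "p3": [(50, "diabetes"), (700, "retinal disorder")],
}


def count_directed_pairs(histories):
    """Count patients in whom diagnosis A first appears strictly before B."""
    counts = Counter()
    for events in histories.values():
        first_seen = {}
        for day, diagnosis in sorted(events):
            # Keep only the earliest day each diagnosis was recorded.
            first_seen.setdefault(diagnosis, day)
        ordered = sorted(first_seen, key=first_seen.get)
        for earlier, later in combinations(ordered, 2):
            if first_seen[earlier] < first_seen[later]:
                counts[(earlier, later)] += 1
    return counts


for pair, n in count_directed_pairs(HISTORIES).most_common():
    print(pair, n)
```

Frequent ordered pairs such as hypertension followed by diabetes can then be chained into longer candidate trajectories for clinical review.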

Machine learning is highly attractive for drug dev
When:
• We have no good idea of the “model” underlying the data
• Variables may interact in complex ways
• There are potentially a lot of variables and we don’t know which ones are important
• Lowers barrier to exploration

Problem: We are terrible at machine learning
Despite promise, we have little more than proofs of concept:
• Chen et al. (2019) showed that the DUD-E dataset, used by many “accurate” CNN models of drug-target interactions, is actually biased
• AI-radiomics shows incredible performance in trials but mediocre performance in the clinic
• Many ML studies are direly underpowered
• We keep solving the same “easy” problems
• Cultural issues
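
One way to see what "underpowered" means for an ML study is to put a confidence interval around a reported accuracy. The sketch below uses the Wilson score interval, which is a choice made here for illustration and is not mentioned in the talk. The test-set sizes are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Same observed accuracy (90%), very different evidence.
print(wilson_interval(18, 20))      # small test set: wide interval
print(wilson_interval(1800, 2000))  # large test set: narrow interval
```

With 20 test cases, a reported 90% accuracy is compatible with a true accuracy anywhere from roughly 0.70 to 0.97, so a "better than the baseline" claim on such a test set carries little weight.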

Action: Use ML where it has impact
• Use ML to prioritize research:
  • “I can’t do anything with 1000 candidate molecules in Phase III. Give me 5 good ones.”
  • E.g. Prioritize & validate pro-arrhythmic candidates
• Use ML later in the pipeline where savings are greatest:
  • E.g. adverse events
  • E.g. screening trial candidates
• Don’t put data science in a box
Costabal et al. (2019) bioRxiv
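
The "give me 5 good ones" request amounts to turning model scores into a shortlist. A minimal Python sketch follows; the molecule names and scores are hypothetical stand-ins for the output of whatever model is used.

```python
import heapq


def shortlist(scores, k=5):
    """Return the k candidate names with the highest model score, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical model output: 1000 candidates with scores between 0 and 1.
scores = {f"mol{i}": i / 1000 for i in range(1000)}
print(shortlist(scores))
# ['mol999', 'mol998', 'mol997', 'mol996', 'mol995']
```

The ranking step is trivial; the hard part is a score trustworthy enough that the top five are worth the cost of validating them.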

Action: Drug repurposing & repositioning is smart
• Take a drug approved for one indication and use it for another
• Why:
  • Cheap
  • Starting with a drug that does something is closer than starting de novo
  • Safer
  • Drugs act on common pathways
Naylor & Schonfeld (2014) DDW Winter

Summary
• Use data and ML where they have impact / saving
• This is often later in the drug development process
• Use data and ML to keep yourself honest
• Let’s build the datasets we don’t have

Thanks
• Health Informatics @AZ
• Michal Krassowski (ICL)
• Jinyi Wu (ICL)
• Naheed Kurji (Cyclica)
