Very brief overview of AI in drug discovery

A Very Brief Overview of Artificial Intelligence in Drug Discovery
Dr. Gerry Higgins
Research Professor, Department of Computational Medicine and Bioinformatics
University of Michigan Medical School

Outline
I. Challenges for big pharma
II. Competitive space and growth rate
III. Example of application

Definitions from Wikipedia
• Machine learning is a field of computer science that gives computers the
ability to learn without being explicitly programmed – not to be conflated with
the term “big data”;
• Deep learning (also known as deep structured learning or hierarchical
learning) is part of a broader family of machine learning methods based on
learning data representations, as opposed to task-specific algorithms. Learning
can be supervised, semi-supervised or unsupervised;
• Unsupervised machine learning is the machine learning task of inferring a
function to describe hidden structure from "unlabeled" data (a classification or
categorization is not included in the observations). Since the examples given to
the learner are unlabeled, there is no evaluation of the accuracy of the
structure that is output by the relevant algorithm, which is one way of
distinguishing unsupervised learning from supervised learning;
• A probabilistic computing machine is a non-deterministic machine which
chooses between the available transitions at each point according to some
probability distribution – usually Bayesian.
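
To make the unsupervised-learning definition concrete, here is a minimal sketch that is not part of the original slides. It clusters a handful of unlabeled numbers with a toy one-dimensional k-means. The function name `kmeans_1d` and the sample values are hypothetical and chosen only for illustration. No labels are supplied and no accuracy score is computed, which is the distinction from supervised learning drawn above.

```python
import random


def kmeans_1d(values, k=2, iterations=20, seed=0):
    """Toy 1-D k-means: group unlabeled numbers around k centers."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centers = rng.sample(values, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)


# Hypothetical unlabeled measurements forming two obvious groups.
demo_values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
demo_centers = kmeans_1d(demo_values, k=2)
print(demo_centers)
```

The algorithm only ever sees the raw values. Whether the two recovered groups mean anything (for example, responders versus non-responders) has to be judged separately, because there is no ground truth to score against.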

Courtesy, Jeff Dean, Google Brain

A somewhat inaccurate history of drug discovery

Challenge 1 – Which data should be used as a training set?
1. By the 1990s, big pharma R&D had begun abandoning protein-based medicinal chemistry,
in which small compounds are designed to fit well-characterized “catalytic pockets” in
enzymes, receptors, transporters, etc. [1];
2. The initial enthusiasm about discovering novel drugs through understanding the sequence
and variation of the human genome ended in the early part of this century, when most of
these targets were not successful in clinical trials [2];
3. The “omnigenic” model [3] argued that so-called “core” genes with large effect sizes
should become the focus of drug discovery, dismissing small effect sizes from
disease risk GWAS as irrelevant to drug discovery. However, pharmacogenomic
effect sizes are much larger than most trait or disease risk effect sizes [4, 5].
[1] Hopkins, M. M., Martin, P. A., Nightingale, P., Kraft, A. & Mahdi, S. The myth of the biotech revolution: an
assessment of technological, clinical and organizational change. Res. Policy. 36, 566–589 (2007).
[2] Ma, P. & Zemmel, R. Value of novelty? Nat. Rev. Drug Discov. 1, 571–572 (2002).
[3] Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell
169, 1177–1186 (2017).
[4] Personal communication, Dr. Lillian Wang, GSK.
[5] Higgins, G. A., Allyn-Feuer, A. & Athey, B. D. Epigenomic mapping and effect sizes of noncoding variants
associated with psychotropic drug response. Pharmacogenomics 16(14), 1565–1583 (2015).

Challenge 1 – Which data should be used as a training set? (continued)
1. David Altshuler and colleagues argued that naturally occurring, familial gain- and
loss-of-function variants in humans provide the most successful targets for initial
target discovery, rather than relying on animal or cellular models, especially for CNS
drug development [6];
2. Phenome-wide association studies (PheWAS) using data from BioVU, a
pharmacogenomic genotype-enabled electronic health record at Vanderbilt,
demonstrated that drug response and disease risk variants are clustered within the
same regulatory networks [7];
3. Countrywide partnerships between big pharma and holders of large clinical datasets
have been successful at finding rare familial mutations that are druggable
mechanism-indication targets [8];
4. Machine learning on the molecular mechanism-clinical indication pairs of drugs in
clinical trials predicted in silico which would be FDA-approved, with about 80% accuracy [9].
[6] Plenge, R. M., Scolnick, E. M. & Altshuler, D. Validating therapeutic targets through human genetics. Nat. Rev.
Drug Discov. 12, 581–594 (2013).
[7] Denny, J.C., Ritchie, M.D., Basford, M.A., Pulley, J.M., Bastarache, L., Brown-Gentry, K., Wang, D., Masys,
D.R., Roden, D.M. and Crawford, D.C., PheWAS: demonstrating the feasibility of a phenome-wide scan to
discover gene–disease associations. Bioinformatics, 26(9), 1205-1210 (2010).
[8] Dugger, S.A., Platt, A. and Goldstein, D.B. Drug development in the era of precision medicine. Nat. Rev. Drug
Discov. (2017).
[9] Zhu, F., Li, X. X., Yang, S. Y. & Chen, Y. Z. Clinical success of drug targets prospectively predicted by in silico study.
Trends Pharmacol. Sci. (2017).

Lead Generation Using Phenotype:Genotype:Function Curves [1]
[Figure: curve relating pain phenotype to function of the voltage-gated sodium channel NaV1.7]
Familial or sporadic disease mutations marked on the curve:
• Primary erythermalgia
• Paroxysmal extreme pain disorder
• Severe fibromyalgia (MXL)
• No pain disorder, wild type
• Channelopathy-associated insensitivity to pain
“The sooner you obtain a dose-response curve in humans, the higher the probability of success”
Testing in biological systems:
• HTS
• siRNA/shRNA experiments
• iPS cells
• Knockout mice
• Toxicity testing
[1] Adapted from Plenge, R. M., Scolnick, E. M. & Altshuler, D. Validating therapeutic targets through human genetics.
Nat. Rev. Drug Discov. 12, 581–594 (2013).

Mutations in the voltage-gated sodium channel gene (SCN9A)
and migraine drugs in Phase III clinical trials [1, 2]
rs6754031_GG: severe fibromyalgia in Mexican females¹; also arthritic pain².
Targeting of an intragenic enhancer led to the development of raxatrigine and funapide.
[1] Dugger, S. A., Platt, A. & Goldstein, D. B. Drug development in the era of precision medicine. Nat. Rev. Drug
Discov. (2017).
[2] DeFrancesco, L. Drug pipeline: 4Q17. Nat. Biotechnol. (2018).
Diagram adapted from: Drenth, J. P. H. et al. Mutations in sodium-channel gene SCN9A cause a spectrum of
human genetic pain disorders. J. Clin. Invest. 117(12), 3603–3609 (2007).
¹ Vargas-Alarcon, G. et al. A SCN9A gene-encoded dorsal root ganglia sodium channel polymorphism associated
with severe fibromyalgia. BMC Musculoskeletal Disorders. 13(23), 1-5 (2013).
² Bingham, B. et al. The molecular basis of pain and its clinical implications in rheumatology. Nat. Clin. Prac.
Rheum. 5(1), 28–37 (2009).

Molecular mechanism-clinical indication drug targets
found or replicated in specialties other than oncology
• ANGPTL4 inhibitors for coronary artery disease
• PCSK9 as a target for high cholesterol
• SOST as a target for osteoporosis
• SCN9A as a target of analgesics
• FTO as a target for anti-obesity drugs
• Anti-CGRP monoclonal antibodies for migraine
• Oligonucleotide antisense therapy drug against phosphorodiamidate…
• Anti-amyloid monoclonal antibodies for late-onset Alzheimer's disease

Machine learning of DNA sequence, clinical definition of human disease
phenotype and known properties of molecular target mechanisms
prospectively predicted clinical success of drugs [1, 2]
[Figure: bar chart of target counts (axis 0 to 12), with series “Promising” and “Non-promising”]
“The Prediction Results of 31 Clinical Trial Phase III Targets Analyzed by a 2009 In Silico Study
Were Judged by Their Current Clinical Status” [1].
[1] Zhu, F., Li, X. X., Yang, S. Y. & Chen, Y. Z. Clinical success of drug targets prospectively predicted by in silico study.
Trends Pharmacol. Sci. (2017).
[2] Zhu, F., Han, L., Zheng, C., Xie, B., Tammi, M. T., Yang, S., Wei, Y. & Chen, Y. What are next generation innovative
therapeutic targets? Clues from genetic, structural, physicochemical, and systems profiles of successful
targets. J. Pharmacol. Exp. Ther. 330(1), 304–315 (2009).

Drugs similar to those previously approved by the FDA based on molecular mechanism-clinical
indication target pairs are more likely to be approved by the FDA [1]
[Figure: pie chart. Percentage of the total of 880 FDA-approved drugs; the majority are based on
previously validated mechanism-indication pairs]
Segment labels: >5 drugs per validated pair; 4-5 drugs per validated pair; 3 drugs per validated pair;
2 drugs per validated pair; 1 drug per validated pair; drugs based on unvalidated pairs
Segment values, in extraction order: 22%, 21%, 19%, 17%, 10%, 10%
[1] Shih, H. P., Zhang, X. & Aronov, A. M. Drug discovery effectiveness from the standpoint of therapeutic mechanisms
and indications. Nat. Rev. Drug Discov. 17(1), 19 (2018).

Challenge 2 – Molecular mechanism and clinical indication pairing from a
well-characterized patient cohort provides the ideal combination for drug
discovery – How and where do we obtain these datasets?
• “EHR and other biomedical data are contained in repository siloes, exist in
disparate file formats, are distributed among various clinical environments
with different cultures and data cleansing often results in data degradation” –
Eric Lai, Takeda Pharmaceuticals
• The Office of the National Coordinator for Health Information Technology
(ONCHIT) noted that by December 2017 “…227 million Americans, or
69% of the U.S. population, had lost protected health information (PHI)
contained in their electronic health records (EHRs). This is obviously a
challenge, and remains a legal problem for hospital administrators, but there
is an emerging concern among clinical IT professionals that these data
cannot be protected, even using applications such as Blockchain…”
• Big pharma is using clinical and genomic data from partners such as Iceland
(deCODE), the United Kingdom (UK Biobank), Finland, Estonia, Poland and
China, where there are fewer restrictions on access to patient data than in the U.S.
• “Vendors of data analytics will make access to patient data a mandate, not a
competitive advantage.” –Quintiles IMS
• Many companies are now directly selling comprehensive, aggregated,
de-identified patient data linked to DNA SNPs and genome sequence, including
IBM’s Watson Health (Explorys/Phytel) and Quintiles IMS Health Real World
Evidence (over 100,000 EHRs from the U.K., some with linked genotype data)
• “All types of healthcare products and solutions that bundle clinical and
molecular data are currently focused on one or more steps of a patient’s
journey. However, companies like Prognos will encourage the industry to
explore specific therapeutic areas in granularity so that they can serve the
patient across the entire care pathway from prevention, prediction,
diagnosis, treatment to management.” (Frost & Sullivan, 2017).

Challenge 3 – Big pharma cannot find skilled AI researchers
EXISTING AND DESIRED POOL OF SCIENTISTS: Experimental pharmacologists
DESIRED BUT HARD-TO-FIND ENGINEERS: Computer software engineers with skills in deep learning
VERY FEW SCIENTISTS WITH SHARED SKILL SETS
BIG PHARMA WORK-AROUNDS:
• Train in-house experimental pharmacologists in computer science;
• Cultivate “data curators” from the existing pool of scientists to act as an
interface between experimental pharmacologists and computer scientists;
• Outsource to AI (deep learning) companies.

Competitive space and growth rate, 2018
Top vendors that sell “real-world evidence” (i.e., patient data from EHRs) or
bundle these data with their analytics platforms (KLAS, 2017)

[Figure: Compound annual growth rate (CAGR) of AI in healthcare, Frost & Sullivan, 1/2018]
[Figure: Value by end-user, deep learning in drug discovery and diagnostics, Coherent Market Insights, 8/2017]

What McKinsey thinks about the trajectory of AI-related technologies (2016)

Application example – Quantum Molecular Dynamics


A good review publication, with Alex Kalinin as a co-author:
Opportunities and Obstacles for Deep Learning in Biology and
Medicine: https://www.biorxiv.org/content/early/2018/01/19/142760
