A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior. Phenotypes, whether observed at the bench or the bedside, are increasingly being used to gain insight into the diagnosis, mechanism, and treatment of disease. A key aspect of these approaches involves comparing phenotypes that are defined in multiple terminologies, which often cater to altogether different organisms, such as mice and humans. In this seminar, I will discuss computational approaches for harmonizing and utilizing phenotypes for translational research. We will examine case studies that involve the computation of semantic similarity, including the use of phenotypes to inform the clinical diagnosis of rare diseases, to identify human drug targets using mouse knockout models, and to explore phenotype-based approaches for drug repositioning.
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics)
Stanford University
@micheldumontier::Biostats:19-02-15
2. Topics
• Computable Phenotypes
• Methods to compare Phenotypes
• Cross-Species Phenotype Integration
• Applications
– Undiagnosed Diseases
– Drug Target Identification
– Drug Repurposing
3. Phenotypes
• A phenotype is an observable characteristic of
an individual and typically pertains to its
morphology, function, and behavior.
– qualitative; deals with normal and abnormal phenotypes
– e.g., red eye color, abnormal gait, enlarged colon
5. Matching patients to diseases
Patient: Flat back of head, Hypotonia
Disease X: Abnormal skull morphology, Decreased muscle mass
Differential diagnosis with similar but non-matching phenotypes is difficult
6. Differential diagnosis becomes challenging
with rare and complex disorders
• Over 7,000 rare diseases
• Each affects < 1 in 1,500-2,500 people
• Most have fewer than 50 case reports
• Nearly 1 in 10 Americans suffer from one or more rare diseases
• Only 250 medicinal products have been approved to diagnose and treat rare diseases
Carpenter Syndrome
- acrocephalopolysyndactyly (ACPS) disorder
- 40 cases described in the literature
- Prevalence < 1 in 1 million
7. Genotypes + Phenotypes
Improve Diagnosis
PhenIX (http://compbio.charite.de/PhenIX/):
• Target panel of 2,741 known Mendelian disease genes
• Remove off-target variants, common variants, and variants not in known disease-causing genes
• Compare phenotype profiles from ClinVar, OMIM, and Orphanet
Zemojtel et al. Sci Transl Med. 2014. 6(252):252ra123
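The two-stage idea on this slide can be sketched in a few lines: first filter variants, then rank the remaining genes by phenotype match. This is a minimal illustration and not PhenIX itself. The `Variant` fields, the 1% allele-frequency cutoff, the gene names, and the per-gene phenotype scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool          # inside the targeted panel regions
    allele_frequency: float  # population allele frequency


def filter_variants(variants, disease_genes, max_af=0.01):
    """Keep on-target, rare variants that fall in known disease-causing genes."""
    return [
        v for v in variants
        if v.on_target and v.allele_frequency < max_af and v.gene in disease_genes
    ]


def rank_genes(variants, phenotype_score):
    """Rank the genes of surviving variants by their (hypothetical) phenotype-match score."""
    genes = {v.gene for v in variants}
    return sorted(genes, key=lambda g: phenotype_score.get(g, 0.0), reverse=True)


# Hypothetical inputs: a three-gene panel and per-gene phenotype-match scores.
panel = {"GENE_A", "GENE_B", "GENE_C"}
variants = [
    Variant("GENE_A", True, 0.001),
    Variant("GENE_B", True, 0.20),     # common: removed
    Variant("GENE_C", False, 0.0005),  # off-target: removed
    Variant("GENE_D", True, 0.0001),   # not a known disease gene: removed
    Variant("GENE_C", True, 0.002),
]
scores = {"GENE_A": 0.4, "GENE_C": 0.9}

kept = filter_variants(variants, panel)
print(rank_genes(kept, scores))  # ['GENE_C', 'GENE_A']
```

In a real pipeline, the phenotype score would come from a semantic similarity between the patient's HPO profile and each gene's disease profiles, as the following slides discuss.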
9. So how did they do it?
1. Computable representation of phenotypes
2. Methods to compare phenotype profiles
3. Using model organisms to increase coverage
of the phenotype space
10. Difficult to find all results
using text searches
11. The Human Phenotype Ontology:
A Computable Representation of Human Phenotypes
11,000+ classes
Follows the True Path Rule
Used to annotate:
• Patients
• Disorders/Diseases
• Genes, Gene Variants,
& Genotypes
[Figure: excerpt of the HPO hierarchy around the pancreas, with terms including Abnormality of endocrine pancreas physiology, Abnormality of exocrine pancreas physiology, Abnormality of pancreatic islet cells, Reduced pancreatic beta cells, Pancreatic islet cell adenoma, Insulinoma, and Multiple pancreatic beta-cell adenomas]
Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
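Under the True Path Rule, annotating a patient, disease, or gene with a term implicitly annotates it with every ancestor of that term. Here is a minimal sketch of that propagation. It assumes a hand-written toy `is_a` hierarchy built from a few of the terms above; the edges are illustrative and not copied from an HPO release.

```python
# Toy is_a hierarchy: child -> direct parents (illustrative edges, not from HPO).
PARENTS = {
    "Insulinoma": ["Pancreatic islet cell adenoma"],
    "Pancreatic islet cell adenoma": ["Abnormality of pancreatic islet cells"],
    "Reduced pancreatic beta cells": ["Abnormality of pancreatic islet cells"],
    "Abnormality of pancreatic islet cells": [
        "Abnormality of endocrine pancreas physiology"
    ],
    "Abnormality of endocrine pancreas physiology": [],
}


def propagate(annotations, parents=PARENTS):
    """Apply the True Path Rule: return the annotated terms plus all their ancestors."""
    closed, stack = set(), list(annotations)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, []))
    return closed


for term in sorted(propagate({"Insulinoma"})):
    print(term)
```

Sibling terms such as "Reduced pancreatic beta cells" are not implied. Only the path from the annotated term up to the root is.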
12. HPO has unique terms
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
13. An increasing number of
diseases are described
using the HPO
Phenotype annotations per species
http://www.monarchinitiative.org
14. Phenotype “BLAST”: Which phenotypic
profile is most similar?
[Figure: a patient profile compared with the profiles of Disease X and Disease Y]
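Here is a minimal sketch of this kind of profile ranking. It assumes a toy hierarchy with hypothetical edges. As a stand-in for the information-weighted measures discussed later, it uses plain Jaccard overlap of ancestor-closed term sets. The patient and Disease X reuse the phenotypes from the "Matching patients to diseases" slide; Disease Y is made up.

```python
# Toy is_a hierarchy: child -> direct parents (hypothetical edges).
PARENTS = {
    "Flat back of head": ["Abnormal skull morphology"],
    "Abnormal skull morphology": ["Abnormality of the head"],
    "Abnormality of the head": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Seizure": ["Abnormality of the nervous system"],
    "Abnormality of the nervous system": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def closure(terms, parents):
    """Terms plus all their ancestors (True Path Rule)."""
    seen, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(patient, diseases, parents):
    """Score every disease profile against the patient, best match first."""
    p = closure(patient, parents)
    scores = {name: jaccard(p, closure(terms, parents)) for name, terms in diseases.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


patient = {"Flat back of head", "Hypotonia"}
diseases = {
    "Disease X": {"Abnormal skull morphology", "Decreased muscle mass"},
    "Disease Y": {"Seizure"},  # hypothetical
}
for name, score in rank_diseases(patient, diseases, PARENTS):
    print(f"{name}: {score:.3f}")
```

The patient and Disease X share no annotated term. Their ancestor closures still overlap, so Disease X (4/7) ranks above Disease Y (1/8).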
15. PhenoTips: Getting high-quality
patient phenotypes
Girdea et al. (2013), PhenoTips: Patient Phenotyping Software for Clinical and Research
Use. Hum. Mutat., 34: 1057–1065. doi: 10.1002/humu.22347
16. Semantic Similarity
• Semantic similarity is a metric defined over a set of
terms, where the distance between them is based on
their meaning.
• It can be estimated by examining, for instance,
– Topological similarity
– Information content
– Statistical co-occurrence
• Widely used in bioinformatics for gene enrichment,
function prediction, network screening, clustering,
etc.
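Information content is commonly derived from annotation frequencies. A term annotated, directly or through a descendant, to a fraction p(t) of entities gets IC(t) = -log p(t), so rare, specific terms carry more information than general ones. Here is a minimal sketch of that computation. It assumes a hypothetical four-disease corpus and a toy hierarchy, and uses the natural logarithm; the base only rescales the values.

```python
import math

# Toy is_a hierarchy: child -> direct parents (hypothetical).
PARENTS = {
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}

# Hypothetical corpus: entity -> directly annotated terms.
ANNOTATIONS = {
    "D1": {"Hypotonia"},
    "D2": {"Decreased muscle mass"},
    "D3": {"Abnormality of the musculature"},
    "D4": {"Hypotonia", "Decreased muscle mass"},
}


def closure(terms, parents):
    """Terms plus all their ancestors."""
    seen, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(annotations, parents):
    """IC(t) = -log p(t) = log(n / count(t)), with counts propagated to ancestors."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for t in closure(terms, parents):
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(n / c) for t, c in counts.items()}


for term, ic in sorted(information_content(ANNOTATIONS, PARENTS).items()):
    print(f"{term}: {ic:.3f}")
```

Terms that every entity inherits, such as the root, get an IC of 0. They therefore contribute nothing to IC-based similarity.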
18. Measures of Semantic Similarity
Edge-based measures
– Shortest path (Rada)
– Common path
– Scaling by depth, etc.
• Assume a uniform distribution
of nodes and edges
Node-based measures
– Shared terms
– Common ancestors
– Information content (IC)
• Better able to account for
structural heterogeneity
Set comparisons
• Pairwise
– Max/average/sum
– All or best pairs
• Groupwise
– Set, graph, vector
– Various combinations
Implementations
– Semanticmeasureslibrary.org
– OWL-SIM
Pesquita et al. Semantic Similarity in Biomedical Ontologies.
PLoS Comput Biol. 2009 Jul; 5(7): e1000443.
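To make the distinction concrete, here is a sketch that contrasts an edge-based measure (Rada: shortest undirected path length) with a node-based one (Resnik: IC of the most informative common ancestor). It then compares two term sets pairwise using the best-match average. The hierarchy and IC values are hypothetical, and libraries such as the Semantic Measures Library implement these measures far more completely.

```python
from collections import deque

# Toy is_a hierarchy (child -> parents) and IC values; both hypothetical.
# A single root guarantees every pair of terms shares at least one ancestor.
PARENTS = {
    "Flat back of head": ["Abnormal skull morphology"],
    "Abnormal skull morphology": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}
IC = {
    "Phenotypic abnormality": 0.0,
    "Abnormal skull morphology": 1.5,
    "Abnormality of the musculature": 1.0,
    "Flat back of head": 3.0,
    "Hypotonia": 2.0,
    "Decreased muscle mass": 2.5,
}


def ancestors(term):
    """The term plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen


def rada_distance(a, b):
    """Edge-based: number of edges on the shortest undirected path between a and b."""
    adj = {}
    for child, parents in PARENTS.items():
        for p in parents:
            adj.setdefault(child, set()).add(p)
            adj.setdefault(p, set()).add(child)
    dist, queue = {a: 0}, deque([a])
    while queue:
        t = queue.popleft()
        if t == b:
            return dist[t]
        for nxt in adj.get(t, ()):
            if nxt not in dist:
                dist[nxt] = dist[t] + 1
                queue.append(nxt)
    return float("inf")


def resnik(a, b):
    """Node-based: IC of the most informative common ancestor."""
    return max(IC[t] for t in ancestors(a) & ancestors(b))


def best_match_average(set_a, set_b):
    """Pairwise set comparison: average of the best matches in both directions."""
    a_to_b = sum(max(resnik(x, y) for y in set_b) for x in set_a) / len(set_a)
    b_to_a = sum(max(resnik(y, x) for x in set_a) for y in set_b) / len(set_b)
    return (a_to_b + b_to_a) / 2


patient = {"Flat back of head", "Hypotonia"}
disease = {"Abnormal skull morphology", "Decreased muscle mass"}
print(rada_distance("Hypotonia", "Decreased muscle mass"))  # 2
print(best_match_average(patient, disease))                 # 1.25
```

Rada counts every edge as equal. Resnik instead weighs what the shared ancestor actually tells us, which is why node-based measures cope better with uneven ontology structure.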
23. Schwartz-Jampel Syndrome, Type I
Image credit: Viljoen and Beighton, J Med Genet. 1992
Caused by mutations in HSPG2, which encodes a
proteoglycan (perlecan)
~100 phenotype annotations
25. Semantic similarity
is robust in the face of missing information
92% of derived profiles are most similar to the original
disease profile
[Figure axes: profile similarity; derived profile rank]
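One way to run this kind of robustness check is to derive profiles by deleting annotations and ask whether the source disease is still the best match. Here is a minimal leave-one-out sketch. It assumes placeholder term names, a toy hierarchy, and plain Jaccard over ancestor closures as the similarity. The study's actual derivation procedure and measure may differ.

```python
# Placeholder terms: leaves A1, A2, B1, B2 under A and B, both under "root".
PARENTS = {"A1": ["A"], "A2": ["A"], "B1": ["B"], "B2": ["B"],
           "A": ["root"], "B": ["root"], "root": []}

# Hypothetical disease profiles.
DISEASES = {"D1": {"A1", "A2"}, "D2": {"B1", "B2"}, "D3": {"A1", "B1"}}


def closure(terms, parents):
    """Terms plus all their ancestors."""
    seen, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leave_one_out_recovery(diseases, parents):
    """Fraction of derived profiles (one annotation removed) whose best match is the source."""
    closed = {d: closure(terms, parents) for d, terms in diseases.items()}
    hits = total = 0
    for name, terms in diseases.items():
        for dropped in terms:
            derived = closure(terms - {dropped}, parents)
            if not derived:
                continue  # nothing left to compare
            # Ties, if any, go to the first disease in dict order.
            best = max(closed, key=lambda d: jaccard(derived, closed[d]))
            hits += best == name
            total += 1
    return hits / total if total else 0.0


print(f"{leave_one_out_recovery(DISEASES, PARENTS):.0%} of derived profiles recovered")
```

D3 straddles both branches. Removing either of its two annotations makes D1 or D2 the better match, so the toy rate is 4 of 6 rather than 100%.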
26. Semantic similarity algorithms
are sensitive to specificity of information
The more general the phenotype, the poorer the
match to the disease
[Figure axes: profile similarity; derived profile rank]
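The effect of specificity can be reproduced on a toy example. The sketch below repeatedly replaces each term in a disease profile by its parent and measures similarity to the original. It assumes placeholder terms, a toy hierarchy, and plain Jaccard over ancestor closures; none of these are taken from the study.

```python
# Placeholder terms: leaves A1, A2, B1, B2 under A and B, both under "root".
PARENTS = {"A1": ["A"], "A2": ["A"], "B1": ["B"], "B2": ["B"],
           "A": ["root"], "B": ["root"], "root": []}
DISEASES = {"D1": {"A1", "A2"}, "D2": {"B1", "B2"}}  # hypothetical


def closure(terms, parents):
    """Terms plus all their ancestors."""
    seen, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def generalize(terms, parents):
    """Replace each term by its direct parents; a root term is kept as-is."""
    out = set()
    for t in terms:
        out |= set(parents.get(t) or [t])
    return out


profile = set(DISEASES["D1"])
for level in range(3):
    p = closure(profile, PARENTS)
    sims = {d: round(jaccard(p, closure(t, PARENTS)), 3) for d, t in DISEASES.items()}
    print(level, sorted(profile), sims)
    profile = generalize(profile, PARENTS)
```

Similarity to D1 falls from 1.0 to 0.5 to 0.25 as the profile becomes more general. At the root level, D1 and D2 score the same, so the profile no longer discriminates between diseases.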
30. Down Syndrome Mouse
Ts65Dn mice survive to adulthood and express
some characteristics of Down syndrome, such
as developmental delay, hyperactivity, weight
problems, craniofacial dysmorphology,
impaired learning, and behavioral deficits
31. Each species uniquely covers a
different set of phenotypes
This provides an opportunity to use model organism phenotypes to inform
the study of human disease
32. Human and model phenotypes can be
linked to >75% of human genes
33. Problem: Clinical and model
phenotypes are described differently
34. Problem: Each organism uses different vocabularies
[Figure: the lung as represented in the Mammalian Phenotype ontology (abnormal lung morphology, abnormal respiratory system morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology), Mouse Anatomy, FMA, and a human development ontology (lung, lung bud, respiratory primordium, pharyngeal region), with terms such as lobular organ, parenchymatous organ, solid organ, thoracic cavity organ, pleural sac, lower respiratory tract, respiratory system, pulmonary acinus, alveolar sac, and lung alveolus, connected by is_a (SubClassOf), part_of, develops_from, and surrounded_by relations]
36. Enhance lexical approach with OWL
bridging axioms
• Key idea:
– Describe the phenotype in a machine-interpretable
way
• Break it down into digestible chunks!
• Logical definition
– The machine will then be able to help you
• Match phenotypes
• Automate ontology checking and addition of new terms
• Approach:
– Use the Web Ontology Language (OWL), which is based
on description logics, to describe phenotypes
– Use OWL reasoning to find connections
Mungall et al. (2012). Genome Biology, 13(1), R5
Köhler et al. (2014) F1000Research 2:30
Haendel et al. (2014) JBMS 5:21
Hoehndorf et al. (2011). NAR 39(18):e119
Hoehndorf et al. (2011) Bioinformatics 27(7):1001
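The logical-definition idea can be illustrated without an OWL reasoner. Decompose each phenotype into a quality and the anatomical entity it affects, and let bridging axioms map species-specific anatomy classes to shared ones; matching then reduces to comparing normalized definitions. This is a minimal sketch. The labels, the `MA:`/`FMA:` placeholder strings, and the bridge table are hypothetical. Real pipelines express these as OWL axioms and use a reasoner, which also handles subsumption rather than only equality.

```python
from typing import NamedTuple


class LogicalDefinition(NamedTuple):
    """Entity-quality style definition: a quality that inheres in an anatomical entity."""
    quality: str
    entity: str


# Hypothetical phenotype terms; labels and "MA:"/"FMA:" strings are placeholders, not real IDs.
MOUSE = {
    "abnormal lung morphology": LogicalDefinition("abnormal morphology", "MA:lung"),
}
HUMAN = {
    "Abnormal lung morphology": LogicalDefinition("abnormal morphology", "FMA:lung"),
    "Abnormal heart morphology": LogicalDefinition("abnormal morphology", "FMA:heart"),
}

# Hypothetical bridging axioms: species-specific anatomy class -> shared cross-species class.
BRIDGE = {"MA:lung": "lung", "FMA:lung": "lung", "FMA:heart": "heart"}


def normalize(defn):
    """Rewrite the entity of a definition into the shared anatomy vocabulary."""
    return LogicalDefinition(defn.quality, BRIDGE.get(defn.entity, defn.entity))


def cross_species_matches(mouse, human):
    """Pair terms whose normalized logical definitions are identical."""
    return [
        (m_label, h_label)
        for m_label, m_def in mouse.items()
        for h_label, h_def in human.items()
        if normalize(m_def) == normalize(h_def)
    ]


print(cross_species_matches(MOUSE, HUMAN))
```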
42. Animal models provide insight into on-target effects
• For the majority of the 100 best-selling drugs ($148B in
US sales alone), there is a direct correlation between
knockout phenotype and drug effect
• Immunological indications
– Antihistamines (Claritin, Allegra, Zyrtec)
– Knockout of the histamine H1 receptor leads to decreased
responsiveness of the immune system
– Predicts on-target effects: drowsiness, reduced
anxiety
Zambrowicz and Sands. Nat Rev Drug Disc. 2003.
43. Identifying drug targets
from mouse knock-out phenotypes
[Figure: a drug inhibits a human gene; the drug's effects are compared with the phenotypes of a non-functional model of the orthologous gene, and similar profiles link the drug to the gene]
Main idea: if a drug’s effects match the phenotypes of a
null model of a gene, this suggests that the drug is an inhibitor of that gene
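The main idea can be sketched as a ranking. Score each knockout by how much of the drug's effect profile its phenotypes explain, then report the human ortholog. For simplicity, this minimal sketch uses an unweighted coverage fraction over exact term matches. The talk's measure is information-weighted and works over the ontology, as described on the Semantic Similarity slide that follows. All genes, phenotypes, and ortholog pairs below are hypothetical.

```python
# Hypothetical data: knockout phenotypes already mapped to shared phenotype terms.
KNOCKOUTS = {
    "Gene1": {"hepatomegaly", "erythema", "weight loss"},
    "Gene2": {"erythema"},
    "Gene3": {"seizures"},
}
# Hypothetical mouse-to-human ortholog table.
ORTHOLOGS = {"Gene1": "GENE1", "Gene2": "GENE2", "Gene3": "GENE3"}


def coverage(drug_effects, ko_phenotypes):
    """Fraction of the drug's effects that also appear in the knockout's phenotypes."""
    return len(drug_effects & ko_phenotypes) / len(drug_effects) if drug_effects else 0.0


def candidate_targets(drug_effects, knockouts, orthologs):
    """Rank human orthologs of knockout genes by how well they explain the drug's effects."""
    scored = [
        (orthologs[gene], coverage(drug_effects, phenos))
        for gene, phenos in knockouts.items()
        if gene in orthologs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


drug_effects = {"hepatitis", "hepatomegaly", "erythema"}
for human_gene, score in candidate_targets(drug_effects, KNOCKOUTS, ORTHOLOGS):
    print(f"{human_gene}: {score:.2f}")
```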
44. Terminological Interoperability
(we must compare apples with apples)
Components shown: Mouse Phenotypes; Drug effects (mappings from UMLS to DO, NBO, MP); Mammalian Phenotype Ontology; PhenomeNet; PhenomeDrug
[Figure: human clinical features (unstable posture, constipation, neuronal loss in substantia nigra, shuffling gait, resting tremors, REM disorder, hyposmia) and mouse phenotypes (poor coordination, poor rotarod performance, decreased gut peristalsis, axon degeneration, decreased stride length, stereotypic behavior, abnormal EEG, failure to find food) mapped to shared terms: abnormal coordination, abnormal digestive physiology, CNS neuron degeneration, abnormal locomotion, abnormal motor function, sleep disturbance, abnormal olfaction]
45. Semantic Similarity
Given a drug effect profile D and a mouse model M, we
compute the semantic similarity as an information-weighted
Jaccard metric.
The similarity measure used is asymmetric: it
determines the amount of information about a drug effect
profile D that is covered by a set of mouse model
phenotypes M.
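One plausible reading of this description is sim(D, M) = Σ IC(t) over t in Cl(D) ∩ Cl(M), divided by Σ IC(t) over t in Cl(D), where Cl(X) is X plus all ancestors. Dividing only by the drug profile's information is what makes the measure asymmetric. Here is a minimal sketch under that assumption, with a hypothetical hierarchy and IC values. Consult Hoehndorf et al. (2014) for the exact definition used.

```python
# Toy hierarchy (child -> parents) and IC values; both hypothetical.
PARENTS = {
    "hepatomegaly": ["liver abnormality"],
    "hepatitis": ["liver abnormality"],
    "erythema": ["skin abnormality"],
    "liver abnormality": ["phenotype"],
    "skin abnormality": ["phenotype"],
    "phenotype": [],
}
IC = {"phenotype": 0.0, "liver abnormality": 1.0, "skin abnormality": 1.0,
      "hepatomegaly": 2.0, "hepatitis": 2.0, "erythema": 2.0}


def closure(terms):
    """Terms plus all their ancestors."""
    seen, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen


def weighted_coverage(d, m):
    """IC of the shared closure divided by the IC of d's closure (asymmetric)."""
    cd, cm = closure(d), closure(m)
    denom = sum(IC[t] for t in cd)
    return sum(IC[t] for t in cd & cm) / denom if denom else 0.0


drug = {"hepatitis", "erythema"}
model = {"hepatomegaly"}
print(weighted_coverage(drug, model))  # ~0.167: little of the drug profile is covered
print(weighted_coverage(model, drug))  # ~0.333: swapping arguments changes the score
```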
46. Loss-of-function models predict
targets of inhibitor drugs
• 14,682 drugs; 7,255 mouse genotypes
• Validation against known and predicted inhibitor-target pairs
– 0.76 ROC AUC for human targets (DrugBank)
– 0.81 ROC AUC for mouse targets (STITCH)
• diclofenac (STITCH:000003032)
– NSAID used to treat pain, osteoarthritis, and rheumatoid arthritis
– Drug effects include liver inflammation (hepatitis), enlargement of the liver
(hepatomegaly), and redness of the skin (erythema)
– 49% of the drug effect profile explained by the PPARg knockout
• Peroxisome proliferator-activated receptor gamma (PPARg) regulates metabolism,
proliferation, inflammation, and differentiation
• Diclofenac is a known inhibitor
– 46% of the drug effect profile explained by the COX-2 knockout
• Diclofenac is a known inhibitor
Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide
information about human drug targets. Bioinformatics. 2014 Mar 1;30(5):719-25
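For reference, a ROC AUC such as the 0.76 and 0.81 values above equals the probability that a randomly chosen true inhibitor-target pair scores higher than a randomly chosen other pair, with ties counting half. Here is a minimal sketch of that pairwise computation with made-up scores and labels. It runs in O(P x N) time, which is fine for illustration; the paper's exact evaluation protocol may differ.

```python
def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


# Made-up similarity scores; label 1 = known inhibitor-target pair.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```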
47. Computational Drug Repurposing
• Similarity
– Guilt by association
– If drug i is similar to drug j, and drug i treats
disease x, then drug j may treat disease x
• Complementarity
– if the signature of drug i complements/counters
the signature of disease x, then drug i may treat
disease x
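The guilt-by-association rule can be sketched as follows. Drugs are described by hypothetical feature sets (for example, targets or side effects), similarity is plain Jaccard, and the 0.3 threshold is made up. Each candidate disease is scored by its most similar treating drug. The complementarity approach would instead compare a drug's signature against the opposite of the disease's signature.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def repurpose(query, drug_features, indications, min_sim=0.3):
    """Propose new diseases for `query` from drugs with similar feature profiles."""
    known = indications.get(query, set())
    scores = {}
    for other, feats in drug_features.items():
        if other == query:
            continue
        sim = jaccard(drug_features[query], feats)
        if sim < min_sim:
            continue
        for disease in indications.get(other, set()) - known:
            scores[disease] = max(scores.get(disease, 0.0), sim)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical drugs, features, and indications.
drug_features = {
    "drug_i": {"f1", "f2", "f3"},
    "drug_j": {"f1", "f2", "f4"},
    "drug_k": {"f5"},
}
indications = {"drug_i": {"disease_x"}, "drug_k": {"disease_z"}}
print(repurpose("drug_j", drug_features, indications))  # [('disease_x', 0.5)]
```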
48. PhenomeDrug:
phenotypic complementarity
• Extends the idea to match opposing drug-disease phenotypes
– Drugs that induce hypotension may be useful in
treating hypertension
• Problem: We don’t have any information
about phenotypic complementarity
– We generated over 300 antonym pairs for the
Human Phenotype Ontology
– Developed a measure to compute phenotypic
complementarity
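Here is a minimal sketch of phenotypic complementarity using the simplest possible measure: invert each drug effect through an antonym table, then count how many disease phenotypes are countered. The hypotension/hypertension pair comes from the slide. The second antonym pair, the measure itself, and the example profiles are hypothetical. The PhenomeDrug measure is not described here and may differ, for example by using the ontology hierarchy.

```python
# Antonym pairs: the first is the slide's example; the second is illustrative.
ANTONYMS = [("Hypotension", "Hypertension"), ("Bradycardia", "Tachycardia")]


def antonym_map(pairs):
    """Symmetric lookup: each phenotype maps to its opposite."""
    opposite = {}
    for a, b in pairs:
        opposite[a] = b
        opposite[b] = a
    return opposite


def complementarity(drug_effects, disease_phenotypes, pairs):
    """Fraction of disease phenotypes countered by some drug effect (hypothetical measure)."""
    if not disease_phenotypes:
        return 0.0
    opposite = antonym_map(pairs)
    countered = {opposite[e] for e in drug_effects if e in opposite}
    return len(countered & disease_phenotypes) / len(disease_phenotypes)


drug_effects = {"Hypotension", "Headache"}
disease = {"Hypertension", "Tachycardia"}
print(complementarity(drug_effects, disease, ANTONYMS))  # 0.5
```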
50. Preliminary Results
• Suggest that for some
well-annotated diseases,
we recapitulate top drug
candidates
• Quality of drug
annotation is an issue
– Some drugs have
insufficient annotations to
find “good” matches
• Full assessment underway
• Pulmonary Arterial
Hypertension
51. Summary
• Ontologies provide the structure and semantics
by which phenotypes can be accurately
represented and computed with
• Measures of semantic similarity in combination
with terminological integration enable a broad
diversity of ontology-based analyses, including
– Diagnosis of rare diseases
– Identifying human drug targets
– Drug repositioning
52. Acknowledgements
Dumontier Lab
• Tanya Hiebert
• Joachim Baran
PhenomeDrug
• Robert Hoehndorf
• George Gkoutos
Monarch Initiative
• Melissa Haendel
• Peter Robinson
• Chris Mungall
• the Monarch Team