Global phenotypic data sharing standards to maximize diagnostic discovery
1. Global Phenotypic Data Sharing Standards
to Maximize Diagnostic Discovery
Melissa Haendel, PhD and Sebastian Köhler, PhD
RD-Action workshop
April 26th and 27th, Brussels
2. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
3. What do we mean by phenotype?
= Phenotypic abnormality = clinical feature
A constellation/pattern of clinical features
defines a disease:
– [Disease X]... is a rare developmental disorder defined by
the combination of aplasia cutis congenita of the scalp
vertex and terminal transverse limb defects. In addition,
vascular anomalies such as cutis marmorata
telangiectatica ... are recurrently seen.
(Yes, this is a simplification)
4. Starting point: OMIM
Clinical Synopsis (CS) section
Free text phenotypic description
Very expressive
Online Mendelian Inheritance in Man database
5. (Un)Controlled Vocabularies
Not designed to be easily machine interpretable
Spelling problems, acronyms, etc.
Homonyms:
... fibrillation ...
fibrillation (= ventricular fibrillation) ≠ fibrillation (= muscle fibrillation)
6. Why you should care
OMIM query               Number of results
large bones              264
large bone               785
enlarged bones           87
enlarged bone            156
big bones                16
huge bones               4
massive bones            28
hyperplastic bones       12
hyperplastic bone        40
bone hyperplasia         134
increased bone growth    612
7. Motivation
HPO started in 2008
Goal: computer-interpretable clinical features!
Reliable information extraction from databases based on clinical
features
Compute similarity between diseases based on clinical features
Compute similarity between patients based on clinical features
Compute similarity between patients and diseases based on clinical
features
Interoperability with basic research to improve diagnostic discovery
Easy to use
Freely available
8. The Human Phenotype Ontology
(HPO)
Description of phenotypic abnormalities (or clinical features) in
humans
[Figure: excerpt of the HPO hierarchy. Terms shown: phenotypic abnormality, abnormality of the nervous system, abnormality of the central nervous system, cerebral inclusion bodies, neurofibrillary tangles, abnormality of movement, incoordination, ataxia, gait disturbance, gait ataxia]
This is a term
Clinical Synopsis (CS) phrases of OMIM:0815 and OMIM:1234 attached to the term: "Neurofibrillary tangles may be present", "Paired helical filaments"
9. The Human Phenotype Ontology (HPO)
Synonyms merged into one term
Textual definitions for each term
id: HP:0002185
name: Neurofibrillary tangles
def: "Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form." [HPO:sdoelken]
synonym: "Neurofibrillary tangles may be present" EXACT []
synonym: "Paired helical filaments" EXACT []
[Figure: the same HPO hierarchy excerpt as on slide 8, from phenotypic abnormality down to neurofibrillary tangles and gait ataxia]
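A minimal sketch of why merging synonyms into one term matters: every label, however it was phrased in a free-text source, resolves to the same identifier. The stanza is the one from this slide (synonym labels quoted as in OBO files). The parser handles only the tags shown, and the helper names `parse_stanza` and `build_lookup` are hypothetical, not part of any HPO tooling.

```python
import re

# The term stanza from this slide, as tag: value lines.
STANZA = '''id: HP:0002185
name: Neurofibrillary tangles
synonym: "Neurofibrillary tangles may be present" EXACT []
synonym: "Paired helical filaments" EXACT []'''


def parse_stanza(text):
    """Collect the id, name, and synonyms of one term (toy parser)."""
    term = {"id": None, "name": None, "synonyms": []}
    for line in text.splitlines():
        tag, _, value = line.partition(": ")
        if tag == "synonym":
            # Keep the quoted label; ignore scope and cross-references.
            match = re.match(r'"([^"]*)"', value)
            if match:
                term["synonyms"].append(match.group(1))
        elif tag in ("id", "name"):
            term[tag] = value.strip()
    return term


def build_lookup(term):
    """Map every label (name and synonyms), lower-cased, to the term id."""
    labels = [term["name"], *term["synonyms"]]
    return {label.lower(): term["id"] for label in labels}


lookup = build_lookup(parse_stanza(STANZA))
print(lookup["paired helical filaments"])  # HP:0002185
```

With a lookup like this, two clinical synopses that use different wording end up annotated with the same term and can be retrieved together.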
10. The Human Phenotype Ontology
(HPO)
Semantic relations
(’subclass of’, ‘is a’)
From top to bottom,
terms get more specific
[Figure: the same HPO hierarchy excerpt (phenotypic abnormality, abnormality of the nervous system, abnormality of the central nervous system, cerebral inclusion bodies, neurofibrillary tangles, abnormality of movement, incoordination, ataxia, gait disturbance, gait ataxia), with every edge from a term to its parent labeled "is a"]
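To make the "is a" relation concrete, here is a small sketch that walks from a specific term up to the root. The term names come from the slide; the specific edges in `IS_A` are an assumption made for illustration and are not copied from an HPO release.

```python
# Toy "is a" edges (child -> parents). Term names are from the slide;
# the edges themselves are assumed for illustration only.
IS_A = {
    "gait ataxia": ["ataxia", "gait disturbance"],
    "ataxia": ["incoordination"],
    "incoordination": ["abnormality of movement"],
    "gait disturbance": ["abnormality of movement"],
    "abnormality of movement": ["abnormality of the nervous system"],
    "neurofibrillary tangles": ["cerebral inclusion bodies"],
    "cerebral inclusion bodies": ["abnormality of the central nervous system"],
    "abnormality of the central nervous system": ["abnormality of the nervous system"],
    "abnormality of the nervous system": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}


def ancestors(term):
    """Return every term reachable from `term` via 'is a' edges."""
    seen = set()
    stack = list(IS_A[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A[parent])
    return seen


# Because "is a" is transitive, anything annotated with "gait ataxia"
# is implicitly annotated with all of its more general ancestors too.
print(sorted(ancestors("gait ataxia")))
```

This transitivity is what lets a query for a general term also retrieve diseases annotated with any of its more specific descendants.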
11. Computable phenotype definitions of
disease
HPO Terms are used to annotate (describe) diseases
E.g. neurofibrillary tangles is used to annotate Alzheimer Disease:
Orphanet + Monarch:
~124,000 annotations of 7,700 rare diseases from OMIM,
Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
Köhler et al. https://doi.org/10.1093/nar/gkw1039
[Figure: OMIM:0815 and OMIM:1234, with the Clinical Synopsis phrases "Neurofibrillary tangles may be present" and "Paired helical filaments"]
15. HPO language translations
We need your help! http://bit.ly/hpo-translations
Translation of labels, synonyms, and text definitions
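Languages: Italian, Spanish, Russian, French, German, English layperson, Japanese, Chinese
[Figure: completion per language. Values shown: 100%, 11%, 12%, 100%, 19%, 19%, near 100%, 20%; the assignment of values to languages is not recoverable here]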
16. Adoption of HPO
Public facing databases using HPO to
annotate patients
Tools ingesting HPO-annotated data:
Köhler et al. https://doi.org/10.1093/nar/gkw1039
17. Why HPO is a successful standard
One language shared by “all”
Synonyms “map” to one concept (HPO term)
Contains terms that no other ontology has
Comes with disease annotations! (Not just “yet another clinical
terminology”)
Simple, qualitative phenotyping as deviation from normal (abnormal,
abnormally increased, abnormally decreased, ...) to ease analysis
Documented, traceable editing
Open science community project with diverse contributors
Constantly improved and extended, examples:
Layperson version for patients
Language translations
Opposite-relations between terms
19. A disease can be described
algorithmically as a collection of
phenotypes
Patient
Disease X
Differential diagnosis with matching phenotype concepts is already good
Splenomegaly = Increased spleen size
Nasal speech = Nasal voice
Each pair are synonyms in HPO, i.e. map to the same term
20. A disease can be described
algorithmically as a collection of
phenotypes
Patient
Disease X
Differential diagnosis with similar but non-matching phenotypes is difficult
Splenomegaly vs. Ruptured spleen
Oral motor hypotonia vs. Decreased muscle mass
21. Similarity between two terms
[Figure: two term-to-term comparisons. Oral motor hypotonia vs. Muscular hypotonia of the trunk, related through Abnormal muscle tone; Oral motor hypotonia vs. Abnormality of calvarial morphology, related only through Phenotypic abnormality]
Legend: high scoring match, medium scoring match, very low scoring match
Score: Measured by Information Content
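The slide says only that the score is measured by information content. One common way to do that, assumed here, is Resnik-style similarity: the score of two terms is the information content (IC) of their most informative common ancestor, with IC(t) = ln(1 / frequency of t). The edges and frequencies below are invented for illustration.

```python
import math

# Toy hierarchy (child -> parents) using terms from the slide; the edges
# and the annotation frequencies are invented for illustration.
PARENTS = {
    "Oral motor hypotonia": ["Abnormal muscle tone"],
    "Muscular hypotonia of the trunk": ["Abnormal muscle tone"],
    "Abnormal muscle tone": ["Phenotypic abnormality"],
    "Abnormality of calvarial morphology": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}

# Assumed fraction of diseases annotated with each term or a descendant.
FREQ = {
    "Oral motor hypotonia": 0.01,
    "Muscular hypotonia of the trunk": 0.02,
    "Abnormal muscle tone": 0.10,
    "Abnormality of calvarial morphology": 0.05,
    "Phenotypic abnormality": 1.0,
}


def ic(term):
    """Information content: rarer terms are more informative."""
    return math.log(1 / FREQ[term])


def self_and_ancestors(term):
    """The term itself plus everything above it in the hierarchy."""
    result = {term}
    for parent in PARENTS[term]:
        result |= self_and_ancestors(parent)
    return result


def resnik(a, b):
    """IC of the most informative ancestor shared by both terms."""
    shared = self_and_ancestors(a) & self_and_ancestors(b)
    return max(ic(t) for t in shared)


# Identical terms score highest, siblings score lower, and terms that
# share only the root score zero.
print(resnik("Oral motor hypotonia", "Oral motor hypotonia"))
print(resnik("Oral motor hypotonia", "Muscular hypotonia of the trunk"))
print(resnik("Oral motor hypotonia", "Abnormality of calvarial morphology"))
```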
22. Comparing phenotype profiles
E.g. patient-to-disease comparison
The patient's phenotypes are more similar to Disease A, so Orphamizer would rank Disease A before Disease B
[Figure: the patient's profile compared term by term with Disease A and with Disease B]
Legend: high scoring match, medium scoring match, very low scoring match
Score: Measured by Information Content
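A sketch of how term-level scores could be rolled up into a profile-level ranking. It assumes a one-sided best-match average (for each patient term, take the best-scoring disease term, then average); the slides do not confirm that Orphamizer uses exactly this aggregation. The hierarchy, frequencies, and disease profiles are all made up.

```python
import math

# Toy data: hierarchy (child -> parent), assumed term frequencies, and two
# made-up disease profiles. None of this is real HPO annotation data.
PARENT = {
    "Oral motor hypotonia": "Abnormal muscle tone",
    "Muscular hypotonia of the trunk": "Abnormal muscle tone",
    "Abnormal muscle tone": "Phenotypic abnormality",
    "Splenomegaly": "Abnormality of the spleen",
    "Ruptured spleen": "Abnormality of the spleen",
    "Abnormality of the spleen": "Phenotypic abnormality",
    "Phenotypic abnormality": None,
}
FREQ = {
    "Oral motor hypotonia": 0.01, "Muscular hypotonia of the trunk": 0.02,
    "Abnormal muscle tone": 0.10, "Splenomegaly": 0.04,
    "Ruptured spleen": 0.005, "Abnormality of the spleen": 0.06,
    "Phenotypic abnormality": 1.0,
}


def lineage(term):
    """The term itself plus all of its ancestors up to the root."""
    out = set()
    while term is not None:
        out.add(term)
        term = PARENT[term]
    return out


def term_sim(a, b):
    """IC of the most informative common ancestor (Resnik-style)."""
    return max(math.log(1 / FREQ[t]) for t in lineage(a) & lineage(b))


def profile_sim(patient, disease):
    """Average, over patient terms, of the best match in the disease."""
    best = [max(term_sim(p, d) for d in disease) for p in patient]
    return sum(best) / len(best)


patient = ["Splenomegaly", "Oral motor hypotonia"]
diseases = {
    "Disease A": ["Splenomegaly", "Muscular hypotonia of the trunk"],
    "Disease B": ["Ruptured spleen"],
}
ranking = sorted(diseases, key=lambda d: profile_sim(patient, diseases[d]),
                 reverse=True)
print(ranking)  # ['Disease A', 'Disease B']
```

Note that Disease B still receives a non-zero score: "Ruptured spleen" does not match "Splenomegaly" exactly, but the two share a fairly specific ancestor, which is the point of semantic ("fuzzy") matching.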
25. The genome is sequenced, but...
…we still don’t know very much about what it does
OMIM: 3,398 Mendelian diseases with no known genetic basis
ClinVar: at least 120,000* variants with no known pathogenicity
*This is more than twice what it was in 2016!
27. More species = more coverage
Number of human protein-coding genes in ExAC DB as per Lek et al. Nature 2016: 19,008
9,739 of 19,008 = 51%
14,779 of 19,008 = 78%
2,195 + 7,544 + 7,235 = 16,974 of 19,008 (union of coverage in any species)
Combined = 89%
Even inclusion of just four species boosts phenotypic coverage of genes
by 38 percentage points (51% → 89%)
Mungall et al Nucleic Acids Research bit.ly/monarch-nar-2016
28. Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons –
https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
http://www.guinealynx.info/pododermatitis.html
30. Challenge: Each database uses its own
phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, NCIT, …
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb, …
31. Can we help machines understand
phenotype terms?
Human phenotype: “Palmoplantar hyperkeratosis”
Machine: “I have absolutely no idea what that means”
32. Decomposition of complex concepts
using species neutral terms
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner,
M. (2010). Integrating phenotype ontologies across multiple species.
Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
Human phenotype “Palmoplantar hyperkeratosis” =
increased (PATO) + keratinization (GO) + Stratum corneum layer of skin (Uberon) + Autopod (Uberon)
Species neutral ontologies, homologous concepts
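A sketch of what such a decomposition buys a machine: once a phenotype is expressed in species-neutral building blocks, two phenotypes can be compared part by part instead of by their (species-specific) labels. The field names and the role assigned to each building block are assumptions for illustration, and the mouse definition is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeDefinition:
    """A phenotype broken into species-neutral building blocks."""
    quality: str    # PATO: how something deviates
    process: str    # GO: the biological process affected
    structure: str  # Uberon: the anatomical structure involved
    location: str   # Uberon: the body region


# Building blocks shown on the slide for the human term.
human = PhenotypeDefinition(
    quality="increased",
    process="keratinization",
    structure="stratum corneum layer of skin",
    location="autopod",
)

# Hypothetical definition of a mouse phenotype, invented here only to
# show how a match can be found without comparing labels.
mouse = PhenotypeDefinition(
    quality="increased",
    process="keratinization",
    structure="stratum corneum layer of skin",
    location="autopod",
)

FIELDS = ("quality", "process", "structure", "location")


def shared_parts(a, b):
    """Names of the building blocks on which two definitions agree."""
    return [f for f in FIELDS if getattr(a, f) == getattr(b, f)]


print(shared_parts(human, mouse))
```

"Autopod" is the key species-neutral concept here: it covers both a human hand or foot and a paw, so the human and animal descriptions land on the same structure.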
42. Prevailing clinical genomic pipelines
leverage only a tiny fraction of the available
data
[Figure: patient data matched against public data to reach possible diseases, then diagnosis & treatment; part of the data is marked as under-utilized]
Patient exome/genome; public genomic data
Patient clinical phenotypes; public clinical phenotype and disease data
Patient environment; public environment and disease data
Patient omics phenotypes; public omics phenotypes and correlations
44. Combining G2P data for variant
prioritization
Whole exome
Remove off-target and
common variants
Variant score from allele
freq and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
Mendelian filters
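The steps on this slide can be sketched as a filter followed by two scores that are combined into a ranking. Everything below is a toy: the field names, the frequency cut-off, and both formulas are invented, the Mendelian (inheritance) filters are left out, and the plain average merely stands in for the combined score; it is not Exomiser's actual PHIVE calculation.

```python
# Toy variant records. All field names, thresholds, and numbers are
# invented; this is not Exomiser's data model or scoring function.
variants = [
    {"gene": "GENE1", "on_target": True, "allele_freq": 0.20,
     "pathogenicity": 0.9, "phenotype_score": 0.8},
    {"gene": "GENE2", "on_target": True, "allele_freq": 0.0001,
     "pathogenicity": 0.95, "phenotype_score": 0.2},
    {"gene": "GENE3", "on_target": True, "allele_freq": 0.0,
     "pathogenicity": 0.9, "phenotype_score": 0.9},
    {"gene": "GENE4", "on_target": False, "allele_freq": 0.0,
     "pathogenicity": 1.0, "phenotype_score": 1.0},
]

MAX_ALLELE_FREQ = 0.01  # assumed cut-off for "common"


def variant_score(v):
    """Rarer and more damaging variants score higher (toy formula)."""
    rarity = 1.0 - v["allele_freq"] / MAX_ALLELE_FREQ
    return rarity * v["pathogenicity"]


def combined_score(v):
    """Plain average as a stand-in for the combined score."""
    return (variant_score(v) + v["phenotype_score"]) / 2


# Step 1: remove off-target and common variants.
kept = [v for v in variants
        if v["on_target"] and v["allele_freq"] <= MAX_ALLELE_FREQ]

# Remaining steps: score what is left and rank by the combined score.
ranked = sorted(kept, key=combined_score, reverse=True)
print([v["gene"] for v in ranked])  # ['GENE3', 'GENE2']
```

GENE2 has the more damaging variant, but GENE3 ranks first because its associated phenotypes resemble the patient's; that is the effect of adding the phenotype score.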
45. Exomiser results for UDP diagnosed
patients
Inclusion of phenotype data improves variant prioritization
In 60% of the first 1,000 genomes at GEL, Exomiser
predicts the top candidate
In 86% of cases, Exomiser predicts within the top 5
46. Example case solved by Exomiser
[Figure: phenotypic profiles and genes compared. Two entries labeled "Heterozygous, missense mutation; STIM-1; N/A" and one mouse model, Stim1Sax/Sax]
Ranked STIM-1 variant maximally pathogenic
based on cross-species G2P data,
in the absence of traditional data sources
http://bit.ly/exomiser
47. Deep phenotyping and “fuzzy” matching
algorithms improve diagnostics
4.9% of exomes had dual molecular diagnoses,
differentiated with deep phenotyping
49. How much phenotyping is enough?
Enlarged ears (2)
Dark hair (6)
Female (4)
Male (4)
Blue skin (1)
Pointy ears (1)
Hair absent on head (1)
Horns present (1)
Hair present on head (7)
Enlarged lip (2)
Increased skin pigmentation (3)
bit.ly/annotationsufficiency
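One way to read the counts on this slide is as information content: the rarer a phenotype in the cohort, the more it narrows down who is being described. The sketch below uses the counts shown above; the cohort size of 8 is an assumption, inferred from Female (4) + Male (4) and from hair present (7) + hair absent (1).

```python
import math

# Counts from the slide. The cohort size of 8 is an assumption, inferred
# from Female (4) + Male (4) and hair present (7) + hair absent (1).
COHORT_SIZE = 8
COUNTS = {
    "Enlarged ears": 2, "Dark hair": 6, "Female": 4, "Male": 4,
    "Blue skin": 1, "Pointy ears": 1, "Hair absent on head": 1,
    "Horns present": 1, "Hair present on head": 7, "Enlarged lip": 2,
    "Increased skin pigmentation": 3,
}


def information_content(phenotype):
    """Bits of information gained by observing the phenotype."""
    return math.log2(COHORT_SIZE / COUNTS[phenotype])


# Most informative phenotypes first.
for name in sorted(COUNTS, key=information_content, reverse=True):
    print(f"{name}: {information_content(name):.2f} bits")
```

Under this assumption a phenotype seen once, such as blue skin, carries 3 bits, while "Hair present on head" carries almost none, so recording it adds little to a profile.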
51. Matchmaker Exchange for patients, diseases, and model
organisms to aid diagnosis and mechanistic discovery
www.monarchinitiative.org
http://bit.ly/Monarch-MME
Goal: Get clinical sites & public databases to provide standardized phenotype data
62. Journals are now requiring HPO
terms
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision
medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside the paywall (e.g. Figshare, Zenodo, etc.)
Each article can be associated with a phenopacket
63. Community “curate-athons” for HPO
Cardiovascular curate-athon at Stanford:
about 20 cardiologists (surgeons, pediatric, etc.),
four ontologists, and three clinical curators
met for two days.
Abnormal Complex
Voltage to be added to all waves
-increased, decreased, fluctuating (alternans)
Duration to be added to all waves
-increased, decreased
P wave
-notching
-axis
QRS
-fractionation
-axis (right/left/extreme)
Q wave
R wave
S wave
R’ wave
S’ wave (abnormal only)
J wave (can be normal variant)
Epsilon wave (abnormal only)
Osborn wave (abnormal only)
Terminal slur wave (can be normal variant)
Delta wave (abnormal only)
Added hundreds of clinically relevant cardiophysiology phenotypes to HPO;
new exome analysis possible
64. Summary
The Human Phenotype Ontology is a robust standard
describing phenotypic abnormalities FOR the community,
FROM the community for deep phenotyping rare disease
patients
Model organism data can fill gaps in our knowledge and
aid mechanistic exploration of disease candidates
Tools that leverage the Human Phenotype Ontology can be
used to prioritize coding and noncoding variants for WES
and WGS and CNVs
Patients can provide self-phenotyping information as
partners in the deep phenotyping process
Phenopackets is a FAIR-based GA4GH exchange standard
for facilitating distributed phenotype data sharing for
clinics, labs, patients, and journals
65. Acknowledgements
Orphanet
Ana Rath
Annie Olry
Marc Hanauer
Halima Lourghi
Lawrence Berkeley
Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
RENCI
Jim Balhoff
OHSU
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Dan Keith
Genomics
England/Queen Mary
Damian Smedley
Jules Jacobsen
Jackson Laboratory
Peter Robinson
Leigh Carmody
With special thanks to Julie McMurry for excellent graphic design
Garvan
Tudor Groza
Craig McNamara
Hipbi / NeuroCure
Dominik Seelow
Markus Schülke-Gerstenfeld
Charité
Dominik Seelow
Tomasz Zemojtel
One of the workshop questions was why the HPO has been recommended as an optimal ontology for clinical (phenotypic) descriptions.
I was not part of the process that led to this recommendation, so I will instead give my impression of why HPO has been so successful over the last 8 years.
First, what is the content of HPO? It contains phenotypic abnormalities ... Definition in the context of HPO ...
What data did we want to use in the beginning? This is what we had.
Problems: well known, so just briefly.
Why is it so important to have controlled vocabularies at all?
Query today:
Search: 'large bone'
Results: 9,128 entries.
Search: 'enlarged bone'
Results: 3,912 entries.
CHV = Consumer Health Vocabulary
Translation teams at: https://github.com/Human-Phenotype-Ontology/HPO-translations/blob/master/README.md
Contact: sebastian.koehler@charite.de
There is a lot we don’t know about the genome
As of March 2017, OMIM number: 3398 unknown 4,964 known
ClinVar number: 121,000 at least
Note that these are variants that researchers have found suspicious, due to rarity in the population or something else. In context, 160k variants in the entire genome is not much.
Each organism provides unique genetic & phenotypic data that helps fill in knowledge gaps in the human genome. For example, much work has been done in chicks to understand limb development. I used to work in a fruit fly lab studying the brain, so I am particularly attached to fly data. As you can imagine, phenotypes described for flies, or other models, use very different terms than those used for humans. Later, I will discuss how Monarch is overcoming this challenge. Now I will show you an example of how using phenotype data from other organisms can improve human health.
Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
We make things digestible. Complex concepts into simpler parts. We use ontologies that are comparative by design.
Represent organism as a biological subject
Represent diseases/genotypes as collections of nodes in the graph
Interoperable with other bioinformatics resources and leverage modern semantic standards
5 root classes:
Phenotypic abnormality, Mode of Inheritance, Clinical modifier, Mortality/Ageing, Frequency
11,813 classes/terms in HPO
~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
OWLsim algorithm
About HPO 2: We want the vocabulary to enable sophisticated phenotypic matching within and across species
Our team has led international ontology development efforts, including ICD-11 [2], the HPO [29,30], the Gene Ontology [18,31,32], and major tissue/cell ontologies used for mammalian functional genomics [20,33–37]. We have extensive experience integrating data using these ontologies [38,39]. A fundamental challenge is to translate the vocabularies used by clinicians via EMRs and billing systems to those used in primary research data. For example, a clinician may describe a patient as having “Microcephaly” with an EMR code ICD10-Q02. A basic scientist using mice may describe this condition with MP:0003303. To translate between clinician and scientist, we provide services that map equivalent concepts [15,40]. Finally, TransMed will generate dynamic ontologies by combining existing classifications with data in the system, e.g. to generate disease nosologies based on pathway membership, orthology, and phenotypic similarity.
Nosology: We will prototype dynamic ontology generation based on combining our existing knowledge sources. We will apply a mixture of methods. This includes our own k-BOOM Bayesian algorithm that weighs different knowledge sources and ontologies. We will also apply our data-driven techniques for generating nosologies based on molecular mechanistic information ingested into our knowledge graph. For low probability associations and equivalencies that may have high value, we will perform some curation to reconcile these.
https://github.com/monarch-initiative/monarch-disease-ontology/issues/90
Note the two subgraphs; little overlap in the upper areas
This was the novel case we solved. The UDP patient had a number of signs and symptoms including various platelet abnormalities. The same heterozygous, missense mutation was seen in 2 patients and ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted maximally pathogenic. Finally a mouse curated by MGI involving a heterozygous, missense point mutation introduced by chemical mutagenesis exhibited strikingly similar platelet abnormalities.
Example showing how adding fuzzy phenotype matching improves disease diagnosis above using sequence based methodologies alone.
Knowing what the normal distribution and clustering of phenotypes is helps us know that blue skin is rare and can reliably distinguish between phenotype profiles. Likewise to know that if the first phenotype entered is enlarged lip, the next one to ask for would be enlarged ears. The combination of 3 non-unique phenotypes offers a perfect match.
The classic G+E=P. But the = has a lot that can be applied to aid the linking.
G-P or D (disease)
causes
contributes to
is risk factor for
protects against
correlates with
is marker for
modulates
involved in
increases susceptibility to
G-G (kind of)
regulates
negatively regulates (inhibits)
positively regulates (activates)
directly regulates
interacts with
co-localizes with
co-expressed with
P/D - P/D
part of
results in
co-occurs with
correlates with
hallmark of (P->D)
E-P
contributes to (E->P)
influences (E->P)
exacerbates (E->P)
manifest in (P->E)
G-E (kind of)
expressed in
expressed during
contains
inactivated by
Fully translational – from bench to bedside – group of stakeholders, contributors, and partners