Use of semantic phenotyping to
aid disease diagnosis
Melissa Haendel
July 10th, 2014
Outline
• Semantic Diagnosis of known diseases
• Semantic similarity across species
• Combining Exome analysis with cross-species semantic phenotyping
• How much phenotyping is enough?
The undiagnosed patient
• Known disorders not recognized during
prior evaluations?
• Atypical presentation of known
disorders?
• Combinations of several disorders?
• Novel, unreported disorder?
OMIM Query # Records
“large bone” 785
“enlarged bone” 156
“big bone” 16
“huge bones” 4
“massive bones” 28
“hyperplastic bones” 12
“hyperplastic bone” 40
“bone hyperplasia” 134
“increased bone growth” 612
Searching for phenotypes using
text alone is insufficient
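The query counts above vary widely even though the phrases describe roughly one concept. A minimal sketch of the alternative, assuming a hypothetical synonym table and a placeholder identifier `EX:0000001` (neither comes from OMIM or HPO): each phrase is normalized and resolved to a single term ID, so every phrasing retrieves the same records.

```python
from typing import Optional

# Hypothetical synonym table: every phrase maps to one placeholder term ID.
# "EX:0000001" is an illustrative identifier, not a real ontology term.
SYNONYMS = {
    "large bone": "EX:0000001",
    "enlarged bone": "EX:0000001",
    "big bone": "EX:0000001",
    "hyperplastic bone": "EX:0000001",
    "bone hyperplasia": "EX:0000001",
    "increased bone growth": "EX:0000001",
}


def normalize(phrase: str) -> str:
    """Lower-case, collapse whitespace, and strip a trailing plural 's'."""
    words = phrase.lower().split()
    if words and words[-1].endswith("s") and len(words[-1]) > 3:
        words[-1] = words[-1][:-1]
    return " ".join(words)


def resolve(phrase: str) -> Optional[str]:
    """Return the term ID for a phrase, or None if it is unknown."""
    return SYNONYMS.get(normalize(phrase))


queries = ["Large bone", "hyperplastic bones", "bone hyperplasia", "tiny bone"]
resolved = {q: resolve(q) for q in queries}
```

Unknown phrases return `None` rather than a guess, which keeps unmapped text visible for curation.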
The Challenge: Interpretation of
Disease Candidates
?
• What’s in the box?
• How are
candidates
identified?
• How do they
compare?
Prioritized
Candidates, Models,
functional validation
M1
M2
M3
M4
...
Phenotypes
P1
P2
P3
…
Genotype info
G1
G2
G3
G4
…
Pathogenicity, frequency,
protein interactions, gene
expression, gene
networks, epigenomics,
metabolomics….
What is an ontology?
A set of logically defined, inter-related terms
used to annotate data
Use of common or logically related terms across
databases enables integration
Relationships between terms allow annotations to
be grouped in scientifically meaningful ways
Reasoning software enables computation of inferred
knowledge
Groups of annotations can be compared using
semantic similarity algorithms
Human Phenotype Ontology
10,158 terms used to
annotate:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants
In human
Reduced pancreatic
beta cells
Abnormality of
pancreatic islet
cells
Abnormality of endocrine
pancreas physiology
Pancreatic islet
cell adenoma
Pancreatic islet cell
adenoma
Insulinoma
Multiple pancreatic
beta-cell adenomas
Abnormality of exocrine
pancreas physiology
Köhler et al. The Human Phenotype Ontology project: linking molecular biology and
disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
A human phenotype example
Abnormality
of the eye
Vitreous
hemorrhage
Abnormal
eye
morphology
Abnormality of the
cardiovascular system
Abnormal
eye
physiology
Hemorrhage
of the eye
Internal
hemorrhage
Abnormality
of the globe
Abnormality of
blood circulation
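The hierarchy above is what lets reasoning software infer broader annotations from a specific one. A minimal sketch, assuming the is_a edges listed in the code (the transcript shows the terms but not the exact links between them):

```python
# is_a edges (child -> parents). The edge set is an assumption made for
# illustration; the exact links are not recoverable from the slide text.
IS_A = {
    "Vitreous hemorrhage": ["Hemorrhage of the eye"],
    "Hemorrhage of the eye": ["Abnormal eye physiology", "Internal hemorrhage"],
    "Abnormal eye physiology": ["Abnormality of the eye"],
    "Internal hemorrhage": ["Abnormality of blood circulation"],
    "Abnormality of blood circulation": [
        "Abnormality of the cardiovascular system"
    ],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# A patient annotated only with the specific term is also retrievable
# under both the eye branch and the cardiovascular branch.
inferred = ancestors("Vitreous hemorrhage")
```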
➔Phenotype annotations are unevenly
distributed across different anatomical systems
Survey of Annotations in Disease Corpus
7,401 diseases
99,045 annotations
exome analysis
Recessive, De novo filters
Remove off-target, common variants,
and variants not in known disease
causing genes
Zemojtel et al., manuscript in press. http://compbio.charite.de/PhenIX/
Target panel of 2,742 known
Mendelian disease genes
Compare
phenotype
profiles using
data from:
HGMD, Clinvar,
OMIM, Orphanet
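The variant filtering step described above can be sketched as a simple predicate. This is an illustration only: the gene names, the frequency cutoff, and the `Variant` fields are placeholders, and the recessive and de novo inheritance filters are left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool              # falls inside the captured target region
    population_frequency: float  # allele frequency in a reference population


# Placeholder values: the real panel has 2,742 genes, and the frequency
# cutoff used by PhenIX is not given on the slide.
DISEASE_GENES = {"GENE_A", "GENE_B"}
MAX_FREQUENCY = 0.01


def passes_filters(variant: Variant) -> bool:
    """Keep rare, on-target variants in known disease genes."""
    return (
        variant.on_target
        and variant.population_frequency <= MAX_FREQUENCY
        and variant.gene in DISEASE_GENES
    )


variants = [
    Variant("GENE_A", True, 0.0001),   # kept
    Variant("GENE_A", True, 0.20),     # common variant, removed
    Variant("GENE_B", False, 0.0001),  # off-target, removed
    Variant("GENE_C", True, 0.0001),   # not a known disease gene, removed
]
kept = [v for v in variants if passes_filters(v)]
```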
PhenIX performance testing
Simulated datasets for a given disease and inheritance model were created by spiking
a VCF file generated from the DAG panel with mutations from HGMD
PhenIX helped diagnose 11/38 patients
global developmental delay (HP:0001263)
delayed speech and language development (HP:0000750)
motor delay (HP:0001270)
proportionate short stature (HP:0003508)
microcephaly (HP:0000252)
feeding difficulties (HP:0011968)
congenital megaloureter (HP:0008676)
cone-shaped epiphysis of the phalanges of the hand (HP:0010230)
sacral dimple (HP:0000960)
hyperpigmented/hypopigmented macules (HP:0007441)
hypertelorism (HP:0000316)
abnormality of the midface (HP:0000309)
flat nose (HP:0000457)
thick lower lip vermilion (HP:0000179)
thick upper lip vermilion (HP:0000215)
full cheeks (HP:0000293)
short neck (HP:0000470)
What to do when we can’t
diagnose with a known
disease?
Outline
• Semantic Diagnosis of known diseases
• Semantic similarity across species
• Combining Exome analysis with cross-species semantic phenotyping
• How much phenotyping is enough?
B6.Cg-Alms1foz/foz/J
increased weight,
adipose tissue volume,
glucose homeostasis altered
ALMS1 (NM_015120.4)
[c.10775delC] + [-]
GENOTYPE
PHENOTYPE
obesity,
diabetes mellitus,
insulin resistance
increased food intake,
hyperglycemia,
insulin resistance
kcnj11c14/c14; insrt143/+(AB)
Models recapitulate various
phenotypic aspects of
disease
?
How much phenotype data?
• Human genes have poor phenotype coverage
GWAS
+
ClinVar
+
OMIM
• What else can we leverage? …animal models
Orthology via PANTHER v9
How much phenotype data?
• Combined, human and model phenotypes can be linked to
>75% human genes.
Orthology via PANTHER v9
Monarch phenotype data
Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD;
Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions;
Drugbank; AutDB; Allen Brain …157 sources to date
Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
Species  Data source  Genes  Genotypes  Variants  Phenotype annotations  Diseases
mouse MGI 13,433 59,087 34,895 271,621
fish ZFIN 7,612 25,588 17,244 81,406
fly Flybase 27,951 91,096 108,348 267,900
worm Wormbase 23,379 15,796 10,944 543,874
human HPOA 112,602 7,401
human OMIM 2,970 4,437 3,651
human ClinVar 3,215 100,523 445,241 4,056
human KEGG 2,509 3,927 1,159
human ORPHANET 3,113 5,690 3,064
human CTD 7,414 23,320 4,912
Survey of Annotations Disease/Model Corpus
Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology
https://code.google.com/p/phenotype-ontologies/
➔Models have a different phenotype distribution
Multiple ways to compare disease
to models
• Asserted models
• Inferred by orthology
• Inferred by gene enrichment
• Inferred by phenotypic similarity
Models based on phenotypic
similarity
Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009).
Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol,
7(11). doi:10.1371/journal.pbio.1000247
Problem: Clinical and model
phenotypes are described differently
lung
lung
lobular organ
parenchymatous
organ
solid organ
pleural sac
thoracic
cavity organ
thoracic
cavity
abnormal lung
morphology
abnormal respiratory
system morphology
Mammalian Phenotype
Mouse Anatomy
FMA
abnormal pulmonary
acinus morphology
abnormal pulmonary
alveolus morphology
lung
alveolus
organ system
respiratory
system
Lower
respiratory
tract
alveolar sac
pulmonary
acinus
organ system
respiratory
system
Human development
lung
lung bud
respiratory
primordium
pharyngeal region
Another Problem: Data silos
develops_from
part_of
is_a (SubClassOf)
surrounded_by
Solution: bridging semantics
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative
multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
anatomical
structure
endoderm of
foregut
lung bud
lung
respiration organ
organ
foregut
alveolus
alveolus of lung
organ part
FMA:lung
MA:lung
endoderm
GO: respiratory
gaseous exchange
MA:lung
alveolus
FMA:
pulmonary
alveolus
is_a (taxon equivalent)
develops_from
part_of
is_a (SubClassOf)
capable_of
NCBITaxon: Mammalia
EHDAA:
lung bud
only_in_taxon
pulmonary acinus
alveolar sac
lung primordium
swim bladder
respiratory
primordium
NCBITaxon:
Actinopterygii
Haendel, M. A. et al. (2014). Unification of multi-species vertebrate anatomy ontologies for comparative
biology in Uberon. Journal of Biomedical Semantics 2014, 5:21. doi:10.1186/2041-1480-5-21
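The bridging idea can be sketched as a lookup from species-specific terms to a shared class. The labels follow the diagram above; modelling each link as a one-to-one dictionary entry is a simplification assumed here, and the function names are hypothetical rather than part of Uberon.

```python
# Cross-references from species-specific anatomy terms to a shared class.
# Labels follow the diagram; treating each link as a simple one-to-one
# lookup is a simplification made for this sketch.
BRIDGE = {
    "FMA:lung": "lung",
    "MA:lung": "lung",
    "EHDAA:lung bud": "lung bud",
    "MA:lung alveolus": "alveolus of lung",
    "FMA:pulmonary alveolus": "alveolus of lung",
}


def shared_class(term):
    """Return the species-neutral class for a term, or None if unmapped."""
    return BRIDGE.get(term)


def same_structure(term_a, term_b):
    """True when two source terms resolve to the same shared class."""
    a, b = shared_class(term_a), shared_class(term_b)
    return a is not None and a == b


mouse_vs_human_lung = same_structure("MA:lung", "FMA:lung")
lung_vs_alveolus = same_structure("MA:lung", "FMA:pulmonary alveolus")
```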
Modular phenotype description
Entity (Anatomy, Spatial, Gene Ontology)
BSPO: anterior region part_of ZFA:head
ZFA:heart
ZFA:ventral mandibular arch
GO:swim bladder inflation
Quality (PATO)
Small size
Edematous
Thick
Arrested
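A modular description pairs an entity with a quality. A minimal sketch of that data structure follows; the class name is hypothetical, and the entity/quality pairings in the examples are illustrative because the slide lists the two columns without stating which quality goes with which entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQPhenotype:
    entity: str                    # anatomy, spatial, or GO term
    quality: str                   # PATO quality
    part_of: Optional[str] = None  # optional location of the entity

    def describe(self) -> str:
        location = f" part_of {self.part_of}" if self.part_of else ""
        return f"{self.entity}{location}: {self.quality}"


# Illustrative pairings only.
phenotypes = [
    EQPhenotype("ZFA:heart", "Edematous"),
    EQPhenotype("BSPO:anterior region", "Small size", part_of="ZFA:head"),
    EQPhenotype("GO:swim bladder inflation", "Arrested"),
]
descriptions = [p.describe() for p in phenotypes]
```

Keeping entity and quality separate means either half can be queried or generalized on its own.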
Mammalian Phenotype Ontology
Smith et al. (2005). The Mammalian Phenotype Ontology as a
tool for annotating, analyzing and comparing phenotypic
information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7
10,097 terms used to
annotate and query:
• Genotypes
• Alleles
• Genes
In mice
abnormal
pancreatic
beta cell
mass
abnormal
pancreatic
beta cell
morphology
abnormal
pancreatic islet
morphology
abnormal
endocrine
pancreas
morphology
abnormal
pancreatic
beta cell
differentiation
abnormal
pancreatic
alpha cell
morphology
abnormal
pancreatic
alpha cell
differentiation
abnormal
pancreatic
alpha cell
number
Phenotype representation requires
more than “phenotype ontologies”
glucose
metabolism
(GO:0006006)
Gene/protein
function data
glucose
(CHEBI:172
34)
Metabolomics,
toxicogenomics
data
Disease &
phenotype
data
type II
diabetes
mellitus
(DOID:9352)
pyruvate
(CHEBI:153
61)
Disease Gene Ontology Chemical
pancreatic
beta cell
(CL:0000169)
transcriptomic
data
Cell
Uberpheno – building a cross-
species semantic framework
Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with
gene annotations for biomedical research F1000Research 2014, 2:30
Uberpheno construction
OWLsim: Phenotype similarity
across patients or organisms
Unstable
posture
Constipation
Neuronal loss in
Substantia Nigra
Shuffling gait
Resting tremors
REM disorder
Hyposmia
poor rotarod
performance
decreased gut
peristalsis
axon
degeneration
decreased
stride length
stereotypic
behavior
abnormal
EEG
failure to find
food
abnormal
coordination
abnormal
digestive
physiology
CNS neuron
degeneration
abnormal
locomotion
abnormal
motor function
sleep
disturbance
abnormal
olfaction
https://code.google.com/p/owltools/wiki/OwlSim
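The comparison above works because clinical and model terms share more general ancestors. The sketch below is not OWLSim's algorithm; it uses a plain Jaccard index over shared subsuming terms, and the term-to-subsumer pairings are assumptions read off the order of labels on the slide.

```python
# Map each specific term to a more general subsuming term. The pairings
# are assumptions inferred from the slide layout, not curated mappings.
SUBSUMER = {
    "Shuffling gait": "abnormal locomotion",
    "decreased stride length": "abnormal locomotion",
    "Constipation": "abnormal digestive physiology",
    "decreased gut peristalsis": "abnormal digestive physiology",
    "Hyposmia": "abnormal olfaction",
    "failure to find food": "abnormal olfaction",
    "Resting tremors": "abnormal motor function",
}


def generalize(profile):
    """Replace each term with its subsumer; unmapped terms are dropped."""
    return {SUBSUMER[t] for t in profile if t in SUBSUMER}


def jaccard(profile_a, profile_b):
    """Jaccard index of two profiles after generalization."""
    a, b = generalize(profile_a), generalize(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = ["Shuffling gait", "Constipation", "Hyposmia", "Resting tremors"]
model = ["decreased stride length", "decreased gut peristalsis",
         "failure to find food"]

score = jaccard(patient, model)                 # similarity via subsumers
exact_overlap = len(set(patient) & set(model))  # literal term matches
```

The two profiles share no literal terms, yet three of the four generalized terms match.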
Visualizing phenotypic similarity
➔Each model recapitulates some of the disease
phenotypes
Holoprosencephaly I (unknown gene, mapped to 21q22.3)
compared to most similar mouse models
Models of disease based on
phenotypic similarity
Holoprosencephaly I (unknown gene, mapped to 21q22.3)
compared to most similar mouse models
➔The ontologies enable comparison across species
Outline
• Semantic Diagnosis of known diseases
• Semantic similarity across species
• Combining Exome analysis with cross-species semantic phenotyping
• How much phenotyping is enough?
https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
Exomiser results for the
Undiagnosed Disease Program
• 11 previously diagnosed families
Exomiser 2.0 identified the causative variants
with a rank of at least 7/408 potential variants
• 23 families without identified disorders
We have now prioritized variants in STIM1,
ATP13A2, PANK2, and CSF1R in 5 different
families (2 STIM1 families)
Exomiser performance on
solved UDP cases
[Bar chart: y-axis 0 to 11; series: top candidate, top 5, top 10; categories: Exo Variant, Exo Pheno, Exo, Exo no Mendelian, Exo Novel]
UDP_2731 candidates
Chromosome  Position   Ref  Alt  Gene     Phenotype score  Variant score  Exomiser score
chrX        19554576   T    C    SH3KBP1  0.5051473        0.995576       0.7503617
chr2        179658310  T    C    TTN      0.64627105       0.79311335     0.71969223
chr2        179632598  C    T    TTN      0.64627105       0.79311335     0.71969223
chr2        179567340  G    A    TTN      0.64627105       0.79311335     0.71969223
chr2        179553542  G    T    TTN      0.64627105       0.79311335     0.71969223
chr2        179549131  C    T    TTN      0.64627105       0.79311335     0.71969223
chrX        140993905  -    GCTCCTTCTCCTCCACTTTATTGAGTATTTTCCAGAGTTCCCCTGAGAGAAGTCAGAGAACTTCTGAGGGTTTTGCACAGTCTCCTCTCCAGATTCCTGTGAGCT  MAGEC1  0.5416666  0.85  0.6958333
chr6        30858858   G    A    DDR1     0.37619072       1              0.68809533
chr3        129308149  AGCCTCCCACCCCCACCCCCTCCCCACATCCCCAACCATACCTACCTTGAGA  -  PLXND1  0.34432834  0.95  0.64716417
chr5        37245866   G    A    C5orf42  0.7855199        0.5            0.6427599
chr5        37169169   T    C    C5orf42  0.7855199        0.5            0.6427599
chr6        42946264   G    A    PEX6     0.7187602        0.5            0.6093801
chr6        42931861   G    A    PEX6     0.7187602        0.5            0.6093801
chrX        53113897   G    C    TSPYL2   0.59999996       0.4906897      0.5453448
chr13       75911097   T    C    TBC1D4   0.23643239       0.7895149      0.51297367
chr13       75900510   G    A    TBC1D4   0.23643239       0.7895149      0.51297367
chr13       75861174   -    A    TBC1D4   0.23643239       0.7895149      0.51297367
chr18       67836115   G    T    RTTN     0.7629328        0.25979215     0.51136243
chr18       67721492   G    C    RTTN     0.7629328        0.25979215     0.51136243
chr18       67673764   T    C    RTTN     0.7629328        0.25979215     0.51136243
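In the candidate rows above, the Exomiser score matches the arithmetic mean of the phenotype score and the variant score to the printed precision. The slides do not say whether Exomiser computes it this way in general, so the sketch below treats the mean as an observation about this table only.

```python
# (gene, phenotype score, variant score) taken from the candidate list.
candidates = [
    ("RTTN", 0.7629328, 0.25979215),
    ("SH3KBP1", 0.5051473, 0.995576),
    ("TTN", 0.64627105, 0.79311335),
    ("MAGEC1", 0.5416666, 0.85),
]


def combined(phenotype_score, variant_score):
    """Arithmetic mean; an assumption that fits the rows shown."""
    return (phenotype_score + variant_score) / 2


# Rank genes from highest to lowest combined score.
ranked = sorted(
    ((gene, combined(p, v)) for gene, p, v in candidates),
    key=lambda item: item[1],
    reverse=True,
)
ranked_genes = [gene for gene, _ in ranked]
```

RTTN has the best phenotype match of these four genes but a weak variant score, so it ranks last; SH3KBP1 ranks first on the strength of its variant score.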
UDP_2731
Behavioural/
Psychiatric
Abnormality
Thyroid
stimulating
hormone excess
Gait apraxia
Spasticity
increased
exploration in new
environment
increased
dopamine level
hyperactivity
hyperactivity
Behavioral
abnormality
Abnormality of
the endocrine
system
abnormal
locomotor
behavior
Abnormal
voluntary
movement
Patient
phenotypes Sh3kbp1 tm1Ivdi -/-
What if there aren’t any similar
diseases or models?
YARS
MARS
IARSIL41L
AARSIARS2
Abnormal
stereopsis
Choreoathetosis
Microcephaly
Akinesia
Visual impairment
Myoclonus
Microcephaly
Myoclonus
abnormal visual
perception
Involuntary
movements
Microcephaly
musculoskeletal
movement
phenotype
Patient
phenotypes
Combined Oxidative
Phosphorylation
Deficiency 14
FARS2
WARS2
?
AIMP1
UDP_1166
➔ Exomiser can utilize phenotypic similarity via the
interactome
Outline
• Semantic Diagnosis of known diseases
• Semantic similarity across species
• Combining Exome analysis with cross-species semantic phenotyping
• How much phenotyping is enough?
How does the clinician know they’ve
provided enough phenotyping?
• How many annotations…?
• How many different categories?
• How many within each?
Method
• Create a variety of “derived” diseases that are less specific
• Assess the change in similarity between the derived
disease and its parent.
• Ask questions:
  • Is the derived disease still considered similar to
  the original disease?
  • …or more similar to a different disease?
  • Is it distinguishable beyond random?
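The method can be sketched with toy data: drop one phenotypic category from a disease profile, then check whether the derived profile is still closer to its parent than to another disease. The profiles, category names, and the Jaccard measure are all assumptions for illustration; the slides do not state which similarity measure was used.

```python
def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def derive(profile, dropped_category):
    """Return the profile without any annotation in `dropped_category`."""
    return {(cat, term) for cat, term in profile if cat != dropped_category}


# Hypothetical profiles: sets of (category, phenotype) pairs.
disease = {
    ("skeletal", "s1"), ("skeletal", "s2"),
    ("muscle", "m1"), ("muscle", "m2"),
    ("eye", "e1"), ("eye", "e2"),
}
other_disease = {
    ("skeletal", "s1"), ("skeletal", "s2"), ("skeletal", "s3"),
    ("eye", "e9"),
}

derived = derive(disease, "muscle")
sim_parent = jaccard(derived, disease)       # 4 shared of 6 total
sim_other = jaccard(derived, other_disease)  # 2 shared of 6 total
still_closest_to_parent = sim_parent > sim_other
```

Repeating this for each category and each disease gives the kind of averaged comparison the later slides report.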
Image credit: Viljoen and Beighton, J Med Genet. 1992
Example: Schwartz-Jampel Syndrome, Type I
• Rare disease
• Caused by mutation in HSPG2, which encodes a
proteoglycan
~100 phenotype annotations
Example: Schwartz-Jampel Syndrome, Type I
to test influence of a single
phenotypic category
Schwartz-Jampel Syndrome derivations
➔When averaged over all diseases, the absence of a
single phenotypic category has far less impact when
there’s more breadth in annotations
How much phenotyping is
enough?
• How many annotations…?
• How many different categories?
• How many within each?
Annotation Sufficiency Score
• Measurement of breadth and depth of a phenotype
profile
• Uses human disease, mouse and fish* gene phenotype
profiles to seed the individual phenotype scores
• Custom queries available via REST services
• http://monarchinitiative.org/page/services
*soon to add more species
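Breadth and depth can be illustrated with a toy calculation. This is not the Monarch annotation sufficiency score, whose formula is not given in the slides; the category list and both measures are assumptions chosen only to show the two dimensions.

```python
from collections import Counter

# Hypothetical list of phenotypic categories.
ALL_CATEGORIES = ["skeletal", "muscle", "eye", "nervous system"]


def breadth_and_depth(profile):
    """Return (fraction of categories covered, mean annotations per
    covered category) for a list of (category, phenotype) pairs."""
    counts = Counter(category for category, _ in profile)
    breadth = len(counts) / len(ALL_CATEGORIES)
    depth = sum(counts.values()) / len(counts) if counts else 0.0
    return breadth, depth


narrow = [("skeletal", "s1"), ("skeletal", "s2"),
          ("skeletal", "s3"), ("skeletal", "s4")]
broad = [("skeletal", "s1"), ("muscle", "m1"),
         ("eye", "e1"), ("nervous system", "n1")]

narrow_result = breadth_and_depth(narrow)
broad_result = breadth_and_depth(broad)
```

Both profiles contain four annotations, but one is deep in a single category and the other is shallow across all four.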
Annotation Sufficiency Score
http://www.phenotips.org
http://www.monarchinitiative.org
Conclusions
• Semantic representation of patient phenotypes
can aid disease diagnosis
• There exists a lot of phenotype data in model
organisms that is complementary to known human
data
• Ontological integration and use of cross-species
inferencing can aid prioritization of variants
• The entire cross-species corpus can be utilized to
support quality assurance processes for
phenotype data capture
NIH-UDP
William Bone
Murat Sincan
David Adams
Amanda Links
David Draper
Joie Davis
Neal Boerkoel
Cyndi Tifft
Bill Gahl
OHSU
Nicole Vasilevsky
Matt Brush
Bryan Laraway
Shahim Essaid
Lawrence Berkeley
Nicole Washington
Suzanna Lewis
Chris Mungall
UCSD
Amarnath Gupta
Jeff Grethe
Anita Bandrowski
Maryann Martone
U of Pitt
Chuck Borromeo
Jeremy Espino
Becky Boes
Harry Hochheiser
Acknowledgments
Sanger
Anika Oellrich
Jules Jacobsen
Damian Smedley
Toronto
Marta Girdea
Sergiu Dumitriu
Heather Trang
Mike Brudno
JAX
Cynthia Smith
Charité
Sebastian Köhler
Sandra Doelken
Sebastian Bauer
Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C
Candidate gene prioritization
Phenotypic information
Genetic information
Gene/gene product info
Phenotypes
collected for
individual patients
Sequences from an
individual, family, or
related group
Candidate interpretation
Human sequence reference
sequences (e.g. reference
sequence, 1K genome data,
genomic location)
Community phenotype data (e.g.
literature, MODs, KOMP2, OMIM,
EHRs, GWAS, ClinVar, disease-specific
repositories, etc.)
Pathway
Functional (GO)
Gene
expression,
OMICS data
Protein-Protein
Interactions
Enrichment analysis
(e.g. GATACA, Galaxy)
Combined variant +
phenotype candidate
reporting (e.g. Exomiser)
Biomedical knowledge
Individual's information
Phenotypic comparison
methods
Variant calling
(e.g. GATK)
Pathogenicity
/Impact
calling (e.g.
VAAST, SIFT)
Orthologs
Network module analysis
Survey of Annotations in Disease Corpus*
➔Most diseases impact >1 system
PhenoViz: Integrate all human, mouse, and
fish data to understand CNVs
Desktop application
for differential
diagnostics in CNVs
• Explain manifestations of CNV diseases based on genes
contained in CNV
E.g., Supravalvular aortic stenosis in Williams syndrome can be
explained by haploinsufficiency for elastin
• Double the number of explanations using model data
Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 

Use of semantic phenotyping to aid disease diagnosis

  • 1. Use of semantic phenotyping to aid disease diagnosis Melissa Haendel July 10th, 2014
  • 2. Outline • Semantic Diagnosis of known diseases • Semantic similarity across species • Combining Exome analysis with cross-species semantic phenotyping • How much phenotyping is enough?
  • 3. The undiagnosed patient • Known disorders not recognized during prior evaluations? • Atypical presentation of known disorders? • Combinations of several disorders? • Novel, unreported disorder?
  • 4. Searching for phenotypes using text alone is insufficient. OMIM query (# records): “large bone” (785); “enlarged bone” (156); “big bone” (16); “huge bones” (4); “massive bones” (28); “hyperplastic bones” (12); “hyperplastic bone” (40); “bone hyperplasia” (134); “increased bone growth” (612)
  • 5. The Challenge: Interpretation of Disease Candidates • What’s in the box? • How are candidates identified? • How do they compare? [Diagram: Phenotypes (P1, P2, P3, …) and Genotype info (G1, G2, G3, G4, …; pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics, …) enter a box marked “?”, which outputs prioritized candidates, models, and functional validation (M1, M2, M3, M4, ...)]
  • 6. What is an ontology? A set of logically defined, inter-related terms used to annotate data Use of common or logically related terms across databases enables integration Relationships between terms allow annotations to be grouped in scientifically meaningful ways Reasoning software enables computation of inferred knowledge Groups of annotations can be compared using semantic similarity algorithms
  • 7. Human Phenotype Ontology 10,158 terms used to annotate: • Patients • Disorders • Genotypes • Genes • Sequence variants In human Reduced pancreatic beta cells Abnormality of pancreatic islet cells Abnormality of endocrine pancreas physiology Pancreatic islet cell adenoma Pancreatic islet cell adenoma Insulinoma Multiple pancreatic beta-cell adenomas Abnormality of exocrine pancreas physiology Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
  • 8. A human phenotype example Abnormality of the eye Vitreous hemorrhage Abnormal eye morphology Abnormality of the cardiovascular system Abnormal eye physiology Hemorrhage of the eye Internal hemorrhage Abnormality of the globe Abnormality of blood circulation
  • 9. ➔Phenotype annotations are unevenly distributed across different anatomical systems Survey of Annotations in Disease Corpus 7,401 diseases 99,045 annotations
  • 10. Exome analysis: Recessive, De novo filters. Remove off-target, common variants, and variants not in known disease-causing genes. Target panel of 2,742 known Mendelian disease genes. Compare phenotype profiles using data from: HGMD, ClinVar, OMIM, Orphanet. Zemojtel et al., manuscript in press. http://compbio.charite.de/PhenIX/
  • 11. PhenIX performance testing Simulated datasets for a given disease and inheritance model created by spiking DAG panel generated VCF file with mutations from HGMD
  • 12. PhenIX helped diagnose 11/38 patients • global developmental delay (HP:0001263) • delayed speech and language development (HP:0000750) • motor delay (HP:0001270) • proportionate short stature (HP:0003508) • microcephaly (HP:0000252) • feeding difficulties (HP:0011968) • congenital megaloureter (HP:0008676) • cone-shaped epiphysis of the phalanges of the hand (HP:0010230) • sacral dimple (HP:0000960) • hyperpigmented/hypopigmented macules (HP:0007441) • hypertelorism (HP:0000316) • abnormality of the midface (HP:0000309) • flat nose (HP:0000457) • thick lower lip vermilion (HP:0000179) • thick upper lip vermilion (HP:0000215) • full cheeks (HP:0000293) • short neck (HP:0000470)
  • 13. What to do when we can’t diagnose with a known disease?
  • 14. Outline • Semantic Diagnosis of known diseases • Semantic similarity across species • Combining Exome analysis with cross-species semantic phenotyping • How much phenotyping is enough?
  • 15. Models recapitulate various phenotypic aspects of disease. GENOTYPE: PHENOTYPE • B6.Cg-Alms1foz/foz/J: increased weight, adipose tissue volume, glucose homeostasis altered • ALMS1 (NM_015120.4) [c.10775delC] + [-]: obesity, diabetes mellitus, insulin resistance • kcnj11c14/c14; insrt143/+ (AB): increased food intake, hyperglycemia, insulin resistance
  • 16. How much phenotype data? • Human genes have poor phenotype coverage GWAS + ClinVar + OMIM
  • 17. How much phenotype data? • Human genes have poor phenotype coverage • What else can we leverage? GWAS + ClinVar + OMIM
  • 18. How much phenotype data? • Human genes have poor phenotype coverage • What else can we leverage? …animal models Orthology via PANTHER v9
  • 19. How much phenotype data? • Combined, human and model phenotypes can be linked to >75% human genes. Orthology via PANTHER v9
  • 20. Monarch phenotype data. Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; PANTHER, HomoloGene orthologs; BioGRID interactions; DrugBank; AutDB; Allen Brain … 157 sources to date. Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse.
      Table columns: Species | Data source | Genes | Genotypes | Variants | Phenotype annotations | Diseases
      mouse | MGI | 13,433 | 59,087 | 34,895 | 271,621
      fish | ZFIN | 7,612 | 25,588 | 17,244 | 81,406
      fly | FlyBase | 27,951 | 91,096 | 108,348 | 267,900
      worm | WormBase | 23,379 | 15,796 | 10,944 | 543,874
      human | HPOA | 112,602 | 7,401
      human | OMIM | 2,970 | 4,437 | 3,651
      human | ClinVar | 3,215 | 100,523 | 445,241 | 4,056
      human | KEGG | 2,509 | 3,927 | 1,159
      human | ORPHANET | 3,113 | 5,690 | 3,064
      human | CTD | 7,414 | 23,320 | 4,912
  • 21. Survey of Annotations Disease/Model Corpus Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology https://code.google.com/p/phenotype-ontologies/ ➔Models have a different phenotype distribution
  • 22. Multiple ways to compare disease to models • Asserted models • Inferred by orthology • Inferred by gene enrichment • Inferred by phenotypic similarity
  • 23. Models based on phenotypic similarity Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
  • 24. Problem: Clinical and model phenotypes are described differently
  • 25. lung lung lobular organ parenchymatous organ solid organ pleural sac thoracic cavity organ thoracic cavity abnormal lung morphology abnormal respiratory system morphology Mammalian Phenotype Mouse Anatomy FMA abnormal pulmonary acinus morphology abnormal pulmonary alveolus morphology lung alveolus organ system respiratory system Lower respiratory tract alveolar sac pulmonary acinus organ system respiratory system Human development lung lung bud respiratory primordium pharyngeal region Another Problem: Data silos develops_from part_of is_a (SubClassOf) surrounded_by
  • 26. Solution: bridging semantics Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 anatomical structure endoderm of foregut lung bud lung respiration organ organ foregut alveolus alveolus of lung organ part FMA:lung MA:lung endoderm GO: respiratory gaseous exchange MA:lung alveolus FMA: pulmonary alveolus is_a (taxon equivalent) develops_from part_of is_a (SubClassOf) capable_of NCBITaxon: Mammalia EHDAA: lung bud only_in_taxon pulmonary acinus alveolar sac lung primordium swim bladder respiratory primordium NCBITaxon: Actinopterygii Haendel, M. A. et al. (2014). Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics 2014, 5:21. doi:10.1186/2041-1480-5-21
  • 27. Modular phenotype description Entity (Anatomy, Spatial, Gene Ontology) BSPO: anterior region part_of ZFA:head ZFA:heart ZFA:ventral mandibular arch GO:swim bladder inflation Quality (PATO) Small size Edematous Thick Arrested
  • 28. Mammalian Phenotype Ontology Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7 10,097 terms used to annotate and query: • Genotypes • Alleles • Genes In mice abnormal pancreatic beta cell mass abnormal pancreatic beta cell morphology abnormal pancreatic islet morphology abnormal endocrine pancreas morphology abnormal pancreatic beta cell differentiation abnormal pancreatic alpha cell morphology abnormal pancreatic alpha cell differentiation abnormal pancreatic alpha cell number
  • 29. Phenotype representation requires more than “phenotype ontologies” • Gene Ontology: glucose metabolism (GO:0006006), gene/protein function data • Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361), metabolomics, toxicogenomics data • Disease: type II diabetes mellitus (DOID:9352), disease & phenotype data • Cell: pancreatic beta cell (CL:0000169), transcriptomic data
  • 30. Uberpheno – building a cross-species semantic framework Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research F1000Research 2014, 2:30
  • 35. OWLsim: Phenotype similarity across patients or organisms Unstable posture Constipation Neuronal loss in Substantia Nigra Shuffling gait Resting tremors REM disorder Hyposmia poor rotarod performance decreased gut peristalsis axon degeneration decreased stride length stereotypic behavior abnormal EEG failure to find food abnormal coordination abnormal digestive physiology CNS neuron degeneration abnormal locomotion abnormal motor function sleep disturbance abnormal olfaction https://code.google.com/p/owltools/wiki/OwlSim
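    The idea on slide 35 can be sketched in a few lines of Python. This is a minimal illustration, not OWLsim's actual algorithm: it swaps in a plain Jaccard overlap, and it assumes that the seven human terms, seven mouse terms, and seven general classes on the slide line up by position (the slide text does not state the pairing explicitly). The `PARENT` mapping is a hypothetical one-level ontology built from that assumption.

```python
# Term triples read off slide 35 by position (an assumption, not stated in the slide):
# (human phenotype, mouse phenotype, shared more general class)
TRIPLES = [
    ("Unstable posture", "poor rotarod performance", "abnormal coordination"),
    ("Constipation", "decreased gut peristalsis", "abnormal digestive physiology"),
    ("Neuronal loss in Substantia Nigra", "axon degeneration", "CNS neuron degeneration"),
    ("Shuffling gait", "decreased stride length", "abnormal locomotion"),
    ("Resting tremors", "stereotypic behavior", "abnormal motor function"),
    ("REM disorder", "abnormal EEG", "sleep disturbance"),
    ("Hyposmia", "failure to find food", "abnormal olfaction"),
]

# Hypothetical one-level ontology: every specific term has a single parent class.
PARENT = {}
for human, mouse, shared in TRIPLES:
    PARENT[human] = shared
    PARENT[mouse] = shared


def closure(profile):
    """Return the profile plus the parent class inferred for each term."""
    inferred = set(profile)
    for term in profile:
        if term in PARENT:
            inferred.add(PARENT[term])
    return inferred


def jaccard(a, b):
    """Jaccard overlap of two term collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


human_profile = [h for h, _, _ in TRIPLES]
mouse_profile = [m for _, m, _ in TRIPLES]

# Comparing the asserted terms alone finds nothing in common.
raw = jaccard(human_profile, mouse_profile)
# Comparing after inference finds the shared general classes.
semantic = jaccard(closure(human_profile), closure(mouse_profile))

print(f"overlap of asserted terms: {raw:.2f}")
print(f"overlap with inferred classes: {semantic:.2f}")
```

    With no shared vocabulary the two profiles look unrelated, while the inferred classes expose the overlap. Production tools typically weight each shared class by how informative it is rather than counting all classes equally.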
  • 36. Visualizing phenotypic similarity ➔Each model recapitulates some of the disease phenotypes Holoprosencephaly I (unknown gene, mapped to 21q22.3) compared to most similar mouse models
  • 37. Models of disease based on phenotypic similarity Holoprosencephaly I (unknown gene, mapped to 21q22.3) compared to most similar mouse models ➔The ontologies enable comparison across species
  • 38. Outline • Semantic Diagnosis of known diseases • Semantic similarity across species • Combining Exome analysis with cross-species semantic phenotyping • How much phenotyping is enough?
  • 40. Exomiser results for the Undiagnosed Disease Program • 11 previously diagnosed families: Exomiser 2.0 identified the causative variants with a rank of at least 7/408 potential variants • 23 families without identified disorders: We have now prioritized variants in STIM1, ATP13A2, PANK2, and CSF1R in 5 different families (2 STIM1 families)
  • 41. Exomiser performance on solved UDP cases [Bar chart: number of solved cases (0 to 11) ranked as top candidate, top 5, or top 10 for each configuration: Exo Variant, Exo Pheno, Exo, Exo no Mendelian, Exo Novel]
  • 42. UDP_2731 candidates
      Chromosome | Position | Reference Allele | Variant Allele | Gene | Phenotype Score | Variant Score | Exomiser Score
      chrX | 19554576 | T | C | SH3KBP1 | 0.5051473 | 0.995576 | 0.7503617
      chr2 | 179658310 | T | C | TTN | 0.64627105 | 0.79311335 | 0.71969223
      chr2 | 179632598 | C | T | TTN | 0.64627105 | 0.79311335 | 0.71969223
      chr2 | 179567340 | G | A | TTN | 0.64627105 | 0.79311335 | 0.71969223
      chr2 | 179553542 | G | T | TTN | 0.64627105 | 0.79311335 | 0.71969223
      chr2 | 179549131 | C | T | TTN | 0.64627105 | 0.79311335 | 0.71969223
      chr18 | 67836115 | G | T | RTTN | 0.7629328 | 0.25979215 | 0.51136243
      chr18 | 67721492 | G | C | RTTN | 0.7629328 | 0.25979215 | 0.51136243
      chr18 | 67673764 | T | C | RTTN | 0.7629328 | 0.25979215 | 0.51136243
      chrX | 140993905 | - | GCTCCTTCTCCTCCACTTTATTGAGTATTTTCCAGAGTTCCCCTGAGAGAAGTCAGAGAACTTCTGAGGGTTTTGCACAGTCTCCTCTCCAGATTCCTGTGAGCT | MAGEC1 | 0.5416666 | 0.85 | 0.6958333
      chr6 | 30858858 | G | A | DDR1 | 0.37619072 | 1 | 0.68809533
      chr3 | 129308149 | AGCCTCCCACCCCCACCCCCTCCCCACATCCCCAACCATACCTACCTTGAGA | - | PLXND1 | 0.34432834 | 0.95 | 0.64716417
      chr5 | 37245866 | G | A | C5orf42 | 0.7855199 | 0.5 | 0.6427599
      chr5 | 37169169 | T | C | C5orf42 | 0.7855199 | 0.5 | 0.6427599
      chr6 | 42946264 | G | A | PEX6 | 0.7187602 | 0.5 | 0.6093801
      chr6 | 42931861 | G | A | PEX6 | 0.7187602 | 0.5 | 0.6093801
      chrX | 53113897 | G | C | TSPYL2 | 0.59999996 | 0.4906897 | 0.5453448
      chr13 | 75911097 | T | C | TBC1D4 | 0.23643239 | 0.7895149 | 0.51297367
      chr13 | 75900510 | G | A | TBC1D4 | 0.23643239 | 0.7895149 | 0.51297367
      chr13 | 75861174 | - | A | TBC1D4 | 0.23643239 | 0.7895149 | 0.51297367
      chr18 | 67836115 | G | T | RTTN | 0.7629328 | 0.25979215 | 0.51136243
      chr18 | 67721492 | G | C | RTTN | 0.7629328 | 0.25979215 | 0.51136243
      chr18 | 67673764 | T | C | RTTN | 0.7629328 | 0.25979215 | 0.51136243
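    In the candidate list on slide 42, each Exomiser Score appears to equal the arithmetic mean of the Phenotype score and the Variant Score. The short check below confirms this for a handful of rows copied from the slide. It is only an observation about these listed values, not a claim about how Exomiser combines scores in general; the function name `combined_score` is hypothetical.

```python
# Rows copied from slide 42: (gene, phenotype score, variant score, reported Exomiser score)
ROWS = [
    ("SH3KBP1", 0.5051473, 0.995576, 0.7503617),
    ("TTN", 0.64627105, 0.79311335, 0.71969223),
    ("RTTN", 0.7629328, 0.25979215, 0.51136243),
    ("MAGEC1", 0.5416666, 0.85, 0.6958333),
    ("DDR1", 0.37619072, 1.0, 0.68809533),
    ("PEX6", 0.7187602, 0.5, 0.6093801),
]


def combined_score(phenotype_score, variant_score):
    """Arithmetic mean of the two scores (the pattern observed in the table)."""
    return (phenotype_score + variant_score) / 2


# Largest gap between the mean and the reported score; rounding noise only.
max_gap = max(abs(combined_score(p, v) - reported) for _, p, v, reported in ROWS)

# Re-rank the genes by the recomputed score, highest first.
ranking = [gene for gene, p, v, _ in sorted(ROWS, key=lambda r: -combined_score(r[1], r[2]))]

for gene, p, v, reported in ROWS:
    print(f"{gene:8s} mean={combined_score(p, v):.7f} reported={reported:.7f}")
```

    The example also shows why RTTN ranks low despite the highest phenotype score in this subset: its variant score pulls the mean down, whereas SH3KBP1 combines a moderate phenotype match with a near-maximal variant score.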
  • 43. UDP_2731 Behavioural/ Psychiatric Abnormality Thyroid stimulating hormone excess Gait apraxia Spasticity increased exploration in new environment increased dopamine level hyperactivity hyperactivity Behavioral abnormality Abnormality of the endocrine system abnormal locomotor behavior Abnormal voluntary movement Patient phenotypes Sh3kbp1 tm1Ivdi -/-
  • 44. What if there aren’t any similar diseases or models? YARS MARS IARSIL41L AARSIARS2 Abnormal stereopsis Choreoathetosis Microcephaly Akinesia Visual impairment Myoclonus Microcephaly Myoclonus abnormal visual perception Involuntary movements Microcephaly musculoskeletal movement phenotype Patient phenotypes Combined Oxidative Phosphorylation Deficiency 14 FARS2 WARS2 ? AIMP1 UDP_1166 ➔ Exomiser can utilize phenotypic similarity via the interactome
  • 45. Outline • Semantic Diagnosis of known diseases • Semantic similarity across species • Combining Exome analysis with cross-species semantic phenotyping • How much phenotyping is enough?
  • 46. How does the clinician know they’ve provided enough phenotyping? • How many annotations…? • How many different categories? • How many within each?
  • 47. Method • Create a variety of “derived” diseases that are less specific • Assess the change in similarity between the derived disease and its parent • Ask questions: Is the derived disease still considered similar to the original disease? …or more similar to a different disease? Is it distinguishable beyond random?
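    The method on slide 47 can be sketched as follows. This is a toy illustration under stated assumptions: the disease names, term identifiers, and categories are invented placeholders, and a plain Jaccard overlap stands in for the semantic similarity measure used in the actual analysis. The helper names (`derive_without_category`, `best_match`) are hypothetical.

```python
# Toy disease profiles: term -> phenotypic category (all placeholder data).
DISEASES = {
    "disease_A": {
        "skeletal_1": "skeletal",
        "skeletal_2": "skeletal",
        "muscle_1": "muscle",
        "eye_1": "eye",
    },
    "disease_B": {
        "muscle_1": "muscle",
        "muscle_2": "muscle",
        "eye_2": "eye",
    },
}


def jaccard(a, b):
    """Jaccard overlap of two term collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def derive_without_category(profile, category):
    """Build a 'derived' disease by dropping every term in one category."""
    return {term for term, cat in profile.items() if cat != category}


def best_match(terms, diseases):
    """Name of the disease whose full profile is most similar to the given terms."""
    return max(diseases, key=lambda name: jaccard(terms, diseases[name]))


derived = derive_without_category(DISEASES["disease_A"], "skeletal")
to_parent = jaccard(derived, DISEASES["disease_A"])
to_other = jaccard(derived, DISEASES["disease_B"])

print(f"similarity to parent: {to_parent:.2f}")
print(f"similarity to other disease: {to_other:.2f}")
print(f"best match: {best_match(derived, DISEASES)}")
```

    Here the derived disease loses half its annotations yet still matches its parent best. Repeating the derivation for each category, and for many diseases, gives a picture of how much each category contributes to recognizability.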
  • 48. Example: Schwartz-Jampel Syndrome, Type I • Rare disease • Caused by mutation in HSPG2, which encodes a proteoglycan • ~100 phenotype annotations. Image credit: Viljoen and Beighton, J Med Genet. 1992
  • 49. Example: Schwartz-Jampel Syndrome, Type I to test influence of a single phenotypic category
  • 50. Schwartz-Jampel Syndrome derivations to test influence of a single phenotypic category
  • 52. Example: Schwartz-Jampel Syndrome, Type I ➔When averaged over all diseases, the absence of a single phenotypic category has far less impact when there’s more breadth in annotations
  • 53. How much phenotyping is enough? • How many annotations…? • How many different categories? • How many within each?
  • 54. Annotation Sufficiency Score • Measurement of breadth and depth of a phenotype profile • Uses human disease, mouse and fish* gene phenotype profiles to seed the individual phenotype scores • Custom queries available via REST services • http://monarchinitiative.org/page/services *soon to add more species
  • 56. Conclusions • Semantic representation of patient phenotypes can aid disease diagnosis • There exists a lot of phenotype data in model organisms that is complementary to known human data • Ontological integration and use of cross-species inferencing can aid prioritization of variants • The entire cross-species corpus can be utilized to support quality assurance processes for phenotype data capture
  • 57. Acknowledgments • NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Joie Davis, Neal Boerkoel, Cyndi Tifft, Bill Gahl • OHSU: Nicole Vasilevsky, Matt Brush, Bryan Laraway, Shahim Essaid • Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall • UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone • U of Pitt: Chuck Borromeo, Jeremy Espino, Becky Boes, Harry Hochheiser • Sanger: Anika Oellrich, Jules Jacobsen, Damian Smedley • Toronto: Marta Girdea, Sergiu Dumitriu, Heather Trang, Mike Brudno • JAX: Cynthia Smith • Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson • Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C
  • 59. Candidate gene prioritization • Phenotypic information • Genetic information • gene/gene product info • Phenotypes collected for individual patients • Sequences from an individual, family, or related group • Candidate interpretation • Human sequence reference sequences (e.g. reference sequence, 1K genome data, genomic location) • Community phenotype data (e.g. literature MODS, KOMP2, OMIM, EHRs, GWAS, ClinVar, disease specific repositories, etc.) • Pathway • Functional (GO) • Gene expression, OMICS data • Protein-Protein Interactions • Enrichment analysis (e.g. GATACA, Galaxy) • Combined variant + phenotype candidate reporting (e.g. Exomiser) • Biomedical Knowledge • Individual's Information • Phenotypic comparison methods • Variant calling (e.g. GATK) • Pathogenicity/Impact calling (e.g. VAAST, SIFT) • Orthologs • Network module analysis
  • 60. Survey of Annotations in Disease Corpus* ➔Most diseases impact >1 system
  • 61. PhenoViz: Integrate all human, mouse, and fish data to understand CNVs. Desktop application for differential diagnostics in CNVs • Explain manifestations of CNV diseases based on genes contained in CNV. E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin • Double the number of explanations using model data. Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72