Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

The Application of the Human Phenotype Ontology

Wird geladen in …3

Hier ansehen

1 von 76 Anzeige

The Application of the Human Phenotype Ontology

Herunterladen, um offline zu lesen

Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, Organized by the National Centre for Rare Diseases
Istituto Superiore di Sanità (ISS), Rome, Italy.

Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.

Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, Organized by the National Centre for Rare Diseases
Istituto Superiore di Sanità (ISS), Rome, Italy.

Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.


Weitere Verwandte Inhalte

Diashows für Sie (20)

Andere mochten auch (14)


Ähnlich wie The Application of the Human Phenotype Ontology (20)


The Application of the Human Phenotype Ontology

  2. 2. OUTLINE  Why phenotyping is hard  About Ontologies  Diagnosing known diseases  Getting the phenotype data  How much phenotyping is enough?  Model organism data for undiagnosed diseases
  3. 3. PHENOTYPIC BEGINNINGS http://rundangerously.blogspot.com/2010/06/inconvenient-summer- head-cold.html http://www.vetnext.com/images http://commons.wikimedia.org/wiki/File:CleftLip1.png Phenotyping SEEMS like a simple task, but there are shades of grey and nuances that are difficult to convey.
  4. 4. http://anthro.palomar.edu/abnormal/abnormal_4.htm http://www.pyroenergen.com/articles07/downs-syndrome.htm http://www.theguardian.com/commentisfree/2009/oct/27/downs-syndrome-increase-terminations
  5. 5. THE CONSTELLATION OF PHENOTYPES SIGNIFIES THE DISEASE – A ‘PROFILE’ http://www.learn.ppdictionary.com/prenatal_development_2.htm http://www.pyroenergen.com/articles07/do wns-syndrome.htm http://www.theguardian.com/commentisfree/2009/oct/27/downs-syndrome-increase-terminations http://anthro.palomar.edu/abnormal/abnormal_4.htm
  6. 6. CLINICAL PHENOTYPING Often free text or checkboxes Dysmorphic features • df • dysmorphic • dysmorphic faces • dysmorphic features Congenital malformation/anomaly: • congenital anomaly • congenital malformation • congenital anamoly • congenital anomly • congential anomaly • congentital anomaly • cong. m. • cong. Mal • cong. malfor • congenital malform • congenital m. • multiple congenital anomalies • multiple congenital abormalities • multiple congenital abnormalities Examples of lists: * dd. cong. malfor. behav. pro. * dd. mental retardation * df< delayed puberty * df&lt * dd df mr * mental retar.short stature
  7. 7. SEARCHING FOR PHENOTYPES USING TEXT ALONE IS INSUFFICIENT OMIM Query # Records “large bone” 785 “enlarged bone” 156 “big bone” 16 “huge bones” 4 “massive bones” 28 “hyperplastic bones” 12 “hyperplastic bone” 40 “bone hyperplasia” 134 “increased bone growth” 612
  8. 8. TERMS SHOULD BE WELL DEFINED SO THEY GET USED PROPERLY We need to capture synonyms and use unique labels
  9. 9. SO WHAT IS THE PROBLEM?  Obviously similar phenotype descriptions mean the same thing to you, but not to a computer:  generalized amyotrophy  generalized muscle, atrophy  muscular atrophy, generalized  Many publications have little information about the actual phenotypic features seen in patients with particular mutations  Databases cannot talk to one another about phenotypes
  10. 10. OUTLINE  Why phenotyping is hard  About Ontologies  Diagnosing known diseases  Getting the phenotype data  How much phenotyping is enough?  Model organism data for undiagnosed diseases
  11. 11. ONTOLOGIES CAN HELP. A controlled vocabulary of logically defined, inter-related terms used to annotate data  Use of common or logically related terms across databases enables integration  Relationships between terms allow annotations to be grouped in scientifically meaningful ways  Reasoning software enables computation of inferred knowledge  Some well known ontologies are SNOMED-CT, Foundational Model of Anatomy, Gene Ontology, Linnean Taxonomy of species
  13. 13. HUMAN PHENOTYPE ONTOLOGY Used to annotate: • Patients • Disorders • Genotypes • Genes • Sequence variants Abnormality of pancreatic islet cells Reduced pancreatic beta cells Abnormality of endocrine pancreas physiology Pancreatic islet cell adenoma Abnormality of exocrine pancreas physiology Pancreatic islet cell adenoma Insulinoma Multiple pancreatic beta-cell adenomas Mappings to SNOMED-CT, UMLS, MeSH, ICD, etc. Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
  14. 14. USING A CONTROLLED VOCABULARY TO LINK PHENOTYPES TO DISEASES Failure to Thrive Chromosome 21 Trisomy Flat Head Abnormal Ears Umbilical Hernia Broad Hands
  15. 15. SURVEY OF ANNOTATIONS IN DISEASE CORPUS 7000+ diseases OMIM + Orphanet + Decipher (ClinVar coming soon) 111,000+ annotations Phenotype annotations are unevenly distributed across different anatomical systems
  16. 16. HOW DOES HPO RELATE TO OTHER CLINICAL VOCABULARIES? Winnenburg and Bodenreider, ISMB PhenoDay, 2014
  17. 17. LOGICAL TERM DEFINITION Definitions are of the following Genus-Differentia form: X = a Y which has one or more differentiating characteristics. where X is the is_a parent of Y. Definition of a cylinder: Surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane. is_a is_a Definition: Blue cylinder = Cylinder that has color blue. Definition: Red cylinder = Cylinder that has color red.
  18. 18. ABOUT REASONERS A piece of software able to infer logical consequences from a set of asserted facts or axioms. They are used to check the logical consistency of the ontologies and to extend the ontologies with "inferred" facts or axioms For example, a reasoner would infer: Major premise: All mortals die. Minor premise: Some men are mortals. Conclusion: Some men die.
  19. 19. PHENOTYPES CAN BE CLASSIFIED IN MULTIPLE WAYS Abnormality of the eye Abnormal eye morphology Vitreous hemorrhage Abnormality of the cardiovascular system Abnormal eye physiology Hemorrhage of the eye Internal hemorrhage Abnormality of the globe Abnormality of blood circulation
  20. 20. PHENOTYPE MATCHING Patient Phenotype Resting tremors REM disorder Unstable posture Myopia Neuronal loss in Substantia Nigra Phenotype of known variant Resting tremors REM disorder Unstable posture Nystagmus Neuronal loss in Substantia Niagra
  21. 21. abn. of the eye abn. of the ocular region abn. of the eyelid abn. of globe localization or size abn. of the palpebral fissures hypertelorism downward slanting palpebral fissures Noonan Syndrome a) Syndrome term Query term Overlap between query and disease abn. of the eye abn. of the ocular region abn. of the eyelid abn. of globe localization or size abn. of the palpebral fissures downward slanting palpebral fissures Opitz Syndrome b) telecanthus hypertelorism c) Noonan Syndrome downward slanting palpebral fissures hypertelorism 3.78 3.05 Opitz Syndrome hypertelorism telecanthus 3.05 Query (Q) hypertelorism 2.45 downward slanting palpebral fissures (IC of abn. of the eyelid) sim(Q,Noonan) = 3.78 + 3.05 sim(Q,Opitz) =2.45 + 3.05 2 = 2.75 2 = 3.42
  23. 23. OUTLINE  Why phenotyping is hard  About Ontologies  Diagnosing known diseases  Getting the phenotype data  How much phenotyping is enough?  Model organism data for undiagnosed diseases
  24. 24. THE YET-TO-BE DIAGNOSED PATIENT  Known disorders not recognized during prior evaluations?  Atypical presentation of known disorders?  Combinations of several disorders?  Novel, unreported disorder?
  25. 25. EXOME ANALYSIS Remove off-target, common variants, and variants not in known disease causing genes Recessive, de novo filters http://compbio.charite.de/PhenIX/ Target panel of 2741 known Mendelian disease genes Compare phenotype profiles using data from: HGMD, Clinvar, OMIM, Orphanet Zemojtel et al. Sci Transl Med 3 September 2014: Vol. 6, Issue 252, p.252ra123
  26. 26. PHENIX PERFORMANCE TESTING Figure removed due to restrictions. Please see the paper: http://stm.sciencemag.org/content/6/252/252ra123.full Simulated datasets created by spiking DAG panel generated VCF file with the causative mutation removed
  27. 27. CONTROL PATIENTS WITH KNOWN MUTATIONS Inheritance Gene Average Rank AD ACVR1, ATL1, BRCA1, BRCA2, CHD7 (4), CLCN7, COL1A1, COL2A1, EXT1, FGFR2 (2), FGFR3, GDF5, KCNQ1, MLH1 (2), MLL2/KMT2D, MSH2, MSH6, MYBPC3, NF1 (6), P63, PTCH1, PTH1R (2), PTPN11 (2), SCN1A, SOS1, TRPS1, TSC1, WNT10A 1.7 AR ATM, ATP6V0A2, CLCN1 (2), LRP5, PYCR1, SLC39A4 5 X EFNB1, MECP2 (2), DMD, PHF6 1.8
  28. 28. WORKFLOW FOR CLINICAL EXOME Suspected genetic disease DRG sequencing Deep phenotyping ANALYSIS Top ranked candidates Clinical rounds Exclude candidate gene Sanger validation Cosegregation studies Fails Diagnosis Passes Inconsistent Consistent Reconsider short list Choose best candidate Variant Analysis Annotation sources: HGMD MAF (dbSNP, ESP) ClinVar Prediction criteria: Predicted pathogenicity Variant class Location in DRG target region Computational Phenotype Analysis HPO Annotation sources: OMIM, Orphanet, MGI... Semantic similarity Mode of inheritance Ontology: Prediction criteria: MP
  29. 29. PHENIX HELPED DIAGNOSE 11/40 PATIENTS global developmental delay (HP:0001263) delayed speech and language development (HP:0000750) motor delay (HP:0001270) proportionate short stature (HP:0003508) microcephaly (HP:0000252) feeding difficulties (HP:0011968) congenital megaloureter (HP:0008676) cone-shaped epiphysis of the phalanges of the hand (HP:0010230) sacral dimple (HP:0000960) hyperpigmentated/hypopigmentated macules (HP:0007441) hypertelorism (HP:0000316) abnormality of the midface (HP:0000309) flat nose (HP:0000457) thick lower lip vermilion (HP:0000179) thick upper lip vermilion (HP:0000215) full cheeks (HP:0000293) short neck (HP:0000470)
  30. 30. University of Queensland, University of Sydney
  31. 31. SKELETOME PATIENT ARCHIVE  Integration with the HPO, Orphanet, and Monarch Initiative  Automated phenotyping from clinical summaries  Collaborative diagnosis University of Queensland, University of Sydney
  32. 32. THE FACES OF RARE DISEASES  Screening and diagnosis  Treatment moni toring  Surgical planning and audi t  Genotype-phenotype correlation  Cross-species comparisons  Face to text conversion for text mining mm Non-invasive, non-irradiating deeply precise 3D facial analysis University of Western Australia
  33. 33. OUTLINE  Why phenotyping is hard  About Ontologies  Diagnosing known diseases  Getting the phenotype data  How much phenotyping is enough?  Model organism data for undiagnosed diseases
  35. 35. USING ONTOLOGIES IN THE CLINIC  Ontologies are large (HPO has > 10,000 terms) and difficult to navigate  Mapping data to an ontology post-visit is time consuming and prone to error  Best time to phenotype using ontologies is during the patient visit  Goals of PhenoTips  Make deep phenotyping simple  Make it “faster than paper”
  36. 36. PhenomeCentral is a Matchmaker  Lets you know about other similar patients  Lets you easily connect with other users Each Patient Record can be:  Public – Anyone can see the record  Private – Only specified users/consortia can see the record  Matchable – The record cannot be seen, but can be “discovered” by users who submit similar patients
  37. 37. STEP 1: ADD PATIENT  Can use the interface built into PhenomeCentral  Can export data directly from a local PhenoTips instance  Add a vcf file (or list of genes)  Set each record as Private, Public or Matchable
  40. 40. OUTLINE  Why phenotyping is hard  About Ontologies  Diagnosing known diseases  Getting the phenotype data  How much phenotyping is enough?  Model organism data for undiagnosed diseases
  41. 41. HOW MUCH PHENOTYPING IS ENOUGH?  How many annotations…?  How many different categories?  How many within each?
  42. 42. Not everything that counts can be counted and not everything that can be counted counts -Albert Einstein
  43. 43. METHOD: DERIVE BY CATEGORY REMOVAL  Remove annotations that are subclasses of a single high-level node  Repeat for each 1° subclass
  44. 44. Example: Schwartz-jampel Syndrome, Type I to test influence of a single phenotypic category
  45. 45. Example: Schwartz-jampel Syndrome derivations to test influence of a single phenotypic category
  46. 46. Schwartz-jampel Syndrome derivations
  47. 47. SEMANTIC SIMILARITY ALGORITHMS ARE ROBUST IN THE FACE OF MISSING INFORMATION Similarity of Derived Disease to Original Derived Disease Profile Rank  (avg) 92% of derived diseases are most-similar to original disease  Severity of impact follows proportion of phenotype
  48. 48. METHOD: DERIVE BY LIFTING  Iteratively map each class to their direct superclass(es)  Keep only leaf nodes
  49. 49. SEMANTIC SIMILARITY ALGORITHMS ARE SENSITIVE TO SPECIFICITY OF INFORMATION Similarity of Derived Disease to Original Derived Disease Profile Rank  Severity of impact increases with more-general phenotypes
  50. 50. ANNOTATION SUFFICIENCY SCORE http://monarchinitiative.org/page/services
  51. 51. OUTLINE  Why phenotyping is hard  About Ontologies  Diagnosing known diseases  Getting the phenotype data  How much phenotyping is enough?  Model organism data for undiagnosed diseases
  53. 53. MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS OF DISEASE B6.Cg-Alms1foz/fox/J increased weight, adipose tissue volume, glucose homeostasis altered ALSM1(NM_015120.4) [c.10775delC] + [-] GENOTYPE PHENOTYPE obesity, diabetes mellitus, insulin resistance increased food intake, hyperglycemia, insulin resistance kcnj11c14/c14; insrt143/+(AB) ?
  54. 54. HOW MUCH PHENOTYPE DATA?  Human genes have poor phenotype coverage GWAS + ClinVar + OMIM
  55. 55. HOW MUCH PHENOTYPE DATA?  Human genes have poor phenotype coverage What else can we leverage? GWAS + ClinVar + OMIM
  56. 56. HOW MUCH PHENOTYPE DATA?  Human genes have poor phenotype coverage What else can we leverage? …animal models Orthology via PANTHER v9
  58. 58. EACH MODEL CONTRIBUTES DIFFERENT PHENOTYPES Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology https://code.google.com/p/phenotype-ontologies/
  60. 60. SOLUTION: BRIDGING SEMANTICS anatomical structure endoderm of forgut lung bud respiration organ lung organ foregut is_a (SubClassOf) part_of develops_from capable_of alveolus organ part alveolus of lung FMA:lung MA:lung endoderm GO: respiratory gaseous exchange MA:lung alveolus FMA: pulmonary alveolus is_a (taxon equivalent) NCBITaxon: Mammalia EHDAA: lung bud only_in_taxon pulmonary acinus alveolar sac lung primordium swim bladder respiratory primordium NCBITaxon: Actinopterygii Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
  61. 61. PHENOTYPE REPRESENTATION REQUIRES MORE THAN “PHENOT YPE ONTOLOGIES Disease Gene Ontology Chemical glucose metabolism (GO:0006006) Gene/protein function data glucose (CHEBI:172 34) Metabolomics, toxicogenomics data type II diabetes mellitus (DOID:9352) Disease & phenotype data pyruvate (CHEBI:153 61) Cell pancreatic beta cell (CL:0000169) transcriptomic data
  62. 62. MODELS BASED ON PHENOTYPIC SIMILARITY Washington, Haendel, et al. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
  63. 63. OWLSIM: PHENOTYPE SIMILARITY ACROSS PATIENTS OR ORGANISMS find Resting tremors REM disorder Shuffling gait Unstable posture Neuronal loss in Substant ia Nigra Const ipat ion Hyposmia sterotypic behavior abnormal EEG decreased stride length poor rotarod performance axon degenerat ion decreased gut peristalsis failure to find food abnormal motor function sleep disturbance abnormal locomotion abnormal coordination CNS neuron degenerat ion abnormal digestive physiology abnormal olfaction https://code.google.com/p/owltools/wiki/OwlSim
  64. 64. MONARCH PHENOTYPE DATA Species Data source Genes Genotypes Variants Phenotype annotations Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse Diseases mouse MGI 13,433 59,087 34,895 271,621 fish ZFIN 7,612 25,588 17,244 81,406 fly Flybase 27,951 91,096 108,348 267,900 worm Wormbase 23,379 15,796 10,944 543,874 human HPOA 112,602 7,401 human OMIM 2,970 4,437 3,651 human ClinVar 3,215 100,523 445,241 4,056 human KEGG 2,509 3,927 1,159 human ORPHANET 3,113 5,690 3,064 human CTD 7,414 23,320 4,912
  65. 65. EXOMISER METHOD Cohort VCF file Homo rec De novo dom Compound het X-linked Exomiser Filters: Mendelian Frequency Candidates HP https://www.sanger.ac.uk/resources/databa ses/exomiser/query/exomiser2
  66. 66. EXOMISER RESULTS ON NIH UNDIAGNOSED DISEASE PROGRAM PATIENTS 9 previously diagnosed families Identified causative variants with a rank of at least 7/408 potential variants 21 families without identified disorders We have now prioritized variants in STIM1, ATP13A2, PANK2, and CSF1R in 5 different families (2 STIM1 families) Bone et al., submitted 
  67. 67. UDP_2731 Patient phenotypes Sh3kbp1 tm1Ivdi -/ - Gait apraxia Spasticity Thyroid stimulating hormone excess Behavioural/ Psychiatric Abnormality hyperactivity hyperactivity increased dopamine level increased exploration in new environment abnormal locomotor behavior Abnormal voluntary movement Abnormality of the endocrine system Behavioral abnormality
  68. 68. WALKING THE INTERACTOME Microcephaly Myoclonus Myoclonus YARS Microcephaly Akinesia Choreoathetosis Microcephaly musculoskeletal movement phenotype Involuntary movements IL41L IARS IARS2 MARS AARS Abnormal stereopsis Visual impairment abnormal visual perception Patient phenotypes Combined Oxidative Phosphorylation Deficiency 14 FARS2 WARS2 ? AIMP1 UDP_1166
  69. 69. FINDING COLLABORATORS FOR FUNCTIONAL VALIDATION Patient Phenotype profile Phenotyping experts
  70. 70. PHENOVIZ: INTEGRATE ALL HUMAN, MOUSE, AND FISH DATA TO UNDERSTAND CNVS Desktop application for differential diagnostics in CNVs  Explain manifestations of CNV diseases based on genes contained in CNV E.g., Supravalcular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin  Double the number of explanations using model data Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
  71. 71. A LOOK AT THE HPO
  72. 72. WHO USES THE HPO?  Bayés, Àlex, et al. Nature neuroscience 2011  Castellano, Sergi, et al. PNAS 2014  Corpas, Manuel, et al. " Current Protocols in Human Genetics 2012  Sifrim, Alejandro, et al. Nature methods 2013  Lappalainen, Ilkka, et al. Nucleic acids research 2013  Firth, Helen V., and Caroline F. Wright. Developmental Medicine & Child Neurology 2011  Many more…
  73. 73. ADVANTAGES OF HPO  Widely used, flexible, freely available, and community supported resource  Prioritization of candidate variants through tools such as PhenIX and Exomizer, and others  Extensive links to model organism ontologies, allowing selection of optimal models for wet-lab validation and research, and collaborators  Intuitive clinical interfaces built into tools such as PhenoTips, Certagenia, and others  Ability to easily share data with key international projects (Decipher/DDD, RD-Connect, PhenomeCentral, Matchmaker Exchange, etc.)
  74. 74. LIMITATIONS  Quantitative vs. qualitative – Much of clinical data is quantitative lab data with reference standards. It is possible to convert based on ±3 SD, but no way to record the reference measure/population yet.  Temporal presentation – ontologies can support temporal ordering, but data capture tools don’t yet capture this and the comparison algorithms don’t yet take it into account  Severity – semantic encoding is available, but simple in comparison to phenotype-specific measures  Emerging ontology – some areas have poor coverage, such as nervous system, behavior, and imaging results. Need to represent the assays in these contexts.
  75. 75. ACKNOWLEDGMENTS NIH-UDP William Bone Murat Sincan David Adams Amanda Links David Draper Joie Davis Neal Boerkoel Cyndi Tif f t Bill Gahl OHSU Nicole Vasilesky Matt Brush Bryan Laraway Shahim Essaid Lawrence Berkeley Nicole Washington Suzanna Lewis Chris Mungall UCSD Amarnath Gupta Jef f Grethe Anita Bandrowski Maryann Martone U of Pitt Chuck Boromeo Jeremy Espino Becky Boes Harry Hochheiser Sanger Anika Oehl r ich Jules Jacobson Damian Smedley Toronto Mar ta Gi rdea Sergiu Dumi t r iu Heather Trang Mike Brudno JAX Cynthia Smi th Charité Sebast ian Kohler Sandra Doelken Sebast ian Bauer Peter Robinson Funding: NIH Office of Director: 1R24OD011883 NIH-UDP: HHSN268201300036C, HHSN268201400093P
  76. 76. WHERE TO GET HPO, AND HOW TO REQUEST NEW CONTENT We need you! Browse in the following places: http://www.human-phenotype-ontology.org/ http://purl.bioontology.org/ontology/HP Get the file: http://purl.obolibrary.org/obo/hp.owl Request content: https://sourceforge.net/p/obo/human-phenotype-requests/new/ More Documentation: https://code.google.com/p/phenotype-ontologies/

Hinweis der Redaktion

  • A good example of this can be seen here. So the average person has had enough experience with Down’s Syndrome that they are likely able to notice that all three of these patients have it. However, if you asked them to describe this phenotype in short phrases that can be agreed upon by the majority of people, can be used to identify the disorder, and ideally can be easily used by a computer, they are going to have a difficult time. It is a collection of phenotypes “Together “ that define a disease and it is difficult for someone who does have the proper training to parse this information out.

    Down’s Syndrome good example

    Average person can identify Down’s , but can’t list out the Pheno for a computer

    Unless clinically TRAINED tough to outline a phenotype
  • But, through hard work, and being very specific with our description of phenotype we can begin to approach making this information manageable and use it to identify disease.

    Can make a COMPUTABLE list of phenotype with hard work
  • How can we characterize the diseases? Well, a distribution of the phenotypes across all diseases reveals that most have phenotypes affecting the nervous system, while the least have connective tissue phenotypes.
    7401 diseases
    99,045 annotations

    There are 20 categories in all. Note that they are not additive, as some phenotypes might belong to two categories. For example, some eye and ear phenotypes also belong to the head and neck.

    MESSAGE: Phenotype annotations unevenly distributed in different anatomical systems.
  • As you might expect, we use this phenotype ontology to search for known variants that have similar phenotype to our patient. Using just HPO and the disease annotated with HPO terms we can compare to the known human disease, but as we are all painfully aware of we do not know the consequences of mutations in every gene that is in the human genome.

    We use patient pheno and search for known variants to with similar pheno

    Can do this with Humans, but don’t know all Human disease
  • PhenIX usese human data and predicted deleteriousness

    HGMD ClinVAR OMIM Orphanet
  • HGMD mutations were inserted into variant files from DAG panels from which the causative mutations had been removed and phenotypic annotations of the corresponding diseases were extracted from the HPO database. The genes were ranked with PhenIX. Results were simulated either on the entire disease set (All) or by filtering for known autosomal dominant (AD) or autosomal recessive (AR) diseases (fig. S2). A total of 8504 (All), 3471 (AD), and 5006 (AR) simulations were performed.
  • See final figure in paper: http://stm.sciencemag.org/content/6/252/252ra123.full
  • A community-driven knowledge curation platform for skeletal dysplasias
    Editorial process for curating phenotype and genotype knowledge on skeletal dysplasias
    Integration with HPO
  • 1: Diagnostic Facial Signature via a vector diagram: shows directional differences versus a normative data set. It is a 2D representation of a 3D image. The ends of the coloured lines represent the patient’s (or averaged groups of patients’ ) image(s). The grey scale represents the normative data (normal equivalent). In this case, if the tips of the arrows are seen sticking out from the grey surface, then the patient (or group of patient’s) is characterised by a protrusion of that region. E.g. here the patient has a prominent nasal tip and prominent lips. If you were to rotate the 3D image you would see the lines in the cheek region pointing inwards i.e. the patient has a flat cheek region compared to the reference range.
    2: Monitoring treatment in a multisystem disorder. Progressive reduction in facial dysmorphology (anomaly), top to bottom, over treatment for a rare metabolic condition (Mucopolysaccharidosis type 1)
    3: Text mining: converting face to text, specifically, elements of morphology terms.
  • When we have a patient with an undiagnosed/rare disease, we want to be able to search the whole landscape of knowledge.

    How can we effectively utilize phenotype information collected about the patients?
    It all boils down to the question, how much is enough to be useful?
    What does that information need to be like in order to be useful?

    We have partnered with the UDP to try to help figure out how much information might be necessary to collect about the patients with rare diseases.
  • To illustrate the method, let’s take Schwartz-jampel Syndrome. It has ~100 phenotype annotations, distributed to 17 phenotypic categories. The majority of which are in the skeletal system.

    Problem in Hspg2, a proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules.

    For this example, I’ve taken a subset of the phenotypes, and colored them by category.
  • We can test the roll of category by creating a derived disease that removes all the phenotypes for that category as our “case”…
  • We can test the role of category by creating a derived disease that removes all the phenotypes for that category as our “case”…

    And then as a control, remove an equal amount of “information” from other categories.

    In the case of Schwartz-Jampel Syndrome, removing only skeletal phenotypes (which comprises 40% of phenotype profile) it significantly reduces its similarity, dropping it to only 86% similar, whereas removing the same amount of information from the controls gives an average of 91% similarity. In this case, there were 73 controls to compare to. We have performed these types of experiments for every disease profile in our corpus (approx 8K).

    The experiment therefore is:
    Create a variety of “derived” less-specific diseases
    Assess the change in similarity:
    Is the derived disease still considered similar to the original disease?
    …or more similar to a different disease?
    Is it distinguishable beyond random?

    Using the results, we can create a metric to define when a phenotype profile is unique enough to be useful in comparisons to other diseases and models systems.
  • Using the results, we can create a metric to define when a phenotype profile is unique enough to be useful in comparisons to other diseases and models systems.
    This has been implemented at UDP so that clinicians can provide quality human data to be able to compare to animal models.
    It has also been implemented on monarch genotype pages, where one can see how well annotated any given model is.

    This functionality is also available as a service call. We can also generate different types of reports based on these analyses to determine which diseases have the least coverage, which models are the least well annotated, etc.
  • Our abilities to link genotype to phenotype are constrained by our knowledgebase. To date, <40% of human genes have been directly linked to diseases or traits by ClinVar, OMIM, and GWAS. How are we going to discover the disease-gene links if the phenotype coverage is so poor?

    (Of course, there may be much more links for GWAS, once we figure out what to do with all the variants that lie outside of gene boundaries.
  • So, <40% of human genes have phenotypes. And when we look at the orthologs for each of the standard multi-cellular model organisms, there doesn’t appear to be any more than about 50% coverage for any given model.
  • But when put together, they bring the phenotypic coverage of human genes (either directly or inferred via orthology) up to nearly 80%. That is A LOT of coverage. How can we better tap that?
  • Since any computational methods rely heavily on the data, what does the available data look like?

    The distribution of phenotype information per model genotype is different compared to human disease annotations.
    For mouse, there’s a much higher representation of metabolic, cardiovascular, blood, and endocrine phenotypes available to compare;
    For fish, there’s increased nervous, skeletal, head and neck, and cardiovascular, and connective tissue.

    (Note that these do not include “normal” phenotypes for either diseases or genotypes.)

    Each model brings something different to the phenotype landscape.

  • Different terminology is used to describe clinical manifestations than is used to describe model system biological features.
  • Things like finding models of sirenomelia due to disruption of the lateral plate mesoderm . Helping to find models and gene candidates based on the relationships in the development
  • Without additional knowledge and linking, computers can’t make the connections. These links take us from the molecular to the protein, to the cellular and anatomical, to the disease level of phenotypes
  • OWLsim computes semantic similarity between sets of phenotypes within and across species using the bridging semantics. Phenotypes in common from the bridging ontologies relate human clinical phenotypes with model organism phenotypes.
    Examples include motor systems, olfaction, and digestion.
    In this case, data encoded using the human phenotype ontology has been made interoperable with mouse, zebrafish and other model system ontologies. This also enables the use of more complex algorithms to detect similarity – not bases solely on mapping or string matching; e.g. constipation and decreased gut peristalsis are both subtypes of abnormal digestive system physiology.
  • Run through pipeline: Exomiser LOT is a version of exomiser that is less restrictive as far as what transcripts it recognizes (not at worried about off target reads because of the Mendalian filters and the ability to look at the BAMS)


    Exomiser LOT

    Pheno only
  • Exomiser is an exome analysis tool that leverages the uberpheno cross species phentoype comparisons, standard exome filtering (pathogenicity, frequency, off targets, etc.), in combination with mendelian filtering and interactome walking. The original method paper is published: http://genome.cshlp.org/content/early/2013/10/25/gr.160325.113.abstract

    This work is unpublished but has been (just) submitted. Happy to share the manuscript if of interest.
  • Each model organism has a different suite of phenotypes that are examined, because different models are used to explore different types of biological function and malfunction. By using a diversity of model systems, we have the potential to identify candidates based on partial overlaps with the patient phenotype profile by looking at different models with mutations in potential candidates or related via interactions, co-expression, genomic regulatory region, etc.
  • Phenoviz is a new graphviz plugin that can be used as a standalone app for Windows, Mac, or Linux. The user uploads a list of CNVs detected by Array CGH (SNP Chips, or even genome sequence data would also work as a starting point, but the program expects a simple list). You also enter a list of the HPO terms observed in the patient. The application then tries to find “matches” based on the single gene disorders (human – HPO annotations) or the mouse models (mainly knockouts, MP annotations from MGI) or fish models (ZFIN E/Q annotations). This is being in the Charite Array CGH diagnostics service to help with interpretation of CNVs. Subjectively, the tool helps you to quickly find good candidates in order to write reports. The program also picks out the best matching CNV in case the user enters several (a typical array CGH finding in our lab has up to 50 CNVs, of which 2-5 are not found in databases of common variants like DGV).

  • There are a lot of people who have contributed to this work over many years. 