Why the world needs phenopacketeers, and how to be one

mhaendel
28 Apr 2016

  1. Why the world needs PhenoPacketeers, and how to be one. Melissa Haendel, PhD, April 2016, Biocuration 2016. @monarchinit @ontowonka haendel@ohsu.edu
  2. What is a phenotype?
  3. Biology central dogma (adapted from http://www.xkcd.com/295/)
  4. Genes + Environment = Phenotypes: the biology central dogma. Standards for encoding and exchanging data must be up to these challenges. This is where you come in.
  5. Genes + Environment = Phenotypes. Computable encodings are essential: base pairs, variant notation (e.g., HGVS), Human Phenotype Ontology, Mammalian Phenotype Ontology, medical procedure coding, Environment Ontology.
  6. Genes + Environment = Phenotypes. Standard exchange formats exist for genes (VCF, GFF, BED)… but for phenotypes? Environment? (PXF)
  7. The relationships too must be captured. It is not just the bits…
     • G-P or D (disease): causes, contributes to, is risk factor for, protects against, correlates with, is marker for, modulates, involved in, increases susceptibility to
     • G-G (kind of): regulates, negatively regulates (inhibits), positively regulates (activates), directly regulates, interacts with, co-localizes with, co-expressed with
     • P/D-P/D: part of, results in, co-occurs with, correlates with, hallmark of (P->D)
     • E-P: contributes to (E->P), influences (E->P), exacerbates (E->P), manifest in (P->E)
     • G-E (kind of): expressed in, expressed during, contains, inactivated by
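The relation vocabulary on this slide can be made machine-checkable. A minimal Python sketch, assuming association kinds are keyed by hypothetical labels such as "G-P/D" (not taken from any published schema): each association states which kind of link it represents, and a relation not listed for that kind is rejected. Direction notes such as (E->P) are left out for brevity.

```python
from dataclasses import dataclass

# Relation labels copied from the slide, grouped by what they connect.
# The group keys ("G-P/D", "G-G", ...) are hypothetical names for this sketch.
RELATIONS: dict[str, frozenset[str]] = {
    "G-P/D": frozenset({
        "causes", "contributes to", "is risk factor for", "protects against",
        "correlates with", "is marker for", "modulates", "involved in",
        "increases susceptibility to",
    }),
    "G-G": frozenset({
        "regulates", "negatively regulates", "positively regulates",
        "directly regulates", "interacts with", "co-localizes with",
        "co-expressed with",
    }),
    "P/D-P/D": frozenset({
        "part of", "results in", "co-occurs with", "correlates with", "hallmark of",
    }),
    "E-P": frozenset({"contributes to", "influences", "exacerbates", "manifest in"}),
    "G-E": frozenset({"expressed in", "expressed during", "contains", "inactivated by"}),
}


@dataclass(frozen=True)
class Association:
    subject: str   # e.g. a gene or genotype identifier
    relation: str  # one of the labels in RELATIONS[kind]
    obj: str       # e.g. a phenotype or disease identifier
    kind: str      # which pair of categories this association links

    def __post_init__(self) -> None:
        # Reject unknown kinds and relations that do not belong to the kind.
        allowed = RELATIONS.get(self.kind)
        if allowed is None:
            raise ValueError(f"unknown association kind: {self.kind!r}")
        if self.relation not in allowed:
            raise ValueError(f"{self.relation!r} is not a {self.kind} relation")


# Usage with hypothetical identifiers:
ok = Association("gene:X", "causes", "disease:Y", "G-P/D")
```

Keeping the allowed relations per kind means a gene-to-gene term such as "interacts with" cannot silently be used for a gene-to-phenotype link.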
  8. The genome is sequenced, but… …we still don’t know very much about what it does: 3,435 OMIM Mendelian diseases with no known genetic basis; 66,396 ClinVar variants with no known pathogenicity.
  9. Why we need all the organisms Model data can provide up to 80% phenotypic coverage of the human coding genome
  10. We learn different phenotypes from different organisms
  11. Can we use model phenotypes to inform genetic mechanisms of disease? Genotype → phenotype:
     • Mouse, B6.Cg-Alms1foz/foz/J: increased weight, adipose tissue volume, glucose homeostasis altered
     • Human, ALMS1 (NM_015120.4) [c.10775delC] + [-]: obesity, diabetes mellitus, insulin resistance
     • Zebrafish, kcnj11c14/c14; insrt143/+ (AB): increased food intake, hyperglycemia, insulin resistance
  12. Crossing the language barrier. Image credit: European Southern Observatory, CC 2.0, https://www.flickr.com/photos/esoastronomy/6923443595
  13. Ulcerated paws / Palmoplantar hyperkeratosis / Thick hand skin. Image credits: "HandsEBS" by James Heilman, MD, own work, licensed under CC BY-SA 3.0 via Commons, https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG; http://www.guinealynx.info/pododermatitis.html
  14. Semantics serve as a bridge http://xkcd.com/1406/
  15. Challenge: each database uses its own vocabulary/ontology (ontologies: MP, HP; databases: MGI, HPOA).
  16. Challenge: each database uses its own phenotype vocabulary/ontology. Ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNO, MED, …; databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb, …
  17. Can we help machines understand phenotype terms? “Palmoplantar hyperkeratosis” (human phenotype). The machine: “I have absolutely no idea what that means.”
  18. Decomposition of complex concepts allows interoperability. “Palmoplantar hyperkeratosis” (human phenotype) = increased (PATO) + stratum corneum layer of skin (Uberon) + autopod keratinization (GO). Species-neutral ontologies, homologous concepts. Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
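To illustrate why decomposition helps, here is a minimal Python sketch that treats each phenotype term as a set of (ontology, label) components and measures how many components two terms share. The human decomposition follows one reading of the slide; the second term is hypothetical, and a real system would compare components through ontology subsumption and homology rather than exact label matches.

```python
# A phenotype term decomposed into (ontology, label) components.
Component = tuple[str, str]

# One reading of the slide's decomposition of the human term.
PALMOPLANTAR_HYPERKERATOSIS: frozenset[Component] = frozenset({
    ("PATO", "increased"),
    ("Uberon", "stratum corneum layer of skin"),
    ("GO", "autopod keratinization"),
})

# Hypothetical model-organism term, invented for illustration only.
HYPOTHETICAL_MODEL_TERM: frozenset[Component] = frozenset({
    ("PATO", "increased"),
    ("Uberon", "stratum corneum layer of skin"),
})


def jaccard(a: frozenset[Component], b: frozenset[Component]) -> float:
    """Shared components divided by all components of the two terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# 2 of the 3 distinct components are shared, so the score is 2/3.
score = jaccard(PALMOPLANTAR_HYPERKERATOSIS, HYPOTHETICAL_MODEL_TERM)
```

Two terms with no words in common can still score above zero once both are expressed in species-neutral building blocks.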
  19. Harmonizing diseases, phenotypes, anatomy, and genotypes
  20. Current weighted features:
     • Breadth of phenotypic coverage
     • Depth/specificity of phenotypic coverage
     • Rarity
     Planned algorithmic features:
     • Disease and phenotype staging
     • Age of onset
     • Asserted absence of phenotypes
     Fuzzy phenotype profile matching: patients ↔ diseases ↔ models (www.owlsim.org)
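The rarity weighting can be sketched with information content (IC): a term annotated to fewer profiles is more informative. The Python below is a deliberately simplified stand-in, not OWLSim's actual algorithm: it scores only exact term matches (no ontology reasoning), ignores the planned features such as onset and asserted absence, and the counts in the usage example are hypothetical.

```python
import math


def information_content(term: str, counts: dict[str, int], n_profiles: int) -> float:
    """IC = -log2(fraction of profiles annotated with the term); rarer is higher."""
    return -math.log2(counts[term] / n_profiles)


def weighted_overlap(query: set[str], target: set[str],
                     counts: dict[str, int], n_profiles: int) -> float:
    """Fraction of the query's total IC that the target matches exactly."""
    total = sum(information_content(t, counts, n_profiles) for t in query)
    shared = sum(information_content(t, counts, n_profiles) for t in query & target)
    return shared / total if total > 0 else 0.0


def rank_profiles(query: set[str], candidates: dict[str, set[str]],
                  counts: dict[str, int], n_profiles: int) -> list[tuple[str, float]]:
    """Score every candidate profile (disease, model, patient) and sort best first."""
    scores = {name: weighted_overlap(query, profile, counts, n_profiles)
              for name, profile in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical annotation counts over 8 profiles: A is rare, C is universal.
counts = {"A": 1, "B": 4, "C": 8}
candidates = {"disease_1": {"B", "C"}, "model_1": {"A", "B"}}
ranking = rank_profiles({"A", "B"}, candidates, counts, n_profiles=8)
```

Here model_1 shares the rare term A and scores 1.0, while disease_1 shares only the common term B and scores 0.25, which is the intuition behind weighting by rarity.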
  21. Diagnosing an undiagnosed disease
  22. Why model organisms matter to patients
  23. The prevailing clinical diagnosis pipelines leverage only a tiny fraction of the available data. Patient exome/genome, patient phenotypes, patient environment, public genomic data, public phenotype and disease data, and public environment and disease data all feed into possible diseases, and from there into diagnosis and treatment; several of these inputs are marked as under-utilized data.
  24. It takes an interoperable village to diagnose a rare platelet syndrome (http://bit.ly/stim1paper). Patient: phenotypic profile plus a heterozygous missense mutation in STIM-1. MGI mouse (Stim1Sax/Sax): heterozygous missense mutation in STIM-1. Exomiser ranked the STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources (http://bit.ly/exomiser).
  25. Introducing PhenoPackets It’s exactly what you think it is: a packet of phenotype data to be used anywhere, written by anyone
  26. If it is alive, it can be PhenoPackaged: model organisms, biodiversity, crops, domestic animals, disease vectors, epidemiological monitoring, drug discovery and development, rare disease diagnosis, personalized medicine, environmental monitoring, patients and cohorts, genetic engineering, mechanistic discovery. Some biodiversity images adapted from http://i.vimeocdn.com/video/417366050_1280x720.jpg
  27. What is in a PhenoPacket? Example: “Maru”, a 4-year-old male cat of the Scottish Fold breed. Components: biography (4-year-old male Scottish Fold); phenotypes and qualifiers (abnormal sheltering behavior [MP:0014039], onset at birth); measurements (weighs 6 kg); source (youtube.com/user/mugumogu).
  28. What does a PhenoPacket look like? Canonical JSON format (the example below is written in YAML syntax). Image credits: upi.com
      title: "age of onset example"
      persons:
        - id: "#1"
          label: "Donald Trump"
          sex: "M"
      phenotype_profile:
        - entity: "person#1"
          phenotype:
            types:
              - id: "HP:0200055"
                label: "Small hands"
            onset:
              description: "during development"
              types:
                - id: "HP:0003577"
                  label: "Congenital onset"
          evidence:
            - types:
                - id: "ECO:0000033"
                  label: "Traceable Author Statement"
              source:
                - id: "PMID:1"
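Because the canonical format is JSON, the YAML above maps directly onto nested dictionaries and lists. A minimal Python sketch of the phenotype_profile part of the example, serialized with sorted keys and compact separators; treating that serialization as "canonical" is an assumption of this sketch, not a documented PhenoPacket rule.

```python
import json

# The phenotype_profile part of the slide's example as plain Python data.
# Field names follow the slide; no schema validation is performed.
packet = {
    "title": "age of onset example",
    "phenotype_profile": [
        {
            "entity": "person#1",
            "phenotype": {
                "types": [{"id": "HP:0200055", "label": "Small hands"}],
                "onset": {
                    "description": "during development",
                    "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
                },
            },
            "evidence": [
                {
                    "types": [{"id": "ECO:0000033", "label": "Traceable Author Statement"}],
                    "source": [{"id": "PMID:1"}],
                }
            ],
        }
    ],
}


def to_canonical_json(obj) -> str:
    """Sorted keys and fixed separators: equal data always gives identical text."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def onset_labels(pkt: dict) -> list[str]:
    """Collect the onset class labels of every association in the packet."""
    return [
        t["label"]
        for assoc in pkt.get("phenotype_profile", [])
        for t in assoc["phenotype"].get("onset", {}).get("types", [])
    ]


text = to_canonical_json(packet)
```

Round-tripping `json.loads(text)` gives back the same structure, so the YAML and JSON renderings carry identical content.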
  29. How does it handle measurements? We can represent population phenotypes too.
      title: "measurement example, taken from genenetwork.org"
      organisms:
        - id: "#1"
          label: "BXD mouse population"
          taxon: NCBITaxon:10090
      phenotype_profile:
        - entity: "#1"
          phenotype:
            description: "cerebellum weight"
            types:
              - id: "PATO:0000128"
                label: "weight"
            measurements:
              - unit: mg
                value: 61.400
                property_values:
                  - property: standard_error
                    filler: 2.38
            attribute_of:
              types:
                - id: "UBERON:0002037"
                  label: "cerebellum"
            onset:
              description: "measured in adults"
              types:
                - id: "MmusDv:0000061"
                  label: "early adult"
      For non-abnormal phenotypes we can use a trait ontology, or a building-block approach with PATO (attribute), Uberon (measured entity), UO (units), and an ontology of statistical properties.
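A consumer of the measurement block might read it into a typed record as follows. This Python sketch assumes only the field names shown on the slide (unit, value, property_values, property, filler); the Measurement class and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str
    standard_error: Optional[float] = None


def parse_measurement(block: dict) -> Measurement:
    """Read one entry of a `measurements:` list as laid out on the slide."""
    se = None
    for pv in block.get("property_values", []):
        # Statistical properties travel as property/filler pairs.
        if pv.get("property") == "standard_error":
            se = float(pv["filler"])
    return Measurement(value=float(block["value"]), unit=str(block["unit"]), standard_error=se)


def describe(m: Measurement) -> str:
    """Human-readable summary, e.g. for a report or a web page."""
    if m.standard_error is None:
        return f"{m.value:g} {m.unit}"
    return f"{m.value:g} ± {m.standard_error:g} {m.unit} (SE)"


# The cerebellum-weight measurement from the slide.
cerebellum_weight = {
    "unit": "mg",
    "value": 61.400,
    "property_values": [{"property": "standard_error", "filler": 2.38}],
}
m = parse_measurement(cerebellum_weight)
print(describe(m))  # 61.4 ± 2.38 mg (SE)
```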
  30. Phenopackets for laypersons. Lay descriptions such as “dry eyes”, “developmental delay”, and “elevated liver function” can be captured from patient registries and social media. Image credits: ngly1.org
      phenotype_profile:
        - entity: "patient16"
          phenotype:
            types:
              - id: "HP:0000522"
                label: "Alacrima"
            onset:
              description: "at birth"
              types:
                - id: "HP:0003577"
                  label: "Congenital onset"
          evidence:
            - types:
                - id: "ECO:0000033"
                  label: "Traceable Author Statement"
              source:
                - id: "https://twitter.com/examplepatient/status/1"
  31. Human Phenotype Ontology, now with 6,200 plain language synonyms for patients, families, and non-experts http://bit.ly/hpo-biocuration
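Plain-language synonyms let lay descriptions, like those on the laypersons slide, be mapped to HPO classes. A minimal Python sketch: the synonym table is hypothetical and holds only the "dry eyes" to Alacrima (HP:0000522) pairing implied by that slide; a real index would be loaded from the HPO's published synonyms, and unmatched terms are returned for manual curation.

```python
# Hypothetical lay-synonym index: normalized lay text -> (HPO id, HPO label).
LAY_SYNONYMS: dict[str, tuple[str, str]] = {
    "dry eyes": ("HP:0000522", "Alacrima"),
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def map_lay_terms(terms: list[str]) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Split lay terms into those with an HPO mapping and those without."""
    mapped: dict[str, tuple[str, str]] = {}
    unmapped: list[str] = []
    for term in terms:
        hit = LAY_SYNONYMS.get(normalize(term))
        if hit is not None:
            mapped[term] = hit
        else:
            unmapped.append(term)
    return mapped, unmapped


mapped, unmapped = map_lay_terms(["Dry eyes", "Developmental delay", "Elevated liver function"])
```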
  32. Phenopackets for journals: each article can be associated with a phenopacket, and each phenopacket can be shared via DOI in any repository outside the paywall (e.g., Figshare, Zenodo). Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
  33. So, do you expect us to put these together ourselves? Emerging tool: WebPhenote (based on Phenote), create.monarchinitiative.org
  34. WebPhenote: form-based and graph-based editing (Noctua / LEGO inside).
  35. PhenoPacket formats: export a phenopacket to CSV, JSON, RDF, or OWL.
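Of the export targets, CSV is the simplest to sketch. The Python below flattens a phenotype_profile (trimmed from the laypersons example) into one row per phenotype class; the column names are hypothetical choices for illustration, not a defined PhenoPacket CSV layout.

```python
import csv
import io

FIELDS = ["entity", "phenotype_id", "phenotype_label", "onset_id"]  # hypothetical columns


def phenotype_rows(packet: dict):
    """Yield one flat row per phenotype class in the packet."""
    for assoc in packet.get("phenotype_profile", []):
        pheno = assoc["phenotype"]
        onset_types = pheno.get("onset", {}).get("types", [])
        for t in pheno.get("types", []):
            yield {
                "entity": assoc["entity"],
                "phenotype_id": t["id"],
                "phenotype_label": t.get("label", ""),
                "onset_id": onset_types[0]["id"] if onset_types else "",
            }


def to_csv(packet: dict) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(phenotype_rows(packet))
    return buf.getvalue()


# Trimmed from the laypersons example (evidence omitted).
patient16 = {
    "phenotype_profile": [
        {
            "entity": "patient16",
            "phenotype": {
                "types": [{"id": "HP:0000522", "label": "Alacrima"}],
                "onset": {"types": [{"id": "HP:0003577", "label": "Congenital onset"}]},
            },
        }
    ]
}
print(to_csv(patient16))
```

A flat table loses nesting such as evidence and measurements, which is one reason the richer JSON, RDF, and OWL exports exist alongside CSV.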
  36. Example of export https://monarchinitiative.org/variant/ClinVarVariant:88756
  37. Example of export
  38. The PhenoPackets ecosystem. Phenopackets flow among patients/families, physicians, organismal biologists, journals, patient registries, diagnostic screening programs, clinical trials, and databases, web tools, and algorithms, including a Phenopacket Registry. Primary benefits to stakeholders: mechanistic discovery, improved searchability, an integrated data landscape, tool/algorithm creation, cohort identification, patient matchmaking, and diagnosis speed/accuracy. www.phenopackets.org, https://github.com/phenopackets/
  39. PHENOTYPING ISN’T FREE, SO HOW MUCH IS ENOUGH? (bit.ly/annotationsufficiency) Example phenotypes, each answered yes or no: enlarged ears, dark hair, blue skin, pointy ears, hair on head, horns, enlarged lip, increased skin pigmentation.
  40. THE MORE PHENOTYPE DATA WE HAVE, THE BETTER ABLE WE ARE TO ANSWER THAT QUESTION bit.ly/annotationsufficiency • Depth/specificity of phenotypic coverage • Rarity • Breadth of phenotypic coverage
  41. Which phenotypes (and sets of phenotypes) enable precise recall and matching? Number of individuals with each: enlarged ears (2), dark hair (6), female (4), male (4), blue skin (1), pointy ears (1), hair absent on head (1), horns present (1), hair present on head (7), enlarged lip (2), increased skin pigmentation (3).
  42. PhenoPackets make phenotype data FAIR++: Findable; Accessible outside paywalls and private data sources; Attributable; Interoperable and computable; Reusable, exchangeable across contexts and disciplines.
  43. Thank you! Live long and phenotype. Sign up at http://bit.ly/biocuration2016 to receive updates or to provide feedback and requirements.
  44. Acknowledgements. Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon. Charité: Peter Robinson, Sebastian Köhler. RTI: Jim Balhoff. CyVerse: Ramona Walls. U of Pittsburgh: Harry Hochheiser. OHSU: Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin, Nicole Vasilevsky. Queen Mary College London: Damian Smedley, Jules Jacobsen. Garvan: Tudor Groza. Alfred Wegener: Pier Buttigieg. FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; Phenotype Ontology Research Coordination Network (NSF-DEB-0956049). With special thanks to Julie McMurry for excellent graphic design.
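The counts above can be turned into a rarity score. A minimal Python sketch using information content, IC = -log2(count / N), where N = 8 is an assumption inferred from the female (4) and male (4) counts, that is, every individual is counted as exactly one of the two. Rarer phenotypes such as blue skin score highest, in line with the speaker note that blue skin reliably distinguishes profiles.

```python
import math

# Counts as listed on the slide (sex is included because the slide lists it).
COUNTS = {
    "Enlarged ears": 2, "Dark hair": 6, "Female": 4, "Male": 4,
    "Blue skin": 1, "Pointy ears": 1, "Hair absent on head": 1,
    "Horns present": 1, "Hair present on head": 7, "Enlarged lip": 2,
    "Increased skin pigmentation": 3,
}
# Assumption: each individual is either female or male, so N = 4 + 4 = 8.
N = COUNTS["Female"] + COUNTS["Male"]


def ic(count: int, n: int) -> float:
    """Information content in bits: a phenotype seen in 1 of 8 gives 3 bits."""
    return -math.log2(count / n)


# Most informative first; sorted() is stable, so ties keep slide order.
ranked = sorted(COUNTS, key=lambda term: ic(COUNTS[term], N), reverse=True)
for term in ranked:
    print(f"{term}: {ic(COUNTS[term], N):.2f} bits")
```

Near-universal phenotypes such as hair present on head contribute well under one bit, so asking for them adds little discriminating power.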

Speaker notes

  1. Trite answer: something that can be represented by a class in a phenotype ontology (MP, HP, ...). But there is more. This basic phenotype description can be adorned with:
     • natural language descriptions
     • temporal information (onset)
     • qualifiers (severity, progression)
     • quantitative information (measurements: unit, value, error, etc.)
     • environment
     • …much more!
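The adornments listed in this note can be pictured as optional fields around a required ontology class. A hypothetical Python data structure, loosely mirroring the YAML examples in the slides but not the official PhenoPacket schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class OntologyClass:
    id: str     # e.g. an HP or MP identifier
    label: str


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str
    error: Optional[float] = None


@dataclass
class Phenotype:
    type: OntologyClass                                          # the core: an ontology class
    description: Optional[str] = None                            # natural language description
    onset: Optional[OntologyClass] = None                        # temporal information
    severity: Optional[str] = None                               # qualifier
    progression: Optional[str] = None                            # qualifier
    measurements: list[Quantity] = field(default_factory=list)   # quantitative information
    environment: Optional[str] = None                            # free text in this sketch


# Usage, reusing the identifiers from the laypersons slide.
alacrima = Phenotype(
    type=OntologyClass("HP:0000522", "Alacrima"),
    description="dry eyes",
    onset=OntologyClass("HP:0003577", "Congenital onset"),
)
```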
  2-6. The classic G + E = P. But the “=” has a lot that can be applied to aid the linking.
  7. There is a lot we don’t know about the genome. Updated numbers as of April 2016: OMIM, 3,435; ClinVar, 66,396.
  8. Data from mouse, rat, zebrafish, worm, and fruit fly; human data from OMIM and ClinVar. Orthology via PANTHER v9.
  9. This highlights how we get different phenotypic information from different sources and species. Data from MGI, ZFIN, and HPO, reasoned over with a cross-species phenotype ontology (https://code.google.com/p/phenotype-ontologies/). The distribution of phenotype information per model genotype differs from that of human disease annotations. For mouse, there is a much higher representation of metabolic, cardiovascular, blood, and endocrine phenotypes available to compare; for fish, there is increased representation of nervous system, skeletal, head and neck, cardiovascular, and connective tissue phenotypes. (Note that these do not include “normal” phenotypes for either diseases or genotypes.) What does it mean to replicate a phenotypic profile in a model organism? For many patients or diseases, we may need different models to fully recapitulate the disease. Further, some phenotypes are common in a given species, so their presence in the patient would be a less significant result.
  10. Two issues: database integration and vocabulary integration.
  11. Multiple databases
  12. Our approach is to try to get the machine to understand the terms so that it can assist us intelligently.
  13. We make things digestible by breaking complex concepts into simpler parts, using ontologies that are comparative by design.
  14. If we include bridging ontologies, we can unify diseases across sources AND phenotypes across sources and organisms.
  15. This was the novel case we solved. The UDP patient had a number of signs and symptoms, including various platelet abnormalities. The same heterozygous missense mutation was seen in two patients and was ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted to be maximally pathogenic. Finally, a mouse curated by MGI, carrying a heterozygous missense point mutation introduced by chemical mutagenesis, exhibited strikingly similar platelet abnormalities.
  16. Mosquito image from https://pixabay.com/en/brazil-health-mosquito-news-virus-1300017/ (no attribution required)
  17. https://pixabay.com/en/instruments-measurement-measure-860912/
  18. ReelDx, PatientsLikeMe, a phenopacket posted on Facebook: same format.
  19. https://pixabay.com/en/pencil-green-writing-tools-37254/
  20. Knowing the normal distribution and clustering of phenotypes tells us that blue skin is rare and can reliably distinguish between phenotype profiles. Likewise, if the first phenotype entered is enlarged lip, the next one to ask for would be enlarged ears. The combination of three non-unique phenotypes offers a perfect match.
  21. There are a lot of people who have contributed to this work over many years.