Phenopackets as applied to variant interpretation
1. Making Phenotypic Data FAIR++ for Disease Diagnosis and Discovery
Findable
Accessible outside paywalls and private data sources
Attributable
Interoperable and Computable
Reusable, exchangeable across contexts and disciplines
Melissa Haendel, PhD (@ontowonka)
2. Genes + Environment = Phenotypes
Computable encodings are essential
Base pairs
Variant notation (e.g., HGVS)
Human Phenotype Ontology
Mammalian Phenotype Ontology
Medical procedure coding
Environment Ontology
3. Genes, Environment, Phenotypes
[Diagram: exchange formats by domain: VCF, GFF, and BED for genes; PXF for phenotypes]
Standard exchange formats exist for genes…
but for phenotypes? Environment?
4. Problems with tabular formats
• Denormalized
– Repetition of fields
– Ad hoc syntax for multi-valued fields, nesting
• Proliferation
– different formats generated for each use case
• E.g. disease-phenotype, patient-phenotype, …
• Hard to extend
– Not all phenotypes can be pre-packaged as a phenotype term
• E.g. Measurements, environments
• Ad hoc software; standard libraries are needed
• Focus should be on the data model
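To make the denormalization problem concrete, here is a minimal Python sketch, assuming a hypothetical patient-phenotype TSV in which the patient columns repeat on every row and modifiers are packed into one cell with an ad hoc `|` delimiter (the column names and delimiter are illustrative, not taken from any real format). It regroups the rows into one nested record per patient, the shape a data-model-first format stores directly.

```python
import csv
import io

# Hypothetical denormalized table: patient fields repeat on every row, and
# modifiers are packed into one cell with an ad hoc "|" syntax.
TSV = (
    "patient_id\tsex\tphenotype_id\tphenotype_label\tmodifiers\n"
    "P1\tfemale\tHP:0001250\tSeizure\tsevere|childhood onset\n"
    "P1\tfemale\tHP:0001263\tGlobal developmental delay\t\n"
)


def normalize(tsv_text):
    """Group repeated rows into one nested record per patient."""
    patients = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Patient-level fields are taken once, from the first row seen.
        patient = patients.setdefault(
            row["patient_id"],
            {"id": row["patient_id"], "sex": row["sex"], "phenotypes": []},
        )
        # Split the multi-valued cell; an empty cell yields no modifiers.
        modifiers = [m for m in row["modifiers"].split("|") if m]
        patient["phenotypes"].append(
            {
                "type": {"id": row["phenotype_id"], "label": row["phenotype_label"]},
                "modifiers": modifiers,
            }
        )
    return list(patients.values())


records = normalize(TSV)
print(len(records), len(records[0]["phenotypes"]))
```

Every consumer of the flat table has to know about the `|` convention; in the nested form the list of modifiers is explicit.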
5. Phenopackets for clinical labs
[Diagram: inputs to the clinical testing lab: patient and family history; diagnostic tests and clinical phenotypes; genomic information; physical exam; patient medical history]
Clinical labs often receive no phenotypes, or only one-line descriptions.
What if we could make the phenotype data PHI-free (free of protected health information) and, at the same time, more descriptive?
6. Phenopackets for journals
Each article can be associated with a phenopacket.
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via a DOI in any repository outside the paywall (e.g., Figshare, Zenodo) and cited as a data citation.
9. A simple data model
Entities
– Organism
• Patient
• Non-human animal
• Population
– Genetic/genomic element
– Condition
• Disease
• Phenotype
Associations
– E.g., between disease and phenotype
– Each association has
• Evidence
• Provenance
[Diagram: an Entity is linked to a Condition (Disease or Phenotype) by an association that carries Evidence]
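The entity/association model above can be sketched as Python dataclasses. The class and field names are hypothetical (this is not the PXF schema); the point is the shape: entities are typed, and each association links an entity to a condition and carries its own evidence and provenance. The IDs are reused from the pathogenicity example in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    id: str
    label: str = ""


@dataclass
class Organism(Entity):
    """A patient, a non-human animal, or a population."""
    taxon: str = ""


@dataclass
class GeneticElement(Entity):
    """A gene, genotype, allele, or variant."""


@dataclass
class Condition(Entity):
    """Common supertype of Disease and Phenotype."""


@dataclass
class Disease(Condition):
    pass


@dataclass
class Phenotype(Condition):
    pass


@dataclass
class Evidence:
    type: str  # e.g. an evidence-code CURIE
    description: str = ""


@dataclass
class Association:
    entity: Entity
    condition: Condition
    evidence: List[Evidence] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)  # e.g. contributor IDs


variant = GeneticElement(id="CLINVAR:226213")
disease = Disease(id="NCIT:C4872", label="Breast Carcinoma")
assoc = Association(
    entity=variant,
    condition=disease,
    evidence=[Evidence(type="ECO:9000100")],
    provenance=["CLINGEN:Agent007"],
)
print(isinstance(assoc.condition, Condition))
```

Keeping evidence and provenance on the association, rather than on the entities, lets the same variant carry different, separately attributed claims.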
13. monarchinitiative.org
title: "measurement example, taken from genenetwork.org"
organisms:
  - id: "#1"
    label: "BXD mouse population"
    taxon: NCBITaxon:10090
phenotype_profile:
  - entity: "#1"
    phenotype:
      description: "cerebellum weight"
      types:
        - id: "PATO:0000128"
          label: "weight"
      measurements:
        - unit: mg
          value: 61.400
          property_values:
            - property: standard_error
              filler: 2.38
      attribute_of:
        types:
          - id: "UBERON:0002037"
            label: "cerebellum"
      onset:
        description: "measured in adults"
        types:
          - id: "MmusDv:0000061"
            label: "early adult"
How does it handle measurements?
• We can represent population phenotypes too (e.g., the BXD mouse population).
• For non-abnormal phenotypes we can use a trait ontology, or a building-block approach with PATO (attribute) and Uberon (measured entity).
• Units: UO. Statistical properties (e.g., standard_error): an ontology of statistical properties.
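The measurement example above is plain nested data. As a minimal sketch of consuming it, the Python below hand-transcribes part of it into the dict a YAML parser would produce (assuming the nesting shown, which was reconstructed from a flattened slide; onset is omitted) and resolves the `entity: "#1"` reference to report each value with its unit and standard error. The function name `summarize_measurements` is hypothetical, not part of any phenopacket library.

```python
# Part of the measurement example, as the nested dict a YAML parser would return
# (hand-transcribed so the sketch needs only the standard library; onset omitted).
packet = {
    "organisms": [
        {"id": "#1", "label": "BXD mouse population", "taxon": "NCBITaxon:10090"}
    ],
    "phenotype_profile": [
        {
            "entity": "#1",
            "phenotype": {
                "description": "cerebellum weight",
                "types": [{"id": "PATO:0000128", "label": "weight"}],
                "measurements": [
                    {
                        "unit": "mg",
                        "value": 61.400,
                        "property_values": [
                            {"property": "standard_error", "filler": 2.38}
                        ],
                    }
                ],
                "attribute_of": {
                    "types": [{"id": "UBERON:0002037", "label": "cerebellum"}]
                },
            },
        }
    ],
}


def summarize_measurements(pkt):
    """Yield one readable line per measurement, resolving the entity reference."""
    organisms = {o["id"]: o for o in pkt.get("organisms", [])}
    for assoc in pkt.get("phenotype_profile", []):
        who = organisms[assoc["entity"]]["label"]
        pheno = assoc["phenotype"]
        for m in pheno.get("measurements", []):
            # Collect statistical properties (e.g. standard_error) by name.
            props = {p["property"]: p["filler"] for p in m.get("property_values", [])}
            se = props.get("standard_error")
            suffix = f" (SE {se})" if se is not None else ""
            yield f"{who}: {pheno['description']} = {m['value']} {m['unit']}{suffix}"


for line in summarize_measurements(packet):
    print(line)
```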
14. Example: pathogenicity for a variant
disease_profile:
  - entity: CLINVAR:226213
    disease:
      - id: NCIT:C4872
        label: "Breast Carcinoma"
    interpretation: "pathogenic"
    contributors:
      - id: CLINGEN:Agent007
        label: "Clinical Pathogenicity Calculator v1"
    created: "2016-07-12T11:00:59+00:00"
    method:
      - id: doi:10.1038/gim.2015.30
        label: "ACMG ISV guidelines 2015"
    evidence:
      - id: CLINGEN:ev025
        type: ECO:9000100  # population frequency evidence
        acmg_criterion: CLINGEN:vic008  # ACMG v2015 PM2, absent from controls in population databases
        description: "Variant is absent from a large cohort of non-Finnish Europeans (NFE) in the ExAC population database, with sequencing coverage of the variant exceeding 25X"
        outcome: "moderately supporting"
        supporting_reference:
          - id: PMID:27997510
        supporting_data:
          - id: CLINGEN:PAF082A
            type: SEPIO:9000895  # allele frequency data
            value: "0"
          - id: CLINGEN:PAF082B
            type: SEPIO:9000846  # median sequencing coverage data
            value: "28X"
          - id: CLINGEN:PAF082C
            type: SEPIO:9000878  # population ethnicity data
            value: "non-finnish european"
# …
File layout: header, entities (e.g., variants), assocs (e.g., disease_profile)
variants:
  - id: CLINVAR:226213
    type: SO:0001483  # single nucleotide variant
    label: "NM_007294.3(BRCA1):c.4677_5075del"
    positions:
      - type: HGVS
        value: "NM_007294.3:c.4677_5075del"
Use GA4GH variant representation (Reece Hart leading)
http://bit.ly/variant-path-PXF
ClinGen (Larry Babb) collaboration
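In the pathogenicity example, the association (`disease_profile`) points at the variant only through `entity: CLINVAR:226213`, while the variant's details live in the `variants` entity section. A minimal Python sketch of that join, using a hand-transcribed subset of the example; `resolve` is a hypothetical helper, and falling back to the variant ID when no HGVS position exists is an assumption of this sketch.

```python
# Hand-transcribed subset of the pathogenicity example.
variants = [
    {
        "id": "CLINVAR:226213",
        "label": "NM_007294.3(BRCA1):c.4677_5075del",
        "positions": [{"type": "HGVS", "value": "NM_007294.3:c.4677_5075del"}],
    }
]
disease_profile = [
    {
        "entity": "CLINVAR:226213",
        "disease": [{"id": "NCIT:C4872", "label": "Breast Carcinoma"}],
        "interpretation": "pathogenic",
        "evidence": [{"id": "CLINGEN:ev025", "outcome": "moderately supporting"}],
    }
]


def resolve(assocs, entities):
    """Join each association to its entity record; fail loudly on dangling IDs."""
    index = {e["id"]: e for e in entities}
    for a in assocs:
        if a["entity"] not in index:
            raise KeyError(f"association refers to unknown entity {a['entity']}")
        variant = index[a["entity"]]
        # Prefer the HGVS position; fall back to the entity ID (an assumption).
        hgvs = next(
            (p["value"] for p in variant.get("positions", []) if p["type"] == "HGVS"),
            variant["id"],
        )
        for d in a["disease"]:
            yield hgvs, d["label"], a["interpretation"]


for row in resolve(disease_profile, variants):
    print(row)
```

Separating entities from associations means the variant is described once, however many disease or evidence statements refer to it.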
15. Complex phenotypes
Not every phenotype can be boiled down to a pre-packaged ontology term
PXF allows post-coordination / post-composition
– E.g. ‘mild’, ‘severe’ qualifiers
– Temporal qualifiers: start, end, acute/chronic, …
– Specifying precise location of phenotype
– On-the-fly composition of phenotypic descriptors from base ontologies
• Chemical entities
• Cell types
• GO
• Anatomy
Additionally
– Free text descriptions
– Measurements / quantitative phenotypes
– Environments (ongoing)
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
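Post-composition means building a descriptor from parts of base ontologies instead of picking one pre-coordinated term. A minimal Python sketch, assuming a hypothetical entity-quality structure with optional severity and onset qualifiers (the class and field names are illustrative, not the PXF schema), reusing the PATO, Uberon, and MmusDv IDs from the measurement example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    id: str
    label: str


@dataclass
class ComposedPhenotype:
    """Hypothetical entity-quality composition with optional qualifiers."""
    quality: Term                   # e.g. a PATO quality
    entity: Term                    # e.g. an Uberon anatomical entity
    severity: Optional[str] = None  # free-text qualifier such as 'mild'
    onset: Optional[Term] = None    # temporal qualifier

    def describe(self) -> str:
        text = f"{self.quality.label} of {self.entity.label}"
        if self.severity:
            text = f"{self.severity} {text}"
        if self.onset:
            text += f" (onset: {self.onset.label})"
        return text


p = ComposedPhenotype(
    quality=Term("PATO:0000128", "weight"),
    entity=Term("UBERON:0002037", "cerebellum"),
    onset=Term("MmusDv:0000061", "early adult"),
)
print(p.describe())
```

Because each part keeps its own ontology ID, a reasoner can still compare composed descriptors piece by piece even when no single pre-packaged term exists.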
16. PXF and GA4GH Stack
PXF's primary use case is as a file format
GA4GH's primary use case is as an API
Obviously these are related…
...But the devil is in the details
– E.g., is there a well-defined mapping between protobuf and JSON?
How can we better interoperate? Working to converge (M. Diekhans)
– Define PXF using ProtoBuf
– What would a query API look like?
• As an exchange format, we don’t have to worry about this
• Query APIs for complex data structures proliferate complexity
• What is the overall GA4GH strategy here?
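One concrete instance of the protobuf/JSON detail: under the proto3 JSON mapping, field names are emitted as lowerCamelCase by default (parsers also accept the original field names), whereas the PXF examples here use snake_case keys such as `phenotype_profile` and `standard_error`. The sketch below approximates that default renaming over a nested document; it illustrates the mismatch and is not an agreed PXF-to-GA4GH mapping.

```python
import re


def to_lower_camel(name: str) -> str:
    """Approximate proto3's default JSON name: drop '_' and uppercase the next char."""
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), name)


def camelize_keys(obj):
    """Recursively rename dict keys from snake_case to lowerCamelCase."""
    if isinstance(obj, dict):
        return {to_lower_camel(k): camelize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [camelize_keys(v) for v in obj]
    return obj  # values are left untouched


doc = {"phenotype_profile": [{"attribute_of": {"types": []}}]}
print(camelize_keys(doc))
```

A file written with snake_case keys and a proto-generated JSON with camelCase keys describe the same data, but naive tooling treats them as different, which is why an explicit mapping matters.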
17. PXF, GA4GH, and other related activities
G2P
– PXF extends the initial implementation
– Make PXF a FHIR resource
Metadata
– Align how to reference ontology terms
– Standardizing identifier prefixes
MME (Matchmaker Exchange)
– PXF does not provide a search API
– PXF subsumes phenotype profile representation
Beacon
– PXF could be a response element
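Standardizing identifier prefixes matters because a CURIE such as `UBERON:0002037` is only computable once everyone agrees how its prefix expands to a full IRI. A minimal Python sketch, assuming a hypothetical prefix map that sends the OBO ontologies used in this deck to OBO PURLs (`http://purl.obolibrary.org/obo/PREFIX_LOCALID`); prefixes outside the map, such as `CLINVAR`, are rejected rather than guessed.

```python
OBO = "http://purl.obolibrary.org/obo/"

# Hypothetical shared prefix map: each OBO prefix expands to PURL + PREFIX + "_".
PREFIX_MAP = {
    p: f"{OBO}{p}_" for p in ("HP", "PATO", "UBERON", "NCIT", "SO", "NCBITaxon")
}


def expand(curie: str, prefix_map=PREFIX_MAP) -> str:
    """Expand a CURIE to a full IRI, refusing unknown or malformed prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return prefix_map[prefix] + local


print(expand("UBERON:0002037"))
```

Rejecting unknown prefixes, instead of silently passing them through, surfaces exactly the inconsistencies a shared prefix registry is meant to remove.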
18. Summary: Phenotype Exchange Format
• One model, derive alternate concrete forms
– YAML, JSON, RDF, TSV (subset)
• Species-agnostic
– From microbes through plants to humans
– Clinical and basic research
• Applicable to a variety of entities
– Patients/individual organisms, cohorts, populations
– Diseases
– Papers
– Genes, genotypes, alleles, variants
• Simple for simple cases…
– Bag of terms model
• …Incremental expressivity
– Temporality and causality
– Quantitative as well as qualitative
– Negation, severity, frequency, penetrance, expressivity
• Ontology-smart
– Rational Composition (post-coordination)
– Explicit semantics
http://phenopackets.org
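As a minimal sketch of "one model, derive alternate concrete forms", the Python below keeps a hypothetical bag-of-terms phenopacket in memory and derives JSON and a flat TSV from it; the TSV holds only the entity-phenotype subset, which is why TSV is listed as a subset. Field names are illustrative, not the PXF schema, and the YAML and RDF forms are left out to stay within the standard library.

```python
import csv
import io
import json

# Hypothetical bag-of-terms phenopacket: one entity, a flat list of phenotype terms.
model = {
    "id": "patient-1",
    "phenotypes": [
        {"id": "HP:0001250", "label": "Seizure"},
        {"id": "HP:0001263", "label": "Global developmental delay"},
    ],
}


def to_json(m) -> str:
    """Lossless serialization of the whole model."""
    return json.dumps(m, indent=2)


def to_tsv(m) -> str:
    """One row per phenotype: TSV can carry only this subset of the model."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity", "phenotype_id", "phenotype_label"])
    for p in m["phenotypes"]:
        writer.writerow([m["id"], p["id"], p["label"]])
    return buf.getvalue()


print(to_tsv(model))
```

Both outputs are generated from the same structure, so they cannot drift apart the way independently maintained formats do.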
19. Phenopacket Tool ecosystem
• Non-JVM language bindings
– Python (beta)
• https://github.com/phenopackets/phenopacket-python/
– Javascript (alpha)
• https://github.com/phenopackets/phenopacket-js/
• Pxftools
– Command-line library, Scala utilities
– https://github.com/phenopackets/pxftools
• PhenoPacketScraper
– GSoC (Google Summer of Code) project to make phenopackets from case-study articles
– https://github.com/monarch-initiative/phenopacket-scraper-core
• OwlSim
– Like BLAST, for phenotypes
– https://github.com/monarch-initiative/owlsim-v3
• WebPhenote
– Noctua extension for phenopacket creation
– http://create.monarchinitiative.org
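OwlSim compares phenotype profiles the way BLAST compares sequences: it scores how similar they are using the ontology's structure rather than exact term matches. As a simplified stand-in (not OwlSim's implementation or API), here is Jaccard similarity over ancestor closures on a hypothetical toy ontology, so that a focal seizure still partially matches a seizure.

```python
# Hypothetical toy ontology: each term maps to its direct is_a parents.
PARENTS = {
    "root": [],
    "abnormal_nervous_system": ["root"],
    "seizure": ["abnormal_nervous_system"],
    "focal_seizure": ["seizure"],
    "ataxia": ["abnormal_nervous_system"],
    "abnormal_skeleton": ["root"],
}


def ancestors(term):
    """Return the term plus all its ancestors (reflexive transitive closure)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def closure(terms):
    """Union of ancestor sets for every term in a phenotype profile."""
    out = set()
    for t in terms:
        out |= ancestors(t)
    return out


def sim_j(profile_a, profile_b):
    """Jaccard similarity of the ancestor closures of two profiles."""
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b)


print(sim_j({"focal_seizure"}, {"seizure", "ataxia"}))
```

An exact-match comparison would score these two profiles 0; walking up the hierarchy credits the shared "seizure" and "abnormal nervous system" ancestry.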
20. Acknowledgments
• Chris Mungall (schema/architecture)
• Jules Jacobsen (Java API)
• James Balhoff (pxftools)
• Jeremy Nguyen-Xuan (pxftools)
• Seth Carbon (web phenote)
• Kent Shefchek (Python API)
• Matt Brush (modeling)
• Dan Keith (web phenote)
• Satwik Bhattamishra (GSoC student, PhenoPacketScraper)
• Julie McMurry
• Peter Robinson
• Pier Buttigieg
• Ramona Walls
• Damian Smedley
• Sebastian Köhler
• Tudor Groza
• Harry Hochheiser
• Mark Diekhans
• Melanie Courtot
• Michael Baudis
• Helen Parkinson
• Suzanna Lewis
Speaker notes
The classic G + E = P. But a lot can be applied at the "=" to aid the linking.