Making Phenotypic data FAIR++ for
Disease Diagnosis and Discovery
Findable
Accessible outside paywalls and private data sources
Attributable
Interoperable and Computable
Reusable, exchangeable across contexts and disciplines
@ontowonka Melissa Haendel, PhD
Genes + Environment = Phenotypes
Computable encodings are essential
Base pairs
Variant notation (e.g. HGVS)
Human Phenotype Ontology
Mammalian Phenotype Ontology
Medical procedure coding
Environment Ontology
Genes Environment Phenotypes
Formats shown: VCF, GFF, BED, PXF
Standard exchange formats exist for genes …
but for phenotypes? Environment?
Problems with tabular formats
• Denormalized
– Repetition of fields
– Ad hoc syntax for multi-valued fields, nesting
• Proliferation
– different formats generated for each use case
• E.g. disease-phenotype, patient-phenotype, …
• Hard to extend
– Not all phenotypes can be pre-packaged as a phenotype term
• E.g. Measurements, environments
• Ad hoc software, need standard libraries
• Focus should be on the data model
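A minimal sketch of the denormalization problem listed above, assuming a hypothetical table layout (the column names, the row contents, and the normalize helper are illustrative and not part of PXF). It folds repeated person fields into one entity record each and turns every row into a nested association, using the field names that appear in the PXF examples in this deck:

```python
import csv
import io

# Hypothetical denormalized table: person fields repeat on every row.
# The pairing of persons and phenotype terms here is illustrative only.
TABLE = """person_id,label,sex,phenotype_id,onset_id
#1,Mickey Mouse,M,HP:0200055,HP:0003577
#1,Mickey Mouse,M,HP:0100024,HP:0011463
#2,Goofy,M,HP:0100024,
"""


def normalize(table_text):
    """Split repeated person fields from per-row phenotype associations."""
    persons = {}
    profile = []
    for row in csv.DictReader(io.StringIO(table_text)):
        # Each person is stored once, no matter how many rows mention them.
        persons.setdefault(
            row["person_id"],
            {"id": row["person_id"], "label": row["label"], "sex": row["sex"]},
        )
        phenotype = {"types": [{"id": row["phenotype_id"]}]}
        # Onset is nested under the phenotype only when the cell is filled.
        if row["onset_id"]:
            phenotype["onset"] = {"types": [{"id": row["onset_id"]}]}
        profile.append({"entity": row["person_id"], "phenotype": phenotype})
    return {"persons": list(persons.values()), "phenotype_profile": profile}


packet = normalize(TABLE)
```

The nested form has one place for each person and leaves room for further refinement (evidence, measurements) without adding columns.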
Phenopackets for clinical labs
Patient medical history:
• Patient and family history
• Diagnostic tests, clinical phenotypes
• Genomic information
• Physical exam
Clinical labs often get no phenotypes or one-line descriptions.
What if we could make the phenotype data PHI-free and simultaneously more descriptive?
Clinical testing lab
Phenopackets for journals
Each article can be associated with a phenopacket
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision
medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside a paywall (e.g. Figshare, Zenodo, etc.) and cited as a data citation
Phenopackets for databases
Databases could share G2P data in a standardized format, retaining domain or species specificity
OMIA
Ontologies provide pre-packaged phenotype descriptions
A simple data model
• Entities
– Organism
• Patient
• Non-human animal
• Population
– Genetic/genomic element
– Condition
• Disease
• Phenotype
• Associations
– E.g. between disease and phenotype
– Each association has
• Evidence
• Provenance
Diagram labels: Entity, association, Condition (Disease, Phenotype), Evidence
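The entity/association model above can be sketched with plain data classes. This is an illustration under stated assumptions, not the PhenoPackets reference API: the class names, the kind field, and the conditions_for helper are hypothetical, and only the shape (entities, associations, evidence with provenance) comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyTerm:
    id: str
    label: str = ""


@dataclass
class Entity:
    id: str
    label: str = ""
    kind: str = "organism"  # hypothetical tag: organism, genomic element, condition


@dataclass
class Evidence:
    types: List[OntologyTerm] = field(default_factory=list)
    source: List[str] = field(default_factory=list)  # provenance, e.g. PMIDs


@dataclass
class Association:
    entity: str  # id of the Entity being described
    condition: OntologyTerm  # a disease or phenotype term
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class Packet:
    title: str
    entities: List[Entity] = field(default_factory=list)
    associations: List[Association] = field(default_factory=list)

    def conditions_for(self, entity_id):
        """Return the condition terms associated with one entity."""
        return [a.condition for a in self.associations if a.entity == entity_id]


packet = Packet(
    title="age of onset example",
    entities=[Entity("#1", "example person", "person")],
    associations=[
        Association(
            entity="#1",
            condition=OntologyTerm("HP:0200055", "Small hands"),
            evidence=[
                Evidence(
                    types=[OntologyTerm("ECO:0000033", "Traceable Author Statement")],
                    source=["PMID:1"],
                )
            ],
        )
    ],
)
```

Keeping evidence on the association, rather than on the entity or the term, is what lets the same condition be asserted for one entity by several sources.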
PhenoPacket export formats
CSV, JSON, RDF, OWL
monarchinitiative.org
title: "age of onset example"
persons:
  - id: "#1"
    label: "Donald Trump"
    sex: "M"
phenotype_profile:
  - entity: "person#1"
    phenotype:
      types:
        - id: "HP:0200055"
          label: "Small hands"
      onset:
        description: "during development"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:1"
Image credits: upi.com
What does a PhenoPacket look like?
Canonical JSON format
Nesting allows refinement
phenotype_profile:
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0100024
          label: conspicuously happy disposition
      onset:
        types:
          - id: HP:0011463
            label: Adult onset
      description: "Writes distracting tweets"
File sections: header, entities, assocs
File: patients.pxf
persons:
  - id: "#1"
    label: Mickey Mouse
    date_of_birth: 1928-01-01
    sex: M
  - id: "#2"
    label: Goofy
    sex: M
title: "measurement example, taken from genenetwork.org"
organisms:
  - id: "#1"
    label: "BXD mouse population"
    taxon: NCBITaxon:10090
phenotype_profile:
  - entity: "#1"
    phenotype:
      description: "cerebellum weight"
      types:
        - id: "PATO:0000128"
          label: "weight"
      measurements:
        - unit: mg
          value: 61.400
          property_values:
            - property: standard_error
              filler: 2.38
      attribute_of:
        types:
          - id: "UBERON:0002037"
            label: "cerebellum"
      onset:
        description: "measured in adults"
        types:
          - id: "MmusDv:0000061"
            label: "early adult"
We can represent population phenotypes too
For non-abnormal phenotypes we can use a trait ontology, or a building block approach, with
• PATO
• Uberon
Callout labels on the example: attribute, measured entity, UO, ontology of statistical properties
How does it handle measurements?
Example: pathogenicity for a variant
disease_profile:
  - entity: CLINVAR:226213
    disease:
      - id: NCIT:C4872
        label: "Breast Carcinoma"
    interpretation: "pathogenic"
    contributors:
      - id: CLINGEN:Agent007
        label: "Clinical Pathogenicity Calculator v1"
    created: "2016-07-12T11:00:59+00:00"
    method:
      - id: doi:10.1038/gim.2015.30
        label: "ACMG ISV guidelines 2015"
    evidence:
      - id: CLINGEN:ev025
        type: ECO:9000100 ('population frequency evidence')
        acmg_criterion: CLINGEN:vic008 ('ACMG v2015 PM2, absent from controls in population databases')
        description: "Variant is absent from a large cohort of non-finnish europeans (NFE) in the ExAC population database, with sequencing coverage of the variant exceeding 25X"
        outcome: "moderately supporting"
        supporting_reference:
          - id: PMID:27997510
        supporting_data:
          - id: CLINGEN:PAF082A
            type: SEPIO:9000895 ('allele frequency data')
            value: "0"
          - id: CLINGEN:PAF082B
            type: SEPIO:9000846 ('median sequencing coverage data')
            value: "28X"
          - id: CLINGEN:PAF082C
            type: SEPIO:9000878 ('population ethnicity data')
            value: "non-finnish european"
…
File sections: header, entities, assocs
variants:
  - id: CLINVAR:226213
    type: SO:0001483 ('single nucleotide variant')
    label: "NM_007294.3(BRCA1):c.4677_5075del"
    positions:
      - type: HGVS
        value: "NM_007294.3:c.4677_5075del"
Use GA4GH variant representation (Reece Hart leading)
http://bit.ly/variant-path-PXF
ClinGen (Larry Babb) collab
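A hedged sketch of how a consumer might walk the evidence attached to a variant interpretation like the one on this slide. The dictionary below copies a subset of the slide's identifiers and values, but its nesting is an assumption (the slide text lost its indentation), and evidence_summary is a hypothetical helper, not part of any PXF tool:

```python
# Subset of the slide's example; the nesting shown here is assumed.
INTERPRETATION = {
    "entity": "CLINVAR:226213",
    "interpretation": "pathogenic",
    "evidence": [
        {
            "id": "CLINGEN:ev025",
            "outcome": "moderately supporting",
            "supporting_reference": [{"id": "PMID:27997510"}],
            "supporting_data": [
                {"id": "CLINGEN:PAF082A", "value": "0"},
                {"id": "CLINGEN:PAF082B", "value": "28X"},
            ],
        }
    ],
}


def evidence_summary(assoc):
    """Render one line per evidence item, with its references and data."""
    lines = []
    for ev in assoc.get("evidence", []):
        refs = ", ".join(r["id"] for r in ev.get("supporting_reference", []))
        data = ", ".join(
            f'{d["id"]}={d["value"]}' for d in ev.get("supporting_data", [])
        )
        lines.append(
            f'{assoc["entity"]} {assoc["interpretation"]}: '
            f'{ev["id"]} ({ev["outcome"]}); refs: {refs}; data: {data}'
        )
    return lines


summary = evidence_summary(INTERPRETATION)
```

Because each evidence item carries its own references and supporting data, the assertion "pathogenic" stays traceable to the observations behind it.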
Complex phenotypes
• Not every phenotype can be boiled down to a pre-packaged ontology term
• PXF allows post-coordination / post-composition
– E.g. ‘mild’, ‘severe’ qualifiers
– Temporal qualifiers: start, end, acute/chronic, …
– Specifying precise location of phenotype
– On-the-fly composition of phenotypic descriptors from base ontologies
• Chemical entities
• Cell types
• GO
• Anatomy
• Additionally
– Free text descriptions
– Measurements / quantitative phenotypes
– Environments (ongoing)
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple
species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
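Post-composition as described above can be sketched as a small builder that assembles a phenotype from base ontology terms instead of one pre-packaged term. The term and compose_phenotype helpers are hypothetical; the field names (types, attribute_of, onset, measurements, description) and the term identifiers are taken from the measurement example in this deck:

```python
def term(term_id, label):
    """Build an ontology term reference."""
    return {"id": term_id, "label": label}


def compose_phenotype(quality, bearer, description=None, onset=None,
                      measurements=None):
    """Compose a phenotype from a quality term and the entity that bears it."""
    phenotype = {"types": [quality], "attribute_of": {"types": [bearer]}}
    # Optional refinements are added only when supplied.
    if description:
        phenotype["description"] = description
    if onset:
        phenotype["onset"] = {"types": [onset]}
    if measurements:
        phenotype["measurements"] = list(measurements)
    return phenotype


cerebellum_weight = compose_phenotype(
    quality=term("PATO:0000128", "weight"),
    bearer=term("UBERON:0002037", "cerebellum"),
    description="cerebellum weight",
    onset=term("MmusDv:0000061", "early adult"),
    measurements=[{"unit": "mg", "value": 61.4}],
)
```

The same builder covers a bare "bag of terms" case (quality and bearer only) and a fully qualified quantitative phenotype.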
PXF and GA4GH Stack
• PXF primary use case is as a file format
• GA4GH primary use case is as an API
• Obviously these are related…
• ...But the devil is in the details
– E.g. Is there a well-defined mapping between proto and JSON?
• How can we better interoperate? Working to converge (M. Diekhans)
– Define PXF using ProtoBuf
– What would a query API look like?
• As an exchange format, we don’t have to worry about this
• Query APIs for complex data structures proliferate complexity
• What is the overall GA4GH strategy here?
PXF, GA4GH, and other related activities
• G2P
– PXF extends initial implementation
– Make PXF a FHIR resource
• Metadata
– Align how to reference ontology terms
– Standardizing identifier prefixes
• MME
– PXF does not provide a search API
– PXF subsumes phenotype profile representation
• Beacon
– PXF could be a response element
Summary: Phenotype Exchange Format
• One model, derive alternate concrete forms
– YAML, JSON, RDF, TSV (subset)
• Species-agnostic
– From microbes through plants through humans
– clinical and basic research
• Applicable to a variety of entities
– Patients/individual organisms, cohorts, populations
– Diseases
– Papers
– Genes, genotypes, alleles, variants
• Simple for simple cases…
– Bag of terms model
• …Incremental expressivity
– Temporality and causality
– Quantitative as well as qualitative
– Negation, severity, frequency, penetrance, expressivity
• Ontology-smart
– Rational Composition (post-coordination)
– Explicit semantics
http://phenopackets.org
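To illustrate "one model, derive alternate concrete forms", here is a minimal sketch that serializes the same in-memory packet as JSON and as a TSV subset. The to_json and to_tsv functions and the TSV column names are assumptions for illustration, not the output of pxftools; the TSV deliberately drops nested detail, which is why the summary calls it a subset:

```python
import csv
import io
import json

# A small packet using field names and terms from the examples in this deck.
PACKET = {
    "title": "age of onset example",
    "phenotype_profile": [
        {
            "entity": "#1",
            "phenotype": {
                "types": [{"id": "HP:0200055", "label": "Small hands"}],
                "onset": {
                    "description": "during development",
                    "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
                },
            },
        }
    ],
}


def to_json(packet):
    """Full-fidelity form: every nested field survives."""
    return json.dumps(packet, indent=2, sort_keys=True)


def to_tsv(packet):
    """Bag-of-terms form: one row per phenotype term; free text is dropped."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity", "phenotype_id", "phenotype_label", "onset_id"])
    for assoc in packet["phenotype_profile"]:
        phenotype = assoc["phenotype"]
        onset_terms = phenotype.get("onset", {}).get("types", [])
        onset_id = onset_terms[0]["id"] if onset_terms else ""
        for t in phenotype["types"]:
            writer.writerow([assoc["entity"], t["id"], t.get("label", ""), onset_id])
    return out.getvalue()


tsv_rows = to_tsv(PACKET).splitlines()
```

The JSON round-trips to the original structure, while the TSV keeps only what fits in flat columns.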
Phenopacket Tool ecosystem
• Non-JVM language bindings
– Python (beta)
• https://github.com/phenopackets/phenopacket-python/
– JavaScript (alpha)
• https://github.com/phenopackets/phenopacket-js/
• Pxftools
– command line library, Scala utilities
– https://github.com/phenopackets/pxftools
• PhenoPacketScraper
– GSoC project to make phenopackets from case study articles
– https://github.com/monarch-initiative/phenopacket-scraper-core
• OwlSim
– Like BLAST, for phenotypes
– https://github.com/monarch-initiative/owlsim-v3
• WebPhenote
– Noctua extension for phenopacket creation
– http://create.monarchinitiative.org
Acknowledgments
• Chris Mungall (schema/architecture)
• Jules Jacobsen (java API)
• James Balhoff (pxftools)
• Jeremy Nguyen-Xuan (pxftools)
• Seth Carbon (web phenote)
• Kent Shefchek (python API)
• Matt Brush (modeling)
• Dan Keith (web phenote)
• Satwik Bhattamishra (GSoC student, PhenoPacketScraper)
• Julie McMurry
• Peter Robinson
• Pier Buttigieg
• Ramona Walls
• Damian Smedley
• Sebastian Kohler
• Tudor Groza
• Harry Hochheiser
• Mark Diekhans
• Melanie Courtot
• Michael Baudis
• Helen Parkinson
• Suzanna Lewis
Phenopackets as applied to variant interpretation

Weitere ähnliche Inhalte

Was ist angesagt?

The Monarch Initiative: An integrated genotype-phenotype platform for disease...
The Monarch Initiative: An integrated genotype-phenotype platform for disease...The Monarch Initiative: An integrated genotype-phenotype platform for disease...
The Monarch Initiative: An integrated genotype-phenotype platform for disease...mhaendel
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updatemhaendel
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMichel Dumontier
 
Use of semantic phenotyping to aid disease diagnosis
Use of semantic phenotyping to aid disease diagnosisUse of semantic phenotyping to aid disease diagnosis
Use of semantic phenotyping to aid disease diagnosismhaendel
 
Semantic phenotyping for disease diagnosis and discovery
Semantic phenotyping for disease diagnosis and discovery Semantic phenotyping for disease diagnosis and discovery
Semantic phenotyping for disease diagnosis and discovery mhaendel
 
Deep phenotyping to aid identification of coding & non-coding rare disease v...
Deep phenotyping to aid identification  of coding & non-coding rare disease v...Deep phenotyping to aid identification  of coding & non-coding rare disease v...
Deep phenotyping to aid identification of coding & non-coding rare disease v...mhaendel
 
The Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discoveryThe Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discoverymhaendel
 
Integrating clinical and model organism G2P data for disease discovery
Integrating clinical and model organism G2P data for disease discoveryIntegrating clinical and model organism G2P data for disease discovery
Integrating clinical and model organism G2P data for disease discoverymhaendel
 
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...mhaendel
 
GA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project IntroductionGA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project Introductionmhaendel
 
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease DiscoveryData Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease Discoverymhaendel
 
Semantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discoverySemantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discoverymhaendel
 
Enhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the LaypersonEnhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the LaypersonNicole Vasilevsky
 
What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...mhb120
 
Genomic Epidemiology: How High Throughput Sequencing changed our view on bac...
Genomic Epidemiology:  How High Throughput Sequencing changed our view on bac...Genomic Epidemiology:  How High Throughput Sequencing changed our view on bac...
Genomic Epidemiology: How High Throughput Sequencing changed our view on bac...João André Carriço
 
Dynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsDynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsTim Clark
 
Cell Authentication By STR Profiling
Cell Authentication By STR ProfilingCell Authentication By STR Profiling
Cell Authentication By STR ProfilingCreative-Bioarray
 
Resazurin Cell Viability Assay
Resazurin Cell Viability AssayResazurin Cell Viability Assay
Resazurin Cell Viability Assaycreativebioarray22
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataChris Mungall
 
Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshopChris Mungall
 

Was ist angesagt? (20)

The Monarch Initiative: An integrated genotype-phenotype platform for disease...
The Monarch Initiative: An integrated genotype-phenotype platform for disease...The Monarch Initiative: An integrated genotype-phenotype platform for disease...
The Monarch Initiative: An integrated genotype-phenotype platform for disease...
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team update
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discovery
 
Use of semantic phenotyping to aid disease diagnosis
Use of semantic phenotyping to aid disease diagnosisUse of semantic phenotyping to aid disease diagnosis
Use of semantic phenotyping to aid disease diagnosis
 
Semantic phenotyping for disease diagnosis and discovery
Semantic phenotyping for disease diagnosis and discovery Semantic phenotyping for disease diagnosis and discovery
Semantic phenotyping for disease diagnosis and discovery
 
Deep phenotyping to aid identification of coding & non-coding rare disease v...
Deep phenotyping to aid identification  of coding & non-coding rare disease v...Deep phenotyping to aid identification  of coding & non-coding rare disease v...
Deep phenotyping to aid identification of coding & non-coding rare disease v...
 
The Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discoveryThe Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discovery
 
Integrating clinical and model organism G2P data for disease discovery
Integrating clinical and model organism G2P data for disease discoveryIntegrating clinical and model organism G2P data for disease discovery
Integrating clinical and model organism G2P data for disease discovery
 
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
 
GA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project IntroductionGA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project Introduction
 
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease DiscoveryData Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
 
Semantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discoverySemantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discovery
 
Enhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the LaypersonEnhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the Layperson
 
What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...
 
Genomic Epidemiology: How High Throughput Sequencing changed our view on bac...
Genomic Epidemiology:  How High Throughput Sequencing changed our view on bac...Genomic Epidemiology:  How High Throughput Sequencing changed our view on bac...
Genomic Epidemiology: How High Throughput Sequencing changed our view on bac...
 
Dynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsDynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical Communications
 
Cell Authentication By STR Profiling
Cell Authentication By STR ProfilingCell Authentication By STR Profiling
Cell Authentication By STR Profiling
 
Resazurin Cell Viability Assay
Resazurin Cell Viability AssayResazurin Cell Viability Assay
Resazurin Cell Viability Assay
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype Data
 
Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshop
 

Ähnlich wie Phenopackets as applied to variant interpretation

Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyMelanie Courtot
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...Artificial Intelligence Institute at UofSC
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsramakanz
 
ENCODE-DCC-metadata-standard-Biocurator 2014
ENCODE-DCC-metadata-standard-Biocurator 2014ENCODE-DCC-metadata-standard-Biocurator 2014
ENCODE-DCC-metadata-standard-Biocurator 2014ENCODE-DCC
 
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...Felipe Albrecht
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppSimon Jupp
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesLeighton Pritchard
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS
 
Introduction to BioNLP and its applications
Introduction to BioNLP and its applicationsIntroduction to BioNLP and its applications
Introduction to BioNLP and its applicationsShankaiYan
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenix Bioinformatics
 
The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...Neuroscience Information Framework
 
The Monarch Initiative Phenotype Grid
The Monarch Initiative Phenotype GridThe Monarch Initiative Phenotype Grid
The Monarch Initiative Phenotype GridHarry Hochheiser
 
Connecting life sciences data at the European Bioinformatics Institute
Connecting life sciences data at the European Bioinformatics InstituteConnecting life sciences data at the European Bioinformatics Institute
Connecting life sciences data at the European Bioinformatics InstituteConnected Data World
 
Cartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebaseKew Sama
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
Ontology-based data access and semantic mining with Aber-OWL
Ontology-based data access and semantic mining with Aber-OWLOntology-based data access and semantic mining with Aber-OWL
Ontology-based data access and semantic mining with Aber-OWLRobert Hoehndorf
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarJenny Molloy
 

Ähnlich wie Phenopackets as applied to variant interpretation (20)

Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systems
 
ENCODE-DCC-metadata-standard-Biocurator 2014
ENCODE-DCC-metadata-standard-Biocurator 2014ENCODE-DCC-metadata-standard-Biocurator 2014
ENCODE-DCC-metadata-standard-Biocurator 2014
 
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomics
 
Introduction to BioNLP and its applications
Introduction to BioNLP and its applicationsIntroduction to BioNLP and its applications
Introduction to BioNLP and its applications
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenes
 
The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...
 
The Monarch Initiative Phenotype Grid
The Monarch Initiative Phenotype GridThe Monarch Initiative Phenotype Grid
The Monarch Initiative Phenotype Grid
 
Syntactic-semantic analysis for information extraction in biomedicine
Syntactic-semantic analysis for information extraction in biomedicineSyntactic-semantic analysis for information extraction in biomedicine
Syntactic-semantic analysis for information extraction in biomedicine
 
Connecting life sciences data at the European Bioinformatics Institute
Connecting life sciences data at the European Bioinformatics InstituteConnecting life sciences data at the European Bioinformatics Institute
Connecting life sciences data at the European Bioinformatics Institute
 
Cartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defense
 
gene_concept_2.pdf
gene_concept_2.pdfgene_concept_2.pdf
gene_concept_2.pdf
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015
 
Ontology-based data access and semantic mining with Aber-OWL
Ontology-based data access and semantic mining with Aber-OWLOntology-based data access and semantic mining with Aber-OWL
Ontology-based data access and semantic mining with Aber-OWL
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data Seminar
 

Mehr von mhaendel

The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA mhaendel
 
Equivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholderEquivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholdermhaendel
 
Building (and traveling) the data-brick road: A report from the front lines ...
Building (and traveling) the data-brick road:  A report from the front lines ...Building (and traveling) the data-brick road:  A report from the front lines ...
Building (and traveling) the data-brick road: A report from the front lines ...mhaendel
 
Reusable data for biomedicine: A data licensing odyssey
Reusable data for biomedicine:  A data licensing odysseyReusable data for biomedicine:  A data licensing odyssey
Reusable data for biomedicine: A data licensing odysseymhaendel
 
How open is open? An evaluation rubric for public knowledgebases
How open is open?  An evaluation rubric for public knowledgebasesHow open is open?  An evaluation rubric for public knowledgebases
How open is open? An evaluation rubric for public knowledgebasesmhaendel
 
Science in the open, what does it take?
Science in the open, what does it take?Science in the open, what does it take?
Science in the open, what does it take?mhaendel
 
Credit where credit is due: acknowledging all types of contributions
Credit where credit is due: acknowledging all types of contributionsCredit where credit is due: acknowledging all types of contributions
Credit where credit is due: acknowledging all types of contributionsmhaendel
 
Getting (and giving) credit for all that we do
Getting (and giving) credit for all that we doGetting (and giving) credit for all that we do
Getting (and giving) credit for all that we domhaendel
 
Force11: Enabling transparency and efficiency in the research landscape
Force11: Enabling transparency and efficiency in the research landscapeForce11: Enabling transparency and efficiency in the research landscape
Force11: Enabling transparency and efficiency in the research landscapemhaendel
 
Dataset description using the W3C HCLS standard
Dataset description using the W3C HCLS standardDataset description using the W3C HCLS standard
Dataset description using the W3C HCLS standardmhaendel
 
On the nature of Credit
On the nature of CreditOn the nature of Credit
On the nature of Creditmhaendel
 
Standardizing scholarly output with the VIVO ontology



Phenopackets as applied to variant interpretation

  • 1. Making Phenotypic data FAIR++ for Disease Diagnosis and Discovery. Findable; Accessible outside paywalls and private data sources; Attributable; Interoperable and Computable; Reusable, exchangeable across contexts and disciplines. @ontowonka, Melissa Haendel, PhD
  • 2. Genes + Environment = Phenotypes. Computable encodings are essential: base pairs; variant notation (e.g. HGVS); Human Phenotype Ontology; Mammalian Phenotype Ontology; medical procedure coding; Environment Ontology. @ontowonka
  • 3. Genes, Environment, Phenotypes. Formats shown: VCF, PXF, GFF, BED. Standard exchange formats exist for genes … but for phenotypes? Environment? @ontowonka
  • 4. Problems with tabular formats
      – Denormalized
          • Repetition of fields
          • Ad-hoc syntax for multi-valued fields, nesting
      – Proliferation
          • Different formats generated for each use case
              ◦ E.g. disease-phenotype, patient-phenotype, …
      – Hard to extend
          • Not all phenotypes can be pre-packaged as a phenotype term
              ◦ E.g. measurements, environments
      – Ad hoc software, need standard libraries
      – Focus should be on the data model
  • 5. Phenopackets for clinical labs Patient and family history Diagnostic tests, clinical phenotypes Genomic information Physical exam Patient medical history Clinical labs often get no phenotypes or one-line descriptions. What if we could make the phenotype data PHI-free and simultaneously more descriptive? Clinical testing lab
  • 6. Phenopackets for journals Each article can be associated with a phenopacket Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372 Each phenopacket can be shared via DOI in any repository outside paywall (eg. Figshare, Zenodo, etc) and cited as a data citation
  • 7. Phenopackets for databases Databases could share G2P data in a standardized format, retaining domain or species specificity OMIA
  • 8. Ontologies provide pre-packaged phenotype descriptions
  • 9. A simple data model
      – Entities
          • Organism
              ◦ Patient
              ◦ Non-human animal
              ◦ Population
          • Genetic/genomic element
          • Condition
              ◦ Disease
              ◦ Phenotype
      – Associations
          • E.g. between disease and phenotype
          • Each association has
              ◦ Evidence
              ◦ Provenance
      Diagram labels: Entity, Condition, association, Evidence, Disease, Phenotype
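The model on this slide can be sketched as a handful of Python dataclasses. This is a minimal illustration, not the PXF schema itself: the class and field names (`Entity`, `Condition`, `Evidence`, `Association`, `kind`, `sources`) are assumptions chosen to mirror the slide's vocabulary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Something a condition can be attached to (patient, animal, population, genomic element)."""
    id: str
    label: str
    kind: str  # hypothetical field, e.g. "patient" or "population"


@dataclass
class Condition:
    """A disease or a phenotype, identified by an ontology term."""
    id: str
    label: str
    kind: str  # hypothetical field: "disease" or "phenotype"


@dataclass
class Evidence:
    """Why the association is asserted, plus where it came from (provenance)."""
    type_id: str
    sources: List[str] = field(default_factory=list)


@dataclass
class Association:
    """Links one entity to one condition, carrying its own evidence."""
    entity: Entity
    condition: Condition
    evidence: List[Evidence] = field(default_factory=list)


patient = Entity(id="#1", label="patient 1", kind="patient")
small_hands = Condition(id="HP:0200055", label="Small hands", kind="phenotype")
assoc = Association(patient, small_hands, [Evidence("ECO:0000033", ["PMID:1"])])
```

Keeping evidence on the association, rather than on the entity or the condition, is what lets the same phenotype term be asserted for different patients with different support.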
  • 11. What does a PhenoPacket look like? Canonical JSON format (monarchinitiative.org; image credits: upi.com)
      title: "age of onset example"
      persons:
        - id: "#1"
          label: "Donald Trump"
          sex: "M"
      phenotype_profile:
        - entity: "person#1"
          phenotype:
            types:
              - id: "HP:0200055"
                label: "Small hands"
            onset:
              description: "during development"
              types:
                - id: "HP:0003577"
                  label: "Congenital onset"
          evidence:
            - types:
                - id: "ECO:0000033"
                  label: "Traceable Author Statement"
              source:
                - id: "PMID:1"
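Since the slide names JSON as the canonical format, the same packet can be sketched as a plain Python dictionary and round-tripped through the standard `json` module. The nesting below is an assumption reconstructed from the flattened slide text, the person label is omitted, and `phenotype_ids` is a hypothetical helper, not part of any phenopacket library.

```python
import json

# Structure reconstructed from the slide; the exact nesting is an assumption.
packet = {
    "title": "age of onset example",
    "persons": [{"id": "#1", "sex": "M"}],
    "phenotype_profile": [
        {
            "entity": "#1",
            "phenotype": {
                "types": [{"id": "HP:0200055", "label": "Small hands"}],
                "onset": {
                    "description": "during development",
                    "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
                },
            },
            "evidence": [
                {
                    "types": [{"id": "ECO:0000033", "label": "Traceable Author Statement"}],
                    "source": [{"id": "PMID:1"}],
                }
            ],
        }
    ],
}


def phenotype_ids(p):
    """Collect every phenotype term ID asserted in the packet (hypothetical helper)."""
    return [
        term["id"]
        for assoc in p["phenotype_profile"]
        for term in assoc["phenotype"]["types"]
    ]


# Serialize to JSON text and read it back; nothing should be lost on the way.
text = json.dumps(packet, indent=2)
restored = json.loads(text)
```

Because the packet is only nested lists, mappings, and strings, the YAML shown on the slide and this JSON form carry the same information.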
  • 12. Nesting allows refinement
      File: patients.pxf (sections: header, entities, assocs)
      persons:
        - id: "#1"
          label: Mickey Mouse
          date_of_birth: 1928-01-01
          sex: M
        - id: "#2"
          label: Goofy
          sex: M
      phenotype_profile:
        - entity: "#1"
          phenotype:
            types:
              - id: HP:0100024
                label: conspicuously happy disposition
            onset:
              types:
                - id: HP:0011463
                  label: Adult onset
              description: "Writes distracting tweets"
  • 13. How does it handle measurements? (monarchinitiative.org)
      title: "measurement example, taken from genenetwork.org"
      organisms:
        - id: "#1"
          label: "BXD mouse population"
          taxon: NCBITaxon:10090
      phenotype_profile:
        - entity: "#1"
          phenotype:
            description: "cerebellum weight"
            types:
              - id: "PATO:0000128"
                label: "weight"
            measurements:
              - unit: mg
                value: 61.400
                property_values:
                  - property: standard_error
                    filler: 2.38
            attribute_of:
              types:
                - id: "UBERON:0002037"
                  label: "cerebellum"
            onset:
              description: "measured in adults"
              types:
                - id: "MmusDv:0000061"
                  label: "early adult"
      Slide annotations: Ontology of Statistical properties; attribute; Measured entity; UO.
      We can represent population phenotypes too.
      For non-abnormal phenotypes we can use a trait ontology, or a building block approach, with PATO and Uberon.
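To illustrate the building-block approach, the sketch below reads the quality (PATO), the measured entity (Uberon), and the measurement with its statistical properties out of the phenotype shown on the slide. The nesting is an assumption reconstructed from the flattened slide text, and `summarize` is a hypothetical helper.

```python
# Phenotype block from the slide; the nesting is an assumption.
phenotype = {
    "description": "cerebellum weight",
    "types": [{"id": "PATO:0000128", "label": "weight"}],
    "measurements": [
        {
            "unit": "mg",
            "value": 61.4,
            "property_values": [{"property": "standard_error", "filler": 2.38}],
        }
    ],
    "attribute_of": {"types": [{"id": "UBERON:0002037", "label": "cerebellum"}]},
}


def summarize(p):
    """Render each measurement as '<entity> <quality>: <value> <unit> (<properties>)'."""
    quality = p["types"][0]["label"]
    bearer = p["attribute_of"]["types"][0]["label"]
    lines = []
    for m in p["measurements"]:
        text = f'{bearer} {quality}: {m["value"]} {m["unit"]}'
        extras = ", ".join(
            f'{pv["property"]}={pv["filler"]}' for pv in m.get("property_values", [])
        )
        if extras:
            text += f" ({extras})"
        lines.append(text)
    return lines


summary = summarize(phenotype)
```

Here `summary` is `["cerebellum weight: 61.4 mg (standard_error=2.38)"]`: the phenotype is assembled from a quality, its bearer, and a value, with no pre-packaged "cerebellum weight" term required.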
  • 14. Example: pathogenicity for a variant
      File sections: header, entities, assocs
      variants:
        - id: CLINVAR:226213
          type: SO:0001483 ('single nucleotide variant')
          label: "NM_007294.3(BRCA1):c.4677_5075del"
          positions:
            - type: HGVS
              value: "NM_007294.3:c.4677_5075del"
      disease_profile:
        - entity: CLINVAR:226213
          disease:
            - id: NCIT:C4872
              label: "Breast Carcinoma"
          interpretation: "pathogenic"
          contributors:
            - id: CLINGEN:Agent007
              label: "Clinical Pathogenicity Calculator v1"
          created: "2016-07-12T11:00:59+00:00"
          method:
            - id: doi:10.1038/gim.2015.30
              label: "ACMG ISV guidelines 2015"
          evidence:
            - id: CLINGEN:ev025
              type: ECO:9000100 ('population frequency evidence')
              acmg_criterion: CLINGEN:vic008 ('ACMG v2015 PM2, absent from controls in population databases')
              description: "Variant is absent from a large cohort of non-finnish europeans (NFE) in the ExAC population database, with sequencing coverage of the variant exceeding 25X"
              outcome: "moderately supporting"
              supporting_reference:
                - id: PMID:27997510
              supporting_data:
                - id: CLINGEN:PAF082A
                  type: SEPIO:9000895 ('allele frequency data')
                  value: "0"
                - id: CLINGEN:PAF082B
                  type: SEPIO:9000846 ('median sequencing coverage data')
                  value: "28X"
                - id: CLINGEN:PAF082C
                  type: SEPIO:9000878 ('population ethnicity data')
                  value: "non-finnish european"
          …....
      Use GA4GH variant representation (Reece Hart leading). http://bit.ly/variant-path-PXF. ClinGen (Larry Babb) collab.
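One way to see what the structured evidence buys a consumer: the supporting data can be looked up by type instead of being parsed out of the free-text description. The sketch below copies the evidence item from the slide into a Python dictionary; the nesting is an assumption, the parenthetical term labels from the slide are kept as comments, and `data_by_type` is a hypothetical helper.

```python
# Evidence item from the slide; the nesting is an assumption.
evidence = {
    "id": "CLINGEN:ev025",
    "type": "ECO:9000100",             # 'population frequency evidence'
    "acmg_criterion": "CLINGEN:vic008",  # 'ACMG v2015 PM2, absent from controls ...'
    "outcome": "moderately supporting",
    "supporting_reference": [{"id": "PMID:27997510"}],
    "supporting_data": [
        {"id": "CLINGEN:PAF082A", "type": "SEPIO:9000895", "value": "0"},    # allele frequency
        {"id": "CLINGEN:PAF082B", "type": "SEPIO:9000846", "value": "28X"},  # median coverage
        {"id": "CLINGEN:PAF082C", "type": "SEPIO:9000878",
         "value": "non-finnish european"},                                   # population
    ],
}


def data_by_type(ev):
    """Index the supporting data values by their type term (hypothetical helper)."""
    return {item["type"]: item["value"] for item in ev["supporting_data"]}


indexed = data_by_type(evidence)
allele_frequency = indexed["SEPIO:9000895"]
coverage = indexed["SEPIO:9000846"]
```

A downstream tool could then check the criterion's inputs (allele frequency `"0"`, coverage `"28X"`) directly, which is harder to do reliably from a sentence of prose.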
  • 15. Complex phenotypes
      – Not every phenotype can be boiled down to a pre-packaged ontology term
      – PXF allows post-coordination / post-composition
          • E.g. ‘mild’, ‘severe’ qualifiers
          • Temporal qualifiers: start, end, acute/chronic, …
          • Specifying precise location of phenotype
          • On-the-fly composition of phenotypic descriptors from base ontologies
              ◦ Chemical entities
              ◦ Cell types
              ◦ GO
              ◦ Anatomy
      – Additionally
          • Free text descriptions
          • Measurements / quantitative phenotypes
          • Environments (ongoing)
      Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
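Post-composition can be sketched as a small function that assembles a phenotype description from base-ontology terms at the point of use. This is only an illustration: `compose_phenotype` and the `qualifiers` key are hypothetical names, and the `EX:severe` identifier is a placeholder rather than a real ontology term.

```python
def compose_phenotype(quality, bearer, qualifiers=(), description=None):
    """Build a phenotype from a quality term, the entity bearing it, and optional qualifiers."""
    phenotype = {
        "types": [quality],
        "attribute_of": {"types": [bearer]},
    }
    if qualifiers:
        phenotype["qualifiers"] = list(qualifiers)  # hypothetical key name
    if description:
        phenotype["description"] = description      # free-text fallback
    return phenotype


composed = compose_phenotype(
    quality={"id": "PATO:0000128", "label": "weight"},
    bearer={"id": "UBERON:0002037", "label": "cerebellum"},
    qualifiers=[{"id": "EX:severe", "label": "severe"}],  # placeholder ID
    description="cerebellum weight",
)
```

The point of the design is that new combinations need no new ontology term: any quality can be paired with any anatomical, cellular, or chemical entity, then refined with qualifiers.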
  • 16. PXF and GA4GH Stack
      – PXF primary use case is as a file format
      – GA4GH primary use case is as an API
      – Obviously these are related…
      – …But the devil is in the details
          • E.g. Is there a well-defined mapping between proto and JSON?
      – How can we better interoperate? Working to converge (M. Diekhans)
          • Define PXF using ProtoBuf
          • What would a query API look like?
              ◦ As an exchange format, we don’t have to worry about this
              ◦ Query APIs for complex data structures proliferate complexity
              ◦ What is the overall GA4GH strategy here?
  • 17. PXF, GA4GH, and other related activities
      – G2P
          • PXF extends initial implementation
          • Make PXF a FHIR resource
      – Metadata
          • Align how to reference ontology terms
          • Standardizing identifier prefixes
      – MME
          • PXF does not provide a search API
          • PXF subsumes phenotype profile representation
      – Beacon
          • PXF could be a response element
  • 18. Summary: Phenotype Exchange Format (http://phenopackets.org)
      – One model, derive alternate concrete forms
          • YAML, JSON, RDF, TSV (subset)
      – Species-agnostic
          • From microbes through plants through humans
          • Clinical and basic research
      – Applicable to a variety of entities
          • Patients/individual organisms, cohorts, populations
          • Diseases
          • Papers
          • Genes, genotypes, alleles, variants
      – Simple for simple cases…
          • Bag of terms model
      – …Incremental expressivity
          • Temporality and causality
          • Quantitative as well as qualitative
          • Negation, severity, frequency, penetrance, expressivity
      – Ontology-smart
          • Rational composition (post-coordination)
          • Explicit semantics
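The "one model, alternate concrete forms" idea can be illustrated by deriving the TSV subset from the nested form: the simple bag-of-terms case flattens to one row per entity and phenotype term. The column names and the `profile_to_tsv` function are assumptions for illustration, not a defined PXF serialization.

```python
import csv
import io


def profile_to_tsv(packet):
    """Flatten a bag-of-terms phenotype profile to TSV, one row per (entity, term)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity", "phenotype_id", "phenotype_label"])  # assumed header
    for assoc in packet["phenotype_profile"]:
        for term in assoc["phenotype"]["types"]:
            writer.writerow([assoc["entity"], term["id"], term["label"]])
    return buffer.getvalue()


packet = {
    "phenotype_profile": [
        {
            "entity": "#1",
            "phenotype": {"types": [{"id": "HP:0200055", "label": "Small hands"}]},
        }
    ]
}
tsv_text = profile_to_tsv(packet)
```

Anything beyond the bag of terms (onset, evidence, measurements) is dropped by this flattening, which is why the slide lists TSV as a subset rather than an equivalent form.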
  • 19. Phenopacket Tool ecosystem
      – Non-JVM language bindings
          • Python (beta): https://github.com/phenopackets/phenopacket-python/
          • Javascript (alpha): https://github.com/phenopackets/phenopacket-js/
      – Pxftools
          • Command line library, Scala utilities
          • https://github.com/phenopackets/pxftools
      – PhenoPacketScraper
          • GSOC project to make phenopackets from case study articles
          • https://github.com/monarch-initiative/phenopacket-scraper-core
      – OwlSim
          • Like BLAST, for phenotypes
          • https://github.com/monarch-initiative/owlsim-v3
      – WebPhenote
          • Noctua extension for phenopacket creation
          • http://create.monarchinitiative.org
  • 20. Acknowledgments • Chris Mungall (schema/architecture) • Jules Jacobsen (java API) • James Balhoff (pxftools) • Jeremy Nguyen-Xuan (pxftools) • Seth Carbon (web phenote) • Kent Shefcheck (python API) • Matt Brush (modeling) • Dan Keith (web phenote) • Satwik Bhattamishra (GSOC student, PhenoPacketScraper) • Julie McMurry • Peter Robinson • Pier Buttigieg • Ramona Walls • Damian Smedley • Sebastian Kohler • Tudor Groza • Harry Hochheiser • Mark Diekhans • Melanie Courtot • Michael Baudis • Helen Parkinson • Suzanna Lewis

Editor's notes

  1. The classic G+E=P. But the = has a lot that can be applied to aid the linking.
  2. The classic G+E=P. But the = has a lot that can be applied to aid the linking.
  3. Note key-value approach
  4. https://pixabay.com/en/instruments-measurement-measure-860912/