Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
17 Sep 2017
Architecture of language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
3. Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data
Under-utilized data
Loss of discriminatory power
4. More species = more coverage
Human protein-coding genes in ExAC DB (as per Lek et al. Nature 2016): 19,008
Covered by human data: 9,739 (51%)
5. More species = more coverage
Human protein-coding genes in ExAC DB (as per Lek et al. Nature 2016): 19,008
Covered by human data: 9,739 (51%)
Covered with more species: 14,779 (78%)
Mungall et al. Nucleic Acids Research bit.ly/monarch-nar-2016
6. More species = more coverage
Even inclusion of just four species boosts phenotypic coverage of genes by 38 percentage points (51% → 89%)
Combined = 89%: 2,195 + 7,544 + 7,235 = 16,974 of 19,008 genes (union of coverage in any species)
Mungall et al. Nucleic Acids Research bit.ly/monarch-nar-2016
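The coverage figures on slides 4 to 6 can be checked with simple arithmetic. The sketch below assumes, as the numbers suggest but the slides do not state, that 2,195, 7,544 and 7,235 are disjoint parts of the union: the first two sum to the 9,739 genes covered by human data, and all three sum to 16,974. The part labels in the code are hypothetical. It also shows that the "38%" boost is a difference in percentage points.

```python
# Gene counts copied from the slides. Which genes fall in which part of
# the slide-6 union is an assumption inferred from the arithmetic.
TOTAL_GENES = 19_008           # human protein-coding genes (ExAC, Lek et al. 2016)
HUMAN_COVERED = 9_739          # slide 4: covered by human data
MORE_SPECIES_COVERED = 14_779  # slide 5
UNION_COVERED = 16_974         # slide 6: covered in any species

# Assumed disjoint parts of the slide-6 union (hypothetical labels).
parts = {"human_only": 2_195, "human_and_other": 7_544, "other_only": 7_235}


def percent(count: int, total: int = TOTAL_GENES) -> float:
    """Coverage as a percentage of all human protein-coding genes."""
    return 100.0 * count / total


# The slide numbers are internally consistent under this assumption.
assert parts["human_only"] + parts["human_and_other"] == HUMAN_COVERED
assert sum(parts.values()) == UNION_COVERED

human_pct = percent(HUMAN_COVERED)
more_pct = percent(MORE_SPECIES_COVERED)
union_pct = percent(UNION_COVERED)
gain = union_pct - human_pct  # a difference in percentage points

print(f"human data: {human_pct:.0f}%")        # human data: 51%
print(f"more species: {more_pct:.0f}%")       # more species: 78%
print(f"any species: {union_pct:.0f}%")       # any species: 89%
print(f"gain: {gain:.0f} percentage points")  # gain: 38 percentage points
```

In relative terms the union covers about 1.74 times as many genes as human data alone, which is why "38%" is best read as percentage points.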
9. Challenge: Each data source uses its own vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …
Data sources: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, IMPC, OMIM, …, QTLdb, HPOA, EHR
10. Can we help machines understand phenotypes?
Human phenotype: "Triphalangeal thumb"
Machine: "I have absolutely no idea what that means"
11. Decomposition of complex concepts allows interoperability
Figure: the human phenotype "Triphalangeal thumb" is decomposed into terms from species-neutral ontologies of homologous concepts: PATO (duplicated), Uberon (phalanx of manual digit; autopod) and GO (embryonic skeletal system morphogenesis).
12. Decomposition of complex concepts allows interoperability
Figure: the human phenotype "Triphalangeal thumb" and the mouse phenotype "Polydactyly" are both decomposed into terms from the same species-neutral ontologies of homologous concepts (PATO, Uberon, GO), which makes the two phenotypes comparable.
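To illustrate why decomposition helps, the sketch below represents each phenotype as a set of component terms from species-neutral ontologies and compares them with a Jaccard overlap. The term labels come from the slides, but the component sets (especially the one for "Polydactyly") are illustrative assumptions, not the actual logical definitions, and Jaccard overlap is a simple stand-in for the ontology-aware semantic similarity used by tools such as Exomiser.

```python
# Hypothetical decompositions: each phenotype maps to a set of
# (ontology, term label) pairs. The assignment of components to each
# phenotype is assumed for illustration.
Component = tuple[str, str]

triphalangeal_thumb: set[Component] = {
    ("PATO", "duplicated"),
    ("Uberon", "phalanx of manual digit"),
    ("Uberon", "autopod"),
    ("GO", "embryonic skeletal system morphogenesis"),
}

polydactyly: set[Component] = {
    ("PATO", "duplicated"),
    ("Uberon", "autopod"),
    ("GO", "embryonic skeletal system morphogenesis"),
}


def jaccard(a: set[Component], b: set[Component]) -> float:
    """Shared components divided by all components; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


shared = triphalangeal_thumb & polydactyly
print(sorted(shared))
print(jaccard(triphalangeal_thumb, polydactyly))  # 0.75
```

Comparing the raw labels "Triphalangeal thumb" and "Polydactyly" finds nothing in common; comparing their components finds three shared terms.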
13. Example case solved by Exomiser
Figure: phenotypic profiles and genes compared for two patients, each heterozygous for a missense mutation in STIM-1, and for the mouse model Stim1Sax/Sax.
Exomiser ranked the STIM-1 variant as maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources
https://exomiser.github.io/Exomiser/
bit.ly/stim1paper
In the Genomics England 100,000 Genomes Project, the diagnosed variant was among the top 5 Exomiser hits for 82% of the first 1,936 diagnosed patients, across a range of rare diseases and family structures
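The "top 5 Exomiser hits" metric can be made concrete with a small sketch. Here each candidate gene carries a variant score and a phenotype-similarity score (for example, to a model organism), both in [0, 1], and the two are averaged. The averaging, the gene names and all scores are hypothetical, and this is not Exomiser's actual scoring formula; the point is only that a strong cross-species phenotype match can lift a gene with no human disease annotation to the top.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    variant_score: float    # predicted pathogenicity in [0, 1] (hypothetical)
    phenotype_score: float  # cross-species phenotype match in [0, 1] (hypothetical)

    @property
    def combined(self) -> float:
        # Plain average for illustration; not Exomiser's actual formula.
        return (self.variant_score + self.phenotype_score) / 2


def rank(candidates: list[Candidate]) -> list[str]:
    """Gene symbols ordered from highest to lowest combined score."""
    ordered = sorted(candidates, key=lambda c: c.combined, reverse=True)
    return [c.gene for c in ordered]


def in_top_k(ranking: list[str], causal_gene: str, k: int = 5) -> bool:
    return causal_gene in ranking[:k]


def top_k_rate(cases: list[tuple[list[str], str]], k: int = 5) -> float:
    """Fraction of diagnosed cases whose causal gene is in the top k."""
    if not cases:
        return 0.0
    hits = sum(in_top_k(ranking, gene, k) for ranking, gene in cases)
    return hits / len(cases)


# Hypothetical patient: GENE_C has no human disease annotation but a
# strong model-organism phenotype match, so it ranks first.
patient = [
    Candidate("GENE_A", variant_score=0.9, phenotype_score=0.1),
    Candidate("GENE_B", variant_score=0.4, phenotype_score=0.5),
    Candidate("GENE_C", variant_score=1.0, phenotype_score=0.8),
]
ranking = rank(patient)
print(ranking)  # ['GENE_C', 'GENE_A', 'GENE_B']
```

Applied over a cohort, `top_k_rate` with `k=5` is the kind of figure quoted for the 100,000 Genomes Project.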
16. Harmonizing diseases, phenotypes, anatomy, and genotypes
91% of our 2.2 million G2P associations require integrating two or more data sources
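One way to picture the harmonization step is to treat each source's identifiers as nodes and each cross-reference (for example, one supplied by a bridging ontology) as an equivalence, then merge equivalent identifiers with a union-find structure. The identifiers and mappings below are hypothetical placeholders, and this is a generic sketch, not the Monarch pipeline.

```python
class UnionFind:
    """Disjoint-set forest over identifier strings, with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(equivalences: list[tuple[str, str]]) -> list[set[str]]:
    """Group identifiers that are linked, directly or transitively."""
    uf = UnionFind()
    for a, b in equivalences:
        uf.union(a, b)
    groups: dict[str, set[str]] = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return list(groups.values())


def sources(group: set[str]) -> set[str]:
    """Source prefixes (text before ':') contributing to one merged concept."""
    return {identifier.split(":", 1)[0] for identifier in group}


# Hypothetical cross-references between three sources' disease IDs.
equivalences = [
    ("SRC1:d1", "SRC2:d9"),
    ("SRC2:d9", "SRC3:d4"),
    ("SRC1:d2", "SRC3:d7"),
]
for group in cluster(equivalences):
    print(sorted(group), sorted(sources(group)))
```

Because merging is transitive, SRC1:d1 and SRC3:d4 end up in one concept even though no mapping links them directly; an association attached to that concept then draws on all three sources.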
19. Translational applicability for FA
Tools can support more rapid diagnostics for FA patients
Integration of data enables mechanistic discovery and new candidate gene targets
Identification of models for FA hypothesis validation
Helping patients contribute data and participate in their ongoing evaluation, care, and science
20. Acknowledgements
Lawrence Berkeley
Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Nicole Washington
Charite
Sebastian Kohler
Garvan
Tudor Groza
Craig McNamara
RTI
Jim Balhoff
Boston Children’s
Ingrid Holm
Catherine Brownstein
John Brownstein
ClinGen
Heidi Rehm
Larry Babb
Harindra Arachchi
OHSU
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Dan Keith
Maureen Hoatlin
Genomics England/Queen Mary
Damian Smedley
Jules Jacobson
Tomasz Konopka
Pilar Cacheiro
Jackson Laboratory
Peter Robinson
Leigh Carmody
Hannah Blau
EBI
Helen Parkinson
David Osumi-Sutherland
With special thanks to Julie McMurry for excellent graphic design
Johns Hopkins
Chris Chute
Casey Overby
Ada Hamosh
Mayo
Hongfang Liu
Ravi Komandur
UCSC
David Haussler
Benedict Paten
Mark Diekhans
Scripps
Andrew Su
Ben Good
Chunlei Wu
Gregg Stupp
Sanford Health Imagenetics
Neal Boerkoel
Kayli Rageth
Murat Sincan
21. www.monarchinitiative.org
Chris Mungall, Peter Robinson, Damian Smedley
Funding:
NIH Office of the Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P;
NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)
Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
We make things digestible by breaking complex concepts into simpler parts. We use ontologies that are comparative by design.
This was the novel case we solved. The UDP patient had a number of signs and symptoms, including various platelet abnormalities. The same heterozygous missense mutation was seen in two patients and was ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted to be maximally pathogenic. Finally, a mouse curated by MGI, carrying a heterozygous missense point mutation introduced by chemical mutagenesis, exhibited strikingly similar platelet abnormalities.
In the first 1,936 patients, 82% are in the top 5 Exomiser hits. This holds across a whole range of rare diseases and family structures, including simple singletons, which make up 34% of cases.
If we include bridging ontologies, we can unify diseases across sources AND phenotypes across sources and organisms.
There are a lot of people who have contributed to this work over many years.
A fully translational (bench-to-bedside) group of stakeholders, contributors, and partners.
The classic G + E = P (genotype plus environment gives phenotype). But a lot of knowledge sits in the "=" that can be applied to aid the linking.