“SEMANTICS-IN-A-BOX”
INTEGRATED DATA APPLIANCES TO CONTEXTUALIZE
EXPERIMENTS WITH A WORLD OF PUBLIC KNOWLEDGE
Erich Gombocz
IO Informatics, Berkeley, CA, USA
egombocz@io-informatics.com
OUTLINE
• STATE OF SEMANTIC INTEROPERABILITY
• ADOPTION IN LIFE SCIENCES ONGOING FOR YEARS … - WHY?
• LINKED LIFE DATA (LLD) / LINKED OPEN DATA (LOD)
• ROADBLOCKS, APPROACHES AND SOLUTIONS
• WHAT IS AN ‘INTEGRATED DATA APPLIANCE’ (IDA)?
• ‘SEMANTICS IN A BOX’: PERSISTENT, CURRENT KNOWLEDGEBASE(S)
• COMBINING APPLICATIONS AND RESOURCES – PRE-CONFIGURED, CONTROLLED VERSIONING
• KNOWLEDGEBASE EXAMPLES
• DRUG, TARGETS AND DISEASES; PROTEOMICS; METABOLOMICS; MICROBIAL PATHOGENS
• THE IDA EXPERIENCE
• FLY-THROUGH: USING KBS TO ENRICH EXPERIMENTAL DATA, ACTIONABLE QUERIES
• TAKE HOME
• PROS & CONS OF ‘SEMANTICS IN A BOX’, CONCLUSIONS
• ACKNOWLEDGEMENTS, REFERENCES
2© 2013 -
STATE OF SEMANTIC INTEROPERABILITY
LIFE SCIENCES ADOPTION RATE & REASONS
• RDF HAS EVOLVED INTO AN ACCEPTED FRAMEWORK
• DYNAMIC, EXTENSIBLE, INTEROPERABLE SOLUTIONS NEEDED FOR BIG DATA
• ADVANTAGE: DON’T NEED TO KNOW A PRIORI WHICH QUESTIONS TO ASK
• THE LOD CLOUD IS GROWING …
• SPARQL 1.1 IS THE DE FACTO STANDARD
• W3C RECOMMENDATION SINCE MARCH 21, 2013
• LOTS OF POCS, PILOT STUDIES …
BUT
• OVERLY IDEALISTIC EXPECTATIONS:
5-STAR LINKED (OPEN) DATA ≠ 5-STAR COLLABORATIVE USABILITY!
• DIVERGING DIRECTIONS:
• DIFFERENT VOCABULARIES, REGISTRIES, OBJECTIVES, DESCRIPTORS
• DIFFERENT APPROACHES, PROVENANCE METADATA (VOID, PROV-O, PAV, OPENPHACTS, BIO2RDF, BIODBCORE, SADI, MIRIAM)
• W3C HCLS TRIES TO RESOLVE THIS BY BUILDING CONSENSUS ON MAPPINGS
LINKED LIFE DATA / LINKED OPEN DATA
ROADBLOCKS, APPROACHES, SOLUTIONS
THINKING LLD / LOD
MYTH #1: PUBLIC SPARQL ENDPOINTS ARE EQUAL
• DIFFERENT VOCABULARIES, REGISTRIES, OBJECTIVES, DESCRIPTORS
• DIFFERENT CONCEPTUAL APPROACH (OPENPHACTS, BIO2RDF, BIODBCORE, SADI, MIRIAM, …)
MYTH #2: PUBLIC SPARQL ENDPOINTS ARE INTEROPERABLE
• VERSIONING AND PROVENANCE ISSUES (PROV-O, VOID, SKOS, PAV)
• CLINICAL INTEROPERABILITY (HL7, MEDDRA, CDISC, MESH, ICD9/10 …)
MYTH #3: PUBLIC RESOURCES ARE ALWAYS AVAILABLE
• RELIABILITY CONCERNS FROM SERVICE-LEVEL TO URI PERSISTENCE
• MORE AND MORE “OPEN DATA” ARE CLOSED FOR COMMERCIAL USE
• ISSUES OF ACCESS TRACEABILITY ON CONFIDENTIAL DATA
• SERIOUS FUNDING UNEASE ABOUT AVAILABILITY OF GOVERNMENT-BACKED RESOURCES
NAVIGATING OBSTACLES
• OBJECTIVES NOT ALIGNED WITH USE CASES
• MISSING DOMAIN EXPERTISE: DATA RELATIONSHIP GUESSWORK
• NO PROVENANCE OR VERSIONING CONSIDERATIONS AT START
• INCONSISTENT NAMESPACE POLICIES AND MAPPING PRACTICES
• RELIANCE ON INTERNAL, NON-DESCRIPTIVE ONTOLOGIES WHICH PREVENT INTEROPERABILITY
• MISALIGNMENT OF EXPERIMENTAL, CORPORATE AND PUBLIC STANDARDS
• WAITING FOR THE ‘PERFECT’ ONTOLOGY – WILL IT EVER COME?
• IS ‘SAME AS’ IN A REALLY THE SAME IN B?
• CONCEPTUALLY? CONTEXTUALLY? SEMANTICALLY?
• HANDLING CHANGES: TRADEOFFS IN SIMPLIFYING REDUNDANCY
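The ‘same as’ question can be made concrete with a small sketch. The Python below is an illustration only: it assumes identity links arrive as plain pairs of identifiers, and the identifiers and the function name `same_as_clusters` are hypothetical, not taken from the talk. It computes the transitive closure of the links with a union-find structure, which is how identity links such as `owl:sameAs` behave under reasoning, and shows how one loose mapping can merge concepts that were never meant to be identical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers connected by 'same as' links (transitive closure)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two identity clusters

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Hypothetical identifiers from three sources A, B and C.
links = [
    ("a:ethanol", "b:ethanol"),             # same compound in A and B
    ("b:ethanol", "c:alcoholic_beverage"),  # a contextual, not conceptual, match
    ("a:benzene", "b:benzene"),
]
for cluster in same_as_clusters(links):
    print(sorted(cluster))
```

In this toy input, the single contextual link from source B to source C places `a:ethanol` in the same cluster as `c:alcoholic_beverage`, although nobody asserted that link directly.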
AVAILABILITY CHALLENGE
IS MY RESOURCE UP TODAY?
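One way to ask "is my resource up today?" is to send a trivial `ASK` query to an endpoint and see whether a valid answer comes back. The sketch below is a minimal illustration, not the tooling used in the talk: it assumes the endpoint accepts queries via HTTP GET as described in the SPARQL 1.1 Protocol and returns the SPARQL JSON results format, and the endpoint URL and function names are placeholders. The HTTP call is injectable so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

ASK_ANYTHING = "ASK { ?s ?p ?o }"  # true if the store holds at least one triple


def build_probe_url(endpoint: str, query: str = ASK_ANYTHING) -> str:
    """Encode a SPARQL query as a GET request (query passed as 'query=' parameter)."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urllib.parse.urlencode({"query": query})


def http_fetch(url: str, timeout: float = 10.0) -> str:
    """Default fetcher: plain HTTP GET asking for SPARQL JSON results."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def endpoint_is_up(endpoint: str, fetch: Callable[[str], str] = http_fetch) -> bool:
    """Return True only if the endpoint answers the ASK probe with boolean true."""
    try:
        body = fetch(build_probe_url(endpoint))
        return json.loads(body).get("boolean") is True
    except (OSError, ValueError, AttributeError):
        # Network errors, timeouts, non-JSON or unexpected bodies all count as "down".
        return False


if __name__ == "__main__":
    # Placeholder URL; substitute the endpoint you depend on.
    print(endpoint_is_up("http://example.org/sparql"))
```

A probe like this only tests reachability and a non-empty store; it says nothing about data currency, licensing, or URI persistence, which the slides list as separate concerns.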
BEST PRACTICES CHECKLIST
• WHICH RESOURCES DO WE NEED?
• REVIEW BASICS (LICENSING, PROVENANCE, VERSIONING, HIGH INTERLINK QUALITY, PERSISTENCE)
• BUILD GENERALLY APPLICABLE SOLUTIONS (VOCABULARIES, COMMON PREDICATES)
• FOCUS ON TRUE ‘’ RESOURCES
• DYNAMIC “APPLICATIONS ONTOLOGY” FIRST!
• HAVE THE BIG PICTURE IN MIND, BUT DON’T WAIT FOR PERFECTION
• ALIGN WITH FORMAL ONTOLOGIES (OR PARTS OF) WHENEVER POSSIBLE
• NCBO BIOPORTAL
• THINK INTEROPERABILITY FROM THE BEGINNING
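The "review basics" item of the checklist can be turned into a simple automated screen. The sketch below is one possible reading, not a procedure from the talk: it assumes each candidate resource comes with a flat dictionary of metadata keyed by property name, and the choice of properties per topic (`dcterms:license`, `prov:wasDerivedFrom`, `dcterms:source`, `pav:version`, `dcterms:issued`) is an assumption made for illustration. Interlink quality and persistence need other kinds of checks and are left out.

```python
from typing import Dict, List, Tuple

# Assumed mapping from checklist topic to metadata properties that would satisfy it.
REQUIRED_BASICS: Dict[str, Tuple[str, ...]] = {
    "licensing": ("dcterms:license",),
    "provenance": ("prov:wasDerivedFrom", "dcterms:source"),
    "versioning": ("pav:version", "dcterms:issued"),
}


def missing_basics(description: Dict[str, str]) -> List[str]:
    """List the checklist topics for which the description has no non-empty value."""
    return [
        topic
        for topic, properties in REQUIRED_BASICS.items()
        if not any(description.get(prop) for prop in properties)
    ]


# Hypothetical dataset description with licensing and versioning but no provenance.
candidate = {
    "dcterms:title": "Example drug-target dataset",
    "dcterms:license": "http://example.org/license",
    "pav:version": "3.0",
}
print(missing_basics(candidate))
```

Running such a screen before a resource is loaded makes gaps visible early, in line with the slide's advice to consider provenance and versioning from the start.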
WHAT IS AN IDA, AND WHY?
INTEGRATED DATA APPLIANCE
THE IDA CONCEPT
• INTEGRATED, PERSISTENT, CURRENT SEMANTIC KBS
• GOAL: READY TO USE FOR ENRICHMENT OF EXPERIMENTAL / INTERNAL DATASETS
• COMBINING APPLICATIONS AND RESOURCES
• WEB QUERY SERVER, KNOWLEDGE EXPLORER PRO, VIRTUOSO
• ALL NECESSARY TOOLS INCLUDED FOR MAPPING AND QUERY
• PRE-CONFIGURED KNOWLEDGEBASE(S), CONTROLLED VERSIONING, PERIODIC UPDATES
• ENTERPRISE-READY APPLIANCE
• 64 GB RAM FOR FAST QUERY PERFORMANCE
• RAID-5 REDUNDANT ARCHIVING
KB EXAMPLE 1
DRUGS, TARGETS, DISEASES KNOWLEDGEBASE
RESOURCES
DRUGBANK
DISEASOME
SIDER
UNIPROT
REACTOME
NCBI BIOSYSTEMS
KB EXAMPLE 2
PROTEOMICS KNOWLEDGEBASE
RESOURCES
UNIPROT
GO
REACTOME
NCBI BIOSYSTEMS
KB EXAMPLE 3
METABOLOMICS KNOWLEDGEBASE
RESOURCES
HMDB
PUBCHEM
PUBCHEM ASSAY
BIOCYC
KB EXAMPLE 4
MICROBIAL PATHOGEN KNOWLEDGEBASE
RESOURCES
ICTV
MIST2
BIOCYC
PATRIC
NCBI TAXONOMY
‘SEMANTICS IN A BOX’ EXPERIENCE
CONTEXTUALIZING EXPERIMENTS WITH KB RESOURCES
USE CASE 1:
TOXICITY CLASSIFICATION
BIOLOGICAL QUALIFICATION OF COMBINATORIAL BIOMARKERS
WITH PHARMACOGENOMIC EXPERIMENTAL CORRELATIONS
RESOURCES
INTERNAL:
GENE EXPRESSION
QUANT. METABOLOMICS
KBS:
DRUGBANK
DISEASOME
SIDER
UNIPROT
GO
REACTOME
NCBI BIOSYSTEMS
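The idea of contextualizing internal results with knowledgebase resources can be sketched as a join between a list of experimental hits and a set of triples. The Python below is a toy illustration under stated assumptions: the knowledgebase is a short in-memory list of triples, and every identifier and predicate (`ex:encodes`, `ex:participatesIn`, `ex:targets`) is made up for the example rather than taken from DrugBank, UniProt, Reactome or the other listed resources. In the appliance described here, the same kind of join would presumably be expressed as a SPARQL query against the integrated store.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    ]


def enrich(gene_hits: Iterable[str], kb: List[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Attach pathways and drugs from the knowledgebase to each experimental gene hit."""
    enriched = {}
    for gene in gene_hits:
        proteins = [o for _, _, o in match(kb, s=gene, p="ex:encodes")]
        pathways = {o for prot in proteins
                    for _, _, o in match(kb, s=prot, p="ex:participatesIn")}
        drugs = {s for prot in proteins
                 for s, _, _ in match(kb, p="ex:targets", o=prot)}
        enriched[gene] = {"pathways": sorted(pathways), "drugs": sorted(drugs)}
    return enriched


# Hypothetical knowledgebase content and experimental hits.
KB: List[Triple] = [
    ("ex:geneA", "ex:encodes", "ex:proteinA"),
    ("ex:proteinA", "ex:participatesIn", "ex:pathwayX"),
    ("ex:proteinA", "ex:participatesIn", "ex:pathwayY"),
    ("ex:geneB", "ex:encodes", "ex:proteinB"),
    ("ex:drugD", "ex:targets", "ex:proteinB"),
]
for gene, context in enrich(["ex:geneA", "ex:geneB", "ex:geneC"], KB).items():
    print(gene, context)
```

A hit with no knowledgebase context (here `ex:geneC`) comes back with empty lists, which makes it easy to separate biologically supported candidates from unsupported ones.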
RESULT
BIOLOGICALLY QUALIFIED SETS OF BIOMARKERS TO SCREEN FOR DIFFERENT TYPES OF TOXICITY
• Benzene Toxicity: 18 genes, 2 metabolites
• Ethanol Toxicity: 16 genes, 6 metabolites
• Halogenated Toxicity: 21 genes, 5 metabolites
USE CASE 2:
PATHOGEN IDENTIFICATION
IDENTIFICATION OF PATHOGENS IN SAMPLES USING
MULTIPLE PUBLIC MICROBIAL PATHOGEN RESOURCES
RESOURCES
INTERNAL:
MICROBIAL ASSAYS
MS SEQUENCING
KBS:
ICTV
MIST2
BIOCYC
PATRIC
NCBI TAXONOMY
TAKE HOME
PROS & CONS OF ‘SEMANTICS IN A BOX’, CONCLUSIONS
‘SEMANTICS IN A BOX’
PROS
• READY-TO-GO: NO SETUP AND INTEGRATION TIME, NO INTEROPERABILITY ISSUES
• PRECONFIGURED ENTERPRISE-READY HARDWARE WITH SEMANTICALLY INTEGRATED SETS OF PUBLIC KNOWLEDGEBASES OUT-OF-THE-BOX
• NO CONCERNS ABOUT UPTIME OF PUBLIC RESOURCES
• CONTROLLED VERSIONING AND MAINTENANCE CYCLES SOLVE RELIABILITY AND DATA INTEGRITY ISSUES
• NO TRACEABILITY WORRIES ON CONFIDENTIAL DATA
• INTEGRATED CLIENT AND WEB APPLICATIONS FOR GRAPH VISUALIZATION, EXPLORATION AND QUERY LOWER THE BARRIER TO ENTRY FOR END USERS AND KEEP THE FOCUS ON SCIENTIFIC UTILITY
CONS
• LIVE PUBLIC RESOURCES MAY UPDATE IN-BETWEEN SCHEDULED MAINTENANCE
• SELECTION OF RESOURCES MAY NOT SUFFICE FOR ALL USE CASES
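The first drawback, that live resources may update between scheduled maintenance cycles, can at least be made visible. The sketch below is a hypothetical helper, not a feature described in the talk: it assumes the appliance records a release date for each loaded resource and that the latest upstream release date can be obtained somehow (for example from dataset metadata). All names and dates are invented for illustration.

```python
from datetime import date
from typing import Dict, List


def stale_resources(local: Dict[str, date], upstream: Dict[str, date]) -> List[str]:
    """Return names of resources whose upstream release is newer than the local snapshot."""
    return sorted(
        name
        for name, snapshot in local.items()
        if name in upstream and upstream[name] > snapshot
    )


# Invented release dates for two resources held on the appliance.
local_snapshots = {
    "resource_a": date(2013, 1, 15),
    "resource_b": date(2013, 3, 1),
}
upstream_releases = {
    "resource_a": date(2013, 4, 2),   # newer than the local copy
    "resource_b": date(2013, 3, 1),   # unchanged
}
print(stale_resources(local_snapshots, upstream_releases))
```

A report like this does not remove the tradeoff; it only tells users which answers may differ from what a live query would return until the next maintenance cycle.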
CONCLUSIONS
• COMBINING IDA-HOSTED PUBLIC RESOURCES WITH EXPERIMENTAL DATA TO BUILD MODELS FOR CLASSIFYING TOXICITY TYPES IN PRE-CLINICAL SETTINGS DEMONSTRATED A SUCCESSFUL AND FAST SEMANTIC INTEGRATION THAT PROVIDED BIOLOGICAL QUALIFICATION OF GENOMIC AND METABOLOMIC BIOMARKERS.
• BECAUSE THE RDF IS ALREADY PRE-ALIGNED AND CONTAINS PROVENANCE AND VERSIONING, A BETTER A PRIORI DETERMINATION OF ADVERSE EFFECTS OF DRUG COMBINATIONS CAN BE ACHIEVED MUCH FASTER AND WITH MUCH LESS EFFORT. RICH SPARQL QUERIES CORRELATE RESPONSES OF UNRELATED STUDIES WITH DIFFERENT EXPERIMENTAL MODELS AND VALIDATE SYSTEM CHANGES ASSOCIATED WITH KNOWN COMMON TOXICITY MECHANISMS.
• HAVING LINKED DATA AVAILABLE IN ONE APPLIANCE TOGETHER WITH EXPERIMENTAL RESULTS MAKES IT EASY TO EMPLOY SEMANTIC TECHNOLOGIES WORRY-FREE AND, AS SUCH, TO PROMOTE A BETTER UNDERSTANDING OF BIOLOGICAL SYSTEMS MORE READILY. THE TIME AND MONEY SAVED HAVE HUGE SOCIO-ECONOMIC BENEFITS IN DRUG DISCOVERY AND HEALTHCARE.
ACKNOWLEDGEMENTS
SUPPORT FOR TOXICITY STUDIES
NIST ATP #70NANB2H3009
NIAAA #HHSN281200510008C
W3C
HCLS LLD / PHARMACOGENOMICS SIG
Scott Marshall, Michel Dumontier
PATHOGEN PROJECT
FDA NARMS
Sherry Ayers
PUBLIC RESOURCES
SIB / UNIPROT CONSORTIUM
Jerven Bolleman
WIKIMEDIA FOUNDATION
Anja Jentzsch
BIO2RDF II
Michel Dumontier
BMIR / NCBO STANFORD
Mark Musen, Trish Whetzel
IDA DEVELOPMENT
SAGE-N
James Candlin, David Chiang
IO INFORMATICS
Andrea Splendiani, Jason Eshleman,
Robert Stanley
TOXICITY PROJECT
COGENICS
Pat Hurban, Alan Higgins, Imran Shah, Hongkang Mei,
Ed Lobenhofer
BOWLES CENTER FOR ALCOHOL STUDIES / UNC
Fulton Crews
REFERENCES
1) LDOW2012: Linked Data on the Web. Bizer C, Heath T, Berners-Lee T, Hausenblas M. WWW Workshop on Linked Data on the Web, 2012 Apr. 16, Lyon, France.
2) The National Center for Biomedical Ontology. Musen MA, Noy NF, Shah NH, Whetzel PL, Chute CG, Story MA, Smith B. J Am Med Inform Assoc. 2012 Mar-Apr; 19(2): 190-5.
3) Using SPARQL to Query BioPortal Ontologies and Metadata. Salvadores M, Horridge M, Alexander PR, Fergerson RW, Musen MA, Noy NF. International Semantic Web Conference, Boston, US. LNCS 7650, pp. 180-195, 2012.
4) The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside. Luciano JS, Andersson B, Batchelor C, Bodenreider O, Clark T, Denney CK, Domarew C, Gambet T, Harland L, Jentzsch A, Kashyap V, Kos P, Kozlovsky J, Lebo T, Marshall SM, McCusker JP, McGuinness DL, Ogbuji C, Pichler E, Powers RL, Prud’hommeaux E, Samwald M, Schriml L, Tonellato PJ, Whetzel PL, Zhao J, Stephens S, Dumontier M. J Biomed Semantics. 2011; 2(Suppl 2): S1.
5) VoID: Vocabulary of Interlinked Datasets. Cyganiak R, Zhao J, Alexander K, Hausenblas M. DERI, W3C Note, 6-Mar-2011.
6) PROV-O: The PROV Ontology. W3C Candidate Recommendation, 11-Dec-2012.
7) Does network analysis of integrated data help understanding how alcohol affects biological functions? Results of a semantic approach to biomarker discovery. Gombocz EA, Higgins AJ, Hurban P, Lobenhofer EK, Crews FT, Stanley RA, Rockey C, Nishimura T. Biomarker Discovery Summit, 2008 Sept. 29-Oct. 1, Philadelphia, PA.
8) W3C Semantic Web Use Cases and Case Studies. Case Study: Applied Semantic Knowledgebase for Detection of Patients at Risk of Organ Failure through Immune Rejection. Stanley R, McManus B, Ng R, Gombocz E, Eshleman J, Rockey C. Joint Case Study of IO Informatics and University of British Columbia (UBC), NCE CECR PROOF Centre of Excellence, James Hogg iCAPTURE Centre, Vancouver, BC, Canada, 2011.
9) A Novel Approach to Recognize Peptide Functions in Microorganisms: Establishing Systems Biology-based Relationship Networks to Better Understand Disease Causes and Prevention. Gombocz E, Candlin J. 8th Annual Conference US Human Proteome Organisation: The Future of Proteomics (HUPO 2012), San Francisco, CA, March 4-7, 2012.
10) Correlation Network Analysis and Knowledge Integration. Plasterer TN, Stanley R, Gombocz E. In: Applied Statistics for Network Biology: Methods in Systems Biology; Dehmer M, Emmert-Streib F, Graber A, Salvador A (Eds.). Wiley-VCH, Weinheim, 2011. ISBN: 978-3-527-32750-8.
11) Improved dataset coverage and interoperability with Bio2RDF Release 2. Callahan A, Cruz-Toledo J, Ansell P, Klassen D, Tumarello G, Dumontier M. SWAT4LS Workshop, 2012 Nov. 30, Paris, France.
12) Ontology-Based Querying with Bio2RDF’s Linked Open Data. Callahan A, Cruz-Toledo J, Dumontier M. Journal of Biomedical Semantics, 2013; in press.
THANK YOU!
egombocz@io-informatics.com
QUESTIONS?

E.Gombocz: Semantics in a Box (SemTech 2013-04-30)

  • 1. “SEMANTICS-IN-A-BOX” INTEGRATED DATA APPLIANCES TO CONTEXTUALIZE EXPERIMENTS WITH A WORLD OF PUBLIC KNOWLEDGE 1 Erich Gombocz IO Informatics, Berkeley, CA, USA egombocz@io-informatics.com
  • 2. OUTLINE • STATE OF SEMANTIC INTEROPERABILITY • ADOPTION IN LIFE SCIENCES ONGOING FOR YEARS … - WHY? • LINKED LIFE DATA (LLD) / LINKED OPEN DATA (LOD) • ROADBLOCKS, APPROACHES AND SOLUTIONS • WHAT IS AN ‘INTEGRATED DATA APPLIANCE’ (IDA)? • ‘SEMANTICS IN A BOX’: PERSISTENT, CURRENT KNOWLEDGEBASE(S) • COMBINING APPLICATIONS AND RESOURCES – PRE-CONFIGURED, CONTROLLED VERSIONING • KNOWLEDGEBASE EXAMPLES • DRUG, TARGETS AND DISEASES; PROTEOMICS; METABOLOMICS; MICROBIAL PATHOGENS • THE IDA EXPERIENCE • FLY-THROUGH : USING KBS TO ENRICH EXPERIMENTAL DATA, ACTIONABLE QUERIES • TAKE HOME • PROS & CONS OF ‘SEMANTICS IN A BOX’, CONCLUSIONS • ACKNOWLEDGEMENTS, REFERENCES 2© 2013 -
  • 3. STATE OF SEMANTIC INTEROPERABILITY LIFE SCIENCES ADOPTION RATE & REASONS 3© 2013 -
  • 4. • RDF HAS EVOLVED AS ACCEPTED FRAMEWORK • DYNAMIC, EXTENSIBLE, INTEROPERABLE SOLUTIONS NEEDED FOR BIG DATA • ADVANTAGE: DON’T NEED TO KNOW A PRIORI WHICH QUESTIONS TO ASK • THE LOD CLOUD IS GROWING … • SPARQL 1.1 IS DE-FACTO STANDARD • MARCH 21, 2013 W3C RECOMMENDATION • LOTS OF POCS, PILOT STUDIES … BUT • TOO IDEALISTIC EXPECTATIONS: ***** LINKED (OPEN) DATA ≠ ***** COLLABORATIVE USABILITY ! • DIVERGING DIRECTIONS: • DIFFERENT VOCABULARIES, REGISTRIES, OBJECTIVES, DESCRIPTORS • DIFFERENT APPROACHES, PROVENANCE METADATA (VOID, PROV-O, PAV, OPENPHACTS, BIO2RDF, BIODBCORE, SADI, MIRIAM) • W3C HCLS TRIES TO RESOLVE THIS BY BUILDING CONSENT ON MAPPINGS 4
  • 5. LINKED LIFE DATA / LINKED OPEN DATA ROADBLOCKS, APPROACHES, SOLUTIONS 5© 2013 -
  • 6. THINKING LLD / LOD 6 MYTH #1: PUBLIC SPARQL ENDPOINTS ARE EQUAL • DIFFERENT VOCABULARIES, REGISTRIES, OBJECTIVES, DESCRIPTORS • DIFFERENT CONCEPTUAL APPROACH (OPENPHACTS, BIO2RDF, BIODBCORE, SADI, MIRIAM, …) MYTH #2: PUBLIC SPARQL ENDPOINTS ARE INTEROPERABLE • VERSIONING AND PROVENANCE ISSUES (PROV-O, VOID, SKOS, PAV) • CLINICAL INTEROPERABILITY (HL7, MEDDRA, CDISC, MESH, ICD9/10 …) MYTH #3: PUBLIC RESOURCES ARE ALWAYS AVAILABLE • RELIABILITY CONCERNS FROM SERVICE-LEVEL TO URI PERSISTENCE • MORE AND MORE “OPEN DATA” ARE CLOSED FOR COMMERCIAL USE • ISSUES OF ACCESS TRACEABILITY ON CONFIDENTIAL DATA • SERIOUS FUNDING UNEASE ABOUT AVAILABILITY OF GOVERNMENT-BACKED RESOURCES
  • 7. NAVIGATING OBSTACLES 7 • OBJECTIVES NOT ALIGNED WITH USE CASES • MISSING DOMAIN EXPERTISE: DATA RELATIONSHIP GUESSWORK • NO PROVENANCE OR VERSIONING CONSIDERATIONS AT START • INCONSISTENT NAMESPACE POLICIES AND MAPPING PRACTICES • RELIANCE ON INTERNAL, NON-DESCRIPTIVE ONTOLOGIES WHICH PREVENT INTEROPERABILITY • MISALIGNMENT OF EXPERIMENTAL , CORPORATE AND PUBLIC STANDARDS • WAITING FOR THE ‘PERFECT’ ONTOLOGY – WILL IT EVER COME? • IS ‘SAME AS’ IN A REALLY THE SAME IN B ? • CONCEPTUALLY? CONTEXTUALLY? SEMANTICALLY? • HANDLING CHANGES: TRADEOFFS IN SIMPLIFYING REDUNDANCY © 2013 -
  • 8. AVAILABILITY CHALLENGE IS MY RESOURCE UP TODAY?
  • 9. BEST PRACTICES CHECKLIST • WHICH RESOURCES DO WE NEED? • REVIEW BASICS (LICENSING, PROVENANCE, VERSIONING, HIGH INTERLINK QUALITY, PERSISTENCE) • BUILD GENERALLY APPLICABLE SOLUTIONS (VOCABULARIES, COMMON PREDICATES) • FOCUS ON TRUE ‘’ RESOURCES • DYNAMIC “APPLICATIONS ONTOLOGY” FIRST! • HAVE THE BIG PICTURE IN MIND, BUT DON’T WAIT FOR PERFECTION • ALIGN WITH FORMAL ONTOLOGIES (OR PARTS OF) WHENEVER POSSIBLE • NCBO BIOPORTAL • THINK INTEROPERABILITY FROM THE BEGINNING 9
  • 10. 10© 2013 - WHAT IS AN IDA, AND WHY? INTEGRATED DATA APPLIANCE
  • 11. THE IDA CONCEPT • INTEGRATED, PERSISTENT, CURRENT SEMANTIC KBS • GOAL: READY TO USE FOR ENRICHMENT OF EXPERIMENTAL / INTERNAL DATASETS • COMBINING APPLICATIONS AND RESOURCES • WEB QUERY SERVER, KNOWLEDGE EXPLORER PRO, VIRTUOSO • ALL NECESSARY TOOLS INCLUDED FOR MAPPING AND QUERY • PRE-CONFIGURED KNOWLEDGEBASE(S), CONTROLLED VERSIONING, PERIODIC UPDATES • ENTERPRISE-READY APPLIANCE • 64 GB RAM FOR FAST QUERY PERFORMANCE • RAID-5 REDUNDANT ARCHIVING 11
  • 12. KB EXAMPLE 1 DRUGS, TARGETS, DISEASES KNOWLEDGEBASE 12© 2013 - RESOURCES DRUGBANK DISEASOME SIDER UNIPROT REACTOME NCBI BIOSYSTEMS
  • 13. 13
  • 14. 14
  • 15. 15
  • 16. 16
  • 17. 17
  • 18. 18
  • 19. 19
  • 20. 20
  • 21. 21
  • 22. KB EXAMPLE 2 PROTEOMICS KNOWLEDGEBASE 22© 2013 - RESOURCES UNIPROT GO REACTOME NCBI BIOSYSTEMS
  • 23. 23
  • 24. 24
  • 25. 25
  • 26. 26
  • 27. 27
  • 28. 28
  • 29. KB EXAMPLE 3 METABOLOMICS KNOWLEDGEBASE 29© 2013 - RESOURCES HMDB PUBCHEM PUBCHEM ASSAY BIOCYC
  • 30. 30
  • 31. 31
  • 32. 32
  • 33. 33
  • 34. 34
  • 35. 35
  • 36. KB EXAMPLE 4 MICROBIAL PATHOGEN KNOWLEDGEBASE 36© 2013 - RESOURCES ICTV MIST2 BIOCYC PATRIC NCBI TAXONOMY
  • 37. 37
  • 38. 38
  • 39. 39
  • 40. 40
  • 41. 41
  • 42. 42© 2013 - ‘SEMANTICS IN A BOX’ EXPERIENCE CONTEXTUALIZING EXPERIMENTS WITH KB RESOURCES
  • 43. 43© 2013 - USE CASE 1: TOXICITY CLASSIFICATION BIOLOGICAL QUALIFICATION OF COMBINATORIAL BIOMARKERS WITH PHARMACOGENOMIC EXPERIMENTAL CORRELATIONS RESOURCES INTERNAL: GENE EXPRESSION QUANT. METABOLOMICS KBS: DRUGBANK DISEASOME SIDER UNIPROT GO REACTOME NCBI BIOSYSTEMS
  • 58. RESULT • BIOLOGICALLY QUALIFIED SETS OF BIOMARKERS TO SCREEN FOR DIFFERENT TYPES OF TOXICITY • Benzene Toxicity: 18 genes, 2 metabolites • Ethanol Toxicity: 16 genes, 6 metabolites • Halogenated Toxicity: 21 genes, 5 metabolites
  • 60. USE CASE 2: PATHOGEN IDENTIFICATION • IDENTIFICATION OF PATHOGENS IN SAMPLES USING MULTIPLE PUBLIC MICROBIAL PATHOGEN RESOURCES • RESOURCES • INTERNAL: MICROBIAL ASSAYS, MS SEQUENCING • KBS: ICTV, MIST2, BIOCYC, PATRIC, NCBI TAXONOMY
  • 64. TAKE HOME • PROS & CONS OF ‘SEMANTICS IN A BOX’, CONCLUSIONS
  • 65. ‘SEMANTICS IN A BOX’ • PROS • READY-TO-GO: NO SETUP AND INTEGRATION TIME, NO INTEROPERABILITY ISSUES • PRECONFIGURED ENTERPRISE-READY HARDWARE WITH SEMANTICALLY INTEGRATED SETS OF PUBLIC KNOWLEDGEBASES OUT-OF-THE-BOX • NO CONCERNS ABOUT UPTIME OF PUBLIC RESOURCES • CONTROLLED VERSIONING AND MAINTENANCE CYCLES SOLVE RELIABILITY AND DATA INTEGRITY ISSUES • NO TRACEABILITY WORRIES ON CONFIDENTIAL DATA • INTEGRATED CLIENT AND WEB APPLICATIONS FOR GRAPH VISUALIZATION, EXPLORATION AND QUERY REDUCE BARRIERS TO ENTRY FOR END USERS AND KEEP THE FOCUS ON SCIENTIFIC UTILITY • CONS • LIVE PUBLIC RESOURCES MAY UPDATE BETWEEN SCHEDULED MAINTENANCE CYCLES • SELECTION OF RESOURCES MAY NOT SUFFICE FOR ALL USE CASES
  • 66. CONCLUSIONS • THE USE OF IDA-HOSTED PUBLIC RESOURCES COMBINED WITH EXPERIMENTAL DATA TO PROVIDE MODELS FOR CLASSIFICATION OF TOXICITY TYPES IN PRE-CLINICAL SETTINGS DEMONSTRATES A SUCCESSFUL AND FAST SEMANTIC INTEGRATION WHICH PROVIDED BIOLOGICAL QUALIFICATION OF GENOMIC AND METABOLOMIC BIOMARKERS. • AS RDF IS ALREADY PRE-ALIGNED AND CONTAINS PROVENANCE AND VERSIONING, A BETTER A PRIORI DETERMINATION OF ADVERSE EFFECTS OF DRUG COMBINATIONS CAN BE ACHIEVED MUCH FASTER AND WITH MUCH LESS EFFORT. RICH SPARQL QUERIES CORRELATE RESPONSES OF UNRELATED STUDIES WITH DIFFERENT EXPERIMENTAL MODELS, AND VALIDATE SYSTEM CHANGES ASSOCIATED WITH KNOWN COMMON TOXICITY MECHANISMS. • HAVING LINKED DATA AVAILABLE IN ONE APPLIANCE TOGETHER WITH EXPERIMENTAL RESULTS MAKES IT EASY TO EMPLOY SEMANTIC TECHNOLOGIES WORRY-FREE, AND, AS SUCH, TO PROMOTE A BETTER UNDERSTANDING OF BIOLOGICAL SYSTEMS MORE READILY. THE TIME AND MONEY SAVED HAVE HUGE SOCIO-ECONOMIC BENEFITS IN DRUG DISCOVERY AND HEALTHCARE.
  • 67. ACKNOWLEDGEMENTS • SUPPORT FOR TOXICITY STUDIES: NIST ATP #70NANB2H3009; NIAAA #HHSN281200510008C • W3C HCLS LLD / PHARMACOGENOMICS SIG: Scott Marshall, Michel Dumontier • PATHOGEN PROJECT: FDA NARMS (Sherry Ayers) • PUBLIC RESOURCES: SIB / UNIPROT CONSORTIUM (Jerven Bolleman); WIKIMEDIA FOUNDATION (Anja Jentzsch); BIO2RDF II (Michel Dumontier); BMIR / NCBO STANFORD (Mark Musen, Trish Whetzel) • IDA DEVELOPMENT: SAGE-N (James Candlin, David Chiang); IO INFORMATICS (Andrea Splendiani, Jason Eshleman, Robert Stanley) • TOXICITY PROJECT: COGENICS (Pat Hurban, Alan Higgins, Imran Shah, Hongkang Mei, Ed Lobenhofer); BOWLES CENTER FOR ALCOHOL STUDIES / UNC (Fulton Crews)
  • 69. THANK YOU! • QUESTIONS? • egombocz@io-informatics.com
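
The slides describe the appliance as bundling Virtuoso with a web query server, so that SPARQL queries run against locally hosted knowledgebases rather than public endpoints. A minimal sketch of what a client-side request in such a setup might look like is shown here, using only the Python standard library. The endpoint address, the `ex:` vocabulary, and the function name `build_biosystem_request` are assumptions made for illustration and do not come from the presentation; the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a SPARQL endpoint hosted on the appliance itself.
# 8890 is Virtuoso's default HTTP port; the host name is hypothetical.
IDA_ENDPOINT = "http://localhost:8890/sparql"

# Hypothetical vocabulary (ex:); a real knowledgebase defines its own predicates.
QUERY_TEMPLATE = """\
PREFIX ex: <http://example.org/kb/>
SELECT ?biosystem ?source WHERE {{
  ?protein ex:accession "{accession}" ;
           ex:involvedIn ?biosystem .
  ?biosystem ex:contributedBy ?source .
}}
"""


def build_biosystem_request(accession: str, endpoint: str = IDA_ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL GET request for one protein accession."""
    # Accept only plain alphanumeric accessions so that the value
    # cannot break out of the quoted literal inside the query text.
    if not accession.isalnum():
        raise ValueError(f"unexpected accession: {accession!r}")
    query = QUERY_TEMPLATE.format(accession=accession)
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Example accession in UniProt style, used here purely as sample input.
request = build_biosystem_request("P04637")
```

Passing `request` to `urllib.request.urlopen` would execute the query against whatever endpoint is configured. Keeping the endpoint as a parameter reflects the deck's point that the same query can target a controlled local copy instead of a public service with uncertain uptime.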

Editor's Notes

  1. Semantic W3C standards provide a framework for creating knowledge bases that are extensible, coherent, and interoperable, and on which interactive analytics systems can be developed. A growing number of knowledge bases are being built on these standards, in particular as Linked Open Data (LOD) resources, and their availability has received increasing attention in industry and academia. Using LOD resources to provide value to industry is challenging, however, and early expectations have not always been met: issues arise from the alignment of public and experimental corporate standards, from inconsistent URI policies, and from the use of internal, non-formal application ontologies. In addition, the reliability of resources is often problematic, from service levels to SPARQL endpoint uptime to URI persistence. Not least, in many cases provenance issues have not been properly resolved, and there are serious funding concerns related to government grant-backed resources.
     For these reasons, an integrated data appliance (iDA) preloaded with semantically integrated public knowledgebases provides an enterprise-ready “Semantics in a Box” solution that addresses those shortcomings effectively. Public datasets exist in many revisions over time and are registered and mirrored in many places, with registries often out of date or containing conflicting information. Several initiatives are therefore currently being proposed at the W3C and in consortia and industry alliances to align interlinked datasets (for example, using the Vocabulary of Interlinked Datasets, VoID, or PROV-O). For the end user, having to deal with obstacles such as additional non-trivial data mapping, together with the need for rich authoring, licensing, provenance and versioning information (such as developed in PAV) to be included with the data, creates another barrier to the broad application of semantically contextualized, integrated experimental and public datasets.
     This can be remedied. Using an iDA on preconfigured enterprise-ready hardware that contains semantically integrated sets of public knowledgebases out of the box and provides controlled versioning and maintenance cycles solves this predicament. Integrated client and web applications to visualize, explore, and query the RDF graphs from a common UI reduce barriers to entry for end users and let them focus primarily on scientific utility.
     Applying this approach to better understand and characterize toxicity, we show how, starting from semantically integrated experimental results from multi-year toxicology studies performed on different platforms (genomic and metabolic profiling), iDA-hosted public life sciences resources (UniProt, DrugBank, Diseasome, SIDER, Reactome, NCBI BioSystems) can be used to provide models for classification of toxicity types in pre-clinical settings. Because the RDF is already pre-aligned, with detailed and accurate provenance and versioning, a better a priori determination of adverse effects of drug combinations can be achieved much faster and with much less effort. Rich SPARQL queries made it possible to quickly correlate responses across unrelated studies with different experimental models, and to validate system changes associated with known common toxicity mechanisms.
     The time and money saved by such an approach have huge socio-economic benefits for drug companies and healthcare alike. Having linked data available in one appliance together with experimental results makes it easy to employ Semantic Web technologies worry-free and, as such, to promote a better understanding of biological systems more readily.
  2. http://labs.mondeca.com/sparqlEndpointsStatus/index.html
  3. Approach: Analyze needs – basics (from where? version? link quality? persistent?), 5-star LOD resources, existing ontologies via web services (NCBO BioPortal: TMO, VoID, PROV-O, …)
  4. Drugs, targets, diseases, trials
  5. Visual SPARQL query
  6. Published KB on the web
  7. Visualize query results in KE
  8. Align drugs with targets and mechanism of action
  9. Look for alternate indications
  10. Proteomics and biosystems, pathways and evidence reference
  11. Functional proteomics – pathway involvement
  12. Query for biosystems a protein is involved in
  13. Check which resource contributed the biosystem
  14. Cluster proteins according to their biological functions
  15. Again, publish on the web for fast, easy query
  16. Metabolites, chemistry, bio functions, locations, experimental reference data (e.g., MS identification)
  17. ICTV (International Committee on Taxonomy of Viruses, http://www.ictvonline.org/ ); MiST2 (Microbial Signal Transduction Database, http://mistdb.com/ ); BioCyc (Pathway/Genome Database Collection, http://www.biocyc.org/ ); NCBI Taxonomy (Domain Taxonomy of Organism, http://www.ncbi.nlm.nih.gov/Taxonomy/ ); PATRIC (Pathosystems Resource Integration Center, http://patricbrc.vbi.vt.edu/portal/portal/patric/Home )
  18. Microbial pathogens (viruses, bacteria)
  19. Related pathogen outbreaks
  20. Search for pathogen groups for a specific disease type
  21. Search by disease, genomic and proteomics information
  22. Search by pathogen protein sequences
  23. Step 1: Map experiments to RDF – Template generation, scripted transformation, visual mapping review
  24. Step 1: Map to RDF – Term harmonization via one or multiple thesauri; select thesauri for classes during mapping
  25. Step 2: Use public ontologies – BioPortal example; merge applications ontology with parts of formal ontologies to utilize their structure (applied VoID, PROV-O and elements of TMO to informal applications ontology)
  26. Ontology import and merging: building from parts of well-formed public ontologies to final merged application-specific ontology with common vocabularies
  27. Explore common relationships for experimental observations between treatments
  28. Perform iterative visual SPARQL queries with perturbation ranges for each putative marker to establish a model pattern
  29. Enrichment via queries: Public SPARQL endpoints: UniProt, GO, Drugbank, Diseasome, SIDER, Reactome, ChEMBL – import results to enrich the network. Drillout to NCBI BioSystems and Gene – import results to enrich further
  30. Common Toxicity marker across 2 compounds (genes and metabolites) and their involvement in biological systems of diseases
  31. Common Toxicity marker (genes and metabolites) and their involvement in biological systems of diseases: 2 different treatments, pulled apart for better visual exploration
  32. Common Toxicity marker (genes and metabolites) and their involvement in biological systems of diseases: explore relationships in DrugBank and Diseasome and add them selectively to the Knowledge Base
  33. All genes impacted by toxicant
  34. Pharmacogenomic correlations are not necessarily aligned with biological functions – using integrated semantic KBs makes it possible to qualify and validate biomarkers for their biological validity
  35. All genes impacted by toxicant – web-based toxicity screening
  36. Rapid MS-based sequencing for pathogen identification
  37. Mapping samples to pathogens to disease outbreak
  38. Web-based screening for different microbially caused diseases
  39. SAGE-N (James Candlin, David Chiang)
  40. and Case Studies
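
Notes 23 and 24 describe the first step of the workflow: mapping experimental results to RDF through a scripted transformation, with terms harmonized via one or more thesauri. The sketch below illustrates that idea in plain Python by turning CSV rows into N-Triples lines. It is not the mapping tool used in the presentation: the base IRI, the predicate names, the column names, the thesaurus entries, and the sample rows are all hypothetical and chosen only to make the example self-contained.

```python
import csv
import io
from urllib.parse import quote

# Assumptions for illustration only: base IRI, predicates, columns,
# thesaurus entries and sample rows are hypothetical.
BASE = "http://example.org/experiment/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
THESAURUS = {
    "tp53": "TP53",
    "p53": "TP53",
    "etoh": "ethanol",
}

TABLE = """sample,gene,treatment,fold_change
S1,p53,EtOH,2.5
S2,TP53,ethanol,-1.2
"""


def harmonize(term: str) -> str:
    """Map a raw label to its preferred term; unknown terms pass through trimmed."""
    return THESAURUS.get(term.strip().lower(), term.strip())


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(table: str) -> list[str]:
    """Turn CSV rows (sample, gene, treatment, fold_change) into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(table)):
        # Percent-encode the sample id so it is safe inside an IRI.
        subject = f"<{BASE}observation/{quote(row['sample'], safe='')}>"
        gene = escape_literal(harmonize(row["gene"]))
        treatment = escape_literal(harmonize(row["treatment"]))
        change = float(row["fold_change"])  # raises ValueError on malformed numbers
        triples.append(f'{subject} <{BASE}gene> "{gene}" .')
        triples.append(f'{subject} <{BASE}treatment> "{treatment}" .')
        triples.append(f'{subject} <{BASE}foldChange> "{change}"^^<{XSD_DOUBLE}> .')
    return triples


for line in rows_to_ntriples(TABLE):
    print(line)
```

Harmonizing before serialization means that both sample rows end up with the same gene and treatment labels, which is what allows later queries to find relationships shared between treatments. In a fuller pipeline the preferred terms would more plausibly be IRIs from a public ontology than plain string literals.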