SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
Data integration framework for discovery and validation:
            Smart merging of experimental and public data across
                          ontologies and taxonomies
                                                     Erich Gombocz1), Robert Stanley1), Chuck Rockey1), Toshiro Nishimura1)
1
    IO Informatics, 2550 Ninth Street, Suite 114, Berkeley, CA 94710, U.S.A. [Correspondence: egombocz@io-informatics.com]


Summary
While a variety of data integration frameworks have been proposed and been used for quite some time,
most of them commonly lack the ability to generate coherent, meaningful datasets across different
disciplines. This report introduces the use of semantic technologies to apply an integrated systems biology
approach using data relationships and merged ontologies to better make sense of findings from genomics,
proteomics, metabolomics and clinical endpoints. This approach remedies many obstacles on the path
towards a functional integration.

Such an undertaking requires the ability to merge datasets independently of their existing taxonomies,
ontologies or semantic standards and to consolidate vocabularies for data classes, terms and their
relationships. “Smart” merging also needs to account for hierarchical adjustments into super- or
subclasses of the new, merged ontology tree whenever needed. The latter specifically applies when
integrating in-house experimental data with public domain reference sources with very specific
taxonomies and hierarchies. In this work, we present examples of toxicity biomarker studies across
different sample types and –OMICs data. To meaningfully integrate such data requires a semantic
representation of the resulting knowledgebase. During data import and relationship mapping, one or
multiple thesauri are applied in the background to auto-generate a coherent network from all sources.

Through use of a combination of experimental observations and publicly available reference information,
the merged network is able to provide insights about complex relationships and interactions by reduction         Fig.1: Diagrammatic overview: Definitions, requirements, example and use cases case for merging of ontologies
to query-driven sub-networks and their subsequent analysis. In this example, we were able to qualify
potential biomarkers within their multi-response activity across tissues and toxicants. The strength and
applicability of such a methodology for rational drug discovery and application in personalized medicine
as well as its impact towards a better understanding of complex biological processes are discussed.


Challenges
         General data coherence: different source data are typically categorized in different taxonomies;
          this results in normalization issues, term inconsistencies and property disparities across large,
          complex experimental data sets
         Pharmacodynamic correlations are not necessarily functionally linked within the biochemical
          network; despite resulting from the same drug perturbation, the observed -OMICs expression
          changes may represent very different biological processes
         In many cases, data relationships are not a priori contained in the data sets
         Different incompatible semantic standards make merging towards a common ontology extremely
          difficult; this also impacts applicability of query and reasoning tools
         Relationship consolidation and class hierarchy validation or adjustment may be required to make
          inferencing and reasoning meaningful
         Lack of intuitive, science-driven tools makes researchers shy away from network data analysis
          approaches in general.
         Scalability and performance issues confine most network approaches to relatively small datasets        Fig.2: Mapping and managing nomenclature via Sentient Thesaurus Manager™(left); a common knowledgebase
                                                                                                                 from genes, proteins, metabolites, tissue analytics and clinical disease codes in Sentient Knowledge Explorer™

Methodology
         Use import mapping to consolidate relationships; separate numeric from non-numeric content
          during class generation to account for scaling via numerical properties,
          Apply controlled vocabularies when importing or merging, using one or multiple thesauri and
          provide authoring capabilities for groups and representative terms for both, classes and
          relationships
         Allow for proper handling of non well-formed RDF
         Provide a facility to import from or directly connect to via URIs
         Exercise a multi-functional query tool supporting graphical and SPARQL queries to reduce
          network complexity
         Facilitate output of the resulting knowledgebase with or without its graphical representation to
          RDF, N3 or triple store backend

Approach
         For output from queries to relational databases (in tsv, csv, xml, fqml) or from devices or            Fig.3: From web query on experimental observations (left) to a multi-source, multi-standard merged ontology and
          databases using PSI-MI, an interactive graphical/textual import mapper for relationships is            network analysis using public reference resources: Drillout to NCBI, UniProt, IntAct, BioGrid, KEGG and HMDB
          applied; it also can be configured for automatic background mapping when deployed in
          conjunction with web-based query to multiple relational databases                                      Conclusions
         Use, author and apply one or multiple thesauri for consolidation of classes and relationships          Using Sentient™ tools as data integration framework, we were able to merge multi-OMICs experimental
         “Drag-to-knowledgebase” function with automatic classification of semantic data in RDF, N3,            observations with publicly available resources in a variety of formats towards a common ontology
          OBO and OWL (local or remote file system)                                                              semantic knowledgebase and apply it to discovery and qualification of toxicity biomarkers. Specifically,
         Permit direct URI input to web-based reference resources                                               we were able to demonstrate the following:
         Utilize entity browser with sorting (by subject, predicate and object) and paging for large datasets          Coherent merging of experimental and public data across standards, taxonomies and ontologies
          to navigate through instance properties                                                                        into an integrated, semantically organized data and relationship network
         Provide means to reduce network complexity: apply criteria for connection depth, numeric                      Ability to focus on visualization and analysis of a sub-network of specific scientific interest (e.g.
          scaling and weighting and other conditions (such as at least, at most, exactly n connections)                  biomarker qualification across tissues, different treatments, different genetic responder profiles,
         Employ a built-in query tool which supports graphical, textual and SPARQL queries to specify                   etc.) in a complex, large dataset
          conditions, then re-plot the resulting sub-networks to reveal hidden or unknown functional                    Qualification of potential biomarker panels for toxicity across tissues and toxicants and gaining
          biological relationships and their entities in context.                                                        insights in complex biological functions involving multiple pathways

Results                                                                                                          Acknowledgements
This poster demonstrates the applicability of a semantic data integration framework to meaningfully              This work was made possible by insights and many discussions with members of IO Informatics’ industry
merge experimental and public data for biomarker discovery and validation. Different ontologies across           and academic centers of excellence members of the Working Group on “Semantic Applications for
diverse standards have been merged into a common ontology knowledgebase. The presented network                   Translational Research”. The authors would like to acknowledge Ted Slater (Pfizer), Jonas Almeida (MD
viewer, the Sentient Knowledge Explorer™, can comfortably handle one million assertions in memory                Anderson Cancer Center), Pat Hurban (CLDA), Alan Higgins (Viamet Pharmaceuticals) and Bruce Mc.
while running on a Pentium 4 with 2GB of RAM, and is able to directly query the knowledgebase.                   Manus and Mark Wilkinson (both from James Hogg iCapture Centre) for their various contributions.

Weitere ähnliche Inhalte

Kürzlich hochgeladen

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 

Kürzlich hochgeladen (20)

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 

Empfohlen

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 

Empfohlen (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Data integration framework for discovery and validation

  • 1. Data integration framework for discovery and validation: Smart merging of experimental and public data across ontologies and taxonomies Erich Gombocz1), Robert Stanley1), Chuck Rockey1), Toshiro Nishimura1) 1 IO Informatics, 2550 Ninth Street, Suite 114, Berkeley, CA 94710, U.S.A. [Correspondence: egombocz@io-informatics.com] Summary While a variety of data integration frameworks have been proposed and been used for quite some time, most of them commonly lack the ability to generate coherent, meaningful datasets across different disciplines. This report introduces the use of semantic technologies to apply an integrated systems biology approach using data relationships and merged ontologies to better make sense of findings from genomics, proteomics, metabolomics and clinical endpoints. This approach remedies many obstacles on the path towards a functional integration. Such an undertaking requires the ability to merge datasets independently of their existing taxonomies, ontologies or semantic standards and to consolidate vocabularies for data classes, terms and their relationships. “Smart” merging also needs to account for hierarchical adjustments into super- or subclasses of the new, merged ontology tree whenever needed. The latter specifically applies when integrating in-house experimental data with public domain reference sources with very specific taxonomies and hierarchies. In this work, we present examples of toxicity biomarker studies across different sample types and –OMICs data. To meaningfully integrate such data requires a semantic representation of the resulting knowledgebase. During data import and relationship mapping, one or multiple thesauri are applied in the background to auto-generate a coherent network from all sources. Through use of a combination of experimental observations and publicly available reference information, the merged network is able to provide insights about complex relationships and interactions by reduction Fig.1: Diagrammatic overview: Definitions, requirements, example and use cases case for merging of ontologies to query-driven sub-networks and their subsequent analysis. In this example, we were able to qualify potential biomarkers within their multi-response activity across tissues and toxicants. The strength and applicability of such a methodology for rational drug discovery and application in personalized medicine as well as its impact towards a better understanding of complex biological processes are discussed. Challenges  General data coherence: different source data are typically categorized in different taxonomies; this results in normalization issues, term inconsistencies and property disparities across large, complex experimental data sets  Pharmacodynamic correlations are not necessarily functionally linked within the biochemical network; despite resulting from the same drug perturbation, the observed -OMICs expression changes may represent very different biological processes  In many cases, data relationships are not a priori contained in the data sets  Different incompatible semantic standards make merging towards a common ontology extremely difficult; this also impacts applicability of query and reasoning tools  Relationship consolidation and class hierarchy validation or adjustment may be required to make inferencing and reasoning meaningful  Lack of intuitive, science-driven tools makes researchers shy away from network data analysis approaches in general.  Scalability and performance issues confine most network approaches to relatively small datasets Fig.2: Mapping and managing nomenclature via Sentient Thesaurus Manager™(left); a common knowledgebase from genes, proteins, metabolites, tissue analytics and clinical disease codes in Sentient Knowledge Explorer™ Methodology  Use import mapping to consolidate relationships; separate numeric from non-numeric content during class generation to account for scaling via numerical properties,  Apply controlled vocabularies when importing or merging, using one or multiple thesauri and provide authoring capabilities for groups and representative terms for both, classes and relationships  Allow for proper handling of non well-formed RDF  Provide a facility to import from or directly connect to via URIs  Exercise a multi-functional query tool supporting graphical and SPARQL queries to reduce network complexity  Facilitate output of the resulting knowledgebase with or without its graphical representation to RDF, N3 or triple store backend Approach  For output from queries to relational databases (in tsv, csv, xml, fqml) or from devices or Fig.3: From web query on experimental observations (left) to a multi-source, multi-standard merged ontology and databases using PSI-MI, an interactive graphical/textual import mapper for relationships is network analysis using public reference resources: Drillout to NCBI, UniProt, IntAct, BioGrid, KEGG and HMDB applied; it also can be configured for automatic background mapping when deployed in conjunction with web-based query to multiple relational databases Conclusions  Use, author and apply one or multiple thesauri for consolidation of classes and relationships Using Sentient™ tools as data integration framework, we were able to merge multi-OMICs experimental  “Drag-to-knowledgebase” function with automatic classification of semantic data in RDF, N3, observations with publicly available resources in a variety of formats towards a common ontology OBO and OWL (local or remote file system) semantic knowledgebase and apply it to discovery and qualification of toxicity biomarkers. Specifically,  Permit direct URI input to web-based reference resources we were able to demonstrate the following:  Utilize entity browser with sorting (by subject, predicate and object) and paging for large datasets  Coherent merging of experimental and public data across standards, taxonomies and ontologies to navigate through instance properties into an integrated, semantically organized data and relationship network  Provide means to reduce network complexity: apply criteria for connection depth, numeric  Ability to focus on visualization and analysis of a sub-network of specific scientific interest (e.g. scaling and weighting and other conditions (such as at least, at most, exactly n connections) biomarker qualification across tissues, different treatments, different genetic responder profiles,  Employ a built-in query tool which supports graphical, textual and SPARQL queries to specify etc.) in a complex, large dataset conditions, then re-plot the resulting sub-networks to reveal hidden or unknown functional  Qualification of potential biomarker panels for toxicity across tissues and toxicants and gaining biological relationships and their entities in context. insights in complex biological functions involving multiple pathways Results Acknowledgements This poster demonstrates the applicability of a semantic data integration framework to meaningfully This work was made possible by insights and many discussions with members of IO Informatics’ industry merge experimental and public data for biomarker discovery and validation. Different ontologies across and academic centers of excellence members of the Working Group on “Semantic Applications for diverse standards have been merged into a common ontology knowledgebase. The presented network Translational Research”. The authors would like to acknowledge Ted Slater (Pfizer), Jonas Almeida (MD viewer, the Sentient Knowledge Explorer™, can comfortably handle one million assertions in memory Anderson Cancer Center), Pat Hurban (CLDA), Alan Higgins (Viamet Pharmaceuticals) and Bruce Mc. while running on a Pentium 4 with 2GB of RAM, and is able to directly query the knowledgebase. Manus and Mark Wilkinson (both from James Hogg iCapture Centre) for their various contributions.