Ontologies are seeing a resurgence of interest and usage as big data proliferates, machine learning advances, and integration of data becomes more paramount. The previous models of sometimes labor-intensive, centralized ontology construction and maintenance do not mesh well in today’s interdisciplinary world that is in the midst of a big data, information extraction, and machine learning explosion. In this talk, we will discuss a model of building and maintaining large collaborative, interdisciplinary ontologies along with the data repositories and data services that they empower. We will also introduce the National Institutes of Environmental Health Science’s Child Health Exposure Analysis Resource and describe how we used our methodology to assemble the broad interdisciplinary ontology that covers exposure science and health and integrates with numerous long standing, well used ontologies. We will also describe how this ontology powers an integrated data resource and provide some examples of how it can be used and re-used for interdisciplinary work. If time permits, we will also describe how the methodology and the integrated ontology has been and is being used in other interdisciplinary health and wellness settings.
http://www.opentox.net/events/opentox-usa-2018/building-using-and-maintaining-ontology-enabled-resources
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Mc guinnessopentox20180712durhamnc final
1. Building, Using, and Maintaining
Ontology-Enabled Resources in
Exposure Science and Health
Informatics Settings
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Science
Director RPI Web Science Research Center
RPI Institute for Data Exploration and Application Health Informatics Lead
dlm@cs.rpi.edu , @dlmcguinness
OpenTox Durham, NC July 12, 2018
2. ●Limited data integration without controlled
vocabulary
●Limited reproducibility without shared
definitions
●Difficulty in reuse without provenance
Ontologies can enhance findabiity, accessibility,
interoperability, reusability and research impact
Integrated Understandable Data
McGuinness 7/18
3. What is an Ontology?
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Relationships among concepts and individuals
• Sentences distinguishing concepts, refining
definitions & relationships
relevant to a particular domain or area of interest.
* Based on AAAI ‘99 Ontologies Panel ̶ McGuinness, Welty, Uschold, Gruninger, Lehmann
McGuinness 6/7/2017
• "Pull" for Ontologies. Invited
talk. Semantics for the Web.
Dagstuhl, Germany, 2000.
• Ontologies Come of Age.
Fensel, Hendler, Lieberman,
Wahlster, eds. Spinning the
Semantic Web: Bringing the
World Wide Web to Its Full
Potential. MIT Press, 2003.
McGuinness 7/18
4. Ontology Development Process
**
Use Cases
Existing Ontologies
& Vocabularies
Expert Interviews
Labkey,
Ontology
Fragments
Ontology
Curation
(ongoing)
Reviewers & Curators
* Ontology Development Team
* Domain collaborators
* Invited experts
* "Consumers" (data analysts)
Knowledge Graph
Integration
* Linking data and
metadata content to
domain terms
* Linking workflows
based on semantic
descriptions
Repository
Integration
* Source Datasets
* Analytics source
code
* Results
* Publications
Knowledge-
Enhanced
Search
Finding what
is there that
might be of
use
Semantic
Extract
Transform,
Load
(SETLr)
Expert
Guidance
Sources
Data Reporting
Templates
Data Dictionaries /
Codebooks
Foundational
Ontologies/Vocabularies
Human Aware
Data Acquisition
Framework
Ontology
Browser
Generated
Ontology
* domain concepts
* authoritative
vocabularies
* vetted definitions
* supporting citations
Erickson, McGuinness, McCusker, Chastain, Pinheiro, Rashid, Liang, Liu, Stingone, …
Exemplified by CHEAR
McGuinness 7/18
5. CHEAR Ontology Effort
5
Goal: Encode terminology currently needed by the CHEAR Data Center
Portal, publish an open source extensible ontology integrating general
exposure science and health leveraging best in class terminologies.
Enabling Findable, Accessible, Interoperable, Reusable Data and
Services to support data analysis and interdisciplinary research
Ontologies encode terms and their interrelationships, providing a foundation
for understanding interoperability and reusability (I and R in terms of FAIR)
Ontology-enabled infrastructures - Knowledge Graphs and Ontology-
enabled search services also provide support for finding and accessing
relevant content (the F and A in FAIR)
Child Health Exposure
Analysis Resource Ontology
7. Mapping Data to Meaning:
Semantic Data Dictionaries
Rashid, Chastain, Stingone, McGuinness, McCusker. The Semantic Data Dictionary
Approach to Data Annotation and Integration. Enabling Open Semantic Science, in
Proceedings of the International Web Science Conference Knowledge Graphs Workshop Oct
21, 2017
McGuinness 7/18 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
8. 8
• Ontology support for
mapping and integration
(e.g., education level)
• Ontology informs decisions
about variables that may be
combined, serve as proxy,
or used to derive desired
info (e.g., birth outcomes)
• Ontology Integrity
constraints may help flag
errors (e.g., APGAR > 10)
• Ontology helps expose
implicit information and find
links
Fenton Z-Score
Sex
Birth
weight
Gest
Age
Mother’s Highest
Education Level
Val
Did not attend school 0
Elementary school 1
Technical post-primary 2
Middle school 3
Technical post-middle
school 4
Highschool or junior
college 5
Technical post-junior
college 6
College 7
Graduate 8
Doesn’t know 9
Mother
Education
Val
Less than High
School 0
High School
Graduate or More 1
Support Browsing,Searching, Pooling
Deriving Values, Verification, …
McGuinness, McCusker, Pinheiro, Stingone, et. al. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
McGuinness 7/18
9. Laboratory Information Mgmt
System (LIMS)-based
backend integrated with
Ontology.
Includes automatic ingest,
access control, data
governance, download, ...
Supports Search study, data
sample, subject, ...
Enables statiscians to ask for
content to support their
studies e.g., find
Child: Birth Weight, Gender,
Gestational Age at Birth
Mother: Age, BMI “early in
pregnancy based on
inclusion criterion for the
particular study”, Parity,
Education
Metals: As, CD, Mn, Mo, Pb
CHEAR Human Aware Data
Acquisition Framework
McGuinness 7/18
Pinheiro, Liang, Rashid, Liu, Chastain, Santos, McCusker, McGuinness
Sample Question: Gennings
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
10. Ontology-Enabled Study Search /
Precision Data Search and Download
Blood Biomarkers for Children’s Health (Study 1)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Urine Biomarkers for Children’s Health (Study 2)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Metabolomic Biomarkers for Children’s Health (Study 3)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
McGuinness 7/18 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
11. Ontology-enabled Framework:
AI Horizons Example **
Ontologies are an important piece;
but are part of a larger integrated
framework
Semanalytics / SemNExt RPI team:
McGuinness, Bennett PIs with
McCusker, Erickson, Seneviratne and the extended Research
groups along with input from IBM HEALS collaborators and
motivation from MANY projects, particularly from NIEHS/
Mount Sinai,
McGuinness, D.L. & Bennett K. Integrating
Semantics and Numerics: Case Study on
Enhancing Genomic and Disease Data
Using Linked Data Technologies. In Proc.
Smart Data 2015, San Jose, CA.McGuinness 7/18 partially supported through IBM AI Horizons Network
12. Probability-Aware
Knowledge Exploration
• Knowledge imported from
drug, protein, and
disease interaction
databases.
• Each interaction given an
evidence-driven
probability.
• Find drugs that could
affect melanoma, filtered
by interaction probability.
• The best hypotheses
were generated using the
highest probabilities.
• Being expanded initially
for genomics-aware
cancer care applications
McCusker, Dumontier, Yan,
He, Dordick, McGuinness.
Finding Melanoma Drugs
through a Probabilistic
Knowledge Graph. PeerJ
4L 32007, 2016.
McGuinness 7/18
13. 13
Text Mining
Linked Data
Biomedical Databases
Omics and Epidemiology Datasets
File uploads of any kind
Fully automated
Reproducible and traceable from diverse
data types:
Tabular CSV & XLS, XML, JSON, HTML,
etc.
Transformed into rigorously modeled
knowledge:
Meaning as structure, global IDs,
foundational ontologies
Tracked and versioned using provenance
standards
RPI knowledge graph technology
curates from diverse sources
McGuinness 7/18 partially supported through IBM AI Horizons Network
14. RPI knowledge graph technology
tracks the provenance of everything
14
All knowledge is encoded with its
provenance and publication details
using "nanopublications" approach.
Knowledge curation traces the
source and method of
transformation:
Linked Data "was quoted from X",
Semantic ETL stores workflows as
provenance.
Inference agents show their
work:
What knowledge was used? What
rules were applied?
Knowledge from user
interactions trace back to the
user.
Who made the claim? What other
knowledge have they added? Do
we trust this person?
McGuinness 7/18 partially supported through IBM AI Horizons Network
15. RPI knowledge graph technology
uses provenance-driven probabilities
15
Experimental Methods
Knowledge Sources
Expert opinions can evaluate
evidence based on experimental
method:
Compute probabilities (edge width)
from expert opinions about protein
interaction assays and knowledge
source reputation.
Probabilities focus on high-
quality results:
Graph exploration relies on a "joint
probability horizon" for searching
out hypothetical connections.
Current probabilities are system
wide, future work will be
Binding
Binding
McGuinness 7/18 partially supported through IBM AI Horizons Network
16. RPI knowledge graph technology
enables probabilistic graph exploration
17
What drugs are likely to have an effect on a
disease?
What diseases is this drug likely to affect?
How are X and Y likely to be connected?
Which tumor mutations might be driving the
tumor?
What pathways are affected by a given drug?
Users can dig into the chain of evidence
supporting the results of graph
exploration:
Everything is linked, justified, and
explainable.
McGuinness 7/18 partially supported through IBM AI Horizons Network
17. Powering Personalized Treatment Agent
18
New
Staging
Guidelines
Axiom
Generation
Ontologi
es
Map
Cancer
Terminology
Patient
Datasets
Synthetic
Patient
Data
Treatment,
and
Monitoring
Guidelines
Semantic
Data
Dictionary
Conversion
Extraction
Reasoner
Visualization
New Data +
Justifications
Seneviratne, Rashid, Chari, McCusker, Bennett, Hendler and McGuinness. Knowledge Integration for Disease Characterization: A
Breast Cancer Example. In Proceedings of the International Semantic Web Conference, Monterey, CA, 2018.
McGuinness 7/18 partially supported through IBM AI Horizons Network
18. Personalized Treatment Agent - Patient “D”
19
Biomarkers dictate
the drug options
D’s Triple positive
combination lead to
downstaging
Overall treatment suggestions change based
on disease characterization using new data and
guidelines
Seneviratne, Rashid, Chari, McCusker, Bennett, Hendler and McGuinness. Knowledge Integration for Disease Characterization: A
Breast Cancer Example. In Proceedings of the International Semantic Web Conference, Monterey, CA, 2018.
McGuinness 7/18 partially supported through IBM AI Horizons Network
19. Discussion
• Ontologies enable FAIR (Findable,
Accessible, Interoperable, Reusable) Data
Resources
• They can support movement across levels
of abstraction
• Use ontology-enabled architectures
• Do NOT build ontologies from scratch
• Selectively and thoughtfully reuse existing
best practice ontologies/vocabularies
• Engage experts in choosing ontology
(portions) and in designing the knowledge
architecture
• Ecosystems and diverse teams are critical
for success – community driven and
maintained ontology-based systems are the
future
• Lets enable the next generation of decision
support together! Please contact us for
collaboration
McGuinness 7/18
20. Conclusion
• Semantics / Ontologies
enable new
personalized and
population level health
• Contact: dlm@cs.rpi.edu
• Thanks to many: RPI Tetherless World team
particularly McCusker, Erickson, Hendler,
Pinheiro, Rashid, Liang, Liu, Chastain; RPI:
Bennett, Dyson, Seneviratne; Mount Sinai
particularly Teitelbaum, Stingone, Mervish,
Gennings, Kovatch; IBM, particularly Das, Chen,
Brown, ….
• Funding: NIEHS 0255-0236-4609 /
1U2CES026555-01, DARPA HR0011-16-2-0030,
IBM-RPI HEALS AI Horizons Network, NSF ACI-
1640840
• Forthcoming book: Ontology Engineering with
Kendall
McGuinness 7/18