1. Causal Reasoning using the Relation Ontology
Chris Mungall
Lawrence Berkeley National Laboratory
cjmungall@lbl.gov
2. Outline
● The need for an ontology of relations
● Tour of the Relation Ontology
● Use in GO Causal Inference
● Causal Relations for Diseases
● Integrating multiple knowledge graphs
4. Why we need relationship types
[Diagram of typed nodes. People: Melania Trump, Donald Trump, Barack Obama, Michelle Obama, Vladimir Putin. Countries: USA, Russia.]
5. Why we need relationship types for biological data
[Diagram of typed nodes. Genes: Gene A, Gene B, Gene C. Diseases: Disease X, Disease Y.]
6. Why we need standardised relationship types for biological data
● database 1: gene A, gene B, relation INTERACTS_WITH
● database 2: gene A, gene B, relation physically interacts with
● database 3: gene A, gene B, relation binds
7. Why we need standardised relationship types for biological data
● database 4: gene A, gene B, relation affects
● database 5: gene A, gene B, relation regulates
● database 6: gene A, gene B, relation PHOSPHORYLATES
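The integration problem on these two slides can be sketched as a lookup from each database's ad-hoc label to a shared relation label. This is a minimal Python sketch under stated assumptions: the labels and database names come from the slides, but the target chosen for each label, the A-to-B edge direction, and the helper name `normalise` are illustrative and not an official RO mapping.

```python
# Hypothetical mapping from ad-hoc database labels to shared relation labels.
# The targets are illustrative choices, not an official RO mapping.
LABEL_TO_RELATION = {
    "INTERACTS_WITH": "interacts with",
    "physically interacts with": "physically interacts with",
    "binds": "binds",
    "affects": "affects",
    "regulates": "regulates",
    "PHOSPHORYLATES": "phosphorylates",
}


def normalise(edges):
    """Rewrite (subject, label, object, source) edges to use shared relation labels."""
    out = []
    for subj, label, obj, source in edges:
        relation = LABEL_TO_RELATION.get(label)
        if relation is None:
            # Fail loudly rather than silently keeping an unmapped label.
            raise KeyError(f"no mapping for label {label!r} from {source}")
        out.append((subj, relation, obj, source))
    return out


# Edge direction (A to B) is assumed for illustration.
edges = [
    ("A", "INTERACTS_WITH", "B", "database 1"),
    ("A", "PHOSPHORYLATES", "B", "database 6"),
]
normalised = normalise(edges)
```

Raising on an unmapped label is a deliberate choice here: a silent pass-through would reintroduce the ad-hoc labels the mapping is meant to remove.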
11. Relations are the glue for integration
https://twitter.com/dhimmel/status/810996703901777920
12. OBO Relation Ontology
● An ontology of Relationship Types
◦ Hierarchically organized
● OWL provides mathematical-logical
foundation
● Currently > 450 relations
◦ “Core” relations (e.g. part of)
◦ General purpose (e.g. has input)
◦ Domain-centric (e.g. phosphorylates)
● Originally used for relationships in
ontologies
◦ Now used in Knowledge Graphs, Linked Data
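A hierarchy of relations lets a query for a general relation also return edges asserted with a more specific one. The following Python sketch illustrates the idea. The parent links are assumptions made up for this example from the labels seen earlier in the deck; consult RO itself for the real hierarchy.

```python
# Illustrative parent links between relation labels (child -> parent).
# Assumption for this sketch only; not RO's actual hierarchy.
PARENT = {
    "binds": "physically interacts with",
    "physically interacts with": "interacts with",
}


def ancestors(relation):
    """Return the relation followed by all of its ancestors, most specific first."""
    chain = [relation]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def query(edges, relation):
    """Return edges whose relation is `relation` or any descendant of it."""
    return [e for e in edges if relation in ancestors(e[1])]


edges = [
    ("A", "binds", "B"),
    ("C", "interacts with", "D"),
    ("E", "regulates", "F"),
]
hits = query(edges, "interacts with")
```

Here `hits` contains the `binds` edge and the `interacts with` edge, but not the `regulates` edge, because `regulates` has no path to `interacts with` in the assumed hierarchy.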
18. Description Logics provide basis for logical reasoning
● TBox
◦ Classes and class axioms
⚫e.g. nucleus SubClassOf organelle, part_of some
cytoplasm
◦ (Most ontologies are TBox-centric)
● ABox
◦ Instances and instance-level axioms
⚫e.g. patient123 has_sequence genome567
◦ (Typically not asserted in ontologies)
● RBox
◦ Object Properties (aka Relations)
⚫e.g. part of is Transitive
◦ (RO is RBox-centric)
20. InverseOf Axioms
regulates regulated by
x regulates y ⬄ y regulated by x
105 InverseOf axioms
Note: relations often have an arbitrary canonical direction; the properties of the inverse are trivially inferred
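The inverse rule above (x regulates y ⬄ y regulated by x) can be sketched in Python as follows. This is a minimal illustration over plain tuples, not how an OWL reasoner is implemented; the function name `add_inverses` is hypothetical.

```python
# Inverse pairs, declared once and then made bidirectional.
INVERSE = {"regulates": "regulated by"}
INVERSE.update({v: k for k, v in list(INVERSE.items())})


def add_inverses(triples):
    """Return the input triples plus the triple implied by each InverseOf pair."""
    result = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            result.add((o, INVERSE[p], s))
    return result


facts = add_inverses({("x", "regulates", "y")})
```

Because the inverse is always derivable, only one direction needs to be asserted in the data.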
21. Domain and Range
[Diagram: gene, expressed in, material anatomical entity]
expressed in
Domain: gene
Range: material anatomical entity
221 Domain/Range axioms
BFO and OBO Core used for constraints
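A rough sketch of using domain and range as constraints, in Python. In OWL, domain and range axioms infer types rather than reject data; a conflict only arises when the inferred type is disjoint from an asserted one. The sketch below simplifies this by assuming every node has exactly one asserted type and that distinct types are disjoint. The node names and the function name `violations` are hypothetical.

```python
# Relation label -> (domain, range), taken from the slide.
DOMAIN_RANGE = {"expressed in": ("gene", "material anatomical entity")}


def violations(triples, types):
    """Return triples whose subject or object type conflicts with domain/range.

    Simplifying assumptions: one asserted type per node, distinct types are
    disjoint, and a node with no asserted type is also reported.
    """
    bad = []
    for s, p, o in triples:
        if p not in DOMAIN_RANGE:
            continue  # no constraint declared for this relation
        domain, range_ = DOMAIN_RANGE[p]
        if types.get(s) != domain or types.get(o) != range_:
            bad.append((s, p, o))
    return bad


types = {
    "geneA": "gene",
    "liver": "material anatomical entity",
    "diseaseX": "disease",
}
triples = [
    ("geneA", "expressed in", "liver"),
    ("diseaseX", "expressed in", "liver"),
]
bad = violations(triples, types)
```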
22. Characteristics
● Transitive
◦ x R y ∧ y R z ➔ x R z
◦ Examples: part of, develops from
● Symmetric
◦ x R y ➔ y R x
◦ Examples: adjacent to
● Reflexive
● Anti-symmetric
● Functional
129 axioms
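The transitive and symmetric rules above can be sketched as a small fixed-point computation in Python. The relation labels are the slide's examples; the sample entities (including `arm`) are made up for illustration, and the naive loop is meant for clarity rather than efficiency.

```python
TRANSITIVE = {"part of", "develops from"}
SYMMETRIC = {"adjacent to"}


def closure(triples):
    """Apply the symmetric and transitive rules until no new triples appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p in SYMMETRIC:
                new.add((o, p, s))  # x R y  =>  y R x
            if p in TRANSITIVE:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # x R y and y R z  =>  x R z
        if not new <= facts:
            facts |= new
            changed = True
    return facts


facts = closure({
    ("finger", "part of", "hand"),
    ("hand", "part of", "arm"),
    ("a", "adjacent to", "b"),
})
```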
24. Property Chains
● More compact way to write SWRL rules
◦ Uses function composition symbol ‘•’
◦ Less expressive
◦ Examples:
⚫child_of • has_brother ➔ has uncle
⚫negatively regulates • negatively regulates ➔
positively regulates
139 Property Chain axioms
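A property chain can be read as: if x R1 y and y R2 z, then x R3 z. The Python sketch below applies the two example chains from this slide in a single pass over plain tuples. The function name `apply_chains` is hypothetical, and a real reasoner would iterate to a fixed point and combine chains with other axioms.

```python
# (first relation, second relation) -> inferred relation, from the slide.
CHAINS = {
    ("child_of", "has_brother"): "has uncle",
    ("negatively regulates", "negatively regulates"): "positively regulates",
}


def apply_chains(triples):
    """Return triples inferred by one pass of the property chains."""
    facts = set(triples)
    inferred = set()
    for s, p1, mid in facts:
        for mid2, p2, o in facts:
            # The object of the first edge must be the subject of the second.
            if mid == mid2 and (p1, p2) in CHAINS:
                inferred.add((s, CHAINS[(p1, p2)], o))
    return inferred


inferred = apply_chains({
    ("A", "negatively regulates", "B"),
    ("B", "negatively regulates", "C"),
})
```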
25. RO Release Process
● All coordinated via GitHub
◦ Issues: https://github.com/oborel/obo-relations/issues
◦ All changes proposed via Pull Requests
https://github.com/oborel/obo-relations/pulls
◦ Validated by Travis-CI
◦ Merged by core editors
● All releases are vetted
◦ Automatically
⚫ HermiT OWL Reasoner
⚫ ROBOT Release Tool
⚫ Ontology Development Kit Docker
https://github.com/INCATools/ontology-development-kit
◦ Manually
26. RO Core
● Generic: apply across
multiple domains
● E.g.
⚫every finger part of a hand
⚫every M phase part of a cell cycle
⚫Cambridge part of UK
29. How RO is used
● Ontologies:
◦ Relationships between classes
◦ Widely used in OBO
● Knowledge Graphs:
⚫SPARQL endpoints
⚫Neo4J and other graph databases
⚫JSON-LD
⚫Relational Databases (e.g. GMOD/Chado)
30. Usage of RO in OBO
● Count of number of ontologies
using each relation
31. Use of RO in Knowledge Graphs
● GO Causal Annotation Graphs
● Disease/Phenotype Graphs
32. GO’s initial attempts at causality
GO:0086094
positive regulation of ryanodine-sensitive
calcium-release channel activity by
adrenergic receptor signaling pathway
involved in positive regulation of cardiac
muscle contraction
Mungall’s law[*]: an inexpressive bio-database schema will be abused to the maximum extent possible in order for curators to express the complexities of biology
[*] I have a feeling I’m not the first to express this
GO:0086023
adenylate cyclase-activating adrenergic receptor
signaling pathway involved in heart process
subClassOf
43. Shortcut Relations and inference rules unify perspectives
[Diagram, two views of the same fact.
GO-CAM view (activity-centric, semantics on nodes): any GO kinase activity, enabled by GeneProduct1, directly regulates any GO activity, enabled by GeneProduct2.
Entity-centric view (SIF, CausalTab, ...): GeneProduct1 phosphorylates GeneProduct2.]
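The relationship between the two views can be sketched as an expansion rule: one entity-centric `phosphorylates` edge is rewritten into the activity-centric pattern. This Python sketch is an illustration under assumptions, not the actual RO or GO-CAM machinery; the function name and the generated activity node names are hypothetical.

```python
from itertools import count


def expand_phosphorylates(edges):
    """Expand entity-centric 'phosphorylates' edges into an activity-centric pattern.

    Each matching edge becomes two activity nodes, each 'enabled by' one gene
    product, linked by 'directly regulates'. Other edges pass through unchanged.
    """
    ids = count(1)  # generates hypothetical activity node identifiers
    out = []
    for subj, rel, obj in edges:
        if rel != "phosphorylates":
            out.append((subj, rel, obj))
            continue
        kinase_activity = f"activity{next(ids)} (kinase activity)"
        target_activity = f"activity{next(ids)} (any activity)"
        out += [
            (kinase_activity, "enabled by", subj),
            (target_activity, "enabled by", obj),
            (kinase_activity, "directly regulates", target_activity),
        ]
    return out


expanded = expand_phosphorylates([("GeneProduct1", "phosphorylates", "GeneProduct2")])
```

The reverse direction, collapsing the activity-centric pattern back to a single shortcut edge, is the kind of inference the slide title refers to.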
46. MONDO: Monarch Disease Ontology
● Unifies multiple disease resources
● Diseases as states
● Diseases have causal basis in
◦ disruption of a process
◦ dysfunction of a structure, causing
disruption of a biological process
● Diseases have features
◦ also causally linked
49. Unifying multiple knowledge graphs
● KGs emerging as popular ML
representation
◦ node embedding, NNs, link prediction
● Challenge
◦ combining different KGs together
● Different standards
◦ RO/OBO
◦ Wikidata
◦ SIO http://sio.semanticscience.org/
◦ Many KGs have no standards, ad-hoc
relations
⚫ e.g. SemMedDB
54. Conclusions
● Standardized relations required for
◦ ontologies
◦ knowledge graphs
◦ bioinformatics exchange formats
● RO provides
◦ Broad set of relations
◦ Different use cases
◦ OWL axiomatization enables inference
● Uses
◦ GO
◦ Disease and phenotype
55. Acknowledgments
● Relation Ontology
◦ Matt Brush
◦ David Osumi-Sutherland
◦ James Overton
◦ Jim Balhoff
◦ Suzanna Lewis
◦ Anne Thessen
◦ Mike Sinclair
◦ David Hill
◦ Kimberley Van Auken
◦ Larry Hunter
◦ Barry Smith
◦ Alan Ruttenberg
◦ Melissa Haendel
◦ Paul Thomas
● MONDO
◦ Nicole Vasilevsky
◦ Peter Robinson
◦ EBI curators
◦ GARD curators
◦ ClinGen curators
● BioLink
◦ Harold Solbrig
◦ Deepak Unni
◦ Seth Carbon
◦ Gregg Stuppe
◦ Laurent-Philippe Albou
◦ Tim Putman
◦ Kent Shefchek
◦ Chris Bizon
◦ Michel Dumontier
◦ Lance Hannestad
◦ Richard Bruskiewich