1. Reasoning with the RNA Ontology
Chris Mungall
Lawrence Berkeley National Laboratory
Reasoning with the RNA Ontology – p.1/28
2. What is a reasoner?
A reasoner implements a generalized decision procedure
that takes a collection of logical axioms, finds the
entailments of these axioms, and determines whether or not
the axioms are satisfiable
An ontology can be considered as a collection of
axioms (in contrast to a terminology)
1. Relationships: is a (SubClass), partOf, ...
2. Definitions
3. Constraints
We can also treat data as collections of axioms
6. Examples of Ontology Axioms
GNRATetraloop is a Tetraloop
Tetraloop is an RNAStructure
A translation to first-order predicate logic:
GNRATetraloop(x) → Tetraloop(x)
Tetraloop(x) → RNAStructure(x)
Set theoretic:
GNRATetraloop ⊆ Tetraloop ⊆ RNAStructure
Entailment: GNRATetraloop is an RNAStructure
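The entailment follows from chaining the two is a axioms. A minimal Python sketch of that step, assuming the asserted axioms are stored as (subclass, superclass) pairs; the name `entailed_superclasses` is illustrative and not part of any reasoner API:

```python
def entailed_superclasses(asserted, cls):
    """Return every class that `cls` is entailed to be a subclass of."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in asserted:
            # Follow each asserted is-a link once.
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


# The two asserted axioms from the slide.
ASSERTED = [
    ("GNRATetraloop", "Tetraloop"),
    ("Tetraloop", "RNAStructure"),
]

supers = entailed_superclasses(ASSERTED, "GNRATetraloop")
```

Here `supers` contains RNAStructure even though that relationship was never stated directly.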
7. The reasoning square
            Classifying                     Validation
Ontology    Inference of unstated           Finding inconsistent
            relationships in the ontology   axioms in the ontology
Data        Inference of unstated           Determining if a
            facts in data                   dataset is valid
8. The reasoning square
[Figure: the reasoning square (columns Classifying and Validation; rows Ontology and Data) illustrated with RNA examples. Labels: Tetraloop, GNRA Tetraloop, Purine and Pyrimidine (disjoint), T therm 23SRNA region]
9. Ontology Languages
First Order Logic (Common Logic ISO standard)
Highly Expressive
Undecidable: no general decision procedure
OWL and Description Logics
Restricted subset of FOL with highly convenient
constructs for describing classes
Reasoners are heavily tested on existing ontologies
OBO
Initially an ad-hoc format for the Gene Ontology
Now an alternate syntax for Common Logic
Reasoners based on rule application
10. Common Logic
Common Logic is an ISO specification for First Order Logic
(FOL)
Syntaxes
CLIF - Lisp-like (derived from KIF)
XCL - XML
CG - Conceptual Graphs
A CL text consists of CL sentences (axioms)
Sentences can be atomic, boolean or logically
quantified
Atomic sentence: a predicate followed by zero or
more arguments
Boolean sentence: and, or, if ( → ), iff ( ↔ )
Quantified sentence: forall (∀), exists (∃)
12. Reasoning with FOL
Undecidable.
FOL Theorem provers are not guaranteed to terminate
The Horn logic subset has desirable computational
properties
Head ← Body
Logic Programming
SWRL
Datalog
Relational Model, Relational Algebra
non-monotonic and probabilistic extensions
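Rule-based reasoning over Horn clauses (Head ← Body) can be pictured as forward chaining to a fixpoint. A minimal sketch, assuming ground atoms written as plain strings and a hypothetical instance `m1`; real Datalog engines also handle variables, which this sketch leaves out:

```python
def forward_chain(facts, rules):
    """Naive forward chaining: apply Horn rules until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Fire the rule when every body atom is already known.
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


# Each rule is (head, body); m1 is a hypothetical motif instance.
RULES = [
    ("Tetraloop(m1)", ["GNRATetraloop(m1)"]),
    ("RNAStructure(m1)", ["Tetraloop(m1)"]),
]

derived = forward_chain({"GNRATetraloop(m1)"}, RULES)
```

Because the set of atoms is finite and facts are only ever added, the loop always terminates, which is the kind of computational property the Horn subset offers.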
13. OWL-DL
OWL belongs to a family of logics known as Description
Logics, circumscribed subsets of FOL that are guaranteed
to be decidable
Variety of notations (syntaxes):
RDF-XML - Default, but it’s a mess
OWL-XML - Easier to manipulate computationally
Manchester Syntax - Easy on the eye
Constructs
Property (relation) unary predicates: Functional,
Transitive, Symmetric, ...
Class Axioms: SubClass, EquivalentClass,
DisjointWith, ...
Descriptions
OWL2 has lots of tools and reasoners to choose from
14. Descriptions in OWL
A Description is a (possibly recursive) tree structure that
formally identifies membership criteria for a class.
Can be combined using logical connectives: AND, OR,
NOT
AND : intersectionOf
OR : unionOf
NOT : complementOf
Restrictions
Restrict class membership based on some property
ONLY : example (paired_with_CWW ONLY Guanine)
SOME :
Quantified cardinality restrictions
Example: CWWAGBasePair = hasPart only (A and pair-
17. OWL Reasoners
Decision procedure based on tableau calculus
Refutation-based, repeated applications of De Morgan’s
laws
Widely used and tested on ontologies
Many reasoners can now classify the larger biological
ontologies in acceptable time
Less widely used on data
RDF triplestores are commonly used but these lack key
OWL constructs.
OWLGRES is a promising technology here.
18. No Unique Name Assumption
Classes and instances are potentially equivalent unless
declared otherwise. Given the ontology axiom:
Functional(fivePrimeTo)
And the instance axioms:
A(b1)
A(b2)
A(b3)
b1 fivePrimeTo b2
b1 fivePrimeTo b3
A reasoner will not say this is inconsistent. It will infer that
b2=b3. To get a reasoner to detect the inconsistency we
must explicitly declare all base instances to be distinct:
b1 differentFrom b2
b1 differentFrom b3
b2 differentFrom b3
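The behaviour above can be imitated in a few lines. This is only a sketch of the idea, not how a tableau reasoner works internally: it assumes a single functional property given as (subject, object) pairs, merges objects that share a subject, and reports a clash only when merged individuals were declared different. The name `check_functional` is hypothetical.

```python
def check_functional(assertions, different_from=()):
    """Merge objects of a functional property and list differentFrom clashes."""
    parent = {}

    def find(x):
        # Representative of the equality class that x belongs to.
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    changed = True
    while changed:
        changed = False
        target = {}  # subject representative -> object representative
        for subject, obj in assertions:
            s, o = find(subject), find(obj)
            if s in target and find(target[s]) != o:
                parent[o] = find(target[s])  # functional: objects are equal
                changed = True
            else:
                target.setdefault(s, o)

    inferred_same = sorted((x, find(x)) for x in list(parent) if find(x) != x)
    clashes = [(a, b) for a, b in different_from if find(a) == find(b)]
    return inferred_same, clashes


FIVE_PRIME_TO = [("b1", "b2"), ("b1", "b3")]
DISTINCT = [("b1", "b2"), ("b1", "b3"), ("b2", "b3")]

same, no_clash = check_functional(FIVE_PRIME_TO)
_, clash = check_functional(FIVE_PRIME_TO, DISTINCT)
```

Without the differentFrom declarations the sketch simply concludes that b3 and b2 are the same individual; with them it reports the clash between b2 and b3.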
19. The Open World Assumption
Unstated facts are not assumed to be false. Given ontology
axioms
A SubClassOf Base
UnpairedBase equivalentTo some
(Base that pairedWith 0 Base)
And the instance axioms:
A(b1)
A(b2)
A(b3)
b1 fivePrimeTo b2
b2 fivePrimeTo b3
A reasoner will not infer b1, b2 or b3 to be UnpairedBases.
We need to explicitly declare this:
UnpairedBase(b1)
UnpairedBase(b2)
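The contrast with a closed-world system (such as a database query) can be sketched as a three-valued answer. This is an illustration only, assuming pairing facts are stored as a set of (base, base) tuples; the function `is_unpaired` and its parameters are hypothetical, and a base that is both paired and declared unpaired is not flagged as inconsistent here:

```python
def is_unpaired(base, paired_with, declared_unpaired, closed_world=False):
    """Answer True, False, or None (unknown) for 'base is an UnpairedBase'."""
    if any(base in pair for pair in paired_with):
        return False  # a stated pairing rules it out
    if base in declared_unpaired:
        return True  # explicitly asserted
    # Nothing stated either way: a closed world answers "unpaired",
    # an open world answers "unknown".
    return True if closed_world else None


PAIRED_WITH = set()  # no pairing facts were stated for b1, b2, b3

open_answer = is_unpaired("b1", PAIRED_WITH, set())
closed_answer = is_unpaired("b1", PAIRED_WITH, set(), closed_world=True)
declared_answer = is_unpaired("b1", PAIRED_WITH, {"b1"})
```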
20. OBO
Initially an ad-hoc format for the Gene Ontology
Graph-centric
Terminological features
Formal Semantics
Initially lacked formal semantics. Formal definition
written in natural language in Relations Ontology.
Translation to OWL-DL (Horrocks et al)
With OBO 1.3, every OBO document is a
Common Logic Text
OBO-Core consists only of atomic sentences
OBO-CL allows arbitrary logical formulae
OBO-H: OBO-Core plus Horn rules
21. Reasoning over OBO ontologies
Strategies
Convert to OWL and use an OWL reasoner
Convert to CL and use a FOL theorem prover
Use a rule-based reasoner
Java implementation: OBO-Edit
Prolog implementation: Easy to extend
SQL implementation: slow but scales over massive
ontologies and datasets
Limitations: limited support for negation
22. Are Description Logics enough?
Some things that cannot be done in OWL-2:
Define relations using arithmetic:
Define relations using intersection, union and negation
Declare relations with > 2 arguments
Makes reasoning about change harder
Model cyclic structures
Any structure with a cyclic path through some
combination of relations (Carbon rings, RNA molecules)
23. Arithmetic in relations
We cannot express this in OWL:
upstreamOf (x, y) ← end(x) < start(y)
In OWL we must:
explicitly name all the bases, and declare a 5’ to 3’ connection
relation between them
declare < as the transitive version of the 5’ to 3’ relation
This is feasible with RNA, but not DNA
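Outside OWL, the arithmetic rule is a one-line comparison on coordinates. A minimal sketch, assuming each feature carries integer start and end positions on a shared coordinate system; the `Feature` type and the example coordinates are made up for illustration:

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int
    end: int


def upstream_of(x: Feature, y: Feature) -> bool:
    """upstreamOf(x, y) <- end(x) < start(y)"""
    return x.end < y.start


stem = Feature("stem", 1, 5)
loop = Feature("loop", 6, 9)

stem_before_loop = upstream_of(stem, loop)
loop_before_stem = upstream_of(loop, stem)
```

No individual bases need to be named, which is why this form stays practical for long sequences.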
24. Relation Boolean Constructs
We cannot express this in OWL:
overlaps = ends_after_startOf ∩ starts_before_endOf
disconnected = ¬overlaps
This severely limits OWL when applied to instance data
involving intervals
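Treating each relation as a set of pairs makes the intersection and negation explicit. A sketch under the assumption that intervals are (start, end) tuples; the three named intervals are invented for the example:

```python
from itertools import product

# Hypothetical intervals as (start, end).
INTERVALS = {"x": (1, 5), "y": (4, 9), "z": (10, 12)}

pairs = set(product(INTERVALS, repeat=2))

ends_after_start_of = {
    (a, b) for a, b in pairs if INTERVALS[a][1] > INTERVALS[b][0]
}
starts_before_end_of = {
    (a, b) for a, b in pairs if INTERVALS[a][0] < INTERVALS[b][1]
}

# Relation intersection and relation complement, as on the slide.
overlaps = ends_after_start_of & starts_before_end_of
disconnected = pairs - overlaps
```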
25. N-ary relations and time
In OWL, all relations must be binary. N-ary relations are
useful for reasoning about change.
As the RNA molecule folds, unpaired bases become
paired:
¬paired_with_CWW(b1, b5, t0)
paired_with_CWW(b1, b5, t1)
instance_of(b1, UnpairedBase, t0)
instance_of(b1, PairedBase, t1)
There are a variety of (awkward) techniques for
translating N-ary relations to binary
¬paired_with_CWW(b1@t0, b5@t0)
paired_with_CWW(b1@t1, b5@t1)
26. Cyclic descriptions
OWL Descriptions are tree-like. Cyclic descriptions are
required for RNA Structures. Proposed def of GNRA
Tetraloop:
GNRA TetraloopMotif =
hasPart some
( Nucleobase and
fivePrimeTo some
(G and fivePrimeTo some
(Nucleobase and fivePrimeTo some
(Purine and fivePrimeTo some
(A and fivePrimeTo some
(Nucleobase and pairsWithCWW som
and pairsWithTHS some G)))
and pairsWithTSH some A)
and pairsWithCWW some Nucleobase)
27. Tree-like classification structure
[Figure: the GNRA TetraloopMotif description shown as a tree of N, G, R and A nodes, with the example sequences C G A A A G and G A A A G C]
30. Labeled sub-descriptions
We would like to do something like this, if it were possible in
OWL:
GNRATetraloopMotif =
hasPart some
(Nucleobase[1] and fivePrimeTo some
(G[2] and fivePrimeTo some
(Nucleobase[3] and fivePrimeTo some
(Nucleobase[4] and fivePrimeTo some
(A[5] and fivePrimeTo some
(Nucleobase[6] and pairsWithCW
and pairsWithTHS some G[2])))
and pairsWithTSH some A[5])
and pairsWithCWW some Nucleobase[6])
31. Rules
SWRL (Semantic Web Rule Language) extends OWL with
rules. We can add this to the ontology:
nucleotide(?b0),
g(?b1),
nucleotide(?b2),
purine(?b3),
a(?b4),
nucleotide(?b5),
followedBy(?b0, ?b1),
followedBy(?b1, ?b2),
followedBy(?b2, ?b3),
followedBy(?b3, ?b4),
followedBy(?b4, ?b5),
pairedWithTHS(?b4, ?b1),
pairedWithCWW(?b5, ?b0)
--> partOfGNRATetraloop(?b0)
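To make the rule body concrete, here is a sketch that checks it against instance data in plain Python. It assumes followedBy is stored as a dict (so each base has at most one successor) and that class memberships, including purine, are asserted directly on each base; the function name and the six-base example are hypothetical.

```python
def gnra_tetraloop_starts(types, followed_by, paired_ths, paired_cww):
    """Return every ?b0 for which the rule body holds."""
    wanted = ["nucleotide", "g", "nucleotide", "purine", "a", "nucleotide"]
    matches = []
    for b0 in types:
        # Bind ?b0..?b5 by walking followedBy five times.
        chain = [b0]
        while len(chain) < 6 and chain[-1] in followed_by:
            chain.append(followed_by[chain[-1]])
        if len(chain) < 6:
            continue
        typed = all(cls in types.get(b, ()) for b, cls in zip(chain, wanted))
        if (typed
                and (chain[4], chain[1]) in paired_ths
                and (chain[5], chain[0]) in paired_cww):
            matches.append(b0)
    return matches


# Hypothetical data for the sequence C G A A A G.
TYPES = {
    "n1": {"nucleotide", "c"},
    "n2": {"nucleotide", "g", "purine"},
    "n3": {"nucleotide", "a", "purine"},
    "n4": {"nucleotide", "a", "purine"},
    "n5": {"nucleotide", "a", "purine"},
    "n6": {"nucleotide", "g", "purine"},
}
FOLLOWED_BY = {"n1": "n2", "n2": "n3", "n3": "n4", "n4": "n5", "n5": "n6"}

matches = gnra_tetraloop_starts(
    TYPES, FOLLOWED_BY, {("n5", "n2")}, {("n6", "n1")}
)
```

Like the rule itself, this only labels an existing base; it does not create a tetraloop motif instance.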
32. Is SWRL the answer?
Bonus: Can be extended with arithmetic operators (to
define upstreamOf)
Negative: only binary relations
Negative: only instance classification
We cannot use the previous definition for ontology
classification
Negative: we cannot infer the existence of undeclared
entities
We can tell a base is part of a tetraloop motif, but we
can’t infer the tetraloop motif instance
33. Description Graphs
An extension of OWL to allow representation of cyclic
structures.
Possibly part of OWL3?
Implemented in HermiT reasoner
Largely new and untested
34. OBO Graphs
Cyclic structures can be described in OBO; the graph is
translated to simple rules. These rules can be executed
using logic programming (LP) or even SQL.
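As a rough picture of the SQL route, a cyclic pattern becomes a chain of self-joins whose last join closes the loop. The sketch below uses Python's built-in sqlite3 module; the `edge` table layout, the query, and the six-base graph are assumptions made for illustration and are not the actual OBO translation.

```python
import sqlite3

# Hypothetical instance graph: six bases in 5'-to-3' order, closed by a pair.
EDGES = [
    ("followedBy", "n1", "n2"),
    ("followedBy", "n2", "n3"),
    ("followedBy", "n3", "n4"),
    ("followedBy", "n4", "n5"),
    ("followedBy", "n5", "n6"),
    ("pairedWithCWW", "n6", "n1"),
]

# One self-join per edge of the pattern; the last join closes the cycle.
QUERY = """
SELECT f1.subj
FROM edge AS f1
JOIN edge AS f2 ON f2.rel = 'followedBy' AND f2.subj = f1.obj
JOIN edge AS f3 ON f3.rel = 'followedBy' AND f3.subj = f2.obj
JOIN edge AS f4 ON f4.rel = 'followedBy' AND f4.subj = f3.obj
JOIN edge AS f5 ON f5.rel = 'followedBy' AND f5.subj = f4.obj
JOIN edge AS p  ON p.rel = 'pairedWithCWW'
               AND p.subj = f5.obj AND p.obj = f1.subj
WHERE f1.rel = 'followedBy'
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (rel TEXT, subj TEXT, obj TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", EDGES)
loop_starts = conn.execute(QUERY).fetchall()
conn.close()
```

The query returns the base that opens each closed six-base loop, here only n1.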
36. Conclusions
There is no one single ideal subset of FOL for reasoning
All subsets have limitations.
DLs cannot express a lot of what we need for
primary and secondary sequence structures
The RNA Ontology should employ as expressive a logic
as it needs
An incorrect formally specified definition is worse
than a correct informally specified definition
Hybrid reasoning approaches are feasible
The basic instance classification problem is just not
that hard (compared to RNA bioinformatics as a
whole)
Special purpose algorithms will probably beat
general purpose reasoners
But first the RNAO must exist
Perhaps it’s too early to worry too much about
reasoning
Priority: simple term lists, basic is a hierarchy, with
definitions written for humans, plus motif definitions
in some compact notation