ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification
1. Biomedical Ontologies for data
integration and verification
Michel Dumontier and Robert Hoehndorf
Carleton University, University of Cambridge
ISMB tutorial @ Vienna. July 16, 2011
ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 1
2. Outline
1. General background (10min)
o an introduction to the use-case: systems biology, SBML and BioModels
2. Ontological analysis (45 min)
o how to express domain content as formal knowledge using the Web Ontology Language
(OWL)
3. Application of formal ontology to consistency and data verification (30min)
o how to use the OWL formalization to verify the accuracy of annotations, data and constraints
in a domain
4. Break (30min)
5. Mapping, repair and disambiguation using ontologies (30min)
o how to relax and disambiguate constraints on ontologies to obtain consistent representation of
domain content
6. Knowledge discovery, retrieval and querying (15min)
o how to answer questions that require the inference of knowledge through automated
reasoning
7. Efficient implementation in software systems (15min)
o how to convert ontologies into efficient formal representations amenable to high-throughput
analyses
8. Applications in Bioinformatics (25min)
o how the formalized ontologies can be used to perform bioinformatics analyses
– Discussion and questions (15min)
3. Systems Biology
We create and simulate biological models to:
• gain insight into the structure and function
of biochemical networks
• reveal metabolic and signalling
capabilities so as to predict phenotypes
• undertake metabolic engineering to
maximize some desired product
To do this, we need
• to integrate & manage our data &
knowledge in a coherent, scalable and
machine understandable manner
• efficient software to execute
computationally demanding simulations
4. Bio-ontologies
• Provide rich human and machine understandable descriptions of
the terms they purport to describe
• Have value for semantic annotation of data, which allows
integration across domains (granularity, species, experimental
methods)
• Facilitate granular and cross-domain queries
• Can be used to obtain explanations for inferences drawn
• Can be efficiently processed by algorithms and software
5. Biomodels are semantically annotated
SBML models
• EBI managed resource
• 600+ models available as
SBML
• 300+ models are curated
with GO process, function
and component terms,
and have links to protein
databases.
• Possible to browse by
GO terms:
http://www.ebi.ac.uk/biomodels-main/
6. Objective:
Computational Knowledge Discovery
• Terminological resources are increasingly being used to
annotate SBML-based biomolecular models
o Makes it easier to explore or find models
• By converting models into formal representations of
knowledge we get to:
o validate the accuracy of the annotations
o infer knowledge explicit in terminological resources
o discover biological implications inherent in the models.
7. SBML
XML-based representation of biochemical models, their
components (compartments, species, reactions, events),
descriptors (rules, constraints, functions, units)
Consider the following enzymatic reaction:
11. SBML models may feature several
components
12. SBML specifies the number and kind of
attributes models and components can have
13. It’s up to the modeler to use those
attributes in a meaningful way
what models have you produced?
15. Energy (ATP) is produced from glycolysis (break
down of glucose) in a series of enzyme-catalyzed
biochemical reactions.
Fermentation regenerates NAD+ so it can be re-
used to metabolize more glucose
Analysis and optimization of metabolic pathways
is important for biotechnology
16. Gene Ontology
• over 30,000 terms
• covers
o biological processes
o molecular functions
o cellular components
• terms organized around "is
a" hierarchy
• terms further described with
'has part'/'part of'; 'regulates'
and '+ regulates', '- regulates'
17. Chemical Entities of Biological Interest
(ChEBI)
recently refactored to be in line with formal
(reasoning capable) ontology
scope includes chemical entities (atoms,
substances, groups, molecules), roles and
subatomic particles
large numbers of curated molecules
18. SBML annotations are captured using the
Resource Description Framework (RDF)
<species metaid="_525530" id="GLCi" compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
             xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
Callouts:
• the species element and its XML attributes: implicit subject
• the annotation element stores the RDF
• rdf:Description (rdf:about): subject
• bqbiol:is: predicate
• rdf:li (rdf:resource): object
The intent is to express that the species represents a substance composed of glucose
molecules
We also know from the SBML model that this substance is located in the cytosol and has
an initial concentration of 0.09765 M
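The subject, predicate and object roles described on this slide can be read out of the XML mechanically. The following is a minimal sketch using only the Python standard library; the function name annotation_triples is illustrative, and the namespace declarations are moved onto the species element (and the SBML namespace is omitted) purely to keep the example short.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The species element from the slide, trimmed to the namespaces it actually uses.
SBML_SPECIES = """
<species xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
         metaid="_525530" id="GLCi" compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def annotation_triples(species_xml):
    """Return (subject, predicate, object) tuples found in the RDF annotation."""
    root = ET.fromstring(species_xml)
    rdf = "{%s}" % RDF_NS
    triples = []
    for description in root.iter(rdf + "Description"):
        subject = description.get(rdf + "about")
        for qualifier in description:  # e.g. the bqbiol:is element
            # ElementTree reports tags as '{namespace}local'; join them into one IRI.
            predicate = qualifier.tag.replace("{", "").replace("}", "")
            for item in qualifier.iter(rdf + "li"):
                obj = unquote(item.get(rdf + "resource"))  # decode %3A to ':'
                triples.append((subject, predicate, obj))
    return triples


TRIPLES = annotation_triples(SBML_SPECIES)
```

Each rdf:li becomes its own triple here, which follows the intent stated on the slide rather than the literal RDF semantics of rdf:Bag.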
20. It looks like another XML syntax, but it has RDF semantics!
What is the meaning of SBML’s RDF annotation?
<rdf:Description rdf:about="#_551383">
  <bqmodel:is>
    <rdf:Bag>
      <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
    </rdf:Bag>
  </bqmodel:is>
</rdf:Description>
• The intent is to indicate that the model is a model of a yeast
• RDF semantics: #_551383 is a member of a set that is related by
bqmodel:is to a collection (rdf:Bag) that has a single member –
yeast (4932)
• RDF semantics does not match the intent!
21. Can we formalize and automatically verify the intended
meaning of the RDF annotation?
BioModels.net biology qualifiers
is, identity
The biological entity represented by the model element
has identity with the subject of the referenced resource
(modeling object B). This relation might be used to link
a reaction to its exact counterpart in a database, for
instance.
22. Biomodels: Qualifiers
Qualifiers for the biological object represented by the model
component.
encodes/isEncodedBy
hasPart/isPartOf
hasProperty/isPropertyOf
hasVersion/isVersionOf
is
isDescribedBy
isHomologTo
occursIn
http://www.ebi.ac.uk/miriam/main/qualifiers/
23. In this tutorial
You will learn how to create accurate knowledge
representations of annotated SBML models.
Features
• ontological commitment: terms in a vocabulary
correspond to formally defined classes and relations and
expressions formulated using the Web Ontology Language
(OWL) have an unambiguous interpretation
• upper level ontology of types and relations to distinguish
and constrain model entities to the spatio-temporal entities
they represent
• Reasoning to uncover inconsistencies, and how to repair
them.
• Advanced applications of OWL ontologies for answering
questions and providing biological insight
24. What is a model?
How does it differ from the thing it is
a model of?
25. Conceptualization (SBML)
• 2 kinds of entities:
o in silico: model components
o in vivo: the entities represented by a model
26. Conceptualization
27. SBML Conceptualization
• Instances of SBML model entities are syntactic entities
(in XML)
• SBML models represent biological phenomena and
structures (e.g., Cell cycle processes, Yeast cells, ...)
• Here we focus on Model, Compartment, Species,
Reaction
28. Formalization
• Formalization is the process by which we map a
conceptualization into a logical representation, which has a
particular interpretation.
• We first express the basic nature of what the terms refer to
by defining them in a formal language. Next, we can
logically combine the terms to form expressions, which
have an unambiguous interpretation, and hence can be
automatically reasoned about.
30. The Semantic Web
It is about standards for publishing, sharing and querying
knowledge drawn from diverse sources
It enables the answering of
sophisticated questions
31. The Semantic Web effort aims to develop an interoperable set
of standards for knowledge representation and reasoning
32. URI/IRI
• Uniform Resource Identifiers (URI) and
Internationalized Resource Identifiers (IRI) are
identifiers for resources, given a particular protocol
• We’re familiar with Uniform Resource Locators (URLs), which
specify the use of a protocol such as HTTP to obtain a
document with that identifier.
– http://dumontierlab.com
• Internationalized Resource Identifiers (IRIs) include an
expanded set of international characters
• URI/IRIs are the basis for naming resources on the
Semantic Web.
– As names, they can also be used to identify non-information
resources, like people and places
33. Entity naming
• Uniform Resource Identifiers (URI) are identifiers for resources
given a particular protocol. Internationalized Resource Identifiers
(IRI) include an expanded set of international characters
• URI/IRIs can be used to name entities, both for digital media and
non-informational entities like people and places.
• Uniform Resource Name (URN) – only a name
o MIRIAM - Minimal Information Required In the Annotation of Models
data source and identifier combined in a single IRI -
urn:miriam:source:identifier
e.g. urn:miriam:uniprot:P62158
~ 40 sources defined at EBI registry...
• Uniform Resource Locator (URL) – a resolvable name
o Bio2RDF - Makes life sciences data available on the Semantic Web
o http://bio2rdf.org/uniprot:P62158
o content-type negotiation and explicit URLs resolve to an HTML/RDF/etc description
of it.
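The two naming schemes on this slide differ mainly in whether the name can be resolved. As a rough sketch, a MIRIAM URN can be split into its source and identifier and rewritten in the Bio2RDF URL style shown above. The function names are hypothetical, and the sketch assumes that the Bio2RDF namespace uses the same source label as the MIRIAM URN, which holds for the uniprot example on the slide but is not guaranteed for every source.

```python
from urllib.parse import unquote

MIRIAM_PREFIX = "urn:miriam:"
BIO2RDF_BASE = "http://bio2rdf.org/"


def split_miriam_urn(urn):
    """Split 'urn:miriam:source:identifier' into (source, identifier)."""
    if not urn.startswith(MIRIAM_PREFIX):
        raise ValueError("not a MIRIAM URN: %r" % urn)
    source, _, identifier = urn[len(MIRIAM_PREFIX):].partition(":")
    if not source or not identifier:
        raise ValueError("MIRIAM URN lacks a source or identifier: %r" % urn)
    # Identifiers may be percent-encoded, e.g. CHEBI%3A4167 for CHEBI:4167.
    return source, unquote(identifier)


def to_resolvable_url(urn):
    """Rewrite a MIRIAM URN as a Bio2RDF-style URL (assumes matching source names)."""
    source, identifier = split_miriam_urn(urn)
    return "%s%s:%s" % (BIO2RDF_BASE, source, identifier)


CALMODULIN_URL = to_resolvable_url("urn:miriam:uniprot:P62158")
```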
34. Semantic Technologies: RDF vs OWL
RDF: simple triples, graph-based queries, supports
very large amounts of data
OWL: significantly more expressive language,
strong axioms, inference capabilities, consistency
verification, but can be rather slow
35. Resource Description Framework (RDF)
Allows one to talk about anything
Uniform Resource Identifier (URI) can be used as entity
names
Bio2RDF specifies its naming convention
http://bio2rdf.org/uniprot:P05067 (uniprot:P05067)
is a name for Amyloid precursor protein
http://bio2rdf.org/omim:104300 (omim:104300)
is a name for Alzheimer disease
36. Resource Description Framework (RDF)
Allows one to express statements
An RDF statement consists of:
– Subject: resource identified by a URI (e.g. uniprot:P05067)
– Predicate: resource identified by a URI (e.g. rdf:type, rdfs:label)
– Object: resource or literal (e.g. uniprot:Protein, “Amyloid precursor protein”)
37. RDF has multiple serializations
RDF/XML
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://bio2rdf.org/uniprot:Q16665">
    <rdf:type rdf:resource="http://bio2rdf.org/uniprot:Protein"/>
  </rdf:Description>
</rdf:RDF>
RDF/N3
@prefix u: <http://bio2rdf.org/uniprot:> .
u:Q16665 a u:Protein .
38. Multi-Source Data Integration
Syntactic data integration depends on consistent naming
[Figure: triples from three sources combine into a unified view because they share the name uniprot:P05067]
• UniProt: uniprot:P05067 is a uniprot:Protein (and has a name)
• Gene Ontology: uniprot:P05067 located in go:Membrane
• iRefIndex: uniprot:P05067 interacts with uniprot:P05067
• Unified view: all of the above statements attached to uniprot:P05067
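Because the three sources use the same name for the protein, combining them is just a set union over triples. A minimal sketch, with triples written as plain Python tuples and the relation labels copied from the figure (the helper names merge and describe are illustrative):

```python
# Each source contributes its own statements about the same identifier.
UNIPROT = {("uniprot:P05067", "is a", "uniprot:Protein")}
GENE_ONTOLOGY = {("uniprot:P05067", "located in", "go:Membrane")}
# The interaction partner is reproduced as it appears on the slide.
IREFINDEX = {("uniprot:P05067", "interacts with", "uniprot:P05067")}


def merge(*sources):
    """Union several triple sets; identical names make the graphs line up."""
    unified = set()
    for source in sources:
        unified |= source
    return unified


def describe(graph, subject):
    """List the (predicate, object) pairs known for one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


UNIFIED = merge(UNIPROT, GENE_ONTOLOGY, IREFINDEX)
```

If one source had used a different name for the same protein, the union would still succeed but the statements would not meet at a common node, which is why the slide stresses consistent naming.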
39. Building statements creates knowledge
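[Figure: a small graph built from five statements]
• uniprot:P05067 label "Amyloid precursor protein"
• omim:104300 label "Alzheimer Disease"
• uniprot:P05067 is involved in omim:104300
• uniprot:P05067 is a Protein
• omim:104300 is a Disease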
40. Bio2RDF’s
RDFized data
fits together
syntactic integration
41. SGD as RDF-based Linked Open Data
42. Bio2RDF links and provisions 40 high value
datasets
43. Bio2RDF now serving over
40 billion triples of linked biological data
44. SGD is provided by Bio2RDF and forms part of the
growing linked open data cloud
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
45. Semantic Integration
• Requires a level of abstraction/generalization
where the relationship between each resource
is formalized
– classes
– relations
– individuals
• How do we ensure that our representation
facilitates integration across datasets?
• How can we get our formalization to
interoperate with ontologies?
46. RDF-based Linked Data
• Provides the basis for simple data syndication and
syntactic data integration
o IRIs
o Statements (aka triples) take the form of
o <subject> <predicate> <object>
• Easy to implement
o stand-alone datasets
o logical layer over databases
• Limited reasoning
o class and property hierarchies
o domain/range restrictions
o can’t automatically discover inconsistency
• Standardized Queries - SPARQL
• Scalable - to billions of triples
48. The Web Ontology Language (OWL)
Has Explicit Semantics
Can therefore be used to capture knowledge in
a machine understandable way
49. OWL - The Web Ontology Language
• Enhanced vocabulary (strong axioms) to express
knowledge relating to classes, properties, individuals and
data values
o quantifiers (existential, universal, cardinality restriction)
o negation
o disjunction
o property characteristics
o complex classes in domain and range restrictions
o property chains
• Advanced reasoning
50. Advanced Reasoning
• Consistency: determines whether the ontology contains
contradictions.
• Satisfiability: determines whether classes can have
instances.
• Subsumption: is class C1 implicitly a subclass of C2?
• Classification: repetitive application of subsumption to
discover implicit subclass links between named classes
• Realization: find the most specific class that an individual
belongs to.
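Subsumption and classification can be illustrated on a hierarchy that contains only asserted subClassOf links between named classes. The sketch below computes the transitive closure of those links; it is a toy illustration and not a description logic reasoner, since it ignores class expressions entirely. The small hierarchy is an assumed example built from classes mentioned in this tutorial.

```python
# Asserted direct superclasses (an assumed example hierarchy).
ASSERTED = {
    "transcription factor": {"protein"},
    "protein": {"molecule"},
    "molecule": {"material entity"},
}


def superclasses(asserted, cls):
    """All asserted and implied superclasses of cls (transitive closure)."""
    seen = set()
    stack = list(asserted.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(asserted.get(parent, ()))
    return seen


def is_subsumed_by(asserted, c1, c2):
    """Subsumption: is c1 (implicitly) a subclass of c2?"""
    return c1 == c2 or c2 in superclasses(asserted, c1)


def classify(asserted):
    """Classification: apply subsumption to every named class."""
    names = set(asserted) | {p for parents in asserted.values() for p in parents}
    return {name: superclasses(asserted, name) for name in names}


HIERARCHY = classify(ASSERTED)
```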
51. OWL Challenges and Solutions
Inconsistency:
• needs to be resolved to ask any questions involving the
ontology
• Solution: explicitly accommodate multiple meanings,
remove contradictory axioms
Unsatisfiability (of a class):
• may indicate a modelling error
• needs to be resolved to ask meaningful questions about
the class
• Solution: explicitly accommodate multiple meanings,
redefine class, remove contradicting class restrictions
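One common source of unsatisfiability is a class placed under two classes that are declared disjoint. The following sketch detects that specific case for named classes only; a real OWL reasoner covers far more. The class names and the disjointness axiom are assumptions chosen for illustration.

```python
from itertools import combinations

# Assumed example: a species annotated with both a chemical and a process term.
SUBCLASS_OF = {
    "glucose": {"molecule"},
    "molecule": {"material entity"},
    "glycolysis": {"process"},
    "mis-annotated species": {"glucose", "glycolysis"},
}
DISJOINT_PAIRS = [("material entity", "process")]


def ancestors(subclass_of, cls):
    """cls together with every class reachable through subClassOf links."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Map each unsatisfiable class to one pair of disjoint ancestors explaining it."""
    disjoint = {frozenset(pair) for pair in disjoint_pairs}
    names = set(subclass_of) | {p for ps in subclass_of.values() for p in ps}
    explanations = {}
    for cls in names:
        for a, b in combinations(sorted(ancestors(subclass_of, cls)), 2):
            if frozenset((a, b)) in disjoint:
                explanations[cls] = (a, b)
                break
    return explanations


UNSATISFIABLE = unsatisfiable_classes(SUBCLASS_OF, DISJOINT_PAIRS)
```

The returned pair doubles as a simple explanation of which restriction to revisit when repairing the class.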
52. OWL Challenges and Solutions
Scalability:
• answering OWL queries requires reasoning
• inference in OWL is highly complex (worst case:
2NEXPTIME)
• highly optimized reasoners are getting better and better,
but can still be slow with large ontologies
• tractable OWL profiles (EL, QL, RL) enable more efficient
and guaranteed polynomial-time inferences
• use ontology modularization approaches to increase
performance
53. OWL can help you create rich, machine-
understandable descriptions!
• transform our expert knowledge into axioms and
expressions that can be automatically reasoned about
o a transcription factor is
a protein
that binds to DNA
and regulates the expression of a gene.
o can we mine 'omic datasets to discover which
proteins are transcription factors?
• create rich expressions from combinations of classes,
relations and individuals
• assert statements of truth using axioms.
54. Linked data and OWL: Motivation
• use OWL reasoning to identify mistakes in RDF data
o incorrect content of assertions
o incorrect use of relations
o conflicting conceptualizations
o incorrect same-as assertions
• verify, fix and exploit Linked Data through expressive OWL
reasoning
• generate/infer new triples to write back into RDF and use
for efficient retrieval
Proposal:
Represent SBML biomodels in OWL, drawing on the implicit
relations and explicit attributes in XML/RDF.
55. Elements of OWL 2.0
• The “ontology” of OWL 2 consists of:
• Classes
• Object properties
• Data properties
• Individuals
• Expressions
• Axioms
• Plus RDF stuff (like datatypes)
56. Axiomatization
• Axioms are statements that are assumed to be true in the
domain
• Axioms formally interrelate terms from conceptualization
step
every statement can be reduced to an expression based only on
primitive terms
Therefore: every axiom can be expressed using only primitive terms
57. Classes and class axioms
• a class is a set of individuals that share one or more characteristics
o a protein
• classes can be organized in a hierarchy using subClassOf axioms
o i.e. if C2 is a subclass of C1, every member of C2 is a member of C1
o subClassOf (protein molecule)
• special classes
o owl:Thing is the superclass of all things
o owl:Nothing is the subclass of all things, denotes an empty set
• classes can be made disjoint from one another
o i.e. there is no member of C1 that is also a member of C2
o disjointClasses (protein DNA)
• classes can be said to be equivalent
o i.e. all members of C1 are members of C2 and all members of C2
are members of C1
o EquivalentClasses (Peptide Polypeptide)
58. Object Properties and axioms
• an object property OP is a relation between two individuals
o 'has part' is an object property that denotes the mereological
relation between two individuals
• OPs can be organized in a hierarchy
o given OP1 and OP2 and OP2 is a subproperty of OP1 then if
an individual x is connected by OP2 to an individual y, then x is
also connected by OP1 to y.
o subPropertyOf ('has proper part' 'has part')
o owl:TopObjectProperty, owl:BottomObjectProperty
• We can restrict the domain and range to allowed values
• ObjectPropertyDomain ('is participant in', 'physical entity')
• ObjectPropertyRange ('is participant in', 'process')
• We can also assert object properties to be disjoint or equivalent
59. description of object properties
• Inverse
o we say that 'has part' is an inverse for 'is part of'
o we can also refer to this as inv('is part of')
• Symmetric
o applies to cases where the inverse relation is the very same relation
o e.g. the inverse for 'is related to' is 'is related to'
• Transitive
o a relation is transitive if, whenever an individual x is connected by it to an
individual y, and y is connected by it to an individual z, then x is also
connected by it to z
o e.g. 'has part' is transitive
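The inverse and transitive characteristics both license new statements from existing ones. A small sketch of that inference over tuples follows; the individuals (cell, nucleus, chromosome) are hypothetical, and the loop simply repeats until no new triple appears.

```python
def infer(triples, transitive=(), inverses=None):
    """Close a set of triples under the given inverse and transitive properties."""
    inverse_of = dict(inverses or {})
    # Inverse declarations work in both directions.
    inverse_of.update({v: k for k, v in list(inverse_of.items())})
    closed = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in closed:
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                # x p y and y p z entail x p z
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        changed = not new <= closed
        closed |= new
    return closed


ASSERTED = {
    ("cell", "has part", "nucleus"),
    ("nucleus", "has part", "chromosome"),
}
INFERRED = infer(
    ASSERTED,
    transitive={"has part"},
    inverses={"has part": "is part of"},
)
```

Here 'is part of' is never declared transitive, yet all of its implied statements still appear because each is the inverse of an inferred 'has part' statement.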
60. description of object properties
• Reflexive
o a reflexive relation connects every individual to itself
o e.g. 'has part' is reflexive because every protein has itself as a part.
• Functional
o each individual can be connected by the relation to at most one individual, so
any two values asserted for the same individual must denote the same individual.
o e.g. 'has unique identifier'
• Inverse Functional
o each individual can be the value of the relation for at most one individual, so
any two individuals sharing a value must be the same individual
o e.g. 'is unique identifier of'
61. Class Expressions
Class expressions are rich descriptions of classes through the
logical combination of ontological primitives (classes, object
properties, datatype properties, individuals)
Protein subClassOf
molecule and ‘has proper part’ min 2 ‘amino acid residues’
Combinations specified using logical operators
• conjunction (and), disjunction (or), negation (not)
Object or data property expressions provide a qualified cardinality
over the relation
o minimum: rel min # Y
o maximum: rel max # Y
o exact: rel exactly # Y (minimum + maximum)
o some: rel some Y (same as rel min 1 Y)
62. Class Expressions
o The quantifications can be qualified by the object type
o rel only Y – the only values allowed are of type Y
• To form complex class expressions like
o 'molecule' and not 'dna'
o 'has part' min 2 'amino acid'
o 'is located in' only ('nucleus' or 'cytoplasm')
• and be expressed as axioms in the ontology
Protein subClassOf
molecule and ‘has proper part’ min 2 ‘amino acid residues’
Transcription Factor equivalentTo
‘protein’
and ‘has disposition’ some ‘to bind to DNA’
and ‘has function’ some ‘to regulate gene expression’
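To make the qualified cardinality restriction concrete, the sketch below tests whether an individual meets the necessary condition stated for Protein. It is only a closed-world data check that treats differently named individuals as distinct; OWL itself is open-world and makes no unique-name assumption, so a reasoner would not draw the same conclusions from missing data. All individuals and helper names are hypothetical.

```python
# Hypothetical individuals and their asserted types.
TYPES = {
    "m1": {"molecule"},
    "m2": {"molecule"},
    "r1": {"amino acid residue"},
    "r2": {"amino acid residue"},
}
TRIPLES = {
    ("m1", "has proper part", "r1"),
    ("m1", "has proper part", "r2"),
    ("m2", "has proper part", "r1"),
}


def satisfies_min(individual, relation, n, filler_class, triples, types):
    """Closed-world check of: relation min n filler_class."""
    fillers = {
        o
        for s, p, o in triples
        if s == individual and p == relation and filler_class in types.get(o, ())
    }
    return len(fillers) >= n


def meets_protein_condition(individual, triples, types):
    """molecule and 'has proper part' min 2 'amino acid residue'."""
    return "molecule" in types.get(individual, ()) and satisfies_min(
        individual, "has proper part", 2, "amino acid residue", triples, types
    )
```

Because the Protein axiom uses subClassOf rather than equivalentTo, passing this check shows only that an individual is compatible with being a protein, not that it is one.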
63. What do the following mean, and
what biological thing might you
annotate with it?
C equivalentTo
‘has part’ exactly 2 polypeptide
M subClassOf
DNA and not molecule
64. OWL has multiple syntaxes
Functional-Style Syntax
ClassAssertion( :Person :Robert)
RDF Syntax
RDF/XML
<Person rdf:about="Robert"/>
RDF Turtle
:Robert rdf:type :Person .
Manchester Syntax
Individual: Robert
Types: Person
OWL/XML Syntax
<ClassAssertion> <Class IRI="Person"/> <NamedIndividual IRI="Robert"/> </ClassAssertion>
65. OWL Reasoners
OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• Fact++: Manchester University, open-source, C++ with a Java API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.
OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:
66. Formalization of XML/RDF using OWL
• For every triple, we want to create an axiom that
makes a commitment as to what the terms refer
to and what their combination necessarily
implies.
• We will also commit to expressing our knowledge
in a consistent manner, and this will allow other
information resources to be semantically
integrated (the expressions are comparable and
share the same semantics)
67. Triples to axioms
Convert RDF triples into OWL axioms.
Triple in RDF:
<nucleus> <part-of> <cell>
• Nucleus and Cell are classes
• part-of is a relation between 2 classes
• intended meaning:
every instance of Nucleus is partOf some instance of Cell
• formalize as OWL axiom:
Nucleus SubClassOf:
part-of some Cell
68. Triples to axioms: Many possible formalizations –
knowledge of logics and domain expertise comes in
handy here!
Convert RDF triples into OWL axioms.
Triple in RDF:
<C1> <R> <C2>
• C1 and C2 are classes, R a relation between 2 classes
• intended meaning:
o C1 SubClassOf: C2
o C1 SubClassOf: R some C2
o C1 SubClassOf: R only C2
o C2 SubClassOf: R some C1
o C1 SubClassOf: S some C2
o C1 DisjointFrom: C2
o C1 and C2 SubClassOf: owl:Nothing
o R some C1 DisjointFrom: R some C2
o C1 EquivalentTo: C2
o ...
• in general: P(C1, C2), where P is an OWL axiom (template)
Challenge: Formalizing data requires one to commit to a particular meaning, that is, to
make an ontological commitment
69. Triples to axioms
Triple in RDF:
<Cytosol> <isLocationOf> <HXK1>
• Cytosol and HXK1 are classes
• isLocationOf is an axiom pattern involving 2 classes
• intended meaning:
every instance of HXK1 is located in some instance of Cytosol
• not intended:
for every instance of Cytosol, there is an instance of HXK1 located
in it.
• formalize as either of two axioms (equivalent when hasLocation is the inverse of
isLocationOf):
HXK1 subClassOf
hasLocation some Cytosol
HXK1 subClassOf
inv(isLocationOf) some Cytosol
70. Triples to axioms
Challenges
Formalizing RDF triples in OWL may introduce new OWL
object properties.
• Which object properties should be included?
• What axioms hold for included object properties?
• Can domain and range restrictions be generalized across
multiple domains, i.e., reused across multiple linked data
sources to ensure consistency between them?
Integration of OWL ontologies requires a common
semantic platform
71. Axiom Patterns for Triples
<nucleus> <part-of> <cell>
?X part-of ?Y
•translated to axiom pattern
?X subClassOf: part-of some ?Y
-> Nucleus subClassOf: part-of some Cell
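An axiom pattern is a template with two variables that are filled by the subject and object of a triple. The sketch below applies such templates with plain string substitution and emits Manchester-style text; it does not use the OWL API or the obo2owl code, and the pattern table holds only the two relations discussed in this tutorial.

```python
import re

# One template per relation; ?X is the triple's subject and ?Y its object.
PATTERNS = {
    "part-of": "?X SubClassOf: part-of some ?Y",
    # The class that is located somewhere is the object of isLocationOf.
    "isLocationOf": "?Y SubClassOf: hasLocation some ?X",
}


def triple_to_axiom(subject, predicate, obj, patterns=PATTERNS):
    """Instantiate the axiom pattern registered for the triple's predicate."""
    if predicate not in patterns:
        raise KeyError("no axiom pattern for relation %r" % predicate)
    binding = {"?X": subject, "?Y": obj}
    # Substitute both variables in one pass so a filled-in name is never re-scanned.
    return re.sub(r"\?[XY]", lambda match: binding[match.group(0)], patterns[predicate])


NUCLEUS_AXIOM = triple_to_axiom("Nucleus", "part-of", "Cell")
HXK1_AXIOM = triple_to_axiom("Cytosol", "isLocationOf", "HXK1")
```

Keeping the templates in a table makes the ontological commitment of each relation explicit and easy to review in one place.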
72. Implementation
• expand relations in RDF based on relational patterns
• relational patterns are OWL axioms with 2 variables (which
are filled by subject and object, respectively)
• implementation based on OWL API
• adopt implementation of relational patterns in OBO
language (http://code.google.com/p/obo2owl/)
Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre,
Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL
and their application to OBO. OWL: Experiences and Directions (OWLED).
paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf
presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-
owl-and-their-application-to-obo
BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441
73. Another way?
http://oppl2.sourceforge.net/
• OPPL is an abstract formalism that allows for
manipulating ontologies written in OWL.
• Use OPPL to select triples and create the axioms
75. Which types and relations should we
use for our axiom patterns?
76. Top level ontologies contain generalized
(domain independent) classes and
relations
They can be used to constrain what can be said about these
entities (and hence will later be useful for checking the
consistency of data annotated using these terms).
77. Basic classes in top-level ontologies
• Material entity
• Example: Apple, Human, Cell, Planet
• Has mass as a quality
• Located in space and time
• Independent of other entities
• it exists in whole whenever it exists
• Quality
• Example: mass, color, concentration
• Dependent: always the quality of some entity
• Quality of object: size, shape, length
• Quality of process: duration, rate
• Quality of quality: shade (of color), intensity
78. Basic classes in top-level ontologies
• Function
• e.g. to bind, to catalyze (a reaction), to kill bacteria
• Dependent: always the function of some thing
• Similar to a property of an object
• Represents the potential to do something (an action) in
some process
• capabilities, dispositions and tendencies
• Process
• Example: running a marathon, binding, cell division
• Located in space and time
• Independent of other entities
• Temporally extended
79. Top-level ontologies can make a
commitment to these being disjoint
Material object, Process, Function and Quality are mutually
disjoint.
81. Relations in top-level ontologies
• domain and range restrictions from top-level
ontology can be applied for general relations,
e.g.:
• ‘has material part’ can be restricted with "Material
object" as both domain and range
• ‘participates in’ can be restricted with a domain of
"Material object" and a range of "Process"
• re-use of relations (between instances) enables
inferences across resources
82. Relations impose additional constraints,
such that inconsistencies arise when
incorrectly used
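The way domain and range restrictions expose misuse can be sketched as follows: a relation implies a type for its subject and object, and a clash arises when that implied type is disjoint from an asserted one. The restrictions mirror the examples given for top-level relations in this tutorial; the individuals and the function name are hypothetical, and only directly asserted types are compared.

```python
# Restrictions and disjointness taken from the top-level ontology examples.
DOMAIN = {"participates in": "Material object", "has material part": "Material object"}
RANGE = {"participates in": "Process", "has material part": "Material object"}
DISJOINT = {frozenset(("Material object", "Process"))}

# Hypothetical annotated data.
ASSERTED_TYPES = {"glucose": {"Material object"}, "glycolysis": {"Process"}}
TRIPLES = [
    ("glucose", "participates in", "glycolysis"),     # correct use
    ("glycolysis", "has material part", "glucose"),   # a process cannot have material parts
]


def find_inconsistencies(triples, asserted_types):
    """Report (individual, asserted type, implied type, relation) clashes."""
    problems = []
    for s, p, o in triples:
        for individual, implied in ((s, DOMAIN.get(p)), (o, RANGE.get(p))):
            if implied is None:
                continue  # no restriction declared for this relation
            for asserted in sorted(asserted_types.get(individual, ())):
                if frozenset((asserted, implied)) in DISJOINT:
                    problems.append((individual, asserted, implied, p))
    return problems


PROBLEMS = find_inconsistencies(TRIPLES, ASSERTED_TYPES)
```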
83. Alignment with top-level ontology
Foundation of domain classes and relations in top-level
ontology:
• every domain class becomes a subclass of a class in top-
level ontology
• every object property used in OWL axioms becomes a sub-
property of an object property in the top-level ontology
• assert additional axioms to restrict domain classes and
delimit them from other domains (where appropriate)
o e.g., if a particular resource uses (in RDF) the relation
part-of exclusively between processes, the additional
constraint can be added to this relation
85. Top-level ontology
Application of a top-level ontology:
• can help to make the ontological commitment that is
employed within an information system explicit,
• can guarantee basic agreement about fundamental,
common types,
• can guarantee basic agreement about common relations,
• provides common domain and range restrictions across
multiple domains, and therefore
• enables re-use of relations and types across data sources,
domains, levels of granularities, information systems.
86. Formalization of SBML Models:
• SBML models and model annotations are
converted into OWL axioms by making SBML's
ontological commitment explicit
• Implementation as conversion patterns
An explicit ontological commitment
establishes and implements a one-to-one
correspondence between SBML expressions
and a formal interpretation within an ontology.
87. Bridging the gap: combine in vivo entities
and in silico entities in a common model
(an ontology) defined with axioms
88. Formalization
Reaction:
A reaction represents some transformation, transport or binding
process, typically a chemical reaction, that can change the
amount of one or more species. (Hucka et al.)
vs
a Model component that is part-of a Model and represents
some Process
89. Formalizing SBML models using OWL
Model component(x): a model entity that is part of a model
'model component' equivalentClass
'model entity' that 'is part of' some 'model'
90. Assumption 1: Every model represents
a material entity
OWL Axiom:
Model SubClassOf: represents some MaterialEntity
Conversion rule: a Model annotated with class C represents:
If C is a SubClassOf MaterialEntity then
M SubClassOf: represents some C
If C is a SubClassOf Function then
M SubClassOf: represents some (has-function some C)
If C is a SubClassOf Process then
M SubClassOf: represents some (has-function some (realized-by only
C))
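The conversion rule can be sketched as a small function that emits the Manchester-syntax axiom as text. This is an illustrative sketch only: the function name `model_axiom` is hypothetical, and the top-level category of the annotation class is passed in by hand, whereas an actual implementation would obtain it from a reasoner.

```python
def model_axiom(model, annotation, category):
    """Return the axiom for a model annotated with class `annotation`.

    `category` is the top-level category the annotation class falls under:
    "MaterialEntity", "Function" or "Process".
    """
    if category == "MaterialEntity":
        filler = annotation
    elif category == "Function":
        filler = f"(has-function some {annotation})"
    elif category == "Process":
        filler = f"(has-function some (realized-by only {annotation}))"
    else:
        raise ValueError(f"no conversion rule for category {category!r}")
    return f"{model} SubClassOf: represents some {filler}"
```

For a model `M` annotated with the process GO:0031684, `model_axiom("M", "GO:0031684", "Process")` yields the third form of the rule.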
91. BIOMODEL 82: Converting Model
Annotated with heterotrimeric G-protein complex cycle
(GO:0031684):
• represents an object O1
• O1 has a function F1
• F1 is realized by processes of the type heterotrimeric G-
protein complex cycle
• M SubClassOf: represents some O1
• O1 SubClassOf: (has-function some (realized-by
only GO:0031684))
92. Assumption 2: Every compartment
represents a material object
Compartment(x): a model component that represents a
material object which is part of the object represented by the
model to which the component belongs
Compartment subClassOf 'model component'
and represents some 'Material object'
Conversion rule:
• represents an object O2
• part of the object represented by the model
• compartment’s species represent objects that are located in O2
• C SubClassOf: represents some O2
• O2 SubClassOf: located-in some O1
93. BIOMODEL 82: Converting Compartment “Cell”
Annotated with Cell (GO:0005623)
• represents an object O2
• O2 is a kind of Cell
• O2 is a part of O1 (represented by BIOMODEL 82)
• C SubClassOf: represents some O2
• O2 SubClassOf: Cell and part-of some O1
94. Assumption 3: Every species
represents a material object
Species(x): a model component that represents a material
object which is part of the entity represented by the
compartment of which the species is a part
Species subClassOf 'model component'
and represents some 'Material object'
Species represents an O3 which
• can have functions
• the functions can be realized by processes
• can have qualities (charge, amount, …)
• is located in O2
95. BIOMODEL 82: Converting Species “GTP”
Annotated with GTP (CHEBI:15996)
• represents an object O3
• O3 is a kind of GTP
• O3 is located-in O2 (represented by “Cell”
compartment)
• S SubClassOf: represents some O3
• O3 SubClassOf: GTP and located-in some O2
• O3 SubClassOf: GTP and located-in some (Cell
and part-of some (has-function some (realized-by
only GO:0031684)))
96. Reactions as Functions, not Processes
Reactions represent Functions. Why not processes?
- Functions are capabilities while processes are
manifestations of these capabilities
- Processes have a duration, a time of occurrence,
participants, etc.
- Functions can be realized multiple times,
processes occur only once
- Processes may be represented by simulations
97. Assumption 4: Every reaction
represents a functional entity
Reaction(x): a model component that can include reactants,
products and modifiers and represents a functional entity
Reaction subClassOf 'model component'
and 'represents' some (
‘material entity’ and ‘has function’ some Function)
ListOfReactions(x): a List that has only Reactions as members
ListOfReactions
EquivalentTo:
List and 'has member' only 'reaction'
98. BIOMODEL 82: Converting Reaction “GTP-binding”
Annotated with GTP binding (GO:0005525)
• represents an object O4
• O4 has a function F4
• F4 is a kind of GTP binding
• F4 is realized by P4
• P4 has-input O3 (GTP)
• R SubClassOf: represents some (has-function some F4)
• F4 SubClassOf: GTP binding and realized-by only P4
• P4 SubClassOf: has-input some O3
100. How would you formalize a model
annotated with:
A) heart
B) to pump blood
C) heart palpitations
101. SBML2OWL: Implementation
1. Read the model
• libSBML - http://sbml.org/Software/libSBML
2. Extract annotations from model & components
• libSBML & Jena - http://jena.sourceforge.net
3. Formalize each annotation according to the formalization
rules
• OWLAPI - http://owlapi.sourceforge.net/
4. Integrate with external ontologies
• OWLAPI
5. Reasoning
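Step 2 (extracting annotations) can be sketched with the Python standard library alone. This is a stand-in under stated assumptions: the tutorial's implementation uses libSBML and Jena, the embedded SBML fragment is a simplified hand-written example, and `extract_annotations` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Simplified, hand-written SBML fragment (illustrative assumption).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#cell">
              <bqbiol:is><rdf:Bag>
                <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0005623"/>
              </rdf:Bag></bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </compartment>
    </listOfCompartments>
    <listOfSpecies>
      <species id="GTP" compartment="cell">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#GTP">
              <bqbiol:is><rdf:Bag>
                <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A15996"/>
              </rdf:Bag></bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>"""


def extract_annotations(sbml_text):
    """Map each annotated component id to its list of annotation URIs."""
    root = ET.fromstring(sbml_text)
    result = {}
    for element in root.iter():
        component_id = element.get("id")
        if component_id is None:
            continue
        # Look only at the component's own <annotation> child.
        annotation = next(
            (child for child in element if child.tag.endswith("}annotation")), None
        )
        if annotation is None:
            continue
        resources = [
            li.get(f"{{{RDF}}}resource") for li in annotation.iter(f"{{{RDF}}}li")
        ]
        if resources:
            result[component_id] = resources
    return result
```

The resulting dictionary is the input for step 3, where each annotation URI is turned into axioms according to the formalization rules.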
102. SBML2OWL: Implementation
Application to BioModels repository yields:
• OWL ontology with
• more than 300,000 classes
• More than 800,000 axioms
• 90,000 complex model annotations
• includes all referenced ontologies
o GO
o ChEBI
o Celltype
o FMA
o PATO
o (KEGG, Reactome)
103. SBML2OWL: Implementation
OWLAPI:
• Ontology consists of
o a signature (classes, object properties, individuals)
o a set of axioms
105. Verification, querying, integration
What can we do with the combined knowledge base?
1. Verification
2. Querying
3. Interoperability and knowledge integration
106. Operations on OWL ontologies
Consistency checking will identify contradictions in the stated
and inferred knowledge. Consistency checking also helps to
implement other reasoning tasks.
• Satisfiability: determines whether classes can have
instances.
• Subsumption: is class C1 implicitly a subclass of C2?
Check if C1 and not C2 is unsatisfiable, i.e., there is no
instance of C1 that is not also an instance of C2
• Classification: repetitive application of subsumption to
discover implicit subclass links between named classes
• Realization: find the most specific class that an individual
belongs to. Does individual a classify into the class C?
107. Practical reasoning with OWL
ontologies
• Ontology editors such as Protege interface with reasoners to
perform consistency and class satisfiability checking,
classification and realisation, and to provide explanations.
• Some reasoners are set up to be used from the command line
to execute requests, including SPARQL querying.
• Programmatic use of reasoners via APIs offers maximal flexibility,
e.g., one can request all subclasses of a given class,
including implicit ones, or all entailed statements with a
specified subject and predicate
108. Operations on OWL ontologies
Consistency checking will identify contradictions in the stated
and inferred knowledge. Consistency checking also helps to
implement other reasoning tasks
• Satisfiability: determines whether classes can have
instances.
• Subsumption: is class C1 implicitly a subclass of C2?
Check if C1 and not C2 is unsatisfiable, i.e., there is no
instance of C1 that is not also an instance of C2
• Classification: repetitive application of subsumption to
discover implicit subclass links between named classes
• Realization: find the most specific class that an individual
belongs to. Does individual a classify into the class C?
Check whether a : ¬C is inconsistent with the underlying ontology;
if it is, a is an instance of C.
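The reduction of subsumption to unsatisfiability can be illustrated with a brute-force check over a purely propositional toy ontology (no relations). This is a sketch under that simplifying assumption; actual OWL reasoners use tableau or consequence-based algorithms, and the class names and helper functions here are illustrative.

```python
from itertools import product

ATOMS = ["Cell", "MaterialObject", "Process"]

# Each axiom constrains which classes a single individual may belong to.
AXIOMS = [
    lambda v: (not v["Cell"]) or v["MaterialObject"],      # Cell SubClassOf MaterialObject
    lambda v: not (v["MaterialObject"] and v["Process"]),  # MaterialObject DisjointWith Process
]


def satisfiable(expression):
    """True if some class membership satisfies all axioms and the expression."""
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(axiom(v) for axiom in AXIOMS) and expression(v):
            return True
    return False


def subsumed_by(c1, c2):
    """C1 SubClassOf C2 holds iff (C1 and not C2) is unsatisfiable."""
    return not satisfiable(lambda v: v[c1] and not v[c2])
```

Here `subsumed_by("Cell", "MaterialObject")` holds, while the class expression `Cell and Process` is unsatisfiable because of the disjointness axiom.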
109. Classifying the ontology
112. Verification
• Use of OWL reasoning for classification
• Which classes are unsatisfiable?
• Unsatisfiable classes are equivalent to owl:Nothing
113. Model verification
After reasoning, we found 27 models to be inconsistent.
Reasons:
1. Our representation: functions are sometimes found in the place
of physical entities (e.g. entities that secrete insulin). It is better
to constrain with appropriate relations.
2. SBML abused: species used as a measure of time.
3. Constraints in the ontologies themselves mean that the
annotation is simply not possible.
114. Compartments/species annotated with
functions or processes
116. Biological inconsistency: Biomodel 176
[Term]
id: GO:0016887
name: ATPase activity
is_a: GO:0017111
intersection_of: GO:0003824 ! catalytic activity
intersection_of: has_input CHEBI:15377 ! water
intersection_of: has_input CHEBI:15422 ! ATP
intersection_of: has_output CHEBI:16761 ! ADP
intersection_of: has_output CHEBI:26020 ! phosphates
117. Finding inconsistencies with
axiomatically enhanced ontologies
We add:
• GO: ATP and water are the only inputs (exact cardinality of 2)
• ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all
different (disjointness)
118. Consistency repair
• Unsatisfiable classes result from contradictory class
definitions
• Conflict in asserted axioms, in imported ontologies or
through combination of both
• Conflicts can be hidden through domain/range
restrictions, subclass relations, axioms for relations,
etc.
• Conflicting axioms may be challenging to identify!
120. Protege 4: Explanation Workbench
121. Ontology repair and disambiguation
• Ontological commitment may have been too strong
• Complex relations (between classes) can be relaxed by
explicitly introducing a disjunction
• Example:
o Assumption 1: models represent material objects
o model is annotated with the process Glycolysis
o process and material object are disjoint, therefore the
KB will contain unsatisfiable classes
122. Disambiguation pattern
disambiguation pattern: models annotated with X represent
material objects X, or
material objects with function X, or
material objects with function that is realized by X.
disambiguation patterns are applicable if multiple alternatives
are mutually disjoint
automated reasoning will then eliminate all but one option
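The pattern can be sketched as follows: build the relaxed, disjunctive axiom as text, then select the one alternative that remains satisfiable once the category of X is known. The selection step merely encodes the outcome expected from the reasoner and performs no reasoning itself; the function names are hypothetical.

```python
# Each alternative is only satisfiable if X falls under the stated category.
PATTERN = [
    ("MaterialEntity", "represents some {X}"),
    ("Function", "represents some (has-function some {X})"),
    ("Process", "represents some (has-function some (realized-by only {X}))"),
]


def relaxed_axiom(model, x):
    """Return the disjunctive axiom asserted for a model annotated with x."""
    options = " or ".join(template.format(X=x) for _, template in PATTERN)
    return f"{model} SubClassOf: {options}"


def surviving_option(x, category_of_x):
    """Return the single alternative left once x's category is known."""
    matches = [
        template.format(X=x)
        for required, template in PATTERN
        if required == category_of_x
    ]
    if len(matches) != 1:
        raise ValueError(f"no alternative for category {category_of_x!r}")
    return matches[0]
```

For a model annotated with the process Glycolysis, `surviving_option("Glycolysis", "Process")` selects the third alternative, so the annotation no longer produces an unsatisfiable class.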
123. Disambiguation: Model annotations
Assertion:
M SubClassOf: represents some C or represents
some (has-function some C) or represents some
(has-function some (realized-by only C))
C SubClassOf: MaterialEntity
Then:
• represents some C is satisfiable
• represents some (has-function some
C) and represents some (has-function some
(realized-by only C)) are unsatisfiable
124. Disambiguation: Model annotations
Assertion:
M SubClassOf: represents some C or represents
some (has-function some C) or represents some
(has-function some (realized-by only C))
C SubClassOf: Function
Then:
• represents some (has-function some C) is
satisfiable
• represents some C and represents some (has-
function some (realized-by only C)) are
unsatisfiable
125. Disambiguation: Model annotations
Assertion:
M SubClassOf: represents some C or represents
some (has-function some C) or represents some
(has-function some (realized-by only C))
C SubClassOf: Process
Then:
• represents some (has-function some (realized-by
only C)) is satisfiable
• represents some C and represents some (has-
function some C) are unsatisfiable
126. Aside from the disjunction pattern,
what else could be used for
consistency repair?
127. Once consistent, we can query the
ontology and infer new knowledge
what would YOU ask of your
formalized knowledge base?
128. Knowledge discovery and retrieval
• All queries are of the form:
o Query class: Y
o List all subclasses (and descendant classes),
equivalent classes, superclasses (and ancestor
classes)
o Some OWL reasoners perform only classification and
output the classified taxonomy
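Once the taxonomy has been classified, a subclass query amounts to collecting the descendants of the query class. A minimal sketch, assuming a hand-written inferred taxonomy; the intermediate class 'Signalling model' and the `descendants` helper are hypothetical.

```python
# Hypothetical inferred taxonomy: class -> direct subclasses.
SUBCLASSES = {
    "Model": ["BIOMD0000000169", "Signalling model"],
    "Signalling model": ["BIOMD0000000082"],
}


def descendants(cls):
    """Return all direct and indirect subclasses of cls, sorted by name."""
    result = []
    stack = list(SUBCLASSES.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(SUBCLASSES.get(current, []))
    return sorted(result)
```

The query "list all models" then corresponds to `descendants("Model")`. For a complex query class, the class expression is first given a name and classified, after which the same lookup applies.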
129. Knowledge discovery and retrieval
• Query: list all models
• Query type: subclasses
• Query class: Model
130. Knowledge discovery and retrieval
• Query: list all reactions that are part of
BIOMD0000000169
• Query type: subclasses
• Query class:
Reaction and part-of some BIOMD0000000169
131. Knowledge discovery and retrieval
• Query: list all models that represent Glycolysis
• Query type: subclasses
• Query class:
Model and represents some (has-function some
(realized-by only Glycolysis))
132. Knowledge discovery and retrieval
• Query: list all models that have a compartment
that represents a part of a Cell in which a sugar
is located
• Query type: subclasses
• Query class:
Model and has-part some (Compartment and
represents some (part-of some Cell and
contains some Sugar))
133. Knowledge discovery and retrieval
• Query: list all Model entities that represent
catalytic activity involving sugar in the
endocrine pancreas
• Query type: subclasses
• Query class:
represents some (has-function some 'catalytic
activity' and realized-by only (has-participant
some (sugar and contained-in some (part-of
some 'Endocrine pancreas'))))
134. Knowledge discovery and retrieval
• Query: list all Model entities that represent
mutagenic central nervous system drugs in the
gastrointestinal system
• Query type: subclasses
• Query class:
represents some (has-part some ('has role' some
'central nervous system drug' and 'has role'
some mutagen and part-of some
'Gastrointestinal system'))
135. Answering questions
136. Automated reasoning
• more than 800,000 axioms
• included ontologies contain several thousand axioms
o GO has approx. 35,000 classes
o ChEBI contains almost 100,000 classes
o complex definitions of classes create links between
large ontologies
• Reasoning in OWL 2 DL is highly complex (worst-case
2NEXPTIME complete - 2^(2^n) - with n the number of
operators used in the ontology)
• Consequence: OWL reasoning can rarely be employed at
a large scale.
• Expressive OWL reasoners do not classify the formalized
biomodels repository.
137. OWL Reasoners
OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• FaCT++: Manchester University, open-source, C++ with a Java
API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.
OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:
138. Implementation in information systems
• Classification of model ontology: 10-120min
• Answering complex queries: up to several
hours
• Consequence: OWL reasoning can rarely be
employed at a large scale
• Subsets of OWL allow tractable (polynomial-
time) automated reasoning
• OWL EL suitable for ontologies with a large
number of classes
• Problem: convert ontologies into tractable
subset of OWL
139. OWL Profiles
• OWL 2 defines three different tractable profiles:
• EL
o polynomial time reasoning for schema and data
o Useful for ontologies with large conceptual part
• QL
o fast (logspace) query answering using RDBMSs via SQL
o Useful for large datasets already stored in RDBs
• RL
o fast (polynomial) query answering using rule-extended
DBs
o Useful for large datasets stored as RDF triples
140. OWL RL
Features:
• identity of classes, instances, properties
• subproperties, subclasses, domains, ranges
• union and intersection of classes (some restrictions)
• property characterizations (functional, symmetric, etc)
• property chains
• keys
• some property restrictions (but not all inferences are
possible)
Limitations:
• not all datatypes are available
• no datatype restrictions
• no minimum or exact cardinality restrictions
• maximum cardinality only with 0 and 1
• some consequences cannot be drawn
141. OWL EL
Features
• existential quantification to a class expression or data range
• existential quantification to an individual or a literal
• self-restriction
• enumerations involving a single individual or a single literal
• intersection of classes and data range
• class axioms: subClassOf, equivalence, disjointness
• property axioms: domain, range, equivalence, transitive, reflexive, inclusion
with or without property chains; functional data properties. keys.
• assertions (sameAs, DifferentFrom, Class, Object Property, Data Property,
Negative Object/Data Property)
Not supported
• universal quantification to a class expression or a data range
• cardinality restrictions
• disjunction (union)
• class negation
• enumerations involving more than one individual
• object properties: disjoint, symmetric,
asymmetric, irreflexive, inverse, functional and inverse-functional
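A rough sketch of a syntactic check for OWL EL membership. It assumes axioms are written as nested tuples headed by OWL 2 functional-syntax constructor names; this tuple encoding and the helper functions are illustrative assumptions. The check covers only the unsupported constructs listed above, apart from enumerations with more than one individual, and is not a complete profile checker.

```python
# Constructs that OWL EL does not support (functional-syntax names).
EL_UNSUPPORTED = {
    "ObjectAllValuesFrom",        # universal quantification
    "ObjectMinCardinality",
    "ObjectMaxCardinality",
    "ObjectExactCardinality",
    "ObjectUnionOf",              # disjunction
    "ObjectComplementOf",         # class negation
    "DisjointObjectProperties",
    "SymmetricObjectProperty",
    "AsymmetricObjectProperty",
    "IrreflexiveObjectProperty",
    "InverseObjectProperties",
    "FunctionalObjectProperty",
    "InverseFunctionalObjectProperty",
}


def constructors(expression):
    """Collect every constructor name used in a nested-tuple expression."""
    if isinstance(expression, str):
        return set()  # a named class, property or individual
    head, *arguments = expression
    found = {head}
    for argument in arguments:
        found |= constructors(argument)
    return found


def in_el(axiom):
    """True if the axiom uses none of the listed unsupported constructs."""
    return not (constructors(axiom) & EL_UNSUPPORTED)


# 'M SubClassOf: represents some (has-function some (realized-by only C))'
MODEL_AXIOM = (
    "SubClassOf", "M",
    ("ObjectSomeValuesFrom", "represents",
     ("ObjectSomeValuesFrom", "has-function",
      ("ObjectAllValuesFrom", "realized-by", "C"))),
)
```

`in_el(MODEL_AXIOM)` is false because of the universal restriction (`only`), which is one reason the formalized models cannot be handed to an EL reasoner without modularization.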
142. Ontology modularization
Can we automatically extract a large (maximal) OWL (EL, QL,
RL) module from an ontology?
1. D EquivalentTo: not A (not EL)
2. C EquivalentTo: not B (not EL)
3. B subClassOf: A (EL)
Inference:
• D subClassOf: C (EL) (Inference from (1)-(3))
EL module of (1)-(3):
• {B subClassOf: A}, or
• {B subClassOf: A, D subClassOf: C}
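The example can be verified by brute force, treating A, B, C and D as propositional atoms. The sketch below confirms that axioms (1)-(3) together entail D subClassOf C, while the axiom B subClassOf A alone does not, which is why the second module retains more of the deductive closure. The helper `entails_subclass` is hypothetical and works only for this propositional case.

```python
from itertools import product

ATOMS = "ABCD"

FULL = [
    lambda v: v["D"] == (not v["A"]),  # (1) D EquivalentTo: not A
    lambda v: v["C"] == (not v["B"]),  # (2) C EquivalentTo: not B
    lambda v: (not v["B"]) or v["A"],  # (3) B subClassOf: A
]

# Only axiom (3) is expressible in EL.
EL_ONLY = [FULL[2]]


def entails_subclass(axioms, sub, sup):
    """True if no membership allowed by the axioms is in sub but not in sup."""
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(axiom(v) for axiom in axioms) and v[sub] and not v[sup]:
            return False
    return True
```

`entails_subclass(FULL, "D", "C")` holds, whereas `entails_subclass(EL_ONLY, "D", "C")` does not.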
143. EL Vira modularization
http://el-vira.googlecode.com
• ontology modularization
• identify EL, QL, RL axioms in deductive closure
• retain signature of ontology
• maximality is an open problem
144. Outcomes
The SBML-derived ontologies can be used to
i) check their consistency, thereby uncovering erroneous
curations
ii) infer attributes and relations of the substances,
compartments and reactions beyond what was originally
described in the models
iii) answer sophisticated questions across a model knowledge
base
146. Phenotypes
Phenotypes are observable characteristics of an
organism.
Examples include:
– Red hair
– Heart rate of 120bpm
– Absent arm
– Malfunctional liver
Phenotypes include comparisons such as
Increased heart rate
147. Phenotype and anatomy ontologies
anatomy ontologies: > 100,000 classes
– FMA, MA, WA, ZFA, FA, GO-CC, ...
phenotype ontologies: > 20,000 classes
– HPO, MP, WBPhenotype, FBcv, APO, ...
quality ontology: > 2,000 classes
– PATO
process and function ontologies: > 25,000 classes
– Gene Ontology, ...
alignments between anatomy ontologies
– UBERON, various mappings
148. Phenotype: Example question
Find all regions in the human, mouse, fish, fly,
worm and yeast genome that are associated
with tetralogy of Fallot.
151. Phenotype descriptions
Overriding aorta (HP:0002623):
– Q: overlap with (PATO:0001590)
– E1: Aorta (FMA:3734)
– E2: Membranous part of interventricular septum
(FMA:7135)
HP:0002623 EquivalentTo:
phene-of some (has-part some (FMA:3734 and
has-quality some (PATO:0001590 and towards some FMA:7135)))
153. Mouse phenotype
Overriding aorta (MP:0000273):
– Q: overlap with (PATO:0001590)
– E1: Aorta (MA:0000062)
– E2: Membranous interventricular septum (MA:0002939)
MP:0000273 EquivalentTo:
phene-of some (has-part some (MA:0000062 and
has-quality some (PATO:0001590 and towards some
MA:0002939)))
Consequence: MP:0000273 EquivalentTo: HP:0002623
154. Absence: absent appendix
Absent appendix:
– Q: lacks all parts of type (PATO:0002000)
– E1: Human body (FMA:20394)
– E2: Appendix (FMA:14542)
AbsentAppendix EquivalentTo: LacksParts and towards some Appendix and
inheres-in some HumanBody
AbsentAppendix EquivalentTo: LacksParts and towards some {Appendix} and
inheres-in some HumanBody
AbsentAppendix EquivalentTo: phene-of some (HumanBody and not has-part
some Appendix)
155. Absence and inconsistency
AbsentAppendix SubClassOf: phene-of some (HumanBody and not has-part
some Appendix)
HumanBody SubClassOf: has-part some Appendix
HumanBody(John). AbsentAppendix(x). has-phene(John,x).
156. Inconsistency removal
– Removal of conflicting axioms (has-part/part-of in anatomy)
– Contextualize anatomy:
• Normal and HumanBody SubClassOf: has-part some
(Normal and Appendix)
– Use of non-monotonic reasoning
157. Ontology of phenotypes
Different formal expressions for phenotypes based on
– qualities,
– anatomical parts,
– functions,
– processes
161. PhenomeBLAST
– apply definition patterns to yeast, fly, worm, fish, mouse
and human phenotypes and integrate in single ontology
– phenotype alignment through OWL reasoning
– more than 300,000 classes and 1,000,000 axioms
– combination of HermiT (for EL Vira modularization), CB
and CEL reasoner
– classification time: 7 minutes
http://phenomeblast.googlecode.org
163. Comparison of phenotypes
direct comparison of phenotypes:
– disease phenotypes, e.g., tetralogy of Fallot
– phenotypes associated with genetic mutations
(genotypes in mouse, fish, etc.)
164. Comparison of phenotypes
When the phenotype annotation of a genotype becomes a
subclass of a disease phenotype, then we can infer a gene-
disease association if
– disease phenotypes sufficient for having the disease
– mutation phenotypes necessary for having a specific
genotype
Inference over ontologies can establish a formal proof for a
gene-disease association.
165. Knowledge discovery
Similarity-based comparison allows for incomplete and noisy
information.
– pairwise comparison of phenotypes
– similarity: weighted Jaccard index
– result: similarity matrix between phenotypes
– (quantitative) evaluation based on predicting orthology,
pathway, disease
– identify novel gene-disease associations
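A minimal sketch of a weighted Jaccard index between two sets of phenotype classes: the summed weight of the shared classes divided by the summed weight of all classes in either set. The weights, the identifiers other than HP:0002623, and the function name are illustrative assumptions; the weighting scheme actually used is not stated here.

```python
def weighted_jaccard(first, second, weight):
    """Weighted Jaccard index of two sets of phenotype class identifiers.

    `weight` maps each identifier to a non-negative weight, for example
    a measure of how specific the class is.
    """
    union = first | second
    total = sum(weight[term] for term in union)
    if total == 0:
        return 0.0  # both sets empty, or all weights zero
    shared = sum(weight[term] for term in first & second)
    return shared / total


# Hypothetical weights; 'P2' and 'P3' are placeholder identifiers.
WEIGHT = {"HP:0002623": 2.0, "P2": 1.0, "P3": 1.0}
DISEASE = {"HP:0002623", "P2"}
GENOTYPE = {"HP:0002623", "P3"}
```

With these values the shared weight is 2.0 and the total weight is 4.0, so `weighted_jaccard(DISEASE, GENOTYPE, WEIGHT)` evaluates to 0.5. Computing this for every pair of phenotype sets gives the similarity matrix.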
169. What does the future hold?
Better formalized ontologies
Dynamic generation of knowledge
through semantic web services
…
170. Summary - RDF and OWL
RDF provides
• light-weight semantics
• fast queries
• highly scalable implementations
• large volumes of data (e.g., DBPedia, other Linked Data
repositories)
OWL provides
• Constructs to formalize the intended semantics
• An OWLAPI to develop, manage, and serialize OWL
ontologies
• Efficient reasoners to get inferences, compute modules
and get explanations.
• syntactic subset for better performance, albeit some
inferences may be lost
171. Summary - OWL & Formal languages
• Formal logic-based languages can be used to formalize the
meaning of terms used in discourse. While normally
restricted in terms of what can be expressed, the statements
formed can be automatically reasoned about.
• OWL is based on description logics and formalizes the
meaning of terms with axioms. Axioms can be used to
characterize and distinguish classes, relations and
individuals. Rich expressions can be crafted from logical
combinations of language primitives including conjunction,
disjunction, negation and object/data property restrictions.
• OWL reasoners provide a number of services including
computing subsumption, satisfiability, entailment, realization
and query answering.
172. Summary - Exploitation of ontologies
• verification: automated reasoning can reveal contradictory
definitions of classes (unsatisfiable classes), instances that
violate constraints in the ontology (often leading to
inconsistent ontologies) and reveal hidden inferences (that
may be considered invalid through manual verification)
• querying: ontologies define an explicit, formal language
based on which queries to a knowledge base can be
performed; queries can be made for instances and for
classes satisfying complex conditions
• repair: through explicit definitions using disjunction,
constraints can be relaxed and contradictions reduced
173. Summary - Ontology
Ontology is not philosophy!
• an ontology is a specification of a conceptualization of a
domain
• a conceptualization is a system of categories accounting for
a particular view on the world
• ontologies are used to make some aspects of the intended
meaning of terms in a vocabulary explicit
• ontologies (in computer science) may utilize philosophical
theories
• formalized ontologies can be used by humans and
automated systems as a basis for communication and data
exchange
• Ontologies are useful tools for translational research
174. Summary - Implementation in information systems
• The OWLAPI is a reference implementation of the OWL
specification and facilitates the development, management
and serialization of expressive OWL ontologies. The
OWLAPI also facilitates modularization and getting
explanations.
• OWL provides a syntactic subset of the language for
efficient reasoning. These so-called OWL profiles (EL, RL,
QL) have well understood computational properties and can
lead to better performance, but with some inferences lost.
• Formal ontology makes it possible to not only retrieve data
(similar to db), but also query the concepts themselves
175. Summary - evaluation
• ontologies are tools to support science
• Ontologies can provide insight into real biological/scientific
problems
• quantifiable evaluation can be performed, e.g., based on
precision/recall or ROC analysis
• application of ontologies may go beyond reasoning alone
and use statistical analyses (enrichment), semantic
similarity, graph algorithms, clustering, etc.
176. Conclusions
• Ontologies + Semantic Web enable
• Integration
• Verification
• Analysis
• Discovery
• Translational research
177. Acknowledgements
George Gkoutos
Heinrich Herre
Janet Kelso
Dietrich Rebholz-Schuhmann
Anika Oellrich
Michael Ashburner
Dan Cook
John Gennari
Paul Schofield
178. michel_dumontier@carleton.ca
leechuck@leechuck.de