Biomedical Ontologies for data
  integration and verification
 Michel Dumontier and Robert Hoehndorf

Carleton University, University of Cambridge
  ISMB tutorial @ Vienna. July 16, 2011
  ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies   1
Outline
1. General background (10min)
    o    an introduction to the use-case: systems biology, SBML and BioModels
2. Ontological analysis (45 min)
    o    how to express domain content as formal knowledge using the Web Ontology Language
         (OWL)
3. Application of formal ontology to consistency and data verification (30min)
    o    how to use the OWL formalization to verify the accuracy of annotations, data and constraints
         in a domain
4. Break (30min)
5. Mapping, repair and disambiguation using ontologies (30min)
    o    how to relax and disambiguate constraints on ontologies to obtain consistent representation of
         domain content
6. Knowledge discovery, retrieval and querying (15min)
    o    how to answer questions that require the inference of knowledge through automated
         reasoning
7. Efficient implementation in software systems (15min)
    o    how to convert ontologies into efficient formal representations amenable to high-throughput
         analyses
8. Applications in Bioinformatics (25min)
    o    how the formalized ontologies can be used to perform bioinformatics analyses
– Discussion and questions (15min)


Systems Biology

We create and simulate biological models to:
• gain insight into the structure and function
  of biochemical networks
• reveal metabolic and signalling
  capabilities so as to predict phenotypes
• undertake metabolic engineering to
  maximize some desired product

To do this, we need
 • to integrate & manage our data &
   knowledge in a coherent, scalable and
   machine understandable manner

• efficient software to execute
  computationally demanding simulations
Bio-ontologies




• Provide rich human and machine understandable descriptions of
  the terms they purport to describe
• Have value for semantic annotation of data, which allows
  integration across domains (granularity, species, experimental
  methods)
• Facilitate granular and cross-domain queries
• Can be used to obtain explanations for inferences drawn
• Can be efficiently processed by algorithms and software

Biomodels are semantically annotated
            SBML models
• EBI-managed resource
• 600+ models available as
  SBML
• 300+ models are curated
  with GO process, function
  and component terms,
  and have links to protein
  databases.
• Possible to browse by
  GO terms:

                                           http://www.ebi.ac.uk/biomodels-main/



Objective:
Computational Knowledge Discovery
• Terminological resources are increasingly being used to
  annotate SBML-based biomolecular models
   o Makes it easier to explore or find models

• By converting models into formal representations of
  knowledge we get to:
   o validate the accuracy of the annotations
   o infer knowledge explicit in terminological resources
   o discover biological implications inherent in the models.




SBML

XML-based representation of biochemical models, their
components (compartments, species, reactions, events) and
descriptors (rules, constraints, functions, units)

Consider the following enzymatic reaction:

E + S ⇌ ES → E + P




SBML captures reaction kinetics using
        an XML-based format

<?xml version="1.0" encoding="UTF-8"?>
<sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
  <model name="EnzymaticReaction">
    <listOfUnitDefinitions>
       <unitDefinition id="per_second">
          <listOfUnits>
             <unit kind="second" exponent="-1"/>
          </listOfUnits>
       </unitDefinition>
       <unitDefinition id="litre_per_mole_per_second">
          <listOfUnits>
             <unit kind="mole" exponent="-1"/>
             <unit kind="litre" exponent="1"/>
             <unit kind="second" exponent="-1"/>
          </listOfUnits>
       </unitDefinition>
    </listOfUnitDefinitions>
    <listOfCompartments>
       <compartment id="cytosol" size="1e-14"/>
    </listOfCompartments>
    <listOfSpecies>
       <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/>
       <species compartment="cytosol" id="P" initialAmount="0" name="P"/>
       <species compartment="cytosol" id="S" initialAmount="1e-20" name="S"/>
       <species compartment="cytosol" id="E" initialAmount="5e-21" name="E"/>
    </listOfSpecies>




<listOfReactions>
   <reaction id="veq">
      <listOfReactants>
         <speciesReference species="E"/>
         <speciesReference species="S"/>
      </listOfReactants>
      <listOfProducts>
         <speciesReference species="ES"/>
      </listOfProducts>
      <kineticLaw>
         <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply>
               <times/>
               <ci>cytosol</ci>
               <apply>
                  <minus/>
                  <apply>
                    <times/>
                    <ci>kon</ci>
                    <ci>E</ci>
                    <ci>S</ci>
                  </apply>
                  <apply>
                    <times/>
                    <ci>koff</ci>
                    <ci>ES</ci>
                  </apply>
               </apply>
            </apply>
         </math>
         <listOfParameters>
            <parameter id="kon" value="1000000" units="litre_per_mole_per_second"/>
            <parameter id="koff" value="0.2" units="per_second"/>
         </listOfParameters>
      </kineticLaw>
   </reaction>



<reaction id="vcat" reversible="false">
           <listOfReactants>
              <speciesReference species="ES"/>
           </listOfReactants>
           <listOfProducts>
              <speciesReference species="E"/>
              <speciesReference species="P"/>
           </listOfProducts>
           <kineticLaw>
              <math xmlns="http://www.w3.org/1998/Math/MathML">
                 <apply>
                    <times/>
                    <ci>cytosol</ci>
                    <ci>kcat</ci>
                    <ci>ES</ci>
                 </apply>
              </math>
              <listOfParameters>
                 <parameter id="kcat" value="0.1" units="per_second"/>
              </listOfParameters>
           </kineticLaw>
        </reaction>
     </listOfReactions>
   </model>
</sbml>
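
Because SBML is plain XML, the reactions in the listing above can be read back with any XML parser. The following is a minimal sketch using only the Python standard library. The embedded document is a trimmed copy of the model (unit definitions, species and kinetic laws are left out), and the helper name `reaction_equations` is an illustrative choice, not part of any SBML tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the EnzymaticReaction model: only the reaction
# structure is kept (assumption: enough for this illustration).
SBML = """<sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
  <model name="EnzymaticReaction">
    <listOfReactions>
      <reaction id="veq">
        <listOfReactants>
          <speciesReference species="E"/>
          <speciesReference species="S"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="ES"/>
        </listOfProducts>
      </reaction>
      <reaction id="vcat" reversible="false">
        <listOfReactants>
          <speciesReference species="ES"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="E"/>
          <speciesReference species="P"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level2/version3"}


def reaction_equations(sbml_text):
    """Return {reaction id: 'reactants arrow products'} for an SBML document."""
    root = ET.fromstring(sbml_text)
    equations = {}
    for reaction in root.iterfind(".//sbml:reaction", NS):
        reactants = [ref.get("species") for ref in
                     reaction.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [ref.get("species") for ref in
                    reaction.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        # In SBML Level 2 a reaction is reversible unless reversible="false".
        arrow = "->" if reaction.get("reversible") == "false" else "<=>"
        equations[reaction.get("id")] = (
            " + ".join(reactants) + " " + arrow + " " + " + ".join(products)
        )
    return equations


if __name__ == "__main__":
    for rid, equation in reaction_equations(SBML).items():
        print(rid, ":", equation)
```

The parser recovers the syntax of the model only; nothing here says what E, S or ES stand for biologically, which is the gap the annotations are meant to fill.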




SBML models may feature several
        components




SBML specifies the number and kind of
attributes models and components can have




It’s up to the modeler to use those
   attributes in a meaningful way


What models have you produced?
Energy (ATP) is produced from glycolysis (the breakdown of glucose) in a series of
enzyme-catalyzed biochemical reactions.

Fermentation regenerates NAD+ so it can be reused to metabolize more glucose.

Analysis and optimization of metabolic pathways are important for biotechnology.




Gene Ontology

• over 30,000 terms

• covers
   o biological processes
   o molecular functions
   o cellular components

• terms organized around an "is a"
  hierarchy

• terms further described with
  'has part'/'part of', 'regulates',
  'positively regulates' and 'negatively regulates'

Chemical Entities of Biological Interest
                (ChEBI)
recently refactored to be in line with a formal
(reasoning-capable) ontology

scope includes chemical entities (atoms,
substances, groups, molecules), roles and
subatomic particles

large numbers of curated molecules




SBML annotations are captured using the
     Resource Description Framework (RDF)

<species metaid="_525530" id="GLCi"
         compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
             xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>

• The species element is the implicit subject and carries the XML attributes
• The annotation element stores the RDF
• subject: the rdf:Description element (rdf:about="#_525530")
• predicate: the qualifier element (bqbiol:is)
• object: the resources listed in the rdf:Bag

The intent is to express that the species represents a substance composed of glucose
molecules.
We also know from the SBML model that this substance is located in the cytosol and has
an initial concentration of 0.09765 M.
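
The subject, predicate and object roles described above can be pulled out of the annotation mechanically. This is a rough sketch with the Python standard library, assuming the species element is available as a string. Only the two namespace declarations the fragment actually uses are kept, and `annotations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# The GLCi species from the slide, with unused namespace declarations dropped.
SPECIES = """<species metaid="_525530" id="GLCi" compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def annotations(element_xml):
    """List (subject, qualifier, resource) for every annotated resource."""
    element = ET.fromstring(element_xml)
    found = []
    for description in element.iter("{%s}Description" % RDF):
        subject = description.get("{%s}about" % RDF)
        for qualifier in description:
            # ElementTree tags look like "{namespace}localname".
            name = qualifier.tag.split("}")[1]
            for item in qualifier.iter("{%s}li" % RDF):
                resource = unquote(item.get("{%s}resource" % RDF))
                found.append((subject, name, resource))
    return found


if __name__ == "__main__":
    for triple in annotations(SPECIES):
        print(triple)
```

The percent-encoded colon in `CHEBI%3A4167` is decoded so that the identifier reads `CHEBI:4167`. The qualifier's namespace (biology versus model) is discarded here for brevity, which a real converter would need to keep.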
Annotated models contain references to
entities described elsewhere

• PubMed - papers
• ChEBI - chemicals
• UniProt - proteins
• KEGG - chemicals, reactions
• E.C. - reactions
• Gene Ontology - functions, reactions, compartments
• Taxonomy - organism

<model>
  <annotation>
    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:pubmed:17667951"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>
    <bqbiol:hasPart>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:kegg.pathway:sce00010"/>
        <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0019642"/>
      </rdf:Bag>
    </bqbiol:hasPart>
    <bqmodel:is>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
      </rdf:Bag>
    </bqmodel:is>




It looks like another XML syntax, but it has RDF semantics!
      What is the meaning of SBML’s RDF annotation?

     <rdf:Description rdf:about="#_551383">
      <bqmodel:is>
       <rdf:Bag>
          <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
        </rdf:Bag>
      </bqmodel:is>
     </rdf:Description>


• The intent is to indicate that the model is a model of a yeast
• RDF semantics: #_551383 is a member of a set that is related by
  bqmodel:is to a collection (rdf:Bag) that has a single member –
  yeast (4932)
• RDF semantics does not match the intent!
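
To make the mismatch concrete, the sketch below writes the annotation out as plain triples and then rewrites them into the intended reading. It is an illustration only: triples are Python tuples of abbreviated names, `_:bag` is a made-up blank-node label, and `flatten_bags` is a hypothetical helper rather than part of any RDF library.

```python
# Literal RDF reading: the object of bqmodel:is is an anonymous container,
# and yeast (taxonomy:4932) is only a member of that container.
literal = {
    ("#_551383", "bqmodel:is", "_:bag"),
    ("_:bag", "rdf:type", "rdf:Bag"),
    ("_:bag", "rdf:_1", "urn:miriam:taxonomy:4932"),
}


def flatten_bags(triples):
    """Rewrite (s, p, bag) into (s, p, member) for each member of the bag."""
    bags = {s for (s, p, o) in triples if p == "rdf:type" and o == "rdf:Bag"}
    members = {}
    for s, p, o in triples:
        # rdf:_1, rdf:_2, ... are the container membership properties.
        if s in bags and p.startswith("rdf:_"):
            members.setdefault(s, set()).add(o)
    flattened = set()
    for s, p, o in triples:
        if s in bags:
            continue  # drop the container bookkeeping triples
        if o in bags:
            flattened.update((s, p, member) for member in members.get(o, ()))
        else:
            flattened.add((s, p, o))
    return flattened


if __name__ == "__main__":
    print(flatten_bags(literal))
```

After flattening, the model element is related directly to the taxonomy entry. Whether `bqmodel:is` should then be read as identity or as "is a model of" is still a modelling decision that the syntax alone does not settle.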
Can we formalize and automatically verify the intended
meaning of the RDF annotation?

                 BioModels.net biology qualifiers




  is, identity
  The biological entity represented by the model element
  has identity with the subject of the referenced resource
  (modeling object B). This relation might be used to link
  a reaction to its exact counterpart in a database, for
  instance.
Biomodels: Qualifiers

Qualifiers for the biological object represented by the model
  component.
encodes/isEncodedBy
hasPart/isPartOf
hasProperty/isPropertyOf
hasVersion/isVersionOf
is
isDescribedBy
isHomologTo
occursIn


                              http://www.ebi.ac.uk/miriam/main/qualifiers/
In this tutorial
You will learn how to create accurate knowledge
representations of annotated SBML models.

Features
 • ontological commitment: terms in a vocabulary
   correspond to formally defined classes and relations, and
   expressions formulated using the Web Ontology Language
   (OWL) have an unambiguous interpretation
 • upper level ontology of types and relations to distinguish
   and constrain model entities to the spatio-temporal entities
   they represent
 • Reasoning to uncover inconsistencies, and how to repair
   them.
 • Advanced applications of OWL ontologies for answering
   questions and providing biological insight
What is a model?

How does it differ from the thing it is
           a model of?
Conceptualization (SBML)

• 2 kinds of entities:
   o in silico: model components
   o in vivo: the entities represented by a model




Conceptualization




SBML Conceptualization

 • Instances of SBML model entities are syntactic entities
   (in XML)

 • SBML models represent biological phenomena and
   structures (e.g., Cell cycle processes, Yeast cells, ...)

 • Here we focus on Model, Compartment, Species,
   Reaction




Formalization

• Formalization is the process by which we map a
  conceptualization into a logical representation, which has a
  particular interpretation.

• We first express the basic nature of what the terms refer to
  by defining them in a formal language. Next, we can
  logically combine the terms to form expressions, which
  have an unambiguous interpretation and hence can be
  automatically reasoned about.




Have you heard of the Semantic Web?
The Semantic Web

It is about standards for publishing, sharing and querying
knowledge drawn from diverse sources.

It enables the answering of sophisticated questions.




The Semantic Web effort aims to develop an interoperable set
  of standards for knowledge representation and reasoning




URI/IRI

• Uniform Resource Identifiers (URI) and
  Internationalized Resource Identifiers (IRI) are
  identifiers for resources, given a particular protocol
• We’re familiar with Uniform Resource Locators (URL), which
  specify the protocol (commonly HTTP) used to obtain a
  document with that identifier.
  – http://dumontierlab.com
• IRIs extend URIs with an expanded set of international
  characters
• URI/IRIs are the basis for naming resources on the
  Semantic Web.
  – As names, they can also be used to identify non-information
    resources, like people and places
Entity naming

• Uniform Resource Identifiers (URI) are identifiers for resources
  given a particular protocol. Internationalized Resource Identifiers
  (IRI) include an expanded set of international characters
• URI/IRIs can be used to name entities, both digital media and
  non-informational entities like people and places.

• Uniform Resource Name (URN) – only a name
   o   MIRIAM - Minimal Information Required In the Annotation of Models
        o   data source and identifier combined in a single IRI:
            urn:miriam:source:identifier
        o   e.g. urn:miriam:uniprot:P62158
        o   ~40 sources defined in the EBI registry

• Uniform Resource Locator (URL) – a resolvable name
   o   Bio2RDF - makes life sciences data available on the Semantic Web
        o   http://bio2rdf.org/uniprot:P62158
        o   content-type negotiation and explicit URLs resolve the name to an HTML/RDF/etc.
            description of the entity
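
The two naming schemes above differ only in their prefix for the UniProt example, so a name in one can be rewritten into the other. The sketch below splits a MIRIAM URN and builds a Bio2RDF-style URL from it. Both function names are hypothetical, and reusing the MIRIAM source name as the Bio2RDF namespace is an assumption taken from the single `uniprot` example on this slide; other sources (e.g. `obo.chebi`) would likely need a lookup table.

```python
from urllib.parse import unquote


def parse_miriam_urn(urn):
    """Split 'urn:miriam:source:identifier' into (source, identifier)."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "miriam":
        raise ValueError("not a MIRIAM URN: %r" % urn)
    # Identifiers may carry percent-encoded characters, e.g. CHEBI%3A4167.
    return parts[2], unquote(parts[3])


def bio2rdf_url(urn):
    """Build a Bio2RDF-style URL (assumes namespace == MIRIAM source name)."""
    source, identifier = parse_miriam_urn(urn)
    return "http://bio2rdf.org/%s:%s" % (source, identifier)


if __name__ == "__main__":
    print(parse_miriam_urn("urn:miriam:obo.chebi:CHEBI%3A4167"))
    print(bio2rdf_url("urn:miriam:uniprot:P62158"))
```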
Semantic Technologies: RDF vs OWL


RDF: simple triples, graph-based queries, supports
very large amounts of data

OWL: significantly more expressive language,
strong axioms, inference capabilities, consistency
verification, but can be rather slow




Resource Description Framework (RDF)
Allows one to talk about anything

Uniform Resource Identifiers (URI) can be used as entity names.
Bio2RDF specifies its own naming convention:

http://bio2rdf.org/uniprot:P05067 (uniprot:P05067)
   is a name for Amyloid precursor protein

http://bio2rdf.org/omim:104300 (omim:104300)
   is a name for Alzheimer disease

Resource Description Framework (RDF)
Allows one to express statements

An RDF statement consists of:
– Subject: resource identified by a URI
– Predicate: resource identified by a URI
– Object: resource or literal

Example statements:
  uniprot:P05067  rdfs:label  “Amyloid precursor protein”
  uniprot:P05067  rdf:type    uniprot:Protein




RDF has multiple serializations

RDF/XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY u "http://bio2rdf.org/uniprot:">
]>
<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:u="http://bio2rdf.org/uniprot:">

    <rdf:Description rdf:about="&u;Q16665">
        <rdf:type rdf:resource="&u;Protein"/>
    </rdf:Description>
</rdf:RDF>



RDF/N3
@prefix u: <http://bio2rdf.org/uniprot:> .

u:Q16665 a u:Protein .



Multi-Source Data Integration
Syntactic data integration depends on consistent naming

UniProt:        uniprot:P05067   is a             uniprot:Protein
                      +
Gene Ontology:  uniprot:P05067   located in       go:Membrane
                      +
iRefIndex:      uniprot:P05067   interacts with   uniprot:P05067

Unified view: the three graphs merge around the shared node uniprot:P05067, which
is a uniprot:Protein, is located in go:Membrane and interacts with uniprot:P05067
(the unified view also shows a 'has name' edge).
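
Syntactic integration of this kind amounts to taking the union of the triple sets. A minimal sketch with triples as Python tuples follows. The three statements are the ones on the slide, while the `describe` helper and the differently spelled `UniProt:P05067` name are illustrative assumptions.

```python
# One triple per source, as on the slide.
uniprot = {("uniprot:P05067", "is a", "uniprot:Protein")}
gene_ontology = {("uniprot:P05067", "located in", "go:Membrane")}
irefindex = {("uniprot:P05067", "interacts with", "uniprot:P05067")}

# Merging graphs is a set union: statements about the same name line up.
unified = uniprot | gene_ontology | irefindex


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs known for a subject."""
    return sorted((p, o) for (s, p, o) in graph if s == subject)


# A source that spells the name differently does not merge with the rest.
inconsistent = unified | {("UniProt:P05067", "has name", "Amyloid precursor protein")}

if __name__ == "__main__":
    print(describe(unified, "uniprot:P05067"))
    print(describe(inconsistent, "UniProt:P05067"))
```

The last two lines show why consistent naming matters: the `has name` statement is in the merged graph, but a query for `uniprot:P05067` never sees it.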



Building statements creates knowledge

uniprot:P05067   label            “Amyloid precursor protein”
uniprot:P05067   is a             Protein
omim:104300      label            “Alzheimer Disease”
omim:104300      is a             Disease
uniprot:P05067   is involved in   omim:104300



Bio2RDF’s RDFized data fits together (syntactic integration)
SGD as RDF-based Linked Open Data




Bio2RDF links and provisions 40 high value
                datasets




Bio2RDF now serving over
40 billion triples of linked biological data




SGD is provided by Bio2RDF and forms part of the
           growing linked open data cloud




Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Semantic Integration

 • Requires a level of abstraction/generalization
   where the relationship between each resource
   is formalized
   – classes
   – relations
   – individuals
 • How do we ensure that our representation
   facilitates integration across datasets?
 • How can we get our formalization to
   interoperate with ontologies?
RDF-based Linked Data

• Provides the basis for simple data syndication and
  syntactic data integration
   o IRIs
   o Statements (aka triples) take the form
     <subject> <predicate> <object>
• Easy to implement
   o stand-alone datasets
   o logical layer over databases
• Limited reasoning
   o class and property hierarchies
   o domain/range restrictions
   o can’t automatically discover inconsistency
• Standardized Queries - SPARQL
• Scalable - to billions of triples
What do you know of OWL?
The Web Ontology Language (OWL)
       Has Explicit Semantics




Can therefore be used to capture knowledge in
       a machine understandable way
OWL - The Web Ontology Language

• Enhanced vocabulary (strong axioms) to express
  knowledge relating to classes, properties, individuals and
  data values
   o quantifiers (existential, universal, cardinality restriction)
   o negation
   o disjunction
   o property characteristics
   o complex classes in domain and range restrictions
   o property chains

• Advanced reasoning




Advanced Reasoning

• Consistency: determines whether the ontology contains
  contradictions.

• Satisfiability: determines whether classes can have
  instances.

• Subsumption: is class C1 implicitly a subclass of C2?

• Classification: repetitive application of subsumption to
  discover implicit subclass links between named classes

• Realization: find the most specific class that an individual
  belongs to.
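
The reasoning tasks listed above can be illustrated on a deliberately tiny fragment: named classes, asserted subclass links and disjointness. The sketch below is not an OWL reasoner (it ignores class expressions, properties and everything else OWL offers); the class names, the individual `x` and all function names are invented for the example.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass links, including cls."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed(c1, c2, subclass_of):
    """Subsumption: is c1 (implicitly) a subclass of c2?"""
    return c2 in superclasses(c1, subclass_of)


def is_consistent(types, subclass_of, disjoint):
    """Consistency: no individual ends up in two disjoint classes."""
    for asserted in types.values():
        inferred = set()
        for cls in asserted:
            inferred |= superclasses(cls, subclass_of)
        for a, b in disjoint:
            if a in inferred and b in inferred:
                return False
    return True


# Toy axioms (assumed for illustration).
SUBCLASS_OF = {"enzyme": {"protein"}, "protein": {"molecule"}, "DNA": {"molecule"}}
DISJOINT = [("protein", "DNA")]

if __name__ == "__main__":
    print(is_subsumed("enzyme", "molecule", SUBCLASS_OF))
    print(is_consistent({"x": {"enzyme", "DNA"}}, SUBCLASS_OF, DISJOINT))
```

Here `enzyme` is found to be a kind of `molecule` although no axiom says so directly, and typing one individual as both `enzyme` and `DNA` is flagged as a contradiction.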

OWL Challenges and Solutions

Inconsistency:
 • needs to be resolved to ask any questions involving the
   ontology
 • Solution: explicitly accommodate multiple meanings,
   remove contradictory axioms

Unsatisfiability (of a class):
• may indicate a modelling error
• needs to be resolved to ask meaningful questions about
  the class
• Solution: explicitly accommodate multiple meanings,
  redefine class, remove contradicting class restrictions


OWL Challenges and Solutions

Scalability:
• answering OWL queries requires reasoning
• inference in OWL is highly complex (worst case:
  2NEXPTIME)
• highly optimized reasoners are getting better and better,
  but can still be slow with large ontologies
• tractable OWL profiles (EL, QL, RL) enable more efficient
  and guaranteed polynomial-time inferences
• use ontology modularization approaches to increase
  performance




OWL can help you create rich, machine-
      understandable descriptions!
• transform our expert knowledge into axioms and
  expressions that can be automatically reasoned about
   o a transcription factor is
       a protein
       that binds to DNA
       and regulates the expression of a gene.
   o can we mine 'omic datasets to discover which
     proteins are transcription factors?
• create rich expressions from combinations of classes,
  relations and individuals
• assert statements of truth using axioms.


Linked data and OWL: Motivation

• use OWL reasoning to identify mistakes in RDF data
   o incorrect content of assertions
   o incorrect use of relations
   o conflicting conceptualizations
   o incorrect same-as assertions
• verify, fix and exploit Linked Data through expressive OWL
  reasoning
• generate/infer new triples to write back into RDF and use
  for efficient retrieval

Proposal:
Represent SBML biomodels in OWL, based on the implicit
relations and explicit attributes in the XML/RDF.

Elements of OWL 2.0

• The “ontology” of OWL 2 consists of:
  •   Classes
  •   Object properties
  •   Data properties
  •   Individuals
  •   Expressions
  •   Axioms
  •   Plus RDF stuff (like datatypes)




Axiomatization

• Axioms are statements that are assumed to be true in the
  domain
• Axioms formally interrelate terms from conceptualization
  step

Every statement can be reduced to an expression based only on
primitive terms.

Therefore: every axiom can be expressed using only primitive terms.




Classes and class axioms

• a class is a set of individuals that share one or more characteristics
   o a protein
• classes can be organized in a hierarchy using subClassOf axioms
   o C2 subClassOf C1: every member of C2 is a member of C1
   o SubClassOf (protein molecule)
• special classes
   o owl:Thing is the superclass of all classes
   o owl:Nothing is the subclass of all classes, denotes the empty set
• classes can be made disjoint from one another
   o i.e. there is no member of C1 that is also a member of C2
   o DisjointClasses (protein DNA)
• classes can be said to be equivalent
   o i.e. all members of C1 are members of C2 and all members of C2
     are members of C1
   o EquivalentClasses (Peptide Polypeptide)
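
Since a class is a set of individuals, the three class axioms above have a direct set-theoretic reading. The sketch below spells that reading out with Python sets. The individuals are made up for the example, and the comparison is over a fixed, fully known population, which is a simplification of how OWL interprets axioms.

```python
# Hypothetical individuals, invented for this illustration.
protein = {"insulin_1", "hemoglobin_1"}
dna = {"plasmid_1"}
molecule = protein | dna | {"glucose_1"}
peptide = {"insulin_1"}
polypeptide = {"insulin_1"}


def sub_class_of(c2, c1):
    """C2 subClassOf C1: every member of C2 is a member of C1."""
    return c2 <= c1


def disjoint_classes(c1, c2):
    """No individual is a member of both C1 and C2."""
    return not (c1 & c2)


def equivalent_classes(c1, c2):
    """C1 and C2 have exactly the same members."""
    return c1 == c2


if __name__ == "__main__":
    print(sub_class_of(protein, molecule))
    print(disjoint_classes(protein, dna))
    print(equivalent_classes(peptide, polypeptide))
```

Equivalence is the same as subclass in both directions, which is why `equivalent_classes(a, b)` holds exactly when `sub_class_of(a, b)` and `sub_class_of(b, a)` both do.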



Object Properties and axioms

• an object property OP is a relation between two individuals
   o 'has part' is an object property that denotes the mereological
     relation between two individuals
• OPs can be organized in a hierarchy
   o given OP1 and OP2, where OP2 is a subproperty of OP1: if
     an individual x is connected by OP2 to an individual y, then x is
     also connected by OP1 to y.
   o SubObjectPropertyOf ('has proper part' 'has part')
   o owl:topObjectProperty, owl:bottomObjectProperty
• We can restrict the domain and range to allowed values
   o ObjectPropertyDomain ('is participant in' 'physical entity')
   o ObjectPropertyRange ('is participant in' 'process')
• We can also assert object properties to be disjoint or equivalent



description of object properties

• Inverse
 o we say that 'has part' is an inverse for 'is part of'
 o we can also refer to this as inv('is part of')

• Symmetric
 o applies to cases where the inverse relation is the very same relation
 o e.g. the inverse for 'is related to' is 'is related to'

• Transitive
 o for a transitive relation, if an individual x is connected to an individual y
   that is in turn connected to an individual z, then x is also connected
   to z
 o e.g. 'has part' is transitive




description of object properties

• Reflexive
 o a reflexive relation connects every individual to itself
 o e.g. 'has part' is reflexive because a protein has itself as a part.

• Functional
 o each individual is connected by the relation to at most one individual;
   if several values are asserted, they must all be the same individual.
 o e.g. 'has unique identifier'

• Inverse Functional
 o each individual is the value of the relation for at most one individual;
   if several subjects are asserted, they must all be the same individual
 o e.g. 'is unique identifier of'




Class Expressions

Class expressions are rich descriptions of classes through the
logical combination of ontological primitives (classes, object
properties, datatype properties, individuals)

Protein subClassOf
   molecule and ‘has proper part’ min 2 ‘amino acid residues’

Combinations specified using logical operators
  • conjunction (and), disjunction (or), negation (not)

Object or data property expressions can carry a qualified cardinality
restriction over the relation
    o minimum: rel min # Y
    o maximum: rel max # Y
    o exact:   rel exactly # Y (minimum + maximum)
    o some:    rel some Y (equivalent to rel min 1 Y)
Class Expressions

 o   The quantification can also be qualified by the object type
      o rel only Y: the only values allowed are of type Y

• These combine to form complex class expressions like
 o 'molecule' and not 'dna'
 o 'has part' min 2 'amino acid'
 o 'is located in' only ('nucleus' or 'cytoplasm')

• which can be expressed as axioms in the ontology
Protein subClassOf
  molecule and ‘has proper part’ min 2 ‘amino acid residues’

Transcription Factor equivalentTo
  ‘protein’
  and ‘has disposition’ some ‘to bind to DNA’
  and ‘has function’ some ‘to regulate gene expression’
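
To make the qualified cardinality restriction concrete, here is a small Python sketch that tests "'has proper part' min 2 'amino acid residue'" against explicitly listed facts. It is a closed-world data check over made-up individuals (p1, m1, r1 to r3 are hypothetical), whereas an OWL reasoner works under the open-world assumption and without unique names, so this only approximates the semantics.

```python
def has_min(individual, relation, filler_class, n, facts, types):
    """True if `individual` has at least n distinct `relation` values of type `filler_class`."""
    fillers = {
        obj
        for (subj, rel, obj) in facts
        if subj == individual and rel == relation
        and filler_class in types.get(obj, set())
    }
    return len(fillers) >= n


# Hypothetical instance data, for illustration only.
facts = {
    ("p1", "has proper part", "r1"),
    ("p1", "has proper part", "r2"),
    ("m1", "has proper part", "r3"),
}
types = {
    "r1": {"amino acid residue"},
    "r2": {"amino acid residue"},
    "r3": {"amino acid residue"},
}

p1_ok = has_min("p1", "has proper part", "amino acid residue", 2, facts, types)
m1_ok = has_min("m1", "has proper part", "amino acid residue", 2, facts, types)
```

Under these facts p1 meets the "min 2" condition while m1, with a single residue, does not.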
What do the following mean, and what biological thing might you
annotate with it?

C equivalentTo
  ‘has part’ exactly 2 polypeptide

M subClassOf
  DNA and not molecule

OWL has multiple syntaxes
Functional-Style Syntax

ClassAssertion( :Person :Robert)

RDF Syntax
RDF/XML

<Person rdf:about="Robert"/>

RDF Turtle

:Robert rdf:type :Person .

Manchester Syntax

Individual: Robert
Types: Person

OWL/XML Syntax

<ClassAssertion> <Class IRI="Person"/> <NamedIndividual IRI="Robert"/> </ClassAssertion>
OWL Reasoners

OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• Fact++: Manchester University, open-source, C++ with a Java API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.

OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:



Formalization of XML/RDF using OWL

• For every triple, we want to create an axiom that
  makes a commitment as to what the terms refer
  to and what their combination necessarily
  implies.
• We will also commit to expressing our knowledge
  in a consistent manner, and this will allow other
  information resources to be semantically
  integrated (the expressions are comparable and
  share the same semantics)


Triples to axioms

Convert RDF triples into OWL axioms.

Triple in RDF:
<nucleus> <part-of> <cell>

• Nucleus and Cell are classes
• part-of is a relation between 2 classes
• intended meaning:
   every instance of Nucleus is part-of some instance of Cell

• formalize as OWL axiom:
    Nucleus SubClassOf:
     part-of some Cell

Triples to axioms: many possible formalizations (knowledge of logics
and domain expertise comes in handy here!)

Convert RDF triples into OWL axioms.

Triple in RDF:
<C1> <R> <C2>
 • C1 and C2 are classes, R a relation between 2 classes
 • intended meaning:
     o C1 SubClassOf: C2
     o C1 SubClassOf: R some C2
     o C1 SubClassOf: R only C2
     o C2 SubClassOf: R some C1
     o C1 SubClassOf: S some C2
     o C1 DisjointWith: C2
     o C1 and C2 SubClassOf: owl:Nothing
     o R some C1 DisjointWith: R some C2
     o C1 EquivalentTo: C2
     o ...
 • in general: P(C1, C2), where P is an OWL axiom (template)

Challenge: formalizing data requires one to commit to a particular
meaning, i.e. to make an ontological commitment.
Triples to axioms

Triple in RDF:
<Cytosol> <isLocationOf> <HXK1>

• Cytosol and HXK1 are classes
• isLocationOf is an axiom pattern involving 2 classes
• intended meaning:
   every instance of HXK1 is located at some instance of Cytosol
• not intended:
   for every instance of Cytosol, there is an instance of HXK1 located
      in it.

HXK1 subClassOf
 hasLocation some Cytosol
or, using the inverse of isLocationOf:
HXK1 subClassOf
 inv(isLocationOf) some Cytosol

Triples to axioms: challenges

Formalizing RDF triples in OWL may introduce new OWL
object properties.

 • Which object properties should be included?
 • What axioms hold for included object properties?
 • Can domain and range restrictions be generalized across
   multiple domains, i.e., reused across multiple linked data
   sources to ensure consistency between them?

Integration of OWL ontologies requires a common
semantic platform


Axiom Patterns for Triples

<nucleus> <part-of> <cell>

?X part-of ?Y

• translated to axiom pattern
?X subClassOf: part-of some ?Y

-> Nucleus subClassOf: part-of some Cell
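
The pattern expansion above can be sketched in a few lines of Python. This is only an illustration using string templates, not the OWL API based implementation used in the tutorial; the PATTERNS table and the expand function are assumed names, and the second pattern restates the HXK1 example from the earlier slide.

```python
# Axiom patterns keyed by the RDF predicate; ?X is the subject, ?Y the object.
PATTERNS = {
    "part-of": "?X SubClassOf: part-of some ?Y",
    "isLocationOf": "?Y SubClassOf: hasLocation some ?X",
}


def expand(triple, patterns=PATTERNS):
    """Fill the pattern registered for the triple's predicate with subject and object."""
    subject, predicate, obj = triple
    template = patterns[predicate]
    return template.replace("?X", subject).replace("?Y", obj)


nucleus_axiom = expand(("Nucleus", "part-of", "Cell"))
hxk1_axiom = expand(("Cytosol", "isLocationOf", "HXK1"))
```

Note that the pattern, not the triple's word order, decides which class ends up on the left of SubClassOf; this is how the "not intended" reading of the Cytosol triple is avoided.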


Implementation

• expand relations in RDF based on relational patterns
• relational patterns are OWL axioms with 2 variables (which
  are filled by subject and object, respectively)
• implementation based on OWL API
• adopt implementation of relational patterns in OBO
  language (http://code.google.com/p/obo2owl/)

Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre,
Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL
and their application to OBO. OWL: Experiences and Directions (OWLED).
paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf
presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo

BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441


Another way?
               http://oppl2.sourceforge.net/
• OPPL is an abstract formalism that allows for
  manipulating ontologies written in OWL.
• Use OPPL to select triples and create the axioms




Which types and relations should we use for our axiom patterns?

Top-level ontologies contain generalized (domain-independent) classes
and relations




They can be used to constrain what can be said about these
entities (and hence will later be useful for checking the
consistency of data annotated using these terms).
Basic classes in top-level ontologies

• Material entity
  • Example: Apple, Human, Cell, Planet
  • Has mass as a quality
  • Located in space and time
  • Independent of other entities
  • Exists as a whole whenever it exists

• Quality
  • Example: mass, color, concentration
  • Dependent: always the quality of some entity
  • Quality of object: size, shape, length
  • Quality of process: duration, rate
  • Quality of quality: shade (of color), intensity

Basic classes in top-level ontologies

• Function
  • e.g. to bind, to catalyze (a reaction), to kill bacteria
  • Dependent: always the function of some thing
  • Similar to a property of an object
  • Represents the potential to do something (an action) in
    some process
  • capabilities, dispositions and tendencies

• Process
  • Example: running a marathon, binding, cell division
  • Located in space and time
  • Independent of other entities
  • Temporally extended

Top-level ontologies can make a commitment to these being disjoint




Material object, Process, Function and Quality are mutually
disjoint.
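
A minimal Python sketch of what this disjointness commitment buys us: any class placed under two of the four categories cannot have instances. The category names follow the slide; the function name and the example annotations are hypothetical, and a real reasoner would of course derive the superclasses rather than read them from a table.

```python
from itertools import combinations

# The four mutually disjoint top-level categories named on the slide.
DISJOINT_TOP_LEVEL = ("Material object", "Process", "Function", "Quality")


def disjointness_violations(asserted_superclasses):
    """Map each class to the pairs of disjoint top-level categories it falls under."""
    violations = {}
    for cls, supers in asserted_superclasses.items():
        hits = [c for c in DISJOINT_TOP_LEVEL if c in supers]
        if len(hits) > 1:
            violations[cls] = list(combinations(hits, 2))
    return violations


# Hypothetical classes: the second one is asserted under two disjoint categories.
asserted = {
    "Glycolysis": {"Process"},
    "BadAnnotation": {"Material object", "Process"},
}
violations = disjointness_violations(asserted)
```

Only "BadAnnotation" is reported, which corresponds to an unsatisfiable class in the OWL formalization.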




Basic Relations in Top Level Ontologies

• relations (object properties) in OWL hold between
  instances
• Mereological: parthood
  o ‘has part’, ‘has proper part’, ‘has component part’
• Participatory
  o ‘is participant in’, ‘is agent in’, ‘is target in’
• Spatial
  o ‘is connected to’, ‘located in’, ‘contains’, ‘is adjacent to’
• Temporal
  o ‘derives from’, ‘precedes’, ‘meets’, ‘overlaps’, etc.
• Referential
  o ‘describes’, ‘denotes’, ‘represents’
Relations in top-level ontologies

• domain and range restrictions from top-level
  ontology can be applied for general relations,
  e.g.:
  • ‘has material part’ can be restricted with "Material
    object" as both domain and range
  • ‘participates in’ can be restricted with a domain of
    "Material object" and a range of "Process"
• re-use of relations (between instances) enables
  inferences across resources




Relations impose additional constraints,
such that inconsistencies arise when
incorrectly used




Alignment with top-level ontology

Grounding of domain classes and relations in a top-level
ontology:
 • every domain class becomes a subclass of a class in the
   top-level ontology
 • every object property used in OWL axioms becomes a
   sub-property of an object property in the top-level ontology
 • assert additional axioms to restrict domain classes and
   delimit them from other domains (where appropriate)
    o e.g., if a particular resource uses (in RDF) the relation
      part-of exclusively between processes, this additional
      constraint can be added to the relation



What’s the role of top-level ontologies?

Top-level ontology

Application of a top-level ontology:

• can help to make the ontological commitment that is
  employed within an information system explicit,
• can guarantee basic agreement about fundamental,
  common types,
• can guarantee basic agreement about common relations,
• provides common domain and range restrictions across
  multiple domains, and therefore
• enables re-use of relations and types across data sources,
  domains, levels of granularity and information systems.




Formalization of SBML Models:
 • SBML models and model annotations are converted into OWL axioms
   by making SBML's ontological commitment explicit
 • Implementation as conversion patterns

An explicit ontological commitment establishes and implements a
one-to-one correspondence between SBML expressions and a formal
interpretation within an ontology.


Bridging the gap: combine in vivo entities and in silico entities
in a common model (an ontology) defined with axioms




Formalization

Reaction:

A reaction represents some transformation, transport or binding
process, typically a chemical reaction, that can change the
amount of one or more species. (Hucka et al.)

vs

a Model component that is part-of a Model and represents
some Process




Formalizing SBML models using OWL

Model component(x): a model entity that is part of a model

'model component' equivalentTo
 'model entity' and 'is part of' some 'model'




Assumption 1: Every model represents a material entity
OWL Axiom:
Model SubClassOf: represents some MaterialEntity

Conversion rule: a Model annotated with class C represents:

If C is a SubClassOf MaterialEntity then
M SubClassOf: represents some C

If C is a SubClassOf Function then
M SubClassOf: represents some (has-function some C)

If C is a SubClassOf Process then
M SubClassOf: represents some (has-function some (realized-by only
C))
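
The conversion rule can be written down as a small Python function that picks the axiom shape from the top-level category of the annotation class. This is a sketch that emits Manchester-style strings; the function name and the idea of passing the category in as a string are assumptions, and in practice the category would be obtained by reasoning over the referenced ontology.

```python
def model_axiom(model, annotation, category):
    """Build the 'represents' axiom for a model annotated with class `annotation`."""
    if category == "MaterialEntity":
        filler = annotation
    elif category == "Function":
        filler = f"(has-function some {annotation})"
    elif category == "Process":
        filler = f"(has-function some (realized-by only {annotation}))"
    else:
        raise ValueError(f"no conversion rule for category {category!r}")
    return f"{model} SubClassOf: represents some {filler}"


# GO:0031684 is the process annotation used for BIOMODEL 82 on the next slide.
axiom = model_axiom("M", "GO:0031684", "Process")
```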
BIOMODEL 82: Converting Model

Annotated with heterotrimeric G-protein complex cycle
(GO:0031684):

   • represents an object O1
   • O1 has a function F1
   • F1 is realized by processes of the type heterotrimeric
     G-protein complex cycle
   • M SubClassOf: represents some O1
   • O1 SubClassOf: has-function some (realized-by
     only GO:0031684)

Assumption 2: Every compartment represents a material object
Compartment(x): a model component that represents a
material object which is part of the object represented by the
model to which the component belongs

  Compartment subClassOf 'model component'
   and represents some 'Material object'

Conversion rule:
  • represents an object O2
  • O2 is part of the object O1 represented by the model
  • compartment’s species represent objects that are located in O2
  • C SubClassOf: represents some O2
  • O2 SubClassOf: located-in some O1

BIOMODEL 82: Converting Compartment “Cell”

Annotated with Cell (GO:0005623)

   • represents an object O2
   • O2 is a kind of Cell
   • O2 is a part of O1 (represented by BIOMODEL 82)


   • C SubClassOf: represents some O2
   • O2 SubClassOf: Cell and part-of some O1

Assumption 3: Every species represents a material object
Species(x): a model component that represents a material
object which is part of the entity represented by the
compartment of which the species is a part

 Species subClassOf 'model component'
  and represents some 'Material object'

Species represents an O3 which
  • can have functions
  • the functions can be realized by processes
  • can have qualities (charge, amount, …)
  • is located in O2
BIOMODEL 82: Converting Species “GTP”

Annotated with GTP (CHEBI:15996)

  • represents an object O3
  • O3 is a kind of GTP
  • O3 is located-in O2 (represented by the “Cell” compartment)


  • S SubClassOf: represents some O3
  • O3 SubClassOf: GTP and located-in some O2
  • O3 SubClassOf: GTP and located-in some (Cell
    and part-of some (has-function some (realized-by
    only GO:0031684)))

Reactions as Functions, not Processes

Reactions represent Functions. Why not processes?

- Functions are capabilities while processes are
manifestations of these capabilities
- Processes have a duration, a time of occurrence,
participants, etc.
- Functions can be realized multiple times,
processes occur only once
- Processes may be represented by simulations

Assumption 4: Every reaction represents a functional entity
Reaction(x): a model component that can include reactants,
products and modifiers and represents a functional entity

 Reaction subClassOf 'model component'
  and 'represents' some (
    ‘material entity’ and ‘has function’ some Function)

ListOfReactions(x): a List that has only Reactions as members

ListOfReactions
   EquivalentTo:
     List and 'has member' only 'reaction'
BIOMODEL 82: Converting Reaction “GTP-binding”

Annotated with GTP binding (GO:0005525)

  • represents an object O4
  • O4 has a function F4
  • F4 is a kind of GTP binding
  • F4 is realized by P4
  • P4 has-input O3 (GTP)


  • R SubClassOf: represents some (has-function some F4)
  • F4 SubClassOf: GTP binding and realized-by only P4
  • P4 SubClassOf: has-input some O3
How would you formalize a model
annotated with:

A) heart
B) to pump blood
C) heart palpitations
SBML2OWL: Implementation

1. Read the model
   • libSBML - http://sbml.org/Software/libSBML
2. Extract annotations from model & components
   • libSBML & Jena - http://jena.sourceforge.net
3. Formalize each annotation according to the formalization
   rules
   • OWLAPI - http://owlapi.sourceforge.net/
4. Integrate with external ontologies
   • OWLAPI
5. Reasoning




SBML2OWL: Implementation

Application to BioModels repository yields:
• OWL ontology with
   • more than 300,000 classes
   • more than 800,000 axioms
   • 90,000 complex model annotations

• includes all referenced ontologies
   o GO
   o ChEBI
   o Celltype
   o FMA
   o PATO
   o (KEGG, Reactome)


SBML2OWL: Implementation

OWLAPI:
• Ontology consists of
  o a signature (classes, object properties, individuals)
  o a set of axioms




SBML2OWL: Implementation

Reference implementation: SBML
Harvester http://code.google.com/p/sbmlharvester/




Verification, querying, integration

What can we do with the combined knowledge base?

1. Verification
2. Querying
3. Interoperability and knowledge integration




Operations on OWL ontologies

Consistency checking will identify contradictions in the stated
and inferred knowledge. Consistency checking also helps to
implement other reasoning tasks.
 • Satisfiability: determines whether classes can have
   instances.
 • Subsumption: is class C1 implicitly a subclass of C2?
   Check if C1 and not C2 is unsatisfiable, i.e., there is no
   instance of C1 that is not also an instance of C2
 • Classification: repetitive application of subsumption to
   discover implicit subclass links between named classes
 • Realization: find the most specific class that an individual
   belongs to. Does individual a classify into the class C?
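
The reduction of subsumption to unsatisfiability can be shown with a toy Python check. It assumes a drastically simplified setting in which the ontology contains only atomic subclass axioms, so every class behaves like a propositional variable and all membership combinations can be enumerated; real OWL reasoners use tableau or consequence-based procedures instead. The class names are illustrative.

```python
from itertools import product


def is_subsumed(c1, c2, subclass_axioms, classes):
    """c1 is a subclass of c2 iff 'c1 and not c2' is unsatisfiable under the axioms."""
    names = sorted(classes)
    for values in product([False, True], repeat=len(names)):
        member = dict(zip(names, values))
        respects_axioms = all(
            (not member[sub]) or member[sup] for sub, sup in subclass_axioms
        )
        if respects_axioms and member[c1] and not member[c2]:
            return False  # a possible instance of c1 that is not an instance of c2
    return True


# Hypothetical asserted axioms: Enzyme is a Protein, Protein is a Molecule.
axioms = [("Enzyme", "Protein"), ("Protein", "Molecule")]
classes = {"Enzyme", "Protein", "Molecule"}

implicit = is_subsumed("Enzyme", "Molecule", axioms, classes)
reverse = is_subsumed("Molecule", "Enzyme", axioms, classes)
```

Classification amounts to running this test for every pair of named classes, which is why it is described above as repetitive application of subsumption.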


Practical reasoning with OWL ontologies
• Ontology editors such as Protege interface with reasoners to
  perform consistency and class satisfiability checking,
  classification and realisation, and to provide explanations.

• Some reasoners are set up to be used from the command line
  to execute requests, including SPARQL querying.

• Programmatic use of reasoners via APIs offers maximal flexibility,
  e.g., one can request all subclasses of a given class,
  including implicit ones, or all entailed statements with a
  specified subject and predicate



Operations on OWL ontologies (continued)

 • Realization check: to decide whether individual a classifies into
   the class C, check whether a : ¬C is consistent with the underlying
   ontology. If adding a : ¬C makes the ontology inconsistent, then a
   is inferred to be an instance of C.

Classifying the ontology




Verification

• Use of OWL reasoning for classification
• Which classes are unsatisfiable?
• Unsatisfiable classes are equivalent to owl:Nothing




Model verification

After reasoning, we found 27 models to be inconsistent

Reasons:
 1. our representation: functions are sometimes found in the place
    of physical entities (e.g. entities that secrete insulin); it is
    better to constrain these with appropriate relations
 2. SBML misuse: a species used as a measure of time
 3. constraints in the ontologies themselves mean that the
    annotation is simply not possible




Compartments/species annotated with functions or processes




Biological inconsistency: Biomodel 176

[Term]
id: GO:0016887
name: ATPase activity
is_a: GO:0017111
intersection_of: GO:0003824 ! catalytic activity
intersection_of: has_input CHEBI:15377 ! water
intersection_of: has_input CHEBI:15422 ! ATP
intersection_of: has_output CHEBI:16761 ! ADP
intersection_of: has_output CHEBI:26020 ! phosphates




Finding inconsistencies with axiomatically enhanced ontologies

We add:
• GO: ATP and water are the only inputs (exactly 2 quantification)
• ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all
  different (disjointness)
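
The effect of these two additions can be mimicked in Python: once the inputs of a function are closed and the chemical classes are treated as pairwise different, any other input is a conflict. This is a sketch under strong assumptions: names stand in for disjoint ChEBI classes, the reaction is hypothetical, and a reasoner would report an unsatisfiable class rather than a list.

```python
# Closed set of permitted inputs per function, following the added GO axiom.
CLOSED_INPUTS = {"ATPase activity": {"water", "ATP"}}


def unexpected_inputs(function_name, reaction_inputs):
    """Return the inputs that the closed definition of the function does not allow."""
    return sorted(set(reaction_inputs) - CLOSED_INPUTS[function_name])


# Hypothetical reaction annotated with 'ATPase activity' but consuming a sugar phosphate.
conflicts = unexpected_inputs(
    "ATPase activity", ["ATP", "alpha-D-glucose 6-phosphate"]
)
```

Without the disjointness axioms the sugar phosphate could still be interpreted as water or ATP, so no contradiction would be derived; that is why both additions are needed.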




Consistency repair

 • Unsatisfiable classes result from contradictory class
   definitions
 • Conflict in asserted axioms, in imported ontologies or
   through combination of both
 • Conflicts can be hidden through domain/range
   restrictions, subclass relations, axioms for relations,
   etc.
 • Conflicting axioms may be challenging to identify!




Protege 4: Explanation Workbench




Ontology repair and disambiguation

 • Ontological commitment may have been too strong
 • Complex relations (between classes) can be relaxed by
   explicitly introducing a disjunction

 • Example:
  o Assumption 1: models represent material objects
  o model is annotated with the process Glycolysis
  o process and material object are disjoint, therefore the
    KB will contain unsatisfiable classes




Disambiguation pattern




Disambiguation pattern: a model annotated with X represents
 • material objects X, or
 • material objects with function X, or
 • material objects with a function that is realized by X.

Disambiguation patterns are applicable if the multiple alternatives
are mutually disjoint.

Automated reasoning will then eliminate all but one option.

Disambiguation: Model annotations

Assertion:
M SubClassOf: represents some C or represents
some (has-function some C) or represents some
(has-function some (realized-by only C))

C SubClassOf: MaterialEntity
Then:
• represents some C is satisfiable
• represents some (has-function some
  C) and represents some (has-function some
  (realized-by only C)) are unsatisfiable
Disambiguation: Model annotations

Assertion:
M SubClassOf: represents some C or represents
some (has-function some C) or represents some
(has-function some (realized-by only C))

C SubClassOf: Function
Then:
• represents some (has-function some C) is
  satisfiable
• represents some C and represents some (has-
  function some (realized-by only C)) are
  unsatisfiable
Disambiguation: Model annotations

Assertion:
M SubClassOf: represents some C or represents
some (has-function some C) or represents some
(has-function some (realized-by only C))

C SubClassOf: Process
Then:
• represents some (has-function some (realized-by
  only C)) is satisfiable
• represents some C and represents some (has-
  function some C) are unsatisfiable
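
The three cases can be summarized in one Python sketch: each disjunct is only satisfiable for one top-level category of C, so knowing the category leaves exactly one reading. The table below encodes the outcome stated on the slides rather than deriving it; the names DISJUNCTS and surviving_disjuncts are assumptions.

```python
# Each disjunct paired with the category of C for which the slides call it satisfiable.
DISJUNCTS = [
    ("MaterialEntity", "represents some {c}"),
    ("Function", "represents some (has-function some {c})"),
    ("Process", "represents some (has-function some (realized-by only {c}))"),
]


def surviving_disjuncts(annotation, category):
    """Keep the disjuncts that remain satisfiable once the category of C is known."""
    return [
        template.format(c=annotation)
        for required, template in DISJUNCTS
        if required == category
    ]


glycolysis_reading = surviving_disjuncts("Glycolysis", "Process")
```

For the Glycolysis example this leaves only the "function realized by" reading, which is the repair for the unsatisfiable class discussed earlier.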
Aside from the disjunction pattern, what else could be used for
consistency repair?

Once consistent, we can query the ontology and infer new knowledge

What would YOU ask of your formalized knowledge base?

Knowledge discovery and retrieval

 • All queries are of the form:
  o   Query class: Y
  o   List all subclasses (and descendant classes),
      equivalent classes, superclasses (and ancestor
      classes)
  o   Some OWL reasoners perform only classification and
      output the classified taxonomy
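
Once a reasoner has produced the classified taxonomy, a subclass query reduces to a graph traversal. The Python sketch below assumes the taxonomy is available as (subclass, superclass) links; the links shown are hypothetical, with model identifiers written in the BIOMD style used later in these slides.

```python
from collections import defaultdict


def descendants(query_class, subclass_links):
    """Return all direct and indirect subclasses of `query_class`."""
    children = defaultdict(set)
    for sub, sup in subclass_links:
        children[sup].add(sub)
    found, stack = set(), [query_class]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Hypothetical classified taxonomy, for illustration only.
links = [
    ("BIOMD0000000082", "Model"),
    ("BIOMD0000000169", "Model"),
    ("Model", "Model entity"),
    ("Compartment", "Model entity"),
]
models = descendants("Model", links)
```

Queries with a complex class expression work the same way after the reasoner has placed the expression in the taxonomy.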




Knowledge discovery and retrieval

 • Query: list all models
 • Query type: subclasses
 • Query class: Model




Knowledge discovery and retrieval

 • Query: list all reactions that are part of
   BIOMD0000000169
 • Query type: subclasses
 • Query class:
 Reaction and part-of some BIOMD0000000169




Knowledge discovery and retrieval

 • Query: list all models that represent Glycolysis
 • Query type: subclasses
 • Query class:
 Model and represents some (has-function some
   (realized-by only Glycolysis))




Knowledge discovery and retrieval

 • Query: list all models that have a compartment
   that represents a part of a Cell in which a sugar
   is located
 • Query type: subclasses
 • Query class:
 Model and has-part some (Compartment and
   represents some (part-of some Cell and
   contains some Sugar))




Knowledge discovery and retrieval

 • Query: list all Model entities that represent
   catalytic activity involving sugar in the
   endocrine pancreas
 • Query type: subclasses
 • Query class:
 represents some (has-function some 'catalytic
   activity' and realized-by only (has-participant
   some (sugar and contained-in some (part-of
   some 'Endocrine pancreas'))))



Knowledge discovery and retrieval

 • Query: list all Model entities that represent
   mutagenic central nervous system drugs in the
   gastrointestinal system
 • Query type: subclasses
 • Query class:
 represents some (has-part some ('has role' some
   'central nervous system drug' and 'has role'
   some mutagen and part-of some
   'Gastrointestinal system'))



Answering questions




Automated reasoning

• more than 800,000 axioms
• each included ontology contains several thousand axioms
   o GO has approx. 35,000 classes
   o ChEBI contains almost 100,000 classes
   o complex definitions of classes create links between
     large ontologies
• Reasoning in OWL 2 DL is highly complex (worst case
  2NEXPTIME-complete, i.e. doubly exponential, 2^(2^n), with n the
  number of operators used in the ontology)

• Consequence: OWL reasoning can rarely be employed on
  a large scale.
• Expressive OWL reasoners do not classify the formalized
  BioModels repository.
Implementation in information systems

• Classification of model ontology: 10-120min
• Answering complex queries: up to several
  hours
• Consequence: OWL reasoning can rarely be
  employed on a large scale

• Subsets of OWL allow tractable (polynomial-time)
  automated reasoning
• OWL EL is suitable for ontologies with a large
  number of classes
• Problem: convert ontologies into a tractable
  subset of OWL
OWL Profiles


• OWL 2 defines three different tractable profiles:
  • EL
   o polynomial time reasoning for schema and data
   o Useful for ontologies with a large conceptual part
  • QL
   o fast (logspace) query answering using RDBMSs via SQL
   o Useful for large datasets already stored in RDBs
  • RL
   o fast (polynomial) query answering using rule-extended
     DBs
   o Useful for large datasets stored as RDF triples



OWL RL

Features:
 • identity of classes, instances, properties
 • subproperties, subclasses, domains, ranges
 • union and intersection of classes (some restrictions)
 • property characterizations (functional, symmetric, etc)
 • property chains
 • keys
 • some property restrictions (but not all inferences are
   possible)
Limitations:
 • not all datatypes are available
 • no datatype restrictions
 • no minimum or exact cardinality restrictions
 • maximum cardinality only with 0 and 1
 • some consequences cannot be drawn
OWL EL
Features
 • existential quantification to a class expression or data range
 • existential quantification to an individual or a literal
 • self-restriction
 • enumerations involving a single individual or a single literal
 • intersection of classes and data range
 • class axioms: subClassOf, equivalence, disjointness
 • property axioms: domain, range, equivalence, transitive, reflexive, inclusion
   with or without property chains; functional data properties; keys
 • assertions (sameAs, differentFrom, Class, Object Property, Data Property,
   Negative Object/Data Property)
Not supported
 • universal quantification to a class expression or a data range
 • cardinality restrictions
 • disjunction (union)
 • class negation
 • enumerations involving more than one individual
 • object properties: disjoint, symmetric,
   asymmetric, irreflexive, inverse, functional and inverse-functional
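
A quick way to see which class expressions fall inside OWL EL is a recursive check over the expression structure. The sketch below covers only the class constructors named on this slide and ignores data ranges and property axioms. The tuple encoding and the function name `is_el` are hypothetical conventions chosen for illustration, not part of any OWL library.

```python
# Sketch: test whether a class expression uses only EL class constructors.
# Hypothetical tuple encoding:
#   "A"                      named class
#   ("and", e1, e2, ...)     intersection
#   ("some", prop, e)        existential quantification
#   ("oneOf", a, ...)        enumeration of individuals
#   ("or", ...), ("not", e), ("only", prop, e)   outside EL

def is_el(expr):
    if isinstance(expr, str):          # named class
        return True
    op, *args = expr
    if op == "and":
        return all(is_el(a) for a in args)
    if op == "some":
        _prop, filler = args
        return is_el(filler)
    if op == "oneOf":                  # EL allows a single individual only
        return len(args) == 1
    return False                       # union, negation, universal, ...

in_el = ("some", "has-part", ("and", "A", "B"))
not_in_el = ("and", "A", ("not", "B"))
```

Here `is_el(in_el)` is `True`, while `is_el(not_in_el)` is `False` because of the class negation.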

Ontology modularization

Can we automatically extract a large (maximal) OWL (EL, QL,
RL) module from an ontology?


 1. D EquivalentTo: not A                (not EL)
 2. C EquivalentTo: not B                (not EL)
 3. B subClassOf: A                     (EL)
Inference:
 • D subClassOf: C                       (EL) (Inference from (1)-(3))

EL module of (1)-(3):
• {B subClassOf: A}, or
• {B subClassOf: A, D subClassOf: C}
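
The difference between the two candidate modules can be checked by brute force. Because axioms (1)-(3) use no properties, each class can be treated as a Boolean atom for one arbitrary individual, and entailment reduces to enumerating truth assignments. This propositional shortcut is an assumption that holds only for this small example; it is not a general OWL reasoning procedure, and the function names are illustrative.

```python
from itertools import product

# Each class is a Boolean atom describing one arbitrary individual.

def axioms_hold(a, b, c, d):
    return (d == (not a)       # (1) D EquivalentTo: not A
            and c == (not b)   # (2) C EquivalentTo: not B
            and (not b or a))  # (3) B subClassOf: A

def entails_d_sub_c():
    """True if every assignment satisfying (1)-(3) also satisfies D -> C."""
    return all((not d) or c
               for a, b, c, d in product([False, True], repeat=4)
               if axioms_hold(a, b, c, d))

def el_axiom_alone_entails_d_sub_c():
    """Same check using only the EL axiom (3)."""
    return all((not d) or c
               for a, b, c, d in product([False, True], repeat=4)
               if (not b or a))
```

The first check succeeds and the second fails: the module {B subClassOf: A} loses the inference D subClassOf: C, which is why adding the inferred EL axiom to the module preserves more of the original ontology.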

EL Vira modularization
                                              http://el-vira.googlecode.com
•   ontology modularization
•   identify EL, QL, RL axioms in deductive closure
•   retain signature of ontology
•   maximality is an open problem




Outcomes

The SBML-derived ontologies can be used to

 i) check their consistency, thereby uncovering erroneous
curations

 ii) infer attributes and relations of the substances,
compartments and reactions beyond what was originally
described in the models

 iii) answer sophisticated questions across a model knowledge
base



Questions?
Phenotypes

Phenotypes are observable characteristics of an
 organism.
Examples include:
  – Red hair
  – Heart rate of 120bpm
  – Absent arm
  – Malfunctional liver
Phenotypes include comparisons such as
 Increased heart rate

Phenotype and anatomy ontologies
anatomy ontologies: > 100,000 classes
    – FMA, MA, WA, ZFA, FA, GO-CC, ...
phenotype ontologies: > 20,000 classes
    – HPO, MP, WBPhenotype, FBcv, APO, ...
quality ontology: > 2,000 classes
    – PATO
process and function ontologies: > 25,000 classes
    – Gene Ontology, ...
alignments between anatomy ontologies
    – UBERON, various mappings
Phenotype: Example question



Find all regions in the human, mouse, fish, fly,
worm and yeast genome that are associated
with tetralogy of Fallot.
Tetralogy of Fallot


– Overriding aorta (HP:0002623)
– Ventricular septal defect (HP:0001629)
– Pulmonic stenosis (HP:0001642)
– Right ventricular hypertrophy (HP:0001667)
Phenotype descriptions


Overriding aorta (HP:0002623):
  – Q: overlap with (PATO:0001590)
  – E1: Aorta (FMA:3734)
  – E2: Membranous part of interventricular septum
    (FMA:7135)

HP:0002623 EquivalentTo:
 phene-of some (has-part some (FMA:3734 and
 has-quality some (PATO:0001590 and towards some FMA:7135)))
Human-mouse anatomy mappings
Overriding aorta (HP:0002623):
  – Q: overlap with (PATO:0001590)
  – E1: Aorta (FMA:3734)
      • FMA:3734 EquivalentTo: MA:0000062
  – E2: Membranous part of interventricular septum
    (FMA:7135)
      • FMA:7135 EquivalentTo: MA:0002939
Mouse phenotype

Overriding aorta (MP:0000273):
   – Q: overlap with (PATO:0001590)
   – E1: Aorta (MA:0000062)
   – E2: Membranous interventricular septum (MA:0002939)

MP:0000273 EquivalentTo:
phene-of some (has-part some (MA:0000062 and
has-quality some (PATO:0001590 and towards some
MA:0002939)))

Consequence: MP:0000273 EquivalentTo: HP:0002623
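
The way the anatomy mappings make the human and mouse definitions coincide can be sketched in a few lines of Python. The sketch rewrites FMA classes to their mapped MA classes and then compares the two definitions structurally. Structural equality after substitution is a simplification of what an OWL reasoner does, and the tuple encoding and helper names (`pattern`, `rewrite`) are assumptions made for illustration.

```python
# Sketch: apply the FMA-to-MA equivalences and compare the definitions.

ANATOMY_MAP = {"FMA:3734": "MA:0000062", "FMA:7135": "MA:0002939"}

def pattern(e1, quality, e2):
    """phene-of some (has-part some (E1 and
    has-quality some (Q and towards some E2)))"""
    return ("some", "phene-of",
            ("some", "has-part",
             ("and", e1,
              ("some", "has-quality",
               ("and", quality, ("some", "towards", e2))))))

def rewrite(expr, mapping):
    """Replace every mapped class name inside a nested expression."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    return tuple(rewrite(part, mapping) for part in expr)

hp_0002623 = pattern("FMA:3734", "PATO:0001590", "FMA:7135")
mp_0000273 = pattern("MA:0000062", "PATO:0001590", "MA:0002939")
same_definition = rewrite(hp_0002623, ANATOMY_MAP) == mp_0000273
```

The two definitions differ as written, but `same_definition` is `True` once the anatomy equivalences are applied.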
Absence: absent appendix

Absent appendix:
  – Q: lacks all parts of type (PATO:0002000)
  – E1: Human body (FMA:20394)
  – E2: Appendix (FMA:14542)

AbsentAppendix EquivalentTo: LacksParts and towards some Appendix and
inheres-in some HumanBody

AbsentAppendix EquivalentTo: LacksParts and towards some {Appendix} and
inheres-in some HumanBody

AbsentAppendix EquivalentTo: phene-of some (HumanBody and not has-part
some Appendix)
Absence and inconsistency




AbsentAppendix SubClassOf: phene-of some (HumanBody and not has-part
some Appendix)

HumanBody SubClassOf: has-part some Appendix

HumanBody(John). AbsentAppendix(x). has-phene(John,x).
Inconsistency removal


– Removal of conflicting axioms (has-part/part-of in anatomy)
– Contextualize anatomy:
   • Normal and HumanBody SubClassOf: has-part some
     (Normal and Appendix)
– Use of non-monotonic reasoning
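
The clash and the effect of the contextualization repair can be illustrated with a propositional sketch. AbsentAppendix(x) requires some individual y that is a HumanBody without an appendix as part; the code checks whether such a y can exist under each version of the anatomy axiom. Reducing the class expressions to three Boolean atoms is an assumption made for this example only, and the function names are illustrative.

```python
from itertools import product

# Atoms for one arbitrary individual y:
#   h = HumanBody(y), p = (has-part some Appendix)(y), n = Normal(y)
# The witness required by AbsentAppendix is: h and not p.

def witness_possible(anatomy_axiom):
    """True if some assignment satisfies the axiom together with h and not p."""
    return any(anatomy_axiom(h, p, n) and h and not p
               for h, p, n in product([False, True], repeat=3))

def original(h, p, n):
    # HumanBody SubClassOf: has-part some Appendix
    return (not h) or p

def contextualized(h, p, n):
    # Normal and HumanBody SubClassOf: has-part some (Normal and Appendix),
    # which in particular implies has-part some Appendix
    return (not (n and h)) or p
```

With the original axiom no witness exists, so asserting an instance of AbsentAppendix makes the ontology inconsistent. With the contextualized axiom a witness exists as long as the body is not asserted to be Normal.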
Ontology of phenotypes
Different formal expressions for phenotypes based on
   – qualities,
   – anatomical parts,
   – functions,
   – processes
Tetralogy of Fallot
Mouse model
PhenomeBLAST

   – apply definition patterns to yeast, fly, worm, fish, mouse
     and human phenotypes and integrate in single ontology
   – phenotype alignment through OWL reasoning
   – more than 300,000 classes and 1,000,000 axioms
   – combination of HermiT (for EL Vira modularization), CB
     and CEL reasoners
   – classification time: 7 minutes

http://phenomeblast.googlecode.org
Phenotype alignments
Comparison of phenotypes


direct comparison of phenotypes:
   – disease phenotypes, e.g., tetralogy of Fallot
   – phenotypes associated with genetic mutations
     (genotypes in mouse, fish, etc.)
Comparison of phenotypes


When the phenotype annotation of a genotype becomes a
subclass of a disease phenotype, then we can infer a gene-
disease association if
   – disease phenotypes are sufficient for having the disease
   – mutation phenotypes are necessary for having a specific
     genotype

Inference over ontologies can establish a formal proof for a
gene-disease association.
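
The subclass-based inference can be sketched as a lookup in an already classified hierarchy. In the Python sketch below, every identifier (class names, genotype and disease labels, function names) is a hypothetical placeholder, and the hierarchy is assumed to be the output of a reasoner rather than computed here.

```python
# Sketch: infer gene-disease candidates from a classified subclass hierarchy.
# All identifiers are hypothetical placeholders.

SUPERCLASSES = {
    "PhenotypeOfGenotypeG1": {"DiseasePhenotypeD1"},
    "DiseasePhenotypeD1": {"CardiovascularPhenotype"},
    "PhenotypeOfGenotypeG2": {"CardiovascularPhenotype"},
}

def ancestors(cls, hierarchy):
    """All strict superclasses of cls (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in hierarchy.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_subclass_of(sub, sup, hierarchy):
    return sub == sup or sup in ancestors(sub, hierarchy)

def infer_associations(genotype_phenotypes, disease_phenotypes, hierarchy):
    """Pairs (genotype, disease) whose phenotype classes are in a subclass relation."""
    return {(g, d)
            for g, gp in genotype_phenotypes.items()
            for d, dp in disease_phenotypes.items()
            if is_subclass_of(gp, dp, hierarchy)}

associations = infer_associations(
    {"G1": "PhenotypeOfGenotypeG1", "G2": "PhenotypeOfGenotypeG2"},
    {"D1": "DiseasePhenotypeD1"},
    SUPERCLASSES,
)
```

Only G1 is associated with D1 here: the phenotype of G2 shares a superclass with the disease phenotype but is not subsumed by it.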
Knowledge discovery

Similarity-based comparison allows for incomplete and noisy
information.

   – pairwise comparison of phenotypes
   – similarity: weighted Jaccard index
   – result: similarity matrix between phenotypes
   – (quantitative) evaluation based on predicting orthology,
     pathway, disease
   – identify novel gene-disease associations
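
A weighted Jaccard index over sets of phenotype classes can be sketched as follows. The slides do not specify the weighting scheme, so the per-class weights here (for instance an information-content score) are an assumption, as are the function names and the example weights.

```python
# Sketch of a weighted Jaccard index between two sets of phenotype classes.
# Classes without an explicit weight default to 1.0 (an assumption).

def weighted_jaccard(a, b, weight):
    union = a | b
    total = sum(weight.get(c, 1.0) for c in union)
    if total == 0:
        return 0.0
    shared = sum(weight.get(c, 1.0) for c in a & b)
    return shared / total

def similarity_matrix(profiles, weight):
    """Pairwise similarity between all named phenotype profiles."""
    names = sorted(profiles)
    return {(x, y): weighted_jaccard(profiles[x], profiles[y], weight)
            for x in names for y in names}

weights = {"HP:0002623": 3.0, "HP:0001629": 1.0, "HP:0001642": 2.0}
profiles = {
    "disease": {"HP:0002623", "HP:0001629"},
    "model": {"HP:0002623", "HP:0001642"},
}
matrix = similarity_matrix(profiles, weights)
```

For these example weights the shared class contributes 3.0 out of a total of 6.0, so `matrix[("disease", "model")]` is 0.5.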
Evaluation
http://PhenomeBrowser.net
What does the future hold?
• Better formalized ontologies
• Dynamic generation of knowledge through semantic web services
• …




Summary - RDF and OWL

RDF provides
• light-weight semantics
• fast queries
• highly scalable implementations
• large volumes of data (e.g., DBPedia, other Linked Data
  repositories)

OWL provides
• Constructs to formalize the intended semantics
• An OWLAPI to develop, manage, and serialize OWL
  ontologies
• Efficient reasoners to get inferences, compute modules
  and get explanations.
• syntactic subset for better performance, albeit some
  inferences may be lost
Summary - OWL & Formal languages
• Formal logic-based languages can be used to formalize the
  meaning of terms used in discourse. While normally
  restricted in terms of what can be expressed, the statements
  formed can be automatically reasoned about.

• OWL is based on description logics and formalizes the
  meaning of terms with axioms. Axioms can be used to
  characterize and distinguish classes, relations and
  individuals. Rich expressions can be crafted from logical
  combinations of language primitives including conjunction,
  disjunction, negation and object/data property restrictions.

• OWL reasoners provide a number of services including
  computing subsumption, satisfiability, entailment, realization
  and query answering.
Summary - Exploitation of ontologies

• verification: automated reasoning can reveal contradictory
  definitions of classes (unsatisfiable classes), instances that
  violate constraints in the ontology (often leading to
  inconsistent ontologies) and reveal hidden inferences (that
  may be considered invalid through manual verification)

• querying: ontologies define an explicit, formal language
  based on which queries to a knowledge base can be
  performed; queries can be made for instances and for
  classes satisfying complex conditions

• repair: through explicit definitions using disjunction,
  constraints can be relaxed and contradictions reduced

Summary - Ontology
             Ontology is not philosophy!
• an ontology is a specification of a conceptualization of a
  domain
• a conceptualization is a system of categories accounting for
  a particular view on the world
• ontologies are used to make some aspects of the intended
  meaning of terms in a vocabulary explicit
• ontologies (in computer science) may utilize philosophical
  theories
• formalized ontologies can be used by humans and
  automated systems as a basis for communication and data
  exchange
• Ontologies are useful tools for translational research

Summary - Implementation in information systems
• The OWLAPI is a reference implementation of the OWL
  specification and facilitates the development, management
  and serialization of expressive OWL ontologies. The
  OWLAPI also facilitates modularization and getting
  explanations.

• OWL provides a syntactic subset of the language for
  efficient reasoning. These so-called OWL profiles (EL, RL,
  QL) have well understood computational properties and can
  lead to better performance, but with some inferences lost.

• Formal ontology makes it possible to not only retrieve data
  (similar to a database), but also query the concepts themselves


Summary - evaluation

• ontologies are tools to support science

• Ontologies can provide insight into real biological/scientific
  problems

• quantifiable evaluation can be performed, e.g., based on
  precision/recall or ROC analysis

• application of ontologies may go beyond reasoning alone
  and use statistical analyses (enrichment), semantic
  similarity, graph algorithms, clustering, etc.
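
The quantitative evaluation mentioned above can be sketched with plain Python. The functions below compute precision and recall for a predicted set and the area under the ROC curve for ranked scores, using the rank-based definition (the probability that a random positive scores higher than a random negative). The gene labels and scores are made-up illustrative values, not results from the tutorial.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of a predicted set against the known positives."""
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

def roc_auc(scores, positives):
    """Probability that a random positive outranks a random negative
    (ties count as 0.5)."""
    pos = [s for item, s in scores.items() if item in positives]
    neg = [s for item, s in scores.items() if item not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up similarity scores for four candidate genes.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}
known = {"g1", "g3"}
auc = roc_auc(scores, known)
pr = precision_recall({"g1", "g2"}, known)
```

For these values three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75, and predicting the top two genes yields precision and recall of 0.5 each.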




Conclusions

• Ontologies + Semantic Web enable
   • Integration
   • Verification
   • Analysis
   • Discovery
   • Translational research




Acknowledgements

       George Gkoutos
        Heinrich Herre
         Janet Kelso
Dietrich Rebholz-Schuhmann
        Anika Oellrich
      Michael Ashburner
          Dan Cook
        John Gennari
        Paul Schofield
michel_dumontier@carleton.ca
                                    leechuck@leechuck.de



ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies   178

Weitere ähnliche Inhalte

Ähnlich wie ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

Real World Applications of OWL
Real World Applications of OWLReal World Applications of OWL
Real World Applications of OWLMichel Dumontier
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLSSBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLScsandit
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS cscpconf
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Chris Mungall
 
Big data in metabolism
Big data in metabolismBig data in metabolism
Big data in metabolismAlichy Sowmya
 
Standards and software: practical aids for reproducibility of computational r...
Standards and software: practical aids for reproducibility of computational r...Standards and software: practical aids for reproducibility of computational r...
Standards and software: practical aids for reproducibility of computational r...Mike Hucka
 
Online Chemical Database with Modelling Environment
Online Chemical Database with Modelling EnvironmentOnline Chemical Database with Modelling Environment
Online Chemical Database with Modelling EnvironmentSSA KPI
 
Formal representation of models in systems biology
Formal representation of models in systems biologyFormal representation of models in systems biology
Formal representation of models in systems biologyMichel Dumontier
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 
IRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsIRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsPanagiotis Arapitsas
 
ICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLDr. Haxel Consult
 
BMI 201 - Investigating Term Reuse and Overlap in Biomedical Ontologies
BMI 201 - Investigating Term Reuse and Overlap in Biomedical OntologiesBMI 201 - Investigating Term Reuse and Overlap in Biomedical Ontologies
BMI 201 - Investigating Term Reuse and Overlap in Biomedical OntologiesMaulik Kamdar
 
Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...Jarle Pahr
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesConnected Data World
 
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Anubis Hosein
 

Ähnlich wie ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification (20)

Real World Applications of OWL
Real World Applications of OWLReal World Applications of OWL
Real World Applications of OWL
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLSSBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
PhDc exam presentation
PhDc exam presentationPhDc exam presentation
PhDc exam presentation
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...
 
Big data in metabolism
Big data in metabolismBig data in metabolism
Big data in metabolism
 
Standards and software: practical aids for reproducibility of computational r...
Standards and software: practical aids for reproducibility of computational r...Standards and software: practical aids for reproducibility of computational r...
Standards and software: practical aids for reproducibility of computational r...
 
Online Chemical Database with Modelling Environment
Online Chemical Database with Modelling EnvironmentOnline Chemical Database with Modelling Environment
Online Chemical Database with Modelling Environment
 
Formal representation of models in systems biology
Formal representation of models in systems biologyFormal representation of models in systems biology
Formal representation of models in systems biology
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
IRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsIRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomics
 
Bioprocessing
BioprocessingBioprocessing
Bioprocessing
 
ICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBL
 
BMI 201 - Investigating Term Reuse and Overlap in Biomedical Ontologies
BMI 201 - Investigating Term Reuse and Overlap in Biomedical OntologiesBMI 201 - Investigating Term Reuse and Overlap in Biomedical Ontologies
BMI 201 - Investigating Term Reuse and Overlap in Biomedical Ontologies
 
Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical Sciences
 
C04821220
C04821220C04821220
C04821220
 
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
 
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
 

Mehr von Michel Dumontier

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsMichel Dumontier
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsMichel Dumontier
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemMichel Dumontier
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...Michel Dumontier
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemMichel Dumontier
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Michel Dumontier
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Michel Dumontier
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...Michel Dumontier
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerMichel Dumontier
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureMichel Dumontier
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesMichel Dumontier
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRMichel Dumontier
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsMichel Dumontier
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationMichel Dumontier
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessMichel Dumontier
 

Mehr von Michel Dumontier (20)

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 

Kürzlich hochgeladen

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification

  • 1. Biomedical Ontologies for data integration and verification Michel Dumontier and Robert Hoehndorf Carleton University, University of Cambridge ISMB tutorial @ Vienna. July 16,2011 ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 1
  • 2. Outline 1. General background (10 min) o an introduction to the use case: systems biology, SBML and BioModels 2. Ontological analysis (45 min) o how to express domain content as formal knowledge using the Web Ontology Language (OWL) 3. Application of formal ontology to consistency and data verification (30 min) o how to use the OWL formalization to verify the accuracy of annotations, data and constraints in a domain 4. Break (30 min) 5. Mapping, repair and disambiguation using ontologies (30 min) o how to relax and disambiguate constraints on ontologies to obtain a consistent representation of domain content 6. Knowledge discovery, retrieval and querying (15 min) o how to answer questions that require the inference of knowledge through automated reasoning 7. Efficient implementation in software systems (15 min) o how to convert ontologies into efficient formal representations amenable to high-throughput analyses 8. Applications in Bioinformatics (25 min) o how the formalized ontologies can be used to perform bioinformatics analyses – Discussion and questions (15 min)
  • 3. Systems Biology We create and simulate biological models to: • gain insight into the structure and function of biochemical networks • reveal metabolic and signalling capabilities so as to predict phenotypes • undertake metabolic engineering to maximize some desired product To do this, we need • to integrate & manage our data & knowledge in a coherent, scalable and machine-understandable manner • efficient software to execute computationally demanding simulations
  • 4. Bio-ontologies • Provide rich human- and machine-understandable descriptions of the terms they purport to describe • Have value for semantic annotation of data, which allows integration across domains (granularity, species, experimental methods) • Facilitate granular and cross-domain queries • Can be used to obtain explanations for inferences drawn • Can be efficiently processed by algorithms and software
  • 5. BioModels are semantically annotated SBML models • EBI-managed resource • 600+ models available as SBML • 300+ models are curated with GO process, function and component terms, and have links to protein databases. • Possible to browse by GO terms: http://www.ebi.ac.uk/biomodels-main/
  • 6. Objective: Computational Knowledge Discovery • Terminological resources are increasingly being used to annotate SBML-based biomolecular models o Makes it easier to explore or find models • By converting models into formal representations of knowledge we get to: o validate the accuracy of the annotations o infer knowledge explicit in terminological resources o discover biological implications inherent in the models.
  • 7. SBML: an XML-based representation of biochemical models, their components (compartments, species, reactions, events) and descriptors (rules, constraints, functions, units). Consider the following enzymatic reaction:
  • 8. SBML captures reaction kinetics using an XML-based format <?xml version="1.0" encoding="UTF-8"?> <sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3"> <model name="EnzymaticReaction"> <listOfUnitDefinitions> <unitDefinition id="per_second"> <listOfUnits> <unit kind="second" exponent="-1"/> </listOfUnits> </unitDefinition> <unitDefinition id="litre_per_mole_per_second"> <listOfUnits> <unit kind="mole" exponent="-1"/> <unit kind="litre" exponent="1"/> <unit kind="second" exponent="-1"/> </listOfUnits> </unitDefinition> </listOfUnitDefinitions> <listOfCompartments> <compartment id="cytosol" size="1e-14"/> </listOfCompartments> <listOfSpecies> <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/> <species compartment="cytosol" id="P" initialAmount="0" name="P"/> <species compartment="cytosol" id="S" initialAmount="1e-20" name="S"/> <species compartment="cytosol" id="E" initialAmount="5e-21" name="E"/> </listOfSpecies>
  • 9. <listOfReactions> <reaction id="veq"> <listOfReactants> <speciesReference species="E"/> <speciesReference species="S"/> </listOfReactants> <listOfProducts> <speciesReference species="ES"/> </listOfProducts> <kineticLaw> <math xmlns="http://www.w3.org/1998/Math/MathML"> <apply> <times/> <ci>cytosol</ci> <apply> <minus/> <apply> <times/> <ci>kon</ci> <ci>E</ci> <ci>S</ci> </apply> <apply> <times/> <ci>koff</ci> <ci>ES</ci> </apply> </apply> </apply> </math> <listOfParameters> <parameter id="kon" value="1000000" units="litre_per_mole_per_second"/> <parameter id="koff" value="0.2" units="per_second"/> </listOfParameters> </kineticLaw> </reaction>
  • 10. <reaction id="vcat" reversible="false"> <listOfReactants> <speciesReference species="ES"/> </listOfReactants> <listOfProducts> <speciesReference species="E"/> <speciesReference species="P"/> </listOfProducts> <kineticLaw> <math xmlns="http://www.w3.org/1998/Math/MathML"> <apply> <times/> <ci>cytosol</ci> <ci>kcat</ci> <ci>ES</ci> </apply> </math> <listOfParameters> <parameter id="kcat" value="0.1" units="per_second"/> </listOfParameters> </kineticLaw> </reaction> </listOfReactions> </model> </sbml>
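The species and reactions of the enzymatic-reaction model above can be read back out of the SBML with the Python standard library. This is only a minimal sketch: the SBML text is an abbreviated copy of the slide content (unit definitions and kinetic laws are omitted), and the helper name `summarize_model` is hypothetical rather than part of any SBML toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <sbml> element shown on the slides.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version3"}

# Abbreviated copy of the slide model (assumption: only species and reactions kept).
SBML_TEXT = """
<sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
  <model name="EnzymaticReaction">
    <listOfSpecies>
      <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/>
      <species compartment="cytosol" id="P" initialAmount="0" name="P"/>
      <species compartment="cytosol" id="S" initialAmount="1e-20" name="S"/>
      <species compartment="cytosol" id="E" initialAmount="5e-21" name="E"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="veq">
        <listOfReactants>
          <speciesReference species="E"/>
          <speciesReference species="S"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="ES"/>
        </listOfProducts>
      </reaction>
      <reaction id="vcat" reversible="false">
        <listOfReactants>
          <speciesReference species="ES"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="E"/>
          <speciesReference species="P"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarize_model(sbml_text):
    """Return (species ids, {reaction id: (reactant ids, product ids)})."""
    root = ET.fromstring(sbml_text)
    species = [s.get("id") for s in root.findall(".//sbml:species", NS)]
    reactions = {}
    for reaction in root.findall(".//sbml:reaction", NS):
        reactants = [
            ref.get("species")
            for ref in reaction.findall("sbml:listOfReactants/sbml:speciesReference", NS)
        ]
        products = [
            ref.get("species")
            for ref in reaction.findall("sbml:listOfProducts/sbml:speciesReference", NS)
        ]
        reactions[reaction.get("id")] = (reactants, products)
    return species, reactions


species, reactions = summarize_model(SBML_TEXT)
for reaction_id, (reactants, products) in reactions.items():
    print(reaction_id, " + ".join(reactants), "->", " + ".join(products))
```

Nothing in the XML says what E, S, ES or P stand for biologically; that gap is what the semantic annotations discussed next are meant to fill.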
  • 11. SBML models may feature several components
  • 12. SBML specifies the number and kind of attributes models and components can have
  • 13. It’s up to the modeler to use those attributes in a meaningful way. What models have you produced?
  • 14. BioModels are semantically annotated SBML models • EBI-managed resource • 600+ models available as SBML • 300+ models are curated with GO process, function and component terms, and have links to protein databases. • Possible to browse by GO terms: http://www.ebi.ac.uk/biomodels-main/
  • 15. Energy (ATP) is produced from glycolysis (breakdown of glucose) in a series of enzyme-catalyzed biochemical reactions. Fermentation regenerates NAD+ so it can be reused to metabolize more glucose. Analysis and optimization of metabolic pathways is important for biotechnology.
  • 16. Gene Ontology • over 30,000 terms • covers o biological processes o molecular functions o cellular components • terms organized around "is a" hierarchy • terms further described with 'has part'/'part of'; 'regulates' and '+ regulates', '- regulates'
  • 17. Chemical Entities of Biological Interest (ChEBI) • recently refactored to be in line with formal (reasoning-capable) ontology • scope includes chemical entities (atoms, substances, groups, molecules), roles and subatomic particles • large numbers of curated molecules
  • 18. SBML annotations are captured using the Resource Description Framework (RDF) <species metaid="_525530" id="GLCi" compartment="cyto" initialConcentration="0.097652231064563"> <annotation> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/"> <rdf:Description rdf:about="#_525530"> <bqbiol:is> <rdf:Bag> <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/> <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/> </rdf:Bag> </bqbiol:is> </rdf:Description> </rdf:RDF> </annotation> </species> Callouts: the species element and its XML attributes are the implicit subject; the annotation element stores the RDF; rdf:Description rdf:about names the subject, bqbiol:is is the predicate, and the rdf:li resources are the object. The intent is to express that the species represents a substance composed of glucose molecules. We also know from the SBML model that this substance is located in the cytosol and has an (initial) concentration of 0.09765 M.
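The subject, predicate and object of such an annotation can be pulled out of the XML programmatically. The sketch below uses only the Python standard library on a trimmed copy of the species element above (unused namespace declarations dropped); the function name `annotation_triples` is hypothetical, and the code treats every `rdf:li` inside a qualifier as one object of that qualifier, which reflects the intended reading of the annotation rather than strict RDF semantics.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the annotated species element from the slide.
SPECIES_XML = """
<species metaid="_525530" id="GLCi" compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def annotation_triples(xml_text):
    """Return (subject, qualifier, resource) tuples found in an SBML annotation."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.iter(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for qualifier in description:
            # ElementTree tags look like "{namespace}localname".
            local_name = qualifier.tag.split("}", 1)[1]
            for item in qualifier.iter(f"{{{RDF}}}li"):
                resource = unquote(item.get(f"{{{RDF}}}resource"))
                triples.append((subject, local_name, resource))
    return triples


for triple in annotation_triples(SPECIES_XML):
    print(triple)
```

Percent-decoding turns `CHEBI%3A4167` back into the familiar `CHEBI:4167` identifier.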
  • 19. Annotated models contain references to entities described elsewhere: PubMed - papers; ChEBI - chemicals; UniProt - proteins; KEGG - chemicals, reactions; E.C. - reactions; Gene Ontology - functions, reactions, compartments; Taxonomy - organism. <model> <annotation> <bqmodel:isDescribedBy> <rdf:Bag> <rdf:li rdf:resource="urn:miriam:pubmed:17667951"/> </rdf:Bag> </bqmodel:isDescribedBy> <bqbiol:hasPart> <rdf:Bag> <rdf:li rdf:resource="urn:miriam:kegg.pathway:sce00010"/> <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0019642"/> </rdf:Bag> </bqbiol:hasPart> <bqmodel:is> <rdf:Bag> <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/> </rdf:Bag> </bqmodel:is>
  • 20. It looks like another XML syntax, but it has RDF semantics! What is the meaning of SBML’s RDF annotation? <rdf:Description rdf:about="#_551383"> <bqmodel:is> <rdf:Bag> <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/> </rdf:Bag> </bqmodel:is> </rdf:Description> • The intent is to indicate that the model is a model of a yeast • RDF semantics: #_551383 is a member of a set that is related by bqmodel:is to a collection (rdf:Bag) that has a single member – yeast (4932) • RDF semantics does not match the intent!
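The mismatch becomes concrete once the triples an RDF parser derives from the `rdf:Bag` construct are written out. The following sketch builds them by hand in plain Python; the blank-node label `_:bag1` and the function name `expand_bag` are illustrative assumptions, while the `rdf:_1`, `rdf:_2`, ... membership properties are what RDF/XML defines for `rdf:li` elements.

```python
def expand_bag(subject, predicate, members, bag_node="_:bag1"):
    """Triples expressed by <predicate><rdf:Bag><rdf:li .../>...</rdf:Bag></predicate>."""
    triples = [
        (subject, predicate, bag_node),      # the subject points at the container
        (bag_node, "rdf:type", "rdf:Bag"),   # the container is typed as a Bag
    ]
    for index, member in enumerate(members, start=1):
        # Each rdf:li becomes a numbered membership property of the container.
        triples.append((bag_node, f"rdf:_{index}", member))
    return triples


actual = expand_bag("#_551383", "bqmodel:is", ["urn:miriam:taxonomy:4932"])
intended = ("#_551383", "bqmodel:is", "urn:miriam:taxonomy:4932")

for triple in actual:
    print(triple)
print("intended triple present:", intended in actual)
```

The model is related to an anonymous container, not to the taxon itself, so the direct statement a reader expects is absent from the graph.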
  • 21. Can we formalize and automatically verify the intended meaning of the RDF annotation? BioModels.net biology qualifiers: "is" (identity): The biological entity represented by the model element has identity with the subject of the referenced resource (modeling object B). This relation might be used to link a reaction to its exact counterpart in a database, for instance.
  • 22. BioModels: Qualifiers. Qualifiers for the biological object represented by the model component: encodes/isEncodedBy, hasPart/isPartOf, hasProperty/isPropertyOf, hasVersion/isVersionOf, is, isDescribedBy, isHomologTo, occursIn. http://www.ebi.ac.uk/miriam/main/qualifiers/
  • 23. In this tutorial you will learn how to create accurate knowledge representations of annotated SBML models. Features • ontological commitment: terms in a vocabulary correspond to formally defined classes and relations, and expressions formulated using the Web Ontology Language (OWL) have an unambiguous interpretation • upper-level ontology of types and relations to distinguish and constrain model entities to the spatio-temporal entities they represent • reasoning to uncover inconsistencies, and how to repair them • advanced applications of OWL ontologies for answering questions and providing biological insight
  • 24. What is a model? How does it differ from the thing it is a model of?
  • 25. Conceptualization (SBML) • 2 kinds of entities: o in silico: model components o in vivo: the entities represented by a model
  • 26. Conceptualization
  • 27. SBML Conceptualization • Instances of SBML model entities are syntactic entities (in XML) • SBML models represent biological phenomena and structures (e.g., cell cycle processes, yeast cells, ...) • Here we focus on Model, Compartment, Species, Reaction
  • 28. Formalization • Formalization is the process by which we map a conceptualization into a logical representation, which has a particular interpretation. • We first express the basic nature of what the terms refer to by defining them using a formal language. Next, we can logically combine the terms to form expressions, which have an unambiguous interpretation, and hence can be automatically reasoned about.
  • 29. Have you heard of the Semantic Web?
  • 30. The Semantic Web is about standards for publishing, sharing and querying knowledge drawn from diverse sources. It enables the answering of sophisticated questions.
  • 31. The Semantic Web effort aims to develop an interoperable set of standards for knowledge representation and reasoning
  • 32. URI/IRI • Uniform Resource Identifiers (URI) and Internationalized Resource Identifiers (IRI) are identifiers for resources, given a particular protocol • We’re familiar with Uniform Resource Locators, which specify a protocol, such as HTTP, for obtaining the document with that identifier. – http://dumontierlab.com • Internationalized Resource Identifiers (IRIs) include an expanded set of international characters • URI/IRIs are the basis for naming resources on the Semantic Web. – As names, they can also be used to identify non-information resources, like people and places
  • 33. Entity naming • Uniform Resource Identifiers (URI) are identifiers for resources given a particular protocol. Internationalized Resource Identifiers (IRI) include an expanded set of international characters • URI/IRIs can be used to name entities, both for digital media and non-informational entities like people and places. • Uniform Resource Name (URN) – only a name o MIRIAM - Minimal Information Required In the Annotation of Models – data source and identifier combined in a single IRI: urn:miriam:source:identifier – e.g. urn:miriam:uniprot:P62158 – ~40 sources defined at the EBI registry... • Uniform Resource Locator (URL) – a resolvable name o Bio2RDF - makes life sciences data available on the Semantic Web o http://bio2rdf.org/uniprot:P62158 o content-type negotiation and explicit URLs resolve to an HTML/RDF/etc. description of the entity.
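The two naming schemes on this slide share the same source-plus-identifier structure, so a MIRIAM URN can be split and re-expressed as a Bio2RDF-style URL. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the MIRIAM source name and the Bio2RDF namespace coincide, which holds for the `uniprot` example on the slide but is not guaranteed for other sources (for instance `obo.chebi`).

```python
from urllib.parse import unquote


def parse_miriam_urn(urn):
    """Split 'urn:miriam:source:identifier' into (source, identifier)."""
    prefix, scheme, source, identifier = urn.split(":", 3)
    if (prefix, scheme) != ("urn", "miriam"):
        raise ValueError(f"not a MIRIAM URN: {urn}")
    # Identifiers may be percent-encoded, e.g. GO%3A0019642 for GO:0019642.
    return source, unquote(identifier)


def to_bio2rdf_url(urn):
    """Build a Bio2RDF-style URL (assumes identical namespace names)."""
    source, identifier = parse_miriam_urn(urn)
    return f"http://bio2rdf.org/{source}:{identifier}"


print(parse_miriam_urn("urn:miriam:obo.go:GO%3A0019642"))
print(to_bio2rdf_url("urn:miriam:uniprot:P62158"))
```

A production mapping would need a lookup table from MIRIAM source names to resolver namespaces instead of reusing the name directly.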
  • 34. Semantic Technologies: RDF vs OWL. RDF: simple triples, graph-based queries, supports very large amounts of data. OWL: significantly more expressive language, strong axioms, inference capabilities, consistency verification, but can be rather slow.
  • 35. Resource Description Framework (RDF) allows one to talk about anything. Uniform Resource Identifiers (URI) can be used as entity names. Bio2RDF specifies its naming convention: http://bio2rdf.org/uniprot:P05067 (uniprot:P05067 is a name for Amyloid precursor protein); http://bio2rdf.org/omim:104300 (omim:104300 is a name for Alzheimer disease).
  • 36. Resource Description Framework (RDF) allows one to express statements. An RDF statement consists of: – Subject: resource identified by a URI – Predicate: resource identified by a URI – Object: resource or literal. Example (figure): uniprot:P05067 rdfs:label "Amyloid precursor protein"; uniprot:P05067 rdf:type uniprot:Protein.
  • 37. RDF has multiple serializations. RDF/XML: <?xml version="1.0"?> <!DOCTYPE rdf:RDF [ <!ENTITY u "http://bio2rdf.org/uniprot:"> ]> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:u="http://bio2rdf.org/uniprot:"> <rdf:Description rdf:about="&u;Q16665"> <rdf:type rdf:resource="&u;Protein"/> </rdf:Description> </rdf:RDF> RDF/N3: @prefix u: <http://bio2rdf.org/uniprot:> . u:Q16665 a u:Protein .
  • 38. Multi-Source Data Integration: syntactic data integration depends on consistent naming. Figure (edge labels: is a, has name, located in, interacts with): UniProt (uniprot:P05067 is a uniprot:Protein) + Gene Ontology (uniprot:P05067 located in go:Membrane) + iRefIndex (uniprot:P05067 interacts with uniprot:P05067) = unified view centred on uniprot:P05067.
  • 39. Building statements creates knowledge. Figure: uniprot:P05067 (label "Amyloid precursor protein", is a Protein) is involved in omim:104300 (label "Alzheimer Disease", is a Disease).
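Syntactic integration of this kind amounts to taking the union of triple sets that use the same name for the same entity. A minimal sketch in plain Python, with triples modelled as tuples; the predicate spellings (`located_in`, `interacts_with`) and the `match` helper are assumptions made for illustration, not Bio2RDF vocabulary.

```python
# One small triple set per source; all three use the same name for the protein.
uniprot = {("uniprot:P05067", "rdf:type", "uniprot:Protein")}
gene_ontology = {("uniprot:P05067", "located_in", "go:Membrane")}
irefindex = {("uniprot:P05067", "interacts_with", "uniprot:P05067")}

# Because the subject identifiers agree, merging is just set union.
unified = uniprot | gene_ontology | irefindex


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about the protein across the three sources.
for triple in match(unified, s="uniprot:P05067"):
    print(triple)
```

If one source had written `P05067` and another `uniprot:P05067`, the union would silently treat them as different entities, which is why consistent naming matters.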
  • 40. Bio2RDF’s RDFized data fits together: syntactic integration
  • 41. SGD as RDF-based Linked Open Data
  • 42. Bio2RDF links and provisions 40 high-value datasets
  • 43. Bio2RDF now serving over 40 billion triples of linked biological data
  • 44. SGD is provided by Bio2RDF and forms part of the growing linked open data cloud. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 45. Semantic Integration • Requires a level of abstraction/generalization where the relationship between each resource is formalized – classes – relations – individuals • How do we ensure that our representation facilitates integration across datasets? • How can we get our formalization to interoperate with ontologies?
  • 46. RDF-based Linked Data • Provides the basis for simple data syndication and syntactic data integration o IRIs o Statements (aka triples) take the form <subject> <predicate> <object> • Easy to implement o stand-alone datasets o logical layer over databases • Limited reasoning o class and property hierarchies o domain/range restrictions o can’t automatically discover inconsistency • Standardized queries - SPARQL • Scalable - to billions of triples
  • 47. What do you know of OWL?
  • 48. The Web Ontology Language (OWL) has explicit semantics and can therefore be used to capture knowledge in a machine-understandable way
  • 49. OWL - The Web Ontology Language • Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values o quantifiers (existential, universal, cardinality restriction) o negation o disjunction o property characteristics o complex classes in domain and range restrictions o property chains • Advanced reasoning
  • 50. Advanced Reasoning • Consistency: determines whether the ontology contains contradictions. • Satisfiability: determines whether classes can have instances. • Subsumption: is class C1 implicitly a subclass of C2? • Classification: repetitive application of subsumption to discover implicit subclass links between named classes • Realization: find the most specific class that an individual belongs to.
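A toy version of two of these reasoning tasks can be written over named classes alone. The sketch below is not how an OWL reasoner works internally: it assumes the ontology consists only of asserted subclass links and pairwise disjointness between named classes, and the class names (including the deliberately broken `chimera`) are made up for illustration.

```python
def superclasses(cls, subclass_of):
    """All direct and inferred named superclasses of cls (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(c1, c2, subclass_of):
    """Subsumption: is c1 (implicitly) a subclass of c2?"""
    return c1 == c2 or c2 in superclasses(c1, subclass_of)


def is_unsatisfiable(cls, subclass_of, disjoint_pairs):
    """A class that falls under two disjoint classes can have no instances."""
    ancestors = superclasses(cls, subclass_of) | {cls}
    return any(a in ancestors and b in ancestors for a, b in disjoint_pairs)


# Hypothetical asserted axioms.
subclass_of = {
    "transcription factor": ["protein"],
    "protein": ["molecule"],
    "chimera": ["protein", "DNA"],
}
disjoint_pairs = [("protein", "DNA")]

print(is_subsumed_by("transcription factor", "molecule", subclass_of))
print(is_unsatisfiable("chimera", subclass_of, disjoint_pairs))
```

Real OWL reasoners additionally handle class expressions, property axioms and individuals, which is where the complexity discussed on the following slides comes from.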
  • 51. OWL Challenges and Solutions. Inconsistency: • needs to be resolved to ask any questions involving the ontology • Solution: explicitly accommodate multiple meanings, remove contradictory axioms. Unsatisfiability (of a class): • may indicate a modelling error • needs to be resolved to ask meaningful questions about the class • Solution: explicitly accommodate multiple meanings, redefine class, remove contradicting class restrictions
  • 52. OWL Challenges and Solutions. Scalability: • answers to OWL queries require reasoning • inference in OWL is highly complex (worst case: 2NEXPTIME) • highly optimized reasoners are getting better and better, but can still be slow with large ontologies • tractable OWL profiles (EL, QL, RL) enable more efficient and guaranteed polynomial-time inferences • use ontology modularization approaches to increase performance
  • 53. OWL can help you create rich, machine-understandable descriptions! • transform our expert knowledge into axioms and expressions that can be automatically reasoned about o a transcription factor is – a protein – that binds to DNA – and regulates the expression of a gene. o can we mine 'omic datasets to discover which proteins are transcription factors? • create rich expressions from combinations of classes, relations and individuals • assert statements of truth using axioms.
  • 54. Linked data and OWL: Motivation • use OWL reasoning to identify mistakes in RDF data o incorrect content of assertions o incorrect use of relations o conflicting conceptualizations o incorrect same-as assertions • verify, fix and exploit Linked Data through expressive OWL reasoning • generate/infer new triples to write back into RDF and use for efficient retrieval. Proposal: represent SBML BioModels in OWL from the implicit relations and explicit attributes in XML/RDF.
  • 55. Elements of OWL 2 • The “ontology” of OWL 2 consists of: • Classes • Object properties • Data properties • Individuals • Expressions • Axioms • Plus RDF stuff (like datatypes)
  • 56. Axiomatization • Axioms are statements that are assumed to be true in the domain • Axioms formally interrelate terms from the conceptualization step • Every statement can be reduced to an expression based only on primitive terms. Therefore: every axiom can be expressed using only primitive terms.
  • 57. Classes and class axioms • a class is a set of individuals that share one or more characteristics o a protein • classes can be organized in a hierarchy using SubClassOf axioms o SubClassOf(C2 C1), i.e. every member of C2 is a member of C1 o SubClassOf(protein molecule) • special classes o owl:Thing is the superclass of all things o owl:Nothing is the subclass of all things, denotes an empty set • classes can be made disjoint from one another o i.e. there is no member of C1 that is also a member of C2 o DisjointClasses(protein DNA) • classes can be said to be equivalent o i.e. all members of C1 are members of C2 and all members of C2 are members of C1 o EquivalentClasses(Peptide Polypeptide)
  • 58. Object Properties and axioms • an object property OP is a relation between two individuals o 'has part' is an object property that denotes the mereological relation between two individuals • OPs can be organized in a hierarchy o given OP1 and OP2, where OP2 is a subproperty of OP1: if an individual x is connected by OP2 to an individual y, then x is also connected by OP1 to y. o SubObjectPropertyOf('has proper part' 'has part') o owl:topObjectProperty, owl:bottomObjectProperty • We can restrict the domain and range to allowed values o ObjectPropertyDomain('is participant in' 'physical entity') o ObjectPropertyRange('is participant in' 'process') • We can also assert object properties to be disjoint or equivalent
  • 59. Description of object properties • Inverse o we say that 'has part' is an inverse for 'is part of' o we can also refer to this as inv('is part of') • Symmetric o applies to cases where the inverse relation is the very same relation o e.g. the inverse for 'is related to' is 'is related to' • Transitive o for a transitive relation, if an individual x is connected to an individual y that is in turn connected to an individual z, then x is also connected to z o e.g. 'has part' is transitive
  • 60. Description of object properties • Reflexive o reflexivity implies that the relation holds between every individual and itself o e.g. 'has part' is reflexive because a protein has itself as a part. • Functional o each individual can be related by the relation to at most one individual, so all values of the relation for a given individual must be the same o e.g. 'has unique identifier' • Inverse Functional o each individual can be the value of the relation for at most one individual, so all individuals related to a given value must be the same o e.g. 'is unique identifier of'
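The inverse, symmetric and transitive characteristics can be illustrated by treating a relation as a set of (x, y) pairs and computing what each characteristic lets us infer. This is a sketch in plain Python with made-up individuals; it is meant to show the inferences, not to stand in for an OWL reasoner.

```python
def inverse(pairs):
    """inv(R): swap the direction of every asserted pair."""
    return {(y, x) for (x, y) in pairs}


def symmetric_closure(pairs):
    """Symmetric R: whenever x R y holds, y R x holds too."""
    return set(pairs) | inverse(pairs)


def transitive_closure(pairs):
    """Transitive R: whenever x R y and y R z hold, x R z holds too."""
    closure = set(pairs)
    while True:
        derived = {
            (x, z)
            for (x, y1) in closure
            for (y2, z) in closure
            if y1 == y2
        }
        if derived <= closure:  # nothing new was inferred
            return closure
        closure |= derived


# Hypothetical asserted 'has part' facts.
has_part = {("cell", "nucleus"), ("nucleus", "chromosome")}

all_parts = transitive_closure(has_part)   # adds ("cell", "chromosome")
is_part_of = inverse(all_parts)            # 'is part of' as inv('has part')
related = symmetric_closure({("a", "b")})  # 'is related to' is symmetric

print(sorted(all_parts))
print(sorted(is_part_of))
```

Reflexivity and functionality are left out here because, without a unique-name assumption, functional properties lead OWL to infer that two names denote the same individual rather than to report an error.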
  • 61. Class Expressions. Class expressions are rich descriptions of classes through the logical combination of ontological primitives (classes, object properties, datatype properties, individuals), e.g. Protein subClassOf molecule and 'has proper part' min 2 'amino acid residues'. Combinations are specified using logical operators • conjunction (and), disjunction (or), negation (not). Object or data property expressions provide a qualified cardinality over the relation o minimum: rel min # Y o maximum: rel max # Y o exact: rel exactly # Y (minimum + maximum) o some: rel some Y (equivalent to rel min 1 Y)
  • 62. Class Expressions o The quantifications can be qualified by the object type o rel only Y – the only values allowed are of type Y • To form complex class expressions like o 'molecule' and not 'dna' o 'has part' min 2 'amino acid' o 'is located in' only ('nucleus' or 'cytoplasm') • and be expressed as axioms in the ontology: Protein subClassOf molecule and 'has proper part' min 2 'amino acid residues'; Transcription Factor equivalentTo 'protein' and 'has disposition' some 'to bind to DNA' and 'has function' some 'to regulate gene expression'
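To get a feel for what `min` and `only` restrictions demand of an individual, they can be checked against a small set of facts. The sketch below deliberately simplifies: it assumes a closed world (the listed facts are all there is) and unique names (different names denote different individuals), neither of which OWL assumes, and all individual names are hypothetical.

```python
def fillers(individual, relation, facts):
    """Objects that the individual is related to by the given relation."""
    return {o for (s, p, o) in facts if s == individual and p == relation}


def satisfies_min(individual, relation, n, filler_class, facts, types):
    """Check 'relation min n filler_class' under closed-world, unique-name assumptions."""
    matching = {
        o for o in fillers(individual, relation, facts)
        if filler_class in types.get(o, set())
    }
    return len(matching) >= n


def satisfies_only(individual, relation, allowed_classes, facts, types):
    """Check 'relation only (A or B ...)': every filler has at least one allowed type."""
    return all(
        types.get(o, set()) & allowed_classes
        for o in fillers(individual, relation, facts)
    )


# Hypothetical facts and type assertions.
facts = {
    ("dipeptide1", "has proper part", "res1"),
    ("dipeptide1", "has proper part", "res2"),
    ("tf1", "is located in", "nucleus1"),
}
types = {
    "res1": {"amino acid residue"},
    "res2": {"amino acid residue"},
    "nucleus1": {"nucleus"},
}

print(satisfies_min("dipeptide1", "has proper part", 2, "amino acid residue", facts, types))
print(satisfies_only("tf1", "is located in", {"nucleus", "cytoplasm"}, facts, types))
```

Under OWL's open-world semantics a reasoner could not conclude that a `min 3` restriction is violated here, only that it is not yet known to be satisfied.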
  • 63. What do the following mean, and what biological thing might you annotate with them? (1) C equivalentTo 'has part' exactly 2 polypeptide (2) M subClassOf DNA and not molecule
  • 64. OWL has multiple syntaxes • Functional-Style Syntax: ClassAssertion( :Person :Robert ) • RDF/XML Syntax: <Person rdf:about="Robert"/> • RDF Turtle: :Robert rdf:type :Person . • Manchester Syntax: Individual: Robert Types: Person • OWL/XML Syntax: <ClassAssertion> <Class IRI="Person"/> <NamedIndividual IRI="Robert"/> </ClassAssertion>
  • 65. OWL Reasoners OWL DL Reasoners • Pellet: Clark & Parsia, dual-licensed, Java. • Fact++: Manchester University, open-source, C++ with a Java API. • HermiT: Oxford University, open-source, Java. • Racer Pro: Racer Systems, commercial, Lisp with a Java API. OWL Profile/subset reasoners • Jena: Hewlett-Packard, open-source, Java. • OWLIM: Ontotext, dual-licensed, Java. • CB: • CEL: • JCEL (Pellet) • ELLY: ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 65
  • 66. Formalization of XML/RDF using OWL • For every triple, we want to create an axiom that makes a commitment as to what the terms refer to and what their combination necessarily implies. • We will also commit to expressing our knowledge in a consistent manner, and this will allow other information resources to be semantically integrated (the expressions are comparable and share the same semantics) ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 66
  • 67. Triples to axioms Convert RDF triples into OWL axioms. Triple in RDF: <nucleus> <part-of> <cell> • Nucleus and Cell are classes • part-of is a relation between 2 classes • intended meaning: every instance of Nucleus is partOf some instance of Cell • formalize as OWL axiom: Nucleus SubClassOf: part-of some Cell ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 67
  • 68. Triples to axioms: Many possible formalizations – knowledge of logics and domain expertise comes in handy here! Convert RDF triples into OWL axioms. Triple in RDF: <C1 R C2> • C1 and C2 are classes, R a relation between 2 classes • intended meaning: o C1 SubClassOf: C2 o C1 SubClassOf: R some C2 o C1 SubClassOf: R only C2 o C2 SubClassOf: R some C1 o C1 SubClassOf: S some C2 o C1 DisjointFrom: C2 o C1 and C2 SubClassOf: owl:Nothing o R some C1 DisjointFrom: R some C2 o C1 EquivalentClasses C2 o ... • in general: P(C1, C2), where P is an OWL axiom (template) Challenge: Formalizing data requires one to commit to a particular meaning – to make an ontological commitment
  • 69. Triples to axioms Triple in RDF: <Cytosol> <isLocationOf> <HXK1> • Cytosol and HXK1 are classes • isLocationOf is an axiom pattern involving 2 classes • intended meaning: every instance of HXK1 is located at some instance of Cytosol • not intended: for every instance of Cytosol, there is an instance of HXK1 located in it. Formalize as: HXK1 subClassOf hasLocation some Cytosol, or, using the inverse relation, HXK1 subClassOf inv(isLocationOf) some Cytosol
  • 70. Triples to axioms Challenges Formalizing RDF triples in OWL may introduce new OWL object properties. • Which object properties should be included? • What axioms hold for included object properties? • Can domain and range restrictions be generalized across multiple domains, i.e., reused across multiple linked data sources to ensure consistency between them? Integration of OWL ontologies requires a common semantic platform ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 70
  • 71. Axiom Patterns for Triples <nucleus> <part-of> <cell> ?X part-of ?Y •translated to axiom pattern ?X subClassOf: part-of some ?Y -> Nucleus subClassOf: part-of some Cell ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 71
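The pattern expansion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PATTERNS` table and the `triple_to_axiom` helper are hypothetical names, the axioms are produced as plain Manchester-style strings, and the actual implementation described in the tutorial builds on the OWL API.

```python
# Hypothetical pattern table: relation name -> Manchester-style axiom template.
# "?X" is filled by the triple's subject and "?Y" by its object.
PATTERNS = {
    "part-of": "?X SubClassOf: part-of some ?Y",
    # Pattern for the <Cytosol> <isLocationOf> <HXK1> example: the axiom is
    # asserted on the object, not the subject.
    "isLocationOf": "?Y SubClassOf: hasLocation some ?X",
}


def triple_to_axiom(subject, relation, obj, patterns=PATTERNS):
    """Expand one class-level triple into an axiom string via its pattern."""
    template = patterns[relation]  # raises KeyError if no pattern is registered
    return template.replace("?X", subject).replace("?Y", obj)


print(triple_to_axiom("Nucleus", "part-of", "Cell"))
print(triple_to_axiom("Cytosol", "isLocationOf", "HXK1"))
```

Keeping the pattern per relation, rather than per triple, is what makes the ontological commitment explicit: every triple that uses the same relation is formalized the same way.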
  • 72. Implementation • expand relations in RDF based on relational patterns • relational patterns are OWL axioms with 2 variables (which are filled by subject and object, respectively) • implementation based on OWL API • adopt implementation of relational patterns in OBO language (http://code.google.com/p/obo2owl/) Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL and their application to OBO. OWL: Experiences and Directions (OWLED). paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441
  • 73. Another way? http://oppl2.sourceforge.net/ • OPPL is an abstract formalism that allows for manipulating ontologies written in OWL. • Use OPPL to select triples and create the axioms ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 73
  • 74.
  • 75. Which types and relations should we use for our axiom patterns?
  • 76. Top level ontologies contain generalized (domain independent) classes and relations They can be used to constrain what can be said about these entities (and hence will later be useful for checking the consistency of data annotated using these terms). ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 76
  • 77. Basic classes in top-level ontologies • Material entity • Example: Apple, Human, Cell, Planet • Has mass as a quality • Located in space and time • Independent of other entities • Exists as a whole whenever it exists • Quality • Example: mass, color, concentration • Dependent: always the quality of some entity • Quality of object: size, shape, length • Quality of process: duration, rate • Quality of quality: shade (of color), intensity
  • 78. Basic classes in top-level ontologies • Function • e.g. to bind, to catalyze (a reaction), to kill bacteria • Dependent: always the function of some thing • Similar to a property of an object • Represents the potential to do something (an action) in some process • capabilities, dispositions and tendencies • Process • Example: running a marathon, binding, cell division • Located in space and time • Independent of other entities • Temporally extended ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 78
  • 79. Top-level ontologies can make a commitment to these being disjoint Material object, Process, Function and Quality are mutually disjoint. ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 79
  • 80. Basic Relations in Top Level Ontologies • relations (object properties) in OWL hold between instances • Mereological: parthood – ‘has part’, ‘has proper part’, ‘has component part’ • Participatory – ‘is participant in’, ‘is agent in’, ‘is target in' • Spatial – ‘is connected to’, ‘located in’, ‘contains’, ‘is adjacent to’ • Temporal – ‘derives from’, ‘precedes’, ‘meets’, ‘overlaps’, etc • Referential – ‘describes’, ’denotes’, ‘represents’ ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 80
  • 81. Relations in top-level ontologies • domain and range restrictions from top-level ontology can be applied for general relations, e.g.: • ‘has material part’ can be restricted with "Material object" as both domain and range • ‘participates in’ can be restricted with a domain of "Material object" and a range of "Process" • re-use of relations (between instances) enables inferences across resources
  • 82. Relations impose additional constraints, such that inconsistencies arise when incorrectly used ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 82
  • 83. Alignment with top-level ontology Foundation of domain classes and relations in top-level ontology: • every domain class becomes a subclass of a class in the top-level ontology • every object property used in OWL axioms becomes a sub-property of an object property in the top-level ontology • assert additional axioms to restrict domain classes and delimit them from other domains (where appropriate) o e.g., if a particular resource uses (in RDF) the relation part-of exclusively between processes, this additional constraint can be added to the relation
  • 84. What’s the role of top level ontologies?
  • 85. Top-level ontology Application of a top-level ontology: • can help to make the ontological commitment that is employed within an information system explicit, • can guarantee basic agreement about fundamental, common types, • can guarantee basic agreement about common relations, • provides common domain and range restrictions across multiple domains, and therefore • enables re-use of relations and types across data sources, domains, levels of granularity and information systems.
  • 86. Formalization of SBML Models: • SBML models and model annotations are converted into OWL axioms by making SBML's ontological commitment explicit • Implementation as conversion patterns An explicit ontological commitment establishes and implements a one-to-one correspondence between SBML expressions and a formal interpretation within an ontology. ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 86
  • 87. Bridging the gap: combine in vivo entities and in silico entities in a common model (an ontology) defined with axioms ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 87
  • 88. Formalization Reaction: A reaction represents some transformation, transport or binding process, typically a chemical reaction, that can change the amount of one or more species. (Hucka et al.) vs a Model component that is part-of a Model and represents some Process ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 88
  • 89. Formalizing SBML models using OWL Model component(x): a model entity that is part of a model 'model component' equivalentClass 'model entity' that 'is part of' some 'model' ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 89
  • 90. Assumption 1: Every model represents a material entity OWL Axiom: Model SubClassOf: represents some MaterialEntity Conversion rule: a Model annotated with class C represents: If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C) If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized-by only C))
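The three-way conversion rule above can be written as a small dispatch function. This is a minimal sketch, assuming the top-level category of the annotation class is already known (in practice it would be inferred from the aligned ontologies); the function name `model_annotation_axiom` and the string-based axiom output are illustrative choices, not the SBML Harvester API.

```python
def model_annotation_axiom(model, annotation, category):
    """Return the axiom for a model annotated with class `annotation`.

    `category` is the top-level class that subsumes the annotation:
    "MaterialEntity", "Function" or "Process" (assumed known here).
    """
    if category == "MaterialEntity":
        filler = annotation
    elif category == "Function":
        filler = f"(has-function some {annotation})"
    elif category == "Process":
        filler = f"(has-function some (realized-by only {annotation}))"
    else:
        raise ValueError(f"no conversion rule for category {category!r}")
    return f"{model} SubClassOf: represents some {filler}"


# A model M annotated with a GO process term, as in the BIOMODEL 82 example.
print(model_annotation_axiom("M", "GO:0031684", "Process"))
```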
  • 91. BIOMODEL 82: Converting Model Annotated with heterotrimeric G-protein complex cycle (GO:0031684): • represents an object O1 • O1 has a function F1 • F1 is realized by processes of the type heterotrimeric G-protein complex cycle • M SubClassOf: represents some O1 • O1 SubClassOf: (has-function some (realized-by only GO:0031684))
  • 92. Assumption 2: Every compartment represents a material object Compartment(x): a model component that represents a material object which is part of the object represented by the model to which the component belongs Compartment subClassOf 'model component' and represents some 'Material object' Conversion rule: • represents an object O2 • part of the object represented by the model • compartment’s species represent objects that are located in O2 • C SubClassOf: represents some A2 • A2 SubClassOf: located-in some A1
  • 93. BIOMODEL 82: Converting Compartment “Cell” Annotated with Cell (GO:0005623) • represents an object O2 • O2 is a kind of Cell • O2 is a part of O1 (represented by BIOMODEL 82) • C SubClassOf: represents some O2 • O2 SubClassOf: Cell and part-of some O1
  • 94. Assumption 3: Every species represents a material object Species(x): a model component that represents a material object which is part of the entity represented by the compartment of which the species is a part Species subClassOf 'model component' and represents some 'Material object' Species represents an O3 which • can have functions • the functions can be realized by processes • can have qualities (charge, amount, …) • is located in O2
  • 95. BIOMODEL 82: Converting Species “GTP” Annotated with GTP (CHEBI:15996) • represents an object O3 • O3 is a kind of GTP • O3 is located-in O2 (represented by “Cell” compartment) • S SubClassOf: represents some O3 • O3 SubClassOf: GTP and located-in some O2 • O3 SubClassOf: GTP and located-in some (Cell and part-of some (has-function some (realized-by only GO:0031684)))
  • 96. Reactions as Functions, not Processes Reactions represent Functions. Why not processes? - Functions are capabilities while processes are manifestations of these capabilities - Processes have a duration, a time of occurrence, participants, etc. - Functions can be realized multiple times, processes occur only once - Processes may be represented by simulations
  • 97. Assumption 4: Every reaction represents a functional entity Reaction(x): a model component that can include reactants, products and modifiers and represents a functional entity Reaction subClassOf 'model component' and 'represents' some ( ‘material entity’ and ‘has function’ some Function) ListOfReactions(x): a List that has only Reactions as members ListOfReactions EquivalentTo: List and 'has member' only 'reaction'
  • 98. BIOMODEL 82: Converting Reaction “GTP-binding” Annotated with GTP binding (GO:0005525) • represents an object O4 • O4 has a function F4 • F4 is a kind of GTP binding • F4 is realized by P4 • P4 has-input O3 (GTP) • R SubClassOf: represents some (has-function some F4) • F4 SubClassOf: GTP binding and realized-by only P4 • P4 SubClassOf: has-input some O3
  • 100. How would you formalize a model annotated with: A) heart B) to pump blood C) heart palpitations
  • 101. SBML2OWL: Implementation 1. Read the model • libSBML - http://sbml.org/Software/libSBML 2. Extract annotations from model & components • libSBML & Jena - http://jena.sourceforge.net 3. Formalize each annotation according to the formalization rules • OWLAPI - http://owlapi.sourceforge.net/ 4. Integrate with external ontologies • OWLAPI 5. Reasoning ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 101
  • 102. SBML2OWL: Implementation Application to BioModels repository yields: • OWL ontology with • more than 300,000 classes • More than 800,000 axioms • 90,000 complex model annotations • includes all referenced ontologies o GO o ChEBI o Celltype o FMA o PATO o (KEGG, Reactome) ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 102
  • 103. SBML2OWL: Implementation OWLAPI: • Ontology consists of o a signature (classes, object properties, individuals) o a set of axioms ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 103
  • 104. SBML2OWL: Implementation Reference implementation: SBML Harvester http://code.google.com/p/sbmlharvester/ ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 104
  • 105. Verification, querying, integration What can we do with the combined knowledge base? 1. Verification 2. Querying 3. Interoperability and knowledge integration ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 105
  • 106. Operations on OWL ontologies Consistency checking will identify contradictions in the stated and inferred knowledge. Consistency checking also helps to implement other reasoning tasks. • Satisfiability: determines whether classes can have instances. • Subsumption: is class C1 implicitly a subclass of C2? Check if C1 and not C2 is unsatisfiable, i.e., there is no instance of C1 that is not also an instance of C2 • Classification: repetitive application of subsumption to discover implicit subclass links between named classes • Realization: find the most specific class that an individual belongs to. Does individual a classify into the class C? ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 106
  • 107. Practical reasoning with OWL ontologies • Ontology editors such as Protege interface with reasoners to perform consistency and class satisfiability checking, classification and realization, and to provide explanations. • Some reasoners are set up to be used from the command line to execute requests, including SPARQL queries. • Programmatic use of reasoners via APIs offers maximal flexibility, e.g., one can request all subclasses of a given class, including implicit ones, or all entailed statements with a specified subject and predicate
  • 108. Operations on OWL ontologies Consistency checking will identify contradictions in the stated and inferred knowledge. Consistency checking also helps to implement other reasoning tasks • Satisfiability: determines whether classes can have instances. • Subsumption: is class C1 implicitly a subclass of C2? Check if C1 and not C2 is unsatisfiable, i.e., there is no instance of C1 that is not also an instance of C2 • Classification: repetitive application of subsumption to discover implicit subclass links between named classes • Realization: find the most specific class that an individual belongs to. Does individual a classify into the class C? Check if a : ¬C is consistent with the underlying ontology. ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 108
  • 109. Classifying the ontology ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 109
  • 110. Classifying the ontology ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 110
  • 111. Classifying the ontology ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 111
  • 112. Verification • Use of OWL reasoning for classification • Which classes are unsatisfiable? • Unsatisfiable classes are equivalent to owl:Nothing ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 112
  • 113. Model verification After reasoning, we found 27 models to be inconsistent. Reasons: 1. our representation - functions sometimes found in the place of physical entities (e.g. entities that secrete insulin); better to constrain with appropriate relations 2. SBML abused - species used as a measure of time 3. constraints in the ontologies themselves mean that the annotation is simply not possible
  • 114. Compartments/species annotated with functions or processes ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 114
  • 115. Biological inconsistency: Biomodel 176 ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 115
  • 116. Biological inconsistency: Biomodel 176 [Term] id: GO:0016887 name: ATPase activity is a: GO:0017111 intersection of: GO:0003824 ! catalytic activity intersection of: has input CHEBI:15377 ! water intersection of: has input CHEBI:15422 ! ATP intersection of: has output CHEBI:16761 ! ADP intersection of: has output CHEBI:26020 ! phosphates ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 116
  • 117. Finding inconsistencies with axiomatically enhanced ontologies We add: • GO: ATP + Water the only inputs (=2 quantification) • ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all different (disjointness) ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 117
  • 118. Consistency repair • Unsatisfiable classes result from contradictory class definitions • Conflict in asserted axioms, in imported ontologies or through combination of both • Conflicts can be hidden through domain/range restrictions, subclass relations, axioms for relations, etc. • Conflicting axioms may be challenging to identify! ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 118
  • 120. Protege 4: Explanation Workbench ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 120
  • 121. Ontology repair and disambiguation • Ontological commitment may have been too strong • Complex relations (between classes) can be relaxed by explicitly introducing a disjunction • Example: o Assumption 1: models represent material objects o model is annotated with the process Glycolysis o process and material object are disjoint, therefore the KB will contain unsatisfiable classes ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 121
  • 122. Disambiguation pattern • disambiguation pattern: a model annotated with X represents material objects X, or material objects with function X, or material objects with a function that is realized by X. • disambiguation patterns are applicable if the multiple alternatives are mutually disjoint • automated reasoning will then eliminate all but one option
  • 123. Disambiguation: Model annotations Assertion: M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C)) C SubClassOf: MaterialEntity Then: • represents some C is satisfiable • represents some (has-function some C) and represents some (has-function some (realized-by only C)) are unsatisfiable ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 123
  • 124. Disambiguation: Model annotations Assertion: M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C)) C SubClassOf: Function Then: • represents some (has-function some C) is satisfiable • represents some C and represents some (has-function some (realized-by only C)) are unsatisfiable
  • 125. Disambiguation: Model annotations Assertion: M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C)) C SubClassOf: Process Then: • represents some (has-function some (realized-by only C)) is satisfiable • represents some C and represents some (has-function some C) are unsatisfiable
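The outcome of the three disambiguation cases can be mimicked with a lookup table. The sketch below is not a reasoner: it hard-codes the result that an OWL reasoner would derive from the mutual disjointness of Material entity, Function and Process together with the domain and range restrictions. The names `DISJUNCTS` and `surviving_disjuncts` are hypothetical.

```python
# Each disjunct of the disambiguation pattern, paired with the top-level
# category the annotation class must belong to for the disjunct to be
# satisfiable (the categories are assumed to be mutually disjoint).
DISJUNCTS = [
    ("represents some {c}", "MaterialEntity"),
    ("represents some (has-function some {c})", "Function"),
    ("represents some (has-function some (realized-by only {c}))", "Process"),
]


def surviving_disjuncts(annotation, category):
    """List the disjuncts that remain satisfiable for the given annotation."""
    return [
        template.format(c=annotation)
        for template, required in DISJUNCTS
        if required == category
    ]


# Glycolysis is a process, so only the third alternative survives.
print(surviving_disjuncts("Glycolysis", "Process"))
```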
  • 126. Aside from the disjunction pattern, what else could be used for consistency repair?
  • 127. Once consistent, we can query the ontology and infer new knowledge what would YOU ask of your formalized knowledge base?
  • 128. Knowledge discovery and retrieval • All queries are of the form: o Query class: Y o List all subclasses (and descendant classes), equivalent classes, superclasses (and ancestor classes) o Some OWL reasoners perform only classification and output the classified taxonomy ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 128
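A subclass query over an already classified taxonomy amounts to a graph traversal. The following sketch assumes the taxonomy is available as a list of (subclass, superclass) pairs between named classes; the class names in the example are made up, and answering a query for a complex class expression would still require a reasoner to place that expression in the taxonomy first.

```python
from collections import defaultdict


def descendants(subclass_axioms, query):
    """Return all direct and indirect subclasses of `query`."""
    children = defaultdict(set)
    for sub, sup in subclass_axioms:
        children[sup].add(sub)
    found, stack = set(), [query]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:  # guards against cycles (equivalent classes)
                found.add(child)
                stack.append(child)
    return found


# Hypothetical classified taxonomy.
taxonomy = [
    ("Reaction", "ModelComponent"),
    ("Species", "ModelComponent"),
    ("R1", "Reaction"),
]
print(sorted(descendants(taxonomy, "ModelComponent")))
```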
  • 129. Knowledge discovery and retrieval • Query: list all models • Query type: subclasses • Query class: Model ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 129
  • 130. Knowledge discovery and retrieval • Query: list all reactions that are part of BIOMD0000000169 • Query type: subclasses • Query class: Reaction and part-of some BIOMD0000000169 ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 130
  • 131. Knowledge discovery and retrieval • Query: list all models that represent Glycolysis • Query type: subclasses • Query class: Model and represents some (has-function some (realized-by only Glycolysis)) ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 131
  • 132. Knowledge discovery and retrieval • Query: list all models that have a compartment that represents a part of a Cell in which a sugar is located • Query type: subclasses • Query class: Model and has-part some (Compartment and represents some (part-of some Cell and contains some Sugar)) ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 132
  • 133. Knowledge discovery and retrieval • Query: list all Model entities that represent catalytic activity involving sugar in the endocrine pancreas • Query type: subclasses • Query class: represents some (has-function some 'catalytic activity' and realized-by only (has-participant some (sugar and contained-in some (part-of some 'Endocrine pancreas')))) ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 133
  • 134. Knowledge discovery and retrieval • Query: list all Model entities that represent mutagenic central nervous system drugs in the gastrointestinal system • Query type: subclasses • Query class: represents some (has-part some ('has role' some 'central nervous system drug' and 'has role' some mutagen and part-of some 'Gastrointestinal system') ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 134
  • 135. Answering questions ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 135
  • 136. Automated reasoning • more than 800,000 axioms • included ontologies contain several thousand axioms o GO has approx. 35,000 classes o ChEBI contains almost 100,000 classes o complex definitions of classes create links between large ontologies • Reasoning in OWL 2 DL is highly complex (worst-case 2NEXPTIME-complete, i.e. doubly exponential, 2^(2^n), with n the number of operators used in the ontology) • Consequence: OWL reasoning can rarely be employed at a large scale. • Expressive OWL reasoners do not classify the formalized BioModels repository.
  • 137. OWL Reasoners OWL DL Reasoners • Pellet: Clark & Parsia, dual-licensed, Java. • Fact++: Manchester University, open-source, C++ with a Java API. • HermiT: Oxford University, open-source, Java. • Racer Pro: Racer Systems, commercial, Lisp with a Java API. OWL Profile/subset reasoners • Jena: Hewlett-Packard, open-source, Java. • OWLIM: Ontotext, dual-licensed, Java. • CB: • CEL: • JCEL (Pellet) • ELLY: ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 137
  • 138. Implementation in information systems • Classification of model ontology: 10-120min • Answering complex queries: up to several hours • Consequence: OWL reasoning can rarely be employed at a large scale • Subsets of OWL allow tractable (polynomial-time) automated reasoning • OWL EL suitable for ontologies with a large number of classes • Problem: convert ontologies into a tractable subset of OWL
  • 139. OWL Profiles • OWL 2 defines three different tractable profiles: • EL o polynomial time reasoning for schema and data o Useful for ontologies with large conceptual part • QL o fast (logspace) query answering using RDBMSs via SQL o Useful for large datasets already stored in RDBs • RL o fast (polynomial) query answering using rule-extended DBs o Useful for large datasets stored as RDF triples
  • 140. OWL RL Features: • identity of classes, instances, properties • subproperties, subclasses, domains, ranges • union and intersection of classes (some restrictions) • property characterizations (functional, symmetric, etc) • property chains • keys • some property restrictions (but not all inferences are possible) Limitations: • not all datatypes are available • no datatype restrictions • no minimum or exact cardinality restrictions • maximum cardinality only with 0 and 1 • some consequences cannot be drawn ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 140
  • 141. OWL EL Features • existential quantification to a class expression or data range • existential quantification to an individual or a literal • self-restriction • enumerations involving a single individual or a single literal • intersection of classes and data ranges • class axioms: subClassOf, equivalence, disjointness • property axioms: domain, range, equivalence, transitive, reflexive, inclusion with or without property chains; functional data properties; keys • assertions (sameAs, DifferentFrom, Class, Object Property, Data Property, Negative Object/Data Property) Not supported • universal quantification to a class expression or a data range • cardinality restrictions • disjunction (union) • class negation • enumerations involving more than one individual • object properties: disjoint, symmetric, asymmetric, irreflexive, inverse, functional and inverse-functional
  • 142. Ontology modularization Can we automatically extract a large (maximal) OWL (EL, QL, RL) module from an ontology? 1. D EquivalentTo: not A (not EL) 2. C EquivalentTo: not B (not EL) 3. B subClassOf: A (EL) Inference: • D subClassOf: C (EL) (Inference from (1)-(3)) EL module of (1)-(3): • {B subClassOf: A}, or • {B subClassOf: A, D subClassOf: C} ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 142
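The difference between the two candidate EL modules can be illustrated with a crude syntactic filter. This is a toy heuristic, assuming axioms are given as Manchester-style strings and that an axiom leaves EL as soon as it contains one of a few keywords. A real profile check should rely on a proper OWL parser and profile checker, and the inferred axiom `D SubClassOf: C` would have to come from a reasoner rather than being typed in by hand.

```python
# Keywords for constructs that are outside OWL EL (negation, disjunction,
# universal quantification, cardinality restrictions, inverse properties).
NON_EL_KEYWORDS = {"not", "or", "only", "min", "max", "exactly", "inverse"}


def is_el_axiom(axiom):
    """Heuristic: an axiom is treated as EL if it uses no non-EL keyword."""
    tokens = axiom.replace("(", " ").replace(")", " ").split()
    return not any(token in NON_EL_KEYWORDS for token in tokens)


def el_module(axioms):
    """Keep only the axioms that pass the EL heuristic."""
    return [axiom for axiom in axioms if is_el_axiom(axiom)]


asserted = ["D EquivalentTo: not A", "C EquivalentTo: not B", "B SubClassOf: A"]
inferred = ["D SubClassOf: C"]  # entailed by the three asserted axioms

print(el_module(asserted))             # module from asserted axioms only
print(el_module(asserted + inferred))  # module from the deductive closure
```

Filtering the deductive closure instead of the asserted axioms is what preserves the EL-expressible consequence `D SubClassOf: C`, which would otherwise be lost together with the two negation axioms.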
  • 143. EL Vira modularization http://el-vira.googlecode.com • ontology modularization • identify EL, QL, RL axioms in deductive closure • retain signature of ontology • maximality is an open problem ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 143
  • 144. Outcomes The SBML-derived ontologies can be i) checked for their consistency, thereby uncovering erroneous curations ii) infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models iii) answer sophisticated questions across a model knowledge base ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 144
  • 146. Phenotypes Phenotypes are observable characteristics of an organism. Examples include: – Red hair – Heart rate of 120bpm – Absent arm – Malfunctional liver Phenotypes include comparisons such as Increased heart rate ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 146
  • 147. Phenotype and anatomy ontologies anatomy ontologies: > 100,000 classes – FMA, MA, WA, ZFA, FA, GO-CC, ... phenotype ontologies: > 20,000 classes – HPO, MP, WBPhenotype, FBcv, APO, ... quality ontology: > 2,000 classes – PATO process and function ontologies: > 25,000 classes – Gene Ontology, ... alignments between anatomy ontologies – UBERON, various mappings
  • 148. Phenotype: Example question Find all regions in the human, mouse, fish, fly, worm and yeast genome that are associated with tetralogy of Fallot.
  • 150. Tetralogy of Fallot – Overriding aorta (HP:0002623) – Ventricular septal defect (HP:0001629) – Pulmonic stenosis (HP:0001642) – Right ventricular hypertrophy (HP:0001667)
  • 151. Phenotype descriptions Overriding aorta (HP:0002623): – Q: overlap with (PATO:0001590) – E1: Aorta (FMA:3734) – E2: Membranous part of interventricular septum (FMA:7135) HP:0002623 EquivalentTo: phene-of some (has-part some (FMA:3734 and has-quality some (PATO:0001590 and towards some FMA:7135)))
  • 152. Human-mouse anatomy mappings Overriding aorta (HP:0002623): – Q: overlap with (PATO:0001590) – E1: Aorta (FMA:3734) • FMA:3734 EquivalentTo: MA:0000062 – E2: Membranous part of interventricular septum (FMA:7135) • FMA:7135 EquivalentTo: MA:0002939
  • 153. Mouse phenotype Overriding aorta (MP:0000273): – Q: overlap with (PATO:0001590) – E1: Aorta (MA:0000062) – E2: Membranous interventricular septum (MA:0002939) MP:0000273 EquivalentTo: phene-of some (has-part some (MA:0000062 and has-quality some (PATO:0001590 and towards some MA:0002939))) Consequence: MP:0000273 EquivalentTo: HP:0002623
  • 154. Absence: absent appendix Absent appendix: – Q: lacks all parts of type (PATO:0002000) – E1: Human body (FMA:20394) – E2: Appendix (FMA:14542) AbsentAppendix EquivalentTo: LacksParts and towards some Appendix and inheres-in some HumanBody AbsentAppendix EquivalentTo: LacksParts and towards some {Appendix} and inheres-in some HumanBody AbsentAppendix EquivalentTo: phene-of some (HumanBody and not has-part some Appendix)
  • 155. Absence and inconsistency AbsentAppendix SubClassOf: phene-of some (HumanBody and not has-part some Appendix) HumanBody SubClassOf: has-part some Appendix HumanBody(John). AbsentAppendix(x). has-phene(John,x).
  • 156. Inconsistency removal – Removal of conflicting axioms (has-part/part-of in anatomy) – Contextualize anatomy: • Normal and HumanBody SubClassOf: has-part some (Normal and Appendix) – Use of non-monotonic reasoning
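The clash behind the absent-appendix inconsistency, and the effect of contextualizing the anatomy axiom, can be sketched in a few lines of Python. This is a deliberately naive check, assuming conjunctions are sets of expressions and axioms are (premises, conclusion) pairs; it only detects a syntactic pair of C and not C and is no substitute for an OWL reasoner. The names `saturate` and `has_clash` are hypothetical.

```python
HAS_APPENDIX = ("some", "has-part", "Appendix")
NO_APPENDIX = ("not", HAS_APPENDIX)

# Axioms as (premises, conclusion): whatever satisfies all premises
# also satisfies the conclusion.
CANONICAL = [(frozenset({"HumanBody"}), HAS_APPENDIX)]
CONTEXTUALIZED = [
    (frozenset({"Normal", "HumanBody"}),
     ("some", "has-part", ("and", "Normal", "Appendix"))),
]

def saturate(conjuncts, axioms):
    """Add every conclusion whose premises are all present, until nothing changes."""
    result = set(conjuncts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in axioms:
            if premises <= result and conclusion not in result:
                result.add(conclusion)
                changed = True
    return result

def has_clash(conjuncts, axioms):
    """True if the saturated conjunction contains both C and ("not", C)."""
    saturated = saturate(conjuncts, axioms)
    return any(("not", c) in saturated for c in saturated)

# Filler of the phene-of restriction in the AbsentAppendix axiom.
body_without_appendix = {"HumanBody", NO_APPENDIX}

print(has_clash(body_without_appendix, CANONICAL))
print(has_clash(body_without_appendix, CONTEXTUALIZED))
```

With the canonical axiom every HumanBody must have an appendix, so the filler is contradictory; with the contextualized axiom the requirement applies only to bodies that are also Normal, so the description of a body lacking an appendix no longer clashes.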
  • 157. Ontology of phenotypes Different formal expressions for phenotypes based on – qualities, – anatomical parts, – functions, – processes
  • 161. PhenomeBLAST – apply definition patterns to yeast, fly, worm, fish, mouse and human phenotypes and integrate them in a single ontology – phenotype alignment through OWL reasoning – more than 300,000 classes and 1,000,000 axioms – combination of HermiT (for EL Vira modularization), CB and CEL reasoners – classification time: 7 minutes http://phenomeblast.googlecode.org
  • 163. Comparison of phenotypes direct comparison of phenotypes: – disease phenotypes, e.g., tetralogy of Fallot – phenotypes associated with genetic mutations (genotypes in mouse, fish, etc.)
  • 164. Comparison of phenotypes When the phenotype annotation of a genotype becomes a subclass of a disease phenotype, we can infer a gene-disease association if – the disease phenotypes are sufficient for having the disease – the mutation phenotypes are necessary for having a specific genotype Inference over ontologies can establish a formal proof for a gene-disease association.
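As a rough illustration of the subclass-based inference described on this slide, the Python sketch below walks a small, entirely hypothetical class hierarchy of the kind a reasoner might produce after classification. All class, gene and disease names are invented placeholders, and the function names are assumptions; the sketch does not check the sufficiency and necessity conditions, which have to hold for the inference to be valid.

```python
# Hypothetical inferred hierarchy: class -> direct superclasses.
PARENTS = {
    "GenotypePhenotype_G1": {"PhenotypeA"},
    "PhenotypeA": {"DiseasePhenotype_D1"},
    "GenotypePhenotype_G2": {"PhenotypeB"},
}

def is_subclass(sub, sup, parents):
    """True if sup is reachable from sub by following superclass links."""
    stack, seen = [sub], set()
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False

def infer_associations(genotype_phenotypes, disease_phenotypes, parents):
    """Pair each gene with every disease whose phenotype subsumes the gene's phenotype."""
    return sorted(
        (gene, disease)
        for gene, g_class in genotype_phenotypes.items()
        for disease, d_class in disease_phenotypes.items()
        if is_subclass(g_class, d_class, parents)
    )

genotype_phenotypes = {"G1": "GenotypePhenotype_G1", "G2": "GenotypePhenotype_G2"}
disease_phenotypes = {"D1": "DiseasePhenotype_D1"}

associations = infer_associations(genotype_phenotypes, disease_phenotypes, PARENTS)
print(associations)
```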
  • 165. Knowledge discovery Similarity-based comparison allows for incomplete and noisy information. – pairwise comparison of phenotypes – similarity: weighted Jaccard index – result: similarity matrix between phenotypes – (quantitative) evaluation based on predicting orthology, pathway, disease – identify novel gene-disease associations
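The pairwise comparison can be sketched as follows. The slide names only a "weighted Jaccard index"; this sketch assumes one common form, the summed weight of shared phenotype classes divided by the summed weight of all classes in either set. The weights, class names and entity names are hypothetical placeholders (in practice a weight might reflect how specific a class is).

```python
from itertools import combinations

def weighted_jaccard(a, b, weight):
    """Summed weight of shared classes over summed weight of all classes in a or b."""
    total = sum(weight[c] for c in a | b)
    if total == 0:
        return 0.0
    return sum(weight[c] for c in a & b) / total

def similarity_matrix(annotations, weight):
    """Pairwise similarity for every unordered pair of annotated entities."""
    return {
        (x, y): weighted_jaccard(annotations[x], annotations[y], weight)
        for x, y in combinations(sorted(annotations), 2)
    }

# Hypothetical class weights and phenotype annotations.
weight = {"P1": 1.0, "P2": 2.0, "P3": 1.0}
annotations = {
    "mouse_genotype": {"P1", "P2"},
    "human_disease": {"P2", "P3"},
    "fish_genotype": {"P3"},
}

matrix = similarity_matrix(annotations, weight)
for pair, score in sorted(matrix.items()):
    print(pair, round(score, 3))
```

Because the score degrades gradually as annotations diverge, missing or spurious phenotype classes lower a match rather than eliminating it, which is what makes the approach tolerant of incomplete and noisy data.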
  • 169. What does the future hold? Better formalized ontologies Dynamic generation of knowledge through semantic web services …
  • 170. Summary - RDF and OWL RDF provides • light-weight semantics • fast queries • highly scalable implementations • large volumes of data (e.g., DBpedia, other Linked Data repositories) OWL provides • constructs to formalize the intended semantics • an OWLAPI to develop, manage, and serialize OWL ontologies • efficient reasoners to get inferences, compute modules and get explanations • syntactic subsets for better performance, although some inferences may be lost
  • 171. Summary - OWL & Formal languages • Formal logic-based languages can be used to formalize the meaning of terms used in discourse. While normally restricted in terms of what can be expressed, the statements formed can be automatically reasoned about. • OWL is based on description logics and formalizes the meaning of terms with axioms. Axioms can be used to characterize and distinguish classes, relations and individuals. Rich expressions can be crafted from logical combinations of language primitives including conjunction, disjunction, negation and object/data property restrictions. • OWL reasoners provide a number of services including computing subsumption, satisfiability, entailment, realization and query answering.
  • 172. Summary - Exploitation of ontologies • verification: automated reasoning can reveal contradictory definitions of classes (unsatisfiable classes), instances that violate constraints in the ontology (often leading to inconsistent ontologies) and hidden inferences (that may be considered invalid through manual verification) • querying: ontologies define an explicit, formal language based on which queries to a knowledge base can be performed; queries can be made for instances and for classes satisfying complex conditions • repair: through explicit definitions using disjunction, constraints can be relaxed and contradictions reduced
  • 173. Summary - Ontology Ontology is not philosophy! • an ontology is a specification of a conceptualization of a domain • a conceptualization is a system of categories accounting for a particular view on the world • ontologies are used to make some aspects of the intended meaning of terms in a vocabulary explicit • ontologies (in computer science) may utilize philosophical theories • formalized ontologies can be used by humans and automated systems as a basis for communication and data exchange • ontologies are useful tools for translational research
  • 174. Summary - Implementation in information systems • The OWLAPI is a reference implementation of the OWL specification and facilitates the development, management and serialization of expressive OWL ontologies. The OWLAPI also facilitates modularization and getting explanations. • OWL provides syntactic subsets of the language for efficient reasoning. These so-called OWL profiles (EL, RL, QL) have well understood computational properties and can lead to better performance, but with some inferences lost. • Formal ontology makes it possible not only to retrieve data (as with a database), but also to query the concepts themselves
  • 175. Summary - evaluation • ontologies are tools to support science • ontologies can provide insight into real biological/scientific problems • quantifiable evaluation can be performed, e.g., based on precision/recall or ROC analysis • application of ontologies may go beyond reasoning alone and use statistical analyses (enrichment), semantic similarity, graph algorithms, clustering, etc.
  • 176. Conclusions • Ontologies + Semantic Web enable • Integration • Verification • Analysis • Discovery • Translational research
  • 177. Acknowledgements George Gkoutos Heinrich Herre Janet Kelso Dietrich Rebholz-Schuhmann Anika Oellrich Michael Ashburner Dan Cook John Gennari Paul Schofield
  • 178. michel_dumontier@carleton.ca leechuck@leechuck.de