The Pragmatics and Formality of Authoring Ontologies, ODLS 2016
1. Formality and Pragmatics in
Authoring Ontologies
Robert Stevens
ODLS 2016
School of Computer Science
The University of Manchester
Manchester
United Kingdom
M13 9PL
Robert.stevens@manchester.ac.uk
2. Acknowledgements
• On-going work with Phil Lord on normalising
the Gene Ontology
• The Gene Ontology folk for making GO
• Nico Matentzoglu for my slides
• Mercedes Casteleiro for numbers
3. Formality and Pragmatics
• Formality: Acting strictly according to
procedure or rules
– Ontological formality
– Representational formality
• Pragmatics: Behaviour driven by practical
consequences rather than dogma
• There’s a tension between the two
6. What is Molecular Function in GO?
• Describes “function”…?
GO:0003674
molecular_function
Elemental activities, such as catalysis or
binding, describing the actions of a
gene product at the molecular level. A
given gene product may exhibit one
or more molecular functions.
7. Motivation
• Is GO’s molecular function ontology really
function, “little” processes or both?
• Documented as a function
• Sometimes looks like a process
• Sometimes treated like a process
• Confusion of the thing that has a function with the
function itself
• This can make modelling harder than it need be
8. A Couple of Observations
• Pragmatically, we commit to GO – it’s the only
show in town and it works
• There’s a lot of chemicals around in GO MF
• We are biochemistry….!
• Probably few functions – strip out all the “non-
function” stuff and see what’s left
• Then we can look at the ontological nature of GO
MF
• Also, re-create in a more sustainable form
11. There are several dimensions of
classification here
• The amino acids themselves – a chemical dimension
• The size of the amino acid’s side chain
• The charge on the side chain
• The polarity of the side chain
• The hydrophobicity of the side chain
• We can normalise these into separate hierarchies then put
them back together again
• Our goal is to put entities into separate trees all formed on
the same basis
• Size only talks about size; amino acid only talks about
chemical composition (based on an alpha-carbon with an
amino and carboxylic acid group); and so on
13. The process
• Hand-crafted ontologies with a polyhierarchy
are “tangled”
• Usually axiomatically lean
• We classify along one axis and use
“restrictions” to other modules to capture
other axes
• Then re-build the polyhierarchy using the
axiomatically rich ontology
14. “Pulling out” dimensions
• Each separate tree must be the same kind of
thing
• We don’t mix continuants, processes,
qualities, etc
• We don’t mix our classification by, for
instance, structure and then charge
• We do that compositionally via defined classes
and automated reasoners
15. The amino acid pattern
Class: AminoAcid
SubClassOf:
hasSize some Size,
hasPolarity some Polarity,
hasCharge some Charge,
hasHydrophobicity some Hydrophobicity
16. An amino acid
Class: Lysine
SubClassOf:
AminoAcid,
hasSize some Large,
hasCharge some Positive,
hasPolarity some Polar,
hasHydrophobicity some Hydrophilic
17. Rebuilding the hierarchy
• Class: LargeAminoAcid
– EquivalentTo: AminoAcid and hasSize some Large
• Class: PositiveAminoAcid
– EquivalentTo: AminoAcid and hasCharge some Positive
• Class: LargePositiveAminoAcid
– EquivalentTo: LargeAminoAcid and PositiveAminoAcid
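The rebuilding step can be illustrated with a toy classifier. This is only a sketch in Python, not what an OWL reasoner actually does: it assumes every class is an AminoAcid, treats each description as a flat set of (property, filler) pairs, and ignores any hierarchy among the fillers. The names `ASSERTED`, `DEFINED`, `inferred_superclasses` and `defined_subsumptions` are hypothetical; the Lysine description and the three defined classes are taken from the slides.

```python
from typing import Dict, FrozenSet, List, Tuple

Restriction = Tuple[str, str]  # (property, filler), read as "property some filler"

# Asserted, axiomatically rich description (from the Lysine slide).
ASSERTED: Dict[str, FrozenSet[Restriction]] = {
    "Lysine": frozenset({
        ("hasSize", "Large"),
        ("hasCharge", "Positive"),
        ("hasPolarity", "Polar"),
        ("hasHydrophobicity", "Hydrophilic"),
    }),
}

# Defined classes: "AminoAcid and <all of these restrictions>".
DEFINED: Dict[str, FrozenSet[Restriction]] = {
    "LargeAminoAcid": frozenset({("hasSize", "Large")}),
    "PositiveAminoAcid": frozenset({("hasCharge", "Positive")}),
    "LargePositiveAminoAcid": frozenset({("hasSize", "Large"),
                                         ("hasCharge", "Positive")}),
}


def inferred_superclasses(name: str) -> List[str]:
    """Defined classes whose restrictions are all met by the asserted class."""
    description = ASSERTED[name]
    return sorted(d for d, required in DEFINED.items() if required <= description)


def defined_subsumptions() -> List[Tuple[str, str]]:
    """Pairs (sub, super) among defined classes: super asks for strictly less."""
    return sorted((a, b)
                  for a, ra in DEFINED.items()
                  for b, rb in DEFINED.items()
                  if a != b and rb < ra)


if __name__ == "__main__":
    print(inferred_superclasses("Lysine"))
    print(defined_subsumptions())
```

The polyhierarchy is never asserted by hand here: Lysine lands under all three defined classes only because its restrictions satisfy their definitions, which is the point of keeping each asserted tree to a single axis.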
19. Other Ontology Topics as
Factors in GO MF
[Diagram labels: molecular function; chemical; chemical role; reaction; biological process; cellular component; cell; protein; sequence]
• 40-60% of terms mention chemicals
21. Binding
• ~2k terms in the binding bit of GO MF
• Remove the chemicals
• Leaves “binding”
• There is a function “to bind”
• There is a process of “binding”
• Linguistically – an infinitive and a
gerund/nominalised verb
22. More “to bind” Functions?
• “to bind” is the basic function
• Specialise to “to bind covalently”, “to bind via
hydrogen”, “to bind electrostatically”
• But these are built compositionally with
reference to other ontologies
23. Chemorepellent - chemoattractant
activity
GO:0042056
chemoattractant activity
Providing the environmental signal that
initiates the directed movement of a
motile cell or organism towards a
higher concentration of that signal.
GO:0045499
chemorepellent activity
Providing the environmental signal that
initiates the directed movement of a
motile cell or organism towards a
lower concentration of that signal.
Both link down to: To diffuse
26. Distinctions with no (practical)
difference
• “Distinction without a difference” – making a
distinction where none exists
• Distinctions may exist, but does one need to
make them?
• Does a distinction make a practical difference
to the use case in hand?
• Make no distinction unless it makes a
difference
• Beware of consistency…
30. Some patterns
• hasRealisableEntity some (to_bind and
realisedIn only (binding and hasInput some
chemical))
• Add “playsrole some role” for a chemical role
like drug
• hasRealisableEntity some (to_catalyse and
realisedIn only (catalysis and hasInput some
chemical and hasOutput some chemical))
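Because the two patterns differ only in the function, the process and the participants, they can be generated from a template. The following is a minimal Python sketch that just builds the expression strings shown on the slide; the helper names `realisable_pattern` and `with_role` are hypothetical, and attaching the role to the input chemical is an assumption about where “playsrole some role” is meant to go.

```python
from typing import Sequence


def with_role(chemical: str, role: str) -> str:
    """A chemical playing a role, e.g. a drug (placement is an assumption)."""
    return f"({chemical} and playsrole some {role})"


def realisable_pattern(function: str,
                       process: str,
                       inputs: Sequence[str] = ("chemical",),
                       outputs: Sequence[str] = ()) -> str:
    """Build 'hasRealisableEntity some (F and realisedIn only (P and ...))'."""
    parts = [process]
    parts += [f"hasInput some {c}" for c in inputs]
    parts += [f"hasOutput some {c}" for c in outputs]
    return (f"hasRealisableEntity some ({function} and "
            f"realisedIn only ({' and '.join(parts)}))")


if __name__ == "__main__":
    print(realisable_pattern("to_bind", "binding"))
    print(realisable_pattern("to_catalyse", "catalysis", outputs=("chemical",)))
    print(realisable_pattern("to_bind", "binding",
                             inputs=(with_role("chemical", "drug"),)))
```

Generating the expressions this way keeps the parentheses balanced and makes each GO MF term a matter of choosing fillers rather than hand-writing axioms.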
31. Actually doing it
• Programmatically using Tawny-OWL
• Asserted tree of molecular realisables and
molecular processes
• Defined classes for the actual terms
• May have to restrict to OWL EL for practical
reasons
• We shall see…
32. Strategies for Defined Classes
• Total post co-ordination
• Total pre co-ordination
• Pre co-ordinate those classes that have been
used in annotation
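The three strategies differ only in which defined classes get named in advance. A small Python sketch of that choice follows; the `Strategy` enum, the function name and the `min_uses` threshold are hypothetical, and the annotation counts in the usage example are made up for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class Strategy(Enum):
    TOTAL_POST = "total post co-ordination"
    TOTAL_PRE = "total pre co-ordination"
    PRE_IF_ANNOTATED = "pre co-ordinate classes used in annotation"


def classes_to_precoordinate(candidates: Iterable[str],
                             annotation_counts: Dict[str, int],
                             strategy: Strategy,
                             min_uses: int = 1) -> Set[str]:
    """Return the candidate classes that would be given a named defined class."""
    if strategy is Strategy.TOTAL_POST:
        return set()              # nothing is named; users compose expressions
    if strategy is Strategy.TOTAL_PRE:
        return set(candidates)    # every candidate is named up front
    # Name only the classes that annotators have actually used.
    return {c for c in candidates if annotation_counts.get(c, 0) >= min_uses}


if __name__ == "__main__":
    terms = ["GO:0042056", "GO:0045499", "GO:0003674"]
    counts = {"GO:0042056": 12, "GO:0045499": 0}  # illustrative numbers only
    print(classes_to_precoordinate(terms, counts, Strategy.PRE_IF_ANNOTATED))
```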
33. How many GO MF terms are used?
Annotation files:
• Human: Homo sapiens, canonical accessions from UniProt (goa_human.gaf.gz)
• All: unfiltered GOA UniProt gene association file (goa_uniprot_all.gaf.gz)
Measure | Human | All
Total number of GO-UniProt annotations | 354 515 (~354K) | 294 208 149 (~294M)
Unique UniProt IDs | 19 055 (~19K) | 45 968 890 (~46M)
Unique active Molecular Function classes | 3 947 (~4K) | 7 521 (~7K)
Unique active Molecular Function classes used more than 5 times: 1 313 (~1K)
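Counts of this kind can be derived from a GAF file with a few lines of Python. This is a sketch, not the script used for the slide: it relies only on the standard GAF layout (tab-separated, comment lines starting with “!”, GO ID in column 5, aspect in column 9 with “F” for molecular function), and it does not check whether a class is still active or treat NOT-qualified annotations specially. The function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Set


def count_mf_terms(lines: Iterable[str]) -> Counter:
    """Count annotations per GO ID, keeping only Molecular Function rows."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed row, skip rather than guess
        go_id, aspect = cols[4], cols[8]
        if aspect == "F":
            counts[go_id] += 1
    return counts


def used_more_than(counts: Counter, n: int) -> Set[str]:
    """GO IDs annotated strictly more than n times."""
    return {term for term, k in counts.items() if k > n}


def count_mf_terms_in_file(path: str) -> Counter:
    """Convenience wrapper for a gzipped GAF file such as goa_human.gaf.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_mf_terms(handle)
```

With a local copy of the file, `len(count_mf_terms_in_file("goa_human.gaf.gz"))` gives the number of distinct Molecular Function classes used, and `len(used_more_than(counts, 5))` the number used more than 5 times.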
34. What have we found?
• Very few functions
• … and some look dispositional
• It looks like physics
• Most functions involve binding – makes sense
• We separate realisables and processes
• We live with a bit of “replication”
• With molecular processes, do we need molecular
function?
• We change the upper reaches of GO MF, but…
• Does it make any practical difference?
35. Formality
• Ontological formality
• Making the right distinctions drives consistent
use of relationships
• Facilitates the kind of analysis we’ve done
• Can also be a barrier to progress
• Representational formality
• Knowing what is being said is useful
• Allows clean interpretation
• Enables useful reasoning
36. Pragmatic Decisions
• Commit enough to achieve goals
• If re-using, take on the commitments of that ontology
– If using OBO commit to OBO
– If what you’re using uses something with which you
disagree – get over it
• Axiom pragmatics
• Don’t represent that which isn’t needed
• Truth and beauty
• A counsel of perfection is a counsel of despair
• I’d make “gene product” explicit
Editor’s notes
Informal definitions of the words formality and pragmatics
I build ontology-based applications, and pragmatics come into play
I like formality (up to a point), but I’d prefer an application that does something over a formal ontology that is not usable. Both is great, but I sacrifice formality first
#Slide with molecular function title
#add textbox with number of terms
#URL: http://geneontology.org/
D-alanyl carrier activity
acetylcholine receptor regulator activity
antioxidant activity
binding
calcium channel regulator activity
catalytic activity
channel regulator activity
chemoattractant activity
chemorepellent activity
core DNA-dependent RNA polymerase binding promoter specificity activity
electron carrier activity
enzyme regulator activity
guanyl-nucleotide exchange factor activity
metallochaperone activity
mitochondrial RNA polymerase binding promoter specificity activity
molecular function regulator
molecular transducer activity
morphogen activity
negative regulation of molecular function
neurotransmitter receptor regulator activity
nucleic acid binding transcription factor activity
nutrient reservoir activity
positive regulation of molecular function
protein tag
receptor regulator activity
regulation of molecular function
signal transducer activity
structural molecule activity
transcription factor activity, core RNA polymerase I binding
transcription factor activity, core RNA polymerase II binding
transcription factor activity, core RNA polymerase III binding
transcription factor activity, core RNA polymerase binding
transcription factor activity, protein binding
transcription factor activity, transcription factor binding
translation regulator activity
transporter activity
title: GO Molecular function
1. molecular_function (GO:0003674)
"Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one
or more molecular functions."
- 1. above in a box at the top of the slide with a text box below into which I can put bullets. the first bullet is
* Describes "function"....?
first slide is a tangled hierarchy (title "Normalisation 1")
"Vehicle" at the top
the leaves are:
fast red sports car
fast green sports car
red lorry
slow yellow lorry
green van
fast red motor cycle
black estate car
green saloon car
red estate car
Then some intermediate, "defined classes" such as:
red vehicle
green vehicle
fast red car
red car
and any you can think of, and make it tangled
second slide (title "Normalisation 2")
separate out a set of hierarchies
Vehicle
colour
speed
style
and if you can fit it on, an axiom pattern of
Class: Vehicle
SubClassOf:
hasColour some Colour,
hasStyle some Style,
hasSpeed some Speed
Normalisation; a paper from Alan Rector (2003)
This pulling out of non-function aspects of GO MF is not complete
Most aspects have OBO support
Not electron and energy
title: Chemorepellent - chemoattractant activity
below, 1 and 2 are some kind of box with the GO term and Id as some form of title with the definition below. this links down to a blob containing 3.
1. chemoattractant activity (GO:0042056)
Providing the environmental signal that initiates the directed movement of a motile cell or organism towards a higher concentration of that signal.
2. chemorepellent activity (GO:0045499)
Providing the environmental signal that initiates the directed movement of a motile cell or organism towards a lower concentration of that signal.
3. both linking down to a blob containing "To diffuse"
Some of these functions look dispositional
To store, to diffuse and to structurally maintain
Lots of these “functions” also imply binding
This is not a surprise, as some binding must happen for anything to happen
(as-subclasses
RealizableEntity
(defclass ToCatalyse
:comment "To reduce the activation energy of a reaction, enabling it to go
faster.")
(defclass ToBind
:comment "To interact tightly with another entity, longer than transiently,
such that separating the entity requires significant energy. ToBind
functions are often symmetric; if A has a function ToBind B, then the
reverse is also true.")
(defclass ToMark
:comment "To bind between this entity X, and another entity Y, so that
a third entity Z can also be bound, and thereby interact with Y."
:super ToBind))
(defclass ToStore
:comment "To contain a substance for later use.")
(defclass ToDiffuse
:comment "To spread outward from a single point as a result of Brownian
motion.")
(defclass ToTransport
:comment "To enable the movement of an entity in a directed manner.")
(defclass ToMaintainIntegrity
:comment "To keep the same structure, shape or organisation despite
physical forces, either in compression or in extension.")
(defclass ToProtect
:comment "To prevent an event occurring to this or another entity.")
(defclass ToModulate
:comment "To alter the strength or quantity of some other realisable entity.")
(defclass ToRegulate
:comment "To modulate in a directed manner, as part of a feedback loop."
:super ToModulate)
(defclass ToTransduce
:comment "To change energy from one form to another.")
Talk about Mungall et al’s normalisation of GO
Partial; not down to the bare functions
Interesting point around ribose sugars