Knowing what we’re talking about
1. Knowing what we’re
talking about
Robert Stevens
Bio-health Informatics Group
School of Computer Science
University of Manchester
Oxford Road
Manchester
United Kingdom
M13 9PL
Robert.Stevens@manchester.ac.uk
2. We have an item of data
• 27
• 27 what?
• Units? With what is 27
associated?
• Even if I told you, would
we interpret what I said
in the same way?
27
5. Mouse tail of 27 mm
• … and we can carry on:
Mouse strain, where was
it raised, on what was it
fed, times, dates, etc.
etc.
• All this data is necessary
to interpret my original
number
• Even if that metadata
exists, we have to agree
on the things the
numbers describe
[Figure: mouse tail of 27 mm]
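The point about a bare number needing its metadata can be sketched in code. This is a minimal illustration only: the `Measurement` class and its field names are hypothetical, chosen here to carry the unit, the quality measured, and the entity the number describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A number together with the metadata needed to interpret it (hypothetical schema)."""
    value: float
    unit: str      # e.g. "mm"
    quality: str   # what was measured, e.g. "length"
    entity: str    # what it was measured on, e.g. "tail"
    organism: str  # e.g. "mouse"

    def describe(self) -> str:
        return f"{self.organism} {self.entity} {self.quality} of {self.value:g} {self.unit}"


bare = 27  # uninterpretable on its own
annotated = Measurement(27, "mm", "length", "tail", "mouse")
print(annotated.describe())  # mouse tail length of 27 mm
```

Even this record only helps if everyone agrees on what "tail" and "mouse" refer to, which is the problem the rest of the talk addresses.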
7. Heterogeneity is rife
• We agree on units (more or less)…
• We don’t agree on much else when it comes to
labels for the entities in our domain
• If we don’t know what we’re talking about….
• It’s difficult to interpret and exchange data and the
results from data
8. Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
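One way to picture a category with several labels is a lookup from every label to a single identifier. The sketch below uses only the identifier and labels listed on this slide; the `normalise` and `lookup` helpers are hypothetical, and the normalisation rule (lower-casing, treating hyphens as spaces) is an assumption made for illustration.

```python
GO_ID = "GO:0000368"

LABELS = [
    "U2-type nuclear mRNA 5' splice site recognition",
    "spliceosomal E complex formation",
    "spliceosomal E complex biosynthesis",
    "spliceosomal CC complex formation",
    "U2-type nuclear mRNA 5'-splice site recognition",
]


def normalise(label: str) -> str:
    """Lower-case, treat hyphens as spaces, and collapse runs of whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


# Many labels, one category identifier.
LABEL_TO_ID = {normalise(label): GO_ID for label in LABELS}


def lookup(label: str):
    """Return the category identifier for a label, or None if the label is unknown."""
    return LABEL_TO_ID.get(normalise(label))


print(lookup("Spliceosomal E Complex formation"))  # GO:0000368
print(lookup("splice site recognition"))           # None
```

The first and last labels differ only by a hyphen, so they collapse to one key after normalisation.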
9. The Ogden Triangle
“Roast Beef”
Concept
[Ogden, Richards, 1923]
• Humans require words (or at least symbols) to communicate efficiently. The
mapping of words to things is only indirectly possible. We do it by creating
concepts that refer to things.
• The relation between symbols and things has been described in the form of the
meaning triangle:
10. We need to know what we’re talking about…
• … if we don’t, our data are useless
• If we are to interpret our data then we need to
know what entities it describes
• We need to share data and re-use it
• We need to find data; compare data; analyse data
• We need to know what we know….
11. Manchester Mercury
January 1st 1754
List of diseases & casualties this year
19276 burials
15444 christenings

Cause                               Count
Executed                               18
Found Dead                             34
Frighted                                2
Kill'd by falls and other accidents    55
Kill'd themselves                      36
Murdered                                3
Overlaid                               40
Poisoned                                1
Scalded                                 5
Smothered                               1
Stabbed                                 1
Starved                                 7
Suffocated                              5
Aged                                 1456
Consumption                          3915
Convulsion                           5977
Dropsy                                794
Fevers                               2292
Smallpox                              774
Teeth                                 961
Bit by mad dogs                         3
Broken Limbs                            5
Bruised                                 5
Burnt                                   9
Drowned                                86
Excessive Drinking                     15

[Figure: Deaths by centile]
12. A World of Instances
• The world (of information) is made up of things and lots of them
• Instances, individuals, objects, tokens, particulars.
• The Earth is a kind of Planet
• Robert Stevens (NE 67 41 58 A) is a Person
• All the individual Alpha Haemoglobins in my many Instances of Red Blood Cell
• Each cell instance in my Body has copies of some 30,000 Genes
• A Word, language, idea, etc.
• This Table, those Chairs,
• Any Thing with “A”, “The”, “That”, etc. before it….
13. We Put things into
Categories
• All these instances hang about making our world
• Putting these things into categories is a fundamental part of
human cognition
• Psychologists study this as concept formation
• The same instances are put into a category
14. We have Labels for the
Categories and their
Instances
• We label categories with symbols: Words
• “Lion” is a category of big cat with big teeth
• Gene, Protein, Cell, Person, Hydrolase Activity, etc.
• …and, as we’ve already seen, each category can have many labels and any
particular label can refer to more than one category
• Semantic Heterogeneity
• “A lion” is an instance in that category
• Does the category “Lion” exist?
• Lions exist, but the category could just be a human way of talking about
lions
• … we like putting things into categories
15. A Controlled Vocabulary
• A specified set of words and phrases for the categories
in which we place instances
• Natural language definitions for those words and
phrases
• A glossary defines, but doesn’t control
• The Uniprot keywords define and control
• Control is placed upon which labels are used to
represent the categories (concepts) we’ve used to
describe the instances in the world
• …, but there is nothing about how things in these
categories are related
Biopolymer
DNA
Enzyme
Nucleic acid
mRNA
Polypeptide
snRNA
tRNA
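A controlled vocabulary can be pictured as a fixed set of permitted labels, each with a natural language definition, plus a check that rejects anything else. In the sketch below the terms are those listed on this slide; the one-line definitions and the `check_term` helper are illustrative assumptions, not taken from any real vocabulary.

```python
# Term -> natural language definition (definitions are illustrative only).
CONTROLLED_VOCABULARY = {
    "Biopolymer": "A polymer produced by living organisms.",
    "DNA": "Deoxyribonucleic acid.",
    "Enzyme": "A biological catalyst.",
    "Nucleic acid": "A polymer of nucleotides.",
    "mRNA": "Messenger RNA.",
    "Polypeptide": "A chain of amino acids.",
    "snRNA": "Small nuclear RNA.",
    "tRNA": "Transfer RNA.",
}


def check_term(label: str) -> str:
    """Return the label if it is permitted; otherwise raise ValueError."""
    if label not in CONTROLLED_VOCABULARY:
        raise ValueError(f"'{label}' is not in the controlled vocabulary")
    return label


print(check_term("mRNA"))  # accepted
try:
    check_term("messenger RNA")  # a perfectly good phrase, but not a controlled label
except ValueError as err:
    print(err)
```

The dictionary is flat: it controls which labels may be used, but records nothing about how an mRNA relates to a nucleic acid.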
16. We also like to Relate Things
Together
• Categories have subcategories
• Instances in one category can be related
in some way to instances in another
• Can relate instances to each other in
many different ways
• Is-a, part-of, develops-from, etc.
• We can use these relationships to classify
categories
• Things in category A are part is
• If all instances in category A are also in
category B then As are kinds of Bs
Biopolymer
  Nucleic Acid
    DNA
    RNA
      tRNA
      mRNA
      snRNA
  Polypeptide
    Enzyme
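The rule "if all instances in category A are also in category B then As are kinds of Bs" makes is-a transitive, which a few lines of code can show. The parent links below are one reading of the figure (with the RNA subtype read as snRNA, as in the vocabulary list), so treat them as an assumption; `ancestors` and `is_kind_of` are hypothetical helper names.

```python
# Child -> parent along the is-a relationship (one reading of the figure).
IS_A = {
    "Nucleic Acid": "Biopolymer",
    "Polypeptide": "Biopolymer",
    "Enzyme": "Polypeptide",
    "DNA": "Nucleic Acid",
    "RNA": "Nucleic Acid",
    "tRNA": "RNA",
    "mRNA": "RNA",
    "snRNA": "RNA",
}


def ancestors(category: str) -> list:
    """Follow is-a links upwards, nearest ancestor first."""
    found = []
    while category in IS_A:
        category = IS_A[category]
        found.append(category)
    return found


def is_kind_of(a: str, b: str) -> bool:
    """True if every A is also a B according to the hierarchy."""
    return a == b or b in ancestors(a)


print(ancestors("mRNA"))                 # ['RNA', 'Nucleic Acid', 'Biopolymer']
print(is_kind_of("mRNA", "Biopolymer"))  # True
print(is_kind_of("mRNA", "Polypeptide")) # False
```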
18. Describing Category
Membership
• We can make conditions that any instance must fulfil in order to be a
member of a particular category
• A Phosphatase must have a phosphatase catalytic domain
• A Receptor must have a transmembrane domain
• A codon has three nucleotide residues
• A limb has part that is a joint
• A man has a Y chromosome and an X chromosome
• A woman has only an X chromosome
19. Relationships
• These conditions are made from a property and a
successor relationship
• isPartOf, hasPart
• isDerivedFrom
• DevelopsFrom
• isHomologousTo
• …and many, many more
20. A Structured Controlled
Vocabulary
• Not only can we agree on the
labels we give categories
• Can also agree on how the
instances of categories are
related
• And agree on the labels we give
the relations
• Structure aids querying and
captures knowledge with greater
fidelity
Biopolymer
  Nucleic Acid
    DNA
    RNA
      tRNA
      mRNA
      snRNA
  Polypeptide
    Enzyme
Gene (linked to the hierarchy by a transcribedFrom relation)
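The claim that structure aids querying can be illustrated by storing the vocabulary as labelled relationships and filtering on any part of them. This is a sketch under assumptions: the is-a links are one reading of the figure, the direction "mRNA transcribedFrom Gene" is a guess at what the figure shows, and `query` is a hypothetical helper.

```python
# (subject, relation, object) statements; relation labels are agreed too.
TRIPLES = {
    ("Nucleic Acid", "is-a", "Biopolymer"),
    ("Polypeptide", "is-a", "Biopolymer"),
    ("Enzyme", "is-a", "Polypeptide"),
    ("DNA", "is-a", "Nucleic Acid"),
    ("RNA", "is-a", "Nucleic Acid"),
    ("tRNA", "is-a", "RNA"),
    ("mRNA", "is-a", "RNA"),
    ("snRNA", "is-a", "RNA"),
    ("mRNA", "transcribedFrom", "Gene"),  # assumed direction of the relation
}


def query(subject=None, relation=None, obj=None):
    """Return all statements matching the parts that were given (None matches anything)."""
    return sorted(
        (s, r, o)
        for (s, r, o) in TRIPLES
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    )


print(query(relation="transcribedFrom"))  # [('mRNA', 'transcribedFrom', 'Gene')]
print(query(obj="RNA"))                   # the three direct kinds of RNA
```

A flat controlled vocabulary could answer neither question, because it holds labels without relations.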
21. A Stronger Definition
• a set of logical axioms designed to account for the intended meaning
of a formal vocabulary used to describe a certain (conceptualisation of)
reality (described in an information system) [Guarino 1998]
• “conceptualisation of” inserted by me
• “Logical axioms” means a formal definition of meaning of terms in a
formal language
• Formal language: something a computer can reason with
• Use symbols to make inferences
• Symbols represent things and their relationships
• Making inferences about things computationally
22. So what is an ontology?
[Figure: a spectrum of ontology-like artefacts, after Chris Welty et al]
Labels on the figure: Catalog/ID; Thesauri; Terms/glossary; Informal Is-a; Formal Is-a; Formal instance; Frames (properties); General Logical constraints; Value restrictions; Disjointness, Inverse, partof
Examples placed on the figure: Gene Ontology, Mouse Anatomy, EcoCyc, PharmGKB, TAMBIS, Arom
23. What does it all mean
anyway
• To interpret our data we need to know what it is we’re talking
about
• We need to decide the things that we’re talking about and
agree upon them
• We need to agree on how to recognise those entities
• We need to know how they are related to one another
• Ontologies are a mechanism for describing those entities
and their definitions
• There’s more to knowledge representation than ontologies…
24. All this knowledge needs
representing
• We want this knowledge in a computational form
• To make the knowledge available for software (and
humans)
• To help us develop and manage the (often) complex
artefacts
• Building ontologies is hard (getting all those relationships in
the right place)
• The Web Ontology Language (OWL) is a W3C
recommendation for ontologies on the Semantic Web and in
semantically enabled applications
• A knowledge representation language with a strict semantics
that is amenable to automated reasoning
25. Web Ontology Language
(OWL)
• W3C recommendation for ontologies for the Semantic
Web
• OWL-DL mapped to a decidable fragment of first order
logic
• Classes, properties and instances
• Boolean operators, plus existential and universal
quantification
• Rich class expressions used in restrictions on
properties: hasDomain some (ImmunoglobulinDomain
or FibronectinDomain)
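To give a feel for what a class expression such as "hasDomain some (A or B)" asks of an instance, here is a toy evaluator in plain Python. It is not OWL and uses no OWL library: it checks explicitly stated data only, whereas an OWL reasoner works under an open-world assumption. The helper names `is_a`, `either`, `some` and `only`, the dictionary layout, and the `KinaseDomain` and `Protein` labels are all hypothetical; the domain class is spelled here as ImmunoglobulinDomain.

```python
def is_a(class_name):
    """Test: the individual is stated to belong to the named class."""
    return lambda individual: class_name in individual["types"]


def either(*tests):
    """Boolean 'or' over class tests."""
    return lambda individual: any(test(individual) for test in tests)


def some(prop, test):
    """Existential quantification: at least one filler of prop passes the test."""
    return lambda individual: any(test(f) for f in individual.get(prop, []))


def only(prop, test):
    """Universal quantification: every filler of prop passes the test."""
    return lambda individual: all(test(f) for f in individual.get(prop, []))


fn_domain = {"types": {"FibronectinDomain"}}
kinase_domain = {"types": {"KinaseDomain"}}  # hypothetical extra domain class

domain_test = either(is_a("ImmunoglobulinDomain"), is_a("FibronectinDomain"))
expression = some("hasDomain", domain_test)

protein_a = {"types": {"Protein"}, "hasDomain": [kinase_domain, fn_domain]}
protein_b = {"types": {"Protein"}, "hasDomain": [kinase_domain]}

print(expression(protein_a))                    # True: one domain is a FibronectinDomain
print(expression(protein_b))                    # False: no matching domain
print(only("hasDomain", domain_test)(protein_a))  # False: the kinase domain fails the test
```

The last line shows why "some" and "only" are different statements: protein_a has a suitable domain, but not all of its domains are suitable.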
26. What are we saying?
[Figure: Man is-a Person; Woman is-a Person]
• Are all instances of Man instances of Person?
• Can an instance of Person be both a Man
and an instance of Woman?
• Can there be any more kinds of Person?
27. What are we saying?
• What kinds of class can fill “has chromosome”?
• How many “Y chromosome” are present?
• Does there have to be a “Y chromosome”?
• What properties are sufficient to be a Man and which are
simply necessary?
[Figure: Man has-chromosome Y chromosome]
[Figure: Man has-chromosome 1 Y chromosome; has-chromosome 1 X chromosome; has-chromosome 44 autosome]
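The question "how many are present?" is the difference between a bare link and a cardinality restriction. The sketch below reads the numbers in the figure (1, 1, 44) as exact counts for Y chromosome, X chromosome and autosome respectively; that reading, the `MAN_EXACTLY` table and the `satisfies` helper are assumptions made for illustration.

```python
from collections import Counter

# Exact number of fillers required for has-chromosome, per kind (assumed reading).
MAN_EXACTLY = {"Y chromosome": 1, "X chromosome": 1, "autosome": 44}


def satisfies(chromosomes, exact_counts) -> bool:
    """True if each kind of chromosome occurs exactly the required number of times."""
    counts = Counter(chromosomes)
    return all(counts[kind] == n for kind, n in exact_counts.items())


xy = ["Y chromosome", "X chromosome"] + ["autosome"] * 44
xx = ["X chromosome", "X chromosome"] + ["autosome"] * 44

print(satisfies(xy, MAN_EXACTLY))  # True
print(satisfies(xx, MAN_EXACTLY))  # False: no Y chromosome, two X chromosomes
```

A diagram with an unlabelled has-chromosome arrow leaves all of this unstated, which is the ambiguity the questions on this slide point at.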
29. Necessity and Sufficiency
• An R2A phosphatase must have a fibronectin domain
• Having a fibronectin domain does not a phosphatase make
• Necessity – what must a class instance have?
• Any protein that has a phosphatase catalytic domain is a
phosphatase enzyme
• All phosphatase enzymes have a catalytic domain
• Sufficiency – how is an instance recognised to be a member
of a class?
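The distinction can be sketched with the slide's own examples: a sufficient condition lets us recognise membership, while a necessary condition only lets us check something already stated to be a member. The table layout and the helper names `recognise` and `missing_necessary` are hypothetical.

```python
# Conditions every member must meet (necessary).
NECESSARY = {
    "Phosphatase": {"phosphatase catalytic domain"},
    "R2A phosphatase": {"fibronectin domain"},
}

# Conditions that are enough to recognise a member (sufficient).
SUFFICIENT = {
    "Phosphatase": {"phosphatase catalytic domain"},
}


def recognise(domains: set) -> set:
    """Classes whose sufficient conditions are all met by the given domains."""
    return {cls for cls, required in SUFFICIENT.items() if required <= domains}


def missing_necessary(cls: str, domains: set) -> set:
    """Necessary conditions of cls that the given domains fail to meet."""
    return NECESSARY[cls] - domains


# Having a fibronectin domain does not a phosphatase make.
print(recognise({"fibronectin domain"}))            # set()
# A phosphatase catalytic domain is enough to be recognised as a Phosphatase.
print(recognise({"phosphatase catalytic domain"}))  # {'Phosphatase'}
# Something claimed to be an R2A phosphatase but lacking the required domain.
print(missing_necessary("R2A phosphatase", {"phosphatase catalytic domain"}))
```

For Phosphatase the same condition appears in both tables, so it is necessary and sufficient; for R2A phosphatase the fibronectin domain is necessary only.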
32. Problems Ontologies in
Biology Try To Solve
• Provenance – where did it come from, who did it?
• Reproducibility – can I repeat the work and find the results
reported?
• Sharing – can others understand your data?
• Integration – can I readily take multiple (thousands
of) data sets and use them without preparation?
• New knowledge – can we infer new knowledge as
a sum of current knowledge (computationally)?