US2TS: Reasoning over multiple open bio-ontologies to make machines and humans happy
1. Reasoning over multiple open bio-ontologies to make machines and humans happy
Chris Mungall
cjmungall@lbl.gov
@chrismungall
http://bit.ly/mungall-us2ts-2019
2. Biological data management is hard. There are many named things.
● Drugs: 10k
● Chemicals: 1-50m?
● Species: ~9 million
● Diseases and Phenotypes: 10-50k/species
● Cells: 1000s+ types (per species)
● Genes: 20k/species
● Genetic variants: 3m (human)
● Experiments
● Raw data
3. There are many ways to categorize the things
● Genes: 20k/species
● Gene Ontology: 45k functional descriptor classes
● Knowledge Graph Edges: ~7m
4. There are many ontologies to categorize the things
762 ontologies
6. How do we manage this?
MODULARITY and REASONING
EL (Elk, Whelk)
DL (HermiT, FaCT++)
● OBO
● Rector Normalization
● Design Patterns
● Relation Ontology
● ROBOT
7. Open Biological Ontologies (OBO)
http://obofoundry.org
@obofoundry
1. Well-integrated modular ontologies (SUBSET of bioportal)
2. Provide technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to curate all of the things
9. RECTOR NORMALIZATION
Rector 2003. Modularisation of domain ontologies implemented in description logics and related formalisms including owl.
http://www.cs.man.ac.uk/~rector/papers/rector-modularisation-kcap-2003-distrib.pdf
10. Minimal Constructs Needed for Rector Normalization
● SomeValuesFrom
● IntersectionOf
● EquivalentTo
● SubClassOf
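To make the four constructs concrete, here is a minimal Python sketch of how a classifier can infer a superclass from them. It is a hand-rolled structural check, not Elk or any real reasoner, and the class names (nuclear membrane, organelle membrane, and so on) are illustrative assumptions rather than axioms taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Some:
    """SomeValuesFrom restriction: (relation some filler)."""
    relation: str
    filler: str


# SubClassOf between named classes: the asserted primitive hierarchy.
# All names below are illustrative assumptions.
SUBCLASS_OF = {
    "nuclear membrane": {"membrane"},
    "nucleus": {"organelle"},
}

# SubClassOf with a SomeValuesFrom on the right-hand side.
ASSERTED_SOME = {
    "nuclear membrane": {Some("part-of", "nucleus")},
}

# EquivalentTo IntersectionOf(genus, SomeValuesFrom): the defined classes.
EQUIVALENT_TO = {
    "organelle membrane": ("membrane", Some("part-of", "organelle")),
}


def ancestors(cls):
    """Return cls together with all of its asserted named superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def satisfies(cls, restriction):
    """True if cls (or an ancestor) asserts a restriction at least as specific."""
    for current in ancestors(cls):
        for asserted in ASSERTED_SOME.get(current, ()):
            same_relation = asserted.relation == restriction.relation
            if same_relation and restriction.filler in ancestors(asserted.filler):
                return True
    return False


def inferred_superclasses(cls):
    """Defined classes whose genus and differentia both hold for cls."""
    return sorted(
        defined
        for defined, (genus, restriction) in EQUIVALENT_TO.items()
        if defined != cls and genus in ancestors(cls) and satisfies(cls, restriction)
    )


print(inferred_superclasses("nuclear membrane"))
```

Only the single-parent tree and the definitions are asserted by hand; the second parent (organelle membrane) is computed, which is the point of the normalization.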
11. OBO Relation Ontology: glue within and between ontologies
http://obofoundry.org/ontology/ro
12. Spatial Reasoning OWL design patterns
> spatially_disjoint_with.yaml
axiom:
  Text: (part-of some %s) DisjointWith (part-of some %s)
  Vars:
    - component1
    - component2
Ontology:
(part-of some nucleus) DisjointWith (part-of some cytosol)
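The template-to-axiom step above can be sketched in a few lines of Python. This is only an illustration of the idea of filling the `%s` slots from named variables; it is not the actual tooling used to expand such patterns, and the function name `instantiate` is hypothetical.

```python
# Template text and variable names as they appear in the pattern above.
TEMPLATE = "(part-of some %s) DisjointWith (part-of some %s)"
VARS = ("component1", "component2")


def instantiate(template, bindings):
    """Fill the %s slots of the template in the order given by VARS."""
    values = tuple(bindings[name] for name in VARS)
    return template % values


# One row of variable bindings yields one axiom.
axiom = instantiate(TEMPLATE, {"component1": "nucleus", "component2": "cytosol"})
print(axiom)
```

Keeping the pattern separate from the bindings means a new disjointness axiom costs one row of data, not a hand-written expression.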
15. Reasoning detects annotation errors
Genes are often assigned functions automatically based on homology. This is error-prone. Previous errors include:
• Genes in chicken responsible for lactation
• Genes in slime mold responsible for dorsal fin development
Dorsal Fin SubClassOf Fin
Fin SubClassOf part-of some Vertebrate
(part-of some Animal) DisjointWith (part-of some Slime Mold)
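The chain of inference behind the slime mold example can be sketched as follows. This is a simplified, hand-written check and not how an OWL reasoner works internally. It also assumes one axiom that is not on the slide, Vertebrate SubClassOf Animal, which is needed to connect the fin to the disjointness axiom.

```python
# Axioms from the slide, plus one labeled assumption.
SUBCLASS_OF = {
    "Dorsal Fin": "Fin",
    "Vertebrate": "Animal",  # assumption: not stated on the slide
}
PART_OF_SOME = {"Fin": "Vertebrate"}
# (part-of some A) DisjointWith (part-of some B)
DISJOINT_PART_OF = [("Animal", "Slime Mold")]


def superclasses(cls):
    """Return cls followed by its chain of asserted superclasses."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def required_wholes(structure):
    """Everything the structure must be part of, following SubClassOf."""
    wholes = set()
    for cls in superclasses(structure):
        if cls in PART_OF_SOME:
            wholes.update(superclasses(PART_OF_SOME[cls]))
    return wholes


def is_inconsistent(structure, annotated_taxon):
    """True if placing the structure in this taxon violates a disjointness axiom."""
    required = required_wholes(structure)
    actual = set(superclasses(annotated_taxon))
    return any(
        (a in required and b in actual) or (b in required and a in actual)
        for a, b in DISJOINT_PART_OF
    )


print(is_inconsistent("Dorsal Fin", "Slime Mold"))
print(is_inconsistent("Dorsal Fin", "Vertebrate"))
```

The annotation itself never mentions Animal; the error only surfaces after following SubClassOf and part-of up to the classes named in the disjointness axiom.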
18. Pop quiz: what OWL profile is this?
'DNA extent' EquivalentTo
'sequence molecular entity extent' and
('has part' only
('deoxyribonucleotide residue' or
(('chemical entity' or
'biological sequence entity') and
(not ('biological sequence unit')))))
19. Combining transitive properties and universal restrictions can take you strange places
'DNA extent' EquivalentTo
'sequence molecular entity extent' and
('has part' only
('deoxyribonucleotide residue' or
(('chemical entity' or
'biological sequence entity') and
(not ('biological sequence unit'))
)
))
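One way to approach the profile question is to list the constructors an expression uses and compare them against what OWL 2 EL permits. The sketch below encodes the 'DNA extent' definition as nested tuples (an ad hoc encoding chosen for this example, not a standard serialization) and flags universal restrictions (only), unions (or) and complements (not), none of which are part of OWL 2 EL.

```python
# Constructors that fall outside the OWL 2 EL profile.
NOT_IN_EL = {"only", "or", "not"}

# ('only', property, filler); ('and' | 'or', operand, ...); ('not', operand)
DNA_EXTENT = (
    "and",
    "sequence molecular entity extent",
    ("only", "has part",
        ("or",
            "deoxyribonucleotide residue",
            ("and",
                ("or", "chemical entity", "biological sequence entity"),
                ("not", "biological sequence unit")))),
)


def constructors(expr):
    """Collect every constructor keyword used anywhere in the expression."""
    if isinstance(expr, str):  # a named class or property
        return set()
    found = {expr[0]}
    for operand in expr[1:]:
        found |= constructors(operand)
    return found


outside_el = sorted(constructors(DNA_EXTENT) & NOT_IN_EL)
print(outside_el)
```

A non-empty result means an EL reasoner such as Elk cannot make full use of the axiom.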
20. Avoid going mad with complex nested boolean expressions
KEEP IT SIMPLE, SAPIENS
● DisjointClasses
● SomeValuesFrom
● IntersectionOf
Use with caution:
1. Only
2. Not
3. Cardinality
4. Levels of nesting requiring parentheses
Generally not needed for bio-ontology T-Box reasoning:
1. Data Properties
2. Keys
22. BIG BUCKET OF MIXED AXIOMS
1. I've giv'n her all she's got captain, an' I canna give her no more!
2. Let me just shoogle these axioms aroond a wee bit
WEE BUCKET OF HARD AXIOMS
BIG BUCKET OF EASY AXIOMS
HARD: Erythrocyte SubClassOf has_part exactly 0 nucleus
⇒
HARD: Anucleate EquivalentTo has_part exactly 0 nucleus
EASY: Erythrocyte SubClassOf Anucleate
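The rewrite above can be sketched as a small Python function that moves each hard expression behind a named class. The keyword test for "hard" is a crude assumption made for this example, the function name `shoogle` is hypothetical, and the Platelet axiom is an illustrative addition that is not on the slide.

```python
# Crude assumption: an expression is "hard" if it uses one of these keywords.
HARD_KEYWORDS = {"exactly", "min", "max", "only", "not"}


def is_hard(expression):
    """True if the class expression uses a construct outside the easy set."""
    return any(token in HARD_KEYWORDS for token in expression.split())


def shoogle(axioms, names):
    """Split SubClassOf axioms into a wee hard bucket and a big easy bucket.

    axioms: list of (subclass, expression) pairs.
    names:  maps each hard expression to the named class that will stand for it.
    """
    hard, easy = [], []
    for subclass, expression in axioms:
        if is_hard(expression):
            named = names[expression]
            definition = (named, "EquivalentTo", expression)
            if definition not in hard:  # each hard expression is defined once
                hard.append(definition)
            easy.append((subclass, "SubClassOf", named))
        else:
            easy.append((subclass, "SubClassOf", expression))
    return hard, easy


mixed = [
    ("Erythrocyte", "has_part exactly 0 nucleus"),
    ("Platelet", "has_part exactly 0 nucleus"),  # illustrative, not from the slide
]
hard, easy = shoogle(mixed, {"has_part exactly 0 nucleus": "Anucleate"})
print(hard)
print(easy)
```

However many classes share the cardinality restriction, the hard bucket holds it once, and everything in the easy bucket is a plain SubClassOf between named classes.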
24. BIG BUCKET OF MIXED AXIOMS
1. I've giv'n her all she's got captain, an' I canna give her no more!
2. Let me just shoogle these axioms aroond a wee bit
WEE BUCKET OF HARD AXIOMS
BIG BUCKET OF EASY AXIOMS
3. Och aye that’s just aboot right
4. Now I’ll hand these over to ma pal the Elk, he’s pure dead fast
5. I’m traveling at the speed of light that’s why they call me Mr Fahrenheit
THE END
26. Making the pieces fit together: GO and CHEBI
GO CHEBI
• Some relationships didn’t make sense
• E.g. nucleotide isa carbohydrate
• Acids ⬄ conjugate bases
27. Making the pieces fit together: GO and CHEBI
Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513.
GO CHEBI + OWL reasoning + Design Patterns
• Fixed many is-as
• E.g. nucleotide isa carbohydrate
• Acids ⬄ conjugate bases
Harold Drabkin, David Hill, Jane Lomax, Tanya Berardini, Janna Hastings
29. Conclusions
● Maintaining > ~100 classes benefits from reasoning
● Maintaining > ~10000 classes: you will be in maintenance hell without reasoning
● Reasoning is dead easy for computers
● Reasoning can be hard for humans
○ Keep it simple
○ Use Design Patterns / Templates
○ Use software engineering paradigms
○ Avoid unnecessary complexity
● Sociotechnological aspects of reasoning are hardest
○ “I don’t like the entailments I get when I use your ontology”
http://bit.ly/mungall-us2ts-2019