ENVO: The Environment Ontology (Presentation at the Genomics Standards Consortium Meeting), September 2011
1. An Introduction to Ontology as a Strategy for Data Integration
Barry Smith
2. The problem
• legacy idiosyncrasies in handling data, complicated progressively by:
  • changes in available hardware and software
  • turnover of personnel and of collaborations
  • explosion of data
  • the need to get funding (which inhibits reuse)
3. The result: balkanization
• systems are poorly integrated
• deliver redundant capabilities
• foster error and waste
• prevent comparison and aggregation
• prevent secondary use of data
• lower ROI on software
4. The proposed solution
• vocabulary and meanings change more slowly than hardware and software (and scientific theory*)
• semantic interoperability has a high initial cost (governance, commitment) but considerable long-term value
*e.g. atom, electron, cell, bacteria, organism …
5. How to do it right?
• how to create an incremental, evolutionary process in which what is good survives and what is bad fails
• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems that have been tried and tested
7. By far the most successful: GO (Gene Ontology)
8. GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results obtained by distinct research communities
• compare the use of kilograms, meters and seconds in formulating experimental results
• natural language and logical definitions for all terms to support consistent human application and computational exploitation
9. What is the key to GO’s success?
• multi-species, multi-disciplinary, open source
• clear rules for ontology development and
maintenance
• over 11 million annotations relating gene products described in UniProt, Ensembl and other databases to terms in the GO
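To make the annotation idea concrete, here is a minimal Python sketch of how annotations from separate databases can be pooled once they share one controlled vocabulary. The term IDs, labels and gene product names are hypothetical toy values, not real GO, UniProt or Ensembl content, and propagating each annotation to its ancestor terms is an assumption of this sketch rather than something stated on these slides.

```python
from collections import defaultdict

# Hypothetical toy vocabulary (NOT real GO content): term id -> (label, is_a parent)
VOCAB = {
    "T:0001": ("biological process", None),
    "T:0002": ("metabolic process", "T:0001"),
    "T:0003": ("lipid metabolic process", "T:0002"),
}


def ancestors(term_id):
    """Return term_id followed by every term above it in the is_a hierarchy."""
    chain = []
    while term_id is not None:
        chain.append(term_id)
        term_id = VOCAB[term_id][1]
    return chain


def pool_annotations(*sources):
    """Merge (gene_product, term_id) pairs from several databases.

    Every term must come from the shared vocabulary. Each annotation is also
    counted for all ancestors of its term (an assumption of this sketch), so
    results annotated at different levels of detail can still be aggregated.
    """
    pooled = defaultdict(set)
    for source in sources:
        for product, term_id in source:
            if term_id not in VOCAB:
                raise ValueError(f"{term_id} is not in the controlled vocabulary")
            for t in ancestors(term_id):
                pooled[t].add(product)
    return pooled


# Two hypothetical databases annotating at different levels of detail
db_a = [("geneA", "T:0003")]
db_b = [("geneB", "T:0002")]
pooled = pool_annotations(db_a, db_b)
print(sorted(pooled["T:0002"]))  # ['geneA', 'geneB']
```

Because geneB was annotated only to the broader term, a query at that broader level still retrieves both products; this is the kind of aggregation across research communities that a shared vocabulary is meant to allow.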
10. Extending GO’s success to other fields
Open Biological and Biomedical Ontologies
(OBO) Foundry
• Best practice principles
• Governance
• Review process
• Two-tier membership
http://obofoundry.org
12. Ontologies by granularity and relation to time (continuant: independent or dependent; occurrent)
ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)
MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
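One way to treat this grid as a data structure is a lookup table keyed by (granularity, relation to time). This is a simplified sketch: it keeps only the ontology names from each cell and drops entity labels such as "Organism" or "Cell", and the helper names cells_for and granularities_covered are hypothetical.

```python
# The grid from the slide: (granularity, relation to time) -> ontologies named in that cell
GRID = {
    ("organ and organism", "independent continuant"): ["NCBI Taxonomy", "FMA", "CARO"],
    ("organ and organism", "dependent continuant"): ["FMP", "CPRO", "PaTO"],
    ("organ and organism", "occurrent"): ["GO"],
    ("cell and cellular component", "independent continuant"): ["CL", "FMA", "GO"],
    ("cell and cellular component", "dependent continuant"): ["GO"],
    ("molecule", "independent continuant"): ["ChEBI", "SO", "RnaO", "PrO"],
    ("molecule", "dependent continuant"): ["GO"],
    ("molecule", "occurrent"): ["GO"],
}


def cells_for(ontology):
    """List the grid cells in which an ontology appears."""
    return [cell for cell, names in GRID.items() if ontology in names]


def granularities_covered(ontology):
    """Set of granularity levels (rows) at which an ontology appears."""
    return {granularity for granularity, _ in cells_for(ontology)}


print(len(cells_for("GO")))  # 5
print(cells_for("CL"))  # [('cell and cellular component', 'independent continuant')]
```

Querying the table this way shows at a glance that GO spans all three granularity levels, while most other ontologies occupy a single cell.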
13. Ontologies by granularity and relation to time (continuant: independent or dependent; occurrent)
COMPLEX OF ORGANISMS
• independent continuant: Family, Community, Deme, Population
• dependent continuant: Population Phenotype
• occurrent: Population Process
ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)
MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)
Population-level ontologies
14. Ontologies by granularity and relation to time (continuant: independent or dependent; occurrent)
ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); environments
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)
MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)
Environment Ontology
26. A spatial environment
is a site that
1. contains a medium (air, water)
2. can contain an organism or a
population of organisms
Some sites are supported and demarcated
by some solid object
27. Stationary Sites
1: your office when the door is closed; a closed mouth
2: a rabbit hole; an open mouth
3: the surface of a leaf
4: the Klingon Empire
28. Mobile Sites
1: a womb; a spaceship
2: a snail’s shell; a
3: the home range of a migrating herd of buffalo
4: the niche around a flying buzzard
29. At any given instant, a site is coincident with some spatial region.
But because there are mobile sites, it is not the case that site ≡ spatial region.
For stationary sites we can associate latitude/longitude specifications.
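A minimal sketch of why a site is not identical to a spatial region: the site is modeled as an object that coincides with a (possibly different) region at each instant, and only a site whose region never changes gets a single latitude/longitude specification. Representing regions as lat/long bounding boxes and time as integer instants are simplifying assumptions; the class, method names and coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical simplification: a spatial region is a lat/long bounding box
# (min_lat, min_lon, max_lat, max_lon)
Region = Tuple[float, float, float, float]


@dataclass
class Site:
    name: str
    # instant -> the spatial region the site coincides with at that instant
    regions: Dict[int, Region] = field(default_factory=dict)

    def region_at(self, t: int) -> Region:
        return self.regions[t]

    def is_stationary(self) -> bool:
        # stationary: the same region at every recorded instant
        return len(set(self.regions.values())) <= 1

    def lat_long(self) -> Optional[Region]:
        """A single lat/long specification exists only for stationary sites."""
        if self.regions and self.is_stationary():
            return next(iter(self.regions.values()))
        return None


hole = (51.50, -0.13, 51.51, -0.12)  # hypothetical coordinates
rabbit_hole = Site("rabbit hole", {0: hole, 1: hole})
buzzard_niche = Site(
    "niche around a flying buzzard",
    {0: (40.0, -3.0, 40.1, -2.9), 1: (40.2, -3.1, 40.3, -3.0)},
)
print(rabbit_hole.lat_long() == hole)  # True
print(buzzard_niche.lat_long())  # None
```

The mobile site keeps its identity across instants even though the region it coincides with changes, which is exactly why site and spatial region cannot be equated.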
30. Double-hole structure of a Spatial Environment
Retainer
(a boundary of some
surrounding structure)
Medium
(filling the environing hole)
Tenant
(occupying the central hole)
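The double-hole structure, together with the definition on slide 26 (a site contains a medium, can contain organisms, and in some cases is demarcated by a solid object), can be sketched as a small record type. The class and field names are hypothetical, and plain strings stand in for the entities they name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpatialEnvironment:
    medium: str                     # fills the environing hole, e.g. "air", "water"
    retainer: Optional[str] = None  # boundary of some surrounding structure, if any
    tenants: List[str] = field(default_factory=list)  # occupy the central hole

    def __post_init__(self) -> None:
        # slide 26: a spatial environment must contain a medium
        if not self.medium:
            raise ValueError("a spatial environment must contain a medium")

    def admit(self, tenant: str) -> None:
        """Add an organism (or population) to the central hole."""
        self.tenants.append(tenant)

    def is_occupied(self) -> bool:
        # slide 26: a site *can* contain organisms, so an empty list is valid
        return bool(self.tenants)


office = SpatialEnvironment(medium="air", retainer="walls and closed door")
office.admit("you")
print(office.is_occupied())  # True
```

Making the retainer optional reflects the slide 26 remark that only some sites are supported and demarcated by a solid object, while the medium is required.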
36. Ontologies by granularity and relation to time (continuant: independent or dependent; occurrent)
ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)
• occurrent: Cellular Process (GO)
MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)
obofoundry.org
40. Genus-species definitions
System =def. an independent continuant
which is composed of interacting material
entities forming an integrated whole
Ecosystem =def. a system which includes
organisms and the site in which they live
as components
41. Biome =def. An ecosystem which contains
populations adapted to the environmental
conditions conserved over its spatial
extent.
Microbiome =def. A biome which contains
the totality of microscopic organisms, their
genetic elements, and interactions in a
given environment.
43. Habitat
Habitat =def. An ecosystem which can
support the life of a given organism,
population, or community
Realized niche =def. An ecosystem which
is that part of a habitat which supports the
life of a given organism, population or
community
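The genus-species definitions on slides 40 to 43 can be transcribed into a small table of (genus, differentia) pairs, from which each term's is_a chain and its textual definition follow mechanically. This is a hypothetical sketch, not ENVO's actual encoding; note that the realized niche gets ecosystem as its genus and a separate part_of link to habitat, as its definition says.

```python
# Genus (is_a parent) and differentia for each term, transcribed from the =def. statements
DEFS = {
    "system": ("independent continuant",
               "is composed of interacting material entities forming an integrated whole"),
    "ecosystem": ("system",
                  "includes organisms and the site in which they live as components"),
    "biome": ("ecosystem",
              "contains populations adapted to the environmental conditions "
              "conserved over its spatial extent"),
    "microbiome": ("biome",
                   "contains the totality of microscopic organisms, their genetic "
                   "elements, and interactions in a given environment"),
    "habitat": ("ecosystem",
                "can support the life of a given organism, population, or community"),
    "realized niche": ("ecosystem",
                       "is that part of a habitat which supports the life of a given "
                       "organism, population or community"),
}
# part_of is a different relation from is_a: a realized niche is an ecosystem
# that is part of a habitat, not a kind of habitat
PART_OF = {"realized niche": "habitat"}


def genus_chain(term):
    """term followed by its genus, the genus's genus, and so on."""
    chain = [term]
    while chain[-1] in DEFS:
        chain.append(DEFS[chain[-1]][0])
    return chain


def is_a(term, ancestor):
    return ancestor in genus_chain(term)[1:]


def definition(term):
    """Render the genus-species definition 'X =def. a G which D'."""
    genus, differentia = DEFS[term]
    article = "an" if genus[0] in "aeiou" else "a"
    return f"{term} =def. {article} {genus} which {differentia}"


print(genus_chain("microbiome"))
# ['microbiome', 'biome', 'ecosystem', 'system', 'independent continuant']
print(is_a("realized niche", "habitat"))  # False
```

Keeping genus and differentia separate means every definition inherits everything true of its genus, which is the main payoff of the genus-species form.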
45. Hutchinsonian niche
(niche as volume in a functionally
defined hyperspace)
=def. an n-dimensional hyper-volume
whose dimensions correspond to resource
gradients over which species are
distributed
– degree of slope, exposure to sunlight, soil
fertility, foliage density, salinity...
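Under a strong simplifying assumption, the hypervolume can be approximated as an axis-aligned box with one tolerance range per resource gradient; a set of local conditions then lies inside the niche if it falls within every range. Hutchinson's hypervolume need not be box-shaped, and the dimension names, units and ranges below are hypothetical.

```python
import math
from typing import Dict, Tuple

# Simplifying assumption: the hypervolume is an axis-aligned box, one
# (low, high) tolerance range per resource gradient
Niche = Dict[str, Tuple[float, float]]


def in_niche(niche: Niche, conditions: Dict[str, float]) -> bool:
    """True if the conditions fall inside the range on every niche dimension.

    Raises KeyError if a niche dimension is missing from the conditions.
    """
    return all(low <= conditions[dim] <= high for dim, (low, high) in niche.items())


def volume(niche: Niche) -> float:
    """Hypervolume of the box: product of the range widths."""
    return math.prod(high - low for low, high in niche.values())


# Hypothetical species tolerances on three of the gradients listed above
niche = {
    "slope_deg": (0.0, 15.0),
    "sunlight_h_per_day": (4.0, 10.0),
    "salinity_ppt": (0.0, 5.0),
}
print(in_niche(niche, {"slope_deg": 10, "sunlight_h_per_day": 6, "salinity_ppt": 2}))  # True
print(in_niche(niche, {"slope_deg": 10, "sunlight_h_per_day": 6, "salinity_ppt": 8}))  # False
print(volume(niche))  # 450.0
```

A more faithful model would replace the box with an arbitrary region (for example, a convex hull of observed occurrences), but the membership test keeps the same shape.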