The National Cancer Institute Thesaurus is described by its authors as "a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research" and which "exhibits ontology-like properties in its construction and use". We performed a qualitative analysis of the Thesaurus in order to assess its conformity with principles of good practice in terminology and ontology design.
MATERIALS AND METHODS:
We used both the on-line browsable version of the Thesaurus and its OWL-representation (version 04.08b, released on August 2, 2004), measuring each in light of the requirements put forward in relevant ISO terminology standards and in light of ontological principles advanced in the recent literature.
RESULTS:
We found many mistakes and inconsistencies with respect to the term-formation principles used, the underlying knowledge representation system, and missing or inappropriately assigned verbal and formal definitions.
CONCLUSION:
Version 04.08b of the NCI Thesaurus suffers from the same broad range of problems that have been observed in other biomedical terminologies. For its further development, we recommend the use of a more principled approach that allows the Thesaurus to be tested not just for internal consistency but also for its degree of correspondence to that part of reality which it is designed to represent.
Ontology and the National Cancer Institute Thesaurus (2005)
1. 1
Ontology and the NCI
Thesaurus
Barry Smith
with thanks to Werner Ceusters and Louis Goldberg
2. 2
Ontology developments in Buffalo
Department of Philosophy: 8 full-time
ontologists
National Center for Ontological Research
(http://ncor.us)
NYS Center of Excellence in Bioinformatics &
Life Sciences
Werner Ceusters Referent Tracking Pilot EHR
3. 3
GO +
OBO
National Center for Biomedical Ontology
Berkeley Drosophila Genome Project
Cambridge University Department of Genetics
Mayo Clinic
University of Oregon Institute of Neuroscience
University of California San Francisco Medical
Center
University at Buffalo Department of Philosophy
http://ncbo.us
4. 4
A methodology for quality
assurance of ontologies
rules for ontology building based on two millennia of
philosophical research on classification and categorization
targets thus far in the biomedical domain:
– FMA
– SNOMED
– GALEN
– Gene Ontology
– UMLS Semantic Network
– ICF (International Classification of Functioning,
Disability and Health)
– ISO Terminology Standards
– HL7-RIM
6. 6
Ontologies of Reality vs.
Information Models
Data:
sequence, expression, genotype, structure
Data structures:
patterns, clusters, alignments, ...
UMLS-SN: amino acid sequence is_a idea or
concept
Swimming is healthy and has 8 letters
7. 7
New criteria for admission to OBO
(Open Biomedical Ontologies)
Library
Satisfaction of basic principles of ontology
design
Goal: to move beyond information retrieval
and statistical clustering to automatic
reasoning
8. 8
First Rule: Univocity
Terms should have the same meanings on
every occasion of use.
They should refer to the same kinds of
entities in reality
9. 9
Second Rule: Positivity
Complements of kinds are not themselves
kinds.
Terms such as ‘non-mammal’ or
‘non-membrane’ or ‘other metalworker in
New Zealand’ do not designate genuine kinds
in reality.
10. 10
Third Rule: Objectivity
Which kinds exist is not a function of our
knowledge.
Terms such as ‘unknown’ or ‘unclassified’ or
‘unlocalized’ do not designate biological
natural kinds.
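The Positivity and Objectivity rules lend themselves to a rough lexical screen over term labels. The sketch below is only an illustration: the two patterns and the function name `flag_term` are hypothetical, modelled on the examples on these slides, and a real check would need curation by a domain expert.

```python
import re

# Hypothetical patterns that mirror the slide examples only.
NEGATIVE = re.compile(r"\b(non-\w+|other)\b", re.IGNORECASE)
EPISTEMIC = re.compile(r"\b(unknown|unclassified|unlocalized)\b", re.IGNORECASE)


def flag_term(label):
    """Return the names of the rules a term label appears to violate."""
    problems = []
    if NEGATIVE.search(label):
        problems.append("positivity")
    if EPISTEMIC.search(label):
        problems.append("objectivity")
    return problems


for label in ["non-mammal", "other metalworker in New Zealand",
              "unclassified", "Abnormal Cell"]:
    print(label, "->", flag_term(label))
```

A lexical screen of this kind only surfaces candidates for review; it cannot decide whether a flagged label really fails to designate a kind.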
11. 11
Fourth Rule: Single Inheritance
No kind in a classificatory hierarchy
should have more than one is_a
parent on the immediate higher
level
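Single inheritance can be checked mechanically once the is_a links are available as (child, parent) pairs. The following is a minimal sketch under that assumption; the function name `multiple_parents` and the toy edges are illustrative and are not taken from any real ontology file.

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Map each class with more than one direct is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Illustrative edges only.
edges = [
    ("blue car", "car"),
    ("blue car", "blue thing"),
    ("car", "thing"),
    ("blue thing", "thing"),
]
print(multiple_parents(edges))  # {'blue car': ['blue thing', 'car']}
```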
12. 12
Basic ontological relations such as is_a and
part_of should be shared by all ontologies
[Diagram: thing at the top; car and blue thing beneath it; blue car at the bottom, with edges labelled is_a1 and is_a2]
13. 13
Fifth Rule
Use common upper-level categories and
relations (is_a, part_of ...)
• with precise formal definitions for machine
purposes
• with equivalent natural language
definitions for human beings
14. 14
Sixth Rule: Intelligibility of
Definitions
The terms used in a definition should be simpler
(more intelligible) than the term to be defined
otherwise the definition provides no assistance
– to human understanding
– to machine processing
Definitions should be intuitively
meaningful (should not contradict
common sense)
15. 15
The National Cancer Institute
Thesaurus (NCIT)
part of OBO
but does not (yet) satisfy these principles
16. 16
NCIT
“a biomedical vocabulary that provides
consistent, unambiguous codes and
definitions for concepts used in cancer
research”
“exhibits ontology-like properties in its
construction and use”.
17. 17
Goals
to make use of current terminology “best practices”
to relate relevant concepts to one another in a
formal structure, so that computers as well as
humans can use the Thesaurus for a variety of
purposes, including the support of automatic
reasoning;
to speed the introduction of new concepts and
new relationships in response to the emerging
needs of basic researchers, clinical trials,
information services and other users.
18. 18
Formal Definitions
of 37,261 nodes, 33,720 were stipulated to
be primitive in the DL sense
Thus only a small portion of the NCIT
ontology can be used for purposes of
automatic classification and
error-checking.
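The size of that "small portion" follows from the two figures on the slide. The sketch below assumes, for illustration, that every node not stipulated as primitive counts as formally defined; the variable names are hypothetical.

```python
total_nodes = 37_261      # figure from the slide
primitive_nodes = 33_720  # figure from the slide

# Assumption: every non-primitive node is a defined class.
defined_nodes = total_nodes - primitive_nodes
defined_share = defined_nodes / total_nodes

print(defined_nodes, f"{defined_share:.1%}")  # 3541 9.5%
```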
20. 20
Disease Progression
Definition1
Cancer that continues to grow or spread.
Definition2
Increase in the size of a tumor or spread of
cancer in the body.
Definition3
The worsening of a disease over time. This
concept is most often used for chronic and
incurable diseases where the stage of the
disease is an important determinant of therapy
and prognosis.
21. 21
To make matters worse, Disease
Progression has subclass:
Cancer Progression
Definition:
The worsening of a cancer over time. This
concept is most often used for incurable
cancers where the stage of the cancer is
an important determinant of therapy and
prognosis.
23. 23
Confuses definitions with
descriptions
Tuberculosis
Definition
A chronic, recurrent infection caused by the bacterium
Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost
any tissue or organ of the body with the lungs being the most
common site of infection. The clinical stages of TB are primary or
initial infection, latent or dormant infection, and recrudescent or
adult-type TB. Ninety to 95% of primary TB infections may go
unrecognized. Histopathologically, tissue lesions consist of
granulomas which usually undergo central caseation necrosis. Local
symptoms of TB vary according to the part affected; acute
symptoms include hectic fever, sweats, and emaciation; serious
complications include granulomatous erosion of pulmonary bronchi
associated with hemoptysis. If untreated, progressive TB may be
associated with a high degree of mortality. This infection is
frequently observed in immunocompromised individuals with AIDS
or a history of illicit IV drug use.
25. 25
Inherits ontological and terminological incoherence
from source vocabularies such as UMLS-SN
Conceptual Entities
Definition
An organizational header for concepts
representing mostly abstract entities.
Confuses use and mention (swimming is healthy
and has eight letters)
Includes as subtypes:
action, change, color, death, event, fluid,
injection, temperature
27. 27
and problematic synonyms
Anatomic Structure, System, or Substance ~ Anatomic
Structures and Systems
Does ‘anatomic’ apply only to structure or also to system
and substance?
Biological Function ~ Biological Process
some biological processes are the exercises of biological
functions
others (e.g. pathological processes) not
Genetic Abnormality ~ Molecular Abnormality (with
subtype: Molecular Genetic Abnormality) (definitions
not supplied)
28. 28
more problematic synonyms
Diseases and Disorders ~ Disease ~ Disorder
Definition1 for Disease:
A disease is any abnormal condition of the body or mind
that causes discomfort, dysfunction, or distress to the
person affected or those in contact with the person. ...
Definition2 for Disease
A definite pathologic process with a characteristic set of
signs and symptoms. ...
Condition ≠ Process
Definition2 contradicts NCIT’s own classification hierarchy
30. 30
Ontological problems
Abnormal Cell is a top-level class (thus not
subsumed by Cell)
Cell is a subclass of Other Anatomic
Concept (so that cells themselves are
concepts)
Normal Cell is a subclass of Microanatomy.
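The placement problem can be made concrete by walking the is_a links upward from each class. The fragment below encodes only the three assertions on this slide; the parents of Other Anatomic Concept and Microanatomy are not given here and are left out, and the helper name `ancestors` is hypothetical.

```python
def ancestors(cls, parent_of):
    """Collect every class reachable from cls by following is_a links upward."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in parent_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy fragment built from the three statements on the slide.
parent_of = {
    "Cell": ["Other Anatomic Concept"],
    "Normal Cell": ["Microanatomy"],
    "Abnormal Cell": [],  # top-level class
}

for cls in ("Normal Cell", "Abnormal Cell"):
    print(cls, "subsumed by Cell:", "Cell" in ancestors(cls, parent_of))
```

On this fragment neither class reaches Cell, so nothing asserted about Cell would be inherited by either of them.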
31. 31
Next step
Alignment of OBO ontologies through a common
system of top-level categories in the OBO-UBO
(Upper Biomedical Ontology)
and through a common system of formally
defined relations in the OBO-RO (Relation
Ontology)
see “Relations in Biomedical Ontologies”, Genome
Biology Apr. 2005
Donnelly, M., Bittner, T. and Rosse, C. 2005. 'A
Formal Theory for Spatial Representation and
Reasoning in Biomedical Ontologies'. Artificial
Intelligence in Medicine
32. 32
is_a
A is_a B
Definition
For all x, t if x instance_of A at t then x
instance_of B at t
allows reliable cross-ontology inferences
from ‘abnormal cell’ to ‘cell’
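The definition can be read as a test against instance data. The sketch below assumes such data is available as (instance, class, time) triples; the function name `is_a_holds` and the sample records are hypothetical. A finite dataset can refute an is_a assertion but can never prove it, since the definition quantifies over all instances and times.

```python
def is_a_holds(a, b, instance_of):
    """True if every recorded instance of a at time t is also recorded as an instance of b at t."""
    return all(
        (x, b, t) in instance_of
        for (x, cls, t) in instance_of
        if cls == a
    )


# Hypothetical instance data: (instance, class, time).
consistent = {
    ("c1", "abnormal cell", 1),
    ("c1", "cell", 1),
}
# c2 is recorded as an abnormal cell at time 2 but not as a cell.
inconsistent = consistent | {("c2", "abnormal cell", 2)}

print(is_a_holds("abnormal cell", "cell", consistent))
print(is_a_holds("abnormal cell", "cell", inconsistent))
```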
33. 33
part_of
A part_of B
Definition
For all x, t if x instance_of A at t then there is some y, y
instance_of B at t and x part_of y
‘part_of’ is the instance-level part relation, e.g. between
this nucleus and this cell
The all-some structure of such definitions allows
cascading of inferences
(i) within ontologies
(ii) between ontologies
(iii) between ontologies and EHR repositories of
instance-data
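The all-some pattern can be checked in the same way. The sketch below assumes instance data as (instance, class, time) triples plus a set of instance-level (part, whole) pairs; as in the definition on the slide, the instance-level part relation carries no time index here. The function name `class_part_of` and the sample records are hypothetical.

```python
def class_part_of(a, b, instance_of, inst_part_of):
    """True if every recorded instance of a at t is part of some instance of b at t."""
    return all(
        any((whole, b, t) in instance_of
            for (part, whole) in inst_part_of if part == x)
        for (x, cls, t) in instance_of
        if cls == a
    )


# Hypothetical instance data: (instance, class, time).
instance_of = {
    ("n1", "nucleus", 1),
    ("c1", "cell", 1),
}
inst_part_of = {("n1", "c1")}  # this nucleus is part of this cell

print(class_part_of("nucleus", "cell", instance_of, inst_part_of))

# A nucleus with no recorded containing cell is a counterexample.
with_orphan = instance_of | {("n2", "nucleus", 1)}
print(class_part_of("nucleus", "cell", with_orphan, inst_part_of))
```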
34. 34
Cascading inferences
Whichever A you choose, its including B will be
included in some C, which will include as part
also the A with which you begin
The same principle applies to the other relations
located_at, transformation_of, derived_from
etc. in the OBO-RO
(UML treatment here very poor)
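The cascade for part_of can be spelled out step by step. The derivation below is a sketch that uses the all-some definition given earlier and assumes, in addition, that the instance-level part relation is transitive; the step numbering is only for exposition.

```latex
\begin{align*}
&\text{Assume: } A\ \mathit{part\_of}\ B,\quad B\ \mathit{part\_of}\ C,\quad x\ \mathit{instance\_of}\ A \text{ at } t. \\
&\text{(1) } \exists y\,\bigl(y\ \mathit{instance\_of}\ B \text{ at } t \,\wedge\, x\ \mathit{part\_of}\ y\bigr)
  && \text{from } A\ \mathit{part\_of}\ B \\
&\text{(2) } \exists z\,\bigl(z\ \mathit{instance\_of}\ C \text{ at } t \,\wedge\, y\ \mathit{part\_of}\ z\bigr)
  && \text{from } B\ \mathit{part\_of}\ C \text{, applied to } y \\
&\text{(3) } x\ \mathit{part\_of}\ z
  && \text{instance-level transitivity (assumed)} \\
&\text{Hence } A\ \mathit{part\_of}\ C.
\end{align*}
```

Since x and t were arbitrary, the conclusion holds for every instance of A at every time, which is what the class-level assertion requires.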
35. 35
NCIT as now constituted will block
such automatic reasoning
Neither Normal Cell nor Abnormal Cell is
subsumed by Cell within the context of the NCIT
36. 36
Some consolations
NCIT is open source
NCIT has broad coverage
NCIT has some formal structure (DL)
NCIT has realized the errors of its ways
NCIT is much, much better than (for
example) the HL7-RIM
Editor's notes
Problem example: ‘chromosome’ in Sequence Ontology and in Cell Component Ontology means different things. Current solution: two distinct terms are involved (each qualified by its respective namespace).
There is no species called ‘non-rabbit’
There is no biological species: unknown rabbit. See discussion below.