Slides from a presentation on the Knowledge Organization System (KOS) work program for GBIF. KOS developments for biodiversity information resources and input to the emerging Vocabulary Management Task Group (VoMaG).
Links
GBIF KOS prototype tools, http://kos.gbif.org/
Tool: Semantic Wiki prototype, http://terms.gbif.org/wiki/
Tool: ISOcat prototype demo, http://kos.gbif.org/isocat/
GBIF concept vocabulary term browser, http://kos.gbif.org/termbrowser/
GBIF Resources Repository, http://rs.gbif.org/terms/
GBIF Vocabulary Server, http://vocabularies.gbif.org/
GBIF Resources Browser, http://tools.gbif.org/resource-browser/
Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (Dag and Eamonn, 2012).
1. Knowledge Organization System for GBIF
Virtual Biodiversity Research and Access Network for Taxonomy (ViBRANT)
Dag Endresen
Knowledge Systems Engineer
Éamonn Ó Tuama
Senior Programme Officer, Inventory, Discovery, Access (IDA)
Global Biodiversity Information Facility (GBIF)
31 August 2012
2. Enabling interoperability for the GBIF
network and beyond (GEO BON, IPBES)
“The ability of two or more systems or
components to exchange information and to
use the information that has been exchanged”
(ref: IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries, ISBN:155937079)
Key requirement:
Common exchange standards and protocols
for biodiversity data …
necessitate agreement on use of common
vocabularies for the classes of objects and
their properties
Knowledge Organisation Systems (KOS)
- can help us manage our vocabularies
3. Knowledge Organisation Systems
... to manage the vocabularies used for
sharing biodiversity information.
- Term lists: glossaries, dictionaries, gazetteers
- Classifications / categorisations: taxonomies
e.g., Dewey Decimal Classification
- Relationships: thesauri (simple relationships), ontologies (a model of a domain)
4. Darwin Core – a glossary of terms
higherClassification
coordinatePosition
specificEpithet
geodeticDatum
collectionCode
taxonConceptID
taxonRank
collectionCode: The name, acronym, coden, or initialism identifying the
collection or data set from which the record was derived. Examples:
"Mammals", "Hildebrandt", "eBird".
5. AgroVoc vocabulary – a thesaurus
bt Resources
nt Natural resources
nt Biological resources
nt Genetic resources
nt Germplasm
uf Genetic material
uf Germplasm resources
rt Protoplasm
rt Genes
rt Gene pools
rt Biodiversity
rt Germplasm collections
rt Gametes
bt = broader term; nt = narrower term; uf = used for; rt = related term
http://aims.fao.org/standards/agrovoc/functionalities/hierarchy
6. Ontology – a model of a domain
inverseOf: "collectors take samples" is the inverse of "samples are taken by collectors"
sameAs: William Jefferson Clinton sameAs Bill Clinton
differentFrom: NHM, Los Angeles County differentFrom NHM, London
transitiveProperty (hasAncestor): A hasAncestor B, B hasAncestor C, hence A hasAncestor C
Clinton image source: http://www.whitehouse.gov/sites/default/files/first-family/masthead_image/42bc_header_sm.jpg?1250887359
ontologies = computable dictionaries
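To make "computable" concrete, here is a minimal Python sketch of two of the property types named on this slide, inverseOf and transitiveProperty. The function names and the toy data are assumptions made for illustration; this is not an OWL reasoner, only the underlying set logic.

```python
def inverse(pairs):
    """Return the inverse relation: every (a, b) becomes (b, a)."""
    return {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """Add every pair implied by treating the relation as transitive."""
    closure = set(pairs)
    while True:
        # Join (a, b) with (b, d) to derive (a, d).
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Toy data mirroring the slide (illustrative only).
takes = {("collector", "sample")}      # collectors take samples
taken_by = inverse(takes)              # samples are taken by collectors
has_ancestor = transitive_closure({("A", "B"), ("B", "C")})
```

With these inputs the closure contains the inferred pair ("A", "C") in addition to the two asserted pairs.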
7. Term versus Concept
“The SKOS (simple knowledge organization system) format is designed to present
KOS data in a format that is suitable for machine inferencing and particularly for use
in the Semantic Web (….) The model [ISO 25964] is based on the understanding that
thesauri show the relationships between concepts – units of thought – and
distinguishes these from the terms that are used to label these concepts. These terms
may be in one or more languages, and one term per language is chosen as a
preferred term for each concept. One or more additional terms for the same concept
may be recorded in the thesaurus as non-preferred terms.”
Will, L. (2012). The ISO 25964 Data Model for the Structure of an Information Retrieval Thesaurus. Bulletin of the American
Society for Information Science and Technology 38(4): 48-51.
Dextre Clarke, S.G. and L. Zeng (2012). From ISO 2788 to ISO 25964: the evolution of thesaurus Standards towards
Interoperability and data modeling. ISQ Information Standards Quarterly 24(1): 20-26.
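The term/concept distinction quoted above can be sketched as a small data structure: one concept, at most one preferred term per language, and any number of non-preferred terms. The class, its method names and the example.org URI are hypothetical; the English labels are taken from the AgroVoc slide and the Spanish label is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language -> one preferred term
    alt_labels: dict = field(default_factory=dict)   # language -> non-preferred terms

    def set_preferred(self, lang, term):
        """Choose the preferred term for a language, demoting any previous one."""
        previous = self.pref_labels.get(lang)
        if previous is not None and previous != term:
            self.alt_labels.setdefault(lang, []).append(previous)
        alts = self.alt_labels.get(lang, [])
        if term in alts:
            alts.remove(term)
        self.pref_labels[lang] = term

    def add_non_preferred(self, lang, term):
        """Record an additional (non-preferred) term for the same concept."""
        if self.pref_labels.get(lang) == term:
            return
        alts = self.alt_labels.setdefault(lang, [])
        if term not in alts:
            alts.append(term)


# Hypothetical URI; labels follow the thesaurus example (uf = used for).
germplasm = Concept("http://example.org/concept/germplasm")
germplasm.set_preferred("en", "Germplasm")
germplasm.add_non_preferred("en", "Genetic material")
germplasm.add_non_preferred("en", "Germplasm resources")
germplasm.set_preferred("es", "Germoplasma")
```

Changing the preferred term later moves the old one to the non-preferred list, so the concept keeps a single preferred term per language.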
8. Knowledge Organisation Systems
Key requirement: a platform to support the
development, maintenance and governance of
vocabularies for the biodiversity community
KOS activities in the GBIF work programme:
- New dedicated position at GBIF funded through
external projects (ViBRANT, i4Life)
- Review recommendations in KOS task group
report and develop implementation roadmap
- Review GBIF Vocabularies Service and develop
vocabulary management system
- Engage with wider community:
- participation in Dublin Core workshop, Sept 2011
- KOS symposium at TDWG 2011 Conf, Oct 2011
- TDWG Vocabulary Management Task Group, 2012
9. ViBRANT: Task 4.1 Ontology platform (GBIF, JKI)
Description of work:
• “[F]lexible, user-friendly ontology management
environment, enabling users to create, define, extend
and share their own terms and concepts where
needed, providing options for discussions and
annotation, while supporting re-use of terms from
standardized ontologies wherever possible”.
• Extend the functionalities of existing vocabulary services (like
GBIF).
• Collaborative community interface for users and user-networks,
bottom-up, user-friendly and non-technical.
• Flexibility for biologists to express their knowledge regardless
of whether the terminology has been standardized yet or not.
Text from the ViBRANT project summary, page 13 (my highlighting).
10. ViBRANT WP4: GBIF tasks and deliverables
Deliverable 4.2: Ontology tools:
• “Develop the GBIF ontology tool and produce an equivalent tool based on a semantic wiki. Deliver a single user interface for ontology creation and editing based on user-acceptance of the alternative technologies.”
Text from the ViBRANT project summary, page 14 (my highlighting).
12. Why use a flat vocabulary?
• Maximize the reuse of terms, focus on the definition
and labels for basic terms.
• Low threshold for non-technical biologists and
biodiversity domain experts to access terms and
contribute (compared to richer ontologies).
• Preferred technology: RDF (resource description
framework) and SKOS (simple knowledge organization
system).
• Construction and maintenance of OWL ontologies are
demanding with respect to expertise, effort and costs.
• Maintaining SKOS vocabularies is less demanding.
• RDF resources are designed to be easily extended.
• Ontologies (OWL) can be based on (extend) terms
declared by an RDF/SKOS vocabulary.
• SKOS became a W3C recommendation in 2009.
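To show how little is needed to declare a term in a flat RDF/SKOS vocabulary, the following Python sketch serializes one concept as RDF/XML using only the standard library. The helper name skos_concept is hypothetical and the label is illustrative; the definition text is the Darwin Core collectionCode definition quoted on slide 4, and the term URI is assumed to be the usual Darwin Core one.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def skos_concept(uri, labels, definition):
    """Serialize one skos:Concept with language-tagged preferred labels."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    for lang, text in labels.items():
        label = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {f"{{{XML}}}lang": lang})
        label.text = text
    ET.SubElement(concept, f"{{{SKOS}}}definition").text = definition
    return ET.tostring(root, encoding="unicode")


# Assumed term URI and illustrative label.
doc = skos_concept(
    "http://rs.tdwg.org/dwc/terms/collectionCode",
    {"en": "Collection Code"},
    "The name, acronym, coden, or initialism identifying the collection "
    "or data set from which the record was derived.",
)
print(doc)
```

An OWL ontology could later refer to the same URI, which is the reuse pattern the bullets above describe.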
13. Why use OWL (web ontology language)?
• OWL DL supports machine reasoning through machine
accessible formal semantics.
• OWL provides by default a URI as identifier for classes,
properties, relations and instances.
• E.g. OBO targets practical solutions in the biomedical /
biology domain, while OWL is more generic and provides
cross-domain interoperability.
• OWL 1.0 became a W3C recommendation in 2004,
• OWL 2.0 in 2009.
• http://www.w3.org/2007/OWL/
• Recommendation:
• REUSE terms declared by flat vocabularies…
• Start with SKOS - then explore OWL…
14. Vocabulary management
1. Mint and maintain concepts and terms, in domain-expert working groups.
2. Release final version as a Concept Vocabulary.
3. REUSE terms from published concept vocabularies and ontologies when designing new DwC-A extensions & controlled value vocabularies.
4. Publish at the GBIF Resources Repository.
5. Browse at the GBIF Resources Browser.
[Diagram labels: (1) Wiki Vocabulary Management; (1) ISOcat Vocabulary Management; (1) Template for Vocabularies (Excel, text, etc…); (2) Concept Vocabulary (rdf, skos); proposed template processor; GBIF Vocabularies; (3) DwC-A extensions & controlled vocabularies; (4) Resources Repository; (5) GBIF Resources Browser.]
Evaluation of collaborative management tools: http://kos.gbif.org/
GBIF Vocabularies as a collaborative management tool for Darwin Core Archive extensions and controlled vocabularies.
15. GBIF Vocabulary Server (Drupal)
[Diagram labels: Wiki Vocabulary Management; ISOcat Vocabulary Management; MS Excel Template for Vocabularies; Concept Vocabulary (rdf, skos); GBIF Vocabularies; DwC-A extensions & controlled vocabularies; Resources Repository; GBIF IPT; Scratchpads; ?]
Evaluation of various tools for collaborative management of concept vocabularies (RDF).
GBIF Vocabularies as a collaborative management tool for Darwin Core Archive extensions and controlled value vocabularies.
GBIF Vocab Server is based on Drupal 6 / Scratchpads (v1) --> Drupal 7 / Scratchpads2 --> Drupal 8 ?
Integration with Scratchpads2?
Integration with the NPT?
16. Semantic wiki forum for terms
[Diagram labels: Wiki Vocabulary Management; ISOcat Vocabulary Management; MS Excel Template for Vocabularies; Concept Vocabulary (rdf, skos); DwC-A extensions & controlled vocabularies; Resources Repository; GBIF IPT; Scratchpads; Wiki Forum for Terms; ?]
Evaluation of various tools for collaborative management of concept vocabularies (RDF).
Wiki forum for terms as an open community platform for description and maintenance of existing terms.
Replacement tool also for the GBIF Vocabulary Server?
17. GBIF Term browser
[Diagram labels: Wiki Vocabulary Management; ISOcat Vocabulary Management; MS Excel Template for Vocabularies; Concept Vocabulary (rdf, skos); Resources Repository.]
Evaluation of various tools for collaborative management of concept vocabularies (RDF).
Concept vocabularies stored/deposited at http://rs.gbif.org/terms/
The GBIF Term Browser allows a user to browse for terms defined in widely used concept vocabularies such as Darwin Core, Dublin Core, FOAF, etc., including, where available, translations.
http://kos.gbif.org/termbrowser/
18. Biodiversity ontology management
REUSE terms from RDF vocabularies.
1. Evaluation of tools for the development of biodiversity ontologies.
2. Evaluation of biodiversity ontology repository solutions.
[Diagram labels: Concept Vocabulary (rdf, skos); Ontologies (rdf, owl); (1) Wiki tool incl. Ontology Management ??; (2) Resources Repository (incl. ontologies?).]
19. BioPortal ontology repository
Proposal: establish a biodiversity “slice” at the NCBO BioPortal.
• Loading biodiversity ontologies into the NCBO BioPortal promotes
mapping (and reuse of terms) between bio-medical and biodiversity
ontologies.
• An instance of the BioPortal software for biodiversity requires long-term
obligations to host and maintain the resource – does e.g. GBIF have the
resources to offer to host a BioPortal instance?
http://bioportal.bioontology.org/projects/168
20. GBIF KOS resources
Concept vocabularies (skos:ConceptScheme, RDF)
• Darwin Core, Darwin Core “extensions”, NCD, GNA,
Audubon Core (and other vocabularies of concepts).
as a basis and foundation for
Software application schema (XML, XML schema)
• Darwin Core Archive (DwC-A) extensions and
controlled value vocabularies.
• Resources such as the DwC-A extensions and
controlled value vocabularies REUSE terms (URI)
from a vocabulary of terms.
21. Biodiversity KOS (based on Darwin Core)
Darwin Core (DwC) is a flat list of terms, expressed using RDF.
→ DwC “extensions” (flat vocabularies for declaration of concepts).
→ Reuse concepts from other vocabularies whenever possible.
Darwin Core Archive (DwC-A) has a star schema model.
• DwC-A core(s), extensions and controlled value vocabularies are declared as XML lists of terms.
• DwC-A resources should REUSE terms from Darwin Core and other flat
concept vocabularies.
• New DwC-A core types (data types), e.g. sample? Formalize class
entities (ontology). [Current types: Taxon & Occurrence]
→ Formalize a governance structure for maintaining KOS resources
based on the principles established for Darwin Core (towards TDWG
VoMaG).
22. Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records including terms
from DwC-A extensions.
• Simple text based format.
• Zipped single file archive.
Germplasm.txt
23. Darwin Core Archive extension (XML term list)
http://rs.gbif.org/sandbox/extension/audubon.xml
24. Concept vocabulary (RDF/SKOS)
In progress:
XSLT -> HTML for
human readable
version.
http://rs.gbif.org/terms/geotime/geotimeConcept.rdf
25. GBIF Vocabulary Server
The GBIF Vocabulary Server can assist a user to create and manage DwC-A extensions or controlled value vocabularies.
However, it is not designed to create RDF/SKOS concept vocabulary resources with reusable concepts.
It can export XML, but not RDF.
It is based on Scratchpads (v1), a.k.a. Drupal 6.
[Screenshots: edit interface; XML export]
26. Global Names Architecture (GNA)
Many of the GNA term URI identifiers do
not resolve (404 not found).
The rowType identifiers simply resolve to
the software application schema (to the
DwC-A extension).
We propose to formalize the GNA concept
declarations using RDF/SKOS for
improved re-usability of the GNA terms
and concepts.
27. Global Names Architecture (GNA)
RDF/SKOS
XML
The Global Names Architecture (GNA) terms were originally simply declared
by the DwC-A extension. We propose to formalize the GNA concept
declarations using RDF/SKOS for improved re-usability of the GNA terms.
28. Global Names Architecture (GNA)
RDF/SKOS
We propose to formalize the GNA concept declarations using RDF/SKOS
for improved re-usability of the GNA terms.
30. Controlled value vocabularies
• Geological time periods
• chronostratigraphy
• magnetostratigraphy
• Species interactions
• saproxylic interactions
• pollinators
• Country codes
• Language
• Basis of record
• Taxonomic rank
• Nomenclatural status
• Life form
• Life stage
• …
32. Versioning resources
Move outdated vocabularies to a
separate folder named “deprecated”? No
versions?
Will IPT be aware of this folder?
Note that previous DwC-A datasets could
be mapped to deprecated vocabulary
resources…!
33. Versioning resources
Version the DwC-A vocabularies and extensions
using a [_DATE] postfix.
Could IPT be made aware of this postfix?
Note that previous DwC-A datasets could be
mapped to outdated vocabulary resources…!
34. Versioning RDF vocabularies
Move outdated vocabularies to a
subfolder named “archive/[DATE]”?
Same versioning model for extensions
and vocabularies…?
35. Versioning RDF vocabularies
Deprecated and outdated vocabularies and DwC-A resources could
declare their status, e.g. using dcterms:isReplacedBy…?
Drawback: the XML document must be accessed and parsed to
read the resource status.
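The drawback is easier to judge with the parsing step written out. The sketch below assumes the status is declared in RDF/XML as a dcterms:isReplacedBy element with an rdf:resource attribute; the sample document and its example.org URIs are hypothetical and do not reproduce any existing GBIF resource.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"


def replaced_by(rdf_xml):
    """Return the URIs named by dcterms:isReplacedBy anywhere in the document."""
    root = ET.fromstring(rdf_xml)
    return [el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{DCTERMS}}}isReplacedBy")]


# Hypothetical deprecated vocabulary pointing at its successor.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:ConceptScheme rdf:about="http://example.org/vocab/2011">
    <dcterms:isReplacedBy rdf:resource="http://example.org/vocab/2012"/>
  </skos:ConceptScheme>
</rdf:RDF>"""

print(replaced_by(SAMPLE))
```

A client has to fetch and parse every resource this way before it knows whether the resource is current, which is the cost noted above.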
36. Versioning vocabulary resources
• Separate folder named “deprecated”?
• Postfix using [_DATE]?
• Subfolder named “archive/[DATE]”?
• dcterms:isReplacedBy
• Other ideas, solutions?
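The three path-based options can be compared side by side with a small Python sketch. The function names are hypothetical, and the ISO date format is an assumption, since the slides only say [DATE]; the example path is modelled on the geotime vocabulary shown earlier.

```python
from datetime import date
from pathlib import PurePosixPath


def deprecated_folder(path):
    """Option 1: move the resource into a sibling folder named 'deprecated'."""
    p = PurePosixPath(path)
    return str(p.parent / "deprecated" / p.name)


def date_postfix(path, released):
    """Option 2: append _DATE to the file name, before the extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}_{released.isoformat()}{p.suffix}"))


def archive_subfolder(path, released):
    """Option 3: move the resource into archive/DATE/ next to the active file."""
    p = PurePosixPath(path)
    return str(p.parent / "archive" / released.isoformat() / p.name)


active = "terms/geotime/geotimeConcept.rdf"
released = date(2012, 8, 31)
for option in (deprecated_folder(active),
               date_postfix(active, released),
               archive_subfolder(active, released)):
    print(option)
```

Only the second and third options keep more than one old version; the first overwrites whatever was previously deprecated.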
38. Translation of vocabulary term descriptions
Term translations (SKOS/RDF), dwc_translations.rdf: Translations for a given vocabulary of terms are maintained and published as a SKOS/RDF file at the GBIF Resources Repository (http://rs.gbif.org/terms/).
Export working file format from the SKOS file (RDF/SKOS → CSV).
Expert working groups or a collaborative expert community develop new translations or refine previous translations. The expert group provides their output as a CSV file, XML data or as a SKOS/RDF resource.
Archive (SKOS/RDF), [DATE]/dwc_translations.rdf: Archive the translations each time the “active” SKOS file is updated.
39. Example: master SKOS/RDF resource
[Screenshot callouts: en, es, zh, ja]
http://rs.gbif.org/terms/dwc/dwc_translations.rdf
40. Workflow for term translation
[Diagram: XSLT split and merge cycle. The term translations resource (SKOS/RDF, dwc_translations.rdf) is split by XSLT into per-language files for the expert group (dwc_translations_de.csv, dwc_translations_es.csv, dwc_translations_fr.csv, dwc_translations_jp.csv, dwc_translations_ru.csv, dwc_translations_zh_Hans.csv, …). Updated files (dwc_translations_fr.csv (*)) and new files (dwc_translations_pt.csv (**)) are merged back by XSLT into dwc_translations.rdf.]
Adding new term translations or updating previous term translations always
starts and ends with the “active” SKOS/RDF resource for translations.
(*) Updated CSV files with translations simply replace extracted previous translations – in the XSLT split and merge cycle.
(**) Adding translations to a new language simply by adding the CSV resource into the XSLT cycle.
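The split and merge cycle can be sketched in Python as a stand-in for the XSLT step. This is not the XSLT used in the workflow: the master resource is modelled as a plain dictionary rather than SKOS/RDF, the function names and CSV column names are assumptions, and the sample translations are illustrative rather than vetted.

```python
import csv
import io


def split(master):
    """Split {term: {lang: text}} into one CSV string per language."""
    languages = sorted({lang for labels in master.values() for lang in labels})
    files = {}
    for lang in languages:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["term", "translation"])
        for term in sorted(master):
            if lang in master[term]:
                writer.writerow([term, master[term][lang]])
        files[lang] = buf.getvalue()
    return files


def merge(files):
    """Rebuild the master mapping from the per-language CSV strings."""
    master = {}
    for lang, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            master.setdefault(row["term"], {})[lang] = row["translation"]
    return master


master = {
    "taxonRank": {"en": "taxon rank", "fr": "rang taxonomique"},
    "collectionCode": {"en": "collection code"},
}
files = split(master)
# (*) an updated file replaces the extracted one; (**) a new language is added.
files["fr"] = "term,translation\ntaxonRank,rang du taxon\n"
files["pt"] = "term,translation\ntaxonRank,categoria taxonómica\n"
updated = merge(files)
```

As in the workflow above, the cycle starts and ends with the master resource, and an unchanged split followed by a merge reproduces it exactly.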
41. New data types?
Genomic level observations
Ecological measurements associated with observations
- complement, not duplicate work
- GBIF as premier gateway to discovery, access
A roadmap developed by Q1 2013:
- genomic data
- ecological data
42. Metadata
Essential for discovery and
access to new data types
The GBIF metadata catalogue
system allows interoperability
across distributed metadata
repositories
http://metadata.gbif.org
The challenge ahead ...
populating the catalogue
with high quality, complete
metadata
43. GBIF KOS work-program
Some suggested next steps
• GBIF Resources Repository (http://rs.gbif.org/)
• Further development of new DwC-A extensions and controlled value vocabularies.
• Workflow for the translation of term descriptions.
• Continue the evaluation of collaborative tools for management of flat vocabularies of terms (RDF/SKOS).
• Semantic Wiki, ISOcat, Protégé (web-protégé), …
• New semantic Wiki for description of terms / glossary of terms / community-driven discussion forum (with JKI, Gregor Hagedorn).
• Discussion, discovery and REUSE of existing terms.
• NCBO BioPortal as a repository for biodiversity ontologies.
• Will GBIF contribute to mint new biodiversity ontologies?
• BFO based OWL version of Darwin Core…?
• KOS governance structure developed and formalized by the (TDWG) Vocabulary Management Task Group (VoMaG).
• Roadmap for KOS into the GBIF infrastructure, portal, …!
44. Furthermore, I think that we need persistent identifiers!
Cato the Elder ended all his speeches in
the senate of Rome with: "Ceterum
autem censeo Carthaginem esse
delendam" (English: "Furthermore, I
think Carthage must be destroyed").