This document discusses building an open biodiversity knowledge management system that can extract, store, and serve information on taxa in an interoperable way across the kingdoms of life. It notes that the legacy literature contains over 200 million pages, but the data are incomplete and disconnected. The Pro-iBiosphere project aims to demonstrate how to mark up taxon treatments to make them accessible and linkable. Pilots have marked up over 1,500 treatments of plants, fungi, bryophytes, insects and spiders. The document recommends standardizing markup and applying it prospectively to enhance the semantic interoperability of biodiversity data.
1. Interoperability of Taxon Treatments
Donat Agosti
Plazi
Brussels, June 2, 2014
Supported by the European Commission through its FP7 research funding programme
2. The big question
What is the future of the biological world?
Imagine if we could:
…Predict community level dynamics of ecosystems at
scales from local to global, based on the ecology and
biology of all individual organisms
Harfoot, BIH2013, Rome, 2013
Hardisty, Nature 502, 171 (2013)
BUT: predictive ecology has substantial data needs
3. Biodiversity libraries
200,000,000+ printed pages
1,900,000 species described
20,000,000+ species treatments
17,000 new species per year
BUT: The data are hidden
Incomplete digitization
Publications are not semantically enhanced
Collections are incomplete
Data are not linked
Most data are not open
4. Interoperability of taxa
Can we build a system (e.g. an Open Biodiversity Knowledge
Management System) that includes a component that extracts,
stores and serves information on taxa in a way that
is agnostic of Biota?
Traditionally, Floras, Faunas and Mycotas are dealt with by different communities
5. The Pro‐iBiosphere project is to develop a blueprint of an Open
Knowledge Management System
It is not building a system
Pilots to demonstrate specific issues
interoperability of taxa
explore workflows to produce recommendations of «best»
practices
interoperability of infrastructures
registration of names
advanced publishing
Do not expect production level products
20. Journal of Hymenoptera Research
5170 specimens
4062 plottable specimens from
1138 unique locations
21. Interoperability of taxa
Can we build a system (e.g. an Open Biodiversity Knowledge
Management System) that includes a component that extracts,
stores and serves information on taxa in a way that
is agnostic of Biota?
Yes, we can.
23. Plazi
SRS
Digitization and Markup Workflow:
find → scan → «OCR» → markup → store
(Diagram labels: cost «$$$$» marked with a question mark; steps tagged as domain specific or generic)
Find the right mix of generic and domain specific solutions
25. Markup / data extraction strategies
Dedicated external services, bulk
Applications for individual contributor, small scale
Involve community / crowd / wikimedia
Ad hoc Web Services, individual
Mixed strategies
Combination with re‐publishing, small scale
Create market for treatments, large scale
26. Variation in status labels
Quality Control and Standardization
TaxStatus                   Total
comb. nov.                    246
G. N.                          65
gen. nov.                      19
gen.nov.                       10
hybr. nov. «sp.nov.»           13
n sp                           12
n. comb.                        2
n. nom.                         6
n. sp.                        267
n. stat.                        5
n. subg.                        3
new combination               139
new species                   651
NEW STATUS                    114
nomen novum                     6
nov. spec.                      1
REVISED STATUS                 10
s. str.                         1
sp. n.                        130
sp. nov.                     4057
sp.n.                           3
spec. nov.                     34
stat. nov.                     56
Status revised                  9
subsp. nov.                    26
var. nov.                      80
(blank)
Grand Total                  5965
Standardize and apply in prospective publishing …
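The status labels tabulated on this slide could be collapsed onto a controlled vocabulary roughly as follows. This is a minimal sketch, not the project's actual tooling: the canonical labels, the grouping of variants (including reading «G. N.» as a new genus), and the names `VARIANTS`, `normalize_status` and `merge_counts` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed canonical labels and the source spellings folded into each.
# The grouping is illustrative; a real vocabulary needs expert review.
VARIANTS = {
    "sp. nov.": ["sp. n.", "sp. nov.", "spec. nov.", "n. sp.",
                 "new species", "nov. spec."],
    "comb. nov.": ["comb. nov.", "n. comb.", "new combination"],
    "gen. nov.": ["gen. nov.", "G. N."],
    "stat. nov.": ["stat. nov.", "n. stat.", "new status"],
    "stat. rev.": ["revised status", "status revised"],
    "nom. nov.": ["n. nom.", "nomen novum"],
}


def _key(label: str) -> str:
    """Lower-case and drop everything but letters, so 'sp.n.' == 'sp. n.'."""
    return re.sub(r"[^a-z]", "", label.lower())


LOOKUP = {_key(v): canon for canon, vs in VARIANTS.items() for v in vs}


def normalize_status(label: str):
    """Return the canonical label, or None if the spelling is unknown."""
    return LOOKUP.get(_key(label))


def merge_counts(counts: dict) -> Counter:
    """Sum counts per canonical label; unknown labels are kept unchanged."""
    merged = Counter()
    for label, n in counts.items():
        merged[normalize_status(label) or label] += n
    return merged


# Counts for the new-species spellings taken from the table on this slide.
species_counts = {
    "sp. n.": 130, "sp. nov.": 4057, "sp.n.": 3, "spec. nov.": 34,
    "n sp": 12, "n. sp.": 267, "new species": 651, "nov. spec.": 1,
}
print(merge_counts(species_counts))  # eight spellings, one canonical entry
```

Matching on a punctuation-free key keeps the variant lists short, at the cost of treating any two labels with the same letters as identical. Labels the lookup does not know (for example «s. str.») are passed through untouched so they can be reviewed by hand.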
27. Standardization of markup
Formica rufa Linnaeus 1758: 426
Formica: genus name
rufa: species epithet
1758: year of publication
426: page of publication
Grouping labels on the slide: Name, Authority, Bibliographic reference, Treatment citation
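A treatment citation of the form shown on this slide can be split into its labelled parts with a regular expression. The sketch below assumes a single-word genus and epithet followed by author, four-digit year, colon and page; the class `TreatmentCitation`, its field names and the function `parse_treatment_citation` are hypothetical, and real citations vary far more than this pattern allows.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TreatmentCitation:
    genus: str
    species_epithet: str
    author: str  # assumed field name for the author part of the citation
    year: int
    page: int


# Assumed shape: "<Genus> <epithet> <Author>[,] <year>: <page>"
PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<epithet>[a-z]+)\s+"
    r"(?P<author>.+?),?\s+"
    r"(?P<year>\d{4})\s*:\s*"
    r"(?P<page>\d+)$"
)


def parse_treatment_citation(text: str) -> Optional[TreatmentCitation]:
    """Parse a citation string; return None if it does not fit the pattern."""
    m = PATTERN.match(text.strip())
    if m is None:
        return None
    return TreatmentCitation(
        genus=m.group("genus"),
        species_epithet=m.group("epithet"),
        author=m.group("author"),
        year=int(m.group("year")),
        page=int(m.group("page")),
    )


citation = parse_treatment_citation("Formica rufa Linnaeus 1758: 426")
print(citation)
```

Returning `None` for strings that do not fit keeps unparsed citations visible for manual markup instead of silently guessing at their structure.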
28. Linking of treatment as an example for external links
Treatment citation
Treatment identifier
29. Conclusions
• Biodiversity literature is very rich in data
• BL has a basic structure (treatments) across all Biota
• Legacy literature should be strategically marked up
• Prospective literature should be semantically enhanced
• Markup tools exist and should be optimized
• Identifiers for treatments exist to link to treatments
30. Thank you very much!
Donat Agosti
Plazi
agosti@plazi.org