Antabif training
1. ANTABIF Training
getting your data online
Bruno Danis, Anton Van de Putte and Nabil Youdjou
Wednesday 26 October 11
2. Objectives
• become familiar with ANTABIF
• learn about the architecture, functionalities,
tools and standards we offer
• hands-on exercises with dummy and *real*
data
• collect feedback on the fitness for use for
this community
3. On the Menu Today
• Background about ANTABIF
• Technical overview
• Standards, tools and resources
• Functionalities
• Future directions
• Hands on
5. Antarctic Treaty
« In order to promote international
cooperation in scientific investigation in
Antarctica, […], the Contracting Parties
agree that, to the greatest extent feasible and
practicable: […]
Scientific observations and results from
Antarctica shall be exchanged and made
freely available. »
6. SCAR-MarBIN & ANTABIF
• www.scarmarbin.be
• www.antabif.be or www.biodiversity.aq
• Core funding: BELSPO.be
• International Polar Year 2007/08
• Census of Antarctic Marine Life
• Ocean Biogeographic Information System
• Global Biodiversity Information Facility
7. General Philosophy
• Build an electronic ecosystem
• Offer free and open access to data and technology
• Expose all the (biodiversity) data and metadata, in
multiple contexts
• Remain community-driven, and collaborative
• Adopt strong standardization
• Work for science, conservation, management
9. Achievements
• The first RAMS
• Board of 60+ editors
• Feeds WoRMS, CoL and EoL
• 17,098 taxa (RAMS)
• Building a dynamic RAS
• 24,248 taxa (RAS)
18. Metadata
Information about datasets deteriorates over time!
19. Metadata
• preferred MD catalogue = Antarctic Master
Directory (subset of GCMD)
• standard = DIF (Data Interchange Format)
• used by the whole SCAR community
• crawled by Google, Scopus...
20. DarwinCore
"A vocabulary of words that biologists,
hackers, and citizen scientists use to broadly
describe the biodiversity of life on earth."
21. DarwinCore Archive
• Complete package of data
–One file
–Multiple files
• Text Files…
• Self-documenting
• Intended to be shared/distributed
22. DarwinCore Archive
Archives always have a ‘core’ data file
My_data.txt
The core data file is a text file.
24. DarwinCore Archive
Darwin Core Archive (two files)
meta.xml describes the mappings in the core data file (species.txt)
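The two-file archive described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way the IPT or any GBIF tool builds archives: the sample records, the column choice and the function names are hypothetical, and only the file names (species.txt, meta.xml) come from the slide. The meta.xml layout follows the Darwin Core text guidelines as commonly used.

```python
import csv
import io
import zipfile

# Namespace of Darwin Core term URIs.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical sample records; the column names are Darwin Core terms.
COLUMNS = ["taxonID", "scientificName", "taxonRank"]
ROWS = [
    ["1", "Euphausia superba", "species"],
    ["2", "Pygoscelis adeliae", "species"],
]


def core_file_text(columns, rows):
    """Return the tab-delimited core data file, header row included."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def meta_xml_text(columns, core_name="species.txt"):
    """Return a meta.xml mapping each column index to a Darwin Core term."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>'
        for i, name in enumerate(columns)
    )
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" '
        'linesTerminatedBy="\\n"\n'
        f'        ignoreHeaderLines="1" rowType="{DWC}Taxon">\n'
        f"    <files><location>{core_name}</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def build_archive(target, columns=COLUMNS, rows=ROWS):
    """Write species.txt and meta.xml into a zip (path or file-like object)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("species.txt", core_file_text(columns, rows))
        archive.writestr("meta.xml", meta_xml_text(columns))
    return target


if __name__ == "__main__":
    # Build the archive in memory and list what it contains.
    memory_file = build_archive(io.BytesIO())
    with zipfile.ZipFile(memory_file) as archive:
        print(archive.namelist())
```

Passing a file name instead of `io.BytesIO()` writes the zip to disk. Keeping the mapping in meta.xml, rather than relying on the header row, is what makes the archive self-documenting.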
25. DarwinCore Archive
Multiple extensions are available
Columns in extensions are mapped to Darwin Core using the meta.xml file
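To make the mapping idea concrete, the sketch below reads a meta.xml and lists, for the core file and each extension file, which column index maps to which term. It is a minimal illustration, not GBIF code: the sample descriptor, the vernacular.txt file name and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive descriptors (meta.xml).
NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical descriptor: one core file plus one extension file.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon" ignoreHeaderLines="1">
    <files><location>species.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/VernacularName"
             ignoreHeaderLines="1">
    <files><location>vernacular.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
    <field index="2" term="http://purl.org/dc/terms/language"/>
  </extension>
</archive>
"""


def column_mappings(meta_xml):
    """Return {file name: {column index: term URI}} for core and extensions."""
    root = ET.fromstring(meta_xml)
    tables = root.findall("dwca:core", NS) + root.findall("dwca:extension", NS)
    mappings = {}
    for table in tables:
        location = table.find("dwca:files/dwca:location", NS).text
        mappings[location] = {
            int(field.get("index")): field.get("term")
            for field in table.findall("dwca:field", NS)
            if field.get("index") is not None  # skip fields without a column
        }
    return mappings


if __name__ == "__main__":
    for file_name, columns in column_mappings(SAMPLE_META).items():
        print(file_name)
        for index, term in sorted(columns.items()):
            print(f"  column {index} -> {term}")
```

In the extension, `coreid` points at the column holding the identifier of the matching core record, which is how extension rows are linked back to the core data file.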
26. DarwinCore Archive
Many extensions are available
http://rs.gbif.org/extension/
27. Spreadsheet templates
• Metadata - describe a database or other
data resource.
• Species Occurrence - store basic species
collections or observational data
• Species Checklists - record and store
simple annotated species checklists.
33. Spreadsheet processor
• web application: Excel spreadsheet to
DwC-A.
• Excel files contain data entry and GBIF
metadata profile.
• Worksheet supports publication of primary
biodiversity data
• Processor performs data validation and
transformation and returns a validated
DwC-A
35. DwC-A validator
• tests Darwin Core Archives
• validates the content against the known
extensions and terms registered within the
GBIF network for sharing biodiversity data.
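The kind of check described above can be illustrated with a toy example. This is not the GBIF validator and does not reproduce its rules: the term list is a small hand-picked subset of Darwin Core term names, and both function names are hypothetical. It only shows the idea of testing column names against known terms and testing values against simple constraints.

```python
# Hand-picked subset of Darwin Core term names, for illustration only.
KNOWN_TERMS = {
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "taxonID", "taxonRank",
}


def unknown_columns(header_line, delimiter="\t"):
    """Return the column names in a header row that are not known terms."""
    names = [n.strip() for n in header_line.rstrip("\r\n").split(delimiter)]
    return [n for n in names if n not in KNOWN_TERMS]


def coordinate_problems(record):
    """Return messages for coordinates that are non-numeric or out of range."""
    problems = []
    limits = (("decimalLatitude", 90.0), ("decimalLongitude", 180.0))
    for term, limit in limits:
        value = record.get(term, "")
        if value == "":
            continue  # an empty coordinate is not treated as an error here
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{term} is not a number: {value!r}")
            continue
        if abs(number) > limit:
            problems.append(f"{term} out of range: {value}")
    return problems


if __name__ == "__main__":
    print(unknown_columns("occurrenceID\tscientificName\tdepth_m\n"))
    print(coordinate_problems(
        {"decimalLatitude": "-95.2", "decimalLongitude": "166.7"}
    ))
```

A real validator works from the full set of terms and extensions registered in the GBIF network rather than a hard-coded list.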
37. IPT - Integrated Publishing Toolkit
• Publishing primary biodiversity data
• Resources
• Metadata
• Source Data (text, zip, SQL)
• Source Mappings
• Visibility
• Published Release
38. The Data Paper concept
• A scholarly journal publication whose primary purpose is to
describe a dataset or group of datasets, rather than to report a
research investigation.
• Benefits of the Data Paper
– Scholarly credit to Data Publishers
– Describe the data in structured human readable form
– Bring the existence of the data to the attention of the
scholarly community
41. Step-by-Step
• Complete the metadata of a dataset using the metadata editor in IPT
2.0.2
• Generate the ‘Data Paper’ manuscript (menu: Manage Resource –
RTF Download)
• Submit the manuscript for possible publication in one of the
Pensoft journals (ZooKeys, PhytoKeys, BioRisk, NeoBiota).
• Revision (if any) is carried out using the metadata editor in IPT 2.0.2
and the manuscript is re-submitted to the Pensoft Open Journal System
42. Once paper is accepted
• Digital Object Identifier is assigned to the Data Paper
• Paper is published in (a) print format, (b) PDF format, (c)
semantically enhanced HTML, and (d) XML is archived in
PubMedCentral
• DOI of the Data Paper is linked with the Persistent Identifier
of the metadata document in the GBIF Registry
• Data Paper is indexed by Web of Knowledge (ISI),
PubMedCentral, Scopus, Zoological Record, Google Scholar,
CAB Abstracts, Directory of Open Access Journals (DOAJ),
EBSCO.
43. Important to consider
• Metadata is complete in all respects
• All the claims are adequately substantiated
• Data described in ‘Data Paper’ is freely available at
the time of submission of the manuscript
44. ORC
• GBIF’s Online Resource Center
• Provides access to documents, best
practices, tools and links
• Wide thematic scope
• Different ways of accessing resources
• Enabling community contributions
• Different levels of resource access
• Multilanguage support
51. ipt.biodiversity.aq
• prepare and clean your data
• publish primary biodiversity data
• publish metadata
• push data and metadata to ANTABIF &
GBIF
• get a Data Paper
53. afg.biodiversity.aq
• (nice-looking) Identification aid
• Publication/sharing platform for customized
Field Guides
• High quality (useful) pictures
• Expert Descriptions
• Built dynamically from various sources
57. PIC
• polarcommons.org
• Emergency solution for orphan datasets
• Setup of a commons
• IT cloud
• Set of norms
• All polar data (IPY)
• Simple procedure!
60. Architecture
• A network of IPTs
• Enhanced data flow
• Community involved in data management
• Enhanced interoperability
• Optimization of research efforts/resources
• Integrative, connected science
• Factual, adaptive conservation
61. Challenges ahead
• Data intensive science
• Data deluge
• Digital divides
• Other data types and integration
• Orphan datasets
• Cultural change