
GBIF Checklist bank and the backbone

Checklist datasets in GBIF and current issues and updates to the GBIF backbone taxonomy.

Published in: Science

  1. GBIF Checklist Bank Indexing & Backbone
  2. Checklist Scope
     • 1,846 datasets registered
     • 18 million name records
     • Plazi (1,131), Pensoft (178), CoL GSDs (156)
  3. Denormalized Checklist
  4. Normalized Checklist
  5. Checklist Challenges
     • Highly relational taxonomic data, almost all records linked in tree & basionym
       • Wrong or missing records destroy dataset integrity, not just a single record!
       • Different from flat, unrelated occurrence records
     • Data Quality
       • broken referential integrity
       • bad names or placeholders (e.g. «Unallocated Family»)
       • missing or unused controlled vocabularies, e.g. «art» for rank species
     • Name strings can be published in several ways
       • ScientificName
       • ScientificName + Authorship
       • Genus + SpeciesEpitheton + Rank + InfraspecificEpitheton + Authorship
     • Classifications can be published in several ways
       • Normalised via parentNameUsageID
       • Normalised via parentNameUsage
       • Denormalised via Kingdom, Phylum, Class, Order, Family, Genus
  6. Checklist Indexing
     • Basic archive validation
       • unique ids
     • Checklist Normalizer
       • resolve relations
       • create implicit taxa from denormalised classification
       • interpret controlled vocabularies, e.g. rank
       • match to backbone
       • match to previous version to keep GBIF ids stable
     • Checklist Importer
       • inserts data into the Postgres DB and the Solr index for searches
     • Checklist Analyser
       • generate dataset metrics
  7. Organizing Occurrences
     • GBIF needs a single, consistent taxonomy
       • for metrics, search, maps
       • considerable variation in higher taxa
       • synonymies can be very large
     • Catalogue of Life is the largest single source
       • ~90% of GBIF occurrence records (thanks to birds)
       • ~50% of GBIF occurrence names (35% in 2010)
     • GBIF needs to assemble a taxonomy
       • originally merged (noisy) names found in occurrences, which resulted in lots of duplicates
       • improved by stitching together checklist datasets
     Example of variation in higher taxa:
       Cronquist classification: Mimosaceae 3,200 species; Caesalpiniaceae 2,000 species; Fabaceae 14,000 species
       “Modern” classification: Fabaceae 19,200 species, comprising Mimosoideae 3,200 species; Cæsalpinioideae 2,000 species; Faboideae 14,000 species
  8. Current Backbone Issues
     • Far too many accepted species (acc/syn)
       • Cactaceae: GBIF 12,062 (342 syn), TPL 2,233 (5,422 syn) + 5,500 unknown
       • Genus Weingartia: GBIF 129 (0 syn), TPL 8 (26 syn) + 68 unknown
     • Many accepted names based on the same basionym
       • Sulcorebutia breviflora Backeb.
       • Weingartia breviflora (Backeb.) Hentzschel & K.Augustin
     • No synonyms with different authors possible
       • Poa pubescens R.Br. synonym of Eragrostis pubescens (R.Br.) Steud.
       • Poa pubescens Lej. synonym of Poa pratensis L.
       • merged all names with exact same canonical name
       • list of known homonym genera (IRMNG) used to disambiguate between larger groups
  9. Backbone Building
     • Overlay ordered sources
       • Start with Catalogue of Life
       • Primary source defines status
       • Create new name if kingdom, canonical name & authorship do not exist in current nub
     • Ignore source name if …
       • not a major Linnean rank (infraspecific ranks are included)
       • higher ranks above family (configurable per source)
       • status conflicts with already existing status
       • hybrid formula, cultivar, candidatus or placeholder names !!!
     Diagram labels: Catalogue of Life; Fauna Europaea; GRIN; Mammal Species World; Observations; Specimens; 8000 Species Lists; 10s of taxonomic resources; Me
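The overlay rule on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GBIF implementation: the `SourceName` type, the `ALLOWED_RANKS` set and the `build_nub` function are hypothetical names, authorship is compared by exact string equality (the later slides describe a much looser author match), and only two of the ignore rules are modelled (rank filter, and first source wins on conflicts).

```python
from dataclasses import dataclass

# Ranks that may enter the backbone in this sketch (an assumed list).
ALLOWED_RANKS = {"kingdom", "phylum", "class", "order", "family",
                 "genus", "species", "subspecies", "variety", "form"}


@dataclass(frozen=True)
class SourceName:
    kingdom: str
    canonical: str
    authorship: str
    rank: str
    status: str  # "accepted" or "synonym"


def build_nub(sources):
    """Overlay sources in priority order; the first source to supply a name wins."""
    nub = {}
    for names in sources:
        for name in names:
            if name.rank not in ALLOWED_RANKS:
                continue  # e.g. tribes are ignored
            key = (name.kingdom, name.canonical, name.authorship)
            # Only create the name if it does not exist yet, so a later
            # source cannot override the status set by an earlier one.
            nub.setdefault(key, name)
    return nub


primary = [SourceName("Plantae", "Helianthus annuus", "L.", "species", "accepted")]
secondary = [
    SourceName("Plantae", "Helianthus annuus", "L.", "species", "synonym"),
    SourceName("Plantae", "Cichorieae", "Lam & DC.", "tribe", "accepted"),
    SourceName("Plantae", "Cichorium intybus", "L.", "species", "accepted"),
]
nub = build_nub([primary, secondary])
```

Here the secondary source adds Cichorium intybus, while its tribe and its conflicting status for Helianthus annuus are dropped.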
  10. Backbone Assembling
      • Nub build starts with 8 kingdoms: Animalia, Archaea, Bacteria, Chromista, Fungi, Plantae, Protozoa, Viruses (plus incertae sedis)
  11. Backbone Assembling
      • Catalogue of Life is added
      • Defines higher classification
      Backbone:
        Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
          Helianthus L.
            Helianthus annuus L.
      Source:
        Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
          Helianthus L.
            Helianthus annuus L.
  12. Backbone Assembling
      • Missing genera are created
      • Tribe is ignored
      Backbone:
        Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium
            Cichorium intybus L.
      Source:
        Asteraceae
          Cichorieae Lam & DC. [tribe]
            Cichorium intybus L.
  13. Backbone Assembling
      • Synonyms respect authors
      • Author match very loose
      • Existing genus author updated
      Backbone:
        Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium Linneaus
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clementi
      Source:
        Plantae > Asteraceae
          Cichorium Linneaus
            Cichorium intybus Linneaus
              = Cichorium balearicum Porta
              = Cichorium byzantinum Clem.
              = Cichorium byzantinum Clementi
  14. Backbone Assembling
      • Prefer authors from nomenclators
      Backbone:
        Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
      Source:
        Asteraceae
          Cichorium L.
            Cichorium byzantinum Clem.
  15. Backbone Assembling
      • Infraspecifics are included
      Backbone:
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Agoseris
            Agoseris apargioides (Less.) Greene
              = A. maritima Eastw.
              A. a. var. eastwoodiae (Fedde) Munz
              A. a. var. maritima (E. Sheld.) Baird
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
      Source:
        Asteraceae
          Agoseris apargioides (Less.) Greene
            = A. maritima Eastw.
            A. a. var. eastwoodiae (Fedde) Munz
            A. a. var. maritima (E. Sheld.) Baird
  16. Backbone Assembling
      • Other source treats them as species
      • Same canonical maritima allowed twice - author different
      Backbone:
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Agoseris
            Agoseris apargioides (Less.) Greene
              = A. maritima Eastw.
              A. a. var. eastwoodiae (Fedde) Munz
              A. a. var. maritima (E. Sheld.) Baird
            Agoseris eastwoodiae Fedde
            Agoseris maritima E. Sheld.
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
      Source:
        Asteraceae
          Agoseris eastwoodiae Fedde
          Agoseris maritima E. Sheld.
  17. Final Cleanup - Basionyms
      • Finally basionyms are detected
        • by terminal epithet & author within a family
      • Only 1 accepted per group
        • the most trusted first stays
      Backbone:
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Agoseris
            Agoseris apargioides (Less.) Greene
              = A. maritima Eastw.
              A. a. var. eastwoodiae (Fedde) Munz
                = Agoseris eastwoodiae Fedde
              A. a. var. maritima (E. Sheld.) Baird
                = Agoseris maritima E. Sheld.
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
  18. Final Cleanup - Autonyms
      • Create missing autonyms
      Backbone:
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Agoseris
            Agoseris apargioides (Less.) Greene
              = A. maritima Eastw.
              A. a. var. apargioides
              A. a. var. eastwoodiae (Fedde) Munz
                = Agoseris eastwoodiae Fedde
              A. a. var. maritima (E. Sheld.) Baird
                = Agoseris maritima E. Sheld.
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
  19. Backbone Building Rules
      • Create missing genus or species in classification
        • only for accepted taxa
      • Create missing autonyms for infraspecific taxa
      • Detect basionyms based on terminal epithet & authorship
        • Assumes epithet & authorship in family is unique
        • Converts all but one accepted to synonyms
      • Flag taxa as doubtful
        • genus or higher taxon without any species (IRMNG)
        • species (or infrasp.) with a parent genus (or species) considered to be a synonym
          • moved to newly accepted genus (or species)
          • the case for potential children of synonymised basionym combination
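The basionym rule above (group by terminal epithet and original author within a family, keep only the first accepted name) might look roughly like the following sketch. The dictionary layout and the names `basionym_author` and `consolidate_basionym_groups` are hypothetical, the input is assumed to be ordered from most to least trusted source, and the original author is taken to be the parenthesised part of the authorship, compared by exact string equality.

```python
import re
from collections import defaultdict


def basionym_author(authorship):
    """Return the original author: the parenthesised part if present, else the whole string."""
    match = re.match(r"\s*\(([^)]+)\)", authorship)
    return (match.group(1) if match else authorship).strip()


def consolidate_basionym_groups(usages):
    """Keep one accepted name per (family, terminal epithet, original author) group.

    `usages` must be ordered from most to least trusted; records are updated in place.
    """
    groups = defaultdict(list)
    for usage in usages:
        epithet = usage["name"].split()[-1]  # terminal epithet
        key = (usage["family"], epithet, basionym_author(usage["authorship"]))
        groups[key].append(usage)

    for members in groups.values():
        accepted = [u for u in members if u["status"] == "accepted"]
        # The most trusted accepted name stays; the rest become its synonyms.
        for usage in accepted[1:]:
            usage["status"] = "synonym"
            usage["accepted_name"] = accepted[0]["name"]
    return usages


usages = consolidate_basionym_groups([
    {"family": "Asteraceae", "name": "Agoseris apargioides var. maritima",
     "authorship": "(E. Sheld.) Baird", "status": "accepted"},
    {"family": "Asteraceae", "name": "Agoseris maritima",
     "authorship": "Eastw.", "status": "synonym"},
    {"family": "Asteraceae", "name": "Agoseris maritima",
     "authorship": "E. Sheld.", "status": "accepted"},
])
```

With the Agoseris records from the earlier slides, Agoseris maritima E. Sheld. becomes a synonym of the variety, while A. maritima Eastw. stays in its own group because its author differs.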
  20. Backbone Sources
      • GBIF Backbone Patch
      • Catalogue of Life
      • World Register of Marine Species
      • Dyntaxa - Svensk taxonomisk databas
      • GRIN Taxonomy
      • Fauna Europaea
      • Integrated Taxonomic Information System
      • Euro+Med Plantbase
      • Interim Register of Marine and Nonmarine Genera
      • The Clements Checklist
      • IOC World Bird Names
      • Mammal Species of the World
      • Paleobiology Database
      • Nomenclators
        • International Plant Names Index
        • Index Fungorum
        • ZooBank
        • Prokaryotic Nomenclature Up-to-date
        • ICTV Master Species List
      • Organisations
        • Species Files
        • Biodiversity Data Journal (Pensoft)
        • ZooKeys (Pensoft)
        • PhytoKeys (Pensoft)
        • Plazi ???
  21. Backbone Matching
      • Occurrence
        • fuzzy name match
        • classification match
        • allow higher rank matches
      • Checklist
        • match kingdom
        • require straight canonical match
        • incl. authorship comparison
        • no webservice yet, only embedded
  22. Schema
      • Checklists & Nub same structure
      • Parent-child hierarchy
        • normalized classification
        • flexible ranks
        • synonym to accepted relation
      • Dataset metrics as timeseries
      • Basionym relation
      Diagram entities: NameUsage; Parsed Name; Backbone Match; Citation; Dataset Metrics; Verbatim Record; Metrics; Extensions
  23. CLB Supported Extensions
      • Description: human paragraphs about some topic
      • Distribution: area ranges with statuses
      • Identifier: additional identifier for the record
      • Multimedia: image, video, sound
      • Literature references: bibliography
      • Occurrence (indexed via occurrence workflows)
      • Species Profile: extinct, marine, freshwater, terrestrial flags
      • Types and specimens: (overlaps with Occurrence)
      • Vernacular names: name with language & region
      http://rs.gbif.org/extension/gbif/1.0/
  24. Normalizing Classifications
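As an illustration of what normalizing a classification involves, the sketch below turns records with denormalised Kingdom to Genus columns into parent-child rows, creating the implicit higher taxa along the way. It is a simplified assumption of how such a step could work, not the Checklist Bank code: the `normalize` function, the `implicit-N` identifiers and the output field names are made up for the example, while the input keys follow Darwin Core term names.

```python
# Denormalised rank columns, from highest to lowest.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def normalize(records):
    """Convert denormalised records into rows linked by parent_id."""
    implicit = {}  # (rank, name, parent_id) -> id of the implicit taxon
    rows = []
    for record in records:
        parent_id = None
        for rank in RANKS:
            name = record.get(rank)
            if not name:
                continue  # rank not supplied for this record
            key = (rank, name, parent_id)
            if key not in implicit:
                # First time this higher taxon is seen: create it once.
                implicit[key] = f"implicit-{len(implicit) + 1}"
                rows.append({"id": implicit[key], "name": name,
                             "rank": rank, "parent_id": parent_id})
            parent_id = implicit[key]
        rows.append({"id": record["id"], "name": record["scientificName"],
                     "rank": record.get("taxonRank", "species"),
                     "parent_id": parent_id})
    return rows


rows = normalize([
    {"id": "1", "scientificName": "Helianthus annuus L.",
     "kingdom": "Plantae", "family": "Asteraceae", "genus": "Helianthus"},
    {"id": "2", "scientificName": "Cichorium intybus L.",
     "kingdom": "Plantae", "family": "Asteraceae", "genus": "Cichorium"},
])
```

Both species end up under a single shared Asteraceae row, so the two input records expand to six linked rows.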
