Nothing in taxonomy makes sense except in the light of Open Access

Description

Lecture given at the Systematics Association Biennial Meeting, Oxford, 28 August 2015: http://systass.org/biennial2015/

Transcript

  1. Nothing in taxonomy makes sense except in the light of Open Access. Donat Agosti, Plazi (http://plazi.org). Systematics Association, Oxford, 28 August 2015.
  2. Vision and hope: I want to be able to access, mine and analyse a significant body of published and digitized taxonomic knowledge anytime and anywhere. I want to build the catalogue of life by machine. I hope taxonomic communication arrives in the 21st century.
  3. (1) The demand (2004): Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month).
  4. (2) The corpus of taxonomic literature
  5. (3) The core corpus of taxonomic knowledge: treatments. Build and establish a TreatmentBank, such as Plazi, as the basis for content mining of, and linking to, the taxonomic literature.
  6. (4) Make use of the semantic, linked web. Avoid all the wasteful current publishing! • Publish structured data • Publish open access • Make taxonomic literature first-class literature by minting DOIs and making digital copies accessible • Add links to names, treatments, articles, DNA sequences and digital objects • Help by building your own public corpus of citable data. Pensoft journals (e.g. Biodiversity Data Journal, ZooKeys, PhytoKeys) are the gold standard.
  7. Surfing, or the seduction of science (for a young kid)
  8. Surfing, or the seduction of science (for a young kid)
  9. Surfing, or the seduction of science (for a young kid)
  10. Surfing, or the seduction of science (for an adult)
  11. Surfing, or the seduction of science (for an adult): get a copy of the Cyclothone paper
  12. Surfing, or the imperative for science
  13. Surfing, or the imperative for science
  14. Surfing, or the imperative for science: linking treatments and data with external resources (NCBI)
  15. TreatmentBank: establish Plazi as, or use Plazi to build, a TreatmentBank as the source for content mining of the taxonomic literature.
  16. TreatmentBank: What are the species in Amazonia?
  17. Countries (region): Australia (Queensland). Export species material citations (DwC).
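The export on this slide produces material citations in Darwin Core (DwC). The following is a minimal sketch, not Plazi's export code. It assumes material citations have already been extracted into dictionaries, filters them by country and region, and writes them as CSV using standard DwC term names as column headers. The records and the function name `to_dwc_csv` are hypothetical.

```python
import csv
import io

# Standard Darwin Core term names, used here as CSV column headers.
DWC_TERMS = [
    "scientificName",
    "country",
    "stateProvince",
    "decimalLatitude",
    "decimalLongitude",
    "basisOfRecord",
]


def to_dwc_csv(records, country=None, state_province=None):
    """Filter material citations by country/region and serialize them as DwC CSV."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=DWC_TERMS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        if country and rec.get("country") != country:
            continue
        if state_province and rec.get("stateProvince") != state_province:
            continue
        writer.writerow(rec)
    return out.getvalue()


# Hypothetical material citations, for illustration only.
records = [
    {"scientificName": "Genus species", "country": "Australia",
     "stateProvince": "Queensland", "decimalLatitude": -19.26,
     "decimalLongitude": 146.82, "basisOfRecord": "PreservedSpecimen"},
    {"scientificName": "Genus species", "country": "Brazil",
     "stateProvince": "Amazonas", "decimalLatitude": -3.1,
     "decimalLongitude": -60.02, "basisOfRecord": "PreservedSpecimen"},
]

print(to_dwc_csv(records, country="Australia", state_province="Queensland"))
```

Using the DwC term names directly as headers keeps the output loadable by tools that expect Darwin Core columns, without any mapping step.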
  18. Text mining tools: visualization of treatment content. Summary of the content of 37 Zootaxa spider publications and 8 Biodiversity Data Journal publications (Miller et al., 2015).
  19. Text mining tools: visualization of treatment content. Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology. Transbiotic link network: associated species are linked through references in taxonomic treatments, and treatments are linked through citations. Example: acacia-ant species Pseudomyrmex gracilis (treatment: redescription); associated ant-acacia: Acacia gentlei. [Figure: link network of Pseudomyrmex ant species and ant-acacia species (ants, plants). Photo credits: Alex Wild]
  20. What does this mean? The Linking Open Data cloud diagram
  21. The demand: scientists and citizen scientists. Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month). Online catalogue, open access, online library.
  22. Online catalogue: the interest of big science (2004, 2005)
  23. The demand: scientists and citizen scientists
  24. The scientific challenge: bridging the gap
          1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
         61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
        121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
        181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
        241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
        301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
        361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
        421 acatttattt
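The sequence on this slide uses the GenBank-style layout: each line begins with the position of its first base, followed by blocks of ten bases. The following minimal sketch is not part of any Plazi tool, and the function names are hypothetical. It joins such a listing into a single sequence string and checks that each line's position number matches the bases read so far. It then computes GC content over unambiguous bases only.

```python
import re
from collections import Counter

# IUPAC nucleotide codes, including ambiguity codes such as "n".
IUPAC = re.compile(r"[acgturykmswbdhvn]*")


def parse_origin(block: str) -> str:
    """Join a GenBank-style listing into one string, validating position numbers."""
    parts = []
    expected = 1
    for line in block.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        start = int(fields[0])
        if start != expected:
            raise ValueError(f"expected position {expected}, found {start}")
        chunk = "".join(fields[1:]).lower()
        if not IUPAC.fullmatch(chunk):
            raise ValueError(f"non-IUPAC character in line starting at {start}")
        parts.append(chunk)
        expected += len(chunk)
    return "".join(parts)


def gc_content(seq: str) -> float:
    """GC fraction computed over unambiguous bases (a, c, g, t) only."""
    counts = Counter(seq)
    acgt = sum(counts[b] for b in "acgt")
    return (counts["g"] + counts["c"]) / acgt if acgt else 0.0


# First two lines of the sequence shown on the slide.
listing = """
  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
"""
seq = parse_origin(listing)
print(len(seq), seq.count("n"), round(gc_content(seq), 3))
```

Validating the position numbers catches lines that were dropped or duplicated during copying or OCR, which is a common way such listings get corrupted.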
  25. Where do we stand?
  26. The bristlemouths are a rapacious family of deep-sea fishes that includes the wildly successful genus Cyclothone. In contrast, ichthyologists put the likely figure for bristlemouths at hundreds of trillions, and perhaps quadrillions, or thousands of trillions.
  27. The bristlemouths are a rapacious family of deep-sea fishes that includes the wildly successful genus Cyclothone.
  28. Taxonomy? Source?
  29. Issue: USD 266.00; article: USD 48.00
  30. Our contribution to a better understanding of biodiversity: get a copy of the Cyclothone paper
  31. Access: access to ant taxonomic publications through antbase.org/Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications, or 85,000 pages). Source: Agosti (2005).
  32. Issues of access: • Limited access (copyright) • Limited discoverability of content • Research results cannot be cited • Data mining does not work
  33. A solution: provide an open access, linked corpus of taxonomic literature
  34. Surfing at the breakfast table
  35. Feed Wikipedia with taxonomic data. Diagram: article and treatment linked by citations (via httpURI and DOI), and treatment linked to its scientific name; Wikidata property https://www.wikidata.org/wiki/Property:P1992
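The links in this diagram can be expressed as RDF. The following sketch is an assumption, not Plazi's data model. It writes N-Triples by hand and uses the CiTO `cites` property for both citation directions and the Darwin Core `scientificName` term for the name. An article cites a treatment through its httpURI, and a treatment cites an article through its DOI. The function `citation_triples`, the DOIs and the all-zero treatment UUID are placeholders.

```python
# Predicate IRIs: CiTO "cites" and Darwin Core "scientificName".
CITO_CITES = "http://purl.org/spar/cito/cites"
DWC_SCIENTIFIC_NAME = "http://rs.tdwg.org/dwc/terms/scientificName"


def iri(uri: str) -> str:
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{uri}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def citation_triples(treatment_uri, name, citing_doi, cited_doi):
    """Article cites treatment (httpURI); treatment cites article (DOI); treatment has a name."""
    citing = f"https://doi.org/{citing_doi}"
    cited = f"https://doi.org/{cited_doi}"
    return [
        f"{iri(citing)} {iri(CITO_CITES)} {iri(treatment_uri)} .",
        f"{iri(treatment_uri)} {iri(CITO_CITES)} {iri(cited)} .",
        f"{iri(treatment_uri)} {iri(DWC_SCIENTIFIC_NAME)} {literal(name)} .",
    ]


# Placeholder values only: not real DOIs, treatments or names.
for line in citation_triples(
    "http://treatment.plazi.org/id/00000000-0000-0000-0000-000000000000",
    "Genus species",
    "10.5555/citing-article",
    "10.5555/cited-article",
):
    print(line)
```

Each line is a complete N-Triples statement, so the output can be concatenated into a file and loaded by any RDF store.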
  36. Surfing, or the imperative for science
  37. Surfing, or the imperative for science
  38. Surfing, or the imperative for science
  39. Surfing, or the imperative for science: use of name services (HNS)
  40. The goal
  41. Create a citable open corpus of taxonomic publications
  42. Biodiversity Literature Repository: record
  43. Biodiversity Literature Repository: record, treatment, illustration
  44. Legal issues: Blue List, http://plazi.org/wiki/Blue_List; Patterson et al., 2014: http://dx.doi.org/10.1186/1756-0500-7-79
  45. Workflow (Plazi, SRS): find, scan, «OCR», mark up, store and provide access
  46. From human-readable to machine-readable text. Semantically enhanced text (TaxonX); alternatives: RDF.
        <tax:treatment>
          <tax:nomenclature>
            <tax:name>
              <tax:xid source="HNS" identifier="193329"/>
              <tax:xmldata>
                <dc:Genus>Mystrium</dc:Genus>
                <dc:Species>leonie</dc:Species>
              </tax:xmldata>
              Mystrium leonie
            </tax:name>
            <tax:status>n. sp.</tax:status> Fig 1 D - F
          </tax:nomenclature>
          <tax:div type="description">
            <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margi to a sharp apical tooth, the apex parallel to the an (Holotype with material in mandibles, so mandibles a $ described below from paratypes.) Median clypeus ....
        </tax:treatment>
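Once a treatment is marked up like this, its parts can be read by machine. The following minimal sketch uses Python's standard `xml.etree.ElementTree` on a shortened, well-formed version of the fragment above. The namespace URIs are placeholders, not the real TaxonX ones. The function `summarize_treatment` is hypothetical and assumes exactly the element layout shown.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; a real TaxonX document declares its own.
NS = {"tax": "urn:example:taxonx", "dc": "urn:example:dc"}

SAMPLE = """
<tax:treatment xmlns:tax="urn:example:taxonx" xmlns:dc="urn:example:dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95</tax:p>
  </tax:div>
</tax:treatment>
"""


def summarize_treatment(xml_text):
    """Pull the name, nomenclatural status, name-service ID and description text."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "name_source": xid.get("source"),
        "name_id": xid.get("identifier"),
        "description": root.findtext(
            "tax:div[@type='description']/tax:p", namespaces=NS
        ),
    }


print(summarize_treatment(SAMPLE))
```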
  47. Plazi tools: table extraction. «Treatment», scientific species name, distribution record.
        Cataglyphis tartessica, workers
        Variable           Mean ± SD
        Head length        11.23 ± 0.12
        Head width         11.15 ± 0.12
        Scape length       11.47 ± 0.12
        Mesosoma length    11.94 ± 0.16
        Femur length       12.03 ± 0.14
        Cephalic index     0 93.60 ± 3.940
        Scape index        128.10 ± 7.660
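Table extraction ultimately turns rows like these into structured values. The following minimal sketch assumes each row arrives as plain text in the form `variable mean ± SD`. It is not GoldenGate's algorithm, and the names `ROW` and `parse_rows` are hypothetical.

```python
import re

# "<variable> <mean> ± <sd>", e.g. "Head width 11.15 ± 0.12".
ROW = re.compile(
    r"^(?P<variable>.+?)\s+(?P<mean>\d+(?:\.\d+)?)\s*±\s*(?P<sd>\d+(?:\.\d+)?)$"
)


def parse_rows(lines):
    """Map each variable name to a (mean, sd) pair; reject rows that do not fit."""
    table = {}
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            raise ValueError(f"unparsable row: {line!r}")
        table[m["variable"]] = (float(m["mean"]), float(m["sd"]))
    return table


rows = [
    "Head length 11.23 ± 0.12",
    "Head width 11.15 ± 0.12",
    "Scape length 11.47 ± 0.12",
]
print(parse_rows(rows))
```

This pattern does not reject every malformed row. A row with a stray token, such as the cephalic index row above, is accepted with the token folded into the variable name, so extracted tables still need a manual check.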
  48. Plazi tools: discovery of scientific names
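One simple way to discover names is to match candidate binomials and keep only those whose genus appears in a known-genus list (a gazetteer filter). This is a deliberately naive sketch, not Plazi's name-finding method. The regular expression, the function `find_names` and the small genus list (taken from names on these slides) are assumptions.

```python
import re

# Candidate binomials: a capitalized genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_names(text, known_genera):
    """Return candidate binomials whose genus is in the known-genus list."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        if genus in known_genera:
            found.append(f"{genus} {epithet}")
    return found


GENERA = {"Pseudomyrmex", "Acacia", "Mystrium", "Formica", "Cataglyphis"}
text = ("Acacia-ant species: Pseudomyrmex gracilis. "
        "Associated ant-acacia: Acacia gentlei. The holotype of Mystrium leonie ...")
print(find_names(text, GENERA))
```

Without the genus filter, ordinary phrases such as "The holotype" or "Associated ant" would also match. This is why name finders typically combine patterns with name lists or name services such as those mentioned earlier.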
  49. Plazi tools: discovery and parsing of bibliographic references
  50. Plazi tools: discovery and parsing of observation data
  51. Plazi tools: discovery of treatments
  52. Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication). The special case of taxonomic literature: the cited elements are treatments, not articles. Example: Formica obsoleta Linnaeus, 1758: 580
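The example citation "Formica obsoleta Linnaeus, 1758: 580" identifies a treatment by name, authority, year and page. The following minimal parsing sketch handles only this simple single-authority form. Real citations vary far more (multiple authors, page ranges, subgenera), so the pattern and the names `TreatmentCitation` and `parse_treatment_citation` are assumptions, not Plazi's parser.

```python
import re
from dataclasses import dataclass


@dataclass
class TreatmentCitation:
    genus: str
    species: str
    authority: str
    year: int
    page: int


# "<Genus> <species> <Authority>, <year>: <page>"
PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+) "
    r"(?P<authority>[A-Z][A-Za-z.' -]*?), (?P<year>\d{4}): (?P<page>\d+)$"
)


def parse_treatment_citation(text):
    """Split a simple treatment citation into its parts."""
    m = PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a recognized treatment citation: {text!r}")
    return TreatmentCitation(
        m["genus"], m["species"], m["authority"], int(m["year"]), int(m["page"])
    )


print(parse_treatment_citation("Formica obsoleta Linnaeus, 1758: 580"))
```

Keeping the page as a separate field reflects the slide's point: the unit being cited is the treatment on a given page, not the whole publication.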
  53. Treatment
  54. What is a treatment? Original combinations: a reference to an original combination; subsequent usages of names cite the referenced treatment.
  55. Treatment, treatment reference and treatment citation
  56. Treatment: citing treatments, or linking treatments to treatments. By minting persistent httpURIs for treatments, treatments can be cited like bibliographic references, e.g. http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA
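The example URI consists of a fixed base followed by an identifier in the 8-4-4-4-12 hexadecimal layout. The following minimal sketch mints new URIs in that layout and validates existing ones. Generating identifiers with `uuid.uuid4()` is an assumption, not a description of how Plazi mints its IDs, and the function names are hypothetical.

```python
import re
import uuid

BASE = "http://treatment.plazi.org/id/"
# 8-4-4-4-12 hexadecimal layout, upper case as in the slide's example.
ID_PATTERN = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$"
)


def mint_treatment_uri() -> str:
    """Mint a new treatment URI from a random UUID (assumed scheme)."""
    return BASE + str(uuid.uuid4()).upper()


def treatment_id(uri: str) -> str:
    """Extract and validate the identifier part of a treatment URI."""
    if not uri.startswith(BASE):
        raise ValueError(f"not a treatment URI: {uri}")
    ident = uri[len(BASE):]
    if not ID_PATTERN.match(ident):
        raise ValueError(f"malformed treatment identifier: {ident}")
    return ident


example = "http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA"
print(treatment_id(example))
print(treatment_id(mint_treatment_uri()))
```

The code only covers the identifier's format. Persistence depends on a commitment that the URI keeps resolving to the same treatment.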
  57. Status quo: • 50,000+ treatments live, with daily growth • RDF in beta • GoldenGate Imagine (PDF and text mining tool) in beta • Data provider for NCBI, Wikidata, GBIF, EOL and AntWeb • Biodiversity Literature Repository functional
  58. Next steps: • Collaborate with ContentMine to extract >50 treatments/day
  59. Next steps: planned collaboration with ContentMine to extract treatments on a daily basis (BioDiv), http://www.slideshare.net/petermurrayrust/?
  60. Next steps: • Collaborate with ContentMine to extract 50 treatments/day • 1 million treatments live • RDF version accessible • GoldenGate Imagine (text mining tool) • Data provider for NCBI, GBIF, EOL and AntWeb • Biodiversity Literature Repository with 100,000 bibliographic references and digital copies (PDF, images, etc.)
  61. Next steps: BUT
  62. Next steps: Avoid all this waste (our next generation will have to clean it up)! • Publish structured data • Publish open access • Publish in journals with DOIs • Add links to names, treatments, articles, DNA sequences and digital objects • Help build your own corpus of citable data. Pensoft journals (e.g. Biodiversity Data Journal, ZooKeys, PhytoKeys) are the gold standard.
  63. Thanks! Donat Agosti, agosti@plazi.org. Acknowledgments: Pensoft, Zenodo/CERN, NCBI, Wikidata, ContentMine
