The Biodiversity Heritage Library: Liberating the World’s Biodiversity Literature

Thomas Garnett
EOL Fellows, March 2010
BHL – Why?

•   The cited half-life of publications in taxonomy is longer than in any
    other scientific discipline.
       - Macro-economic case for open access, Tom Moritz
•   Current taxonomic literature often relies on texts and specimens more
    than 100 years old.

Levinus Vincent
Elenchus tabularum, pinacothecarum, 1719
BHL – Why?

The Taxonomic Impediment

“The taxonomic impediment is a term that describes the gaps of knowledge
in our taxonomic system”
       - Darwin Declaration, 1998

Georges Louis Leclerc, comte de Buffon
Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
BHL Members: US/UK
•   Academy of Natural Sciences (Philadelphia, PA)
•   American Museum of Natural History (New York, NY)
•   California Academy of Sciences (San Francisco, CA)
•   The Field Museum (Chicago, IL)
•   Harvard University Botany Libraries (Cambridge, MA)
•   Harvard University, Ernst Mayr Library of the Museum of
    Comparative Zoology (Cambridge, MA)
•   Marine Biological Laboratory / Woods Hole Oceanographic
    Institution (Woods Hole, MA)
•   Missouri Botanical Garden (St. Louis, MO)
•   Natural History Museum (London, UK)
•   The New York Botanical Garden (New York, NY)
•   Royal Botanic Gardens, Kew (Richmond, UK)
•   Smithsonian Institution Libraries (Washington, DC)
BHL Members: BHL-Europe
•   Museum für Naturkunde - Leibniz-Institut für Evolutions- und
    Biodiversitätsforschung an der Humboldt-Universität zu Berlin
•   Natural History Museum, UK
•   Narodni muzeum NMP CZ
•   Angewandte Informationstechnik Forschungsgesellschaft mbH
•   Freie Universität Berlin FUBBGBM
•   Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts
•   Naturhistorisches Museum Wien
•   Hungarian Natural History Museum
•   Museum and Institute of Zoology, Polish Academy of Sciences
•   University of Copenhagen
•   Stichting Nationaal Natuurhistorisch Museum, Naturalis
•   National Botanic Garden of Belgium
•   Royal Museum for Central Africa
•   Royal Belgian Institute of Natural Sciences
•   Bibliothèque nationale de France
•   Museum national d’histoire naturelle
•   Consejo Superior de Investigaciones Cientificas
•   Università degli Studi di Firenze
•   Royal Botanic Garden, Edinburgh
•   Species 2000
•   John Wiley & Sons limited
•   Helsingin yliopisto UH-Viikki
BHL Members: BHL-China
• Chinese Academy of Science – Institute of
  Botany
• Chinese Academy of Science – Institute of
  Zoology
• Chinese Academy of Science – Institute of
  Microbiology
• Chinese Academy of Science – Institute of
  Oceanography
BHL is a Focused Program
•   Though BHL is composed of libraries, it
    has been a domain-specific program, not just a
    digital library project. It arose from and is
    responsive to the biodiversity community,
    which comprises the disciplines of taxonomy,
    systematics, evolutionary biology, ecology,
    conservation, and wildlife management. These
    are its primary audience.
[Figure: map of topical terms derived from LCSH, grouped into three bands
labeled Core, Supporting, and Outliers. Terms shown include Paleontology,
Forestry, Aquaculture, Oceanography, Wildlife conservation, Soil science,
Geophysics, Mineralogy, Virology, and Genetics.]
Core Literature
Botany; Plant conservation; Phytogeography; Plant anatomy;
Plant physiology; Plant ecology; Spermatophyta, Phanerogams; Cryptogams;
Biological diversity; Evolution; Phylogenetic relationships;
Evolutionary genetics; Scientific voyages and expeditions;
Pre-Linnaean works; Linnaean works; Biodiversity conservation;
Conservation biology; Ecosystem management;
Endangered species & ecosystems; Extinction;
Classification, Nomenclature; Biogeography;
Zoology/Botany--Morphology; Zoology/Botany--Anatomy;
Zoology/Botany--Embryology; Zoology/Botany--Reproduction;
Zoology/Botany--Geographical distribution;
Classification, systematics and taxonomy;
Zoology; Invertebrates; Chordates; Vertebrates; Animal Behavior
Stats: Now Online

• 70,630 volumes
• 26.4 million pages




                       Oldest book: Schöffer’s Herbarius, 1484.
What is the plan?
Digitize the core literature of biodiversity. Full works, not
bits & pieces.
Open Access: all content can be repurposed, reused,
reformatted.
Congruent: must fit into a dynamic knowledge ecology.
Scan public domain biodiversity literature.
Negotiate rights to digitize copyrighted materials.
Ingest content digitized by others.
Provide interfaces & APIs for repository.
     GUIs
     Services for data mining & citation resolution
BHL Digital Preservation
•   Committed to long-term storage, curation,
    and preservation of digital text assets for
    the world-wide biodiversity community
•   BHL is a steward for this literature.
•   To keep this content available and open for
    the future requires careful organizational
    planning.
•   Preservation is both a technical and
    political/social process.
BHL Relationship with Non-Profit Journal Publishers

Opt in Copyright Model: The BHL works with professional societies and
   associations to integrate their publications into the BHL in a way that
   serves the societies’ missions and goals
BHL indexes the articles using Taxonomic Intelligence, thereby vastly
   increasing their usability.
Publishers’ content is embedded in the emerging knowledge ecology
   that is sweeping biology in this century.
73 Permission Agreements to date. More under negotiation.
Integration with gray literature in later phases of project.
Scanning = human work
Scan & Store: Internet Archive


                          Storage in Petaboxes




Scanning on Scribes
Referrers: 1 Jan 2008 – 31 Jan 2010
Name Finding via TaxonFinder
•   Image from scanner
•   Converted to text via OCR
•   Name finding via TaxonFinder
•   Extracted names submitted to NameBank
•   SOAP response

Name Finding in action with Taxonomic Intelligence…
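
The name-finding step on this slide can be illustrated with a toy sketch. It is not TaxonFinder's algorithm: the regular expression, the `KNOWN_NAMES` set (a hypothetical stand-in for a NameBank lookup), and the sample sentence are all assumptions made for illustration. It only shows why the number of name strings found can exceed the number that are verified.

```python
import re

# Hypothetical stand-in for a NameBank lookup: a tiny set of known names.
KNOWN_NAMES = {"Physeter catodon", "Papio anubis"}

# Candidate pattern: a capitalized word followed by a lowercase word.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

def find_names(ocr_text):
    """Return (candidates, verified) for one page of OCR text."""
    candidates = [" ".join(m.groups()) for m in CANDIDATE.finditer(ocr_text)]
    verified = [name for name in candidates if name in KNOWN_NAMES]
    return candidates, verified

page = ("The sperm whale, Physeter catodon, was taken off Nantucket. "
        "Papio anubis is a baboon.")
candidates, verified = find_names(page)
print(candidates)  # ['The sperm', 'Physeter catodon', 'Papio anubis']
print(verified)    # ['Physeter catodon', 'Papio anubis']
```

The false candidate "The sperm" is caught only by the lookup step, which is why a verification pass against a name index matters.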
OCR error rate for names only

Of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR.

Top OCR errors
   Rank  Error           Rank  Error
   1     Insert Space    8     n->v
   2     Omit Space      9     l->i
   3     e->c            10    r->i
   4     u->I            11    u->ii
   5     u->n            12    h->l
   6     i->l            13    h->ii
   7     c->e            14    e->o
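
As a quick check of the figures above, the sketch below recomputes the error rate and shows how a list of character confusions could be used to propose corrections for a misread name. The correction step is an illustration only, not something the slides describe BHL doing. It assumes each pair reads as "true character -> OCR output", uses only a few single-character pairs, and the misspelling "Physetcr" is a made-up example.

```python
total_names = 3003
misread_names = 1056
error_rate = round(100 * misread_names / total_names, 2)
print(error_rate)  # 35.16

# A few single-character confusions from the list, read as (true, ocr).
# The reading direction is an assumption.
CONFUSIONS = [("e", "c"), ("u", "n"), ("i", "l"), ("c", "e"), ("n", "v"), ("h", "l")]

def candidate_corrections(ocr_word):
    """Undo one confusion at one position; return the set of candidates."""
    candidates = set()
    for true_ch, ocr_ch in CONFUSIONS:
        for i, ch in enumerate(ocr_word):
            if ch == ocr_ch:
                candidates.add(ocr_word[:i] + true_ch + ocr_word[i + 1:])
    return candidates

print(sorted(candidate_corrections("Physetcr")))  # ['Physctcr', 'Physeter']
```

Candidates produced this way would still need checking against a name index, since most of them are not real names.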
Considerations

• Improving OCR software is out of scope
  – Google’s Tesseract is the only viable open-source
    option
  – Flurry of activity in 2006-2007, quiet since
• Rekeying is expensive given size of
  corpus
  – Will not scale
Name finding statistics

• 27.7 million pages scanned
• 70.4 million name strings found
• 56.2 million names verified with a
  NameBankID
• 1.4 million unique names with a
  NameBankID
• 3.3 million unique names *without* a
  NameBankID
  – This is where the interesting data live!!!
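
The proportions implied by these counts can be worked out directly. The sketch below uses only the figures on this slide. The variable names are illustrative, and treating the two unique-name counts as parts of one whole is an assumption.

```python
pages_scanned = 27.7e6
name_strings_found = 70.4e6
verified_strings = 56.2e6
unique_with_id = 1.4e6
unique_without_id = 3.3e6

# Share of found name strings that were verified with a NameBankID.
verified_share = verified_strings / name_strings_found
# Share of unique names that have no NameBankID.
unmatched_unique_share = unique_without_id / (unique_with_id + unique_without_id)
# Average number of name strings found per scanned page.
names_per_page = name_strings_found / pages_scanned

print(f"{verified_share:.1%}")          # 79.8%
print(f"{unmatched_unique_share:.1%}")  # 70.2%
print(f"{names_per_page:.1f}")          # 2.5
```

On these figures, roughly seven in ten unique name strings lack a NameBankID.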
http://www.biodiversitylibrary.org/name/Physeter_catodon
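
A link of this form could be built from a scientific name as sketched below. The pattern is inferred from the single URL above, so joining the name parts with underscores is an assumption, and `name_url` is a hypothetical helper rather than part of any BHL library.

```python
from urllib.parse import quote

BASE = "http://www.biodiversitylibrary.org/name/"

def name_url(scientific_name):
    """Join the parts of a name with underscores and append to the base URL."""
    slug = "_".join(scientific_name.split())
    return BASE + quote(slug)

print(name_url("Physeter catodon"))
# http://www.biodiversitylibrary.org/name/Physeter_catodon
```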
PDF Generation Stats
Mandate for new development

• display / manage articles

• meet community demands for
  bibliography / citation management

• build from more open source tools
Development goals re: citations

• Create a repository for community-vetted
  taxonomic bibliographies.
• Ability to ingest, display, download, and
  index articles so that the BHL can operate
  as an article repository.
• Build from existing community of work
  around Drupal / Biblio.
  – In use by collaborators
http://www.citebank.org
http://citebank.org/search
http://citebank.org/node/47423
Services
•    OpenURL
    – Facilitate links to citations: protologues, articles, references
       • Documentation:
            http://www.biodiversitylibrary.org/openurlhelp.aspx
•    Names Service
    – Return all occurrences of a name throughout BHL digitized
       corpus
       • Documentation: http://bit.ly/2e6sg9
    – Access to 51 million name strings using TaxonFinder
            – 1.4 million unique names
    – Working out a strategy for obscure species
    – Algorithm improvements to detect nomenclatural & taxonomic
       acts
•    New API
Services: OpenURL




http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879




http://www.tropicos.org/Name/1200408
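
The OpenURL query on this slide can be assembled programmatically. The sketch below reproduces only the parameters visible in the example (`pid`, `volume`, `issue`, `spage`, `date`). `build_openurl` and its argument names are hypothetical, and the full parameter set is covered by the documentation page listed under Services.

```python
from urllib.parse import urlencode

BASE = "http://www.biodiversitylibrary.org/openurl"

def build_openurl(title_id, volume="", issue="", start_page="", year=""):
    """Build a query URL with the parameters seen in the slide's example."""
    params = {
        "pid": f"title:{title_id}",
        "volume": volume,
        "issue": issue,
        "spage": start_page,
        "date": year,
    }
    # safe=":" keeps the colon in "title:3934" unescaped, as in the example.
    return BASE + "?" + urlencode(params, safe=":")

url = build_openurl(3934, volume=14, start_page=301, year=1879)
print(url)
# http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879
```

Empty values are kept as empty parameters (`issue=`) so that the result matches the slide's URL exactly.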
Services: OpenURL Disambiguation

• Looking for:

• BHL returns:
Services: OpenURL Results
EOL Interfaces
• Taxonomic name finding enhancements
  – Nomenclatural acts in web services
  – Other algorithms / verification
• WoRMS data
• Improvement
  – Ranking results
  – Visualization
• LifeDesks
  – Bibliography sharing
  – Resolve to articles
Thank You Tom
We welcome your input and advice.
Tom Garnett
Biodiversity Heritage Library Program Director
garnettt@si.edu
202-633-2238

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

Biodiversity Heritage Library: Liberating World's Literature

  • 1. The Biodiversity Heritage Library: Liberating the World’s Biodiversity Literature Thomas Garnett EOL Fellows March 2010
  • 2. BHL – Why? The cited half-life of publications in taxonomy is longer than in any other scientific discipline (Macro-economic case for open access, Tom Moritz). Current taxonomic literature often relies on texts and specimens more than 100 years old. Levinus Vincent, Elenchus tabularum, pinacothecarum, 1719.
  • 3. BHL – Why? The Taxonomic Impediment: “The taxonomic impediment is a term that describes the gaps of knowledge in our taxonomic system” (Darwin Declaration, 1998). Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808.
  • 5. BHL Members: US/UK • Academy of Natural Science (Philadelphia, PA) • American Museum of Natural History (New York, NY) • California Academy of Science (San Francisco, CA) • The Field Museum (Chicago, IL) • Harvard University Botany Libraries (Cambridge, MA) • Harvard University, Ernst Mayr Library of the Museum of Comparative Zoology (Cambridge, MA) • Marine Biological Laboratory / Woods Hole Oceanographic Institution (Woods Hole, MA) • Missouri Botanical Garden (St. Louis, MO) • Natural History Museum (London, UK) • The New York Botanical Garden (New York, NY) • Royal Botanic Gardens, Kew (Richmond, UK) • Smithsonian Institution Libraries (Washington, DC)
  • 6. BHL Members: BHL-Europe • Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin • Natural History Museum, UK • Narodni muzeum NMP CZ • Angewandte Informationstechnik Forschungsgesellschaft mbH • Freie Universität Berlin FUBBGBM • Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts • Naturhistorisches Museum Wien • Hungarian Natural History Museum • Museum and Institute of Zoology, Polish Academy of Sciences • University of Copenhagen • Stichting Nationaal Natuurhistorisch Museum, Naturalis • National Botanic Garden of Belgium • Royal Museum for Central Africa • Royal Belgian Institute of Natural Sciences • Bibliothèque nationale de France • Museum national d’histoire naturelle • Consejo Superior de Investigaciones Cientificas • Università degli Studi di Firenze • Royal Botanic Garden, Edinburgh • Species 2000 • John Wiley & Sons limited • Helsingin yliopisto UH-Viikki
  • 7. BHL Members: BHL-China • Chinese Academy of Science – Institute of Botany • Chinese Academy of Science – Institute of Zoology • Chinese Academy of Science – Institute of Microbiology • Chinese Academy of Science – Institute of Oceanography
  • 8. BHL is a Focused Program • Though BHL is composed of libraries, it has been a domain-specific program, not just a digital library project. It arose from and is responsive to the biodiversity community, which comprises the disciplines of taxonomy, systematics, evolutionary biology, ecology, conservation, and wildlife management. These are the primary audience.
  • 9. Biomechanics Biochemistry Biomagnetism Core Bioelectronics Zoos Radioecology Bioacoustics Supporting Petrology Agricultural ecology Sedimentation Paleontology Biogeomorphology Orogeny Geophysics Microscopy Bioclimatology Forestry Restoration Geochemistry History of Scientific drawing ecology Taxidermy Stratigraphy Natural sciences & illustration Soil science Vivariums, Animal biochemistry Aquaculture terrariums, Geomicrobiology Natural History – Animal culture Medical botany / zoology aquariums Terminology, Abbrv. Cyanobacteria Geomorphology Immunology Specimen catalogs Natural History – Toponymy Ecophysiology Dictionaries & Encyclopedias animal Wile trade Physical geography Collection & Natural History – Virology preservation Biographies Environmental Mineralogy Continental drift Natural History – Policy Socio-cultural Plate tectonics Directories Anthropology Oceanography Economic botany Environmental Plant Culture Microbial ecology Management Geobiology Ethnology History of discoveries, Seismology Biophysics Exploration & travel Bioluminescence Hydrology Plant lore Phenology Atlases & Gazetteers Cytology Wildlife conservation Genetics Melioration Coral Islands, Reefs & Atolls Physical Anthropology Fluid dynamics Topical terms Crops and climate Prehistoric archaeology Outliers Agricultural meteorology derived from LCSH
  • 10. Core Literature Botany Plant conservation Phytogeography Plant anatomy Plant physiology Plant ecology Spermatophyta, Phanerogams Cryptogams Biological diversity Evolution Phylogenetic relationships Evolutionary genetics Scientific voyages and expeditions Pre-Linnaean works Linnaean works Biodiversity conservation Conservation biology Ecosystem management Endangered species & ecosystems Extinction Classification, Nomenclature Biogeography Zoology/Botany--Morphology Zoology/Botany--Anatomy Zoology/Botany--Embryology Zoology/Botany-- Reproduction Zoology/Botany--Geographical distribution Classification, systematics and taxonomy Zoology Invertebrates Chordates Vertebrates Animal Behavior
  • 11. Stats: Now Online • 70,630 volumes • 26.4 million pages Oldest book: Schöffer’s Herbarius, 1484.
  • 12. What is the plan? Digitize the core literature of biodiversity. Full works, not bits & pieces. Open Access: all content can be repurposed, reused, reformatted. Congruent: must fit into a dynamic knowledge ecology. Scan public domain biodiversity literature. Negotiate rights to digitize copyrighted materials. Ingest content digitized by others. Provide interfaces & APIs for repository: GUIs; services for data mining & citation resolution.
  • 13. BHL Digital Preservation • Committed to long-term storage, curation, and preservation of digital text assets for the world-wide biodiversity community • BHL is a steward for this literature. • To keep this content available and open for the future requires careful organizational planning. • Preservation is both a technical and political/social process.
  • 14. BHL Relationship with Non-Profit Journal Publishers. Opt-in Copyright Model: The BHL works with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals. BHL indexes the articles using Taxonomic Intelligence, thereby vastly increasing their usability. Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century. 73 Permission Agreements to date. More under negotiation. Integration with gray literature in later phases of project.
  • 16. Scan & Store: Internet Archive. Storage in Petaboxes. Scanning on Scribes.
  • 18. Referrers: Jan 1, 2008 – Jan 31, 2010
  • 20. Name Finding via TaxonFinder
  • 21. Name Finding in action with Taxonomic Intelligence… Image from scanner; converted to text via OCR; name finding via TaxonFinder (extract names); submit to NameBank; SOAP response.
  • 22. OCR error rate for names only. Of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR. Top OCR errors: 1. Insert Space; 2. Omit Space; 3. e->c; 4. u->I; 5. u->n; 6. i->l; 7. c->e; 8. n->v; 9. l->i; 10. r->i; 11. u->ii; 12. h->l; 13. h->ii; 14. e->o
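
The error rate on this slide can be checked directly from the two counts it gives. The sketch below does that, and also illustrates one way the listed character confusions might be put to use: generating candidate corrections for an OCR'd name by undoing a single confusion. The candidate generator is an illustrative assumption, not something the slides describe, and it assumes each pair reads as intended character -> OCR output. The function names and the sample string are hypothetical.

```python
# Counts taken from the slide: 3,003 names checked, 1,056 mis-transcribed.
TOTAL_NAMES = 3003
OCR_ERRORS = 1056


def error_rate_percent(errors: int, total: int) -> float:
    """Return the share of mis-transcribed names as a percentage (2 decimals)."""
    return round(100 * errors / total, 2)


# Character confusions from the slide, read as (intended, OCR output).
# The direction of each arrow is an assumption.
CONFUSIONS = [
    ("e", "c"), ("u", "I"), ("u", "n"), ("i", "l"), ("c", "e"), ("n", "v"),
    ("l", "i"), ("r", "i"), ("u", "ii"), ("h", "l"), ("h", "ii"), ("e", "o"),
]


def candidate_corrections(ocr_name: str) -> set[str]:
    """Hypothetical helper: undo exactly one confusion at each matching position."""
    candidates = set()
    for intended, seen in CONFUSIONS:
        start = ocr_name.find(seen)
        while start != -1:
            fixed = ocr_name[:start] + intended + ocr_name[start + len(seen):]
            candidates.add(fixed)
            start = ocr_name.find(seen, start + 1)
    return candidates


if __name__ == "__main__":
    print(error_rate_percent(OCR_ERRORS, TOTAL_NAMES))
    print(sorted(candidate_corrections("Qucrcus")))
```

Candidates produced this way would still need to be checked against a name authority before being accepted; the sketch only enumerates possibilities.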
  • 23. Considerations • Improving OCR software is out of scope – Google’s Tesseract is the only viable open source option – Flurry of activity in 2006-2007, quiet since • Rekeying is expensive given size of corpus – Will not scale
  • 24. Name finding statistics • 27.7 million pages scanned • 70.4 million name strings found • 56.2 million names verified with a NameBankID • 1.4 million unique names with a NameBankID • 3.3 million unique names *without* a NameBankID – This is where the interesting data live!!!
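
To put these counts in proportion, the short sketch below derives two ratios from the slide's figures: the share of found name strings that were verified, and the share of unique names that lack a NameBankID. The variable and function names are illustrative; only the four counts come from the slide.

```python
# Counts from the slide, in millions.
NAME_STRINGS_FOUND = 70.4
NAMES_VERIFIED = 56.2
UNIQUE_WITH_ID = 1.4
UNIQUE_WITHOUT_ID = 3.3


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of all name strings found that resolved to a NameBankID.
verified_share = percent(NAMES_VERIFIED, NAME_STRINGS_FOUND)

# Share of unique names that have no NameBankID.
unmatched_unique_share = percent(
    UNIQUE_WITHOUT_ID, UNIQUE_WITH_ID + UNIQUE_WITHOUT_ID
)

if __name__ == "__main__":
    print(f"verified name strings: {verified_share}%")
    print(f"unique names without a NameBankID: {unmatched_unique_share}%")
```

On these figures, most name occurrences are verified, yet the majority of distinct names are not in NameBank, which is consistent with the slide's remark about where the interesting data live.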
  • 31. Mandate for new development • display / manage articles • meet community demands for bibliography / citation management • build from more open source tools
  • 32. Development goals re: citations • Create a repository for community-vetted taxonomic bibliographies. • Ability to ingest, display, download, and index articles so that the BHL can operate as an article repository. • Build from existing community of work around Drupal / Biblio. – In use by collaborators
  • 37. Services • OpenURL – Facilitate links to citations: protologues, articles, references • Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx • Names Service – Return all occurrences of a name throughout BHL digitized corpus • Documentation: http://bit.ly/2e6sg9 – Access to 51 million name strings using TaxonFinder – 1.4 million unique names – Working out a strategy for obscure species – Algorithm improvements to detect nomenclatural & taxonomic acts • New API
  • 38. Services: OpenURL http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879 http://www.tropicos.org/Name/1200408
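
As a minimal sketch of how a client might assemble the OpenURL query shown on this slide, the code below builds the same URL from its parts using only the Python standard library. The parameter names (`pid`, `volume`, `issue`, `spage`, `date`) and the base address are copied from the slide's example; the function name `build_openurl` is hypothetical, and which other parameters the resolver accepts is not covered here (see the OpenURL documentation linked on the Services slide).

```python
from urllib.parse import urlencode

# Base address as it appears in the slide's example URL.
BHL_OPENURL_BASE = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(title_id: int, volume: str, issue: str,
                  start_page: str, year: str) -> str:
    """Hypothetical helper: assemble a query like the one on the slide."""
    params = {
        "pid": f"title:{title_id}",
        "volume": volume,
        "issue": issue,      # may be empty, as in the slide's example
        "spage": start_page,
        "date": year,
    }
    # safe=":" keeps the colon in "title:3934" unescaped, matching the slide.
    return f"{BHL_OPENURL_BASE}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    print(build_openurl(3934, "14", "", "301", "1879"))
```

Building the query from a dictionary rather than by string concatenation keeps empty fields such as `issue=` explicit and leaves escaping to the library.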
  • 39. Services: OpenURL Disambiguation • Looking for: • BHL returns:
  • 41. EOL Interfaces Taxonomic name finding enhancements – Nomenclatural acts in web services – Other algorithms / verification • WoRMS data • Improvement – Ranking results – Visualization • LifeDesks – Bibliography sharing – Resolve to articles
  • 42. Thank You Tom We welcome your input and advice. Tom Garnett Biodiversity Heritage Library Program Director garnettt@si.edu 202-633-2238