8. First impressions, meh…
• OMG it’s full of plants
• It’s all old stuff
• Where the $#@! are the articles?
9. “More hack, less yack”
“[to] be able to move some subset of the
world … from the leverage point of the
command line.”
Steven E. Jones, The Emergence of the Digital Humanities
11. No articles? No problem!
• Data is available for download
• Also an API (and OAI-PMH, yuck!)
• So, let’s go find the articles…
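For anyone who does end up on the OAI-PMH route, a minimal harvesting sketch may help. It only builds `ListRecords` request URLs and reads record identifiers plus the `resumptionToken` from a response, as the OAI-PMH 2.0 protocol defines them. The base URL `https://example.org/oai` and the `SAMPLE` response are placeholders assumed for illustration, not BHL's actual endpoint or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so when one is
    supplied the metadataPrefix is left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response, made up for this sketch.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:item/1</identifier></header></record>
    <record><header><identifier>oai:example.org:item/2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL, call `parse_page` on the response, and repeat until no token comes back.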
12. Find articles - “simples”
• BHL: Title, Volume, Page
• Article: Journal, Volume, Start page – end page
• Mapping between BHL and articles
Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library doi:10.1186/1471-2105-12-187
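One way to picture the mapping is as a lookup from an article's page range to the run of scanned pages that covers it. The sketch below is a simplified illustration under stated assumptions, not the method of the cited paper: `pages_for_article`, the page IDs, and the printed page numbers are all hypothetical.

```python
def pages_for_article(volume_pages, start_page, end_page):
    """Return the scanned page IDs covering an article.

    volume_pages: list of (page_id, printed_number_or_None) in scan order
    for one volume. Unnumbered scans (plates, blanks) that fall between
    the start and end page are included in the result.
    """
    numbers = [number for _, number in volume_pages]
    if start_page not in numbers:
        return []
    first = numbers.index(start_page)
    # Look for the end page at or after the start page.
    last = None
    for i in range(first, len(numbers)):
        if numbers[i] == end_page:
            last = i
            break
    if last is None:
        return []
    return [page_id for page_id, _ in volume_pages[first:last + 1]]


# Hypothetical volume: six scans, two of them without a printed number.
volume = [(1001, None), (1002, 1), (1003, 2), (1004, None), (1005, 3), (1006, 4)]
article_pages = pages_for_article(volume, 2, 3)
```

The journal title and volume would first be used to pick the right scanned volume; only the page-range step is shown here.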
26. Synthetic documents
Michael Shamos, Machines as readers: A solution to the copyright problem
“we proposed to scan works digitally to extract their
intellectual content, and then generate by machine
synthetic works that capture this content … and
distribute them free of copyright”
27. Cited, linkable specimens
• NMNH Vertebrate Zoology Herpetology Collections: 11194
• CAS Herpetology Collection Catalog: 9619
• MCZ Herpetology Collection: 6720
• Herpetology Collection (University of Kansas Biodiversity Research Center): 5818
http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html
31. PubMed Central for biodiversity
• Taxonomic names
• Geographic localities
• Specimen codes
• Handle XML, PDF, OCR text
• Store facts as well as documents
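As a rough illustration of "facts as well as documents", specimen codes can be pulled out of OCR text and kept as small structured records. This is a minimal sketch: the acronym list, the `extract_specimen_codes` helper, and the sample sentence are assumptions made up for the example, and real OCR text would need a far more forgiving pattern.

```python
import re

# Hypothetical list of collection acronyms to look for.
ACRONYMS = ("USNM", "CAS", "MCZ", "KU")

# An acronym, optional whitespace, then a catalogue number.
SPECIMEN_RE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ACRONYMS) + r")\s*(\d+)\b"
)


def extract_specimen_codes(text):
    """Return one record per specimen code found, in order of appearance."""
    return [
        {"collection": m.group(1), "catalog_number": m.group(2)}
        for m in SPECIMEN_RE.finditer(text)
    ]


# Made-up sentence standing in for a page of OCR text.
ocr_text = "Holotype MCZ 12345; paratypes CAS 6789 and KU 101."
facts = extract_specimen_codes(ocr_text)
```

Each record could then be stored alongside the page it came from, so a specimen links back to every article that cites it.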
32. Google figured out how to manage
abundance while every other media
company in the world was trying to
manufacture scarcity, and for that
we should be grateful.
Siva Vaidhyanathan, The Googlization of Everything (And Why We Should Worry)