This is a talk proposed by Gaurav Vaidya (University of Colorado Boulder), William Ulate (Missouri Botanical Garden), Robert Guralnick (University of Colorado Boulder), and Trish Rose-Sandler (Missouri Botanical Garden).
The Art of Life project is developing a metadata schema for describing and improving access to the natural history images contained in the 38 million pages digitized by the Biodiversity Heritage Library. These images paint a vibrant picture of Europeans’ first encounters with exotic plants and animals in the 17th and 18th centuries, drawn by some of the finest illustrators in the world. They also provide valuable documentation of when, where, and by whom a species was first observed. We present our preliminary schema, which accommodates a variety of delivery systems, including Flickr and Wikimedia Commons, and varying levels of information detail.
All creatures great and small: metadata for biodiversity illustrations
1. All creatures great and small: metadata for biodiversity illustrations
DLF Forum 2012 Denver CO Trish Rose-Sandler, Missouri Botanical Garden Art of Life project
2. What is Art of Life?
• Full title: “The Art of Life: Data Mining and Crowdsourcing the Identification and Description of Natural History Illustrations from the Biodiversity Heritage Library (BHL)”
• Grant awarded to the Missouri Botanical Garden in St. Louis
• Funded by the National Endowment for the Humanities
• Runs May 2012-April 2014
3. What is BHL?
• A consortium of 14 natural history and botanical libraries and research institutions
• An open access digital library for legacy biodiversity literature
• An open data repository of taxonomic names and bibliographic information
• An increasingly global effort
4.
5. Why the need for Art of Life?
Problem statement: users want access to images, but access is limited to scrolling page by page or viewing a selection of images on Flickr, and images are not searchable by content (e.g. corn, Zea mays)
6. Table of contents; page-by-page scroll
7.
8. Five Primary Objectives of Art of Life
Objective 1: Define an appropriate metadata schema for natural history illustrations
Objective 2: Build software tools to automatically identify illustrations in the BHL corpus
Objective 3: Enhance existing tools to enable the initial sorting, viewing, and editing of these identified visual resources.
Objective 4: Integrate tagging applications to enable a community of users to edit descriptive metadata for the illustrations
Objective 5: Integrate the descriptive metadata generated by users back into the BHL portal, both for access and preservation
9.
10. Current status of Art of Life
• Development of the algorithm is about 70% complete and will be done by January 2013
• Draft schema available for public review: http://tinyurl.com/9hm7nsb
• Classifier tool: reusing an existing BHL tool called Macaw, developed by Joel Richard (http://code.google.com/p/macaw-book-metadata-tool/)
11. Art of Life Schema
Needs to support three objectives:
(1) to enable the discovery, description and use of the identified images by artists, biologists, humanities scholars, librarians, and educators;
(2) to make BHL’s metadata and images available to other platforms; and
(3) to import crowdsourced metadata generated in other platforms back into BHL.
12. Schema landscape review
– VRA Core 4.0 (borrowed 9 elements)
– LIDO
– Darwin Core (borrowed 2 elements)
– Dublin Core
13. ART OF LIFE SCHEMA ELEMENTS (red = required)
Title
Type
Date
Copyright
Source
Agent
Subjects
Description
Inscription
14. Example of illustration described using Art of Life schema
Title Stictospiza formosa
Type Paintings
Date Publication: 1898
Agent Author: Arthur G. Butler (1844-1925)
Illustrator: F.W. Frohawk (1861-1946)
Description A pair of finches with green and yellow bodies resting on reeds
Subjects Scientific name: Amandava formosa (Latham, 1790)
Vernacular Name: Green Avadavat or Green Munia
Accepted Name: Amandava formosa (Latham, 1790)
Birds, finches
Inscriptions bottom center: Green Amaduvade Waxbill (Stictospiza formosa)
Source Butler, Arthur Gardiner. Foreign finches in captivity. Hull and London: Brumby and Clarke, limited, 1889 (2nd edition). This image comes from the Biodiversity Heritage Library and is available online at biodiversitylibrary.org/page/17195895
Rights Public domain
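The example record above can be held as structured data and checked for completeness before crowdsourced descriptions are reingested. This is a minimal Python sketch, not the project's implementation: the nested field names (such as `role`, `type`, and `position`) are hypothetical, the element labels follow the slide 14 example (which uses "Rights" where slide 13 lists "Copyright"), and because the red highlighting that marked required elements did not survive, the `ASSUMED_REQUIRED` set is a guess.

```python
# Element labels as they appear in the slide 14 example record.
ELEMENTS = {"Title", "Type", "Date", "Agent", "Description", "Subjects",
            "Inscriptions", "Source", "Rights"}

# HYPOTHETICAL: the original slides marked required elements in red, which was
# lost in extraction, so this set is an illustrative guess, not the real rule.
ASSUMED_REQUIRED = {"Title", "Type", "Source", "Rights"}

record = {
    "Title": "Stictospiza formosa",
    "Type": "Paintings",
    "Date": [{"type": "Publication", "value": "1898"}],
    "Agent": [
        {"role": "Author", "name": "Arthur G. Butler", "dates": "1844-1925"},
        {"role": "Illustrator", "name": "F.W. Frohawk", "dates": "1861-1946"},
    ],
    "Description": "A pair of finches with green and yellow bodies resting on reeds",
    "Subjects": [
        {"type": "Scientific name", "value": "Amandava formosa (Latham, 1790)"},
        {"type": "Vernacular name", "value": "Green Avadavat"},
        {"type": "Vernacular name", "value": "Green Munia"},
        {"type": "Accepted name", "value": "Amandava formosa (Latham, 1790)"},
        {"type": "Keyword", "value": "Birds"},  # "Keyword" is a hypothetical type label
        {"type": "Keyword", "value": "finches"},
    ],
    "Inscriptions": [{"position": "bottom center",
                      "text": "Green Amaduvade Waxbill (Stictospiza formosa)"}],
    # Abridged from the Source element on the slide.
    "Source": "Butler, Arthur Gardiner. Foreign finches in captivity. "
              "biodiversitylibrary.org/page/17195895",
    "Rights": "Public domain",
}


def validate(rec, required=ASSUMED_REQUIRED, allowed=ELEMENTS):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing required element: {name}"
                for name in sorted(required - set(rec))]
    problems += [f"unknown element: {name}"
                 for name in sorted(set(rec) - allowed)]
    # Treat an element that is present but empty as a problem too (hypothetical rule).
    problems += [f"empty value for element: {name}"
                 for name in sorted(rec) if not rec[name]]
    return problems
```

Returning a list of messages rather than raising on the first error lets a contributor see every gap in a description at once.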
15. Thanks to the Art of Life team!
Co-PIs:
Doug Holland and Trish Rose-Sandler, from Missouri Botanical Garden
Algorithm development:
Ed Bachta and Charlie Moad, from Indianapolis Museum of Art
Schema development:
Gaurav Vaidya and Robert Guralnick, from University of Colorado, Boulder
William Ulate, from Missouri Botanical Garden
Programming:
Mike Lichtenberg, Missouri Botanical Garden
Former PI for Art of Life:
Chris Freeland, Washington University
16. Interested? Here’s how you can help
• We welcome your feedback on the schema! http://tinyurl.com/9hm7nsb
• If you know of scholars or other users who would be interested in these types of images and in participating in our survey or a brief focus group about the schema, please have them contact me at trish.rose-sandler@mobot.org
• We would love to talk with others about their experiences with crowdsourcing metadata, particularly if you’ve used Flickr or Wikimedia Commons
This project aims to develop software tools and a metadata schema for visual resources contained within the scanned literature made available through BHL digitization activities.
Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as part of a global “biodiversity commons.” One of our primary audiences is taxonomists, who use BHL to find the first occurrence of a species name in the historical literature and to track how that name has changed over time. What began as a consortium in the US and UK is now an increasingly global effort: there are BHL nodes in China, Australia, Egypt, Europe, and Brazil, and soon in Africa. The BHL US/UK operation is made up of 6 full-time staff located at the Smithsonian Libraries and the Missouri Botanical Garden, plus contributions from staff at the member institutions, who allow a certain percentage of their time to be spent on BHL (contributions that add up to a little over 16 FTEs across the 14 member institutions). Missouri Botanical Garden has been involved in BHL both as a data contributor (our library has digitized books and journals for the project) and as the home of the project's technical development.
At this point we have a critical mass of content: over 57 thousand titles, 108 thousand volumes, and almost 40 million pages. We serve all BHL content through our portal at biodiversitylibrary.org. All BHL data is open access, and we provide it in a variety of export formats. We encourage digital library aggregators to incorporate our records into their portals and library catalogs, and we provide data in ways that allow it to be mined and recontextualized.
Problem statement: Art of Life evolved out of a need expressed by BHL users. We had a critical mass of content online, and users knew there were amazing images within the BHL pages, but there was no easy way to find them other than opening a BHL book or volume and scrolling through it page by page. No descriptive metadata is attached to the illustrations to tell you what they depict, when they were created, or who was involved in their creation.
We have created a BHL account on Flickr and pushed over 48,000 images to it so far, but this is a very manual process that takes considerable staff time.
This is the Art of Life workflow diagram, which identifies the four stages the illustrations will move through. In the Extract stage, BHL pages will be run through an algorithm that identifies which pages contain illustrations, whether full plates or only a section of the page. In the Classify stage, Art of Life staff will tag the pages with illustrations as one or more broad types, such as drawing/painting, photograph, diagram, or even map. In the Describe stage, the illustrations will be pushed to platforms such as Flickr and Wikimedia Commons, where both the general public and specialists can describe them in much greater detail, for example by adding a title, creator, date (if different from the date of publication), and subjects. In the Share stage, the metadata for the illustrations will be reingested into the BHL portal so that it can be searched there. We also want to preserve any metadata contributed on external platforms. Finally, we want to broaden the audience for these illustrations, because we believe they have wide appeal to artists, biologists, humanities scholars (particularly historians of science), librarians, and people working in education and outreach. Many of these audiences don’t know about BHL and won’t go to the BHL portal looking for this content, so we want to push the illustrations out to environments where they already are: Encyclopedia of Life, ARTstor, and even iTunes.
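The four stages described above can be sketched as a chain of functions, each handing its output to the next. This is a hypothetical Python illustration, not the project's code: the detection algorithm is stubbed out (each page already lists its illustration regions), and the function names `extract`, `classify`, `describe`, and `share`, the crop-box format, and the page IDs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Broad types named for the Classify stage.
BROAD_TYPES = {"drawing/painting", "photograph", "diagram", "map"}

# HYPOTHETICAL crop box on a page scan: (x, y, width, height) in pixels.
Region = Tuple[int, int, int, int]


@dataclass
class Illustration:
    page_id: int
    index: int  # position of this illustration among those found on the page
    region: Region
    types: Set[str] = field(default_factory=set)
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def key(self) -> Tuple[int, int]:
        return (self.page_id, self.index)


def extract(pages: List[dict]) -> List[Illustration]:
    """Extract: one Illustration per detected region (a full plate or part of a page).
    Detection itself is stubbed out: each page dict already lists its regions."""
    return [Illustration(p["page_id"], i, r)
            for p in pages for i, r in enumerate(p["regions"])]


def classify(ills: List[Illustration], staff_tags: dict) -> List[Illustration]:
    """Classify: staff assign one or more broad types to each illustration."""
    for ill in ills:
        tags = set(staff_tags.get(ill.key, ()))
        unknown = tags - BROAD_TYPES
        if unknown:
            raise ValueError(f"unrecognized type(s) for {ill.key}: {sorted(unknown)}")
        ill.types = tags
    return ills


def describe(ills: List[Illustration], crowd_edits: dict) -> List[Illustration]:
    """Describe: merge fields contributed on external platforms (title, creator, etc.)."""
    for ill in ills:
        ill.metadata.update(crowd_edits.get(ill.key, {}))
    return ills


def share(ills: List[Illustration], portal_index: dict) -> dict:
    """Share: reingest described illustrations into a page-keyed search index."""
    for ill in ills:
        portal_index.setdefault(ill.page_id, []).append(
            {"region": ill.region, "types": sorted(ill.types), **ill.metadata})
    return portal_index


# Usage with hypothetical page IDs: page 102 has no illustrations.
pages = [{"page_id": 101, "regions": [(0, 0, 800, 1200)]},
         {"page_id": 102, "regions": []}]
ills = classify(extract(pages), {(101, 0): {"drawing/painting"}})
ills = describe(ills, {(101, 0): {"Title": "Stictospiza formosa"}})
index = share(ills, {})
```

Keying each illustration by page and position, rather than by page alone, keeps several illustrations on one page distinct as they pass through every stage.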
A challenge for this project will be to identify the schema, or perhaps schemas, that can serve the metadata needs of a mix of audiences, as shown here. For example, an art historian reviewing an illustration may want to know the artist and the geographic location where the work was created, in order to understand how the artist was influenced by his or her locality. A scientist considering the same illustration may want to know the species name and geographic distribution of the organism depicted, in order to compare the development of the species with related species from that area. Both need the geographic metadata contained within the text, but from different perspectives. Since we wanted to push these illustrations out to other platforms for crowdsourced description and then bring that metadata back into BHL, we needed a schema that would guide users in what information to contribute and how to record it, and that would make those descriptions consistent enough to bring back into BHL easily. Rather than inventing a new schema from scratch, we wanted to adopt one or more existing schemas, so that when we shared the described illustrations beyond BHL, the metadata could easily interoperate with data in other systems.
We ended up choosing most of the schema elements from VRA Core, because its elements and attributes most closely matched the types of information we felt were important to record, and because its model of a work linking to one or more images fit nicely with BHL pages, which often contain one or more illustrations on a single page. The only thing VRA Core lacked was a way to record an acceptedName and a CommonName for a species. VRA Core has a subject attribute type of scientificName, but taxonomists would want to know the multiple names by which a species is known. Darwin Core fulfilled this need, so we borrowed 2 elements from that schema.
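The modeling choices above can be sketched in Python: several works linking to the same page image, and taxon subjects that carry the borrowed names alongside the VRA Core scientific name, so that a search on a common name finds the illustration. The class, field, and identifier names here are hypothetical; in the published Darwin Core vocabulary the closest terms are probably `dwc:acceptedNameUsage` and `dwc:vernacularName`, though the draft schema may label them differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Image:
    image_id: str  # HYPOTHETICAL identifier for a page scan


@dataclass
class TaxonSubject:
    scientific_name: str                  # VRA Core subject of type scientificName
    accepted_name: Optional[str] = None   # borrowed from Darwin Core
    common_names: List[str] = field(default_factory=list)  # borrowed from Darwin Core


@dataclass
class Work:
    title: str
    images: List[Image] = field(default_factory=list)  # a work links to one or more images
    taxa: List[TaxonSubject] = field(default_factory=list)


def works_on_image(works: List[Work], image_id: str) -> List[str]:
    """Titles of all works linked to one image, e.g. several illustrations on one page."""
    return [w.title for w in works if any(img.image_id == image_id for img in w.images)]


def name_terms(work: Work) -> Set[str]:
    """Every name under which a work's taxa are known, lowercased for matching."""
    terms = set()
    for t in work.taxa:
        terms.add(t.scientific_name.lower())
        if t.accepted_name:
            terms.add(t.accepted_name.lower())
        terms.update(n.lower() for n in t.common_names)
    return terms


def find(works: List[Work], query: str) -> List[str]:
    """Titles of works whose taxon names contain the query (case-insensitive substring)."""
    q = query.strip().lower()
    return [w.title for w in works if any(q in term for term in name_terms(w))]


page = Image("bhl-page-17195895")  # hypothetical ID format built from the BHL page number
finches = Work(
    title="Stictospiza formosa",
    images=[page],
    taxa=[TaxonSubject(
        scientific_name="Amandava formosa (Latham, 1790)",
        accepted_name="Amandava formosa (Latham, 1790)",
        common_names=["Green Avadavat", "Green Munia"],
    )],
)
```

With this structure, `find([finches], "green munia")` and `find([finches], "amandava")` both return the same illustration, which is the kind of search by content that page-by-page scrolling cannot offer.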