University of Montana Digitization at the National Archives
1. Case study: Digitizing BIA Letters at the National Archives. Steve McCann, Northwest Archivists Western Roundup, Seattle, 2010
2. Project Overview. Mission: Create a researcher’s database of BIA Letters. Model: SWORP Collection (SW Oregon Research Project): “SWORP aims to repatriate these materials to the Native American Tribes.” http://nwda-db.wsulibs.wsu.edu/findaid/ark:/80444/xv14723 Focus: Native American History of Montana. Estimated 1 to 2 million documents in BIA letters. 1st Pass: Blackfeet documents, 1907-1939; typed, so possible OCR candidates. Sponsors: Smithsonian, and both the NAS and Mansfield Library at UM
3. National Archives Digitization Policy. Cameras & scanners allowed, but with ambient light only. No flash. “At their very best, page images can provide an experience that is extremely close to the physical reality of the book.” Sutherland, Juliet. “A Mass Digitization Primer”. Library Trends. Vol. 57, No. 1. Summer 2008. pp. 17-23.
4. Effects of Light on Archival Materials. Effects evident depending on material: anywhere from ~100 photo flashes to 10^8 & up (Schaeffer, 2001). Best Practice = “No flash photography, it is a distraction to users.” (Miller and Galbraith, 2010)
5. Designing a Portable Digitization Studio. Equipment List: Camera: Canon Rebel XSi; Lens: Canon EF-S 18-55 IS; Tripod: Manfrotto 055X Pro with a center post that folds out horizontally; Tripod Head: 484RC2 Mini Ball; Laptop: Dell 131L; Carrying Case: Pelican 1300; Bubble level attached to the hot shoe; Seagate 500GB external hard drive. Total cost: ~$1,400
6. Workflow for Digitizing: v.1. Staffing: 2 researchers. Request materials; set up equipment; 1 captures & 1 turns pages; transfer images from camera to PC; FTP images to library server.
7. Workflow for Digitizing: v.2. Staffing: 2 researchers. Request materials; set up equipment; 1 captures & 1 turns pages; transfer images from camera to PC; transfer images from PC to external hard drive.
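The two transfer steps in the v.2 workflow can be sketched as a small script. This is only an illustration, not the tooling the project used: the folder names, the file names, and the checksum verification step are all assumptions added for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_images(source: Path, destination: Path) -> list[Path]:
    """Copy every file in `source` to `destination` and verify each copy."""
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for image in sorted(p for p in source.iterdir() if p.is_file()):
        target = destination / image.name
        shutil.copy2(image, target)  # copy2 also preserves timestamps
        if sha256_of(image) != sha256_of(target):
            raise IOError(f"Checksum mismatch for {image.name}")
        copied.append(target)
    return copied


def demo() -> list[str]:
    """Run both transfer steps on placeholder files in a temp folder."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical folder names standing in for the real devices.
        card = Path(root) / "camera_card"
        pc = Path(root) / "pc_working"
        external = Path(root) / "external_drive"
        card.mkdir()
        (card / "IMG_0001.CR2").write_bytes(b"placeholder raw data 1")
        (card / "IMG_0002.CR2").write_bytes(b"placeholder raw data 2")

        transfer_images(card, pc)                    # camera to PC
        on_external = transfer_images(pc, external)  # PC to external drive
        return [p.name for p in on_external]


if __name__ == "__main__":
    print(demo())
```

Verifying each copy before clearing the camera card is one way to reduce the risk of losing captures that would be costly to retake.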
8. Capabilities of Digital Cameras. Canon Rebel XSi: 3.39 megapixels; 24-bit RAW, 15-16 MB; ~2,200 x 1,500 pixels; ~31 x 21 inches at 72 PPI. ~120 hours, ~15,000 captures = 125 captures per hour.
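The figures on this slide follow from simple arithmetic, sketched below using the rounded pixel dimensions given above. The function names are hypothetical, and the 300 PPI comparison is an added illustration, not a figure from the presentation.

```python
def megapixels(width_px: int, height_px: int) -> float:
    """Total pixel count in millions."""
    return width_px * height_px / 1_000_000


def size_in_inches(width_px: int, height_px: int, ppi: int) -> tuple[float, float]:
    """Physical size of an image when rendered at a given pixels-per-inch."""
    return width_px / ppi, height_px / ppi


def captures_per_hour(total_captures: int, hours: float) -> float:
    """Average capture rate over the whole project."""
    return total_captures / hours


if __name__ == "__main__":
    width, height = 2200, 1500  # rounded values from the slide

    print(f"{megapixels(width, height):.2f} megapixels")

    w72, h72 = size_in_inches(width, height, 72)
    print(f"{w72:.1f} x {h72:.1f} inches at 72 PPI")

    # Assumption for comparison only: the same image at 300 PPI.
    w300, h300 = size_in_inches(width, height, 300)
    print(f"{w300:.1f} x {h300:.1f} inches at 300 PPI")

    print(f"{captures_per_hour(15_000, 120):.0f} captures per hour")
```

The rounded dimensions give about 3.3 megapixels, slightly under the 3.39 quoted on the slide, which suggests the exact frame is a little larger than 2,200 x 1,500.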
10. Processing Materials. Batch conversions: RAW to JPG; RAW to TIFF (size ballooned x3). Automatic light levels & sharpening. “Ownership” established with banding once the images are placed on the web.
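A rough storage estimate shows why the TIFF size increase matters. The sketch below assumes that every one of the ~15,000 captures is kept as a 15-16 MB RAW file, that the "x3" factor is relative to the RAW file size, and that 1 GB = 1,000 MB. None of these assumptions is confirmed by the presentation, and the function names are hypothetical.

```python
def storage_gb(captures: int, mb_per_file: float) -> float:
    """Total storage in decimal gigabytes (1 GB = 1,000 MB)."""
    return captures * mb_per_file / 1000


def fits_on_drive(total_gb: float, drive_gb: float) -> bool:
    """True if the total fits on a drive of the given nominal capacity."""
    return total_gb <= drive_gb


if __name__ == "__main__":
    captures = 15_000      # approximate capture count from the slides
    tiff_factor = 3        # assumed to be relative to the RAW size
    drive_gb = 500         # external drive from the equipment list

    for raw_mb in (15, 16):
        raw_total = storage_gb(captures, raw_mb)
        tiff_total = storage_gb(captures, raw_mb * tiff_factor)
        print(
            f"RAW at {raw_mb} MB: {raw_total:.0f} GB "
            f"(fits: {fits_on_drive(raw_total, drive_gb)}); "
            f"TIFF: {tiff_total:.0f} GB "
            f"(fits: {fits_on_drive(tiff_total, drive_gb)})"
        )
```

Under these assumptions the RAW files alone come to roughly 225-240 GB, while a full set of TIFF derivatives would come to roughly 675-720 GB.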
11. Final Product ABBYY FineReader OCR: Central Classified Files 1907-1939 Blackfeet Agency 054 National Archives and Records Administration Washington DC Lb I A" DEPARTMENT OF THE INTERIOR UNITED STATES INDIAN SERVICE BlaokfeotAeency, Browninc, Montana, January 25, 1916. / <l»„br />Co.i'aissloner of Indian Affairs, p* £ g o ! Washin(jton, D. 0. 'J i— 3 ? I Sir: 5 e * J X w ^r S Transmitted herewith is certified stenographic ___/ . transcript of the proceedings of the meeting of those members of the tribe who were opposed to the delegation headed "by Robert J. Hamilton, which meeting was held at the Agency on January HO, in accordance with Office tele-Tarn of the 12th instant. Very respectfully. Superintendent. 1GH7
13. Questions? References: Cox, Richard J. 2007. Machines in the archives: Technology and the coming transformation of archival reference. First Monday 12, no. 11. Miller, Lisa, Galbraith, Steven K., and RLG Partnership Working Group on Streamlining Photography and Scanning. 2010. "Capture and Release": Digital Cameras in the Reading Room. Dublin, OH: OCLC Research. Rose, Steve, and Evison, Gillian. 2005. The Use of Personal Scanners and Digital Cameras within OULS Reading Rooms: Offering a Customer Focused Service for the 21st Century. Oxford University Library Services. Schaeffer, Terry T. 2001. Effects of light on materials in collections: data on photoflash and related sources. Los Angeles: Getty Conservation Institute. Sutherland, Juliet. 2008. A Mass Digitization Primer. Library Trends 57, no. 1: 17-23.