Prioritizing Digitization by Marc Holtman (City Archives Amsterdam), British Library Feb 23, 2010
1. Prioritizing digitization: The scanning on demand system of the Amsterdam City Archives
   British Library Centre for Conservation, February 23, 2010
2. Who am I? Marc Holtman
   Started working at the Amsterdam Archives in 2001
   Project leader for the Image Bank
   Project leader for the development of search and retrieval applications
   Project leader for digitization
   Current job:
   Coordination of all digitization projects
   Development of the workflow
   Development of workflow tools
   Archiefbank and Image Bank
3. History: Brief history of digitization at the Amsterdam Archives
   2000: Start of digitization of highlights from the collections and three large genealogy sources
   2001: Development of the Image Bank: building an application and digitizing 25,000 photos, drawings and prints
   2003: Development of an application for online inventories: all inventories, no scans
   2006: Development of the Archiefbank: expanding the inventories with integration of scans, indexes, a scanning on demand service and a workflow for large-scale digitization
   2010: Image Bank online with 300,000 high-end quality scans
   2010: Archiefbank online with more than 7 million scans and 15,000 registered users
4. History: From small selections to large-scale digitization
   Turning point in 2006
   The trigger was an ongoing decline in visitors to our reading rooms
   And a spectacular growth of users on the website
   From (relatively) small-scale digitization to a “scan it all” approach

   Visitors
   Year   Reading rooms   Website
   1982   24,027
   1988   29,788
   1992   27,738
   1998   26,598          40,048
   2002   25,014          224,050
   2006   17,958          512,592
5. Strategy
   Everybody should be able to consult digitized documents online, 24/7
   After the realization of the online inventories, users started to ask: “Where’s the button for the images?”
   Users expect to find everything digitally available ... when we have 20 miles of archives in our repositories
   But where to start? And how to finance?
6. Strategy: The pessimistic math
   Q. How many scans can be made from 20 miles of archives?
   1 foot = 2,000 scans
   A. 739,200,001 scans
   Q. How long does it take to scan it all?
   Production = 10,000 scans a week
   A. 406 years
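The arithmetic behind this slide can be sketched as follows. This is a rough reconstruction, not the speaker's own calculation: the function names are made up for illustration, and it assumes a statute mile of 5,280 feet and 52 production weeks per year, neither of which is stated on the slide. Under those assumptions the stated inputs give about 211 million scans and roughly 406 years, which matches the duration on the slide; the scan total printed on the slide is larger than this product, so it was presumably derived from different inputs.

```python
# Assumptions (not on the slide): a statute mile of 5,280 feet and
# 52 production weeks per year. Function names are hypothetical.
FEET_PER_MILE = 5280
WEEKS_PER_YEAR = 52


def total_scans(shelf_miles: float, scans_per_foot: int) -> int:
    """Estimate the number of scans for a given shelf length."""
    return round(shelf_miles * FEET_PER_MILE * scans_per_foot)


def years_to_scan(scans: int, scans_per_week: int) -> float:
    """Years needed to produce `scans` at a constant weekly rate."""
    return scans / scans_per_week / WEEKS_PER_YEAR


# Inputs taken from the slide: 20 miles, 2,000 scans per foot,
# 10,000 scans per week.
scans = total_scans(20, 2000)
print(f"{scans:,} scans")                            # 211,200,000 scans
print(f"{years_to_scan(scans, 10_000):.0f} years")   # 406 years
```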
7. Strategy
   It was clear we had to:
   Rethink our policy in prioritizing digitization
   Rethink our financial principles on digitization
   Develop a workflow in which large scale and low costs are starting points
   Develop a user-friendly web application
8. Prioritizing digitization: The user priorities
   We stopped thinking about the 20 miles of archives in our repositories
   And started thinking about the documents the users need for their research
   Users only need a few documents, not everything that is being digitized
   The documents needed for your research should be the first documents to be digitized, not the last
   This calls for client-driven digitization
9. Prioritizing digitization: The user priorities
   In the Archiefbank we let the user set priorities in digitization
   All archive files can be requested for digitization via the online inventories
   In principle all requests are honored, unless:
   the file cannot be digitized for material reasons
   the material is under copyright
   disclosure restrictions apply
   The user doesn’t commit to anything by placing a request, but neither does the archive
11. Prioritizing digitization: The preservation side
   Conservation of the originals remains our major concern
   The scans in the scanning on request service are made for the purpose of archival research, not as a substitute for the originals
   Nevertheless, digitization does have a real conservation function
   After digitization the originals can no longer be requested in the reading room
   Damage or loss of the originals caused by use is ruled out
12. Prioritizing digitization: The preservation side
   Basic rules:
   All inventory numbers are checked before they are transported to the digitizer
   We perform small preservation tasks: removal of staples, repackaging when necessary
   If necessary, and possible, our restoration employees perform small restorations
   If the material is too fragile, or requires complex restoration, we cancel the request for digitization
   We do not number the originals
   The sequence of the originals is not checked or altered
14. Prioritizing digitization: The preservation side
   Digital preservation: all scans are stored in a controlled e-repository environment (OAIS)
15. Prioritizing digitization: Digitization projects
   Besides the selection made by users, we scan on a project basis
   Grants from (national) programs, often on specific topics
   Cooperation with Amsterdam district councils and services
   Hundreds to millions of scans in each project
   The purpose of digitization varies from accessibility to substitution of the originals
16. Financing
   Grants for digitization are not enough for realizing our vision
   Users have to pay to get access to the scans
   In the Netherlands free access to archives is legislated
   But for reproductions you have to pay
   We regard reading and downloading of digitized archival documents via the web as delivery of reproductions
   But: consulting the scans at our reading rooms is free
   The idea is that by buying scans the audience makes (part of) the financing of digitization possible
17. Financing: Pricing policy
   Customers think a low price is important
   Archival research easily runs into the use of dozens to hundreds of documents
   100 scans should not cost € 1,000
   The price of an ordinary copy in our reading room should be the benchmark
   The costs of purchasing scans online should be competitive with the travel costs of visiting our reading room
   This means that the costs for producing and storing scans have to be as low as possible
18. Financing: Reducing costs
   Digitization on a large scale is only possible when both incidental and structural costs are as low as possible
   Reducing incidental costs (production of scans):
   1. Standardized and efficiently organized workflow
   2. Choosing quality standards that fit the purpose of the scans
   Reducing structural costs (storage of scans):
   3. File sizes as small as possible
19. Financing: Reducing costs
   2. Choosing quality standards that fit the purpose of the scans
   In every project we choose a quality that fits the purpose of the digitization
   Scanning a modern, printed book for accessibility is not the same as scanning a vulnerable charter for preservation

   Price comparison of scanning costs (price rates for scanning, external partner)
   Quality                   Price
   High-end                  € 2 – 10
   “Legibility”              € 0.20 – 0.40
   “Legibility”, auto-feed   € 0.10
20. Financing: Reducing costs
   2. Choosing quality standards that fit the purpose of the scans
   Example of a scan with a “legibility” standard of quality
   Is this scan OK for the purpose of doing archival research? Yes
   Is this scan OK for publication in an art book? No
21. Financing: Reducing costs
   3. File sizes as small as possible
   Storage costs are still considerable when producing large quantities of scans
   In order to bring structural costs down, the file size of the scans has to be as low as possible
   This can be achieved in three ways:
   1. Skimping on resolution
   2. Skimping on bit depth / number of colors (only possible in formats like TIFF and PNG)
   3. Using (lossless or lossy) compression on the files
   We use a combination of 1 and 3
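To make the three levers concrete, here is a minimal sketch of how each one changes file size. It relies only on the general fact that an uncompressed raster image takes pixel count times bits per pixel. The helper names, the A4 page size, the dpi and bit-depth values and the 10:1 compression ratio are all illustrative assumptions, not settings used by the Amsterdam City Archives.

```python
# Hypothetical helpers; page size, dpi, bit depth and compression ratio are
# illustrative assumptions, not figures from the presentation.

def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes: pixel count times bits per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def compressed_bytes(raw_bytes: int, compression_ratio: float) -> float:
    """Size after compression, given an assumed ratio such as 10.0 for 10:1."""
    return raw_bytes / compression_ratio


A4 = (8.27, 11.69)  # page size in inches (assumption)

full = uncompressed_bytes(*A4, dpi=300, bits_per_pixel=24)
low_res = uncompressed_bytes(*A4, dpi=200, bits_per_pixel=24)   # way 1
gray = uncompressed_bytes(*A4, dpi=300, bits_per_pixel=8)       # way 2
combined = compressed_bytes(low_res, compression_ratio=10.0)    # ways 1 + 3

for label, size in [("300 dpi, 24 bit", full), ("200 dpi, 24 bit", low_res),
                    ("300 dpi, 8 bit", gray), ("200 dpi, 10:1", combined)]:
    print(f"{label:16s} {size / 1_000_000:6.1f} MB")
```

Because size grows with the square of the resolution, going from 300 to 200 dpi already removes more than half of the data before any compression is applied.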
22. Financing: Reducing costs
   3. File sizes as small as possible
   Storage of 500,000 images
   Average size per scan, uncompressed = 22.1 MB
   Price rate: 1 TB of storage in a controlled e-repository environment at two separate locations, including IT costs: € 3,500 (NLD, Jan 2010)

   File format          Storage   Costs 1 year   Costs 10 years
   TIFF uncompressed    11 TB     € 38,500       € 380,500
   JPEG 10              1.1 TB    € 3,850        € 38,500
   JPEG 4 (200 dpi)     124 GB    € 434          € 4,340
   JPEG 2000 (part 1)   6 TB      € 21,000       € 210,000
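The cost figures on this slide can be checked with a short calculation. The sketch below assumes the quoted price is per terabyte per year (the one-year column is consistent with that reading) and uses decimal units, 1 TB = 1,000 GB; the function and variable names are made up for illustration. With these assumptions the one-year costs on the slide are reproduced for all four formats. Ten times the yearly TIFF cost comes to € 385,000, so the ten-year TIFF figure on the slide may contain a transposition.

```python
# Assumptions: the slide's price is per TB per year, and 1 TB = 1,000 GB.
PRICE_PER_TB_YEAR_EUR = 3500

# Storage volumes as printed on the slide, expressed in GB.
storage_gb = {
    "TIFF uncompressed": 11_000,
    "JPEG 10": 1_100,
    "JPEG 4 (200 dpi)": 124,
    "JPEG 2000 (part 1)": 6_000,
}


def yearly_cost_eur(gigabytes: int) -> float:
    """Storage cost per year at the assumed per-TB rate."""
    return gigabytes * PRICE_PER_TB_YEAR_EUR / 1000


def collection_size_tb(images: int, avg_mb: float) -> float:
    """Total size of a collection in TB, using decimal units."""
    return images * avg_mb / 1_000_000


# 500,000 uncompressed scans at 22.1 MB each is roughly 11 TB.
print(f"{collection_size_tb(500_000, 22.1):.2f} TB uncompressed")

for fmt, gb in storage_gb.items():
    one_year = yearly_cost_eur(gb)
    print(f"{fmt:20s} 1 year: {one_year:>9,.0f}   10 years: {10 * one_year:>9,.0f}")
```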
23. Financing: Costs and income
   Calculating real costs and income is difficult
   What should we put in and what not?
   For example, after digitization, logistics and a physical reading room with climate control and security are no longer necessary for these documents when they are requested
   What we win by digitization is more than what we can simply measure in euros as income
   Also, digitization simply is a powerful way to fulfill our mission: making our archives accessible

   Costs Archiefbank (2009)
   Digitization on request     € 140,000
   Digitization projects       € 200,000
   Webservices                 € 50,000
   Total                       € 390,000

   Income Archiefbank (2009)
   Sales of scans              € 100,000
   Project funding             € 200,000
   Government (digitization)   € 90,000
   Total                       € 390,000
24. Financing: Costs and income
   Conclusion: in our framework, the scanning on request service is financially feasible
25. Workflow: Standardized workflow
   We developed a standardized workflow for all digitization
   Every type of document can be digitized in this workflow
   We always work on a project basis
   In every project the quality standard and method are set, depending on purpose and type of material
   Goals of digitization projects vary from access to substitution of the originals
26. Workflow: Principles
   Scanning is contracted out
   Complete inventory numbers are always scanned
   Files are identified and filenames assigned by means of an order ticket
   Workflow tools are used for managing the originals and performing checks on the scans
27. Workflow: Weekly schedule, scanning on demand
   Task                              Hours
   Retrieving the originals          4
   Preparing the originals           6
   Checking scans                    6
   Returning the originals           4
   Coordination and administration   3
   Contact with customers            1
28. Archiefbank: Demonstration of the Archiefbank
   More: http://www.slideshare.net/ktheimer