For his Michaelmas 2010 Arcadia fellowship, Ed Chamberlain investigated ways to speed up the digitisation process in academic libraries. He identified three problem areas and explored issues surrounding corresponding potential solutions, including automated book scanners and the Espresso print-on-demand machine. The seminar will recount his findings, and provide an opportunity to discuss how libraries can successfully interface with innovative technologies.
About the Speaker:
Ed Chamberlain works as Systems Development Librarian at Cambridge University Library.
His library career so far has spanned three sectors, including Oxford, the London Library and the Natural History Museum. Here, Ed was involved in the early creation and development of online services based around digitised materials, including the Biodiversity Heritage Library mass-digitisation project. He has a BA in Politics from the University of East Anglia and an MA in Library and Information Management from Loughborough University.
Ed took up his current position in 2007 and has taken a lead in the redevelopment of online services and systems supporting both electronic and print library resources. Ed's professional interests include all aspects of online library and information services, especially web design trends and underlying software architecture. He is also interested in new standards of metadata, including emerging semantic web based services and open publishing models for both data and content.
Please email your intent to attend to Michelle Heydon, mh569@cam.ac.uk
This talk is part of the Arcadia Project Seminars series.
Date: Tuesday 3rd May 2011
Time: 18:00-19:15 - refreshments from 17:45
Venue: Old Combination Room (OCR), Wolfson College
31. What can we copy? Estimations of University of Cambridge holdings within the public domain. R. Pollock, 2009
Pub. Date    Items        % PD   No. PD
1400-1850    304,587      100    304,587
1850-1860    40,970       100    40,970
1860-1870    43,734       100    43,734
1870-1880    50,564       95     48,035
1880-1890    66,857       90     60,171
1890-1900    66,883       85     56,850
1900-1910    70,360       65     45,734
1910-1920    60,489       40     24,195
1920-1930    78,670       25     19,667
1930-1940    90,576       10     9,057
1940-1950    72,692       6      4,361
1950-1960    118,251      0      0
1960-1970    262,974      0      0
1970-2009    2,130,509    0      0
Total        3,458,116    19     657,361
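The arithmetic behind the table can be checked with a short sketch. The item counts and percentages are taken from the table; the idea that each "No. PD" figure is the item count multiplied by the percentage and rounded down is an inference from the numbers, not something the slide states, and the names `ROWS` and `estimated_pd` are purely illustrative.

```python
# (publication period, items held, estimated % in the public domain), from the table
ROWS = [
    ("1400-1850", 304_587, 100),
    ("1850-1860", 40_970, 100),
    ("1860-1870", 43_734, 100),
    ("1870-1880", 50_564, 95),
    ("1880-1890", 66_857, 90),
    ("1890-1900", 66_883, 85),
    ("1900-1910", 70_360, 65),
    ("1910-1920", 60_489, 40),
    ("1920-1930", 78_670, 25),
    ("1930-1940", 90_576, 10),
    ("1940-1950", 72_692, 6),
    ("1950-1960", 118_251, 0),
    ("1960-1970", 262_974, 0),
    ("1970-2009", 2_130_509, 0),
]


def estimated_pd(items: int, pct: int) -> int:
    """Estimated public domain items, rounded down (assumed rounding rule)."""
    return items * pct // 100


total_items = sum(items for _, items, _ in ROWS)
total_pd = sum(estimated_pd(items, pct) for _, items, pct in ROWS)
overall_pct = 100 * total_pd / total_items

print(f"{total_pd:,} of {total_items:,} items ({overall_pct:.1f}%)")
```

Under that assumption the per-period estimates add up to the totals quoted on the slide.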
Editor's Notes
This is primarily a recap of my Michaelmas 2010 fellowship. It covered a lot of ground, but I’ve tried to string tonight's talk into something coherent. If the fellowship had been based around a single question, it would most likely have been this.
This was the answer … 61 pages' worth of report … Worth stating, I think, that both the report and tonight's presentation are really about what we could do in Cambridge. It's not about a choice of policy; that’s not really what the Arcadia project is about. I’ve used the UL as an example and hook for my investigation, but a lot of it is non-specific.
Check room for Librarians … Take a step back and remember why we are doing this stuff. The "everything online" expectation is an interesting one. By their own admission, Google believe they have digitized 15 million works since 2004, which they believe to be about 12% of everything published. There is of course a difference between everything and what is needed or useful, but we do need to challenge this whilst also trying to meet it … Then we have digitization as preservation (the rationale being to minimise access to physical items rather than to replace physical items with digital), which is not currently covered under fair use in the States. My project focused upon 1+2 as key drivers, especially in the context of Cambridge libraries and where we are right now. Three is important, but a whole career unto itself.
To spend a term on something, I really should care ‘enough’ about it. I’ve worked on large-scale digitisation projects before, and it's great. You get a chance to completely rethink all aspects of a library service. But that seems a bit different to my experiences in Cambridge, which is still fairly traditional in its approach. And I feel that libraries in general are not fulfilling their tremendous potential here. Also, let's identify an elephant: Cambridge never signed up with Google, which is probably as much in our favour as it is not. Google grab headlines with their digitization efforts, but it's their library partners who have to be trusted with the long-term preservation of their work. Also, they’ve recently deleted their video archive. Is there an alternative? I wondered how we could digitize usefully and sustainably without a Google-style partner. We are in this for the long term. Google may not be. Is a one-site UL with its great 30-minute stack request sustainable for much longer?
Digitisation is focused on special collections. Relatively slow, manual process done to exemplar standards. Not scalable to the whole collection (at an effective cost). Restrictions on material.
So here we have a library reader. They can request material and copies of individual images, and meanwhile our digitisation proceeds and they can access the results, but the operations are not centered around them. What is stopping us growing our digitisation program?
Copyright legislation. Cost / time. Difficulty in reading on a screen.
So, for three barriers, three solutions … Copyright legislation => speed up / rationalise copyright analysis; cost / time => explore automated book scanning; people prefer a book to a screen => explore print on demand.
So, in the spirit of the Arcadia fellowship, how can technology help us overcome this? What is out there right now that can? I found three technologies, and imagined a digitisation workflow based around them. All three areas of focus were based upon existing developments from somewhat outside of the library sphere. Chose a specific development based only upon personal awareness.
Full or partial digitization of a work, instead of a stack request, initiated from a catalogue. Straight to desktop in less than a day. Order a bound print copy as an option. If it’s a public domain work, then it is made available for all under a Creative Commons licence … In my mind, this is the untapped digital potential of any library service.
Expectation of modern society: retail from Tesco to ebook purchasing. I assumed it could be self-sustaining, with no large external donor needed. Practically: is a one-site UL sustainable forever? Can our exemplar sub-30-minute stack request really continue to work for us? Ranganathan's laws … ‘Every book its reader’: let them choose digitisation rather than Google, Microsoft. ‘Save the time of the reader’: we can possibly expand this concept with digital material to present fewer barriers. If they want digital text, they should be able to get it!
Explore each area in turn … Visit case studies. Assemble facts and figures where possible. Draw out advantages and disadvantages of each piece of technology.
1) Copyright and copyright calculation
Fiendish stuff. 70 years after death is just the tip of the iceberg. Complexity slows down decisions. I would argue that complexity and risk upset risk-averse librarians: we are not hedge fund managers, we don’t like making controversial or bad decisions, especially not in high-profile institutions like Cambridge. But we do know we can only fully digitise what is in the public domain.
Copyright legislation as a set of rules into which data about a work is fed. Out comes a result (yes / no / probably). Sounds like a job for a machine, rather than a person …
Open Knowledge Foundation: Public Domain Works project / Europeana. Now exists as a machine-accessible API. Feed in bib data, get a result.
Out of the 100 samples, 76 returned an expected result given the data available; 24 were judged as incorrect. Out of these, a further eight could have been correct if a safe cut-off of publication date + 150 years was assumed. Great technology to potentially assist in decision making. As useful in asserting what is not in the public domain as what is. The data we can provide is incomplete for the whole task: author death date is incorrect,
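The "rules in, result out" idea from these notes can be sketched in a few lines. This is not the Open Knowledge Foundation or Europeana calculator, whose API is not described here; `BibRecord` and `public_domain_status` are hypothetical names, and the rules are deliberately reduced to the two mentioned in the notes: author's death plus 70 years, and publication date plus 150 years as a safe cut-off when the death date is missing. Real copyright law is far more involved.

```python
from dataclasses import dataclass
from typing import Optional

COPYRIGHT_TERM_YEARS = 70   # simplified rule: life of the author plus 70 years
SAFE_CUTOFF_YEARS = 150     # fallback from the notes: publication date + 150 years


@dataclass
class BibRecord:
    """Minimal bibliographic data fed to the calculator (hypothetical structure)."""
    title: str
    publication_year: Optional[int] = None
    author_death_year: Optional[int] = None


def public_domain_status(record: BibRecord, current_year: int) -> str:
    """Return 'yes', 'no' or 'probably' for a single work."""
    if record.author_death_year is not None:
        # Protection is assumed to last until the end of the 70th year after death.
        expiry_year = record.author_death_year + COPYRIGHT_TERM_YEARS
        return "yes" if current_year > expiry_year else "no"
    if record.publication_year is not None:
        # No death date: only very old publications are treated as likely public domain.
        if current_year - record.publication_year >= SAFE_CUTOFF_YEARS:
            return "probably"
    # Conservative default (an assumption): without evidence, treat the work as in copyright.
    return "no"


for rec in (
    BibRecord("Known death date, long ago", 1900, 1920),
    BibRecord("Known death date, recent", 1960, 1950),
    BibRecord("No death date, old publication", 1850),
    BibRecord("No death date, newer publication", 1900),
):
    print(rec.title, "->", public_domain_status(rec, current_year=2011))
```

The conservative default reflects the point made above: the calculation is as useful for flagging what cannot safely be digitised as for confirming what can, and it is only as good as the death and publication dates in the catalogue record.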
Ion Book Saver; Kirtas; IA Scribe machine; DIY book scanner.
Two in Cambridge at the Press. Used in the Cambridge libraries Collections project. CUP let me take a look!
But with a human watching just in case … Cost saving? Still quicker than ‘by hand’.
But images are also sent to India for a two-week tidy-up. Quick enough for on-demand?
Focus on improving access rather than preservation. Would a preservation-quality image be too expensive to produce for an on-demand approach? For the iPad and Kindle, text is as important as a scanned image.
91% of Cambridge academics surveyed would be interested in a full-text digital copy of an out-of-copyright work. 62% would be interested in a partial digital copy of an in-copyright work, if available.
Around 19% of CUL’s collections fall within the public domain. Niche interest in this area: 2% of circulation transactions affected material from 1850-1920. In fact, a lot of this has already been digitised: some 20% of our collection is held digitally by Hathi Trust (OCLC). Getting hold of this is not as straightforward as it should be. I looked at licensing and terms and conditions in the report.
Cheaper than current services … Total: £236.25 for 360 pages. But still not that cheap … About £25 for a 350-page work (cost modelling based around the Kirtas).
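To put the two figures on the same footing, a quick per-page comparison helps. The sketch below uses only the numbers quoted in the notes and assumes that £236.25 for 360 pages is the price of the current service while £25 for a 350-page work is the modelled cost of automated scanning; the function name `cost_per_page` is illustrative.

```python
def cost_per_page(total_pounds: float, pages: int) -> float:
    """Average cost in pounds for one page."""
    return total_pounds / pages


# Figures quoted in the notes (interpretation of which service is which is an assumption).
current_service = cost_per_page(236.25, 360)
automated_model = cost_per_page(25.0, 350)
ratio = current_service / automated_model

print(f"Current service: £{current_service:.2f} per page")
print(f"Automated model: £{automated_model:.2f} per page")
print(f"Current service costs about {ratio:.1f} times as much per page")
```

On these figures the modelled cost is roughly a ninth of the current per-page price, which is why it reads as cheaper than current services yet still well above what a reader might expect to pay.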
Survey information reveals that academic users would prefer to pay under £15 for a digitised copy. Achieving this at cost or with a small surplus would be a challenge. Attempting to recoup capital investment directly would push costs beyond this ‘sweet-spot’ price point. Should they have to pay at all? “I should add: I don't think the university should be charging academics for access to research materials. Seriously, the two questions about funds struck me as ridiculous.”
Final stage of the proposed workflow ….
Lots of interest in the machine: it gets people through the door, vital for any shop. Great for the long tail, not really suitable for The Da Vinci Code. Needs full-time staff to run, occasionally needs fixing. Most staff in the store can find and print off a book for customers. Takes 5-15 minutes to print, depending on the nature of the digitisation. (Quality is variable: PASS AROUND EXAMPLES.) Strong interest in self-publishing. Increasing amounts of material available from a variety of sources (Project Gutenberg, Google Books, publishers), both in and out of copyright.
65% would also be interested in a print facsimile. 42% of academic respondents would be willing to pay £10-£15, 33% £15-£25.
£10 per 350-page volume. Blackwell's have a pricing model that does not recoup capital.
High upfront cost: any model that attempts to recoup capital through charges prices itself out of the market. High upfront cost: high risk of failure. ‘Innovator's dilemma’: we are in effect in competition with our bread-and-butter services.
Aiming to hit a moving target of user expectation: iPad and Kindle have radically reshaped reader expectations of online reading. Danger of early adoption: not understanding or being aware of longer-term issues (ejournals).
Demand is high. Breakthrough technology, getting cheaper. If I’m going to reach a final conclusion, it's that we need to do something. Here are some options, but to me they don't quite stack up yet.
Google continues to digitise, despite legal setbacks, and gain the headlines. Users continue to digitise themselves … privately in research groups; ‘socially’ (http://library.nu/ and other academic torrent sites). Many in academia now choose to ignore or challenge inflexibilities of copyright to get the material they need.
Remove barriers: make it easier for people to get the material they need, free for them (or cheaply), in print or digital. Lower costs: new approaches, new models of working. The three new librarians (Andy Burkhardt, Champlain College, VT; Catherine Johnson, University of Baltimore; and Carissa Tomlinson, Towson University, MD) who presented on the “virtues” of next-gen librarians seemed to have an abundance of audacity, however, describing solutions to common problems, like how to engage first-year students in the library, with a little ingenuity. The next-gen virtues they identified include collegiality, playfulness, collaboration, flexibility, creativity, courage, and service-orientation, characteristics that must span the profession if we are to move our libraries ahead.