1. What libraries can learn from Google –
and what they can do better
Günter Mühlberger
University Innsbruck Library
2. Agenda
• Introduction
• A story about digitisation
• The continuation of the story
• Some conclusions
3. Introduction
• Department for Digitisation and Digital Preservation
– Founded in 2002, 14 FTE, R&D and Digitisation Services
– Has coordinated several EU R&D projects in the digital library
domain since 1998
– Currently involved in several projects, e.g. IMPACT (mass digitisation
of textual material, text recognition and language technologies) and
Prestoprime (long-term preservation of audio-visual material); both
projects will set up a CoC
– Coordinator of the library network eBooks on Demand (EOD) with 30
member libraries in 13 countries: Digitisation on Demand service
• Several medium and large scale digitisation projects + respective
applications for searching, browsing, archiving
– Catalogue cards
– Newspapers and newspaper clippings
– Books and journals
• Our mission
– To make a valuable contribution to an up-to-date digital library
4. A short story
• January 2007
– Collection of 30.000 books from a monastery library
(“Servitenbibliothek”) donated to the library
– No spare shelves at the library for such a collection, since
a collection of German dissertations occupies the best
stacks
– Suggestion to get rid of the dissertations
– Decision to digitise first and then to throw them away
• During 2007
– Several experiments with document scanners, cutting of
the documents, workflows, etc.
5. Digitisation of dissertations
• 2008 – mid 2010
– Real production process with two parallel document scanners and
up to 70.000 pages per day, 50.000 pages as average
– Average of 2 minutes per dissertation (110 pages), including ALL
steps in the workflow
– Convincing scan quality: Tests show that OCR will be nearly perfect
– All extra pages (supplements, tables, etc.) are handled separately
– Cutting documents one at a time is too time consuming
– Change of paper quality
• Summer 2010
– We have processed 216.000 dissertations with 24 mill. pages,
1800 shelf meters
– 400 GB image data (TIFF, Group 4 compression, bitonal)
– Overall time invested: 8000 hours or 5 person years
– High quality industrial equipment for less than 50.000 EUR
– OCR tests on the 24 mill. pages are encouraging
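The figures on this slide can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are ours, decimal units (1 GB = 10^9 bytes) are assumed, and the "production days" value assumes every day ran at the stated average, which the slides do not claim.

```python
# Figures taken from the slide (European "." thousands separators removed).
dissertations = 216_000
pages = 24_000_000
hours_invested = 8_000
person_years = 5
image_data_gb = 400
avg_pages_per_day = 50_000

# Derived values used to check that the stated figures agree with each other.
pages_per_dissertation = pages / dissertations
minutes_per_dissertation = hours_invested * 60 / dissertations
kilobytes_per_page = image_data_gb * 1e9 / pages / 1e3  # assumes 1 GB = 10^9 bytes
hours_per_person_year = hours_invested / person_years
production_days = pages / avg_pages_per_day  # assumes the average held every day

print(f"pages per dissertation:   {pages_per_dissertation:.1f}")
print(f"minutes per dissertation: {minutes_per_dissertation:.2f}")
print(f"kB per page image:        {kilobytes_per_page:.1f}")
print(f"hours per person year:    {hours_per_person_year:.0f}")
print(f"production days:          {production_days:.0f}")
```

The results (about 111 pages and 2.2 minutes per dissertation, roughly 17 kB per page image) agree with the rounded values quoted on the slides.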
6. Continuation of the story
• How can we give access to this large collection?
– Copyright comes into play
• Investigations on Austrian copyright
– We are allowed to scan for preservation purposes. OK!
– We are allowed to store for preservation. OK!
– We are allowed to print out a copy and use it in place of the
original we held before digitising. Hm!
– We are allowed to use this copy for interlibrary loan, but need to
get it back. Oops!
– We are not allowed to make them available to the public. OK!
– We are not allowed to make them available to our researchers and
students at the university. Oops!
– We are not allowed to make them available to other libraries
owning the same dissertations. Pff!
– We are allowed to provide access on a handful of dedicated
computers at the library. Hmm!
7. Some more considerations
• “Making available” is a new kind of use
– Copying, distribution, translation, exhibiting, etc. are traditional
forms of use, and publisher contracts cover them
– In 2003 (following the EU Directive on Copyright from 2001) a new
kind of use was introduced: “making available”
– Since this is a new right, “old” contracts (usually) do not cover
it.
– The author is therefore the right holder, not the publisher.
– In some countries it is more complicated (e.g. Germany) but as a rule
of thumb most authors in Europe still have the right to decide by
whom, when and how their digitised work will be made available to
the public
• Dissertations
– Even simpler since no publishers or RROs are involved
– Dissertations were printed on behalf of the authors, never distributed
via the book market
8. Our approach to copyright
• Let the social Internet work for us
– Dissertations will be made available online, but only title page,
table of contents and abstract/introduction will be shown to
everyone
– Under discussion: Maybe also some more pages and search
snippets
– Readers will get the chance to write a short “Request”: I need
this book for my research, etc.
– Readers will be encouraged to contact potential right holders (“Do
the diligent search for us”)
• Registration mechanism
– A large banner will appear: If you are the author, or if you know
the author/right holder, please help us!
– Authors will need to register (contact details), set some
options and confirm their statement
9. Authorisation
• Copyright options
– They may want to make a general statement: Open Access,
Creative Commons, All rights reserved
– Cooperation with an authors’ organisation (RRO) would make sense
– Or they may want to make a specific statement: This library is
allowed to do this and that. Then it is a simple bilateral,
non-exclusive contract.
• How to identify the right holder?
– Digital signatures or eCards would make life much easier.
• Current plan:
– Author provides address.
– The author receives a letter with a list of TAN codes, which will
be needed for any action within the system.
– If the author chooses to “reserve all rights”, the data are
transferred to the RRO(s)
– A minimal risk remains but is negligible
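The current plan above can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: the slides do not say how TAN codes are generated or checked, so the six-digit format, the single-use rule, the option names and the names `generate_tans` and `AuthorAccount` are all hypothetical.

```python
import secrets


def generate_tans(count=10, digits=6):
    """Create `count` distinct numeric codes for the letter sent to the author.

    The code length and count are assumptions, not taken from the slides.
    """
    tans = set()
    while len(tans) < count:
        tans.add(f"{secrets.randbelow(10 ** digits):0{digits}d}")
    return sorted(tans)


class AuthorAccount:
    """Hypothetical record for a registered author."""

    def __init__(self, name, postal_address, tans):
        self.name = name
        self.postal_address = postal_address  # the TAN letter is mailed here
        self._unused_tans = set(tans)
        self.rights_statement = None
        self.forward_to_rro = False

    def set_rights_statement(self, tan, statement):
        """Record a rights statement; every action consumes one TAN."""
        if tan not in self._unused_tans:
            raise PermissionError("unknown or already used TAN")
        self._unused_tans.remove(tan)
        self.rights_statement = statement
        # Per the plan, data go to the RRO(s) only if all rights are reserved.
        self.forward_to_rro = statement == "all-rights-reserved"


# Usage: register, send the letter, then confirm a statement with one TAN.
letter = generate_tans(count=5)
author = AuthorAccount("A. Author", "Example Street 1, Innsbruck", letter)
author.set_rights_statement(letter[0], "open-access")
```

Sending the codes by post ties the online account to a physical address, which is the stand-in for the digital signatures or eCards mentioned above.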
10. Our dream
• We hope
– That it becomes self-sustaining: those who need the
information will convince those who hold the rights to
provide free access, or at least to grant some access rights
to libraries
– That authors will understand why it is so important that
libraries digitise current material and provide access to
everyone
– That users will understand that authors have rights
(copyright and personal rights) which need to be respected
– That RROs and publishers will understand that not everyone
is interested in “making money with books written 30 years
ago” but that many are also willing to support the idea of
open access
– That thousands and tens of thousands of authors and readers
will take part
11. What we can learn from Google
• Mission of Google is to organise the information universe based on
technological innovation
– Therefore books are highly important (they contain much better information
than websites)
– Digitisation of books was just one step towards the overall objective
• If you have a mission, do the first step first and afterwards sort out the
problems
– Organise the cheapest way to scan, build your own machines, workflow, etc.
– Make a reasonable compromise between quantity and quality
– Be innovative (take what is here but put it together in a new way)
• Convert problems into opportunities
– Google almost certainly underestimated the impact of copyright
– Settlement was probably not foreseen from the very beginning, but now it is a
great business opportunity for them
– If it comes, it will allow them to make a lot of money
• The battle over books is won or lost in the 20th century, not in the
public domain
– Who reads books from the 19th, 18th or 17th century?
12. What libraries can do better
• Libraries also need to follow their mission: to preserve the
intellectual heritage of mankind and to provide free access
to everyone
– Google is not a library
– It does many things as if it were a library (and better), but it will
never become a library
– Preservation comprises analogue AND digital preservation (they go
hand in hand)
• to digitise (collect) everything
– Libraries are collection holders, not Google or anyone else
– Digitisation (and everything connected with it) has to be part of
daily business, not only of projects
– Digitisation should be twofold: on-demand AND via mass
digitisation (including cutting of documents and 20th century
material)
– A natural consequence is to also collect modern material in digital
format (right from the beginning, pre-press files)
13. What libraries can do better
• to cooperate among each other (nationally and internationally)
– Most libraries have the same books, even duplicates within an
institution
– Swedish books in Austria, German books in Sweden, etc.
– Open access material will no longer belong to one library, but to
everyone!
– Therefore it definitely makes sense to cut one book and store the
pages in both digital and analogue form (acid-free box)
• to involve readers (and right holders)
– Libraries have a “natural authority” which needs to be exploited as a
market advantage
– Libraries are much nearer to authors and readers than anyone else, but
they need to give them the chance to express themselves
– They may be slow, old-fashioned and not at the technological
forefront, but they are trusted organisations and are able to mobilise
thousands or even hundreds of thousands of users