1. Digitisation at the Wellcome Library: Lessons learned & shared.
Historical Newspapers in the Digital Age, Bolzano
October 2014. Dave Thompson, Digital Curator, Wellcome Library
2. The Wellcome Library
•Part of Wellcome Collection, an astonishing public venue in London developed by the Wellcome Trust, where people can learn more about medicine through the ages & across cultures.
•More than 10,000 readers visit us each year, including historians, academics, students, health professionals & consumers, journalists, artists & members of the general public.
3. Digitisation in the Wellcome Library
•Strategic approach, conscious planned decisions.
•Library transformation strategy, physical to digital.
•From ‘project’ to ‘production’.
•Digitisation as a sustainable end-to-end process.
4. Overview – four IT systems…
1.Workflow management system – ‘Goobi’ = PRODUCTION.
2.Digital object repository – ‘Preservica’ = STORAGE.
3.Front end - ‘the player’ = ACCESS.
4.Temporary & permanent storage for content = 70 TB.
6. Digitisation: Image upload
Images (digitised internally or externally) are imported into Goobi & normalised to JPEG2000.
7. Digitisation: Upload, ftp, harvesting
Content delivered by ftp can be automatically imported into Goobi & processed, or Internet Archive (IA) content can be automatically harvested.
8. Digitisation: METS/ALTO for access
Content is OCR’d & METS/ALTO files are created in Goobi, either manually or automatically.
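To make the role of ALTO concrete, here is a minimal sketch of reading the OCR text back out of an ALTO file. It relies only on the fact that ALTO stores each recognised word in the CONTENT attribute of a String element inside a TextLine. The sample document and the function names are illustrative assumptions, not output from Goobi.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment for illustration only.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="Medicine" HPOS="100" VPOS="200"/>
            <SP/>
            <String CONTENT="through" HPOS="190" VPOS="200"/>
          </TextLine>
          <TextLine>
            <String CONTENT="the" HPOS="100" VPOS="230"/>
            <SP/>
            <String CONTENT="ages" HPOS="140" VPOS="230"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the OCR text of each TextLine as one space-joined string."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_lines(SAMPLE_ALTO):
        print(line)
```

Matching on the local element name rather than a fixed namespace is a deliberate choice here, since the namespace URI differs between ALTO versions.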
11. Or from a different perspective…
[Diagram: flow of content into Goobi (METS/OCR) and on to Preservica. Labels shown:
Sources: In-house, Institutions, Contractors, Harvesting, Grey literature.
Formats & delivery: TIFF or JP2, PDF, HD & ftp.
Processing: Normalises TIFF to JP2 (manual or automatic), Jpylyzer validates JP2, Auto harvesting of JP2 & DMD, Snagging.
Roles: Project Managers, Ingest Officer, Digital Curator.]
12. Lesson 1 - Digitisation as a social activity
1.Digitisation is not a technical problem; it’s a social activity between creator & user.
2.Internally: Digitisation engages with all parts of the organisation, & draws on many different skills.
3.Externally: Engaging with (Between…?) creators & users, moving data into public realms, providing access.
http://www.emmanueladegbola.com/networking-leads/
13. Projects & workflows
1.Standardised processes to deal with differences in content & themes.
2.Use ‘projects’ & workflows to define activities & automated steps to handle material from transfer/acquisition to dissemination.
3.Projects & workflows allow us to manage our processes & to report activity.
http://www.amross.sd/
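One way to picture a workflow built from a small set of pre-defined steps is as plain data that can be both executed and reported on. The sketch below is an assumption for illustration: the step names are loosely taken from stages mentioned in these slides, and which steps are manual or automated is a guess, not a description of the Goobi configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    automated: bool


# Hypothetical workflow; names and automated flags are assumptions.
IMAGE_WORKFLOW = (
    Step("import images", automated=True),
    Step("normalise to JPEG2000", automated=True),
    Step("validate JPEG2000", automated=True),
    Step("create METS/ALTO", automated=True),
    Step("snagging", automated=False),
    Step("store and publish", automated=True),
)


def next_step(workflow, completed):
    """Return the first step whose name is not in `completed`, or None."""
    for step in workflow:
        if step.name not in completed:
            return step
    return None


def report(workflow, completed):
    """Summarise progress so activity can be reported per workflow."""
    done = sum(1 for step in workflow if step.name in completed)
    return {"done": done, "remaining": len(workflow) - done}


if __name__ == "__main__":
    finished = {"import images"}
    print(next_step(IMAGE_WORKFLOW, finished))
    print(report(IMAGE_WORKFLOW, finished))
```

Because every project shares the same step vocabulary, the same two functions serve all of them; only the tuple of steps changes.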
14. Standardised formats
1.Digitisation process built around a small number of formats.
2.Only accept, or create, TIFF or JPEG2000 image formats for digitisation; MPEG2 for video.
3.Share our JPEG2000 profile with creators & validate images at point of processing.
4.Standardised metadata format(s) for discovery (MARC) & retrieval (ALTO/JSON).
http://blog.absolutvision.com/en/jpeg2000-format/
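Accepting only a small number of formats can be enforced cheaply at the point of processing by inspecting the first bytes of each file. The sketch below checks the JP2 signature box and the classic TIFF byte-order marks. It is a first gate only, under the assumption that full profile validation is left to a dedicated tool; it does not check an image against a JPEG2000 profile, and the function name is hypothetical.

```python
# The 12-byte JP2 signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")

# Classic TIFF headers: little-endian ("II") and big-endian ("MM").
# BigTIFF uses a different version number and is not handled here.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")


def sniff_format(header: bytes) -> str:
    """Classify a file by its leading bytes as 'JP2', 'TIFF' or 'rejected'."""
    if header.startswith(JP2_SIGNATURE):
        return "JP2"
    if header[:4] in TIFF_MAGICS:
        return "TIFF"
    return "rejected"


if __name__ == "__main__":
    print(sniff_format(JP2_SIGNATURE + b"\x00" * 8))
    print(sniff_format(b"%PDF-1.4"))
```

In practice the header would be the first dozen or so bytes read from an uploaded file; anything classed as rejected never enters the workflow.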
15. Lesson 2 – It’s a strategic issue
1.Given the scale & complexity, clear strategic direction is essential.
2.Digitisation has to support an institution’s users & their information needs.
3.Digitisation has to be a strategic decision supporting an institution’s purpose.
4.Digitisation doesn’t change the mission of an organisation.
16. Industrialisation of processes
1.Digitisation built around a small number of formats. Workflows built around a small number of pre-defined steps.
2.Common workflow activities mean less system development; we can build our own processes.
3.Easier for humans to learn, less training, more certainty/reliability.
4.Industrialisation supports processes that are sustainable.
http://www.howtobeadad.com/2013/14723/unicorn-poop-how-i-fell-in-love-with-the-daughter-i-never-had
17. Lesson 3 – sustainability or bust
1.Digitisation has to be a sustainable process.
2.Processes have to be scalable to ambition.
3.Design, re-design & review processes constantly & integrate with existing services.
4.Digitisation as evolution, learn from what has been done, apply & move forward.
http://planetivy.com/gaming/25273/natural-selection-2-gaming-evolution-in-action/
18. Automation is key
1.Automation is essential to scalability & efficiency.
2.Within digitisation some activities are very susceptible to automation. Automate them.
3.Automation standardises processes. Good for life cycle management of data.
4.Automated processes maximise investment in digitisation & support scalability.
http://www.technibble.com/automating-computer-business-for-profit/
19. Automated harvesting of IA content
Content processed automatically, including creation of METS & ALTO.
Goobi has a ‘repository’ of IA identifiers for searching/harvesting.
Goobi harvests data from Internet Archive website.
Content available in the player.
Content stored in Preservica.
DDS creates JSON for the player & pre-caches some content.
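The idea of keeping a list of IA identifiers and harvesting from it can be sketched as follows. This is not how Goobi implements harvesting, which the slides do not describe. It only builds request URLs using the Internet Archive's public metadata and download URL patterns; the identifiers, the filename and the function names are hypothetical, and no network request is made.

```python
from urllib.parse import quote

IA_BASE = "https://archive.org"


def metadata_url(identifier):
    """URL of the JSON metadata record for one IA item."""
    return f"{IA_BASE}/metadata/{quote(identifier, safe='')}"


def download_url(identifier, filename):
    """URL of one file belonging to an IA item."""
    return f"{IA_BASE}/download/{quote(identifier, safe='')}/{quote(filename)}"


def plan_harvest(identifiers, already_harvested):
    """Pair each new identifier with its metadata URL.

    Order is preserved; duplicates and previously harvested items are skipped.
    """
    seen = set(already_harvested)
    plan = []
    for identifier in identifiers:
        if identifier not in seen:
            seen.add(identifier)
            plan.append((identifier, metadata_url(identifier)))
    return plan


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    queue = ["example-item-0001", "example-item-0002", "example-item-0001"]
    for identifier, url in plan_harvest(queue, {"example-item-0002"}):
        print(identifier, url)
```

Tracking what has already been harvested keeps the process safe to re-run, which matters once it is automated.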
20.
21. Lesson 4: Nothing without imagination
1.The power of digitisation can only be revealed if we can imagine the uses the data can be put to.
2.Digitisation is not an exercise in technology for its own sake.
3.There is nothing that cannot be achieved, but it takes more than kit, tools, computers, software.
4.Digitisation is about engaging with creators & consumers, with the data & with the future.
22. Digitisation is not a separate activity
•Starts with alignment with the institutional mission.
•Builds on strategic vision.
•Digitisation as a strategic activity, planned & supported.
•Integrate all institutional systems, bibliographic, IT & human.
http://ocdindia.com/
23. Lesson 5 – The complete package
1.Digitisation is much more than sticking stuff under a camera or on a scanner.
2.Digitisation has to be developed as a whole & complete end-to-end process.
http://veritusgroup.com/how-to-create-a-dynamic-strategy-for-every-single-donor-a-step-by-step-process/
24. So, lessons learned
•Digitisation is a social activity.
•Digitisation as a planned strategic activity.
•Digitisation has to be a sustainable & scalable activity.
•Automation is key.
•Nothing without imagination.
•Digitisation has to be a complete package.