The Bodleian Library is upgrading its digitization systems and workflow to accommodate larger digitization projects and increased production volumes. It is transitioning from a mixture of bespoke and legacy systems to the Goobi workflow management system running on a new dedicated server cluster and network infrastructure. The Goobi system has entered final testing and the new hardware is being built and tested. Key challenges have included developing consensus on metadata standards, integrating disparate systems, and ensuring adequate system performance and data transfer capabilities. Further integration work is still needed to fully replace all existing systems.
2. Background
◦ Existing long-running and very experienced digitisation studio.
◦ Primarily low-volume, very high-quality work. Special collections material.
◦ Some project-funded larger scale projects, but not in the recent past.
3. Existing systems
A mixture of bespoke applications, and a diverse mix of technologies:
•MySQL
•MS Access
•VBA
•Perl
•PHP
•Python
•Windows batch files
•ImageMagick
•Shell scripts / cron
4. ‘Systems’ limitations
Physical hardware nearing end of lifetime.
Physical hardware performance inadequate for existing production volume.
Network limitations.
Commercially supported software at or past end of lifetime.
Bespoke or locally developed software past end of lifetime, and not suitable for incremental upgrade and revision.
Lack of in-house resources to build a completely new workflow system from scratch.
Poor or non-existent documentation.
5. Project work and ‘mass’ digitisation
Newly funded major digitisation projects:
•Polonsky Foundation: 500,000 images (3 years) – Greek & Hebrew manuscripts and incunabula.
•Chinese: 1,000,000 images.
Need to substantially increase production, while maintaining quality.
Existing systems already inadequate for current production levels.
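As a rough sense of scale, the Polonsky figure alone implies a sustained daily rate. The sketch below only does the arithmetic; the 250 working days per year is an assumption made for illustration, not a project figure.

```python
# Rough daily throughput implied by the Polonsky project figures above.
# WORKING_DAYS_PER_YEAR is an assumption for illustration, not a project figure.
POLONSKY_IMAGES = 500_000
PROJECT_YEARS = 3
WORKING_DAYS_PER_YEAR = 250  # assumed


def images_per_working_day(total_images, years, working_days_per_year):
    """Average number of images that must be produced each working day."""
    return total_images / (years * working_days_per_year)


polonsky_daily_rate = images_per_working_day(
    POLONSKY_IMAGES, PROJECT_YEARS, WORKING_DAYS_PER_YEAR
)
print(round(polonsky_daily_rate))  # about 667 images per working day
```

This covers the Polonsky project only; the Chinese project would add to it, but no duration is given for that project, so it is left out.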
6. Solution
Software workflow:
◦ Goobi – phased introduction. Phase 1: ‘large’ projects only, Phase 2: smaller commercial orders.
New hardware infrastructure:
◦ Dedicated server cluster (virtualised)
◦ Upgraded network infrastructure
◦ Custom-built from the ground up to support high-volume digitisation.
Repository:
◦ ‘Databank’
Delivery:
◦ Digital.Bodleian
◦ Viewer.Bodleian
7. Current State of Play
Software workflow:
◦ Goobi – Entering final testing phase, prior to roll-out.
New hardware infrastructure:
◦ Dedicated server cluster (virtualised on dedicated hardware) – In build and test.
◦ Upgraded network infrastructure – Nov. 2014 [move to a new building]
◦ Custom-built from the ground up to support high-volume digitisation.
Repository:
◦ ‘Databank’ – In production.
Delivery:
◦ Digital.Bodleian – ‘Soft’ launch; not yet fully public.
◦ Viewer.Bodleian – In production. Version 1.
8. Goobi workflow (1)
Create process
Insert UUID and export path [as process properties]
Order and check physical item
Photography
TIFF verification [JHOVE2]
JPEG generation
JPEG verification [JHOVE2]
QA
JPEG 2000 creation [Kakadu + Python]
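Two of the automated steps above can be sketched in Python: generating the UUID and export path that are stored as process properties, and wrapping Kakadu for the final conversion. This is a minimal sketch, not the scripts actually used. The export root, the path layout, the function names and the compression rate are all hypothetical. The JHOVE2 verification steps are left out because they depend on local configuration.

```python
import uuid
from pathlib import PurePosixPath

EXPORT_ROOT = PurePosixPath("/export")  # hypothetical export root


def make_process_properties(export_root=EXPORT_ROOT):
    """Generate a UUID and derive an export path from it (hypothetical layout)."""
    item_uuid = str(uuid.uuid4())
    return {"uuid": item_uuid, "export_path": str(export_root / item_uuid)}


def kdu_compress_command(tiff_path, jp2_path, rate="1.5"):
    """Build (but do not run) a Kakadu kdu_compress command line.

    -i, -o and -rate are standard kdu_compress arguments; the rate value
    here is a placeholder, not the studio's actual setting.
    """
    return [
        "kdu_compress",
        "-i", str(tiff_path),
        "-o", str(jp2_path),
        "-rate", str(rate),
    ]


props = make_process_properties()
command = kdu_compress_command("master.tif", props["export_path"] + "/master.jp2")
```

Building the command as a list keeps it easy to log and test; on a machine with Kakadu installed it could then be passed to `subprocess.run(command, check=True)`.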
10. Problems / Lessons learned
Metadata ‘ruleset’:
•Difficulties getting consensus from disparate groups of stakeholders, e.g. curators and technical specialists.
•Information gathering / consultation is time-consuming, and the returns are poor.
Systems integration:
•Difficulties integrating with elements of our own systems where no ‘out-of-the-box’ or standard solutions exist.
Systems performance:
•Networking bandwidth
•Server loads
•Working storage for ‘in-flight’ data.
•Efficient ‘pipe’ to final repository.
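The performance points above lend themselves to back-of-envelope sizing. The sketch below shows one way to estimate working storage and daily transfer time. Every input value (images per day, file size, days in flight, link speed, usable fraction of the link) is a hypothetical placeholder chosen to show the calculation, not a measurement from this project.

```python
def working_storage_tb(images_per_day, mb_per_image, days_in_flight):
    """Working storage (TB) for images that have not yet reached the repository."""
    return images_per_day * mb_per_image * days_in_flight / 1_000_000  # MB -> TB


def transfer_hours(total_mb, link_mbit_per_s, efficiency=0.7):
    """Hours needed to move total_mb over a link at an assumed usable fraction."""
    usable_mb_per_s = link_mbit_per_s / 8 * efficiency
    return total_mb / usable_mb_per_s / 3600


# Hypothetical inputs, chosen only to show the calculation.
IMAGES_PER_DAY = 1_000
MB_PER_IMAGE = 100
DAYS_IN_FLIGHT = 10

storage_tb = working_storage_tb(IMAGES_PER_DAY, MB_PER_IMAGE, DAYS_IN_FLIGHT)
daily_transfer_hours = transfer_hours(
    IMAGES_PER_DAY * MB_PER_IMAGE, link_mbit_per_s=1_000
)
print(storage_tb)  # 1.0 TB with these placeholder inputs
```

Substituting real file sizes and turnaround times shows quickly whether storage or the network link is the tighter constraint.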
11. Ongoing problems / work remaining
Goobi only replaces part of our existing workflow.
Further development needed to integrate with online ordering, order/customer tracking, and billing systems.
Further development needed to integrate with secure delivery mechanisms for commercial orders.
Possible integration with other library systems and resources.