Maureen Pennock, Head of Digital Preservation, British Library
An overview of the challenges of preserving an ever-growing and complex set of digital collections and a presentation of the work of the Flashback project.
5. @BL_Labs @BL_DigiSchol @GLAM_labs #bldigital 5
Image credits: BBC_Micro.jpeg: Stuart Brady; derivative work: Ubcule (talk) - BBC_Micro.jpeg, Public Domain, https://commons.wikimedia.org/w/index.php?curid=11672213
Challenges
• Fragility of storage media
• Integrity & validation
• Proactive lifecycle management
Digital preservation is not simply a technical challenge.
It requires an ongoing and typically recursive series of actions and interventions throughout the lifecycle, to ensure continued & reliable access to authentic digital objects, for as long as they are deemed to be of value.
Whenever I give a talk about digital preservation, I more often than not start with this slide, because our digital collections are at the heart of our national digital library, and at the heart of what we do in digital preservation.
Our collection is amazing. It’s so diverse. And it’s huge. We have several petabytes’ worth of collection content, comprising millions and millions of unique files. These files contain content ranging from legacy demo discs and decades’ worth of conference proceedings to geospatial datasets, electoral registers, digitised newspapers, manuscripts and books, ebooks, ejournals, sound archives, moving images, e-theses, even the UK Web Archive. The list goes on and on.
But how do we go about preserving all of this content? What risks must we mitigate in order to ensure persistent and long-term access? How do we optimise workflows for efficient preservation at this scale? How are the needs of digital content different from those of analogue content? Who should be responsible for digital preservation? What’s the strategy? What’s the policy? What’s the system? What are the challenges?
Well, actually, we know the challenges quite well. We spell them out in our digital preservation strategy, which also outlines how we’ll go about preserving our digital collections and sustaining their value, not just for this generation of researchers but for countless future generations.
Now, you might recognise this. It’s a BBC Micro. They were very popular in the 1980s and could be found in classrooms across the UK, and even the occasional home. But no longer: they are now technologically obsolete, surpassed by newer technology and no longer supported.
Technological obsolescence is often regarded as the greatest threat to digital material. This is because, as technology changes, it becomes increasingly difficult to reliably access content that was created on, and intended to be accessed on, older computing platforms. Yet this is only the long-term view: in the shorter term we must also consider everything from media integrity and bit rot to digital rights management and metadata.
Other notable differences between analogue and digital content further add to the challenge:
Fragility of storage media: Storage media degrades. CDs, tapes, discs: they don’t always age as well as expected. This degradation can sometimes have catastrophic effects on the integrity of the content. Bit rot, for example, can prevent files from rendering correctly, if at all; this can happen without warning, and within just a few years (sometimes less) of the media being produced. If you’re interested in seeing how this can manifest, take a look at @flipbitbot on Twitter.
Integrity & validation: It’s much easier to make unnoticed changes to digital content than to traditional objects. Just think how easy it is to change a word in a file, or to move an object around on a page. Think, too, about the changes that can occur when a file is updated from one version of a format to another, which can happen automatically. Such changes may affect the authenticity and integrity of the object, and its reusability, particularly for scientific research. Malicious change must be prevented and appropriate change managed.
Proactive lifecycle management: We need to proactively manage digital content throughout the entire lifecycle. Ongoing technological advances and the fragility of digital content require preservation actions to be taken much earlier in the lifecycle than for traditional collections, and at a much greater frequency. A lifecycle management approach is needed to ensure appropriate actions are taken in good time, to avoid damage or loss.
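To make the integrity point concrete, here is a minimal sketch of a fixity check, assuming a checksum was recorded for each file when it was ingested. It is illustrative only and not the Library’s actual tooling: the manifest format and the function names `sha256_of` and `check_fixity` are hypothetical. Each file is re-hashed with SHA-256 and compared with the stored value, so a single flipped bit (bit rot) or an unnoticed edit shows up as a mismatch.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's SHA-256 hex digest, reading in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum has changed.

    `manifest` is a hypothetical mapping of relative path -> SHA-256 digest
    recorded at ingest; a real repository would keep this in its own metadata.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        item = root / "item.bin"
        item.write_bytes(b"digitised page")
        manifest = {"item.bin": sha256_of(item)}  # checksum recorded "at ingest"
        print(check_fixity(manifest, root))  # [] : file unchanged

        # Simulate a single flipped bit, as bit rot might cause.
        data = bytearray(item.read_bytes())
        data[0] ^= 0x01
        item.write_bytes(bytes(data))
        print(check_fixity(manifest, root))  # ['item.bin'] : change detected
```

A checksum only tells you that something changed, not what or why; in practice it would be paired with replicated copies so that a damaged file can be restored from a good one.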
Ultimately, digital preservation is not simply a technical challenge.
It requires an ongoing and typically recursive series of actions and interventions throughout the lifecycle, to ensure continued & reliable access to authentic digital objects, for as long as they are deemed to be of value. It’s not just about technology. It’s also about people. It’s about policy. It’s about resources. It’s about the collections. It’s about research – because we are always shooting at a moving target, and digital preservation is still an emerging discipline. And it’s about a strategy that brings all of that together to ensure persistent access over time.
Our current digital preservation strategy is very much focused on the replacement of our digital repository system. We currently manage content in a four-node replication system that we built and developed ourselves. We’ve recently procured a new system called Libsafe, and we’re in the process of working out how to migrate our collections into this new system and make our workflows more efficient.
Alongside Libsafe, we’ll be running a platform called IPS, the Integrated Preservation Suite. IPS is one of the products of our digital preservation research programme: a new system that will support preservation planning at scale. Designed to interface with any repository system, it comprises a technical registry, a software repository and a policy planning database, all accessed through a Preservation Workbench. When a risk alert is received, we can spring into action, initiating and implementing a preservation plan across all of the affected files (however many thousands or millions that might be!) to mitigate that risk and maintain reliable access to our collections.
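The risk-alert workflow described for IPS can be illustrated with a small sketch. This is not the design or API of IPS: the record types, the format identifiers and the policy table are hypothetical stand-ins for a technical registry and a policy planning database. The idea shown is simply that files are grouped by format once, so that when an alert arrives for a format, every affected file can be found with one lookup and assigned the action that policy specifies.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    file_id: str
    format_id: str  # hypothetical format identifier, as a technical registry might assign


@dataclass(frozen=True)
class RiskAlert:
    format_id: str
    description: str


def index_by_format(files: list[FileRecord]) -> dict[str, list[str]]:
    """Group file IDs by format so an alert can be resolved with one lookup."""
    index: dict[str, list[str]] = defaultdict(list)
    for record in files:
        index[record.format_id].append(record.file_id)
    return dict(index)


def build_plan(alert: RiskAlert, index: dict[str, list[str]],
               policy: dict[str, str]) -> list[tuple[str, str]]:
    """Return (action, file_id) pairs for every file affected by the alert.

    `policy` is a hypothetical table mapping a format to the preservation
    action chosen for it; formats without an entry are referred for review.
    """
    action = policy.get(alert.format_id, "refer for manual review")
    return [(action, file_id) for file_id in index.get(alert.format_id, [])]


if __name__ == "__main__":
    files = [
        FileRecord("item-001", "format-A"),
        FileRecord("item-002", "format-B"),
        FileRecord("item-003", "format-A"),
    ]
    index = index_by_format(files)
    policy = {"format-A": "migrate to a sustainable format"}
    alert = RiskAlert("format-A", "rendering software no longer supported")
    for action, file_id in build_plan(alert, index, policy):
        print(f"{file_id}: {action}")
```

Building the index up front is what makes the "however many thousands or millions" case manageable: resolving an alert costs one dictionary lookup rather than a scan of the whole collection.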
Another fantastic research project we have underway is called Flashback. This is a project to safeguard over 100,000 legacy disc-based items that the Library originally acquired on handheld media, many before the turn of the millennium, and process them for ingest to our new repository. Ultimately we want to make them available in our reading rooms using a scalable emulation-based system. We have a little way to go before that’s ready, but we’re on the way.
And that’s me done. This has only been a very quick introduction to digital preservation. There’s so much more I could have said. But I hope we’ve piqued your interest. If you want to know more, come and find us over the break.