Presented by Sarah Grimm (Wisconsin Historical Society) and Emily Pfotenhauer (WiLS) for the WiLSWorld conference, Madison, Wisconsin, July 24, 2013. Content based on Modules 1 & 2 of the Digital Preservation Outreach and Education (DPOE) Baseline Digital Preservation Curriculum developed by the Library of Congress.
Managing Digital Content Over Time: Identify and Select
1. Managing Digital Content Over Time
Sarah Grimm, WHS
Emily Pfotenhauer, WiLS
Slides and handouts:
recollectionwisconsin.org/wilsworld2013
Supported by WHRAB
3. DPOE Mission
The mission of the Digital Preservation Outreach and Education (DPOE) program of the Library of Congress is to encourage individuals and organizations to actively preserve their digital content, building on a collaborative network of instructors, contributors, and institutional partners.
4. Six Training Modules
Identify - what digital content do you have?
Select - what portion of that content is your responsibility to preserve?
Store - how should your content be stored for the long term?
Protect - what steps are needed to protect your digital content?
Manage - what provisions are needed for long-term management?
Provide - how should your content be made available over time?
5. What is Digital Content?
Digital content is any content that is published or distributed in a digital form, including text, data, sound recordings, photographs and images, motion pictures, and software.
◦ Digital materials created from analog sources
◦ Born-digital content
Digital materials you currently have – or expect to acquire or create – that you want to preserve.
6. What’s the Problem?
Increasing amounts of digital assets are arriving on our doorstep or being created by us
The digital assets arrive in all formats and on all kinds of media
Time sensitivity - the longer we (or our donors) wait, the greater the chance that something will be unreadable
7. Digital Reality in 2013
Everyone is
◦ creating digital content
◦ distributing digital content
◦ using digital content
And we are responsible for managing digital content now, or expect to be in the near future
8. What are the Challenges?
Who takes the lead?
What can I do?
Where do I start?
The impediments
Too complex (I don’t understand...)
Too daunting (I don’t have time...)
Too technical, etc. (Computers scare me...)
10. Digital Preservation
Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change.
The goal of digital preservation is the accurate rendering of authenticated content over time.
Working Group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007
11. Why Do We Identify Content?
Not all digital content can or should be preserved
Preservation requires an explicit commitment of resources
Good preservation decisions are based on an understanding of the possible content to be preserved
12. First Steps
• Identifying content is a first step to planning for current and future preservation needs
• Ask: what content
do I have,
will I have,
might I have,
must I have?
An inventory is the best way to identify what content you have now – and raise awareness in your institution.
14. If not, do you need permission to begin an inventory project?
15. Inventory Considerations
The content of an inventory is more important than its style and format
Inventory results should be:
◦ Documented: an inventory should actually exist
◦ Usable: use a simple format that you can sort, list, etc.
◦ Available: accessible to others
◦ Scalable: content will be added during Select
◦ Current: update periodically
16. Inventory Tips
Don’t let implementing the software become the focus.
Use software you know and have available
Stick with a single format; don't change once you've decided on it.
Be consistent, comprehensive, and concise
17. How Much Detail to Include
Inventories can range from general to detailed
Determine the appropriate level of detail for you
Factors in determining level of detail:
◦ Extent of content to be inventoried
◦ Nature & location of content
◦ Resources available to complete the inventory
◦ Timeframe & deadlines for completion
18. What Do You Have?
Identify collections of digital materials.
Provide a brief title and description
Estimated growth over time
19. Who Manages It?
Department – currently managing the collection/digital content
Staff – primary people responsible
Creator (Internal or External) – who created the digital content
20. What Does It Consist Of?
Medium (6 CDs, 1 hard drive)
Extent = Format + Amount (600 PDFs, 30 .doc files)
File Size (MB, GB, TB)
http://www.csgnetwork.com/memconv.html
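The Extent and File Size fields can be filled in by hand, but for content already sitting on a drive a short script can do the counting. The sketch below, assuming Python 3 and a folder you have read access to, tallies files by extension (the "Format + Amount" pair) and totals their size. The names `human_size` and `summarize_extent` are hypothetical, and the sketch uses binary units (1 KB = 1,024 bytes) by default; pass `base=1000` if you prefer decimal units.

```python
import os
from collections import Counter
from pathlib import Path

UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_size(n_bytes, base=1024):
    """Format a byte count as bytes/KB/MB/GB/TB (binary units by default)."""
    size = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the number is small enough, or at TB.
        if size < base or unit == UNITS[-1]:
            if unit == "bytes":
                return f"{int(size)} bytes"
            return f"{size:.1f} {unit}"
        size /= base


def summarize_extent(root):
    """Count files by extension and total their size for everything under root.

    Returns (Counter mapping extension -> file count, human-readable total size).
    """
    counts = Counter()
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Normalize case so .PDF and .pdf are counted together.
            counts[path.suffix.lower() or "(no extension)"] += 1
            total_bytes += path.stat().st_size
    return counts, human_size(total_bytes)
```

Calling `summarize_extent("path/to/collection")` (a placeholder path) returns the counter and the total; for a folder like the slide's example, the counter would hold 600 `.pdf` and 30 `.doc` entries.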
21. Date Considerations
Inventories should note:
• Date of inventory and updates to it
• Dates associated with the content (1872-1901)
• Date of files – created or modified (2009)
• Date received – if relevant / possible (2011)
22. Content Location
Locations of content are important:
• List primary locations (Network drive location, Hard drive on Bob’s shelf)
• List locations of all backups/copies (CDs in the storage room, weekly backup tapes)
Remember to update locations as content moves
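The fields from the preceding slides (what you have, who manages it, what it consists of, dates, and locations) fit naturally into one spreadsheet row per collection. As a minimal sketch, assuming Python's standard `csv` module and column names made up for illustration, an inventory can be kept as a CSV file that any spreadsheet program can sort and filter, in keeping with the "Usable" and "Scalable" goals for inventory results. The example values are the illustrations from the slides; the collection title is hypothetical.

```python
import csv
import io

# Hypothetical column names, one per field on the inventory slides.
FIELDS = [
    "collection_title", "description", "estimated_growth",  # What do you have?
    "department", "staff", "creator",                        # Who manages it?
    "medium", "extent", "file_size",                         # What does it consist of?
    "inventory_date", "content_dates", "file_dates", "date_received",  # Dates
    "primary_location", "backup_locations",                  # Content location
]


def write_inventory(rows, out):
    """Write inventory rows (dicts keyed by FIELDS) to a text stream as CSV.

    Fields left out of a row become empty cells; an unknown (e.g. misspelled)
    field name raises ValueError, which helps keep the inventory consistent.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_inventory(stream):
    """Read the CSV back as a list of dicts, e.g. to sort or filter it."""
    return list(csv.DictReader(stream))


# Example row built from the slides' illustrations; the title is hypothetical.
example = {
    "collection_title": "Example digitized collection",
    "medium": "6 CDs, 1 hard drive",
    "extent": "600 .pdf, 30 .doc",
    "content_dates": "1872-1901",
    "file_dates": "2009",
    "date_received": "2011",
    "primary_location": "Network drive",
    "backup_locations": "CDs in the storage room; weekly backup tapes",
}

buffer = io.StringIO()
write_inventory([example], buffer)
print(buffer.getvalue())
```

To keep the inventory on disk instead, open a file with `open("inventory.csv", "w", newline="")` (the file name is a placeholder; `newline=""` is what the `csv` documentation recommends) and pass that file object as `out`.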
23. Analyze the Results
When the inventory is complete, ask yourselves what digital content
◦ do we have that we didn’t know about?
◦ should we be keeping that we aren’t now?
◦ will we create or likely acquire in the future?
◦ are we required to keep?
◦ do we need to review?
24. Goals
Identify potential digital content you may need to preserve
Treat the inventory as a management tool that grows as your preservation program grows
Use it as a planning tool, e.g., for staffing, training, and annual growth
Use it as a basis for acquiring content and for defining submission agreements and plans
26. Six Training Modules
Identify - what digital content do you have?
Select - what portion of that content will be preserved?
Store - how should your content be stored for the long term?
Protect - what steps are needed to protect your digital content?
Manage - what provisions are needed for long-term management?
Provide - how should your content be made available over time?
27. Why select content to preserve?
Log jam on the St. Croix River, 1886
Wisconsin Historical Society WHi-2364
28. Why select content to preserve?
● Cost: storage may be cheap, but management is not, especially over time
● Discovery and dissemination services: scale, scope, performance, sustainability
● Quality of content may be variable
● Matching mission to content
29. Basic Steps
Review your potential digital content (go back to inventory)
Define - then apply - selection criteria
Document (and preserve) selection decisions
Implement your decisions (Store, Protect, Manage, and Provide modules)
Picking fruit
Wisconsin Historical Society WHi-67733
30. What criteria should be used to select digital content for preservation?
Postal workers sorting mail, 1955
Wisconsin Historical Society WHi-36392
31. Selection Criteria
Mission: Scope of Collections, Collecting Policies
Records retention manuals/policies (internal or externally mandated)
Legal & ethical requirements (professional bodies; your stakeholders; future users)
Uniqueness (only source or preserved elsewhere? Avoid duplication)
Value (historical, evidential, can’t reproduce?)
32. Practical Considerations
Stop if or when the answer is NO
● Content
– Does the content have long-term value?
– Does it fit your scope and mission?
● Technical
– Is it feasible for you to preserve the content?
● Access
– Is it possible to make the content available?
– Are you the only holder of this content?
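The "stop if or when the answer is NO" rule is a simple decision tree. A minimal sketch, assuming each question has already been answered yes or no by staff; the names `QUESTIONS` and `review` are hypothetical, and a real review would also record the reasoning behind each answer, not just the yes/no. The speaker notes treat the last question (duplication) as a factor to weigh rather than a hard stop, so you may prefer to drop it from the list.

```python
# The slide's questions, asked in order; wording adapted from the slide.
QUESTIONS = [
    ("Content", "Does the content have long-term value?"),
    ("Content", "Does it fit your scope and mission?"),
    ("Technical", "Is it feasible for you to preserve the content?"),
    ("Access", "Is it possible to make the content available?"),
    ("Access", "Are you the only holder of this content?"),
]


def review(answers):
    """Apply 'stop if or when the answer is NO' to one body of content.

    answers: one True/False per question, in QUESTIONS order.
    Returns None if every answer is yes; otherwise the (category, question)
    pair where the review stopped.
    """
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(answers)}")
    for question, answer in zip(QUESTIONS, answers):
        if not answer:
            return question  # stop at the first NO
    return None
```

For example, `review([True, True, False, True, True])` stops at the Technical question and returns `("Technical", "Is it feasible for you to preserve the content?")`.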
33. Setting Priorities
Ask yourself which digital content is
● most significant to your organization?
● most extensive?
● most requested/used?
● easiest?
● oldest?
● newest?
● mandated?
● at risk?
34. Include Creators in the Process
● Communication is key, particularly when content comes from external creators
● Keep content creators in the conversation
● Arrange a convenient time for them to talk about your preservation plans
● Identify a list of materials to review with them
● Document the results and send them a copy
35. Selection Documentation
Supplement your inventory with more detailed information about the material you plan to preserve over the long term.
Use
◦ What’s the lifespan of the content?
◦ Will its value/use change over time?
◦ Retention period
36. Access and Rights
Access
◦ How will the public access the content?
◦ Is access restricted? How? For how long?
Rights
◦ Who owns the rights to preserve and disseminate?
37. Prioritizing
Data criticality
◦ Is it only in digital form? Do we hold the only copy?
Business/mission criticality
◦ If we lose it, what’s the damage to our reputation? How will it impact our function or services?
39. Goals/Outcomes
• Expanded inventory of content to preserve
…and what you can delete (gray areas identified)
• Agreements with content creators, e.g., submission agreements, retention schedules
• Well-defined and documented selection criteria, policies and procedures
• Better understanding of content for future planning and growth
Greater knowledge = greater control!
42. File Naming
Why is this important?
◦ To prevent accidental overwriting
◦ To help you find it again
Train Wreck Image ID: WHi-2011
Don’t use special characters in your file/folder titles
(^”<>|? / : @’* &.)
Just because you CAN doesn’t mean you SHOULD...
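A quick way to apply this advice to an existing folder is to check each name against the characters listed on the slide. This sketch, assuming Python 3, treats the slide's characters (with both straight and curly quotes) as the set to avoid, and reads the trailing period in the list as "no periods except the one before the extension"; that reading and the name `naming_problems` are assumptions made for illustration. It only reports problems; renaming is left to a person, which also avoids accidental overwriting.

```python
# Characters the slide says to avoid: ^ " < > | ? / : @ ' * & (straight and curly quotes).
AVOID = set('^"\u201c\u201d<>|?/:@\'\u2018\u2019*&')


def naming_problems(name):
    """Return a list of problems in one file or folder name (empty list = OK)."""
    problems = []
    bad = sorted(set(name) & AVOID)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    # The slide also lists '.', read here as: no periods except before the extension.
    stem = name.rsplit(".", 1)[0]
    if "." in stem:
        problems.append("extra period before the extension")
    return problems
```

For example, `naming_problems("April.Minutes.docx")` flags the extra period, while a name like `2013-04-board-minutes.docx` passes with no problems.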
43. Resources
State Library of North Carolina –
◦ Web
http://www.archive.org/details/WhyFileNamingIsImportant
http://www.archive.org/details/HowToChangeAFileName
http://www.archive.org/details/WhatNotToDoWhenNamingFiles
http://www.archive.org/details/WhatToDoWhenNamingFiles
◦ YouTube
http://digitalpreservation.ncdcr.gov/tutorials.html
44. File Management
Store similar digital items together
◦ Co-locate in a central location
Don’t bury items in multiple levels
Get rid of easy-to-purge items
◦ Rescued or recovered documents
◦ Empty file folders
◦ ~.tmp files
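Easy-to-purge items such as empty folders and temporary files can be listed automatically before anyone deletes anything. A minimal sketch, assuming Python 3; the function name is hypothetical, and the rule for temporary files (names ending in `.tmp` or starting with `~`, such as the temporary and lock files some word processors leave behind) is an assumption, not part of the curriculum. It reports only folders with nothing in them at all, and it deletes nothing.

```python
import os
from pathlib import Path


def find_purge_candidates(root):
    """List empty folders and likely temporary files under root.

    Returns (empty_folders, temp_files) as sorted lists of Paths.
    Nothing is deleted: the lists are meant for a person to review.
    """
    empty_folders, temp_files = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # A folder with no subfolders and no files is empty.
        if not dirnames and not filenames:
            empty_folders.append(Path(dirpath))
        for name in filenames:
            # Assumed rule: ~-prefixed names and .tmp files are temporary.
            if name.startswith("~") or name.lower().endswith(".tmp"):
                temp_files.append(Path(dirpath) / name)
    return sorted(empty_folders), sorted(temp_files)
```

A folder that contains only empty subfolders is not itself reported; running the check again after those subfolders are removed will catch it.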
45. File Management
Make decisions about what NOT to keep
◦ File backups/copies/drafts
◦ Supplementary files that provide no additional long-term value
◦ Corrupted files
◦ File Formats
Leave breadcrumbs
Determine what you don’t know
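Exact copies (the same file saved in two places, or attached to an email and saved again) can be found by comparing checksums. This technique is an addition, not something the curriculum prescribes: a minimal sketch, assuming Python 3's `hashlib` with SHA-256 and a hypothetical function name. It finds only byte-for-byte identical files; drafts that differ even slightly will not match, so deciding which copy to keep remains a human decision.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root that have identical contents.

    Returns a sorted list of groups; each group is a sorted list of 2+ Paths.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            by_digest[sha256_of(path)].append(path)
    # Keep only checksums shared by more than one file.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)
```

Each group in the result is a set of candidates for review; keeping the copy in the agreed central location and noting the decision fits the file management advice above.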
Just to start out, we will talk a bit about the upcoming presentation. The curriculum in this program was developed by the Library of Congress as part of its Digital Preservation Outreach and Education program. DPOE is part of a national effort to encourage individuals and organizations to actively preserve their digital content. DPOE is training individuals in each region across the US to help people such as yourselves who are dealing daily with ever-increasing amounts of digital content. The first 4-day National DPOE Train-the-Trainer workshop was held at the Library of Congress in September 2011 with people from each region in the US. Since then, additional programs have been held in Indiana, Illinois and Alaska. DPOE National Trainer Network: 63 participants in Train-the-Trainer programs across the US.
The trainings are built around these 6 modules. The modules are not necessarily designed to provide specific technological solutions, but they are designed to make sure the right questions are being asked at each stage.
Identify – Covers the creation of a scalable inventory. You need to understand what you have now and what you expect to have in order to plan for the long term.
Select – Not all digital content you currently have should be kept.
Store – Looks at long-term storage requirements and possible options for meeting those requirements. How can you use that information to develop long-term storage management policies?
Protect – Looks at protecting content over time (physical degradation of the bits and bytes) as well as physical protection of the systems themselves (disaster recovery, server access).
Manage – Focuses on preservation planning (policy development, planning, training, funding).
Provide – Covers access policies, intellectual property issues, and planning for technological access to the items.
Anything we may encounter in a digital form is going to fall under digital content. This encompasses anything that comes our way and that we will have to think about preserving for a long period of time. There are really two sources of this digital material:
Those created from physical sources that we turn into digital items – digitization of maps / documents
Those that are created digitally – items on our computers, digital photos
A lot of times we are charged with not only holding and preserving that digital content, but also making it available to the public. But this ever-increasing digital world is presenting us with new challenges. So what are those problems?
Who – Is it me? Doesn’t someone in IT do that?
Too Complex – I don’t get digital objects; there are too many types and formats.
Too Daunting – I’m the only one doing this and I only do this part time. How am I going to manage/organize all of this stuff?
Too Technical – This uses equipment I don’t understand and don’t have time to learn.
But all of this needs to be dealt with, because doing nothing could mean losing things forever.
We are looking to digital preservation for an answer because we realize that being in digital form is not the same as being digitally preserved. Digital preservation is active management of digital content over the long term, with access as its ultimate goal. With books or documents, we can read them, put them on the shelf, and continue to open and read them for decades with proper handling. However, once something is digitized, we can’t expect to set it aside and then open it in 10 years, much less 50, without active management. We must find ways to ensure that the digital item remains accessible. In order to determine how we are going to preserve something, we must first understand what we have. We must IDENTIFY it.
As stated earlier, the volume and kinds of digital materials we create or inherit are growing. Much of it is useful and even necessary for our work, but much of it is not. Think of the string of e-mails created as people go back and forth discussing a topic. Or different drafts of a document. Or various copies of a digitized object as we try to get it “just right”. How much of that is really worth saving for posterity? The Identify stage helps us figure out what content we have, so we can determine what needs to be kept. Good digital preservation requires an explicit commitment of resources, which, for most organizations, means planning ahead. If you don’t know the extent of the problem, you don’t know what resources you need. The first step in planning for digital preservation is to know where you stand with regard to your digital assets.
And if so, who would you need to get that from?
Ask the audience: who has an inventory?
Ask if anyone currently has an inventory and what software is being used
Refer to the handout here.
Nature and Location – Is all the information onsite, or would you need to travel to multiple locations to capture everything?
Resources – How many people can you get to help? Is it just you, a small staff, volunteers?
Timeframe – Give yourself a time frame for this. Keep in mind this is never “done”.
Can have audience pull out the inventory here.
What? – Work at the collection level, not the item level. What is the familiar title for the collection?
Description – Provide a brief description of what is in the collection. You are collecting information about items that are known and may be in your catalog, plus items that have come in your door and are waiting to be dealt with, plus items that are being created (digitization projects), plus things you may not even know about yet.
Creator – so that you can go back to them with any issues
It’s a good idea to note the format of your digital media, or what the digital content is stored on, since some types of media last longer than others. Digital content on more fragile media (floppy disks) might be a higher priority.
You should make sure to specify the location of digital content in your inventory. Some things you will want to consider:
How will you specify whether content is located online (meaning on your computer hard drive or a network server), or offline (meaning stored on some removable piece of media, like a CD or flash drive)?
Location in the storage system
Keep in mind that you will need to update the inventory whenever the content moves. If you get too specific, you might spend all your time updating file locations.
Ask the audience: WHAT OTHER FIELDS ARE NOT INCLUDED THAT WOULD BE HELPFUL?
After you’ve compiled your inventory, it can be easy to get overwhelmed. You know you’ve got lots of digital content, but how much of it is really your organization’s responsibility to preserve? Meanwhile, you’ve still got more logs (more new digital content) coming in down the river. One of the goals of selecting content to preserve is to help get your logs moving again: start setting priorities and pick a few things to tackle first so everything can start flowing more efficiently.
Not all of the content you’re dealing with may in fact be appropriate or necessary for you to preserve, and you don’t want to commit resources to preserving materials you don’t have to. You may hear people argue that storage is cheap, so we should keep everything. Unfortunately, that perspective is rather short-sighted. Storage may be cheap, but preserving the quality of content over the long term is not. There are periodic migration costs for moving the digital materials into the systems where you will preserve them. Then there is the cost of monitoring files for corruption and change [Have you lost bits? Are the files degrading?]. Not to mention maintaining access to the files, which means updating your discovery and dissemination services every time hardware and software change [an ongoing, recurring cost]. The idea behind long-term preservation is that you will be making this content available in the future. It isn’t enough just to save the content if you can’t access it any more.
[Quality] Even if we could keep everything forever, would we want to? Is that manageable given the type of content that you hold? Not all digital content may be preservation quality: if you have high-resolution scans of your photos, do you also need to preserve the lower-quality versions of these scans? And not all of it will be significant enough to warrant preservation [that string of emails about organizing the staff Christmas party…].
Does the digital content we take in match our mission and scope of collections? Quite often materials find their way to us that have little or nothing to do with our mission, yet we give them a home and expend our resources on maintaining them. Maybe there is a better or more logical home for that content?
[Maybe you could partner with another organization that is better placed to hold and preserve that content.] The selection process for digital content is very analogous to the selection process for non-digital materials: you don’t collect materials for your archive that don’t match your mission, and you should keep the same principles in mind when selecting digital content.
The basic steps for Selection require you to:
Review your potential digital content – start with the outcomes of your inventory; look over what you have and what you think you might have coming in. Understand the implications.
Define and then apply criteria for what you will select to preserve. It’s the best way to ensure consistency (across an organization, over time, and through staffing changes).
Document (and preserve) selection decisions [Why are you keeping things? What is your rationale? You, your staff, and your successors need to understand why you chose to keep that particular content. Don’t assume it will be obvious to everyone.]
Implement your decisions – and stick to your criteria! Don’t take in or keep content that is not in your defined scope of preservation.
Review your selection criteria regularly to ensure they meet your needs. They are there to ensure consistency and can also be a helpful tool in controlling what content comes your way (an argument in your arsenal for those times when you need to say ‘no’ to someone).
When you’re first getting started, it’s helpful to treat selection as a managed, structured project in order to plan and coordinate the process [and plan for the future]. The selection criteria you choose will be specific to your situation, your organization and its mission. So where can you go for guidance to begin this project of defining your selection criteria? Look inside your organization first: are there mission-related documents that might give you clues? Existing manuals and policies, such as records retention schedules? Or collecting policies? Also look outside your organization: are there legal restrictions and/or ethical requirements that will guide your choices? On the question of uniqueness, you may not want to include anything that is preserved elsewhere. You may want to focus only on what meets the needs of your primary audience. And the value of materials, determined by a variety of factors, must be assessed in light of your own situation, the materials themselves, and their place in their wider context, whatever that may be. Taking this wider view will enable you to make intelligent choices regarding your selection. Once you have clarified the ideal of what you WANT to preserve, then you’re ready to consider what you are actually ABLE to preserve.
Even if something fits your desired criteria, it still might not be reasonable for you to select it. You can use a decision tree or a list of questions to help you decide what’s practical to preserve. You’ve already considered the content in view of your selection criteria, and you should already have answered ‘yes’ to both of these questions to continue considering the materials you hold:
Does the content have long-term value?
Does it fit your scope and mission?
Next you need to consider Technical issues:
Is it feasible for you to preserve the content? [Is it a “digital time bomb”? Some formats are a challenge to preserve, such as video/time-based media. Some may be too damaged to preserve. Do you have the skills and resources, either to undertake the preservation yourself or to buy the skills in?] Some types of material may require far more expertise and resources than you have available.
And Access: even if we’re not making it public, how useful is a server full of digital content that is safe, but that we can’t access? We need to ask:
Is it possible to make the content available over time?
Are you the only holder of this content? [Duplication]
If it is not feasible to preserve the content, and not possible to make it available and usable, then it probably shouldn’t be included in your selection, especially if you know you are not the only holder of this digital content.
Once you have your selection criteria, it may not be possible to review/select everything at once, so how might you sequence the process? Again, the answer will be different for each organization. Think about what’s:
Most significant to your organization?
Most extensive? (and therefore a more coherent body of material to manage)
Most requested/used?
Easiest to tackle (e.g., most familiar, most ready for ingest: a quick win for your digital preservation process, very helpful when you are having to prove the value of your efforts to a reluctant administration)
Oldest (possible historical importance)
Newest (possible immediate interest)
Mandated (via local policies, legislation, etc.)
At risk? If it were no longer available, which digital files would be the hardest to replace? Some formats become obsolete a lot faster than others. PDFs tend to remain viable for a long time; video files, however, can become outdated very quickly.
Because digital preservation is a long-term commitment, it’s important to establish solid, ongoing relationships with the creators of your digital content. How many of you are managing digital content created by people outside of your library or archives? Other departments, or maybe even other institutions?
Communication is key, particularly when the content is from external creators. You’ll need to agree on terms for the transfer and retention of digital content to your library (even when it comes from others within your library). Ideally, you’d want to review the content with the creators to determine which of their material is really important to preserve, and ensure that what they’re giving you meets your selection criteria. Be aware that most content creators don’t have a clue as to what an archival format is, or how to create content that is likely to be manageable for long-term access. Educating content creators is very important. Working with them at the outset can save you many headaches later. The other important point here is that this doesn’t need to be just YOUR project: connecting with content creators means you can share the love a bit and put some of the onus on THEM to help YOU.
Remember that you need to document your selection process. Start out by adding information to the inventory for material that you plan to preserve over the long term. Supplement your inventory with:
Use: What is the ‘lifespan’ of the content? Does its value/use change over time?
When will content no longer be active? [Retention period: how long will you retain it?]
The outcome of going through the work of selection is to gain a sense of control over what you have to deal with, what your scope is, and what your policies and priorities are for selection. This is critical to developing a sustainable program for support of long-term preservation and access. By applying your selection criteria to your inventory, you will have more detailed information to work with in your planning. This documentation can also inform your work with creators of digital content. This might include the creation of submission agreements or other policies so that the content coming in to your organization fits your selection criteria for long-term support. The selection process puts you on the path to a sustainable program. Selecting content is ultimately not a one-time project but a long-term, ongoing process, so formalizing it through policies, schedules and other documented criteria will help you avoid more log jams in the future.
As you are going through the inventory and selection process, you will find things in many places, named in many different ways depending on who worked on the item. Psychologically, digital items are much easier for people to save. 100 items on your hard drive don’t take up as much visual space as 100 items in your office. A file that is 1 KB looks pretty much like one that is 1 MB or 1 GB. There also tend to be more copies of digital items: everyone keeps a draft, or it gets attached to an email and sent to 10 people, or it gets filed in two places. Everybody keeps their own items; project documentation is rarely managed by one person for the whole group anymore, so it’s multiplied by the number of people working on the project. As a result, EVERYTHING IS SAVED “just in case”, and it’s often saved more than once.
And then add the other computers and storage drives in your organization... and you get the physical rendering of your organization’s digital items. So we are going to take a few minutes to talk about ways to get some control over this digital mess, with a bit about File Naming and File Management. Hopefully, these will help you as you sort through your digital content and determine how you will approach the long-term management of the items you are working with.
Accidental Overwriting – e.g., photos from a digital camera, meeting minutes/agendas
Finding – were the minutes saved as April Minutes, 04 minutes, Board minutes, recent minutes, etc.?
Generally speaking, avoid special characters in file names. While your system may accept them now, there is no guarantee these characters would carry over to a new system over time, should that be required.
This slide contains links to both the web version and the YouTube version of 4 videos created by the State Library of North Carolina about file naming procedures. They total about 10 minutes and provide some great tips.
As you are creating your inventory, you are likely to discover a lot of really simple ways to clean up the files you are reviewing.
Co-locate – It’s OK to move things around if it makes sense to do so.
Bury – If you have several layers to hunt through, it can be really hard to find anything. Shallow is better.
Purge – Get rid of these items unless there is a really good business reason for keeping them.
File backups – e.g., speeches with multiple drafts, plus the final version and copies in several different font sizes.
Supplementary files – e.g., a folder of images that were used in a PowerPoint.
Corrupted – files you can’t open.
Formats – you may receive both Word and PDF versions, and may not want to keep both.
Breadcrumbs – It’s OK to leave “sticky note” (AKA “READ ME”) files in folders. They can give a brief description of contents, the retention schedule, and any naming conventions.
Don’t know – unknown file formats, files on old media (floppies), password-protected files… and then come up with a plan to deal with these items.
Once you’ve decided how you want to handle file naming issues and have made file management decisions, document it. It doesn’t have to be long. You can distribute it in your organization: post it on an intranet, or place it in a procedures manual.
WHY – You will not be the only keeper of the information. (You weren’t here to ask.) It will help others who may be helping you with the inventory. You can hand it out to the organizations/departments you receive information from: “In order to better manage our files, we will accept these file types and formats, and they will be named this way. Do not give us password-protected documents.”
You don’t have to organize and fix everything, but you do need to give other people the tools to help you.
Key parts of DPOE’s ongoing effort are the training calendar and the DPOE ListServ.