3. Outline
• Problem with Storage
• Storage vs Backup
• Storage Types
• UW-Madison Options
• Personal Options
• Best Practices
• Use Cases
• Key Takeaways
4. The Problem with Storage
• It’s everywhere!
• All the options seem similar
but slightly different
• Every use case is a little different
5. Storage vs Backup
Storage
Your working files. The files you access regularly and
change frequently. You need to store data safely and
securely but you also need to have access to it. In general,
losing your storage means losing current versions of the
data.
6. Storage vs Backup
Backup
A frequent and regular process of copying your data to a
secure place that is separate from where you keep your
storage. Backup can be overlooked because you don’t
really need it until you lose data, but when you need to
restore a file it can be the most important process you have
in place.
7. Rule of 3
• Keep THREE copies of your data
– TWO onsite
– ONE offsite
• Example:
– One: Network Drive
– Two: External Hard Drive
– Three: Cloud Storage
• This ensures that your storage and backup are not
all in the same place – that’s too risky!
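A minimal sketch of the Rule of 3 in Python, assuming the three locations (for example a network drive, an external hard drive, and a synced cloud folder) are all reachable as ordinary folders. The function names and folder layout are hypothetical, not part of any UW service. Each copy is checked against the original with a SHA-256 checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_all(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination folder and verify every copy.

    The destination folders are assumptions: point them at your own
    network drive, external drive, and cloud-synced folder.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)
        # True only if the copy is byte-for-byte identical to the original.
        results[target] = sha256_of(target) == expected
    return results
```

Any entry that comes back False means that copy should not be trusted until it is re-copied.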
8. Storage Types
• Local storage
– Hard drive, external hard
drive, thumb drive, etc.
• Network storage
– Private cloud, public cloud,
etc.
• Private Cloud = network
storage run by UW
• Public Cloud = network
storage run by vendor
9. UW Data - Storage Options
• Local Storage/Backup Options
– External Hard Drive (TechStore)
• Local IT Options
– Services available depend on your
local IT department
• DoIT Options
– Storage: File and Block Storage
– Backup: Bucky Backup Lite
• Cloud Options
– UW’s Box Account
10. UW Data – DoIT Options
• Storage: File and Block Storage
– File: easy to access, manage and share with
other UW folks
– Block: additional raw storage available over
the network for your server
• Backup: Bucky Backup Lite
– Client runs on your computer or server and
does incremental backups nightly
– You can manage the retention policy and
version control
• Cloud Storage
– UW’s Box Account
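To illustrate what "incremental" means in general, here is a small Python sketch that copies only files that are new or have changed since the previous run. It is an illustration of the idea only and makes no claim about how Bucky Backup Lite works internally; the function name and the change test (modification time plus size) are assumptions.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, backup: Path) -> list[str]:
    """Copy only files that are new or modified since the last run."""
    copied = []
    for item in sorted(source.rglob("*")):
        if not item.is_file():
            continue
        relative = item.relative_to(source)
        target = backup / relative
        if target.exists():
            src_stat, dst_stat = item.stat(), target.stat()
            # Skip files whose backup copy is at least as recent and the same size.
            if (dst_stat.st_mtime >= src_stat.st_mtime
                    and dst_stat.st_size == src_stat.st_size):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, target)  # copy2 preserves the modification time
        copied.append(relative.as_posix())
    return copied
```

Running it a second time without touching any files should copy nothing, which is what keeps nightly runs fast.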
11. Personal Data - Storage Options
• Personal Data
– Your personal UW data: UW’s Box Account
– Your personal data: thumb drive, external
hard drive, or cloud options like Box,
CrashPlan, Dropbox, etc.
• Discount with CrashPlan – 30% off -
http://go.wisc.edu/crashplan
12. Evaluating Cloud Services
• Lots of options out there – and not all are
created equal
• Read the Terms of Service!
• Servers get hacked all the time. Whatever
you’re storing, you don’t want your
provider to have access to it.
• Data encryption is your friend.
13. Storage & Backup Best Practices
• Think about and plan your data management
strategy before storing data
• If the data has ANY value to you, back it up
• If you have questions, ask for help! Local IT,
RDS, peers, friends, etc.
• Network storage is great, but think about
having a plan in place if you need to access
the data and the network is down
14. Storage & Backup Best Practices
• Put in the appropriate security measures
• Version control can be important, especially
when sharing data – plan ahead
• Document who has access to the data and
audit that on a regular basis
• Test your backups – make sure they are
working and you can actually restore a file
• If you use cloud storage, think about an exit
strategy
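One way to test a backup is to restore it to a scratch folder and compare it with the working copy. The Python sketch below assumes you have already restored the backup to a folder on disk; the function names are hypothetical. It reports files that are missing from the restore or whose contents differ.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (reads the whole file)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the restored copy."""
    report = {"missing": [], "different": []}
    for item in sorted(original.rglob("*")):
        if not item.is_file():
            continue
        relative = item.relative_to(original)
        twin = restored / relative
        if not twin.is_file():
            report["missing"].append(relative.as_posix())
        elif file_digest(item) != file_digest(twin):
            report["different"].append(relative.as_posix())
    return report
```

Files you edited after the backup ran will show up as "different", so the check is most meaningful right after a backup completes.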
15. Use Case 1 – Starting Fresh
• If you have a local IT person, contact them
first to talk about services available
• Contact RDS about a data management plan
• If local IT doesn’t have service offerings,
contact DoIT
• If all else fails – at least plan out your data
management strategy (storage, backup, etc.)
before starting to collect/use data
16. Use Case 2 – Leaving UW
• UW Data
– If you have a local IT person, contact them
– If someone will be taking over your work, give them access
to a shared space like Box
– If you are using DoIT services, make sure someone else
still on campus has access to the data
– If you don’t have local IT and aren’t using shared services,
but think the data is valuable to UW, contact RDS
• Personal Data
– If you are using UW Box, then transfer the data over to a
personal Box/Dropbox/Cloud account
– Purchase an external hard drive and transfer data over
that way
17. Key Takeaways
• Figure out your storage requirements
– High security? Remote access? Ease of use?
Scalability?
• Ask around – people are happy to help!
– Local IT, Peers, Friends, Family, etc.
• Rule of 3
– 2 onsite, 1 offsite – better safe than sorry
• Test it!
– Make sure it works as advertised and do some
disaster testing
19. Storage & Backup
vs. Preservation
Storage & Backup = short-term
– Working copies
– Expected to change
Preservation = long-term
– Usually the final, “fixed” version/s
20. Thinking Long-Term
• The data you’ve carefully stored is only useful if
it’s readable and understandable
• Many factors affect this:
– Media
• What software did you use to create the data? Does
hardware exist to access it?
– Metadata
• How much contextual information accompanies your data?
Can you understand it? Can a stranger understand it?
– Organization
• Is it all jumbled together? Or have you organized it
meaningfully? Do you know where your data is?
21. Thinking Long-Term
• None of the concepts discussed during this
workshop exist in a vacuum
• Some aspects of preservation feel out of our
control, or like too much work
• The truth? It is confusing to plan ahead for
our data in a landscape of quickly changing
services…
• … but it’s worth it.
22. Time to Ponder
• Can you still access your data from…
– 20 years ago?
– 10 years ago?
– 5 years ago?
– 1 year ago?
Let’s talk about the data you’ve kept and
lost.
24. Unreadable Data: Solutions
Now
- Start researching. (Google!) Odds are someone else
has faced the same issue.
- Digital forensics tools such as BitCurator can provide
guidance: http://www.bitcurator.net/
- Don’t assume your data is gone for good.
- Contact me to brainstorm.
25. Unreadable Data: Solutions
Moving forward
• Today’s popular software can become obsolete through
business deals, new versions, or a gradual decline in user base.
(Consider WordPerfect.)
• Expect storage media to last 3-5 years on average. Migrate
your files every few years, if not more frequently!
• Some file formats are less susceptible to obsolescence than
others
– Open, non-proprietary formats (pick TXT over DOCX, CSV
over XLSX, TIF over JPG)
– Wide adoption
– History of backward compatibility
– Metadata support in open format (XML)
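As a starting point for a format review, the Python sketch below walks a folder and flags files whose extension has a suggested alternative on this slide. The lookup table contains only the three swaps named above; the function name is hypothetical, and the script only lists candidates, it does not convert anything.

```python
from pathlib import Path

# Suggested swaps taken from the slide; extend this table for your own formats.
PREFERRED = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def flag_formats(folder: Path) -> list[tuple[str, str]]:
    """List (file, suggested format) pairs for files worth reviewing."""
    flagged = []
    for item in sorted(folder.rglob("*")):
        suggestion = PREFERRED.get(item.suffix.lower())
        if item.is_file() and suggestion:
            flagged.append((item.relative_to(folder).as_posix(), suggestion))
    return flagged
```

Whether a conversion is appropriate still depends on the file: moving a formatted document to plain text, for example, drops its layout.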
26. Lost Data
Now
• Do a data inventory. List all the places where your
data lives (both physical and digital)
• Plan for consolidating – follow the rule of 3, not the
rule of 17
Moving forward
• Too many copies can be a headache: hard to keep
track of versions and know what is where. It
makes sense to start a data inventory to track your
data, especially at the beginning of a big project
with many people and moving parts.
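A data inventory for the digital locations can be started with a short script. This Python sketch assumes each location is a folder you can reach from your computer; the column names and function names are hypothetical choices. Physical media still need to be listed by hand.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["location", "file", "bytes", "modified"]


def build_inventory(locations: list[Path]) -> list[dict[str, str]]:
    """Collect one row per file found under each location."""
    rows = []
    for location in locations:
        for item in sorted(location.rglob("*")):
            if not item.is_file():
                continue
            info = item.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "location": str(location),
                "file": item.relative_to(location).as_posix(),
                "bytes": str(info.st_size),
                "modified": modified.date().isoformat(),
            })
    return rows


def write_inventory(rows: list[dict[str, str]], output: Path) -> None:
    """Save the inventory as a CSV file that opens in any spreadsheet tool."""
    with output.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Sorting the resulting spreadsheet by file name makes duplicate copies across locations easy to spot.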
28. Decontextualized Data: Solutions
Now
• Write contextual information in the form of a readme
file and/or scan written notes.
• Publish it as an additional bitstream alongside your datasets.
• Accept that some old data will never have necessary
contextual information. Is it worth it to preserve it?
Moving forward
• Take the time to create metadata.
• At the very least, create a readme file. (Good example
located here: http://hdl.handle.net/2022/17155)
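If a blank page is the obstacle, a script can lay down a readme skeleton to fill in. The Python sketch below is one possible template; the field list is an assumption made for illustration, not a standard and not taken from the linked example. Unanswered fields are written as TODO so the gaps stay visible.

```python
from pathlib import Path

# Hypothetical field list; adapt it to your discipline and project.
README_FIELDS = [
    "Title",
    "Creator",
    "Contact",
    "Date collected",
    "Description",
    "File list",
    "Formats and software",
    "Access conditions",
]


def write_readme(folder: Path, answers: dict[str, str]) -> Path:
    """Write README.txt into folder, marking unanswered fields as TODO."""
    lines = [f"{field}: {answers.get(field, 'TODO')}" for field in README_FIELDS]
    target = folder / "README.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Plain text is used on purpose so the readme stays readable without special software.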
29. Repositories
Disciplinary repositories provide a good home
for data, often with the requirement that you
share it openly.
DataONE: https://www.dataone.org/
Dryad: http://datadryad.org/
Knowledge Network for Biocomplexity:
https://knb.ecoinformatics.org/
31. Institutional Help with Preservation
• The IR is not yet up to the task of managing data… but
that’s in the works.
• UW Libraries is a member of the Digital
Preservation Network
• Several distributed, “dark archive”
preservation systems being explored
• And of course, RDS can help!
32. Final Thoughts
• Preservation = thinking about how your data
organization, metadata, and storage impact
your ability to access your data years from now.
• Prioritize your most important research. You
might not be able to preserve everything.
• It takes active researcher participation.
• Any plan is better than no plan at all. Start today.
Ask for help.
33. Contact Us
• Research Data Services (RDS)
– http://researchdata.wisc.edu/help/about-us/
• DoIT Storage and Backup
– cci@cio.wisc.edu