Data Storage & Preservation
Luke Bluma | Brianna Marshall | Elliott Shuppy
IGERT workshop | November 2014
STORAGE
Outline
• Problem with Storage
• Storage vs Backup
• Storage Types
• UW-Madison Options
• Personal Options
• Best Practices
• Use Cases
• Key Takeaways
The Problem with Storage
• It’s everywhere!
• All the options seem similar
but slightly different
• Every use case is a little different
Storage vs Backup
Storage
Your working files. The files you access regularly and
change frequently. You need to store data safely and
securely but you also need to have access to it. In general,
losing your storage means losing current versions of the
data.
Storage vs Backup
Backup
A frequent and regular process of copying your data to a
secure place that is separate from where you keep your
storage. Backup can be overlooked because you don’t
really need it until you lose data, but when you need to
restore a file it can be the most important process you have
in place.
Rule of 3
• Keep THREE copies of your data
– TWO onsite
– ONE offsite
• Example:
– One: Network Drive
– Two: External Hard Drive
– Three: Cloud Storage
• This ensures that your storage and backup are not
all in the same place – that’s too risky!
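The Rule of 3 can be written down as a small check. The sketch below is only an illustration: the `Copy` record and the `follows_rule_of_3` function are hypothetical names, and it assumes the network drive and external hard drive from the example count as onsite while cloud storage counts as offsite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical record, not from the slides)."""
    label: str
    offsite: bool


def follows_rule_of_3(copies: list[Copy]) -> bool:
    """True if there are at least three copies: two or more onsite, one or more offsite."""
    onsite = sum(1 for c in copies if not c.offsite)
    offsite = sum(1 for c in copies if c.offsite)
    return len(copies) >= 3 and onsite >= 2 and offsite >= 1


# Assumption: the first two locations are onsite, the third is offsite.
example = [
    Copy("Network Drive", offsite=False),
    Copy("External Hard Drive", offsite=False),
    Copy("Cloud Storage", offsite=True),
]
print(follows_rule_of_3(example))
```

Dropping any one of the three copies from `example` makes the check fail, which is the point of the rule.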
Storage Types
• Local storage
– Hard drive, external hard
drive, thumb drive, etc.
• Network storage
– Private cloud, public cloud,
etc.
• Private Cloud = network
storage run by UW
• Public Cloud = network
storage run by vendor
UW Data - Storage Options
• Local Storage/Backup Options
– External Hard Drive (TechStore)
• Local IT Options
– Available services depend on your
local IT department
• DoIT Options
– Storage: File and Block Storage
– Backup: Bucky Backup Lite
• Cloud Options
– UW’s Box Account
UW Data – DoIT Options
• Storage: File and Block Storage
– File: easy to access, manage and share with
other UW folks
– Block: additional raw storage available over
the network for your server
• Backup: Bucky Backup Lite
– Client runs on your computer or server and
does incremental backups nightly
– You can manage the retention policy and
version control
• Cloud Storage
– UW’s Box Account
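To make "incremental backup" concrete, here is a minimal sketch of the general idea: copy only the files that are new or have changed since the last run. This is not how Bucky Backup Lite works internally; `incremental_copy` is a hypothetical helper, and comparing size and modification time is a simple heuristic assumed here for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def incremental_copy(source: Path, backup: Path) -> list[Path]:
    """Copy files that are missing from, or differ from, the backup folder.

    Returns the relative paths that were copied in this run.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(source)
            dst = backup / rel
            if dst.exists():
                s, d = src.stat(), dst.stat()
                # Heuristic: same size and not newer than the backup copy.
                if s.st_size == d.st_size and s.st_mtime_ns <= d.st_mtime_ns:
                    continue
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    backup = Path(tmp) / "backup"
    work.mkdir()
    (work / "notes.txt").write_text("first draft")
    first_run = incremental_copy(work, backup)
    second_run = incremental_copy(work, backup)
    (work / "notes.txt").write_text("second, longer draft")
    third_run = incremental_copy(work, backup)
print(len(first_run), len(second_run), len(third_run))
```

A real backup client also keeps older versions according to a retention policy; this sketch simply overwrites the previous copy.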
Personal Data - Storage Options
• Personal Data
– Your personal UW data: UW’s Box Account
– Your personal data: thumb drive, external
hard drive, or cloud options like Box,
CrashPlan, Dropbox, etc.
• CrashPlan discount (30% off):
http://go.wisc.edu/crashplan
Evaluating Cloud Services
• Lots of options out there – and not all are
created equal
• Read the Terms of Service!
• Servers get hacked all the time. Whatever
you’re storing, you don’t want your
provider to have access to it.
• Data encryption is your friend.
Storage & Backup Best Practices
• Think about and plan your data management
strategy before storing data
• If the data has ANY value to you, back it up
• If you have questions, ask for help! Local IT,
RDS, peers, friends, etc.
• Network storage is great, but think about
having a plan in place if you need to access
the data and the network is down
Storage & Backup Best Practices
• Put in the appropriate security measures
• Version control can be important especially
when sharing data – plan ahead
• Document who has access to the data and
audit that on a regular basis
• Test your backups – make sure they are
working and you can actually restore a file
• If you use cloud storage, think about an exit
strategy
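One way to test a backup is to restore a file and compare its checksum with the original. The sketch below uses SHA-256 from Python's standard library; `sha256_of` and `restore_matches` are hypothetical helper names, and the file names in the demonstration are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Demonstration with made-up files: one good restore, one damaged restore.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "survey.csv"
    restored = Path(tmp) / "survey_restored.csv"
    original.write_text("id,answer\n1,yes\n")
    restored.write_text("id,answer\n1,yes\n")
    demo_ok = restore_matches(original, restored)
    restored.write_text("id,answer\n1,no\n")
    demo_damaged = restore_matches(original, restored)
print(demo_ok, demo_damaged)
```

This only works while the original still exists, so it is a check to run on a schedule, before anything has been lost.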
Use Case 1 – Starting Fresh
• If you have a local IT person, contact them
first to talk about services available
• Contact RDS about a data management plan
• If local IT doesn’t have service offerings,
contact DoIT
• If all else fails – at least plan out your data
management strategy (storage, backup, etc.)
before starting to collect/use data
Use Case 2 – Leaving UW
• UW Data
– If you have a local IT person, contact them
– If someone will be taking over your work, give them access
to a shared space like Box
– If you are using DoIT services, make sure someone else
still on campus has access to the data
– If you don’t have local IT and aren’t using shared services
but think the data is valuable to UW, contact RDS
• Personal Data
– If you are using UW Box, then transfer the data over to a
personal Box/Dropbox/Cloud account
– Purchase an external hard drive and transfer data over
that way
Key Takeaways
• Figure out your storage requirements
– High security? Remote access? Ease of use?
Scalability?
• Ask around – people are happy to help!
– Local IT, Peers, Friends, Family, etc.
• Rule of 3
– 2 onsite, 1 offsite – better safe than sorry
• Test it!
– Make sure it works as advertised and do some
disaster testing
PRESERVATION
Storage & Backup
vs. Preservation
Storage & Backup = short-term
– Working copies
– Expected to change
Preservation = long-term
– Usually the final, “fixed” version/s
Thinking Long-Term
• The data you’ve carefully stored is only useful if
it’s readable and understandable
• Many factors affect this:
– Media
• What software did you use to create the data? Does
hardware exist to access it?
– Metadata
• How much contextual information accompanies your data?
Can you understand it? Can a stranger understand it?
– Organization
• Is it all jumbled together? Or have you organized it
meaningfully? Do you know where your data is?
Thinking Long-Term
• None of the concepts discussed during this
workshop exist in a vacuum
• Some aspects of preservation feel out of our
control, or like too much work
• The truth? It is confusing to plan ahead for
our data in a landscape of quickly changing
services…
• … but it’s worth it.
Time to Ponder
• Can you still access your data from…
– 20 years ago?
– 10 years ago?
– 5 years ago?
– 1 year ago?
Let’s talk about the data you’ve kept and
lost.
Unreadable Data
CULPRITS
• Obsolete media
• Obsolete software &
file formats
• Obsolete hardware
CC image by Flickr user wlef70
Unreadable Data: Solutions
Now
- Start researching. (Google!) Odds are someone else
has faced the same issue.
- Digital forensics tools such as BitCurator can provide
guidance: http://www.bitcurator.net/
- Don’t assume your data is gone for good.
- Contact me to brainstorm.
Unreadable Data: Solutions
Moving forward
• Today’s popular software can become obsolete through
business deals, new versions, or a gradual decline in user base.
(Consider WordPerfect.)
• Anticipate average lifespan of media to be 3-5 years. Migrate
your files every few years, if not more frequently!
• Some file formats are less susceptible to obsolescence than
others
– Open, non-proprietary formats (pick TXT over DOCX, CSV
over XLSX, TIF over JPG)
– Wide adoption
– History of backward compatibility
– Metadata support in open format (XML)
Lost Data
Now
• Do a data inventory. List all the places where your
data lives (both physical and digital)
• Plan for consolidating – follow the rule of 3, not the
rule of 17
Moving forward
• Too many copies can be a headache: hard to keep
track of versions and know what is where. It
makes sense to start a data inventory to track your
data, especially at the beginning of a big project
with many people and moving parts.
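For the digital part of a data inventory, a short script can list what lives where. The following is a sketch under assumptions: `build_inventory` and `write_inventory` are hypothetical helpers, the column names are chosen for illustration, and physical media such as disks in a drawer still need to be added by hand.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["location", "path", "bytes", "modified_utc"]


def build_inventory(roots: list[Path]) -> list[dict]:
    """List every file under each root folder with its size and modified time."""
    rows = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = Path(dirpath) / name
                info = path.stat()
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                rows.append({
                    "location": str(root),
                    "path": str(path.relative_to(root)),
                    "bytes": info.st_size,
                    "modified_utc": modified.isoformat(timespec="seconds"),
                })
    return rows


def write_inventory(rows: list[dict], out_file: Path) -> None:
    """Save the inventory as CSV so it can be opened in any spreadsheet tool."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Demonstration on a temporary folder with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "raw" / "responses.csv").write_text("id,answer\n1,yes\n")
    demo_rows = build_inventory([root])
    write_inventory(demo_rows, root / "inventory.csv")
    demo_header = (root / "inventory.csv").read_text(encoding="utf-8").splitlines()[0]
print(demo_header)
```

Passing several roots (a laptop folder, a mounted network drive, a synced cloud folder) gives one list that shows where duplicate copies have accumulated.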
Decontextualized Data
Coded SPSS
survey
responses
(Useless without
the original
questionnaires)
Decontextualized Data: Solutions
Now
• Write contextual information in the form of a readme
file and/or scan written notes.
• Publish it as an additional bitstream alongside your datasets.
• Accept that some old data will never have necessary
contextual information. Is it worth it to preserve it?
Moving forward
• Take the time to create metadata.
• At the very least, create a readme file. (Good example
located here: http://hdl.handle.net/2022/17155)
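A readme does not need special tooling. As a starting point, the sketch below prints a blank template to fill in; the field list is an assumption made for illustration, not a standard and not taken from the example linked above.

```python
# Hypothetical set of prompts; adjust to your discipline and project.
README_FIELDS = [
    "Dataset title",
    "Creator and contact",
    "Date range of collection",
    "How the data was collected",
    "File list and formats",
    "Variable names, codes, and units",
    "Software needed to open the files",
]


def readme_template(fields: list[str] = README_FIELDS) -> str:
    """Return a plain-text readme skeleton with one section per field."""
    lines = ["README", ""]
    for field in fields:
        lines.append(f"{field}:")
        lines.append("  (fill in)")
        lines.append("")
    return "\n".join(lines)


readme_text = readme_template()
print(readme_text)
```

For coded survey responses, the "Variable names, codes, and units" section is where the meaning of each code would go, so the data stays usable without the original questionnaire.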
Repositories
Disciplinary repositories provide a good home
for data, often with the requirement that you
share it openly.
DataONE: https://www.dataone.org/
Dryad: http://datadryad.org/
Knowledge Network for Biocomplexity:
https://knb.ecoinformatics.org/
Databib & re3data
The two registries plan to merge into one service by the end of 2015.
Institutional Help with Preservation
• The institutional repository (IR) is not yet up to the
task of managing data… but that’s in the works.
• UW Libraries is a member of the Digital
Preservation Network
• Several distributed, “dark archive”
preservation systems being explored
• And of course, RDS can help!
Final Thoughts
• Preservation = thinking about how your data
organization, metadata, and storage impact
your ability to access your data years from now.
• Prioritize your most important research. You
might not be able to preserve everything.
• It takes active researcher participation.
• Any plan is better than no plan at all. Start today.
Ask for help.
Contact Us
• Research Data Services (RDS)
– http://researchdata.wisc.edu/help/about-us/
• DoIT Storage and Backup
– cci@cio.wisc.edu
Questions?
