Data Management Basics
A Workshop for Graduate Students
March 1, 2013

WHY MANAGE DATA?

1. Funders Require It
• National Institutes of Health: Data Sharing Policy (2003)
   • Applications requesting $500K or more in direct costs in any single year
     must include a Data Sharing Plan
• National Science Foundation: Data Management Plan Requirement (2011)
   • All proposals must include a supplementary “Data Management Plan” of no
     more than two pages describing how the project will comply with NSF data
     sharing policy
• National Endowment for the Humanities: Sustainability and Data
  Management Plans Requirement (2012)
   • Digital Humanities Implementation Grants must include a plan to discuss how
     data will be managed, disseminated, and preserved
• OSTP Directive to Funding Agencies (2013)
   • Federal agencies with more than $100M in R&D expenditures must ensure
     that published results of federally funded research are freely available to the
     public within one year of publication, including data

National Science Foundation
• Data Management Plan Requirement
   • How projects will conform to NSF data sharing policy
   • Flexible
      • “The plan should reflect best practices in your area of research, and
        should be appropriate to the data you generate.”
• Directorate for Social, Behavioral and Economic Sciences
   • Discipline-specific guidelines
      • Archeology (Digital Archaeological Record)
      • Economics (American Economic Association)
• Universals (for the NSF Universe)
   • What data are generated by your research?
   • What is your plan for managing the data?

2. It Makes Life Easier
• For you…
  • Increases efficiency
       • Easier to understand the data collected throughout the life cycle of the
         project
       • Easier to find the data that you need throughout the life cycle of the
         project
  • Satisfies applicable legal obligations
  • Addresses preservation, documentation, verification issues
  • Helps reviewers understand the characteristics of your data
  • Increases citation rates for articles

• For others…
  • Provides continuity: other researchers can build on your data
  • Enhances longevity and usability
  • Facilitates new discoveries
  • Supports open access

3. It’s the Right Thing To Do
Responsible Conduct of Research/Research Ethics
• Data Acquisition, Management, Sharing and Ownership
  • Using the appropriate research method
  • Providing attention to detail
  • Obtaining appropriate permissions
  • Recording data accurately and securely
  • Maintaining data to allow it to confirm research findings,
    establish priority, and be reanalyzed by other researchers.
  • Storing data to protect confidentiality, be secure from physical and
    electronic damage, destruction or theft, and be maintained for the
    appropriate time frame dictated by sponsor and University policies.

Compliance
• Research using Human Subjects (Institutional Review Board)

SMART DATA PRACTICES
Naming Your Files
Organizing Your Data
Backup and Storage
Post-Project Considerations

Organizing Your Data
• Getting Started
  • Consider your goals
     • What do you want to get out of managing your data?
     • What is the most efficient way to organize your data?
  • Figure out your criteria for keeping data
  • Think about where you want your data to end up

filename = chief identifier for a research data file

[Diagram: File naming and labeling, with Organization, Consistency, and Context]

Some potential components for your file naming strategy
•   Version number
•   Date of creation
•   Name of creator
•   Description of content
•   Name of individual/research team/department
•   Publication date
•   Project number

Organizing Your Data
W. E. B. Du Bois, Niagara delegate meeting, Boston, 1907. W. E. B. Du Bois Papers (MS 312). Special
Collections and University Archives, University Libraries, University of Massachusetts Amherst

Organizing Your Data
• Let’s Clean Up Those File Names
  • abcdefghijklmnopqrstuvwxyz.jpg
     • doesn’t make much sense, does it?
  • How about:
     • 20120925_credo_du_bois_rrz_001.jpg
  • And I put it in a directory called:
     • credo_du_bois

Organizing Your Data
• Why this structure?
  • Oh, I just made it up! But I’m going to be consistent
     •   20120925 = date I found the image
     •   credo = database/collection where I found the image
     •   du_bois = image subject
     •   rrz = my initials (I am working in a group!)
     •   001 = an accession number (I made that up, too, but I’ll continue to
         use that schema)

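The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not part of the workshop materials: the function names `build_name` and `parse_name`, the three-digit accession width, and the lowercase-only pattern are assumptions made for the example.

```python
import re
from datetime import date

# Assumed pattern: YYYYMMDD_collection_subject_initials_NNN.ext
# The subject may itself contain underscores (e.g. "du_bois"); the
# single-word collection and the initials anchor it on either side.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<collection>[a-z0-9]+)_(?P<subject>[a-z0-9_]+)"
    r"_(?P<initials>[a-z]+)_(?P<accession>\d{3})\.(?P<ext>[a-z0-9]+)$"
)

def build_name(found_on, collection, subject, initials, accession, ext):
    """Assemble a file name from its components (hypothetical helper)."""
    return f"{found_on:%Y%m%d}_{collection}_{subject}_{initials}_{accession:03d}.{ext}"

def parse_name(filename):
    """Return the components as a dict, or None if the name does not fit the scheme."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None

name = build_name(date(2012, 9, 25), "credo", "du_bois", "rrz", 1, "jpg")
parts = parse_name(name)
```

Generating names from components, rather than typing them by hand, is one way to stay consistent; a name that `parse_name` rejects is a sign that a file has drifted from the scheme.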
BAD naming practices
• Using generic data file names that may conflict when moved
  from one location to another
• Failing to think about scale
• Using special characters in a filename such as:
  &*%$£]{!@

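A quick check for special characters could look like the sketch below. The "safe" set used here (ASCII letters, digits, underscore, hyphen, period) is an assumption for illustration; the slide only lists examples of characters to avoid, and the helper names are hypothetical.

```python
import re

# Assumption: anything outside letters, digits, ".", "_" and "-" is flagged.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")

def unsafe_characters(filename):
    """Return the set of characters in filename that fall outside the safe set."""
    return set(UNSAFE.findall(filename))

def sanitize(filename, replacement="_"):
    """Replace every flagged character with the replacement string."""
    return UNSAFE.sub(replacement, filename)

flagged = unsafe_characters("results&final!.csv")
cleaned = sanitize("my data&notes.txt")
```

Note that this sketch also flags spaces, which is a stricter choice than the slide itself makes.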
Versioning
• Use ordinal numbers (1,2,3) for major version changes and the
  decimal for minor changes: v1, v1.1, v2.6
• Beware of using confusing labels: revision, final, final2,
  definitive_copy
• Discard or delete obsolete versions
• Use an auto-backup facility (if available) rather than saving or
  archiving multiple versions
• Turn on versioning or tracking in collaborative documents or
  storage utilities such as Wikis, Google Docs, etc.

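One practical reason to keep version labels strictly numeric is that they can then be ordered reliably. The sketch below is an illustration only; `version_key` is a hypothetical helper, and it assumes labels of the form `v<major>` or `v<major>.<minor>` as in the examples above.

```python
def version_key(label):
    """Turn a label such as 'v1.1' into a tuple of ints, e.g. (1, 1)."""
    major, _, minor = label.lstrip("vV").partition(".")
    return (int(major), int(minor) if minor else 0)

labels = ["v2.6", "v10", "v1.1", "v1"]

# Plain text sorting compares character by character, so "v10" lands before "v2.6".
as_text = sorted(labels)

# Sorting on the numeric key puts v10 after v2.6, as intended.
ordered = sorted(labels, key=version_key)
```

Labels such as "final" or "final2" cannot be ordered this way, which is one way to see why they cause confusion.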
Quiz! File naming by date
What is the best filename?
A. 2012-09-25_Attachment
B. 25 September 2012 Attachment
C. 25092012attch

Quiz! File naming by description
What is the best filename?
A. dubois_great_barrington_recent_20120925_old version.docx
B. 2012-09-25_dubois_great_barrington_V1.docx
C. FFTX_2365498_old.docx

Organizing Your Data
• Organizational methods
  • Hierarchical
  • Tag-based
• Retrieval
  • Location-based
  • Search-based

“Very little skill is needed to actually be organized and efficient…. just the
consciousness to put this file or folder in the right place.”

Organizing Your Data
Use folders!

DuBois
    DuBois_Images
         DuBois_Images/1868-1898/
         DuBois_Images/1898-1928/
    DuBois_Letters
         DuBois_Letters/1868-1898/
         DuBois_Letters/1898-1928/
    DuBois_Newspapers/
    etc.

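A folder layout like the one above can be created in one step so that every project starts from the same structure. This is a minimal sketch using Python's standard `pathlib`; the `LAYOUT` dictionary and `create_layout` function are hypothetical names, and the example writes into a temporary scratch directory rather than a real project location.

```python
from pathlib import Path
import tempfile

# Folder names taken from the slide; period subfolders are only shown
# there for Images and Letters.
LAYOUT = {
    "DuBois_Images": ["1868-1898", "1898-1928"],
    "DuBois_Letters": ["1868-1898", "1898-1928"],
    "DuBois_Newspapers": [],
}

def create_layout(root, layout=LAYOUT):
    """Create root/DuBois/<category>/<period> folders and return the project folder."""
    project = Path(root) / "DuBois"
    for category, periods in layout.items():
        (project / category).mkdir(parents=True, exist_ok=True)
        for period in periods:
            (project / category / period).mkdir(exist_ok=True)
    return project

with tempfile.TemporaryDirectory() as scratch:
    project = create_layout(scratch)
    # Collect the created folders as paths relative to the project folder.
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
```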
Archive what you don’t or won’t need
• Decide what your final data sets are
  • Once your project is over, weed out obsolete data and decide
    what you want to keep for the long-term
• Move files and folders to an ‘Archive’ or ‘Old files’ folder
  • z_archive

Backup and Storage
January 2011: “Stolen laptop contains cancer cure data”

Backup and Storage
• Backup is an essential component of data management
  • Protect against accidental or malicious data loss
  • Restore original data
• Keep 3 copies: original, external local, external remote
• Consider
  •   How much?
  •   How frequently?
  •   Which media?
  •   Synchronization
• Test your system
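
The "keep 3 copies" and "test your system" points can be sketched together: copy the original to two other locations, then confirm each copy matches by comparing checksums. This is an illustration under stated assumptions, not a recommended backup tool. The function names and the file name `survey_v1.csv` are hypothetical, and the two "external" folders here are just scratch directories standing in for a local drive and a remote service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(original, destinations):
    """Copy original into each destination folder; return {copy_path: matches_original}."""
    expected = sha256_of(original)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder))
        results[copy] = sha256_of(copy) == expected
    return results

with tempfile.TemporaryDirectory() as scratch:
    scratch = Path(scratch)
    original = scratch / "survey_v1.csv"
    original.write_text("id,answer\n1,yes\n")
    report = back_up(original, [scratch / "external_local", scratch / "external_remote"])
    all_verified = all(report.values())
```

A checksum comparison only shows that a copy was written correctly; testing a backup system fully also means trying an actual restore.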
Backup and Storage
• Accessibility of data depends on storage media and file format
  • Vulnerable to deterioration
  • Become obsolete over time
• Plan for disruption
• Consider
  • Non-proprietary file formats
  • Different media types in storage strategy
  • Migrate data
  • Unencrypted, uncompressed
[Diagram: Original; External, local; External, remote]

Backup and Storage
• Security
  • Encryption can be used for safely moving or storing files:
     • Encrypting files on storage devices (flash drives)
     • Encryption during file transfer (e.g., WinSCP)
     • Encrypted storage services
• Deleting Data
  • Weed out obsolete data and decide what you want to keep for
    the long-term
  • Deleting a file does not erase its contents from the storage device
• Other Things to Consider
  • How will the data be used?
  • Who pays for storage?

Post-Project Activities
• Publication? Sharing?
  • Intellectual Property
  • Copyright
      • Creative Commons
• Platforms?
  • ScholarWorks@UMass Amherst
  • ICPSR

• Copyright & Information Policy Librarian
  Laura Quilter
  lquilter@library.umass.edu

Data Management is About Planning
Data management will:
 • Prevent bad things from happening to your data;
 • Make you a more efficient researcher;
 • Prepare you for grant management.
[Diagram: Collection, Description, Storage and Backup, Access]

Data Management Plans
NSF

•     The types of data;
•     The standards to be used for data and metadata format and
      content;
•     The policies for access and sharing;
•     The policies and provisions for re-use, re-distribution, and the
      production of derivatives; and
•     The plans for archiving and for preservation of access.

RESOURCES

Planning
• Data Working Group (email datamanagement@library.umass.edu)
   • Digital projects
   • Long-term preservation
   • Assessment
• Web resources
   • UMass Amherst Libraries: General Resources
       (http://guides.library.umass.edu/datamanagement)
• Discipline-specific
   •   Your faculty
   •   Your mentors
   •   Your professional associations
   •   Industry partners
   •   Public engagement

Backup and Storage
• Storage
   • Udrive (http://www.oit.umass.edu/udrive)
   • Departmental servers
   • CDs/DVDs/external hard drives

• Filesharing (see http://chronicle.com/blogs/profhacker/protecting-your-data/37350)
   • Dropbox
   • Google Docs

• Cloud Storage
   • Amazon Web Services
   • Rackspace
   • Microsoft Azure
   • SugarSync

• Additional Information
   •   MIT on Backups and Security
       http://libraries.mit.edu/guides/subjects/data-management/backups.html
   •   UK Data Archive on Data Storage
       http://www.data-archive.ac.uk/create-manage/storage
   •   UK Preservation Office “Caring for CDs and DVDs”
       http://www.bl.uk/blpac/pdf/cd.pdf

Tools
Information Management
• Devonthink
   http://www.devontechnologies.com
• Yojimbo
   http://www.barebones.com/products/yojimbo
• EverNote
   http://www.evernote.com/about/home.php
• Scribe (Mac, Windows, Free)
   http://chnm.gmu.edu/tools/scribe/
• Springpad
   http://springpadit.com/home
Citation Management
• Mendeley
   http://www.mendeley.com/features/
• Zotero
   http://www.zotero.org/
• RefWorks
   http://guides.library.umass.edu/refworksatumass
Desktop Search Tools
• Windows Search
   http://www.microsoft.com/en-us/download/details.aspx?id=23
• UltraSearch
   http://www.jam-software.com/ultrasearch/
• Locate 32
   http://locate32.cogit.net/
Tagging Tools
• Tabbles
   http://tabbles.net/
• TaggTool
   http://www.taggtool.com/index.php
• TaggedFrog
   http://lunarfrog.com/taggedfrog/
Tool Directories
• Bamboo DiRT
   http://dirt.projectbamboo.org/
• CHNM Research + Tools
   http://chnm.gmu.edu/research-and-tools/

Sources
• MIT Data Management
   (http://libraries.mit.edu/guides/subjects/data-management/)
• UK Data Archive
   (http://www.data-archive.ac.uk/)
• MANTRA
   (http://datalib.edina.ac.uk/mantra/organisingdata.html)
• Creating Order from Chaos: 9 Great Ideas for Managing Your
   Computer Files
   (http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/)
• Research Information Management: Tools for the Humanities
   (http://sudamih.oucs.ox.ac.uk/docs/Generic%20Courses/Tools%20for%20the%20Humanities%20course%20book.docx)

Questions/contact
      datamanagement@library.umass.edu

Weitere ähnliche Inhalte

Was ist angesagt?

A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4Leon Osinski
 
Going Full Circle: Research Data Management @ University of Pretoria
Going Full Circle: Research Data Management @ University of PretoriaGoing Full Circle: Research Data Management @ University of Pretoria
Going Full Circle: Research Data Management @ University of PretoriaJohann van Wyk
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycleMarieke Guy
 
Data Management for the Digital Humanities
Data Management for the Digital HumanitiesData Management for the Digital Humanities
Data Management for the Digital HumanitiesThea Atwood
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management PlanKristin Briney
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
Research Data Management: Part 2, Practices
Research Data Management: Part 2, PracticesResearch Data Management: Part 2, Practices
Research Data Management: Part 2, PracticesAmyLN
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementCunera Buys
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 
Data management plans
Data management plansData management plans
Data management plansBrad Houston
 
Research Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & ResponsibilitiesResearch Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & ResponsibilitiesAmyLN
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementcunera
 
Research Data Management for SOE
Research Data Management for SOEResearch Data Management for SOE
Research Data Management for SOELynda Kellam
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesRebekah Cummings
 
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...Kristin Briney
 
Responsible Conduct of Research: Data Management
Responsible Conduct of Research: Data ManagementResponsible Conduct of Research: Data Management
Responsible Conduct of Research: Data ManagementKristin Briney
 

Was ist angesagt? (20)

A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4
 
Going Full Circle: Research Data Management @ University of Pretoria
Going Full Circle: Research Data Management @ University of PretoriaGoing Full Circle: Research Data Management @ University of Pretoria
Going Full Circle: Research Data Management @ University of Pretoria
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycle
 
Data Management for the Digital Humanities
Data Management for the Digital HumanitiesData Management for the Digital Humanities
Data Management for the Digital Humanities
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management Plan
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
Research Data Management: Part 2, Practices
Research Data Management: Part 2, PracticesResearch Data Management: Part 2, Practices
Research Data Management: Part 2, Practices
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
Data management plans
Data management plansData management plans
Data management plans
 
Research Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & ResponsibilitiesResearch Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & Responsibilities
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
Research Data Management for SOE
Research Data Management for SOEResearch Data Management for SOE
Research Data Management for SOE
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and Humanities
 
Research Data Management: How will Northwestern address new sharing requireme...
Research Data Management: How will Northwestern address new sharing requireme...Research Data Management: How will Northwestern address new sharing requireme...
Research Data Management: How will Northwestern address new sharing requireme...
 
Introduction to Research Data Management - 2015-02-09 - MPLS Division, Univer...
Introduction to Research Data Management - 2015-02-09 - MPLS Division, Univer...Introduction to Research Data Management - 2015-02-09 - MPLS Division, Univer...
Introduction to Research Data Management - 2015-02-09 - MPLS Division, Univer...
 
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
 
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
 
Responsible Conduct of Research: Data Management
Responsible Conduct of Research: Data ManagementResponsible Conduct of Research: Data Management
Responsible Conduct of Research: Data Management
 

Ähnlich wie Data managementbasics issr_20130301

Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto UniversityStephanie Simms
 
Practical Best Practices for Data Management
Practical Best Practices for Data ManagementPractical Best Practices for Data Management
Practical Best Practices for Data ManagementUW Research Data Services
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management Wendy Mears
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅kulibrarians
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
How to write a data management plan
How to write a data management planHow to write a data management plan
How to write a data management planOpenExeter
 
Getting to grips with Research Data Management
Getting to grips with Research Data ManagementGetting to grips with Research Data Management
Getting to grips with Research Data ManagementIzzyChad
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Managementdancrane_open
 
Data Management for Undergraduate Research
Data Management for Undergraduate ResearchData Management for Undergraduate Research
Data Management for Undergraduate ResearchRebekah Cummings
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementSarah Jones
 
Introduction to Data Management Planning
Introduction to Data Management PlanningIntroduction to Data Management Planning
Introduction to Data Management PlanningSarah Jones
 
Data Management Planning in the arts
Data Management Planning in the artsData Management Planning in the arts
Data Management Planning in the artsSarah Jones
 
Introduction to Research Data Management for postgraduate students
Introduction to Research Data Management for postgraduate studentsIntroduction to Research Data Management for postgraduate students
Introduction to Research Data Management for postgraduate studentsMarieke Guy
 
Data Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersData Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersRebekah Cummings
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to DatabasesMohd Tousif
 

Ähnlich wie Data managementbasics issr_20130301 (20)

Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto University
 
Practical Best Practices for Data Management
Data managementbasics issr_20130301

  • 1. Data Management Basics: A Workshop for Graduate Students. March 1, 2013
  • 2. WHY MANAGE DATA?
  • 3. 1. Funders Require It • National Institutes of Health: Data Sharing Policy (2003) • All grants funded at $500K or above must include a Data Sharing Plan • National Science Foundation: Data Management Plan Requirement (2011) • All proposals must submit a 2 pp supplementary “Data Management Plan” to describe how projects will comply with NSF data sharing policy • National Endowment for the Humanities: Sustainability and Data Management Plans Requirement (2012) • Digital Humanities Implementation Grants must include a plan to discuss how data will be managed, disseminated, and preserved • OSTP Directive to Funding Agencies (2013) • Federal agencies with more than $100M in R&D expenditures must ensure that published results of federally funded research are freely available to the public within one year of publication -- including data
  • 4. National Science Foundation • Data Management Plan Requirement • How projects will conform to NSF data sharing policy • Flexible • “The plan should reflect best practices in your area of research, and should be appropriate to the data you generate.” • Directorate for Social, Behavioral and Economic Sciences • Discipline-specific guidelines • Archeology (Digital Archeological Record) • Economics (American Economic Association) • Universals (for the NSF Universe) • What data are generated by your research? • What is your plan for managing the data?
  • 5. 2. It Makes Life Easier • For you… • Increases efficiency • Easier to understand the data collected throughout the life cycle of the project • Easier to find the data that you need throughout the life cycle of the project • Satisfies applicable legal obligations • Addresses preservation, documentation, verification issues • Helps reviewers understand the characteristics of your data • Increases citation rates for articles • For others… • Provides continuity – other researchers can build on your data • Enhances longevity and usability • Facilitates new discoveries • Supports open access
  • 6. 3. It’s the Right Thing To Do • Responsible Conduct of Research/Research Ethics • Data Acquisition, Management, Sharing and Ownership • Using the appropriate research method • Providing attention to detail • Obtaining appropriate permissions • Recording data accurately and securely • Maintaining data to allow it to confirm research findings, establish priority, and be reanalyzed by other researchers. • Storing data to protect confidentiality, be secure from physical and electronic damage, destruction or theft, and be maintained for the appropriate time frame dictated by sponsor and University policies. • Compliance • Research using Human Subjects (Institutional Review Board)
  • 7. SMART DATA PRACTICES • Naming Your Files • Organizing Your Data • Backup and Storage • Post-Project Considerations
  • 8. Organizing Your Data • Getting Started • Consider your goals • What do you want to get out of managing your data? • What is the most efficient way to organize your data? • Figure out your criteria for keeping data • Think about where you want your data to end up
  • 9. filename = chief identifier for a research data file
  • 10. File naming and labeling • Organization • Consistency • Context
  • 11. Some potential components for your file naming strategy • Version number • Date of creation • Name of creator • Description of content • Name of individual/research team/department • Publication date • Project number
  • 12. Organizing Your Data. [Image: W. E. B. Du Bois, Niagara delegate meeting, Boston, 1907. W. E. B. Du Bois Papers (MS 312). Special Collections and University Archives, University Libraries, University of Massachusetts Amherst]
  • 13. Organizing Your Data • Let’s Clean Up Those File Names • abcdefghijklmnopqrstuvwxyz.jpg • doesn’t make much sense, does it? • How about: • 20120925_credo_du_bois_rrz_001.jpg • And I put it in a directory called: • credo_du_bois
  • 14. Organizing Your Data • Why this structure? • Oh, I just made it up! But I’m going to be consistent • 20120925 = date I found the image • credo = database/collection where I found the image • du_bois = image subject • rrz = my initials (I am working in a group!) • 001 = an accession number (I made that up, too, but I’ll continue to use that schema)
  • 15. BAD naming practices • Using generic data file names that may conflict when moved from one location to another • Failing to think about scale • Using special characters in a filename such as: &*%$£]{!@
  • 16. Versioning • Use ordinal numbers (1,2,3) for major version changes and the decimal for minor changes: v1, v1.1, v2.6 • Beware of using confusing labels: revision, final, final2, definitive_copy • Discard or delete obsolete versions • Use an auto-backup facility (if available) rather than saving or archiving multiple versions • Turn on versioning or tracking in collaborative documents or storage utilities such as Wikis, GoogleDocs, etc.
  • 17. Quiz! File naming by date. What is the best filename? A. 2012-09-25_Attachment B. 25 September 2012 Attachment C. 25092012attch
  • 18. Quiz! File naming by description. What is the best filename? A. dubois_great_barrington_recent_20120925_old version.docx B. 2012-09-25_dubois_great_barrington_V1.docx C. FFTX_2365498_old.docx
  • 19. Organizing Your Data • Organizational methods • Hierarchical • Tag-based • Retrieval • Location-based • Search-based • “Very little skill is needed to actually be organized and efficient…. just the consciousness to put this file or folder in the right place.”
  • 20. Organizing Your Data • Use folders! • DuBois • DuBois_Images • DuBois_Images/1868-1898/ • DuBois_Images/1898-1928/ • DuBois_Letters • DuBois_Letters/1868-1898/ • DuBois_Letters/1898-1928/ • DuBois_Newspapers/ • etc.
  • 21. Archive what you don’t or won’t need • Decide what your final data sets are • Once your project is over, weed out obsolete data and decide what you want to keep for the long-term • Move files and folders to an ‘Archive’ or ‘Old files’ folder • z_archive
  • 22. Backup and Storage • January 2011: “Stolen laptop contains cancer cure data”
  • 23. Backup and Storage • Backup is an essential component of data management • Prevent against accidental or malicious data loss • Restore original data • Keep 3 copies [Diagram: Original; External, Local; External, Remote] • Consider • How much? • How frequently? • Which media? • Synchronization • Test your system
  • 24. Backup and Storage • Accessibility of data depends on storage media and file format • Vulnerable to deterioration • Become obsolete over time • Plan for disruption [Diagram: Original; External, Local; External, Remote] • Consider • Non-proprietary file formats • Different media types in storage strategy • Migrate data • Unencrypted, uncompressed
  • 25. Backup and Storage • Security • Encryption can be used for safely moving or storing files, • Encrypting files on storage devices (flash drives) • Encryption during file transfer (ie: WinSCP) • Encrypted storage services • Deleting Data • Weed out obsolete data and decide what you want to keep for the long-term • Deleting files does not delete files • Other things to Consider • How will the data be used? • Who pays for storage?
  • 26. Post-Project Activities • Publication? Sharing? • Intellectual Property • Copyright • Creative Commons • Platforms? • ScholarWorks@UMass Amherst • ICPSR • Copyright & Information Policy Librarian Laura Quilter lquilter@library.umass.edu
  • 27. Data Management is About Planning • Data management will: • Prevent bad things from happening to your data; • Make you a more efficient researcher; and • Prepare you for grant management. [Diagram: Collection, Description, Storage, Access, Backup]
  • 28. Data Management Plans • NSF • The types of data; • The standards to be used for data and metadata format and content; • The policies for access and sharing; • The policies and provisions for re-use, re-distribution, and the production of derivatives; and • The plans for archiving and for preservation of access.
  • 29. RESOURCES
  • 30. Planning • Data Working Group (email datamanagement@library.umass.edu) • Digital projects • Long-term preservation • Assessment • Web resources • UMass Amherst Libraries: General Resources (http://guides.library.umass.edu/datamanagement) • Discipline-specific • Your faculty • Your mentors • Your professional associations • Industry partners • Public engagement
  • 31. Backup and Storage • Storage • Udrive (http://www.oit.umass.edu/udrive) • Departmental servers • CDs/DVDs/external hard drives • Filesharing (see http://chronicle.com/blogs/profhacker/protecting-your-data/37350) • Dropbox • Google Docs • Cloud Storage • Amazon Web Services • Rackspace • Microsoft Azure • Sugar Sync • Additional Information • MIT on Backups and Security http://libraries.mit.edu/guides/subjects/data-management/backups.html • UK Data Archive on Data Storage http://www.data-archive.ac.uk/create-manage/storage • UK Preservation Office “Caring for CDs and DVDs” http://www.bl.uk/blpac/pdf/cd.pdf
  • 32. Tools • Information Management • Devonthink http://www.devontechnologies.com • Yojimbo http://www.barebones.com/products/yojimbo • EverNote http://www.evernote.com/about/home.php • Scribe (Mac, Windows, Free) http://chnm.gmu.edu/tools/scribe/ • Springpad http://springpadit.com/home • Citation Management • Mendeley http://www.mendeley.com/features/ • Zotero http://www.zotero.org/ • RefWorks http://guides.library.umass.edu/refworksatumass • Desktop Search Tools • Windows Search http://www.microsoft.com/en-us/download/details.aspx?id=23 • UltraSearch http://www.jam-software.com/ultrasearch/ • Locate32 http://locate32.cogit.net/ • Tagging Tools • Tabbles http://tabbles.net/ • TaggTool http://www.taggtool.com/index.php • TaggedFrog http://lunarfrog.com/taggedfrog/ • Tool Directories • Bamboo DiRT http://dirt.projectbamboo.org/ • CHNM Research + Tools http://chnm.gmu.edu/research-and-tools/
  • 33. Sources • MIT Data Management (http://libraries.mit.edu/guides/subjects/data-management/) • UK Data Archive (http://www.data-archive.ac.uk/) • MANTRA (http://datalib.edina.ac.uk/mantra/organisingdata.html) • Creating Order from Chaos: 9 Great Ideas for Managing Your Computer Files (http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/) • Research Information Management: Tools for the Humanities (http://sudamih.oucs.ox.ac.uk/docs/Generic%20Courses/Tools%20for%20the%20Humanities%20course%20book.docx)
  • 34. Questions/contact datamanagement@library.umass.edu

Editor's Notes

  1. Starting in January 2011, NSF is requiring that grant proposals have a Data Management Plan. The DMP is described as no more than two pages, specifying the types of data, the standards to be used for data and metadata format and content, and the policies for accessing and sharing the data. They do state that a valid plan may include only the statement that no detailed plan is needed, but you have to justify that statement. The DMP will be reviewed as an integral part of the proposal, coming under the Intellectual Merit or Broader Impacts sections or both. See the Grant Proposal Guide (GPG), Chapter II.C.2.j. NSF Directorates and Programs have additional requirements: http://www.nsf.gov/bfa/dias/policy/dmp.jsp The Biological Sciences, Engineering, Geosciences, and Social, Behavioral and Economic Sciences Directorates are examples having additional requirements for their DMPs. The National Institutes of Health expect researchers to include data sharing plans in their proposals as well. This appears to be a trend for other funding agencies. NIH data sharing policy: Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data. NSF data sharing policy: Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. NEH: All proposals will be required to include both a sustainability plan that discusses long-term support for the project and a data management plan that discusses how research data will be preserved.
  2. The National Science Foundation recognizes the need for flexibility. Different types of data require different plans. The NSF documentation points researchers to some specific sites for more specific protocols. There are numerous others for the rest of the social sciences. You can contact your scholarly associations for more details. You must demonstrate to funding agencies that you know what your data are and how you will manage them.
  3. For you: Since a number of grants are multi-year, renewable propositions, building good data management practices into proposals is crucial. By doing so, problems associated with lab turnover can be addressed (one professor noted that 3-4 more papers would have come out of his lab if not for this type of issue). It also assists you in remembering relevant details and procedures relating to your data and data collection over the long haul. Developing a good data archiving plan safeguards your investment of time and money and makes recovery from disaster possible and, hopefully, faster and more complete. It addresses your documentation and verification issues: the type of data to be produced, a description of the methodology, and the standards that will be applied. It satisfies many legal obligations, such as security measures to protect confidentiality or IP considerations. A good DMP helps reviewers understand your work and increases its visibility, making it easily accessible and clearly understood, and it preserves your unique contribution to your field. For others: Although provisions are made for restrictions or embargoes on data, particularly those having commercial implications, there is an underlying assumption that data should be shared, distributed and built upon. A data management plan gets you to think about and plan for how that will happen, promoting new discoveries and minimizing duplication of effort. The open access movement (Science Commons, PubChem, et al.) fosters the development of knowledge. Science Commons is an organization that promotes legal and technical mechanisms to remove barriers to sharing scientific information. One way they are looking at that is through the Open Knowledge Definition, which sets out to define openness in relation to content and data.
  4. RCR covers a range of topics that speak to the conduct of investigators and the integrity of the research university (where an investigator is defined in UMass COI policy as the principal investigator and any other person who is responsible for the design, conduct, or reporting of research funded). It is a philosophy of creating an environment for research that encourages quality and ethical principles. Topics include Mentor/Trainee Responsibilities; Publication Practices and Responsible Authorship; Peer Review; Collaborative Science; Communication and Difficult Conversations; and Data Acquisition, Management, Sharing and Ownership. Many of the practices and constraints will be dictated by the discipline, the lab, and the funding conditions, but there are generally accepted standards that investigators should be aware of and adhere to relative to data ownership, data collection, data protection and data sharing. By following good data practices (or RCR), an investigator can avoid the risk of misconduct and comply with policies and regulations regarding intellectual property and animal or human research subjects. Examples of Compliance include protocols for doing research with animals, for biological and environmental safety, and for export control. Research using Human Subjects involves having the project reviewed by the University’s IRB (a federally mandated body which reviews all sponsored research involving human subjects), obtaining consent, and maintaining the confidentiality of the data collected. Research with Human Subjects is the domain where privacy (for sensitive data), confidentiality, and security will be major concerns when managing data. Examples of Ethical concerns include Conflicts of Interest (where financial or intellectual property concerns influence the design, conduct or reporting of research), Faculty Consulting, and Whistleblowing. Research Misconduct is also in this category.
Misconduct means fabrication, falsification, or plagiarism in proposing, performing, reporting, or reviewing research, not including honest error or difference of opinion; it is the misrepresentation of the procedures and outcomes of research to gain some advantage. Policies to investigate and determine misconduct include fact finding (which means examination of data).
  5. Data = research
  6. Organizing your data is about keeping good records, namely planning file naming conventions and organizing file directories to your advantage. What are your goals? Based on those goals, how should you organize your data? Are there key themes, categories, people, dates, formats, etc.? You might document, store, or organize data differently for different outcomes, like sharing, preserving, or sharing a small subset. What is important to save? If you plan well, you can put your research anywhere.
  7. The most basic part of organizing your data is to consider your filenames. Most computers use filenames to index content (for example, Windows Search). Clear names will help in retrieving files and should fit with your overall organizational approach for your project.
  8. There are three things to consider when naming files: organization, context, and consistency. Organization is important for future access and retrieval. Context could include content-specific or descriptive information. Consistency means choosing a naming convention and ensuring that the rules are followed systematically by always including the same information (such as date and time) in the same order (YYYYMMDD).
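As an illustration of "same information, same order", here is a minimal Python sketch that assembles a filename from fixed components. The function name `build_filename` and its parameters are hypothetical; the component order (date, collection, subject, initials, accession number) follows the Du Bois example on slides 13 and 14, and is only one possible convention.

```python
from datetime import date


def build_filename(found_on, collection, subject, initials, sequence, extension):
    """Join naming components in one fixed order, separated by underscores.

    Hypothetical helper: the component order mirrors the slide example,
    not a required standard.
    """
    parts = [
        found_on.strftime("%Y%m%d"),  # date first, written as YYYYMMDD
        collection,                   # database/collection the file came from
        subject,                      # what the file is about
        initials,                     # who created or collected it
        f"{sequence:03d}",            # zero-padded accession number
    ]
    return "_".join(parts) + "." + extension


print(build_filename(date(2012, 9, 25), "credo", "du_bois", "rrz", 1, "jpg"))
# prints: 20120925_credo_du_bois_rrz_001.jpg
```

Generating names from one function, rather than typing them by hand, is one way to keep every file in a project on the same schema.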
  9. http://datalib.edina.ac.uk/mantra/organisingdata.html
  10. How would we name this image file – found in the University Archives?
  11. File naming conventions.
  12. Consistency is key. Use underscores instead of full stops or spaces because, like special characters, these are parsed differently on different systems. The filename should include as much descriptive information as will assist identification, independent of where the file is stored. If including dates, format them consistently.
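The advice about spaces, full stops and special characters can be sketched as a small clean-up routine. This is an illustrative assumption rather than a prescribed rule: the helper name `clean_name` is hypothetical, and the choices to keep hyphens (so dates like 2012-09-25 survive) and to lowercase everything are made for this example only.

```python
import re


def clean_name(stem):
    """Return a filename stem with only letters, digits, hyphens and underscores.

    Hypothetical helper: every run of other characters (spaces, full stops,
    symbols such as & * % $) becomes a single underscore.
    """
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "_", stem.strip())
    return cleaned.strip("_").lower()  # lowercasing is an assumption of this sketch


print(clean_name("25 September 2012 Attachment"))
# prints: 25_september_2012_attachment
print(clean_name("Du Bois & Co. (draft)"))
# prints: du_bois_co_draft
```

The function is meant for the stem only; the extension (.jpg, .docx) would be handled separately so its full stop is preserved.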
  13. Scale: if you want to include a project number, don’t limit your project number to 2 digits, or you can only have ninety-nine projects. Special characters: these are often used for specific tasks in a digital environment.
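A short Python sketch of the scale point: choosing the number width up front. The helper `project_label` and the width of four digits are assumptions for illustration. The sorting comparison is an added observation, not something the slides state: fixed-width numbers also keep an alphabetical listing in numeric order.

```python
def project_label(number, width=4):
    """Format a project number with leading zeros (hypothetical helper)."""
    return f"project_{number:0{width}d}"


numbers = (2, 10, 100)
unpadded = sorted(f"project_{n}" for n in numbers)
padded = sorted(project_label(n) for n in numbers)

print(unpadded)  # prints: ['project_10', 'project_100', 'project_2']
print(padded)    # prints: ['project_0002', 'project_0010', 'project_0100']
```

With a width of four, the scheme leaves room for 9,999 projects before the convention has to change.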
  14. It is important to identify and distinguish versions of research data files consistently. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed. Thus you will need to establish a method that makes sense to you that will indicate the version of your data files. http://datalib.edina.ac.uk/mantra/organisingdata.html
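One way to keep version labels consistent is to compute the next label instead of inventing it each time. The sketch below assumes the scheme from the Versioning slide (whole numbers for major changes, a decimal for minor ones, as in v1, v1.1, v2.6); the function name `next_version` is hypothetical.

```python
def next_version(label, major_change):
    """Return the label that follows `label` in a v<major>[.<minor>] scheme.

    Hypothetical helper: a major change moves to the next whole number,
    a minor change increments the decimal part.
    """
    major_text, _, minor_text = label.lstrip("v").partition(".")
    major = int(major_text)
    minor = int(minor_text or 0)  # "v1" has no decimal part, so minor is 0
    if major_change:
        return f"v{major + 1}"
    return f"v{major}.{minor + 1}"


print(next_version("v1", major_change=False))   # prints: v1.1
print(next_version("v1.1", major_change=True))  # prints: v2
print(next_version("v2.6", major_change=False)) # prints: v2.7
```

Labels produced this way avoid the ambiguous names the slide warns about, such as "final" and "final2".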
  15. A: correct. Files using this naming convention are easy to distinguish from one another, and easier to browse and locate chronologically. B: incorrect. File not easy to browse and locate chronologically. C: incorrect. File not easy to browse and locate chronologically. Filename not immediately intuitive. Tip! If using a date, use the format year-month-day: YYYY-MM-DD or YYYY-MM or YYYY-YYYY. This will maintain chronological order of your files.
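The chronological-order claim can be checked with a small Python sketch. The three dates are made up for the demonstration; the two name patterns copy quiz answers A and C.

```python
from datetime import date

# Three illustrative dates, listed in chronological order.
dates = [date(2012, 9, 25), date(2012, 12, 1), date(2013, 1, 15)]

iso_names = [d.strftime("%Y-%m-%d") + "_Attachment" for d in dates]  # like answer A
dmy_names = [d.strftime("%d%m%Y") + "attch" for d in dates]          # like answer C

iso_sorted = sorted(iso_names)
dmy_sorted = sorted(dmy_names)

print(iso_sorted == iso_names)  # prints: True
print(dmy_sorted)
# prints: ['01122012attch', '15012013attch', '25092012attch']
```

An alphabetical listing of the year-first names matches the order of the dates, while the day-first names put December 2012 and January 2013 ahead of September 2012.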
  16. A: incorrect. The date is ambiguous, and there could be several ‘old’ versions. B: correct. The date is in a uniform format and easy to distinguish and sort among files using the same date convention. The filename represents the content more accurately. Using a version number convention also makes it easier to distinguish from other versions of the same file. C: incorrect. This is an application-generated filename lacking descriptive or context-specific information.
  17. Hierarchical: most common operating systems default to this way of organizing files. An item can only go into one place or folder (unless there are duplicates). You must choose a system for categorizing files. Well-adapted to location-based finding. Tag-based: electronic labels or keywords applied to files, in a flat system. An item can have many tags, so there is more flexibility in how a file is categorized. Tags must be applied consistently. Plan and then follow the plan. Implement. File things immediately; put things in the right place according to your plan as they are created.
  18. Example of a well-organized file structure with consistent naming conventions: a major heading with logical subheadings. Individual files under subheadings are distinguished by date of analysis, or collection, etc., but be consistent. Organize by category: for example, if you are studying multiple individuals and are collecting many types of documents about them, you could organize first by the individual, then by type of coverage (image, letter, newspaper), then by date. One place for everything: you need one place where you know you can access your files and folders. The My Documents folder is the logical and perfect place for this; it is a home for your folders, which contain your files. Think of it in the sense that you wouldn’t put your folders in the yard, nor would you put your filing cabinet in the yard… you put both of them in the house. Your My Documents folder is your “house” of sorts. Plan and implement. File things immediately; put things in the right place according to your plan as they are created.
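The individual / type / date hierarchy can be created in one pass with a short Python sketch. The function `create_tree` is hypothetical, the folder names copy the "Use folders!" slide, and the demonstration writes into a temporary scratch directory rather than My Documents.

```python
import tempfile
from pathlib import Path


def create_tree(root, subject, categories, periods):
    """Create <root>/<subject>/<subject>_<category>/<period>/ folders.

    Hypothetical helper; returns the list of folders it made sure exist.
    """
    created = []
    for category in categories:
        for period in periods:
            folder = Path(root) / subject / f"{subject}_{category}" / period
            folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
            created.append(folder)
    return created


# Demonstration in a throwaway directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as scratch:
    folders = create_tree(
        scratch, "DuBois", ["Images", "Letters"], ["1868-1898", "1898-1928"]
    )
    for folder in folders:
        print(folder.relative_to(scratch))
```

Creating the whole structure at the start of a project makes it easier to file each new item in the right place as soon as it is created.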
  19. Personally I recommend still having it in the My Documents folder to keep things easy to remember and consistent. With a name like “Archive” it’ll likely be near the top of whatever folder you decide to put it in. To change this, you can add a “z” and a period to the beginning of the name, so the folder could look something like “z.Archive”. This will put it at the bottom of the list so you won’t have to worry about it being in the way all the time. http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/
  20. Backup is a very important component of data management. A University of Oklahoma researcher lost years of research due to theft. A PC Advisor poll from November 2010 indicates that 1 in 13 do not back up important data! [30% back up important data daily; 25% weekly; 21% monthly; 16% rarely back up data; 8% never] http://www.pcadvisor.co.uk/news/security/3248400/poll-30-percent-back-up-data-every-day/
  21. Backup ensures that the most recent data will always be accessible and concerns the procedures for saving and synchronizing data. Accidental or malicious data loss can be due to: hardware faults or failure; software or media faults; virus infection or malicious hacking; power failure; human errors in changing or deleting files. Recommended practice is to keep 3 copies of your data. How much: What will you need to restore in the event of data loss? Are there backup policies already established for the institutional/network computers you are using, and will they be sufficient for your project? How frequently: How critical are the changes being made or the new data being generated? Back up after every change, or at regular intervals. Use automated backup processes. Which media: Depends on quantity, file type, and project needs. Options include removable media (hard or flash drives), recordable CD/DVD, or a network drive. Synchronization: Ensures consistency between backup copies. Use the same or compatible naming conventions for the original project files, and label removable media!
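"Keep 3 copies" and "test your system" can be combined in a small Python sketch that copies a file to two backup locations and confirms each copy matches the original. Comparing SHA-256 checksums is a technique chosen for this illustration, not one named in the slides; the function names and the folder names `external_local` and `external_remote` are hypothetical stand-ins for a local external drive and a remote location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, destinations):
    """Copy `original` into each destination folder and verify the copy.

    Hypothetical helper: returns {copy path: True if checksum matches}.
    """
    expected = sha256_of(original)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copied = Path(shutil.copy2(original, folder))  # keeps the same filename
        results[str(copied)] = sha256_of(copied) == expected
    return results


# Demonstration with throwaway folders standing in for real backup media.
with tempfile.TemporaryDirectory() as scratch:
    original = Path(scratch) / "original" / "2012-09-25_dubois_great_barrington_V1.txt"
    original.parent.mkdir()
    original.write_text("draft text\n")
    report = back_up(
        original,
        [Path(scratch) / "external_local", Path(scratch) / "external_remote"],
    )
    for copy_path, matches in report.items():
        print(Path(copy_path).name, "verified" if matches else "MISMATCH")
```

Because the copies keep the original filename, the naming convention stays the same across all three locations, which is what the synchronization advice asks for.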
  22. Storage concerns the location and media for housing data and is important because digital media are inherently unstable and change rapidly. Media currently available for storing data files are optical media (CDs and DVDs) and magnetic media (hard drives and tapes). Both are vulnerable to physical degradation. A storage strategy, even for short-term projects, should include two different forms of media. Non-proprietary file types (follow an open, documented standard; ASCII or Unicode; community-supported; unencrypted; uncompressed): PDF/A, not Word; ASCII, not Excel; MPEG-4, not Quicktime; TIFF or JPEG2000, not GIF or JPG; XML or RDF, not RDBMS. Which media: Portable HD? Cloud? Department server? Subject data repository? The UK Data Archive recommends using at least two different media types in your storage strategy (optical/magnetic) in addition to local and remote backup copies. Unencrypted is ideal for storing your data because it will make it most easily read by you and others in the future. (MIT) Uncompressed is also ideal for storage, but if you need to compress to conserve space, limit compression to your 3rd backup copy. (MIT)
  23. Secure data storage will prevent unauthorized access, changes, disclosure, or destruction of data and includes physical as well as network security. This covers measures such as passwords, firewalls, and anti-virus and anti-malware software, as well as security when sharing or moving files. Encryption is the easiest and most practical method of protecting data stored or transmitted electronically and is particularly essential with sensitive data. (ECU) It applies when moving or storing files, such as back-ups or storage on mobile devices. Individual files can be encrypted, as well as entire storage devices or spaces. http://www.ecu.edu/cs-itcs/itsecurity/DataEncryption.cfm Weeding: Determined by project requirements. How will the data be used? In-house? Outside users? Restricted? Is it live or “archived”?
  24. These may be things that you will get to toward the end of a project, but they are good to think about. Traditional outcomes of research are published papers (much of what tenure and promotion is based on). There is a growing practice of submitting supplemental data files along with manuscripts at the point of publication. Know what your intellectual property is, what your copyrights are, and how they apply to data and databases. Much of what is created is considered an “exempt scholarly work”: the university automatically waives ownership of this class of IP. UMass Policy: the creator owns IP that is created or discovered here. Copyright provides legal protection for “original works of authorship”. Facts and ideas cannot be copyrighted, but their expression can. Data sets and databases can be protected under copyright as literary works, which includes “tables” and “compilations”. Expectations of sharing have also created an environment where datasets are being shared within communities. It has been recognized, by Creative Commons specifically, that the nature of sharing data sets is fundamentally different from sharing textual documents, and that the benefits of data sharing outweigh the constraints of applying copyright. They have endorsed a Database Protocol which encourages the unfettered sharing of data through the use of a CC0 license: this essentially puts data into the public domain. Venues for data sharing include Institutional and Disciplinary Repositories. Data Citation means providing a reference to data in the same way as researchers routinely provide a bibliographic reference to printed resources. It is an important part of validating datasets as a primary research output rather than a by-product of research.
University resources: university funds, time, and facilities; not use of the library, facilities available to the public, or occasional use of office equipment. Exempted scholarly works: Students sign a participation agreement (prior to hire as research assistants, for example). Who owns copyright of data? The creator of the data. Under UMass IP Policy, the creator owns IP that is made, discovered, or created here unless there is: significant use of University resources; University-commissioned work; IP subject to contractual obligations (i.e., sponsored research); student work (except “exempt scholarly work”). “Exempt Scholarly Work” includes: instruction materials, including textbooks and class notes; research articles, monographs, proposals; theses and dissertations, dramatic works and performances, drawings, sculpture, musical compositions and performances, poetry, fiction and non-fiction. http://www.umass.edu/research/system/files/Intellectual_Propery_Policy_UMA.pdf Stop for questions.
  25. These are the elements of data management: things that you should think about. Data management will have positive benefits.
  26. You will need somewhere to store your data as you are working. Udrive: you get 1GB and can share files with anyone through the Udrive. Third party: there are many cloud storage providers. Amazon gives you 5GB, Dropbox gives you 2GB, and Google Docs gives you 1GB, but you can purchase more space (400GB for $100/year, 1TB for $256/year). Cloud options provide a nearly infinitely scalable tier of storage for archiving very large datasets. Prices can range from $0.14/GB to $0.55/GB. OIT security pages have links and instructions for downloading anti-virus and anti-malware software; it has tips for protecting your personal computer from unauthorized access;