An introductory class on research data management for scientists, designed and presented by Lisa Federer, Health and Life Sciences Librarian at UCLA Louise M. Darling Biomedical Library.
Data 101 - An Introduction to Research Data Management
1. Data 101: An Introduction to
Research Data Management
Lisa Federer
Health and Life Sciences Librarian
lmfederer@library.ucla.edu
2. Today’s Session
• General overview of research data management
best practices
• Data management across the research data life
cycle
• UC resources for data management
• Your questions
3. Why Data Management?
• funders often require it
• makes for easier transitions with lab staff
turnover
• helps better document research output
• ensures usefulness of data over its entire life
cycle
• facilitates data sharing
5. Plan The Data Management Plan
NSF guidance: http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
NIH guidance: http://grants.nih.gov/grants/policy/data_sharing/data_sharing_faqs.htm
6. Plan
The Data Management Plan:
Common Misconceptions
• Does not require that all data must be shared
▫ Sensitive information/patient privacy
▫ Intellectual property rights and commercial value
• Sharing can take many forms
• Funders recognize that different disciplines have
different “cultures” of data sharing
• Sharing “at no more than incremental cost and
within a reasonable time” [1]
[1] National Science Foundation, Dissemination and Sharing of Research Results, www.nsf.gov/bfa/dias/policy/dmp.jsp
7. Plan
The Data Management Plan:
Resources for Help
• Your library!
• DMP Tool: http://dmp.cdlib.org
• DMP Online: http://www.dcc.ac.uk/dmponline
8. Collect
Best Practices for Data
Collection
• Use non-proprietary file formats
▫ Plain text/RTF vs. Microsoft .doc/.docx
▫ PDF/PNG vs. BMP/TIF
• Consider future preservation and context, not
just what is needed right now
• Collect good metadata
Additional info on open formats: http://www.openformats.org
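As a minimal sketch of the non-proprietary format advice above, the snippet below writes a small table as plain-text CSV and reads it back using only Python's standard library. The column names and values are hypothetical, chosen only for illustration; they do not come from the presentation.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and data rows as CSV text (a plain-text, open format)."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows (each row is a list of strings)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row for row in reader]


# Hypothetical example data, for illustration only.
header = ["sample_id", "collected_on", "temperature_c"]
rows = [
    ["S001", "2013-01-15", "21.5"],
    ["S002", "2013-01-16", "22.1"],
]

csv_text = rows_to_csv(header, rows)
round_trip = csv_to_rows(csv_text)
```

Because the result is plain text, it can be opened in any text editor or spreadsheet program, which is the point of preferring open formats for long-term use.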
9. Collect
Manage Metadata: Data About Data
• Types of metadata
▫ Descriptive metadata
▫ Technical metadata
▫ Administrative metadata
▫ Use metadata
▫ Preservation metadata
• Many fields have existing metadata standards
▫ MIAME: microarray experiments
▫ Darwin Core: biology and biodiversity
Additional info: http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards
Example of metadata standards: http://sdl.syr.edu/?page_id=32
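One way to picture the metadata types listed above is a small record stored alongside a data file. The sketch below groups hypothetical fields under the five types from the slide and serializes them as JSON. The field names are assumptions made for illustration; they do not follow MIAME, Darwin Core, or any other formal standard, so a real project should use the standard for its field where one exists.

```python
import json


def build_metadata(title, creator, file_format, license_name, access_level, checksum):
    """Return a JSON string grouping hypothetical fields by metadata type."""
    record = {
        "descriptive": {"title": title, "creator": creator},
        "technical": {"file_format": file_format},
        "administrative": {"license": license_name},
        "use": {"access_level": access_level},
        "preservation": {"checksum": checksum},
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical values, for illustration only.
metadata_json = build_metadata(
    title="Example temperature readings",
    creator="Example Lab",
    file_format="text/csv",
    license_name="CC BY",
    access_level="public",
    checksum="(checksum goes here)",
)
parsed = json.loads(metadata_json)
```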
10. Manage Improving Excel Data with DataUp
Check for best practices
11. Manage Improving Excel Data with DataUp
Create metadata
12. Manage Improving Excel Data with DataUp
Get a unique identifier and citation
13. Manage Improving Excel Data with DataUp
• Two ways to use DataUp
▫ DataUp Excel add-in (Windows Excel 2007+ only)
https://bitbucket.org/dataup/main/downloads/DataUpAddIn.zip
▫ DataUp web application
http://www.dataup.org/
14. Share Best Practices for Data Sharing
• Different communities of practice have different
cultures of sharing
▫ Institutional/subject repositories
▫ Person-to-person sharing of data
▫ Publication in data journals
• Data sharing is associated with increased
citations [1]
[1] Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate.
PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
15. Share UC Tools for Sharing
• EZID: creates unique identifiers
http://www.cdlib.org/services/uc3/ezid/
• Merritt: cost-effective repository for sharing and storage
http://www.cdlib.org/services/uc3/merritt/
16. Share Selecting Licenses for Re-Use
• Attribution (CC BY)
▫ This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.
• Attribution-ShareAlike (CC BY-SA)
▫ This license lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms.
• Attribution-NonCommercial (CC BY-NC)
▫ This license lets others remix, tweak, and build upon your work non-commercially, although their new works must also acknowledge you and be non-commercial.
• Attribution-NoDerivs (CC BY-ND)
▫ This license allows for redistribution, commercial and non-commercial, as long as it is passed along unchanged and in whole, with credit to you.
For more info see http://creativecommons.org/licenses/
17. Preserve
Share
Planning for Long Term
Preservation
• Especially important for unique and non-
replicable datasets
• Use institutional and subject specialized
repositories to preserve data for the long-term
• LOCKSS: Lots of Copies Keep Stuff Safe
• Talk to the library about your data
18. Preserve
Share Locating Repositories
• Institutional repositories at the UC
▫ Merritt https://merritt.cdlib.org/
▫ eScholarship http://www.escholarship.org/uc/ucla
• Finding subject specific repositories
▫ Databib
http://databib.org/
▫ Datacite repository list
http://datacite.org/repolist
▫ Open Access Directory Data Repositories
http://oad.simmons.edu/oadwiki/Data_repositories
19. Getting Additional Help
• UCLA Library Data Management Guide
http://guides.library.ucla.edu/data-management
• University of California Curation Center (UC3)
http://www.cdlib.org/services/uc3/
• Contact a librarian
▫ Data help: data@library.ucla.edu
▫ Copyright/licensing:
http://www.library.ucla.edu/copyright-publishing-contact-us
▫ Biomedical Library help: biomed-ref@library.ucla.edu
▫ Science and Engineering Library help:
sel-ref@library.ucla.edu
Today’s session will cover a general overview of research data management best practices. We only have an hour today, and this is a big topic, so we can’t cover every detail that you would need to know. However, keep an eye out for upcoming classes that will cover specific topics in more depth. I also welcome your input on what topics you’d like to see covered. Each of you is probably working with very different types of data and comes from a different field. I’ll attempt to cover topics that have broad enough applicability that they will be relevant to many different types of data, but keep in mind that there may be differences in
There are many different variations on how people model the research data life cycle, but simply speaking, these are the five steps of the research data process.
NIH – requires a data sharing plan for grants with direct costs over $500k.
NSF – proposals submitted or due on or after January 18, 2011, must include a supplementary document of no more than two pages labeled “Data Management Plan”.
Demo DMP Tool
Descriptive Metadata enables identification, location and retrieval of information resources by users, often including the use of controlled vocabularies for classification and indexing and links to related resources.
Technical Metadata describes the technical processes used to produce, or required to use, a digital object.
Administrative Metadata is used to manage administrative aspects of the digital object such as intellectual property rights and acquisition. Administrative Metadata also documents information concerning the creation, alteration and version control of the metadata itself.
Use Metadata manages user access, user tracking and multi-versioning information.
Preservation Metadata, amongst other things, documents actions which have been undertaken to preserve a digital resource such as migrations and checksum calculations.
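The checksum calculations mentioned under preservation metadata can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the checksum algorithm (the presentation does not name one); the file name and contents are hypothetical and are created in a temporary directory only for the demonstration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of_file(path) == recorded_checksum


# Demonstration with a hypothetical data file in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "observations.csv")
    with open(data_path, "wb") as handle:
        handle.write(b"sample_id,temperature_c\nS001,21.5\n")

    # Record the checksum at deposit time (this is the preservation metadata).
    recorded = sha256_of_file(data_path)
    unchanged_before = is_unchanged(data_path, recorded)

    # Simulate a later modification of the file, then check again.
    with open(data_path, "ab") as handle:
        handle.write(b"S002,22.1\n")
    unchanged_after = is_unchanged(data_path, recorded)
```

Recording the checksum when a file is deposited, and recomputing it later, gives a simple way to detect whether a copy has been altered or corrupted.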