An introduction to Research Data Management and Data Management Planning presented at the University of the West of England on Wednesday 9th July 2014.
DC101 UWE
1. Research Data Management
Sarah Jones
DCC, University of Glasgow
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
University of the West of England, 9th July 2014
Funded by:
2. Programme
• Quiz of funders’ requirements
• Introduction to RDM
• Data management planning
• Demo of DMPonline
• Q&A
3. What is research data management?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data management is part of good research practice
4. Why manage your research data?
• To make your research easier!
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To share your data for others to use and learn from
• To get credit for producing it
• Because somebody else said to do so
5. RCUK Common Principles on Data Policy
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
www.rcuk.ac.uk/research/datapolicy
7. Benefits of data sharing (1)
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It’s not science
the way most of us have practiced
in our careers. But we all realised
that we would never get biomarkers
unless all of us parked our egos and
intellectual property noses outside
the door and agreed that all of our
data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
•... scientific breakthroughs
8. Benefits of data sharing (2)
“There is evidence that studies that make their
data available do indeed receive more citations
than similar studies that do not.”
Piwowar H. and Vision T.J. 2013 “Data reuse and the open data citation advantage” https://peerj.com/preprints/1.pdf
9% - 30% increase
•... more citations
9. If you plan to share your data....
• Have you got consent for sharing?
• Do any licences you’ve signed permit sharing?
• Is your data in suitable formats?
Decisions made early on affect what you can do later
10. Some formats are better for long-term preservation
It’s preferable to opt for formats that are:
• Uncompressed
• Non-proprietary
• Open, documented
• Standard representation (ASCII, Unicode)
Data centres may have preferred formats for deposit e.g.
Type | Recommended | Non-preferred
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime; H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
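As a rough illustration of how the format table above might be applied to a project folder, the following Python sketch sorts file names by extension. The mapping from format names to file extensions is an assumption made for this example, and media formats are left out because a container or codec cannot be judged reliably from the extension alone.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats named in the table above.
# This mapping is illustrative, not an official data-centre list.
RECOMMENDED = {".csv", ".tsv", ".por", ".txt", ".html", ".rtf",
               ".tif", ".tiff", ".jp2", ".png", ".xml", ".rdf"}
NON_PREFERRED = {".xls", ".xlsx", ".doc", ".docx", ".gif", ".jpg", ".jpeg"}


def classify_format(filename: str) -> str:
    """Return 'recommended', 'non-preferred' or 'unknown' for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unknown"


def review_files(filenames):
    """Group file names by how the table rates their format."""
    report = {"recommended": [], "non-preferred": [], "unknown": []}
    for name in filenames:
        report[classify_format(name)].append(name)
    return report
```

Files reported as "unknown" (PDF, for example, which the table accepts only as PDF/A and only where layout matters) still need a human decision; the data centre's own guidance remains the authority.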
11. Documentation
What would someone unfamiliar with your
data need in order to find, evaluate,
understand, and reuse them?
Consider the differences between someone inside
your research group, someone outside your
group but in your field, and someone outside
your field.
12. Documentation and standards
Metadata: basic info e.g. title, author, dates, access rights...
Documentation: context, workflows, methods, code, data dictionary...
Use standards wherever possible for interoperability
www.dcc.ac.uk/resources/metadata-standards
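The basic metadata named on this slide (title, author, dates, access rights) could be kept in a small machine-readable record stored next to the data. The Python sketch below is one possible way to do that. The field names and example values are hypothetical and do not follow any formal metadata standard; a disciplinary standard should be preferred where one exists.

```python
import json

# Illustrative field names based on the basic metadata listed on the slide.
# They are assumptions for this sketch, not a formal metadata standard.
REQUIRED_FIELDS = ("title", "author", "date_created", "access_rights")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def to_json(record: dict) -> str:
    """Serialise a complete record so it can be saved beside the data files."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError("metadata record is missing: " + ", ".join(gaps))
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical example values.
example = {
    "title": "Interview transcripts, pilot study",
    "author": "A. Researcher",
    "date_created": "2014-07-09",
    "access_rights": "open after embargo",
}
```

Calling `to_json(example)` gives a text block that can be written to a file such as `metadata.json` (a hypothetical name) in the dataset folder.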
13. Tools for managing data
www.dcc.ac.uk/resources/external/tools-services/managing-active-research-data
14. Where to store your data?
• Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?
• Somebody else’s drive
• Departmental drive
• “Cloud” drive
– Do they care as much about your data as you do?
15. How to back up?
• 3… 2… 1… backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite
• Use managed services where possible e.g. University
filestores rather than local or external hard drives
• Ask central or local IT team for advice
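The 3-2-1 rule above can be expressed as a simple check. The Python sketch below assumes each copy is described by a location, a storage medium and an offsite flag; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a file. Field values are free text chosen by the user."""
    location: str   # e.g. "office PC" or "university filestore"
    medium: str     # e.g. "internal disk", "network storage", "tape"
    offsite: bool   # True if kept away from the main place of work


def check_3_2_1(copies):
    """Return the 3-2-1 conditions that the given copies fail to meet."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 different media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems
```

An empty list means all three conditions are met. The check only counts copies as described; it does not verify that the copies exist or are readable, which is one reason to prefer managed services.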
17. CREATIVE COMMONS LIMITATIONS
• NC Non-Commercial
– What counts as commercial?
• SA Share Alike
– Reduces interoperability
• ND No Derivatives
– Severely restricts use
www.dcc.ac.uk/resources/how-guides/license-research-data
License your data for reuse
Outlines pros and cons of each
approach and gives practical advice on
how to implement your licence
18. Data citation
• Makes it easier for readers to locate
the data and validate findings
• Data citations ensure that data
contributors receive proper credit
• Can link to reuse to show impact
• Less danger of rival researchers
‘stealing’ results from those who
publish their data openly
www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
21. Managing and sharing data:
a best practice guide
• How to write a DMP
• Formatting your data
• Documentation
• Data sharing
• Ethics and consent
• Copyright
• …
http://data-archive.ac.uk/media/2894/managingsharing.pdf
22. Putting the pieces together...
...DMPs
Photo by Dread Pirate Jeff
http://www.flickr.com/photos/justageek/2851643792
23. What is a data management plan?
A brief plan written at the start of your project to define:
• how your data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications,
but are useful whenever you’re creating data.
24. Why YOU need a Data
Management Plan
http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan
What if this was your laptop?
25. Which UK funders require a DMP?
www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
26. DCC Checklist for a DMP
• 13 questions on what’s asked across the board
• Prompts / pointers to help researchers get started
• Guidance on how to answer
www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf
27. Common themes in DMPs
1. Description of data to be collected / created
(i.e. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
5. Strategy for long-term preservation
28. A useful framework to get you started
Think about why the
questions are being
asked – why is it
useful to consider
that topic?
Look at examples to
help you understand
what to write
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
29. Tips for writing DMPs
• Seek advice - consult and collaborate
• Consider good practice for your field
• Base plans on available skills & support
• Make sure implementation is feasible
30. Example plans
• Technical plan submitted to AHRC by Bristol Uni
http://data.bris.ac.uk/research/planning/files/2013/08/data.bris-AHRC-example-Technical-
• Rural Economy & Land Use (RELU) programme examples
http://relu.data-archive.ac.uk/data-sharing/planning/examples
• UCSD example DMPs (20+ scientific plans for NSF)
http://rci.ucsd.edu/data-curation/examples.html
• My DMP – a satire (what not to write!)
http://ivory.idyll.org/blog/data-management.html
More at: https://dmponline.dcc.ac.uk/help#DMPhelp
31. Help from the DCC
https://dmponline.dcc.ac.uk
A web-based tool to help researchers write data management plans
www.dcc.ac.uk/resources/how-guides/develop-data-plan
33. Thanks – any questions?
DCC guidance, tools and case studies:
www.dcc.ac.uk/resources
Follow us on twitter:
@digitalcuration and #ukdcc
Credit to Dorothea Salo, Ryan Schryver and colleagues for content from the “Escaping Datageddon”
presentation for slides 4, 11 & 14, available at: http://www.slideshare.net/cavlec/escaping-datageddon
And to the Research360 project at the University of Bath for content from the “Managing your research
data” presentation for slide 10, available at: http://opus.bath.ac.uk/32296
Editor's notes
Data is increasing in significance. It will unquestionably matter to your research careers, more than it does to your supervisors’ generation.
Learn good data habits now! You’ll need them later.
Some formats are better for data sharing and long-term preservation than others.
It’s preferable to use formats that are uncompressed (e.g. large, high-quality files like .wav), non-proprietary, and based on open standards that are documented and well understood. This aids preservation and interoperability.
Some data centres have preferred formats for deposit, so it’s worthwhile encouraging researchers to consult these to check.
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation.
Metadata is basic descriptive information to help identify and understand the structure of the data, e.g. title, author...
Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data, e.g. an explanation of abbreviations or acronyms.
There are lots of standards that can be used. The DCC started a catalogue of disciplinary metadata standards, which is now being taken forward as an international initiative via an RDA working group.
The EC guidelines suggest selecting a suitable repository. The Databib and Re3data lists can be useful for this. They allow you to search and browse by subject. Re3data also allows you to restrict the search by certificates, open access repositories and persistent identifiers.
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some CC options.
Under Horizon 2020 it’s recommended that researchers use CC-0 or CC-BY to make data as open as possible.
I recommend this ICPSR resource.
It explains the importance of different questions as a pointer to how to answer.
Examples are given. This is the most frequent request we get at the DCC: examples help researchers think of what to write for their context.
The DCC has produced a How-to guide on writing DMPs and developed a tool to help.