1. Getting to grips with
Research Data Management
10th
November 2015
Isabel Chadwick,
Research Data Librarian
library-research-support@open.ac.uk
2. Overview of the workshop
• What is Research Data Management?
• Sharing data
• Working with data
• Planning for data
• Useful resources
• Questions?
3. What is Research Data Management?
“Research data management concerns the
organisation of data, from its entry to the research
cycle through to the dissemination and archiving of
valuable results. It aims to ensure reliable
verification of results, and permits new and
innovative research built on existing information.”
Digital Curation Centre (2011)
Making the Case for Research Data Management
http://www.dcc.ac.uk/sites/default/files/documents/publications/Making%20the%20case.pdf
4. What is Research Data Management?
Discussion
• Describe your research
• What type of data do you create/use?
• What data management challenges do you face?
5. What is Research Data Management?
UK Data Archive Data Lifecycle model
http://www.data-archive.ac.uk/create-manage/life-cycle
Design research
Plan data
management
Plan consent for
sharing
Locate existing data
Collect data
Capture and create
metadata
Creating data
6. What is Research Data Management?
UK Data Archive Data Lifecycle model
http://www.data-archive.ac.uk/create-manage/life-cycle
Enter data, digitise,
transcribe, translate
Check, validate,
clean data
Anonymise data
Describe data
Manage and store
data
Processing data
7. What is Research Data Management?
UK Data Archive Data Lifecycle model
http://www.data-archive.ac.uk/create-manage/life-cycle
Interpret data
Derive data
Produce research
outputs
Author publications
Prepare data for
publications
Analysing data
8. What is Research Data Management?
UK Data Archive Data Lifecycle model
http://www.data-archive.ac.uk/create-manage/life-cycle
Migrate data to best
format
Migrate data to
suitable medium
Back-up and store
data
Create metadata
and documentation
Archive data
Preserving data
9. What is Research Data Management?
UK Data Archive Data Lifecycle model
http://www.data-archive.ac.uk/create-manage/life-cycle
Distribute data
Share data
Control access
Establish copyright
Assign licences
Promote data
Giving access to data
10. What is Research Data Management?
UK Data Archive Data Lifecycle model
http://www.data-archive.ac.uk/create-manage/life-cycle
Follow-up research
New research
Undertake research
reviews
Scrutinise findings
Teach and learn
Re-using data
11. What is Research Data Management?
Why spend time and effort on this?
• So you can work efficiently and
effectively
–Save time and reduce frustration
–Highlight patterns or connections
that might otherwise be missed
• Because your data is precious
• To enable data re-use and sharing
• To meet funders’ and institutional
requirements
12. What is Research Data Management?
What does the OU expect?
“Research data must be managed to the highest
standards throughout their life-cycle in order to
support excellence in research practice.
In keeping with OU principles of open-ness, it is
expected that research data will be open and
accessible to other researchers, as soon as
appropriate and verifiable, subject to the
application of appropriate safeguards relating to
the sensitivity of the data and legal
requirements.”
OU Principles of Research Data Management, April 2013
http://intranet.open.ac.uk/research-school/strategy-info-governance/docs/CoPamendedJuly
13. What is Research Data Management?
What do funders expect?
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
RCUK Common Principles on Research Data Policy, 2011
http://www.rcuk.ac.uk/research/datapolicy/
14. What is Research Data Management?
What do funders expect?
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
18. Sharing data
What do you need to share?
• Raw data
• Derived data
• Data underpinning
publications
• Code
• Methods
What are research data in your context?
What would others need to understand your research?
19. Sharing data
Barriers to sharing data: discussion
Discuss barriers to sharing
your research data.
These could be:
•Ethical
•Legal
•Professional
Can these barriers be
overcome?
20. Sharing data
How can I share my data?
OU Data Catalogue in ORO
Data access statements
Online data sharing services
•Figshare
•Zenodo
•CKAN DataHub
•Mendeley Data
Directories
•re3data
Funders’ repository services
•UK Data Service ReShare
•NERC data centres
21. Working with data
“Start as you mean to go on”
The end point of all projects should
involve making the data publicly
available. Many data will be
deposited in national archives which
have regulations for files and
metadata.
Thinking about the requirements at
the beginning of the project will limit
the transformations needed at the
end of the project.
Data Sharing
22. Working with data
External collaborators: IT Options
• Shared areas or SharePoint
• Zendto
• Be wary of Dropbox & similar
• OU collaboration tool in pipeline
IT support for researchers:
http://intranet6.open.ac.uk/library/main/supporting-ou-research/re
23. Working with data
Filing systems
Filing is more than saving files, it’s making
sure you can find them later in your project
•Naming
•Directory Structure
•File Types
•Versioning
All these help to keep your data safe and
accessible.
24. Working with data
Naming conventions
Decide on a file naming convention at the start of your project. Useful file
names:
•are consistent.
•are meaningful to you and your colleagues.
•allow you to find the file easily.
Agree on the following elements of a file name:
•Vocabulary
•Punctuation
•Dates (YYYY-MM-DD)
•Order
•Numbers
•Version information
Ideally you should be able to tell what’s in a file before opening it.
Tip: create a readme file detailing the naming scheme.
25. Working with data
File formats
• Unencrypted
• Uncompressed
• Non-proprietary and not patent-encumbered
• Open, documented standard
• Standard representation (ASCII, Unicode)
Type             Recommended                     Avoid for data sharing
Tabular data     CSV, TSV, SPSS portable         Excel
Text             Plain text, HTML, RTF           Word
                 PDF/A only if layout matters
Media            Container: MP4, Ogg             Quicktime
                 Codec: Theora, Dirac, FLAC      H264
Images           TIFF, JPEG2000, PNG             GIF, JPG
Structured data  XML, RDF                        RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
26. Working with data
Metadata & documentation
• Metadata is additional information that is required to
make sense of your files – it’s data about data.
Guidance on disciplinary metadata standards:
http://www.dcc.ac.uk/resources/metadata-standards
27. Working with data
Metadata & documentation (2)
Think FAIR!
Findable
Accessible
Interoperable
Re-usable
Data FAIRport initiative: http://datafairport.org/
28. Working with data
Sensitive data
When working with research participants....
•Ensure you have obtained valid consent
•Consider who needs access to the data
•Inform your participants what will happen with the data after
the project has finished
•Pre-planning and agreeing with participants during the
consent process, on what may and may not be recorded or
transcribed, can be more effective than anonymisation
•Consider controlling access if anonymisation or consent for
sharing are impossible
29. Working with data
Sensitive data (2)
Managing sensitive data
•If possible, collect the necessary data without using
personally identifying information
•De-identify your data upon collection or as soon as
possible thereafter
•Avoid transmitting unencrypted personal data
electronically
•Consider whether you need to keep original collection
instruments (recordings, surveys etc.) once they have
been transcribed and quality assured
30. Planning for data
Data Management Plans are useful
whenever you are creating data to:
• Make informed decisions to anticipate
and avoid problems
• Avoid duplication, data loss and
security breaches
• Develop procedures early on for
consistency
• Ensure data are accurate, complete,
reliable and secure
• Save time and effort – make your life
easier!
31. Planning for data
Which funders require a DMP?
www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
Note: Data Management Plans are a requirement of
Horizon 2020 projects included in the Research Data pilot
32. Planning for data
Activity
Think about your own
research.
What actions would you
need to perform on your
data at each stage of the
UKDA’s Lifecycle model?
How would you do this?
Would you need any
additional funding/staff?
34. Planning for data
Tips
• Keep it simple, short and specific
• Seek advice - consult and
collaborate
• Base plans on available skills and
support
• Make sure implementation is
feasible
• Justify any resources or
restrictions needed
35. Library Services
How we can help
• Data Management Plan checking
• Support with setting up new projects
• Advice on preparation of data for sharing
• Data catalogue on ORO
• Online guidance
• Enquiries
• Development of new tools to enable data management
and sharing
Email: library-research-support@open.ac.uk
36. Useful links
• The OU Research Data Management intranet site:
http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management
• Digital Curation Centre: http://www.dcc.ac.uk/
• DMPOnline: https://dmponline.dcc.ac.uk/
• UK Data Archive: http://www.data-archive.ac.uk/
• MANTRA: http://datalib.edina.ac.uk/mantra/
• The Orb: http://open.ac.uk/blogs/the_orb
(2 minutes)
Overview of the workshop
When I first planned this workshop, I intended to start with planning and end with sharing, as that is the order in which you would do things in your project. However, the end aim of RDM is to make research data openly available, and I think that discussing why and how to do this first will give further context to why the RDM processes we’re going to cover today should be undertaken.
1 min (5)
Read the quotation.
This quotation from the Digital Curation Centre sums up what Research Data Management is all about. It covers the management of data throughout your research lifecycle (more on that later) and beyond, when you will be sharing your data with other researchers. This is relevant to all research which produces data, although you may find that the methods you use differ depending on your type of research or academic discipline.
A quick word on the Digital Curation Centre (DCC). They are the leading experts in the UK on Research Data Management, and gave us a lot of help when we set up the RDM project. Their website is a great source of information and guidance.
5 minutes (10)
Slide 4 Discussion
Introduce yourself to the person sitting next to you & talk about the type of data which you produce, and any data management challenges you’ve come across.
7 minutes (17)
Data often have a longer lifespan than the research project that creates them. Researchers may continue to work on data after funding has ceased, follow-up projects may analyse or add to the data, and data may be re-used by other researchers.
Well organised, well documented, preserved and shared data are invaluable to advance scientific inquiry and to increase opportunities for learning and innovation.
3 mins (20)
Good data management does require an investment of effort – but ultimately it’s something that can actually save you time, by helping you work more efficiently. Many of us are all too well acquainted with the frustration of trying to track down a fact or a document we know we have somewhere. Good research data management – setting up an organizational system that works for you, and ensuring everything is properly filed or labelled to enable re-identification and retrieval – can make life a lot easier.
And it’s not just a matter of saving time and reducing unnecessary effort (though clearly that’s a major benefit): having everything well ordered can also help you get a better feel of the shape and scope of your research material, which in turn can enable you to spot patterns or connections that might otherwise get missed.
It’s also well worth doing, because the data you’re producing or working with is valuable.
As well as this being true for your own research, the data might ultimately be of use to other researchers. Having everything well organized and properly labelled also has the potential to save you a lot of time at the end of a research project, when it comes to deciding what to do with your data – but more of that later.
Finally, there may be requirements imposed by your funding body and/or the university which you need to meet.
2 mins (22)
In 2013, the OU wrote a set of principles for research data management. These have since been added as an appendix to the research code of practice.
The principles are high-level, but they confirm the OU’s commitment to ensuring that research data is properly managed and shared as much as possible.
Note: All those engaged in research at the OU, including those involved in collaborating with other institutions, must take personal responsibility for managing their research data in accordance with University and funder requirements.
1 min (23)
The RCUK policy was released in 2011, and this has been followed up by all of the UK research councils releasing their own policies. The basic premise (as stated in this slide) is the same for all councils, but there are variations in the ways in which they expect this to be achieved.
Here’s an overview of what the research councils expect.
If you haven’t done so already, find your funder’s research data policy and check that you are compliant.
It’s not only RCUK funders which have requirements; Horizon 2020 and government funders, for example, have them too. Make sure you check your funder’s policy as early as possible, even if they didn’t have one last time you checked, as more and more policies are being released.
1 min (24)
Sharing data can have a huge impact on collaboration between researchers worldwide, as this example shows.
1 min (25)
You might remember this news story about George Osborne basing the austerity plan on research data which had been incorrectly analysed. By making data public these kinds of anomalies are more likely to be spotted and incidents like this less likely to happen!
1 min (26)
And of course there is a personal benefit to you as a researcher. Studies have found that there is between a 9% and a 30% increase in citations for papers which make the underlying data available.
1 min (27)
Think about what research data are in your context.
Depending on your academic discipline and the data type, what you share may vary.
You might want to share raw data, but in some disciplines this might be totally inappropriate, as the data will be too vast and meaningless to other people.
You might just want to share your derived, analysed data.
Or you might only want to share the data which underpin your publications. If so, think about whether this will be understandable to others: would they be able to replicate your results? You might also want to share your code or your methods to enable better understanding.
5 mins discussion
3 mins feedback
(35)
In some cases, there may be concerns about sharing data, or reasons why all or part of a dataset needs to be kept private. These may be ethical (the data is confidential), legal (the dataset includes third party material with restrictions on usage), or professional (you intend to publish the results, and don’t want someone to get there first).
It’s worth noting that many difficulties or concerns about sharing data can be alleviated by advance planning. For example, ensuring you get proper permissions when data is collected can reduce problems with sharing personal data. If your dataset is a combination of third party data and new material, you may need to have a version of the data where these are kept separate. Proper documentation is also important here: this will help keep track of what you’re allowed to do with data, and what’s happened to it in the course of the project.
2 mins (37)
There are a number of ways that you can share your data.
The OU does not currently have the capacity to archive research data and make it publicly available, but there is a project happening which is looking into ways that we can achieve this. The first step will be to include metadata records of research data in ORO, which will directly link to your publications in ORO and also to the underpinning data wherever that may be stored. This should be ready in the autumn, and it will be a requirement that all research data created at the OU is recorded.
Externally, there are a number of repositories. Your funder may well have a repository in which you are required to deposit your data, like the ESRC, which has recently re-branded its ESRC datastore. Those who have experienced the datastore will be pleased to hear that this now seems to be a faster, more user-friendly service than the previous incarnation. The NERC data centres are another example.
In addition to this there are several free, online services like Figshare, which was devised by someone from UCL and is used now by various journals to publish data underpinning research publications. It can also be used as a datastore throughout your project, as it allows online analysis of data, and collaboration with other partners. You may upload unlimited public data and you also get a 1GB allowance for private data.
Zenodo is a similar tool, but can only be used for publication. It was developed by CERN as part of the EU OpenAIRE project and is aimed at the long tail of science. There is a maximum upload threshold of 2GB per file, but you are able to include multiple files in one dataset or collection.
CKAN datahub is another similar, free-to-use tool.
There are now a number of journals which specialise in research data; here are 2 examples. Other journals may allow you to link to your data stored in Figshare or Dryad.
And finally here are 2 directories of data repositories, which list a range of repositories according to academic discipline.
4 mins (41)
Start as you mean to go on
Consider all the preparation necessary for making your data shareable, and how you can reduce the workload at the end of the project by doing the work during the project:
Metadata and documentation (logs, instructions, records)
File formats
File naming
Data security and storage
1 min (42)
Think about names and formats before clicking save.
Where do you need this file; is it used by another program?
Do the name and location make sense?
Consideration at the beginning makes it easier to find files and related documents later.
1 min (43)
Vocabulary – choose a standard vocabulary for file names, so that everyone uses a common language.
Punctuation – decide on conventions for if and when to use punctuation symbols, capitals, hyphens and spaces.
Dates – agree on a logical use of dates so that they display chronologically i.e. YYYY-MM-DD.
Order – confirm which element should go first, so that files on the same theme are listed together and can therefore be found easily.
Numbers – specify the number of digits that will be used in numbering so that files are listed numerically e.g. 001, 002, etc.
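The elements above can be combined in a small helper. This is only a minimal sketch: the function name build_filename, the element order (project, description, date, number, version) and the use of hyphens and underscores are hypothetical choices for illustration, not an OU convention.

```python
import re
from datetime import date


def build_filename(project, description, when, number, version, ext):
    """Assemble a file name from agreed elements (hypothetical convention)."""
    def clean(text):
        # Lower case, hyphens instead of spaces, all other punctuation dropped.
        text = text.strip().lower().replace(" ", "-")
        return re.sub(r"[^a-z0-9-]", "", text)

    parts = [
        clean(project),
        clean(description),
        when.isoformat(),   # YYYY-MM-DD, so names sort chronologically
        f"{number:03d}",    # fixed width, so 002 sorts before 010
        f"v{version:02d}",  # version information
    ]
    return "_".join(parts) + "." + ext.lstrip(".").lower()


if __name__ == "__main__":
    print(build_filename("RDM Survey", "Interview notes",
                         date(2015, 11, 10), 2, 1, "txt"))
```

Whatever scheme a team settles on, recording it in the readme file means the same rules can be applied by everyone.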
1 min (44)
When thinking about file formats, certain formats are more appropriate for long-term preservation and sharing.
Avoid using proprietary formats. These are formats which can only be opened by a specific type of software, like Word and Quicktime; the software may become obsolete in the future and the files will be more difficult to open.
You can of course migrate your files into different formats at the end of your project prior to deposit in a repository or archive, but thinking about this from the beginning and ensuring the right formats are used throughout will save you a lot of time when you come to share your data later.
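One way to keep an eye on formats during a project is to scan the project folder from time to time. The sketch below is illustrative only: the function name flag_formats is hypothetical, and the mapping from file extensions to the formats named on the file formats slide (for example .mov for Quicktime) is an assumption.

```python
from pathlib import Path

# Extensions assumed to correspond to the "Avoid for data sharing" column
# of the file formats slide; the suggested alternatives come from the
# "Recommended" column. The extension mapping itself is an assumption.
AVOID = {
    ".xls": "CSV or TSV",
    ".xlsx": "CSV or TSV",
    ".doc": "plain text, RTF or PDF/A",
    ".docx": "plain text, RTF or PDF/A",
    ".mov": "MP4 or Ogg container",
    ".gif": "TIFF or PNG",
    ".jpg": "TIFF or PNG",
}


def flag_formats(root):
    """Return (path, suggestion) pairs for files in formats best avoided."""
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        suggestion = AVOID.get(path.suffix.lower())
        if path.is_file() and suggestion:
            flagged.append((path, suggestion))
    return flagged


if __name__ == "__main__":
    for path, suggestion in flag_formats("."):
        print(f"{path}: consider {suggestion}")
```

The script only reports files; any conversion still needs checking by hand, since formatting or formulas can be lost when migrating between formats.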
1 min (45)
Slide 19- metadata (1) (2 mins)
It’s not a new idea.
Most people do it to a certain extent without thinking.
You might organize your collection by artist, title, even colour! This is made much easier in a digital environment.
1 min (46)
1. To be Findable any Data Object should be uniquely and persistently identifiable [4]
1.1. The same Data Object should be re-findable at any point in time, thus Data Objects should be persistent, with emphasis on their metadata [4 and JDDCP 4 and JDDCP 6]
1.2. A Data Object should minimally contain basic machine readable metadata that allows it to be distinguished from other Data Objects [see JDDCP 5]
1.3. Identifiers for any concept used in Data Objects should therefore be Unique and Persistent [5 and JDDCP 4 and JDDCP 6]
2. Data is Accessible in that it can be always obtained by machines and humans
2.1. Upon appropriate authorization [6]
2.2. Through a well-defined protocol [7 and JDDCP 5]
2.3. Thus, machines and humans alike will be able to judge the actual accessibility of each Data Object
3. Data Objects can be Interoperable only if:
3.1. (Meta) data is machine-readable [8]
3.2. (Meta) data formats utilize shared vocabularies and/or ontologies [9]
3.3. (Meta) data within the Data Object should thus be both syntactically parseable and semantically machine-accessible [10]
4. For Data Objects to be Re-usable additional criteria are:
4.1. Data Objects should be compliant with principles 1-3
4.2. (Meta) data should be sufficiently well-described and rich that it can be automatically (or with minimal human effort) linked or integrated, like-with-like, with other data sources [11 and JDDCP 7 and JDDCP 8]
4.3. Published Data Objects should refer to their sources with rich enough metadata and provenance to enable proper citation (ref to JDDCP 1-3)
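As a small illustration of "basic machine readable metadata", a record can be written alongside each data file. This is a sketch under stated assumptions: the function name write_metadata and the field names are hypothetical and do not follow any particular disciplinary standard, which is what a real deposit should use.

```python
import json
from pathlib import Path


def write_metadata(data_file, title, creator, description, identifier=None):
    """Write a JSON metadata record next to a data file and return its path."""
    data_file = Path(data_file)
    record = {
        # Field names are illustrative, not taken from a metadata standard.
        "title": title,
        "creator": creator,
        "description": description,
        "file": data_file.name,
        "format": data_file.suffix.lstrip(".").lower(),
        # A persistent identifier (e.g. a DOI) is often assigned by the
        # repository on deposit, so it may be unknown during the project.
        "identifier": identifier,
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    print(write_metadata("survey.csv", "Example survey",
                         "A. Researcher", "Illustrative record only"))
```

Keeping the record in a plain, open format such as JSON means both people and software can read it without specialist tools.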
2 mins (50)
In the past researchers gained consent from participants primarily so that they could collect data.
However, many funders are now increasingly requesting researchers to share and preserve their data as part of their requirements.
It is therefore important that participants fully understand:
how you will store, publish and share their data
how you will ensure that their data remains confidential and anonymous (where applicable) throughout the duration of the project and after
Failure to obtain consent could result in non-compliance with your funder's requirements and limit the opportunities you have to share, publish and preserve your data.
If things change, you may be able to go back to your participants and change the details of the agreement.
Anonymisation can be time-consuming, so agreeing what can and can’t be recorded or transcribed may well save you time and effort. For example, if they don’t want you to use names, then conduct the interview without using names.
2 mins (52)
As mentioned before, collecting data without personally identifying information, where possible, can save time and effort as you will have less to anonymise.
Make sure you are storing your sensitive data sensibly. If possible, de-identify your data upon collection; this will reduce the damage if a security breach happens.
Make sure you are encrypting your data if you have to send it electronically (e.g. by email).
Do you need to keep the original recording? If it’s been transcribed, what value does it hold? By destroying it as early as possible you are reducing the risk.
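De-identifying upon collection can be as simple as replacing names with participant codes before the data are stored. The following is a minimal sketch, assuming tabular data with a single identifying column; the function name pseudonymise and the P001-style codes are hypothetical. Replacing direct identifiers does not by itself make data anonymous, since other fields may still identify a person.

```python
import csv
import io


def pseudonymise(rows, id_field):
    """Replace a directly identifying field with sequential participant codes.

    Returns the de-identified rows and the key linking originals to codes.
    The key should be stored separately and securely, or destroyed.
    """
    key = {}
    cleaned = []
    for row in rows:
        original = row[id_field]
        if original not in key:
            # Same participant always gets the same code.
            key[original] = f"P{len(key) + 1:03d}"
        cleaned.append({**row, id_field: key[original]})
    return cleaned, key


if __name__ == "__main__":
    raw = io.StringIO("name,answer\nAlice,yes\nBob,no\nAlice,maybe\n")
    rows, key = pseudonymise(csv.DictReader(raw), "name")
    for row in rows:
        print(row)
```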
Slide 9 – Planning for data 2 minutes (62)
Slide 11 – Which funders require a DMP? (2 mins)
•Quick overview – point out EPSRC does not require one, and Horizon 2020 only for projects included in the pilot
•However, the OU recommends that all researchers write a DMP regardless of whether their funder requires them to do so or not, as it is a useful exercise for ensuring that data will be managed responsibly throughout the lifecycle.
Slide 10 – Data Management Planning Activity (5 minutes)
Think about the research you are working on at the moment, or a recent project. Consider the actions you will need to take and the barriers you might face at all the different stages of the UKDA data lifecycle model. How could they be overcome? This is a useful exercise to start thinking about the information you would need to put in your plan.
3 mins (65)
DMPOnline is a tool developed by the DCC which helps you to write your data management plan.
There are templates for DMPs for all the research councils, Horizon 2020, Wellcome Trust and CRUK.
It takes you through the sections of the templates and gives guidance as you work. We’ve now incorporated some OU guidance into this as well. There is also an OU template for researchers who are not funded by any of the bodies for which there is a template, but feel it would be helpful to write a data management plan anyway.
If you do try out this tool, please give me any feedback you might have.
1 min (66)
Keep it simple – not all the reviewers are going to be data management experts
Be specific – instead of saying “we will follow standards” explain WHICH standards, instead of “we will create a large amount of data” HOW MUCH data?
Short – some funders have requirements for how long the plan should be (e.g. ESRC 3 pages)
Seek advice – from other researchers at the university who have written plans, or done similar projects. Example of the Reading Experience Database taking advice from colleagues who had worked on the Listening Experience Database.
Be realistic!
RDM is an allowable cost for all RCUK funders, but any costs have to be fully accounted for. All expenditure on direct costs must take place before the actual end date of the project and must be fully auditable.
No expenditure can be ‘double funded’ (a service that is centrally supported by the indirect costs paid on all research grants cannot then also be included as a direct cost on a grant)
Send DMPs in advance of bid submission! Preferably a week ahead, if possible. But later is better than never!
I am happy to meet with PIs and project teams at the beginning of projects to discuss strategies for managing data and clarify funder requirements. Also able to set up bespoke training sessions for departments/research groups.
At the end of your project, hopefully your data will have been managed in a way that facilitates sharing, but if in doubt get in touch for help.
Guidance is on the intranet site, URL on next slide.
Send enquiries to email at bottom of screen, this way anyone from the team can pick it up if I’m away.
The RDM project is developing some infrastructure, with 2 aims: collaborating on data during projects, and sharing and preserving data post-project. Just starting procurement process now and hope to have something in place by mid-2016.
2 mins (68)
Links to additional resources are available on the RDM intranet site.
I’ll put this presentation on the site after the workshop.