Stuart Macdonald steps through the process of creating a robust data management plan for researchers. Presented at the European Association for Health Information and Libraries (EAHIL) 2015 workshop, Edinburgh, 11 June 2015.
Creating a Data Management Plan for your Grant Application
1. Creating a Data Management Plan for your Grant Application
Stuart Macdonald
RDM Services Coordinator / Associate Data Librarian
University of Edinburgh
stuart.macdonald@ed.ac.uk
EAHIL + ICAHIS + ICLC, Edinburgh, 11 June 2015
2. Course content
• Background
• What is Research Data and RDM?
• What is a Data Management Plan?
• Benefits and drivers
• What do Funders want?
• Six themes for a DMP
• Exercise: What makes a good DMP?
• Support for DMPs
3. Background
• EDINA and University Data Library (EDL) together are a division
within Information Services (IS) of the University of Edinburgh.
• EDINA is a Jisc-designated centre for digital expertise & online service delivery: http://edina.ac.uk/
• The Data Library assists Edinburgh University users in the
discovery, access, use and management of research datasets -
http://www.ed.ac.uk/is/data-library
• Research & Learning Services – focus on developing and
delivering digital library technologies (RDM Programme, OA
Scholarly Communications, open scholarship, bibliometrics,
resource discovery, LMS)
4. What is Research data?
“Recorded, factual material commonly retained by and accepted in the [research] community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.”
UK Engineering and Physical Sciences Research Council (EPSRC)
• There is no single accepted definition of research data; several exist. When deciding what counts, consider not only what material is required to validate research findings but also what information is needed to enable re-use of the data.
• Not all definitions of data will be appropriate for all disciplines.
• Data can be generated for one purpose and used for a completely different one.
5. Research Data Management (RDM)
• Data management is a general term covering how you organise, structure,
store, and care for the data used or generated during a research project.
• It includes:
– How you deal with data on a day-to-day basis over the lifetime of a
project.
– What happens to data in the longer term – what you do with your data
after the project ends.
• RDM is also considered one of the areas of responsible conduct for
research.
6. What is a DMP?
DMPs are written at the start of a project to answer questions such as:
• What data will be collected or created?
• How will the data be documented and described?
• Where will the data be stored?
• Who will be responsible for data security and backup?
• Which data will be shared and/or preserved?
• How will the data be shared, and with whom?
7. Benefits
Developing DMPs can help you to:
• Make informed decisions to anticipate & avoid problems.
• Avoid duplication, data loss and security breaches.
• Develop procedures early on for consistency.
• Ensure data are accurate, complete, reliable and secure.
• Save time and effort, making your life easier.
8. Drivers of RDM
“Publicly funded research data are a public good, produced
in the public interest, which should be made openly
available with as few restrictions as possible in a timely
and responsible manner that does not harm intellectual
property.”
RCUK Common Principles on Data Policy
http://www.rcuk.ac.uk/research/datapolicy/
9. Institutional RDM Policy
2. Responsibility for RDM lies primarily with
Principal Investigators (PIs).
3. All new research proposals must include
research data management plans…
4. The institution will provide training, support,
advice and where appropriate guidelines and
templates…
7. Research data management plans must
ensure that research data are available
for access and re-use where appropriate…
http://www.ed.ac.uk/schools-departments/information-services/about/policies-and-regulations/research-data-policy
10. What do UK Funders want?
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
11. What do RCUK Funders want?
• AHRC, BBSRC, ESRC, MRC, NERC, and STFC all require some
form of data management or sharing plan as part of a funding
application.
• EPSRC does not ask for a DMP, but EXPECTS that one will exist!
• The requirements are diverse, but they all have the RCUK
Common Principles as their foundation.
12. RCUK Common Principles on Data Policy
Key messages:
1. Data are a public good and should be made openly available where possible.
2. Adherence to community standards and best practice. Preserve data of long-term value.
3. Metadata for discoverability and access. Link to data from publications.
4. Recognise constraints (legal, ethical and commercial) on what data to release.
5. Allow embargo periods delaying data release to protect the effort of creators.
6. Acknowledge sources to recognise IP and abide by T&Cs.
7. Ensure cost-effective use of public funds for RDM.
http://www.rcuk.ac.uk/research/datapolicy/
13. MRC
• Called a “Data Management Plan”.
• Should be “concise” but no specific length restrictions.
• Must cover:
– Description of the data
– Data collection / generation
– Data management, documentation and curation
– Data security and confidentiality of potentially disclosive personal
information
– Data sharing and access
– Responsibilities
– Other relevant policies
http://www.mrc.ac.uk/research/research-policy-ethics/data-sharing/data-management-plans/
14. What do other funders want?
• Cancer Research UK and the Wellcome Trust both require data
management & sharing plans.
• The European Horizon 2020 funding programme is currently piloting DMPs, through its Open Research Data Pilot, in the following areas of the 2014-2015 Work Programme:
– Future and Emerging Technologies
– Research infrastructures – part e-Infrastructures
– Leadership in enabling and industrial technologies – Information and Communication Technologies
– Societal Challenge: 'Secure, Clean and Efficient Energy' – part Smart cities and communities
– Societal Challenge: 'Climate Action, Environment, Resource Efficiency and Raw Materials'
– Societal Challenge: 'Europe in a changing world – inclusive, innovative and reflective Societies'
– Science with and for Society
• The Pilot will give the EC a better understanding of what supporting infrastructure is needed, and of the factors, such as security, privacy or data protection, that lead to data not being shared.
15. Cancer Research UK
• Called a “Data Management & Sharing Plan”.
• No specific length restrictions.
• Must cover:
– volume, type, content and format of the final dataset
– standards that will be utilised for data collection and management
– metadata, documentation or other supporting material
– method used to share data
– timescale for public release of data
– long-term preservation plan for the dataset
– whether a data sharing agreement will be required
– any reasons why there may be restrictions on data sharing
http://www.cancerresearchuk.org/funding-for-researchers/applying-for-funding/policies-that-affect-your-grant/submission-of-a-data-sharing-and-preservation-strategy
16. Wellcome Trust
• Called a “Data Management & Sharing Plan”.
• No specific length restrictions.
• Must cover:
– what data outputs will your research generate?
– what data will have value to other researchers?
– when will you share the data?
– where will you make the data available?
– how will other researchers be able to access the data?
– are any limits to data sharing required?
– how will you ensure that key datasets are preserved to ensure their long-term value?
– what resources will you require to deliver your plan?
http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm
17. European Horizon 2020
• Called a “Data Management Plan”.
• No specific length restrictions.
• Should cover:
– Data set reference and name
– Data set description
– Standards and metadata
– Data sharing
– Archiving and preservation
• Must be delivered within the first 6 months of the project.
• Is expected to evolve and grow throughout the project.
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
19. Six themes for a DMP
1. Data types, formats, standards and capture methods
2. Ethics and Intellectual Property
3. Short-term storage and data management
4. Access, data sharing and reuse
5. Deposit and long-term preservation
6. Resourcing
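As an illustration only, the six themes can double as a checklist for a draft plan. The Python sketch below is hypothetical (it is not part of DMPonline or any funder's tooling): it assumes the draft is held as a dictionary mapping each theme to the text written for it, and simply reports themes that are missing or left blank. It checks completeness, not quality.

```python
from typing import Dict, List

# The six DMP themes from the slide; order does not affect the check.
SIX_THEMES: List[str] = [
    "Data types, formats, standards and capture methods",
    "Ethics and Intellectual Property",
    "Access, data sharing and reuse",
    "Short-term storage and data management",
    "Deposit and long-term preservation",
    "Resourcing",
]


def missing_themes(draft: Dict[str, str]) -> List[str]:
    """Return the themes a draft plan leaves unanswered.

    A theme counts as unanswered if it is absent from the draft or its
    text is empty or whitespace only. This is a hypothetical helper.
    """
    return [theme for theme in SIX_THEMES if not draft.get(theme, "").strip()]


if __name__ == "__main__":
    # Invented example draft with three themes filled in (one only blanks).
    draft_plan = {
        "Data types, formats, standards and capture methods":
            "Interview audio (WAV) and transcripts (plain text, UTF-8).",
        "Ethics and Intellectual Property":
            "Consent forms cover anonymised re-use.",
        "Resourcing": "   ",
    }
    for theme in missing_themes(draft_plan):
        print("Still to address:", theme)
```

A blank or whitespace-only answer is treated the same as a missing one, so placeholder text left in a template still shows up as a gap.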
20. 1. Data types, formats…
• What data outputs will your research
generate? Data type, volume, quality,
formats
• Outline the metadata, documentation
or other supporting material that should
accompany the data for it to be
interpreted correctly
• What standards and methodologies will
be utilised for data collection and
management?
• State the relationship to other data available in public repositories.
• Be prepared to explain and justify the choices being made.
21. 2. Ethics and IP
• Make explicit mention of consent, confidentiality, anonymisation and other ethical considerations, where appropriate, and of the strategies taken to avoid precluding further re-use of the data
• Demonstrate that you have sought advice on
and addressed all copyright and rights
management issues that apply to the resource
• Are any restrictions on data sharing required, for
example to safeguard research participants or to
gain appropriate intellectual property protection?
• Get advice on IPR from your research office at an early stage – before signing over
ANY rights to collaborators or commercial partners!
22. 3. Short-term storage & data management
• Describe the planned quality assurance
and back-up procedures [security/storage].
• Use managed networked services
for storing your data.
• Specify the responsibilities for data
management and curation within research
teams and at all participating institutions.
• Keep track of all data generated.
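One way to keep track of all data generated, and to support the back-up and quality assurance procedures described in the plan, is a file manifest listing each file with its size and a checksum. The Python sketch below assumes the project's data sit under a single folder; the function names and the CSV layout are illustrative assumptions, not a prescribed standard. Re-running it after a restore from back-up and comparing checksums gives a simple integrity check.

```python
import csv
import hashlib
from pathlib import Path
from typing import Iterator, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root: Path) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative path, size in bytes, SHA-256) for every file under root."""
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        yield path.relative_to(root).as_posix(), path.stat().st_size, sha256_of(path)


def write_manifest(root: Path, manifest: Path) -> int:
    """Write the inventory of root to a CSV manifest; return the number of files."""
    rows = list(inventory(root))  # collect first, before the manifest is created
    with manifest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["relative_path", "size_bytes", "sha256"])
        writer.writerows(rows)
    return len(rows)
```

For example, `write_manifest(Path("project_data"), Path("manifest.csv"))` would record every file under a hypothetical `project_data` folder. Keep the manifest outside that folder so later runs do not list the manifest itself.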
23. 4. Access, data sharing & reuse
• Anticipate and plan for data reuse.
• Clarify issues surrounding
data ownership.
• Present a strong case for any
restrictions on sharing.
• Ensure all necessary ethical
approvals are in place.
• Be very clear about where, when
and how data will be made available.
• Get advice on ethics from your Ethics Committee, and on the Data Protection Act (DPA) from Records Management, before sharing, or refusing to share, any data.
24. Data sharing
• You should consider making your data available for others to re-use where
possible under appropriate safeguards.
• Repositories: ways of sharing data
– Zenodo (interdisciplinary): http://www.zenodo.org
– Figshare (interdisciplinary): http://figshare.com/
– Institutional (data) repository
– Domain or national data archive
• Registries:
– http://www.datacite.org/
– http://databib.org
– http://www.re3data.org/
• Advantages:
– permanent / stable, findable, citable, safe and controlled environment
• Databib and re3data.org have announced their plan to merge their two projects into one service, to be managed under the auspices of DataCite by the end of 2015.
25. 5. Deposit & long-term preservation
• Explain your archiving/preservation plan
to ensure the long-term value of key datasets.
• Select data of long-term value: identify which of the
data sets produced are considered to be of long-term
value.
• Deposit all data with a responsible data repository within a specified period after the end of the grant (as determined by the funder).
• Ensure that data will remain accessible. Use existing infrastructure, e.g. a data archive or the institutional repository.
• Some funders regard non-deposit of research data as an exception and reserve the right to request deposit when there is insufficient justification for withholding the data.
26. External repositories
When choosing an external repository you should consider:
• Does your funder require data to be offered to a specific repository?
• Is the repository sustainable?
• What will be done with your data if the repository closes down?
• How much will it cost? Are costs upfront or annual?
• Will data be easily accessible to you and to third parties?
• How does the repository promote discoverability?
• Does the repository record when data is accessed, downloaded, or cited, so that you get recognition for your work?
27. 6. Resourcing
• Outline and justify costs: what resources
will you require to deliver your plan, e.g.
data management and data preparation for
sharing.
• Be realistic about the human time and
effort required.
• Show that funds will be used efficiently
and effectively.
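A simple worked costing can make the case for resources explicit. The Python sketch below is a rough model under stated assumptions: storage is charged per terabyte per year for each copy held, staff time is hours multiplied by an hourly rate, and any repository deposit charge is a one-off fee. All figures in the example are invented placeholders; substitute your institution's actual storage charges and staff costs (the UK Data Archive costing tool listed under Links is one source of guidance).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CostInputs:
    """Inputs for a rough data management costing (all values are placeholders)."""
    storage_tb: float           # volume of active data, in terabytes
    price_per_tb_year: float    # storage charge per TB per year (GBP), hypothetical
    years: float                # duration of the grant, in years
    copies: int                 # copies held, e.g. 2 for primary + back-up
    staff_hours: float          # time for documentation, data preparation, deposit
    hourly_rate: float          # staff cost per hour (GBP), hypothetical
    deposit_fee: float = 0.0    # one-off repository charge, if any


def estimate_rdm_cost(c: CostInputs) -> Dict[str, float]:
    """Break the estimate into lines that can each be justified in the plan."""
    storage = c.storage_tb * c.price_per_tb_year * c.years * c.copies
    staff = c.staff_hours * c.hourly_rate
    return {
        "storage": storage,
        "staff": staff,
        "deposit": c.deposit_fee,
        "total": storage + staff + c.deposit_fee,
    }


if __name__ == "__main__":
    example = CostInputs(storage_tb=2, price_per_tb_year=100, years=3,
                         copies=2, staff_hours=80, hourly_rate=30)
    for item, cost in estimate_rdm_cost(example).items():
        print(f"{item:>8}: GBP {cost:,.2f}")
    # Last line printed: "   total: GBP 3,600.00"
```

Keeping storage, staff time and deposit as separate lines mirrors the advice to outline and justify each cost rather than quote a single lump sum.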
28. Supporting researchers with DMPs
Various types of support can be provided:
• Guidelines and templates on what to include in plans
• Example answers, guidance and links to local support
• A library of successful DMPs to reuse
• Training courses and guidance websites
• Tailored consultancy services
• Online tools (e.g. customised DMPonline)
29. DMPonline
Free and open web-based tool to help
researchers write plans:
https://dmponline.dcc.ac.uk/
• Templates based on different
requirements
• Tailored guidance (disciplinary,
funder etc.)
• Customised exports to a variety
of formats
• Ability to share DMPs with others
Edinburgh has started the process of
customising DMPonline for researchers.
DMPonline screencast:
http://www.screenr.com/PJHN
30. Software Management Plans
• A prototype Software Management Plan Service has been developed by
the Software Sustainability Institute to help researchers write software
management plans: https://ssi-dev.epcc.ed.ac.uk/
• Software management plans are relatively new in research software proposals, though many of the elements they cover would be expected in standard proposals.
• The EPSRC Software for the Future call explicitly requires software
management plans as part of the Pathways to Impact. NSF SI2 funding
requires software to be addressed as part of the mandatory data
management plan.
• A guide to writing and using a software management plan is available:
http://www.software.ac.uk/resources/guides/software-management-plans
31. Software Management Plans
A software management plan can help researchers to:
• formalise a set of structures and goals that ensure research software is
accessible and reusable in the short, medium and long term
• consider whether third-party software to be used within a research
project will be available, and supported, for the lifetime of the project
• give funders confidence that the software they have funded survives beyond the funding period, so that there is something to show for their investment
32. Exercise: What makes a good DMP?
In groups of 3 or 4:
• Read the materials that have been handed round.
• Thinking as a reviewer, compare the two plans against the ESDS guidance you have been given.
• Identify the strengths and weaknesses of each.
• Which one of these plans would you approve?
• What changes would you want made to the other plan before it would
gain approval?
33. Tips to share
• Keep DMPs simple, short and specific; avoid jargon.
• Seek advice - consult and collaborate.
• Start early – don’t wait until the last minute!
• The plan will, and should, change over the life of the project. It is a living document, so update it regularly.
• Always contact your funder when you need clarification or further information.
• Include all expected costs in your data management costing, especially extra storage space for active data, data deposit / long-term storage, etc.
• Agree within your team on an open, standard or common format for long-term preservation, and use it consistently.
• Also see: http://www.youtube.com/watch?v=7OJtiA53-Fk
34. A final recommendation…
• MANTRA is an internationally recognised, self-paced online training course in data management, developed by EDINA and the Data Library for PGRs and early career researchers.
• 8 self-paced learning modules
which map onto the research
data lifecycle
• Data handling exercises with
open datasets in 4 analytical
packages: R, SPSS, NVivo, ArcGIS.
http://datalib.edina.ac.uk/mantra
35. Links
• MRC Good research practice: Principles and guidelines
http://www.mrc.ac.uk/news-events/publications/good-research-practice-principles-and-guidelines/
• Checklist for a data management plan. DCC (Digital Curation Centre)
http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP/DMP_Checklist_2013.pdf
• UK Data Archive: Data management costing tool
http://www.data-archive.ac.uk/media/247429/costingtool.pdf
• UK Data Archive: Anonymisation
http://www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
• UK Data Archive: Ethical/Legal
http://www.data-archive.ac.uk/create-manage/consent-ethics/legal
• Records Management: Taking sensitive information and personal data outside the University’s computing environment
http://edin.ac/1hZaL07
25 years ago, when disk storage was expensive, researchers interested in working with data came together to petition the PLU and the University’s Library for a university-wide provision for files that were too large to be stored on individual computing accounts. Early holdings were research data from the universities of Edinburgh, Glasgow and Strathclyde.