Facing the data challenge: Developing data policy and services
1. Facing the Research Data Challenge:
Developing Data Policy and Services
Marieke Guy
Digital Curation Centre
m.guy@ukoln.ac.uk
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of
this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard
Street, 5th Floor, San Francisco, California, 94105, USA.
DCC Belfast, Queen’s University, 6-7 June 2012 #dcc_belfast
2. Outline
• Who is responsible for RDM?
• What are the components of a data service?
• Learning lessons from other HEIs
• Developing policies and roadmaps
3. Who is Responsible for RDM?
• Funders
• Advisory bodies
• Data centres
• Research organisations
• Support services
• Publishers
• Researchers
4. Components of a Research Data Service?
• Storage: back-up, access
• Research environment, tools & systems
• Archive: preserve, share
• Metadata and documentation
• Support staff & services
• RDM policies
• Advocacy (senior mgmt & researcher)
5. Data Storage – Bristol Example
• £2m funding to date; further investment planned
• Available to all researchers for research data
• Petascale facility – expandable
• 3 machine rooms – resilience (tape archive 2012)
• 1st 5TB free per Data Steward, then £400 per TB p.a. for disk storage; tape backup £40 per TB
(Image: Blue Peta at Bristol)
http://data.bris.ac.uk
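The charging model on this slide lends itself to a quick calculation when writing storage costs into a proposal. The sketch below is illustrative only and is not Bristol's own tool: the function name and parameters are hypothetical, and it assumes that the free 5TB allowance applies to disk storage and that the £40 tape-backup charge is an optional yearly rate on the same chargeable terabytes, neither of which the slide states.

```python
def annual_storage_cost(total_tb, free_tb=5, disk_rate=400, tape_rate=40,
                        tape_backup=False):
    """Estimate the yearly charge in GBP for one Data Steward's allocation.

    Assumptions (not stated on the slide): the free allowance covers disk
    storage only, and tape backup is charged per year on the chargeable TB.
    """
    if total_tb < 0:
        raise ValueError("total_tb must be non-negative")
    # Only storage above the free allowance is charged.
    chargeable_tb = max(0, total_tb - free_tb)
    cost = chargeable_tb * disk_rate
    if tape_backup:
        cost += chargeable_tb * tape_rate
    return cost


if __name__ == "__main__":
    for tb in (3, 5, 8):
        print(tb, "TB:", annual_storage_cost(tb), "GBP per year on disk")
```

Under these assumptions, a steward holding 8TB would pay for 3TB of disk storage each year, which is the kind of up-front figure that can be written into a grant application.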
6. Archiving – Institutional Data Repositories
Not intended to replace national, subject or other established data collections
Acknowledge hybrid environment
Examples:
• http://datashare.is.ed.ac.uk
• www.dspace.cam.ac.uk/
• https://databank.ora.ox.ac.uk
• Essex-RDR and DataPool at Southampton
7. Archiving – External Data Centres
• Research funders’ data centres…
• Structured databases
• Disciplinary & community initiatives
List of repositories & data centres:
http://datacite.org/repolist
8. Data Registries (metadata)
RADAR: Researching a Data Asset Registry
http://radar.blogs.edina.ac.uk
CERIF for Datasets: Develop an extension to the research information standard
http://cerif4datasets.wordpress.com
Can we learn lessons from overseas?
9. Guidance and training
Collate guidance
www.gla.ac.uk/datamanagement
Online training
http://datalib.edina.ac.uk/mantra
and others from JISC RDMTrain
Embed into curriculum via
Doctoral Training Centres
e.g. Research360@Bath
http://blogs.bath.ac.uk/research360
10. Disciplinary Training (RDMTrain)
• The training materials created by the RDMTrain projects are mapped to the lifecycle model below.
• The projects were:
• CAIRO – performing arts (Uni of Bristol)
• DataTrain – archaeology and social anthropology (Uni of Cambridge)
• DATUM for Health – health sciences (Northumbria Uni)
• DMTpsych – psychology (Uni of York, Sheffield Unis)
• Research Data MANTRA – geosciences, social sciences (Uni of Edinburgh)
11. Existing Research Data Policies
• University of Oxford
Statement of commitment until infrastructure is in place
• University of Edinburgh
10 short principles, described as ‘aspirational’
• University of Northampton
brief policy on RCUK Code, detailing procedures & support
• University of Hertfordshire
part of wider data management policy – guide as appendix
• University of East London
newest policy, based on Edinburgh’s
www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
12. How are Others Developing Policies?
• Towards a RDM policy at Manchester
Reviewed existing policies, collated funder
requirements, drafted policy for discussion
• Driving institutional data policy at Southampton
Draft policy and series of user guides put forward to
University Advisory/Executive groups for ratification
www.dcc.ac.uk/news/developing-institutional-data-policies-trend-2012
13. JISC MRD Leeds Workshop
• Programme workshop on institutional research data management policy development and implementation
• Themes/thoughts:
• Institutions are all still at different stages with their research data management policies.
• Having a policy in place without any real buy-in from staff can be more harmful over time.
• Think about whether your policy is aspirational or a working document.
• Policy and infrastructure need to evolve together.
• Consider the other policies – both internal and external – with which your new research data management policy should work in concert.
• Retain awareness of the different roles and legislation for research data and administrative data.
• Try to avoid taking the view that researchers will automatically resist implementation of a research data management policy.
http://bit.ly/jiscwestwood
14. Slide courtesy of Robin Rice, University of Edinburgh
15. Lots to think about and develop,
so where to start?
16. Make a plan!
“EPSRC expects all those it funds to have
developed a clear roadmap to align their policies
and processes with EPSRC’s expectations by 1st
May 2012, and to be fully compliant with these
expectations by 1st May 2015.”
www.epsrc.ac.uk/about/standards/researchdata/Pages/impact.aspx
17. What is a Roadmap?
• a plan made up of stages
• a guideline which it is necessary to follow during
the entire project
• a visual showing the key streams of activity that
a person, team, or organisation needs to
complete to achieve set objectives, usually
keyed to a specific timeline
18. Key Elements in EPSRC Requirements
• Ensure published research papers state how and on what terms any
supporting research data may be accessed (ii)
• Have policies and processes to maintain effective internal awareness
of research data holdings and third-party access requests (iii)
• Publish appropriately structured metadata (normally within 12
months of the data being generated) including DOIs (v)
• Securely preserve research data for a minimum of 10 years from end
of embargo or last third-party access request (vii)
• Ensure effective data curation throughout the full data lifecycle (viii)
www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
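Two of the requirements above are date-driven, so they can be tracked per dataset. The sketch below is a rough illustration rather than an official interpretation of the EPSRC policy: the function names are hypothetical, "within 12 months" is treated as one calendar year from the date the data was generated, and the 10-year retention period is assumed to run from whichever is latest of the generation date, the end of any embargo, and the last third-party access request.

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def epsrc_key_dates(generated, embargo_end=None, last_access_request=None):
    """Estimate the metadata deadline and earliest disposal date for a dataset.

    Assumption: retention runs from the latest known relevant date; a new
    access request later on would push the retention date out again.
    """
    candidates = [generated, embargo_end, last_access_request]
    retention_start = max(d for d in candidates if d is not None)
    return {
        "metadata_due": add_years(generated, 1),
        "retain_until": add_years(retention_start, 10),
    }


if __name__ == "__main__":
    dates = epsrc_key_dates(date(2012, 6, 6), embargo_end=date(2013, 1, 1))
    print("Publish metadata by:", dates["metadata_due"])
    print("Retain until at least:", dates["retain_until"])
```

Because each new access request resets the clock under this reading, the retention date is best recalculated whenever a request is logged, which is one reason the policy also asks institutions to record access requests.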
19. What is the EPSRC Looking For?
• Know what you hold – publish metadata – record access requests
• Link publications and data
• Share data whenever possible
• Curate and preserve valuable data
The same as other funders (i.e. good research practice), so think broadly when you develop your
strategy – where does it fit in?
20. RDM Infrastructure
Diagram elements: RDM Strategy (includes EPSRC Roadmap); Institutional Policy; Guidelines; DMP (departmental); DMP (project)
• Institutional policy – This is what the institution is committed to do.
• Strategy/action plan/roadmap – This is the institution’s response to expectations placed on it by research councils etc.
• Guidelines – This is what the institution expects of staff (& services available, and where responsibilities lie).
• Data management plans – This is what staff are going to do at a departmental or project level.
22. Questions?
• Slides available from the DCC Roadshow website
23. Exercise: Developing a Roadmap for RDM
Think about the potential components of an RDM service.
Based on the strengths/weaknesses you identified in the quiz:
• Draft a list of actions needed at your institution
• Attempt to prioritise your list and pencil in timeframes (consider quick wins!)
• Decide who needs to be involved to make this happen
• Discuss how to make these plans public
Editor's notes
This talk pulls together the lessons from the DCC roadshow to consider how to develop policies and services for Research Data Management (RDM).
We’ll cover who is responsible for RDM and what the potential components of a research data service are. The main part of the talk will focus on how other universities are addressing certain aspects, to see where you can learn lessons. At the end we’ll touch on developing roadmaps in light of the EPSRC policy requirement and do an exercise on this.
There are lots of stakeholders with varied roles, both within organisations and external to them. Requirements and support can be external (e.g. from funders, publishers, data centres) but in terms of developing infrastructure, research organisations are taking a central role. Ensuring clarity of responsibility across stakeholders and bringing people together is key.
*Animated slide – components come in separately*
This isn’t definitive. It’s just an idea of the building blocks involved and how they might be put together.
- Storage is often thought of first. It should be properly backed up, with appropriate access controls and the ability to access it from anywhere.
- We also need an appropriate environment for research (instruments, hardware, software, VREs), plus tools and systems, e.g. for grants.
- Aside from current work environments, we also need to consider facilities for archiving to preserve and share data.
- There’s an inherent need to access/share data, so we need standards, tools and approaches for metadata across the lifecycle.
- We have the basics of a system, but none of this works without people to keep things running and provide guidance and training.
- We also need policies to provide overarching governance.
- And to ensure uptake and maintenance you need buy-in across the board, incentives and financial backing.
We’ll now consider how different institutions are addressing certain aspects of this.
The data.bris team gave a case study at the DCC Roadshow in Cardiff in December 2011. The details here are abstracted from that talk. They are building research data services around their High Performance Computing facility to provide all researchers with adequate storage for their research data. The key thing to note is the cost model: they provide a clear, up-front cost so additional storage can be written into proposals. Other universities (Oxford, Leicester) have produced similar figures.
A few institutions already run data repositories, e.g. Edinburgh and Cambridge (both DSpace). Others are piloting them, e.g. Essex and Southampton (extending existing EPrints repositories as part of the JISC MRD02 programme) and Databank at Oxford. The key thing is that none of these services intend to replace established data services. Where there are more appropriate disciplinary data centres, for example, the data should be submitted there.
There are many external services – dedicated data centres supported by research funders and various structured databases and community initiatives. The list of data centres provided by DataCite is a useful reference for institutions and researchers to identify the most appropriate place of deposit.
This is the area most in its infancy. No institutions appear to have a handle on exactly what research data they hold in order to systematically register & manage data, and expose appropriate metadata to facilitate sharing. However, several UK institutions have flagged a desire to develop institutional data catalogues, so models are likely to emerge. A pertinent project to look at is C4D, which is developing an extension to the CERIF standard to record information on research data. Research Data Australia – a discovery service for research data from Australian universities supported by ANDS – is a model the DCC is looking at to see how a similar service could be provided in the UK.
There are many examples of guidance and training – most are Creative Commons licensed so you can repurpose them. At the University of Glasgow, the Incremental project pulled together details of existing support to raise awareness of services that tended to be missed or misunderstood. MANTRA provided excellent online training modules, as did other JISC RDMTrain projects. A current trend is to embed RDM into existing curricula, e.g. core PhD skills courses. The Research360 project is collaborating with a Doctoral Training Centre and reflects on this in its blog.
There are four institutional RDM policies at present (Feb 2012). These differ in approach:
Oxford University doesn’t have a policy per se. They collaborated with the University of Melbourne on the EIDCSR project (c.2009) and realised that implementation is a stumbling block, so first introduced a Statement of Commitment until infrastructure was developed. A proper policy is being developed on the DaMaRO project.
The University of Edinburgh’s policy is exemplary and seems to be the biggest influence on policy development at other institutions. It was written by an external consultant (Chris Rusbridge) and is described as aspirational, as they know there’s some way to go to make it a reality.
The University of Northampton reiterates the RCUK Code as its guiding principle and usefully provides guidance on procedures and support to explain how the policy should be implemented.
The University of Hertfordshire has RDM requirements as part of a wider data management policy. The language/style is more legal; however, an appendix provides much more practical guidance on data management.
Other universities are sharing lessons about how they are developing policy. The University of Manchester has released a document which explains how they’ve reviewed existing policies and funder requirements and what they’ve taken from each. The draft policy is included in this. The University of Southampton has blogged about developing its policy but has not yet shared the text. They’re developing a series of user guides to accompany the policy and usefully outline the ratification process, as they have good experience of this from passing open access policies.
Many thanks to Robin Rice for this slide, which she presented at the JISC MRD launch event. She spoke about the importance of knowing what your drivers are and getting lots of people involved to help develop your policy. There are also some practical decisions to make e.g. in terms of style and who will write the policy. The really key thing is to know your current situation (service gap analysis) and where you want to be (postcard from the future) so you can plan the transition between these stages
Uppermost on many minds at the moment is the requirement to develop a roadmap in response to the EPSRC’s expectations.
A question the DCC is often asked is ‘What is a roadmap?’ Here are some basic definitions found online. The key thing isn’t the outcome (i.e. the plan) but rather the process of getting there: taking stock of your current position and realising what you need to do to be in a position to comply with the EPSRC policy in 3 years, so you can plan for that activity.
This slide pulls out some of the key EPSRC policy requirements which have an impact on service development. You need to know what you hold so you can systematically manage data, particularly access requests (iii). You need to make others aware of data holdings by publishing appropriate metadata (ii & v). And you need to proactively manage data throughout its lifecycle, i.e. for 10+ years (vii & viii). Other requirements cover specific points about implementation, i.e. what metadata to include, where data can be stored (not in a jurisdiction with lower legal safeguards), expectations for curation, and how to fund all of this work.
In the exercise, please consider the potential components of an RDM service which we’ve covered here and the strengths and weaknesses you identified earlier in the CARDIO quiz to decide what you need to do, when and how.