Research Data, or: How I Learned to
Stop Worrying and Love the Policy
RDMF14: Research Data (and) Systems
York, 9th November 2015
Dr Torsten Reimer
Scholarly Communications Officer
Imperial College London
t.reimer@imperial.ac.uk / @torstenreimer
http://orcid.org/0000-0001-8357-9422
Why are we here?
Why we fight – Compliance! Really?
“Well compliance is really important, yes that's
the whole reason we are doing it really. I mean
to comply with Research Council guidelines
yes. I am not saying the whole reason but
that's the main driver, yes.”
DOI: 10.1371/journal.pone.0114734
There are issues with RCUK/EPSRC policy:
• cost-benefit analysis, anyone?
• expensive/issues around funding
• enough support/incentive for culture change?
• fine in theory, but is it workable in practice?
But…
Blame funders, or blame ourselves (hedgehog and hare)?
It seems wherever we go, the funders have
already been there: HEFCE open
access policy; EPSRC data policy…
Are the funders too fast? Or are we too slow?
Imagine the sector had agreed on best
practice years ago – and implemented
it in a sensible way!
So, why are we here again? No really, why?
Data Science hub and KPMG Data Observatory launch (04 Nov)
"At a research intensive university like
Imperial it is hard to do anything that
doesn't involve data."
James Stirling, Provost
"Data is at the heart of the human
condition."
Joanna Shields, UK Minister for Internet
Safety and Security
Considering these statements you’d think everyone, especially
Imperial, would have RDM all sorted, wouldn’t you?
… and yet we are losing research data
“In their parents' attic, in boxes in the garage, or stored on now-defunct floppy
disks — these are just some of the inaccessible places in which scientists have
admitted to keeping their old research data.”
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
Isn’t research meant to be reproducible?
The results of only 6 out of 53 ‘landmark’ studies were found
to be reproducible.
Drug development: Raise standards for preclinical cancer research.
DOI: 10.1038/483531a
“Several recent publications suggested that the seminal findings from
academic laboratories could only be reproduced 11–50% of the
time. The lack of data reproducibility likely contributes to the
difficulty in rapidly developing new drugs and biomarkers that
significantly impact the lives of patients with cancer and other
diseases.”
A Survey on Data Reproducibility in Cancer Research Provides
Insights into Our Limited Ability to Translate Findings from the
Laboratory to the Clinic. DOI: 10.1371/journal.pone.0063221
Shouldn’t the public too be allowed to play with data?
(This is in our own interest!)
RDM systems landscape
Case for a national infrastructure?
Currently, ~100 UK institutions spend effort to define and implement
an RDM infrastructure (storage, workflows, interfaces, metadata,
compliance, monitoring, business model etc.). Some aspects
have to be local, but…
…imagine a national research data infrastructure (say for data
publishing and preservation), run by RCUK:
• Economies of scale
• No issues with funding
• Just one system to interface with
• Increased visibility/discoverability
• Solution would by default be compliant
• No commercial “ownership” of public data
However, past experience suggests…
One RDM system to rule them all?
• Is community track record actually
better than funders’?
• Jisc offers components, but have we
found the right model for collaboration
(supplier? leader? partner?)?
• Commercial solutions exist – trust?
Should they define our infrastructure?
=> Funders set policy; 3rd parties set
infrastructure – we’ve been too slow again!
=> However, is one system actually suitable
(redundancy, competition, disciplines etc.)?
Until the one solution emerges (if ever), we should:
• consider defining minimum requirements (metadata,
identifiers, embargoes) for 3rd party solutions?
• use a flexible approach that enables us to learn and change
Imperial College London
(From funder policy to) institutional strategy
Imperial College London
• Seven London campuses
• Four Faculties: Engineering,
Medicine, Natural Sciences
and Business School
• Ranked 3rd in Europe / 8th in the
world (THE 2015-16 rankings)
• Net income (2014): £855m, incl.
£351m research grants and contracts
• ~15,000 students, ~7,400 staff, incl. ~3,900 academic & research staff
• Staff publish 10-12,000 scholarly articles per year
• Largest data traffic into Janet network of all UK universities
Process of policy development
• 2014: Draft policy: “Statement of Strategic Aims”
• Lack of reliable data (on data storage needs (scale) in particular)
• Concerns about cost of maintaining infrastructure
• Concerns about uncertainties and changing market / policy landscape
• Decision: re-think approach – more cost-effective, based on better data
• Approach: RDM Green Shoots and RDM Investigation
• Funded by Vice-Provost (Research)
• Green Shoots: 6 bottom-up, academic projects (2nd half of 2014)
• RDM investigation (Oct 2014-Jan 2015)
• Online survey (academics; 390 responses)
• ~40 interviews (academics)
• Workshops (academics & data managers)
RDM Green Shoots
• Haystack – a computational molecular data notebook
(Dr Mike Bearpark, Chemistry)
• Imperial College Healthcare Tissue Bank
(Prof. Gerry Thomas, Surgery & Cancer)
• Integrated Rule-based Data Management System for
Genome Sequencing Data (Dr Michael Mueller, Medicine)
• RDM in Computational and Experimental Molecular
Sciences (Prof. Henry Rzepa, Chemistry)
• RDM: Where software meets data (Dr Gerard Gorman &
Dr Matthew Piggott, Earth Science & Engineering)
• Time Series (Dr Nick Jones, Mathematics)
Idea
• Provide a platform and technology which automatically connects researchers
through their time-series data, models and analysis methods
Achievements
• Online interdisciplinary collection of time-series data and time-series analysis code
• Functionality to automatically profile time series
• Functionality to automatically profile time series algorithms
• Functionality to use these profiles to place a user’s work in the context of others
RDM Benefits
• Incentivises data sharing by allowing data comparison – increases discoverability of
an academic’s data plus increases likelihood of finding other relevant data
• Resource also available to general public
More Information
• http://www.comp-engine.org/timeseries/
Example project: Time Series
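To make the idea of "profiling" a time series and placing it in the context of others more concrete, here is a minimal sketch. It is not the project's actual method: the three features (mean, standard deviation, lag-1 autocorrelation), the function names `profile` and `nearest`, and the sample series are all assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev


def profile(series):
    """Return a tiny feature vector: (mean, std dev, lag-1 autocorrelation).

    Illustrative only; a real profiling service would use many more features
    and would normalise them before comparing.
    """
    m = mean(series)
    s = pstdev(series)
    if s == 0:
        # A constant series has no variation, so autocorrelation is set to 0.
        return (m, 0.0, 0.0)
    n = len(series)
    ac1 = sum((series[i] - m) * (series[i + 1] - m) for i in range(n - 1)) / (n * s * s)
    return (m, s, ac1)


def nearest(query, collection):
    """Name of the series in `collection` whose profile is closest to the query's."""
    q = profile(query)
    return min(collection, key=lambda name: math.dist(q, profile(collection[name])))


# Hypothetical collection of two named series.
collection = {
    "trend": [1, 2, 3, 4, 5, 6],
    "alternating": [1, -1, 1, -1, 1, -1],
}
match = nearest([2, 3, 4, 5, 6, 7], collection)
```

Comparing profiles rather than raw values is what would let series of different lengths and from different disciplines be related to each other.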
Online survey – where does active data live?
[Bar chart: use of different types of storage in % (axis 0–80). Categories: College computer; External/portable storage; Cloud storage; Personal computer; Departmental/group storage; College H drive; ICT central storage]
Online survey – growth of data volume
[Bar chart: research group data storage needs in % (axis 0–30), now vs. in 2 years. Bands: < 10 GB; 10 GB – 100 GB; 100 GB – 1 TB; 1 TB – 10 TB; 10 TB – 100 TB; 100 TB – 1 PB; > 1 PB]
Findings (best practice)
• RDM principles are considered to be sound but not fully practised
• Sharing publicly-funded data accepted in principle but some question
value and cost
• Concerns about (metadata) effort to make shared data discoverable
• Metadata schemas are not yet widely available across disciplines
• Auto-generate metadata where possible
• Consensus that RDM training for PhDs is vital
(also to avoid data loss when they leave)
Findings (data)
• 60-100% of grant required to re-generate data used in publications
• % of data that needs retaining to support publications: ~60%
• Data storage capacity will have to grow significantly
• Concerns around back-up and archiving, esp. considering data volume
• Popularity of cloud services (as opposed to College storage)
=> Researchers want self-administered, secure, responsive solution
for data sharing, storing and archiving; open APIs preferred
(“Yes [storage] is really important. Basically, whenever we have been out
to talk to researchers, that's the thing they have latched on to and want to
talk about the most.” 10.1371/journal.pone.0114734)
Conclusions / policy implementation principles
• Provide platform-independent, flexible data storage
• Embed RDM training into PhD progression
• Where available, use existing workflows:
• Symplectic Elements: metadata management
• Spiral (DSpace): public (metadata) catalogue
• Additional infrastructure:
• use external resources
• no long-term commitment
• as flexible as possible
• cost-effective
Result: Imperial College RDM Policy
“Imperial College London is committed to
promoting the highest standards of
academic research, including excellence in
research data management. This includes a
robust digital curation infrastructure that
supports open data access and protects
confidential data. The College acknowledges
legal, ethical and commercial constraints on
data sharing and the need to preserve the
academic entitlement to publication.”
“Principal Investigators have overall
responsibility for the effective management
of research data generated within or obtained
for their research, including by their research
groups. The Library and ICT will provide
training, guidance and services to support
PIs.” http://imperial.ac.uk/research-data-management
Building a flexible RDM infrastructure
[Flowchart; labels as extracted:]
• Research Project: creates data/software (Data: Box; Software: GitHub)
• Project ends
• Data/software still needed? (no: Delete; yes)
• Can it be published or embargoed externally? (yes / no)
• External repository / Internal storage
• Metadata, manual or automatic
• Can metadata be published? Library reviews (yes)
• Elements; Spiral
Summarising RDM in 6 steps
1. Make a data management plan: use DMPOnline
2. Store your data management plan centrally: use InfoEd
3. Store your live data securely and safely: use Box
4. Store your final data (and/or code) for 10+ years,
making it publicly available: use Zenodo
5. Tell the College where your data (and/or code) is
published or stored: use Symplectic
6. Reference your funding and your data in the
publications it underpins: tell your publisher
Box – Data storage, sharing and syncing
Roll-out across College:
• unlimited data
storage
• online access, easy
sharing, data syncing
• file viewers included
• backup, data remains
even when staff leave
• machine learning
tools to describe data
• API
Infrastructure summary
• Flexible, can react to market / policy changes
• Components can be exchanged, no additional
in-house infrastructure
• Make a start, collect data, learn – change as required
• Preservation infrastructure needs further work
(discussions with Arkivum about ‘framework’ for
costing into grants) – how much do we need
to retain beyond published data?
• It isn’t perfect, but we can make a start
“In, through … and beyond”
RDM policy with research software requirements
“3.6.7 Cost Effectiveness – where computer-generated data may be
reliably recreated at a cost less than that of storing raw output data,
then the inputs and human-readable outputs of the relevant
programme may be stored instead along with a reference to or copy of
the software version used.”
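The cost-effectiveness test in 3.6.7 comes down to a simple comparison, which can be sketched as below. This is only an illustration of the arithmetic: the function name `prefer_recreation`, the flat per-terabyte-per-year storage price and all figures are hypothetical and do not come from the policy.

```python
def prefer_recreation(recreate_cost, size_tb, cost_per_tb_year, years,
                      reliably_recreatable=True):
    """Return True if storing inputs (plus software version) beats storing raw output.

    Assumes a flat storage price per TB per year (hypothetical cost model).
    """
    if not reliably_recreatable:
        # The policy only allows the substitution when data can be reliably recreated.
        return False
    storage_cost = size_tb * cost_per_tb_year * years
    return recreate_cost < storage_cost


# Hypothetical example: 20 TB of simulation output kept for 10 years at 100 per TB-year
# costs 20,000 to store; re-running the simulation is assumed to cost 500.
keep_inputs_only = prefer_recreation(recreate_cost=500, size_tb=20,
                                     cost_per_tb_year=100, years=10)
```

In practice the harder part is the `reliably_recreatable` judgement, which depends on archiving the exact software version as required by 3.7.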
“3.7 If software is developed as part of a research project, Principal
Investigators must archive the particular version of the software
used to generate or analyse the data in a repository and inform the
Library of its location, taking account of the points raised in 3.5
above. Principal Investigators are encouraged to follow the
Sustainability and Preservation Framework of the Software
Sustainability Institute.”
Treat software as valuable research output
PyRDM Green Shoots project
Zenodo integrates with GitHub
College survey on distributed version control
Software Sustainability Institute – I am a fellow
ORCID – Open Researcher and Contributor ID
• Emerging global standard for identifying authors of academic outputs
• The College created ORCID iDs for academic staff in late 2014
(now 2,088 of 3,200 iDs claimed, ~1,500 linked in Elements)
• Imperial hosted launch of Jisc ORCID consortium with
50 UK universities in September 2015
http://www.imperial.ac.uk/orcid
Towards automating RDM reporting with ORCID
1. Author links ORCID with CRIS
2. …shares ORCID iD with repository
3. …publishes dataset
4. DataCite DOI linked to ORCID iD
5. CRIS pulls metadata from ORCID / DataCite / Repository
But: is the external metadata likely to be complete “enough”?
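A rough sketch of what the "CRIS pulls metadata" step might check is shown below. It works on a dictionary shaped like the `attributes` object of a DataCite JSON record (`creators`, `nameIdentifiers`, `titles`, `publisher`, `publicationYear`); the sample record, the helper names and the choice of "required" fields are assumptions for illustration, not a description of how Symplectic Elements or any real CRIS works. The checksum follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its iDs.

```python
import re

# 16-character ORCID iD in its hyphenated form, at the end of a string or URL.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_checksum_ok(orcid):
    """Validate the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    return digits[-1] == ("X" if check == 10 else str(check))


def creator_orcids(attributes):
    """Collect valid ORCID iDs attached to the creators of a dataset record."""
    found = []
    for creator in attributes.get("creators", []):
        for ident in creator.get("nameIdentifiers", []):
            if ident.get("nameIdentifierScheme", "").upper() != "ORCID":
                continue
            match = ORCID_RE.search(ident.get("nameIdentifier", ""))
            if match and orcid_checksum_ok(match.group(1)):
                found.append(match.group(1))
    return found


def missing_fields(attributes,
                   required=("titles", "creators", "publicationYear", "publisher")):
    """List required fields that are absent or empty (the field list is an assumption)."""
    return [field for field in required if not attributes.get(field)]


# Hypothetical record; the ORCID iD is the one given on the title slide.
record = {
    "titles": [{"title": "Example dataset"}],
    "creators": [{
        "name": "Reimer, Torsten",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0001-8357-9422",
            "nameIdentifierScheme": "ORCID",
        }],
    }],
    "publicationYear": 2015,
}
orcids = creator_orcids(record)
gaps = missing_fields(record)
```

A check like `missing_fields` is one way to turn the question of whether external metadata is complete "enough" into something that can be measured per record.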
Useful infrastructure makes compliance a by-product
• One workflow for data generation, publishing, reporting and curation
• Link data generation directly to storage (log into facility, data “at your
desk” before you are out of the “lab”)
• (HSS colleagues – “facility” can also be a book scanner)
• Automate reporting and generating / sharing of metadata
1. Facilities write (meta)data into Box
2. Data processed / analysed from Box
3. Machine-learning adds metadata
4. Publish to repository from Box, with reference
5. Metadata directly or indirectly (ORCID) to CRIS
Make data useful for us, not just for external re-use
Now that we get data, shouldn’t we analyse it?
Add value by:
• connecting researchers who have similar data interests
• connecting researchers to relevant data
• presenting data in a way that’s suitable for public reuse
• developing a data analytics and knowledge transfer service
• collecting impact information on data
• Let’s make a start and learn from doing, from actual data
• Think about where we can coordinate (3rd party requirements)
• It is early stages, take a flexible approach
• Don’t wait for funders, interpret policies in a useful way and lead
=> If we lead instead of following there will be fewer unpleasant
surprises to deal with!
Research Data, or: How I Learned to
Stop Worrying and Love the Policy
Image Credit (note NC licence!)
1. https://en.wikipedia.org/wiki/File:Dr._Strangelove_-_Group_Captain_Lionel_Mandrake.png public domain
2. https://it.wikipedia.org/wiki/Why_We_Fight#/media/File:Why_We_Fight_title.jpg public domain
3. https://commons.wikimedia.org/wiki/File:Hase_und_Igel_%281%29.jpg
public domain
4. https://www.flickr.com/photos/jdhancock/4617759902/ C-3PO vs. Data
(137/365), by JD Hancock, CC BY 2.0
5. https://en.wikipedia.org/wiki/One_Ring#/media/File:Unico_Anello.png
public domain
6. https://www.flickr.com/photos/dinnerseries/14994148089/ OXO tools,
by Didriks, CC BY 2.0
7. https://www.flickr.com/photos/albertovo5/3908190631/ How I Learned
To Stop Worrying..., by hjhipster, CC BY NC 2.0

Weitere ähnliche Inhalte

Was ist angesagt?

Imperial College London - journey to open scholarship
Imperial College London - journey to open scholarshipImperial College London - journey to open scholarship
Imperial College London - journey to open scholarshipTorsten Reimer
 
‘Everything Available’ – a vision for the development of the British Library ...
‘Everything Available’ – a vision for the development of the British Library ...‘Everything Available’ – a vision for the development of the British Library ...
‘Everything Available’ – a vision for the development of the British Library ...Torsten Reimer
 
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...UKSG: connecting the knowledge community
 
Implementing Open Access – BU and UCL
Implementing Open Access – BU and UCLImplementing Open Access – BU and UCL
Implementing Open Access – BU and UCLRepository Fringe
 
Archaeological Training in an Open Access World: Lessons from the REWARD Proj...
Archaeological Training in an Open Access World: Lessons from the REWARD Proj...Archaeological Training in an Open Access World: Lessons from the REWARD Proj...
Archaeological Training in an Open Access World: Lessons from the REWARD Proj...ariadnenetwork
 
How compliant is your institution? University of Glasgow RIOXX case study - M...
How compliant is your institution? University of Glasgow RIOXX case study - M...How compliant is your institution? University of Glasgow RIOXX case study - M...
How compliant is your institution? University of Glasgow RIOXX case study - M...Jisc
 
ORCID Implementations with University RIM Systems (Flinders University, L. Wa...
ORCID Implementations with University RIM Systems (Flinders University, L. Wa...ORCID Implementations with University RIM Systems (Flinders University, L. Wa...
ORCID Implementations with University RIM Systems (Flinders University, L. Wa...ORCID, Inc
 
Lessons in Open Access Compliance for Higher Education (LOCH)
Lessons in Open Access Compliance for Higher Education (LOCH)Lessons in Open Access Compliance for Higher Education (LOCH)
Lessons in Open Access Compliance for Higher Education (LOCH)Repository Fringe
 
Preparing for the UK Research Data Registry and Discovery Service
Preparing for the UK Research Data Registry and Discovery ServicePreparing for the UK Research Data Registry and Discovery Service
Preparing for the UK Research Data Registry and Discovery ServiceRepository Fringe
 
Jisc Publications Router: Delivering Open Access Content to Institutions
Jisc Publications Router: Delivering Open Access Content to InstitutionsJisc Publications Router: Delivering Open Access Content to Institutions
Jisc Publications Router: Delivering Open Access Content to InstitutionsEDINA, University of Edinburgh
 
Optimising Resources to develop a strategic approach to OA
Optimising Resources to develop a strategic approach to OAOptimising Resources to develop a strategic approach to OA
Optimising Resources to develop a strategic approach to OARepository Fringe
 
Finding, managing and using the right MediaHub content
Finding, managing and using the right MediaHub contentFinding, managing and using the right MediaHub content
Finding, managing and using the right MediaHub contentEDINA, University of Edinburgh
 
Pre equipment sharingandoa_v1_20160407
Pre equipment sharingandoa_v1_20160407Pre equipment sharingandoa_v1_20160407
Pre equipment sharingandoa_v1_20160407Marta Teperek
 
Implementing ISNIs and ORCIDs at La Trobe University
Implementing ISNIs and ORCIDs at La Trobe UniversityImplementing ISNIs and ORCIDs at La Trobe University
Implementing ISNIs and ORCIDs at La Trobe UniversitySimon Huggard
 
Linked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE ProjectLinked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE Projectariadnenetwork
 
Open access advocacy: joining the dots (session 4a)
Open access advocacy: joining the dots (session 4a)Open access advocacy: joining the dots (session 4a)
Open access advocacy: joining the dots (session 4a)Research Consulting Limited
 
Show me the money - the long path to a sustainable RDM Facility
Show me the money - the long path to a sustainable RDM FacilityShow me the money - the long path to a sustainable RDM Facility
Show me the money - the long path to a sustainable RDM FacilityJisc RDM
 
Data sharing in the Netherlands
Data sharing in the NetherlandsData sharing in the Netherlands
Data sharing in the NetherlandsJisc RDM
 
UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...UKSG: connecting the knowledge community
 

Was ist angesagt? (20)

Imperial College London - journey to open scholarship
Imperial College London - journey to open scholarshipImperial College London - journey to open scholarship
Imperial College London - journey to open scholarship
 
‘Everything Available’ – a vision for the development of the British Library ...
‘Everything Available’ – a vision for the development of the British Library ...‘Everything Available’ – a vision for the development of the British Library ...
‘Everything Available’ – a vision for the development of the British Library ...
 
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
UKSG Conference 2016 Breakout Session - Discovery and linking integrity – do ...
 
Implementing Open Access – BU and UCL
Implementing Open Access – BU and UCLImplementing Open Access – BU and UCL
Implementing Open Access – BU and UCL
 
Archaeological Training in an Open Access World: Lessons from the REWARD Proj...
Archaeological Training in an Open Access World: Lessons from the REWARD Proj...Archaeological Training in an Open Access World: Lessons from the REWARD Proj...
Archaeological Training in an Open Access World: Lessons from the REWARD Proj...
 
How compliant is your institution? University of Glasgow RIOXX case study - M...
How compliant is your institution? University of Glasgow RIOXX case study - M...How compliant is your institution? University of Glasgow RIOXX case study - M...
How compliant is your institution? University of Glasgow RIOXX case study - M...
 
ORCID Implementations with University RIM Systems (Flinders University, L. Wa...
ORCID Implementations with University RIM Systems (Flinders University, L. Wa...ORCID Implementations with University RIM Systems (Flinders University, L. Wa...
ORCID Implementations with University RIM Systems (Flinders University, L. Wa...
 
Lessons in Open Access Compliance for Higher Education (LOCH)
Lessons in Open Access Compliance for Higher Education (LOCH)Lessons in Open Access Compliance for Higher Education (LOCH)
Lessons in Open Access Compliance for Higher Education (LOCH)
 
Preparing for the UK Research Data Registry and Discovery Service
Preparing for the UK Research Data Registry and Discovery ServicePreparing for the UK Research Data Registry and Discovery Service
Preparing for the UK Research Data Registry and Discovery Service
 
Jisc Publications Router: Delivering Open Access Content to Institutions
Jisc Publications Router: Delivering Open Access Content to InstitutionsJisc Publications Router: Delivering Open Access Content to Institutions
Jisc Publications Router: Delivering Open Access Content to Institutions
 
Optimising Resources to develop a strategic approach to OA
Optimising Resources to develop a strategic approach to OAOptimising Resources to develop a strategic approach to OA
Optimising Resources to develop a strategic approach to OA
 
Finding, managing and using the right MediaHub content
Finding, managing and using the right MediaHub contentFinding, managing and using the right MediaHub content
Finding, managing and using the right MediaHub content
 
Pre equipment sharingandoa_v1_20160407
Pre equipment sharingandoa_v1_20160407Pre equipment sharingandoa_v1_20160407
Pre equipment sharingandoa_v1_20160407
 
CERIF CRIS UK landscape
CERIF CRIS UK landscapeCERIF CRIS UK landscape
CERIF CRIS UK landscape
 
Implementing ISNIs and ORCIDs at La Trobe University
Implementing ISNIs and ORCIDs at La Trobe UniversityImplementing ISNIs and ORCIDs at La Trobe University
Implementing ISNIs and ORCIDs at La Trobe University
 
Linked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE ProjectLinked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE Project
 
Open access advocacy: joining the dots (session 4a)
Open access advocacy: joining the dots (session 4a)Open access advocacy: joining the dots (session 4a)
Open access advocacy: joining the dots (session 4a)
 
Show me the money - the long path to a sustainable RDM Facility
Show me the money - the long path to a sustainable RDM FacilityShow me the money - the long path to a sustainable RDM Facility
Show me the money - the long path to a sustainable RDM Facility
 
Data sharing in the Netherlands
Data sharing in the NetherlandsData sharing in the Netherlands
Data sharing in the Netherlands
 
UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...
 

Ähnlich wie Research Data, or: How I Learned to Stop Worrying and Love the Policy

Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 
Rachel Bruce UK research and data management where are we now
Rachel Bruce UK research and data management where are we nowRachel Bruce UK research and data management where are we now
Rachel Bruce UK research and data management where are we nowJisc
 
Supporting the development of a national Research Data Discovery Service – a ...
Supporting the development of a national Research Data Discovery Service – a ...Supporting the development of a national Research Data Discovery Service – a ...
Supporting the development of a national Research Data Discovery Service – a ...EDINA, University of Edinburgh
 
Supporting the development of a national Research Data Discovery Service - A ...
Supporting the development of a national Research Data Discovery Service - A ...Supporting the development of a national Research Data Discovery Service - A ...
Supporting the development of a national Research Data Discovery Service - A ...Historic Environment Scotland
 
Introduction to Research Data Management at Lancaster University
Introduction to Research Data Management at Lancaster UniversityIntroduction to Research Data Management at Lancaster University
Introduction to Research Data Management at Lancaster UniversityLancaster University Library
 
Institutional Data Management Blueprint
Institutional Data Management BlueprintInstitutional Data Management Blueprint
Institutional Data Management BlueprintEduserv
 
Libraries and Research Data Management – What Works? Lessons Learned from the...
Libraries and Research Data Management – What Works? Lessons Learned from the...Libraries and Research Data Management – What Works? Lessons Learned from the...
Libraries and Research Data Management – What Works? Lessons Learned from the...LIBER Europe
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
Survey of research data management practices up2010
Survey of research data management practices up2010Survey of research data management practices up2010
Survey of research data management practices up2010heila1
 
Survey of research data management practices up2010digschol2011
Survey of research data management practices up2010digschol2011Survey of research data management practices up2010digschol2011
Survey of research data management practices up2010digschol2011heila1
 
Supporting Research Data Management at the University of Stirling
Supporting Research Data Management at the University of StirlingSupporting Research Data Management at the University of Stirling
Supporting Research Data Management at the University of StirlingLisa Haddow
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Robin Rice
 
Supporting Research Data Management in UK Universities: the Jisc Managing Res...
Supporting Research Data Management in UK Universities: the Jisc Managing Res...Supporting Research Data Management in UK Universities: the Jisc Managing Res...
Supporting Research Data Management in UK Universities: the Jisc Managing Res...L Molloy
 
Educause 2015 RDM Maturity
Educause 2015 RDM Maturity Educause 2015 RDM Maturity
Educause 2015 RDM Maturity ResearchSpace
 
RDM LIASA webinar
RDM LIASA webinarRDM LIASA webinar
RDM LIASA webinarSarah Jones
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsMartin Donnelly
 
Incentives for modern research
Incentives for modern researchIncentives for modern research
Incentives for modern researchJisc
 

Ähnlich wie Research Data, or: How I Learned to Stop Worrying and Love the Policy (20)

Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
Rachel Bruce UK research and data management where are we now
Rachel Bruce UK research and data management where are we nowRachel Bruce UK research and data management where are we now
Rachel Bruce UK research and data management where are we now
 
Looking After Your Data: RDM @ Edinburgh
Looking After Your Data: RDM @ EdinburghLooking After Your Data: RDM @ Edinburgh
Looking After Your Data: RDM @ Edinburgh
 
Supporting the development of a national Research Data Discovery Service – a ...
Supporting the development of a national Research Data Discovery Service – a ...Supporting the development of a national Research Data Discovery Service – a ...
Supporting the development of a national Research Data Discovery Service – a ...
 
RDM@Edinburgh
RDM@EdinburghRDM@Edinburgh
RDM@Edinburgh
 
RDM@Edinburgh
RDM@EdinburghRDM@Edinburgh
RDM@Edinburgh
 
Research Data, or: How I Learned to Stop Worrying and Love the Policy

  • 1. Research Data, or: How I Learned to Stop Worrying and Love the Policy RDMF14: Research Data (and) Systems York, 9th November 2015 Dr Torsten Reimer Scholarly Communications Officer Imperial College London t.reimer@imperial.ac.uk / @torstenreimer http://orcid.org/0000-0001-8357-9422
  • 2. Why are we here?
  • 3. Why we fight – Compliance! Really? “Well compliance is really important, yes that's the whole reason we are doing it really. I mean to comply with Research Council guidelines yes. I am not saying the whole reason but that's the main driver, yes.” 10.1371/journal.pone.0114734 There are issues with RCUK/EPSRC policy: • cost-benefit analysis, anyone? • expensive/issues around funding • enough support/incentive for culture change? • fine in theory, but is it workable in practice? But…
  • 4. Blame funders, or blame ourselves (hedgehog and hare)? It seems wherever we go, the funders have already been there: HEFCE open access policy; EPSRC data policy… Are the funders too fast? Or we too slow? Imagine the sector had agreed on best practice years ago – and implemented it in a sensible way!
  • 5. So, why are we here again? No really, why?
  • 6. Data Science hub and KPMG Data Observatory
  • 7. Data Science hub and KPMG Data Observatory launch (04 Nov) "At a research intensive university like Imperial it is hard to do anything that doesn't involve data." James Stirling, Provost "Data is at the heart of the human condition." Joanna Shields, UK Minister for Internet Safety and Security Considering these statements you’d think everyone, especially Imperial, would have RDM all sorted, wouldn’t you?
  • 8. … and yet we are losing research data “In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks — these are just some of the inaccessible places in which scientists have admitted to keeping their old research data.” http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
  • 9. Isn’t research meant to be reproducible? The results of only 6 out of 53 ‘landmark’ studies were found reproducible. Drug development: Raise standards for preclinical cancer research. DOI: 10.1038/483531a “Several recent publications suggested that the seminal findings from academic laboratories could only be reproduced 11–50% of the time. The lack of data reproducibility likely contributes to the difficulty in rapidly developing new drugs and biomarkers that significantly impact the lives of patients with cancer and other diseases.” A Survey on Data Reproducibility in Cancer Research Provides Insights into Our Limited Ability to Translate Findings from the Laboratory to the Clinic. DOI: 10.1371/journal.pone.0063221
  • 10. Shouldn’t the public too be allowed to play with data?
  • 11. (This is in our own interest!)
  • 13. Case for a national infrastructure? Currently, ~100 UK institutions spend effort to define and implement an RDM infrastructure (storage, workflows, interfaces, metadata, compliance, monitoring, business model etc.). Some aspects have to be local, but… …imagine a national research data infrastructure (say for data publishing and preservation), run by RCUK: • Economies of scale • No issues with funding • Just one system to interface with • Increased visibility/discoverability • Solution would by default be compliant • No commercial “ownership” of public data
  • 16. One RDM system to rule them all? • Is community track record actually better than funders’? • Jisc offers components, but have we found the right model for collaboration (supplier? leader? partner?)? • Commercial solutions exist – trust? Should they define our infrastructure? => Funders set policy; 3rd parties infrastructure – we’ve been too slow again! => However, is one system actually suitable (redundancy, competition, disciplines etc.)? Until the one solution emerges (if ever), we should: • consider defining minimum requirements (metadata, identifiers, embargoes) for 3rd party solutions • use a flexible approach that enables us to learn and change
  • 17. Imperial College London (From funder policy to) institutional strategy
  • 18. Imperial College London • Seven London campuses • Four Faculties: Engineering, Medicine, Natural Sciences and Business School • Ranked 3rd in Europe / 8th in the world (THE 2015-16 rankings) • Net income (2014): £855m, incl. £351m research grants and contracts • ~15,000 students, ~7,400 staff, incl. ~3,900 academic & research staff • Staff publish 10-12,000 scholarly articles per year • Largest data traffic into Janet network of all UK universities
  • 19. Process of policy development • 2014: Draft policy: “Statement of Strategic Aims” • Lack of reliable data (on data storage needs (scale) in particular) • Concerns about cost of maintaining infrastructure • Concerns about uncertainties and changing market / policy landscape • Decision: re-think approach – more cost-effective, based on better data • Approach: RDM Green Shoots and RDM Investigation • Funded by Vice-Provost (Research) • Green Shoots: 6 bottom-up, academic projects (2nd half of 2014) • RDM investigation (Oct 2014-Jan 2015) • Online survey (academics; 390 responses) • ~40 interviews (academics) • Workshops (academics & data managers)
  • 20. RDM Green Shoots • Haystack – a computational molecular data notebook (Dr Mike Bearpark, Chemistry) • Imperial College Healthcare Tissue Bank (Prof. Gerry Thomas, Surgery & Cancer) • Integrated Rule-based Data Management System for Genome Sequencing Data (Dr Michael Mueller, Medicine) • RDM in Computational and Experimental Molecular Sciences (Prof. Henry Rzepa, Chemistry) • RDM: Where software meets data (Dr Gerard Gorman & Dr Matthew Piggott, Earth Science & Engineering) • Time Series (Dr Nick Jones, Mathematics)
  • 21. Example project: Time Series Idea • Provide a platform and technology which automatically connects researchers through their time-series data, models and analysis methods Achievements • Online interdisciplinary collection of time-series data and time-series analysis code • Functionality to automatically profile time series • Functionality to automatically profile time series algorithms • Functionality to use these profiles to place a user’s work in the context of others RDM Benefits • Incentivises data sharing by allowing data comparison – increases discoverability of an academic’s data plus increases likelihood of finding other relevant data • Resource also available to general public More Information • http://www.comp-engine.org/timeseries/
  • 22. Online survey – where does active data live? [Bar chart: use of different types of storage in %, axis 0–80. Categories: College computer; External/portable storage; Cloud storage; Personal computer; Departmental/group storage; College H drive; ICT central storage]
  • 23. Online survey – growth of data volume [Bar chart: research group data storage needs in %, axis 0–30, "Now" vs. "In 2 years". Bands: < 10 GB; 10 GB – 100 GB; 100 GB – 1 TB; 1 TB – 10 TB; 10 TB – 100 TB; 100 TB – 1 PB; > 1 PB]
  • 24. Findings (best practice) • RDM principles are considered to be sound but not fully practised • Sharing publicly-funded data accepted in principle but some question value and cost • Concerns about (metadata) effort to make shared data discoverable • Metadata schemas are not yet widely available across disciplines • Auto-generate metadata where possible • Consensus that RDM training for PhDs is vital (also to avoid data loss when they leave)
  • 25. Findings (data) • 60-100% of grant required to re-generate data used in publications • % of data that needs retaining to support publications: ~60% • Data storage capacity will have to grow significantly • Concerns around back-up and archiving, esp. considering data volume • Popularity of cloud services (as opposed to College storage) => Researchers want self-administered, secure, responsive solution for data sharing, storing and archiving; open APIs preferred (“Yes [storage] is really important. Basically, whenever we have been out to talk to researchers, that's the thing they have latched on to and want to talk about the most.” 10.1371/journal.pone.0114734)
  • 26. Conclusions / policy implementation principles • Provide platform-independent, flexible data storage • Embed RDM training into PhD progression • Where available, use existing workflows: • Symplectic Elements: metadata management • Spiral (DSpace): public (metadata) catalogue • Additional infrastructure: • use external resources • no long-term commitment • as flexible as possible • cost-effective
  • 27. Result: Imperial College RDM Policy “Imperial College London is committed to promoting the highest standards of academic research, including excellence in research data management. This includes a robust digital curation infrastructure that supports open data access and protects confidential data. The College acknowledges legal, ethical and commercial constraints on data sharing and the need to preserve the academic entitlement to publication.” “Principal Investigators have overall responsibility for the effective management of research data generated within or obtained for their research, including by their research groups. The Library and ICT will provide training, guidance and services to support PIs.” http://imperial.ac.uk/research-data-management
  • 28. Building a flexible RDM infrastructure
  • 29. [Workflow diagram; labels: Research Project; Creates data/software; Data: Box; Software: GitHub; Project ends; Data/software still needed? (yes / no); Delete; Can it be published or embargoed externally? (yes / no); External repository; Internal storage; Metadata, manual or automatic; Elements; Can metadata be published? (yes); Library reviews; Spiral]
  • 30. Summarising RDM in 6 steps 1. Make a data management plan: use DMPOnline 2. Store your data management plan centrally: use InfoEd 3. Store your live data securely and safely: use Box 4. Store your final data (and/or code) for 10+ years, making it publicly available: use Zenodo 5. Tell the College where your data (and/or code) is published or stored: use Symplectic 6. Reference your funding and your data in the publications it underpins: tell your publisher
  • 31. Box – Data storage, sharing and syncing Roll-out across College: • unlimited data storage • online access, easy sharing, data syncing • file viewers included • backup, data remains even when staff leave • machine learning tools to describe data • API
  • 32. Infrastructure summary • Flexible, can react to market / policy changes • Components can be exchanged, no additional in-house infrastructure • Make a start, collect data, learn – change as required • Preservation infrastructure needs further work (discussions with Arkivum about ‘framework’ for costing into grants) – how much do we need to retain beyond published data? • It isn’t perfect, but we can make a start
  • 33. “In, through … and beyond”
  • 34. RDM policy with research software requirements “3.6.7 Cost Effectiveness – where computer-generated data may be reliably recreated at a cost less than that of storing raw output data, then the inputs and human-readable outputs of the relevant programme may be stored instead along with a reference to or copy of the software version used.” “3.7 If software is developed as part of a research project, Principal Investigators must archive the particular version of the software used to generate or analyse the data in a repository and inform the Library of its location, taking account of the points raised in 3.5 above. Principal Investigators are encouraged to follow the Sustainability and Preservation Framework of the Software Sustainability Institute.”
  • 35. Treat software as valuable research output • PyRDM Green Shoots project • Zenodo integrates with GitHub • College survey on distributed version control • Software Sustainability Institute – I am a fellow
  • 36. ORCID – Open Researcher and Contributor ID • Emerging global standard for identifying authors of academic outputs • The College created ORCID iDs for academic staff in late 2014 (now 2,088 of 3,200 iDs claimed, ~1,500 linked in Elements) • Imperial hosted launch of Jisc ORCID consortium with 50 UK universities in September 2015 http://www.imperial.ac.uk/orcid
  • 37. Towards automating RDM reporting with ORCID Author links ORCID with CRIS …shares ORCID iD with repository …publishes dataset DataCite DOI linked to ORCID iD CRIS pulls metadata from ORCID / DataCite / Repository But: is the external metadata likely to be complete “enough”?
  • 38. Useful infrastructure makes compliance a by-product • One workflow for data generation, publishing, reporting and curation • Link data generation directly to storage (log into facility, data “at your desk” before you are out of the “lab”) • (HSS colleagues – “facility” can also be a book scanner) • Automate reporting and generating / sharing of metadata [Diagram: Facilities write (meta)data into Box; data processed / analysed from Box; machine-learning adds metadata; publish to repository from Box, with reference; metadata directly or indirectly (ORCID) to CRIS]
  • 39. Make data useful for us, not just for external re-use Now that we get data, shouldn’t we analyse it? Add value by: • connect researchers who have similar data interests • connect researchers to relevant data • present data in a way that’s suitable for public reuse • develop data analytics and knowledge transfer service • collect impact information on data
  • 40. • Let’s make a start and learn from doing, from actual data • Think about where we can coordinate (3rd party requirements) • It is early stages, take a flexible approach • Don’t wait for funders, interpret policies in a useful way and lead => If we lead instead of following there will be fewer unpleasant surprises to deal with! Research Data, or: How I Learned to Stop Worrying and Love the Policy
  • 41. Image Credit (note NC licence!) 1. https://en.wikipedia.org/wiki/File:Dr._Strangelove_-_Group_Captain_Lionel_Mandrake.png public domain 2. https://it.wikipedia.org/wiki/Why_We_Fight#/media/File:Why_We_Fight_title.jpg public domain 3. https://commons.wikimedia.org/wiki/File:Hase_und_Igel_%281%29.jpg public domain 4. https://www.flickr.com/photos/jdhancock/4617759902/ C-3PO vs. Data (137/365), by JD Hancock, CC BY 2.0 5. https://en.wikipedia.org/wiki/One_Ring#/media/File:Unico_Anello.png public domain 6. https://www.flickr.com/photos/dinnerseries/14994148089/ OXO tools, by Didriks, CC BY 2.0 7. https://www.flickr.com/photos/albertovo5/3908190631/ How I Learned To Stop Worrying..., by hjhipster, CC BY NC 2.0
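
The cost-effectiveness clause quoted on slide 34 (policy section 3.6.7) boils down to a comparison between the cost of storing raw output and the cost of reliably recreating it. A minimal sketch of that comparison follows. Everything in it is an assumption made for illustration, not part of the College policy or any College system: the `Dataset` fields, the linear storage-cost model, the 10-year default (borrowed from the "10+ years" in slide 30) and all figures.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of one computer-generated dataset."""
    name: str
    raw_size_tb: float           # size of the raw output data, in TB
    regeneration_cost: float     # estimated cost (GBP) of recreating the data
    reliably_reproducible: bool  # can the data be recreated reliably?


def storage_cost(size_tb: float, cost_per_tb_year: float, years: int) -> float:
    """Cost of keeping the data for the retention period (simple linear model)."""
    return size_tb * cost_per_tb_year * years


def what_to_keep(ds: Dataset, cost_per_tb_year: float, years: int = 10) -> str:
    """Keep raw output unless recreating it is both reliable and cheaper."""
    keep_cost = storage_cost(ds.raw_size_tb, cost_per_tb_year, years)
    if ds.reliably_reproducible and ds.regeneration_cost < keep_cost:
        return "inputs + human-readable outputs + software version"
    return "raw output data"


# Illustrative numbers only: 200 TB of simulation output at GBP 100 per TB-year.
simulation = Dataset("example-simulation", raw_size_tb=200.0,
                     regeneration_cost=5000.0, reliably_reproducible=True)
print(what_to_keep(simulation, cost_per_tb_year=100.0))
```

In practice the difficult part is estimating the two costs and judging whether a rerun is really reliable; the comparison itself is the easy step.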