Open University Library training delivered on 8th December 2017 about making research data open. Covers repositories and archives for research data (including ORDO), preparing data for sharing, rights related to research data and ethical concerns related to data sharing.
OU Library Training: Making your research data open
1. Isabel Chadwick & Dan Crane
Research Support Librarians
library-research-support@open.ac.uk
Making your research data
open
8th December 2017
2. Overview of the workshop
• What/how/when/why (not) to share
• Preparing data for sharing
• Rights and data sharing
• Re-using data
• Ethics and data sharing
• Questions/further information
3. Rufus Pollock, Cambridge University and Open
Knowledge Foundation, 2008
“The coolest thing to do with
your data will be thought of by
someone else.”
6. “In keeping with OU principles of openness,
it is expected that research data will be open
and accessible to other researchers, as soon
as appropriate and verifiable, subject to the
application of appropriate safeguards
relating to the sensitivity of the data and
legal and commercial requirements.”
OU Research Data Management Policy, November 2016
http://www.open.ac.uk/library-research-support/sites/www.open.ac.uk.library-research-support/files/files/Open-University-Research-Data-Management-Policy.pdf
Why should you share your data?
Policies: Open University…
10. What do you need to share?
• Raw data
• Derived data
• Data underpinning
publications
• Code
• Methods
What are research data in your context?
What would others need to understand your research?
11. Barriers to sharing
Discussion
• Look at the discussion cards
on your table.
• Can these barriers to sharing
be overcome?
• How?
• If not, why not?
12. Open Research Data Online
(ORDO)
Online data sharing services
• Figshare
• Zenodo
• CKAN DataHub
• Mendeley Data
Directories
• re3data
Funders’ repository services
• UK Data Service ReShare
• NERC data centres
How to share
Data repositories
14. Preparing data for sharing
Metadata/documentation
“...make sure that data are fully
described, so that consumers have
sufficient information to understand
their strengths, weaknesses,
analytical limitations, and security
requirements as well as how to
process the data...”
G8 Open Data Charter (2013)
https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex
15. Preparing data for sharing
Metadata/documentation
What do others need to understand your data?
Embedded documentation
• code, field and label
descriptions
• descriptive headers or
summaries
• recording information in
the Document Properties
function of a file
(Microsoft)
Supporting documentation
• Working papers or
laboratory books
• Questionnaires or
interview guides
• Final project reports and
publications
• Catalogue metadata
16. Preparing data for sharing
Metadata/documentation
Imagine you have just downloaded the data
sample sheet from a repository...
1. What contextual or explanatory information is
missing?
2. Is there anything odd about the data that
needs clarifying?
3. What additional documentation
would you like to see supplied?
17. Preparing data for sharing
File formats
• Unencrypted
• Uncompressed
• Not proprietary or patent-encumbered
• Open, documented standard
• Standard representation (ASCII, Unicode)
Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
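As a rough illustration of how the table above could be applied to a folder of files before deposit, here is a minimal Python sketch. The mapping from format names to file extensions is an assumption made for this example, and the function names are hypothetical; codecs such as H264 cannot be detected from a file extension, so they are not covered.

```python
from pathlib import Path

# Extension sets transcribed from the table above. Which extensions stand for
# which format (e.g. ".por" for SPSS portable) is an assumption of this sketch.
RECOMMENDED = {".csv", ".tsv", ".por", ".txt", ".html", ".rtf",
               ".mp4", ".ogg", ".flac", ".tif", ".tiff", ".jp2", ".png",
               ".xml", ".rdf"}
AVOID = {".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg"}


def classify_format(filename: str) -> str:
    """Return 'recommended', 'avoid' or 'check' for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in AVOID:
        return "avoid"
    # Not covered by the table (e.g. PDF, which depends on whether layout
    # matters); consult the fuller guidance before sharing.
    return "check"


def review_files(filenames):
    """Map each file name to its classification."""
    return {name: classify_format(name) for name in filenames}
```

For example, review_files(["survey.csv", "results.xlsx", "report.pdf"]) flags the spreadsheet as a format to avoid and the PDF as one to check by hand.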
18. Rights related to data sharing
Intellectual property rights
IP will usually belong to the
institution (OU) but...
• Sometimes funders exert
claims over rights
• When working with
commercial partners
there may be joint IP
rights – best handled with
an agreement/contract
19. Rights related to data sharing
Copyright/Database rights
Database rights
apply when there has
been substantial
intellectual
investment in
obtaining, verifying or
presenting content in
an original manner
Copyright applies to:
• Original literary,
dramatic, musical
and artistic works
• Sound recordings,
films, broadcasts
• The typographical
arrangement of
publications
• NOT facts
20. Rights related to data sharing
Open licences
How do you want your data to be re-used?
Other options should
be considered for
databases and
software.
21. Rights related to data sharing
Freedom of Information
Be prepared...
22. Rights related to data sharing
Freedom of Information
• Research data can be the subject of FoI requests
• Must be provided unless an exemption or exception
allows your institution not to disclose it
• Could be addressed to anyone in the organisation
• Only 20 working days to respond
Exemptions:
• Accessible by other means
• Contains personal data
• Subject to a duty of confidentiality
• Release would prejudice legitimate commercial interests
• Intended for future publication (but must be the data, not
a paper based on the data)
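To make the 20-working-day limit concrete, the following Python sketch estimates a response deadline from the date a request is received. It assumes working days are Monday to Friday and that counting starts on the day after receipt; bank holidays are ignored, so treat the result as an estimate and check with your institution's FoI team. The function name is hypothetical.

```python
from datetime import date, timedelta


def foi_deadline(received: date, working_days: int = 20) -> date:
    """Return the date `working_days` working days after `received`.

    Assumptions: working days are Monday to Friday, counting starts on the
    day after receipt, and bank holidays are NOT taken into account.
    """
    current = received
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current
```

Under these assumptions, a request received on Friday 8 December 2017 would give foi_deadline(date(2017, 12, 8)) == date(2018, 1, 5); the true deadline would be later once bank holidays are counted.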
23. Re-using data
Consider...
• Citation
• Purpose
• Discovery
• Access
• Cost
• Licensing
Prepare for...
• Data cleansing
• Data
interpretation
difficulties
• Data
disappearance
Where to look...
• Disciplinary
data archives
• Re3data
• Datacite
• British Library
• Data access
statements
24. Ensure you have obtained valid consent and provided
information re.:
• The research and nature of participation
• How confidentiality will be maintained
• Options for varied consent conditions for participation,
publication and data sharing
• How research data will be stored and preserved in the
long-term
• How data may be used for future research or teaching
and any restrictions on that use
Ethics and data sharing
Obtaining consent
At a minimum, consent forms
should not preclude data
sharing, such as by promising
to destroy data unnecessarily.
25. Ethics and data sharing
Activity
Look at one of the example consent
forms and discuss:
• What are your initial impressions?
• How effective is it?
• Is there anything
missing/unnecessary?
• Compare it to consent forms
you’ve used: would you change
anything on your own having read
it?
26. Library Services
How we can help
• Open Research Data Online (ORDO)
• Help with Data Management Plans and consent forms
• Advice on preparation of data for sharing
• Data catalogue on ORO
• Online guidance
• Enquiries
Email: library-research-support@open.ac.uk
27. Useful links
• The OU Library Research Support website: http://www.open.ac.uk/library-research-support/research-data-management
• Open Research Data Online (ORDO): https://ou.figshare.com
• Digital Curation Centre: http://www.dcc.ac.uk/
• DMP Online: https://dmponline.dcc.ac.uk/
• UK Data Archive: http://www.data-archive.ac.uk/
• MANTRA: http://datalib.edina.ac.uk/mantra/
• The Orb: http://open.ac.uk/blogs/the_orb
29. 1. Sharing your data isn’t just about compliance
2. Good metadata enables re-use
3. Know your rights – and make conditions for
re-use clear
3 take home points...
1 (3)
Show of hands: who’s interested in ethics? If not everyone, move this section to the end and invite those who aren’t interested to leave.
1 (4)
1 (5)
Many funders now have policies which require you to share your data.
Even if there was no policy when your grant was approved, check again, as things are changing.
REF policy includes extra points for environment – i.e. University providing data services and also for going “above and beyond” the Open Access policy, which will include making underlying data available.
1 (6)
Publishers are really beginning to come on board with data sharing and many require it upon submission of papers.
2 (8)
The OU’s RDM policy was approved by Research Committee in November 2016.
Make your data open wherever possible (including physical data) – no later than the first date of online publication of research.
Published research papers must include statements on how and on what terms supporting data may be accessed, or if there is no data the paper should make that clear.
Manage it responsibly throughout your project
The university will provide services and facilities, training support and guidance
Note: All those engaged in research at the OU, including those involved in collaborating with other institutions, must take personal responsibility for managing their research data in accordance with University and funder requirements
1 (9)
Sharing data can have a huge impact on collaboration between researchers worldwide, as this example shows.
1 (10)
You might remember this news story about George Osborne basing the austerity plan on research data which had been incorrectly analysed. By making data public these kinds of anomalies are more likely to be spotted and incidents like this less likely to happen!
1 (11)
And of course there is a personal benefit to you as a researcher. Studies have found that there is between a 9% and a 30% increase in citations for papers which make the underlying data available.
2 (13)
Think about what research data are in your context.
Depending on your academic discipline and the data type, what you share may vary.
You might want to share raw data, but in some disciplines this might be totally inappropriate, as they will be too vast and meaningless to other people.
You might just want to share your derived, analysed data.
Or you might only want to share the data which underpins your publications, but you need to think about whether this will be understandable to others: would they be able to replicate your results? So you might also want to share your code or your methods to enable better understanding/context.
Describe the data your project will create to your neighbour, think about how much of that data you want, or think is practical, to share, then work out between you how much context your neighbour would need in order to be able to work with the data.
10 (23)
Remember to take activity to session!
2 (25)
There are a number of ways that you can share your data.
The OU recently implemented ORDO, the OU’s institutional research data repository. You can use ORDO to store both live and archive research data. It is based upon the Figshare platform and, crucially, allows you to create a permanent link, a DOI, to your uploaded published research data. This makes it easier to share with others, and provides a means for others to cite and link to your data, thereby giving you proper credit for your work. In a second phase of development, we will be looking to integrate all research datasets added to ORDO with ORO, so that they show in your staff profile pages. ORO is also our institutional workhorse for the REF, so we will be bringing all your research outputs together in one place.
Currently you are required to create a metadata record of your research datasets in ORO, with a link to their storage location, or details about how enquirers might gain access to them.
Externally, there are a number of repositories. Your funder may well have a repository in which you are required to deposit your data, such as the ESRC’s, or the NERC data centres. Funders often require you to use their data centres for funded research, and you may want to store your research data in discipline-specific services before you consider using ORDO. That’s fine; it’s supposed to be there as a backup service.
In addition to this there are several free, online services like Figshare, which was devised by someone from Imperial College and is used now by various journals to publish data underpinning research publications. It can also be used as a datastore throughout your project, as it allows online analysis of data, and collaboration with other partners. You may upload unlimited public data and you also get a 20GB allowance for private data.
Zenodo is a similar tool, but can only be used for publication. It was developed by CERN as part of the EU OpenAIRE project and is aimed at the long tail of science. There is a maximum upload threshold of 20GB per file, and you are able to include multiple files in one dataset or collection.
CKAN datahub is another similar, free-to-use tool.
There are now a number of journals which specialise in research data, such as GigaScience or BMC Research Notes. They publish peer-reviewed articles describing datasets for future reuse. Other journals may allow you to link to your data stored in Figshare or Dryad.
And finally, you can find possible repositories in which to publish your data using re3data, which lists repositories according to academic discipline. All the services here are linked from the RDM intranet pages and the soon-to-be-released library research support website pages.
5 (30)
Who can use it?
All University research staff and doctoral students, except those working at Affiliated research centres.
External collaborating researchers will also be allowed to use the service, as invited collaborators onto OU workspaces run by OU staff.
What can they use it for?
To store and publish OU research data that supports original research activity. This data store is intended to be used primarily as a research data store once a project has been completed, although it is possible to store live research data from “in-flight” projects in here too. Many funded projects will already have a recognised data store that researchers will be expected to use to make their research data available [use relevant examples here.. like the UK Data Archive as used by the ESRC, or the NERC (Natural Environment) Polar Data Centre and British Oceanographic Data Centre.] Others might use data stores required by specific publishers. What this facility provides is an easy to use alternative to those data stores, so that researchers can store their supporting research data which they deem has long term value.
[It was established particularly to answer a requirement by the EPSRC that all institutions working on research funded by them should provide a means within the institution of supporting research data publication.]
Time for a very quick demo of the system
You can access ORDO through the intranet A-Z list, through the LRS website or simply google ORDO OU.
The landing page shows you the most recently uploaded items. Some files will have a preview in this thumbnail version.
Show a couple of examples of things uploaded – video of clarinet thing
Also this thing from public figshare on language acquisition in a baby: https://figshare.com/articles/A_baby_s_first_250_words_time_stamped_at_Twitter_/991275 [change for demos in other disciplines]
Show different features on page - altmetrics, views, discussion, citations, share, cite etc.
Show the collection. Collections are useful as a way of curating your research material to give it particular value or show it in a particular context. Collections can be public (like this one) or private. Public collections are assigned a DOI.
To log in click on the red box in the corner, use your institution log in.
Uploading items is really easy – demo this on test site!
Confidential files – upload your data to keep a record and get a DOI but keep the data hidden
Embargoed file – make data available after a certain amount of time to meet ethical or commercial requirements
Metadata records – don’t store the data on figshare, but use it to keep a record and get a DOI (don’t do this if your data has a DOI from another repository)
Private links – share a private link with a colleague or reviewer without having to make the data public
What next?
We plan to harvest metadata from ORDO and populate ORO records with it. This is because ORO is our workhorse for integrating with other systems: it has established methods for populating OU staff profile pages with RSS feeds, it is indexed by Google Scholar, and it is our key tool for managing REF publications compliance.
1 (31)
Adding context to your data means that there is less risk that people will misinterpret it, and opens up more possibilities for re-use, both by other people and by you!
2 (33)
Embedded documentation is integral to the file
Supporting documentation is in addition to the file. We encourage you to upload a readme file alongside any data you deposit in ORDO. Make sure it’s structured in a way that makes it easy to understand.
Anything that will help others understand your data is useful.
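One way to keep readme files consistently structured is to generate them from a few fields. The Python sketch below is only an illustration: the field names and layout are assumptions made for this example, not an ORDO requirement, and build_readme is a hypothetical helper.

```python
def build_readme(title, creators, description, files, licence, contact):
    """Return plain-text README content describing a deposited dataset.

    `files` maps each file name to a one-line description. The choice of
    fields here is illustrative; adapt it to your discipline.
    """
    lines = [
        f"Title: {title}",
        f"Creators: {', '.join(creators)}",
        f"Licence: {licence}",
        f"Contact: {contact}",
        "",
        "Description",
        description,
        "",
        "Files",
    ]
    # List files alphabetically so the README is stable between versions.
    for name, summary in sorted(files.items()):
        lines.append(f"- {name}: {summary}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved as README.txt and uploaded alongside the data files it describes.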
5 (38)
2 (40)
2 (42)
Depending on what your data is, it may or may not attract copyright.
1 (43)
4 (47)
Assigning a licence will make it really clear to anyone who wants to re-use the data what the conditions are. Take your time to think about this. We would normally recommend CC-BY.
No derivatives is not brilliant for data as it really limits how people can re-use it, and suppresses innovation.
Non-commercial can be difficult to unpack too and may preclude the data being used in support of works for which an author is given recompense (such as textbooks), and might preclude the data being used in support of works that are sold (such as journal articles) even if the author does not benefit financially.
Take time to explain this.
1 (48)
There have been some high profile cases around Freedom of Information and research data, including the ClimateGate scandal at UEA.
The rules are fairly clear, so it’s best to be prepared and think about this early on in your project.
2 (50)
If there are exemptions, try to identify them at the start of your project so that you are prepared if a request comes in. Good thing to include in your DMP.
4 (54)
POSSIBLY MOVE THIS SECTION TO AFTER QUESTIONS (need 15 mins)
In the past researchers gained consent from participants primarily so that they could collect data.
However, many funders are now increasingly requesting researchers to share and preserve their data as part of their requirements.
It is therefore important that participants fully understand:
how you will store, publish and share their data
how you will ensure that their data remains confidential and anonymous (where applicable) throughout the duration of the project and after
Failure to obtain consent could result in non-compliance with your funder's requirements and limit the opportunities you have to share, publish and preserve your data.
If things change, you may be able to go back to your participants and change the details of the agreement.
Anonymisation can be time-consuming, so agreeing what can and can’t be recorded or transcribed may well save you time and effort. For example, if they don’t want you to use names, then conduct the interview without using names.
Consider who needs access to the data
Inform your participants what will happen with the data after the project has finished
At a minimum, do not preclude data sharing e.g. by promising to destroy data
Pre-planning and agreeing with participants during the consent process, on what may and may not be recorded or transcribed, can be more effective than anonymisation
Consider controlling access if anonymisation or consent for sharing are impossible
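Where names do end up in a transcript, a first pass at removing them can be scripted. The Python sketch below replaces a known list of names with neutral labels; it is an assumption-laden illustration (the function name and label format are hypothetical), it only catches the names you list, and it does nothing about indirect identifiers such as job titles or places, so manual review is still needed.

```python
import re


def pseudonymise(text, names):
    """Replace each listed name with a stable label such as 'Participant 1'.

    Returns the redacted text and the name-to-label key. The key should be
    stored securely and separately from any data that is shared.
    """
    key = {name: f"Participant {i}" for i, name in enumerate(names, start=1)}
    for name, label in key.items():
        # \b restricts matches to whole words; matching ignores case.
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        text = pattern.sub(label, text)
    return text, key
```

For instance, pseudonymise("Alice met Bob.", ["Alice", "Bob"]) returns the text "Participant 1 met Participant 2." together with the key linking labels to names.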
15 mins
2 (56) + 15 (71)
Please use this email address rather than my personal email as I work part time and it may need to be picked up by one of my colleagues.
1 (57) + 15 (72)
Links to additional resources are available on the RDM intranet site.
I’ll put this presentation on The Orb after the workshop.
Please do fill in the feedback questionnaire that you get sent after this - we’re going to revamp this session for next year so we will be keen to take any suggestions on board.
15 (72) + 15 (87)
Remind them to fill in feedback form!!!