This webinar provides information about strategies for successful Research Data Management, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management.
At the end of the session participants will be able to:
- Understand the basic principles and importance of RDM
- Set clear goals regarding data curation, preservation and sharing
- Comply with the requirements of the Research Data Pilot
- Draft a Data Management Plan
- Identify RDM resources and tools
Webinar: Data management and the Open Research Data Pilot in Horizon 2020
1. Open Research Data Pilot
Open research data and data management
for Horizon 2020 projects
OpenAIRE
Belgium
Emilie Hermans
Project Assistant OpenAIRE, UGent
can be reused under the CC BY license
2. Why data management/open data?
1. Prevents data loss
2. Data management to maximize usefulness: organize, make understandable and reusable
3. Fosters creativity, participation of citizens and increases transparency
4. Get credit: (much!) longer shelf life than interpretation
Footnotes:
1. E.g. Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175; Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
2. Cartoon: "recycle" | Foster by Patrick Hochstenbach, 2015
3. The Open Research Data Pilot
Horizon 2020
limited and flexible pilot
• Avoid duplication of research and loss of resources
• Foster Open Science: transparency, effectiveness and greater impact
Open Access to research data
Data Management Planning
4. Which areas are participating?
• Future and Emerging Technologies
• Research infrastructures (including e-Infrastructures)
• Leadership in enabling and industrial technologies – Information and Communication Technologies
• Nanotechnologies, Advanced Materials, Advanced Manufacturing and Processing, and Biotechnology: ‘nanosafety’ and ‘modelling’ topics
• Societal Challenge: Food security, sustainable agriculture and forestry, marine and maritime and inland water research and the bioeconomy - selected topics in the calls H2020-SFS-2016/2017, H2020-BG-2016/2017, H2020-RUR-2016/2017 and H2020-BB-2016/2017, as specified in the work programme
• Societal Challenge: Climate Action, Environment, Resource Efficiency and Raw materials – except raw materials
• Societal Challenge: Europe in a changing world – inclusive, innovative and reflective Societies
• Science with and for Society
• Cross-cutting activities - focus areas – part Smart and Sustainable Cities.
Projects in other areas can participate on a voluntary basis
• Check Article 29.3 of the Model Grant Agreement
• Costs eligible (Article 6.2.D.3 of the Model Grant Agreement)
5. Requirements of the Data Pilot
1. Develop a Data Management Plan (DMP)
2. Deposit data in a research data repository
3. Open data: freely used, modified, and shared by anyone for any purpose
4. Provide information, tools and instruments needed to validate results
6. REASONS FOR OPTING-OUT
• Exploitation of results
• Confidentiality
• Protection of personal data
• Would jeopardize the main aim of the action
• No data generated
• Any other legitimate reason
• Complete opt-out via project amendment
• Complete or partial opt-out: describe issues in project DMP
• As open as possible, as closed as necessary
Projects can opt out at any stage.
7. Develop a DMP
Data Management Plan (DMP):
• Well managed in present and prepared for preservation in the future
• Handling of data during and after project
Living document: revise and update
Updated at minimum at:
• Initial DMP: within first 6 months of the project
• Mid-term review
• Final project review
8. Content of a DMP
Annex 1 (by month 6)
The DMP should address the points below on a dataset by dataset basis:
• Data set reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation (including storage and backup)
Annex 2 (mid-term & final review)
Scientific research data should be easily:
• Discoverable
• Accessible
• Assessable and intelligible
• Useable beyond the original purpose for which it was collected
• Interoperable to specific quality standards
Annex I and II of EC guidelines
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
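The five dataset-level points on this slide can be kept as a small structured record, so that empty sections are easy to spot before each review. The following is only a sketch: the class name, field names and example values are hypothetical and are not prescribed by the EC guidelines.

```python
from dataclasses import dataclass, asdict


# Hypothetical record: one field per dataset-level point on the slide.
@dataclass
class DatasetEntry:
    reference_and_name: str
    description: str
    standards_and_metadata: str
    data_sharing: str
    archiving_and_preservation: str


def missing_sections(entry: DatasetEntry) -> list[str]:
    """Return the names of the sections that are still empty."""
    return [name for name, value in asdict(entry).items() if not value.strip()]


# Made-up example values, for illustration only.
entry = DatasetEntry(
    reference_and_name="EXAMPLE-DS-01 survey responses",
    description="Anonymised answers to an online questionnaire",
    standards_and_metadata="CSV files described with Dublin Core",
    data_sharing="",
    archiving_and_preservation="",
)

print(missing_sections(entry))
```

Because the DMP is a living document, a check like this could be rerun at each update to see which sections still need text.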
9. How to write a DMP
Online data management tool: dmponline.dcc.ac.uk/
16. Content of a DMP
• Handling of data
• Collecting and processing
• Methodology and standards
• Open access
• Curation and preservation
Annex I and II of EC guidelines
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
17. Handling of data
• Storage and backup
• Additional measures?
• During and after the project
18. Collecting and analysing data
• Be clear what data you use
• Provide links to data sets you used
• Be clear what methods you use
• E.g. lab notebook, end-to-end code/scripts for statistics
• Software can help: R, MATLAB, Python…
19. Data files: standard formats
Use data formats that are:
• Open standard
• In an easily re-usable format
• Commonly used by research community
Examples of preferred format choices:
Text: .odt, .txt, .xml, .html, .rtf
Tabular data: .csv (comma separated values), .xml, .rdf, .SPSS portable
Images: .tif, .jpeg2000, .png, .svg
Structured data: .xml, .rdf
Any standard used in your field
20. Create searchable data
Using metadata
• Data about data
• Machine readable
• Consists of a set of attributes
• Helps prevent inappropriate use
22. METADATA STANDARDS
Use standards of your domain
Digital Curation Centre
General
• Dublin Core (DC)
• DataCite metadata schema
• Metadata Object Description Schema (MODS)
Humanities
• Text Encoding Initiative (TEI)
• Visual Resources Association Core (VRA)
Archives/Repositories
• D-space metadata
• DatastaR minimum Metadata
Social Science
• Data Documentation Initiative (DDI)
Life Sciences
• Darwin Core
• Integrated Taxonomic Information System (ITIS)
Earth Science
• Directory Interchange Format (DIF)
• Standard for the Exchange of Earthquake Data (SEED)
Ecology
• Ecological Metadata Language (EML)
Geographic/Geospatial
• Federal Geographic Data Committee (FGDC)
• ISO 19115
• Geospatial Interoperability Framework (GIF)
23. Where to deposit data?
Curation and preservation
Research data repository
• Disciplinary data repository
• Institutional data repository
• Zenodo
• Matches data needs
• Directory of data repositories: www.Re3data.org
25. Re3data
Trustworthy Digital repository
• Persistent identifier
• Licenses
• Access
26. What to deposit?
Open Access to research data
Everything needed to validate results presented in scientific publications
• Data
• Metadata
• Tools: Documentation, scripts, software, info about statistical analyses…
• Understandable? Add readme text file
• Other data described in Data Management Plan
27. What to deposit?
Select
• Confidentiality/anonymization
• Regenerating data cheaper than archiving?
• Version control
• Potentially useful to others
28. Open data
Open access
• Apply an open license:
• Keep it simple
• What intellectual property rights exist in the data?
• Apply a suitable ‘open’ license, e.g. Creative Commons
• Data repositories can provide licenses
• Re3data.org
29. Example
• Understandable for humans
• Machine readable metadata
• Tools
• Open Data
• Open license
30. Support and information?
OpenAIRE - An Open Knowledge & Research Information Infrastructure
• www.OpenAIRE.eu offers infrastructure, tools, information and helpdesk system
FACILITATING THE OPEN ACCESS POLICY OF THE EUROPEAN COMMISSION
31. Zenodo
For all content types
With GitHub Integration
Create communities
upload describe publish
32. OpenAIRE
www.openaire.eu/search
Link your data to publications or project
33. OpenAIRE
Training and support material
Information on:
• Open research data pilot
• Creating a data management plan
• Selecting a data repository
Support material: Briefing papers, factsheets, webinars, workshops, FAQs, helpdesk
www.openaire.eu/opendatapilot
34. Open Research Data Pilot
STEP 1: WRITE A DMP
Deliverable at:
• 6 months
• Mid-term review
• Final review
dmponline.dcc.ac.uk
STEP 2: FIND REPOSITORY
Data Repositories
• discipline/institutional
• www.re3data.org
• Zenodo
Matches data needs
STEP 3: DEPOSIT DATA
• (Open) Data
• Metadata
• Other tools
• Standard File Formats
• Standards metadata schema
• Open Licences
SUPPORT: Supporting infrastructure and information
• EC guidelines
• OpenAIRE.eu
• www.dcc.ac.uk
Designed by Freepik
35. Questions!
www.openaire.eu
@openaire_eu
Facebook.com/groups/openaire
https://www.linkedin.com/groups/OpenAIRE3893548
Emilie.Hermans@UGent.be
info@openaccess.be
can be reused under the CC BY license
Editor's notes
http://slideplayer.com/slide/6631125/#
1. Prevents data loss: 80% of data is lost after 10 years. Data is fragile, and reproducing results is very difficult without the data.
2. Maximize usefulness and build much more efficiently on previous work
3. Fosters creativity, participation of citizens and increases transparency
4. Data tend to have a (much!) longer shelf life than interpretation
After accounting for other factors affecting citation rate, we find a robust citation benefit from open data.1
Horizon 2020 includes a limited and flexible pilot action on open access to research data, with opt-outs and safeguards. Participating projects must develop a Data Management Plan (DMP) specifying which data will be openly accessible.
If your project stems from one of these Horizon 2020 areas, you are automatically part of the pilot.
Costs related to data management in Horizon 2020 are eligible for reimbursement during the duration of the project (see Article 6.2.D.3 of the Annotated Model Grant Agreement)
Participating projects must:
- Develop a data management plan in the first 6 months of the project and keep it up to date throughout the project;
- Deposit their research data in a suitable research data repository;
- Make sure third parties can freely access, mine, exploit, reproduce and disseminate their data;
- Make clear what tools will be needed to use the raw data to validate research results, or provide the tools themselves.
A data management plan or DMP is a formal document that outlines how you will handle your data both during your research and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this ensures that data are well managed in the present and prepared for preservation in the future.
- Discoverable: through a standard identification mechanism such as a DOI
- Accessible: how easily can the data be accessed, and are there any licenses or embargo periods attached?
- Assessable and intelligible: is the data provided in such a way that judgements can be made about its reliability, such as in peer review alongside a scientific paper?
DMPonline: a web-based tool to help researchers write a DMP
Includes a template for Horizon 2020 projects with guidance
Checklist for a Data Management Plan: a list of questions and guidance that researchers may find useful when writing data management plans
Storage of data during the project: what are you going to do with the data during the project?
Collecting: how will data be collected? What will you do with the data? E.g. for a survey: will it include a disclaimer about what will happen with the data?
Types of data: standards (formats, metadata). How will you describe them?
Access policy for your data: it can be open or partly open. Do you need to take extra measures to secure your data?
Post-project plans, how to preserve your data?
Will the data be stored and backed up appropriately during the research project? For example, on managed university filestores rather than external hard drives.
Arrange backup and storage procedures which are most suited to the partners and nature of your project
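One way to check that a backup really matches the working copy is to compare file checksums. The sketch below uses only the Python standard library; the function names are hypothetical, and the folder layout is an assumption for illustration rather than a procedure from the webinar.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(original: dict[str, str], backup: dict[str, str]) -> list[str]:
    """List files that are missing from the backup or differ from the original."""
    return sorted(name for name in original if backup.get(name) != original[name])
```

Calling `changed_files(build_manifest(working_dir), build_manifest(backup_dir))` on two folders of your choice returns an empty list when every file in the working copy is present and identical in the backup.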
Collecting: how will data be collected? What will you do with the data? E.g. for a survey: will it include a disclaimer about what will happen with the data?
Provide links to the data sets you used or, if licenses and copyright allow, upload the original data set as well.
Provide end-to-end code/scripts for the generation of figures and statistics
Keep in mind: will someone who is not familiar with the data or the research setup understand what the data is about?
Try to make the barriers to view your data as low as possible. Use open file formats.
Avoid Word, PDF and Excel files. You can use PDF/A for archiving or if the layout matters.
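As an illustration of an open tabular format, the following sketch writes a few records as CSV text with Python's standard `csv` module. The column names and values are made up for the example.

```python
import csv
import io

# Made-up measurements, for illustration only.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 19.0},
]


def to_csv_text(records: list[dict]) -> str:
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv_text(rows))
```

The resulting text can be opened in any text editor or spreadsheet program, which keeps the barrier to viewing the data low.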
Metadata ensures accessibility of the data
Data about data to discover and disclose data: resource descriptions
A metadata record consists of a set of attributes or elements, necessary to describe the data in question
Structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information source
Put simply: data is read by humans, metadata is read by machines
Helps prevent inappropriate use due to misunderstanding of the research purpose or parameters
Standards-based metadata is generally preferable, but where no appropriate standard exists, for internal use, writing “readme” style metadata is an appropriate strategy.
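A metadata record as described above, a set of attributes in machine-readable form, might look like the following sketch. The keys are element names from Dublin Core; treating these five as required is an assumption made for this example, and all values (including the identifier) are placeholders.

```python
import json

# Dublin Core element names; requiring exactly these five is an
# assumption for this sketch, not a rule of the standard.
REQUIRED = ("title", "creator", "date", "identifier", "rights")


def to_metadata_json(record: dict[str, str]) -> str:
    """Validate a simple metadata record and serialise it as JSON."""
    missing = [key for key in REQUIRED if not record.get(key)]
    if missing:
        raise ValueError("missing metadata elements: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values, for illustration only.
record = {
    "title": "Example survey responses",
    "creator": "Example, Author",
    "date": "2016-01-01",
    "identifier": "doi:10.0000/example",
    "rights": "CC-BY",
}

print(to_metadata_json(record))
```

In practice, the repository you deposit in will usually dictate the schema and the mandatory fields.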
Trustworthy Digital repository: either supports a repository standard or is certified
Metadata
Other data, including associated metadata, as specified and within the deadlines laid down in the Data Management Plan, that is, according to the individual judgement of each project: For instance curated data not directly attributable to a publication, or raw data.
Documentation: codebooks, lab journals, informed consent forms… required to enable reuse of the data.
Will people not involved in the project understand what the data is about and how it has been processed?
Readme file: a plain-text file that describes your data.
Keep in mind: will someone who is not familiar with the data or the research setup understand what the data is about?
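A plain-text readme can be generated from a few fields so that every deposited dataset gets the same basic description. This is a minimal sketch: the function name, the layout of the readme, and the file names and contact address in the example are all hypothetical.

```python
def build_readme(title: str, description: str, files: dict[str, str], contact: str) -> str:
    """Assemble a plain-text readme describing a dataset and its files."""
    lines = [title, "=" * len(title), "", description, "", "Files:"]
    for name, explanation in files.items():
        lines.append(f"- {name}: {explanation}")
    lines += ["", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


# Hypothetical dataset, for illustration only.
readme_text = build_readme(
    title="Example dataset",
    description="Measurements collected for an example project.",
    files={
        "measurements.csv": "one row per sample",
        "parameters.txt": "explains the parameters used",
    },
    contact="name@example.org",
)

print(readme_text)
```

Saving the result as a `.txt` file next to the data keeps the description readable without any special software.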
Keep it simple: there is no requirement that every dataset must be made open right now. Starting out by opening up just one dataset, or even one part of a large dataset, is fine. Of course, the more datasets you can open up, the better.
Open licenses: legally sound licensing
CC0: public domain, waive copyright
CC-BY: Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. No additional restrictions
NUP84 proteins.
MRC is a standard file format for electron density
The txt explains the parameters used