This document discusses strategies for supporting open science across the full research cycle, including data and software preservation. It outlines current practices for managing, storing, publishing, and reusing research data and software. It proposes that researchers post datasets to a repository under embargo and link them to any subsequent publication, which would reduce workload, improve tracking of outputs, and improve data linking and availability. The goal is to make data sharing and open science practices more seamless and effective.
Publishing the Full Research Data Lifecycle
1. | 1
Anita de Waard, VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
May 20, 2016
Publishing The Full Research Cycle To Support Open Science
Container Strategies for Data & Software Preservation that Promote Open Science
Notre Dame, IN
2. | 2
Source: JISC: How and why you should manage your research data: a guide for researchers
Caroline Ingram, Published: 7 January 2016
Research Data Life Cycle:
4. | 4
Manage, Store, Preserve:
Data Rescue: Preserving Data At Risk
http://www.codata.org/task-groups/data-at-risk/dar-workshops
Software Rescue: Preserving Executable Content
https://olivearchive.org/
5. | 5
Manage, Store, Preserve: Mendeley Data
https://data.mendeley.com/
• Linked to published papers (or not)
• Linked to GitHub (or not)
• Versioning and provenance
• Allowing different licenses
6. | 6
Share, Publish: Research Elements
https://www.elsevier.com/books-and-journals/research-elements
Article types shown around the full research paper: data articles, software articles, method articles, protocols, video articles, hardware articles, lab resources.
• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals
7. | 7
Share, Publish: SoftwareX
http://www.journals.elsevier.com/softwarex/
• Submissions to SoftwareX are composed of
- A short article describing the software, with a focus on the impact of the software in the research community and re-usability across disciplines
- A “metadata table” containing information about the software and key metrics
- A permanent link to a software repository (GitHub) where the software and code are stored and maintained by Elsevier and made freely available
• Peer Review
- Follows a simple reviewer questionnaire, available from the SoftwareX website, that evaluates usability and scientific impact of the software
- Less attention is placed on the technical quality of the software
8. | 8
Share, Publish: SoftwareX
[Workflow diagram; labels as recovered from the slide]
• Data uploaded on Mendeley Data; code/software deposited to GitHub
• Software article submitted, then peer-review process, then accepted
• Software article published; live stats shown
• Code/software forked to the journal GitHub repository (open source); software updates
• Data is publicly available on Mendeley Data (CC-BY)
• Other labels: SoftwareX metadata; bi-directional links; linked; CC-BY
10. | 10
Reuse: Reproducibility Papers
• The first Reproducibility Paper was published recently: http://www.sciencedirect.com/science/article/pii/S0306437915301113
• It is linked to this paper: http://www.sciencedirect.com/science/article/pii/S0306437915000472
• The data is hosted here: https://data.mendeley.com/datasets/xz6gv65m6d/6
• To reproduce the experiment, the journal requires source code for the software components, together with installation scripts; we suggest that authors host their code on GitHub
• In addition to the source code, we recommend that authors submit a virtual machine in which all appropriate software components are already installed, so that the experiment can be reproduced on a wide variety of platforms. Authors are to submit their experiments using either ReproZip or Docker.
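The submission requirements on this slide (source code, installation scripts, and a ReproZip or Docker package) could be checked automatically before review. The following is only a minimal sketch under stated assumptions: the function name, the accepted install-script names, and the source-file suffixes are hypothetical and are not journal policy. The only conventions taken as given are that Docker builds are described by a file named `Dockerfile` and that ReproZip bundles use the `.rpz` extension.

```python
from pathlib import PurePosixPath

# Hypothetical conventions for illustration only (not journal policy).
INSTALL_SCRIPT_NAMES = {"install.sh", "setup.py", "Makefile"}
SOURCE_SUFFIXES = {".py", ".c", ".cpp", ".java", ".r"}


def check_submission(file_names):
    """Return a list of problems; an empty list means the package looks complete."""
    paths = [PurePosixPath(name) for name in file_names]
    problems = []

    # Requirement 1: source code for the software components.
    if not any(p.suffix.lower() in SOURCE_SUFFIXES for p in paths):
        problems.append("no source code found")

    # Requirement 2: installation scripts.
    if not any(p.name in INSTALL_SCRIPT_NAMES for p in paths):
        problems.append("no installation script found")

    # Requirement 3: the experiment packaged with ReproZip or Docker.
    has_docker = any(p.name == "Dockerfile" for p in paths)
    has_reprozip = any(p.suffix.lower() == ".rpz" for p in paths)
    if not (has_docker or has_reprozip):
        problems.append("no Dockerfile or ReproZip (.rpz) bundle found")

    return problems


if __name__ == "__main__":
    for problem in check_submission(["src/main.py", "README.md"]):
        print(problem)
```

A check like this only confirms that the expected pieces are present; whether the experiment actually reproduces still has to be established by running the package.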
11. | 11
Discover, Reuse and Cite:
• ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot
• Cross-stakeholder, with support and input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
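A link service of this kind aggregates publication-to-dataset links contributed by many sources and lets either end be looked up from the other. The sketch below illustrates that idea only; it is not the Scholix schema or the prototype's API. The class names, field names, and identifiers are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One publication-to-dataset link, as reported by one provider (hypothetical fields)."""
    publication_id: str
    dataset_id: str
    provider: str


class LinkIndex:
    """Aggregates links from several providers and answers lookups in both directions."""

    def __init__(self):
        self._links = set()
        self._by_publication = defaultdict(set)
        self._by_dataset = defaultdict(set)

    def add(self, link):
        # The same pair may be reported by several providers; the sets deduplicate it.
        self._links.add(link)
        self._by_publication[link.publication_id].add(link.dataset_id)
        self._by_dataset[link.dataset_id].add(link.publication_id)

    def datasets_for(self, publication_id):
        return sorted(self._by_publication.get(publication_id, ()))

    def publications_for(self, dataset_id):
        return sorted(self._by_dataset.get(dataset_id, ()))

    def providers_for(self, publication_id, dataset_id):
        return sorted(
            link.provider
            for link in self._links
            if link.publication_id == publication_id and link.dataset_id == dataset_id
        )


if __name__ == "__main__":
    index = LinkIndex()
    # Placeholder identifiers, not real records.
    index.add(Link("paper:example-1", "dataset:example-A", "publisher"))
    index.add(Link("paper:example-1", "dataset:example-A", "repository"))
    print(index.datasets_for("paper:example-1"))
    print(index.publications_for("dataset:example-A"))
```

Keeping the provider on each link record preserves where a link came from, which matters when the same pair is asserted by both a publisher and a repository.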
12. | 12
Discover, Reuse and Cite:
https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier
13. | 13
Publishing The Full Research Cycle Requires Networks of Collaboration:
Force11:
- Multi-stakeholder, member-driven organisation
- Unites scholars, tool developers, librarians, publishers, funding agencies, etc.
- E.g. Software citation group, akin to Data Citation Group
National Data Service:
- Multi-stakeholder group, based around supercomputing centres
- Aims to be a ‘connective tissue’ between data creation, curation, storage, etc. projects
- Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need
- E.g. Datasearch, Data Linking systems
RDA:
- Co-leading Data publishing, linking group
- Co-leading Cost Recovery group, part of RDA US Sustainability effort
- Active in Chemistry, Earth Science groups, starting IG on Data Search
- SciDataCon, Sept 11-16, Denver, CO
17. | 17
Share and Publish, Proposal:
[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows are numbered to match the steps below]
1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
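The four steps of the proposal can be read as a small state machine over a dataset record: deposit under embargo, notify the funder, lift the embargo and link on publication, then report. The following is a minimal sketch of that reading, not an existing system; every class, method, and message format here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    dataset_id: str
    embargoed: bool = True               # step 1: datasets start under embargo
    linked_paper: Optional[str] = None   # set when a paper is published (step 3)


@dataclass
class EmbargoWorkflow:
    datasets: Dict[str, Dataset] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)

    def deposit(self, dataset_id: str) -> Dataset:
        """Steps 1 and 2: post the dataset under embargo and notify the funder."""
        dataset = Dataset(dataset_id)
        self.datasets[dataset_id] = dataset
        self.notifications.append(
            f"funder: dataset {dataset_id} deposited under embargo"
        )
        return dataset

    def publish_paper(self, paper_id: str, dataset_id: str) -> Dataset:
        """Steps 3 and 4: lift the embargo, link the data, report to funder and institution."""
        dataset = self.datasets[dataset_id]
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        for recipient in ("funder", "institution"):
            self.notifications.append(
                f"{recipient}: paper {paper_id} published; embargo lifted on {dataset_id}"
            )
        return dataset


if __name__ == "__main__":
    workflow = EmbargoWorkflow()
    workflow.deposit("dataset-1")
    workflow.publish_paper("paper-1", "dataset-1")
    for message in workflow.notifications:
        print(message)
```

Because the notifications are produced as a side effect of the deposit and publication events, the researcher does no separate reporting, which is the "less work" and "better tracking" benefit claimed on the slide.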