Supporting research data across Springer Nature: joining up policy and practice. Slides from Graham Smith (Research Data Manager, Springer Nature) at HKU Open Data and Data Publishing Seminar, 25th October 2021.
1. Graham Smith, Research Data Manager
Springer Nature
25/10/21
Illustration inspired by the work of Alan Turing
Supporting research data at Springer Nature: joining up policies and practice
3. Research Data
Research data may be qualitative or quantitative, digital or analogue, and of any file type.
At our journals, we are interested in the minimal, measurement-level dataset underlying the results or analysis of the research publication.
4. Why is data important?
Evidence
Research article:
✓ peer reviewed
✓ basis of academic credit
✓ version of record
✓ preserved & available long-term
✓ expectation to publish
Research Data
?
5. Why is data important?
Reproducibility and transparency
1. Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. PLoS Biol. 13, e1002165 (2015) http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
2. Begley, C. G. & Ellis, L. M. Nature 483, 531–533 (2012)
3. Prinz, F., Schlange, T. & Asadullah, K. Nature Rev. Drug Discov. 10, 712 (2011)
4. Baker (2016) http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
5. Ioannidis et al (2009) https://www.nature.com/ng/journal/v41/n2/full/ng.295.html
Evidence is mounting on costs & scale of the issue
• Irreproducible biology research costs US $28 billion per year [1]
• Pharma companies report 75%+ failure rates replicating conclusions of peer-reviewed papers [2,3]
A Nature survey [4] highlights concern in the research community:
• >50% of researchers couldn’t reproduce their own experiments
• >70% couldn’t reproduce the work of others
Evidence suggests data availability enables reproducibility
A study [5] of eighteen Nature Genetics papers found:
• Two could be reproduced fully
• Six were reproduced partially
• Ten could not be reproduced
“The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis.”
Source: Nature Genetics 41, 149–155 (2009)
6. Why is data important?
Credit
1. Colavizza et al (2019) https://doi.org/10.1371/journal.pone.0230416
2. Pienta et al (2010) https://deepblue.lib.umich.edu/handle/2027.42/78307
3. Piwowar & Vision (2013) https://doi.org/10.7717/peerj.175
4. Henneken & Accomazzi (2011) https://arxiv.org/abs/1111.3618
5. Dorch et al (2015) https://arxiv.org/abs/1511.02512
6. Sears et al (2011) https://figshare.com/articles/Data_Sharing_Effect_on_Article_Citation_Rate_in_Paleoceanography/1222998/1
Data archiving can double the publication output of studies
A study of 7,000 NSF and NIH research projects in social sciences found that:
• Those with archived data resulted in 10 (median) publications;
• Those without archived data resulted in 5 publications [2]
Research articles with open data are cited on average 25% more
Analysis of half a million OA articles indicates ~25% citation advantage [1]
Previous analyses indicate an advantage of up to 50%, depending on the field:
• Gene expression microarrays [3]
• Astronomy [4]
• Astrophysics [5]
• Paleoceanography [6]
Data citation enables non-journal metrics
7. The case for data: Societal benefits
1. http://www.unitedformedicalresearch.com/advocacy_reports/the-impact-of-genomics-on-the-u-s-economy
2. http://www.ebi.ac.uk/about/news/press-releases/value-and-impact-of-the-european-bioinformatics-institute
CASE STUDY: Human Genome Project
$1 trillion: Estimated contribution to the US economy, as reported by the Battelle Memorial Institute [1]
CASE STUDY: European Bioinformatics Institute
£1 billion: Annual efficiency savings to researchers worldwide, according to an independent report [2]
8. The case for data: demand
Researchers consider data sharing important and actively try to share
From a Springer Nature researcher survey. Total respondents: 7719 https://doi.org/10.6084/m9.figshare.5975011
9. Why is data important?
Policy requirements
Research funders increasingly include data sharing in their policies.
A Springer Nature investigation of funders worldwide shows:
• 54 funders mandate data sharing
• 31 funders encourage data sharing
10. Less ideal methods of data sharing
Supplementary Information
Data ‘in the paper’
Data online (not in a repository)
‘Available on request’
11. Data sharing done right
Data repository
✔ - globally unique and persistent identifier
✔ - long-term storage of data and metadata
✔ - specialist repositories group the same type of data
✔ - funder and journal policy compliance
✔ - data files frequently previewed and accompanied by rich metadata
✔ - licensing and reuse of data made clear
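The repository features listed above can be read as a pre-deposit checklist. The sketch below is illustrative only: `DatasetRecord`, its fields, and `missing_features` are hypothetical names, not a real repository schema or API, and it assumes the persistent identifier is a DOI (other identifier schemes exist).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical description of a deposited dataset (not a real schema)."""
    identifier: Optional[str] = None   # assumed to be a DOI
    title: Optional[str] = None
    description: Optional[str] = None
    license: Optional[str] = None


def missing_features(record: DatasetRecord) -> List[str]:
    """Return the checklist items the record does not yet satisfy."""
    missing = []
    identifier = (record.identifier or "").lower()
    # Assumption: only DOIs count as persistent identifiers in this sketch.
    if not identifier.startswith(("https://doi.org/", "doi:")):
        missing.append("persistent identifier")
    if not (record.title and record.description):
        missing.append("descriptive metadata")
    if not record.license:
        missing.append("licence")
    return missing


# A dataset that is only 'online (not in a repository)' typically fails every check.
print(missing_features(DatasetRecord(identifier="http://example.org/data.zip")))
```

A record with a DOI, a title and description, and a stated licence returns an empty list.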
15. Analysis shows that understanding journal data policies is difficult
“The evidence shows that the current research data policy ecosystem is in critical need of standardization and harmonization”
Naughton, L. & Kernohan, D. (2016). Making sense of journal research data policies. Insights 29(1), pp. 84–89. DOI: http://doi.org/10.1629/uksg.284
Chart categories: Full Policy, Partial Policy, No Policy
Data source: Linda Naughton, JISC Journal Research Data Policy Bank project presentation (n = 250)
16. Standard research data policies at Springer Nature
To address this complexity, we have been rolling out standard research data policies since 2016 (and were the first publisher to do so).
More than 1,900 (~77%) Springer Nature journals have a standard data policy as of October 2021.
Our approach is practical and pragmatic, enabling all journals to adopt a policy even if they are new to data sharing.
17. Policy features at Springer Nature
Type 1: data sharing & citation encouraged
Type 2: data sharing & data availability statements encouraged
Type 3: data sharing encouraged, data availability statements required
Type 4: data sharing & citation required, data peer review
All policy types:
• Recommend/require sharing of data via repositories
• Recommend/require the inclusion of a data availability statement with submitted manuscripts
18. Policy features at Springer Nature
● Springer Nature journals are focusing on transparency around data availability; the majority of journals are adopting a policy of mandatory data availability statements (type 3)
● This is already the current policy for Nature and BMC journals
● Springer journals are in the process of moving to type 3
● Certain data types in life sciences must be shared (e.g. DNA, RNA and protein sequences); for most other types, sharing is encouraged
19. Data policy rollout (October 2021)
Type 1 = Data sharing encouraged
Type 2 = Data sharing and data availability statement (DAS) encouraged
Type 3 = Data sharing encouraged, DAS required (for life sciences some data types must be shared)
Type 4 = Data sharing and DAS required
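The four policy types in the legend above can be encoded as a small lookup table. This is a minimal sketch, not Springer Nature tooling: `Level`, `DataPolicy`, `POLICIES` and `submission_issues` are hypothetical names. The legend does not mention a data availability statement for Type 1, so it is recorded as not stated, and the life-sciences exception under Type 3 is not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Level(Enum):
    NOT_STATED = "not stated in the legend"
    ENCOURAGED = "encouraged"
    REQUIRED = "required"


@dataclass(frozen=True)
class DataPolicy:
    policy_type: int
    data_sharing: Level
    availability_statement: Level  # data availability statement (DAS)


# Transcribed from the rollout legend; hypothetical structure.
POLICIES = {
    1: DataPolicy(1, Level.ENCOURAGED, Level.NOT_STATED),
    2: DataPolicy(2, Level.ENCOURAGED, Level.ENCOURAGED),
    3: DataPolicy(3, Level.ENCOURAGED, Level.REQUIRED),
    4: DataPolicy(4, Level.REQUIRED, Level.REQUIRED),
}


def submission_issues(policy_type: int, has_statement: bool,
                      data_in_repository: bool) -> List[str]:
    """List the hard requirements a submission fails under a policy type."""
    policy = POLICIES[policy_type]
    issues = []
    if policy.availability_statement is Level.REQUIRED and not has_statement:
        issues.append("data availability statement is required")
    if policy.data_sharing is Level.REQUIRED and not data_in_repository:
        issues.append("data must be shared in a repository")
    return issues


# A Type 3 submission without a statement fails only the DAS requirement.
print(submission_issues(3, has_statement=False, data_in_repository=False))
```

Only required items are reported; encouraged items never block a submission in this sketch.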
20. Aim of our continued Research Data policy roll-out
• To align with industry standards for research data sharing, including the Transparency and Openness Promotion (TOP) Guidelines, the Research Data Alliance policy framework, and the standards set by the STM Research Data Programme.
• To require data availability statements across all of our journals, and to encourage more authors to share data in repositories.
• To support and empower journal teams to enforce data sharing policies which are appropriate for their discipline.
• To encourage and enable our authors to share FAIR data.
21. What stops researchers from sharing their data
From a Springer Nature researcher survey. Total respondents: 7719
https://doi.org/10.6084/m9.figshare.5975011
26. Data publishing at Springer Nature
Scientific Data and BMC Research Notes are two of our journals dedicated to data publishing. The expertise we have from these successful journals puts us in a great position to support subject-specific titles wanting to publish data papers.
27. Data notes: a short, simple way to publish data
Based on the successful launch of this article type at BMC Research Notes, data notes are now available at BMC Genomic Data. These distil all the important elements of a data paper into clear sections, including an abstract, data description, table and overview of the data files themselves, limitations and reference list.
Examples available online
28. The story behind the image
Alan Turing (1912–1954)
The scope of the achievements of Alan Turing, computer pioneer, wartime code-breaker and polymath, cannot be overstated. Renowned as the man who broke the Enigma code, Turing is also considered the father of computer science and artificial intelligence. His legacy is represented here with a visualisation of a “Turing Machine”, a hypothetical device he devised to represent the logic of a computer. The binary code depicted translates to one of Turing’s memorable quotes: “Science is a differential equation. Religion is a boundary condition.”
Thank you
Graham Smith
Research Data Manager,
Editorial Operations, Springer Nature
graham.smith@springernature.com
@Mr_G_Smith