Publishing perspectives on data management & future directions
1. Publishing perspectives on data management & future directions
Research Integrity Advisors Data management workshop
Friday 31 March
Virginia Barbour
Director, AOASG
ORCID: 0000-0002-2358-2440
ginny.barbour@qut.edu.au
2. My roles
Director, Australasian Open Access Strategy Group
Chair, Committee on Publication Ethics (COPE)
Editor PLOS Medicine, then Editorial Director, PLOS 2004 - 2015
Involved in publishing initiatives, including AllTrials and reporting guidelines
Joint appointment between Office of Research Ethics and Integrity, and
Division of Technology, Information and Library Services, QUT
3. Journals’ interest in data
Background
https://commons.wikimedia.org/wiki/File:Network-mapping.gif
https://commons.wikimedia.org/wiki/File:Question_mark_1.svg
Motives
Practicalities
5. “Today, the CMS Collaboration at
CERN has released more than 300
terabytes (TB) of high-quality open
data. These include over 100 TB, or 2.5
inverse femtobarns (fb−1), of data
from proton collisions”
This is the age of data
https://www.flickr.com/photos/jerry-raia/13522426525/in/photostream/
6. Journals may see the
problem first, but they
are not the source of
the problem
https://www.flickr.com/photos/studiomiguel/3946174063
10. Increasing number of cases relevant to data
Data
• Top: over 16 years – fabrication 17%, selective/misleading reporting/interpretation 13%;
• High: 2009-12 – unauthorized use & image manipulation
Correction of the literature
• retractions 47%, corrections 27%, expressions of concern 11%,
disputes 9%, corrigenda & errata 6%
11. Poor data management scuppers research:
a case study
“Dear Editor
In xxx, yyy published my colleagues’ and my article.
Since the manuscript’s publication, we have been working on other, unrelated studies
using the same database. When results in these new, unrelated studies were
implausible, I undertook an intensive, several weeks-long investigation … I found we
had failed to load 8 files of data into the dataset. This mistake resulted in the
under-reporting of xxx … this mistake occurred despite the intensive quality checks we
have in place to ensure data quality and accuracy.
We sincerely apologize for these data issues and are committed to correcting the
article…”
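The failure described in this letter, files silently missing from a loaded dataset, is the kind of error a simple manifest check can catch before analysis starts. The following is a minimal sketch in Python, not the authors' procedure; the function names `find_missing_files` and `load_all`, and the idea of keeping a list of expected file names, are assumptions made for illustration.

```python
from pathlib import Path


def find_missing_files(expected_names, data_dir):
    """Return the expected file names that are absent from data_dir, sorted."""
    present = {p.name for p in Path(data_dir).iterdir() if p.is_file()}
    return sorted(set(expected_names) - present)


def load_all(expected_names, data_dir):
    """Read every expected file, refusing to proceed if any is missing."""
    missing = find_missing_files(expected_names, data_dir)
    if missing:
        # Fail loudly instead of analysing an incomplete dataset.
        raise FileNotFoundError(
            f"{len(missing)} expected file(s) not found: {missing}"
        )
    return {name: (Path(data_dir) / name).read_text() for name in expected_names}
```

The design choice is to stop with an error rather than continue with partial data, so an incomplete load cannot pass unnoticed into published results.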
12. Poor data management leads to accusation of research misconduct:
a case study
A student submitted a paper to a journal as part of his PhD work. The research was data
heavy – it was based on digital scans of cell images.
The paper was published.
Six months later a reader noted an anomaly and asked the journal for the underlying data;
the journal in turn asked the author.
The PhD student had moved on. None of his data had been stored securely at his
previous institution and it could not be found. The journal felt that the lack of availability
of data meant that the paper was unreliable and asked the institution to investigate
whether misconduct had occurred.
In the investigation it turned out that the student had asked repeatedly for a place to
store his data but the university had not been able to provide one.
The university accepted responsibility and the investigation led to the development of a
policy on data management there. The student was exonerated.
15. From: How Does the Availability of Research Data Change With Time Since Publication? Timothy H. Vines and colleagues, Abstract (podium),
Peer Review Congress, 2013
16. The ideal situation
• Do some research
• Write a narrative description that is inextricably linked to the data and methods
• Integrated collection of methods, results, data, metadata
• Store all data in accessible, usable format, link to publication
• Facilitate re-use & replication by people or machines
17. What we often have at journals
• Unextractable data
• Everything “extra” in one (unreadable) file
• Third party licenses
• Proprietary data
• No metadata
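Two of the problems listed here, unextractable data and missing metadata, can be avoided at the point where a dataset is saved. The sketch below writes a plain CSV file together with a small JSON description of it. It is only an illustration: the function name `write_with_metadata` and the metadata field names are hypothetical and do not follow any particular journal's or repository's schema.

```python
import csv
import json
from pathlib import Path


def write_with_metadata(rows, columns, csv_path, description, licence):
    """Write rows to a CSV file plus a JSON sidecar describing its contents."""
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([c["name"] for c in columns])  # header row
        writer.writerows(rows)

    # Hypothetical field names, chosen for readability only.
    metadata = {
        "file": csv_path.name,
        "description": description,
        "licence": licence,
        "row_count": len(rows),
        "columns": columns,  # each entry: name, unit, description
    }
    meta_path = csv_path.with_name(csv_path.stem + ".metadata.json")
    meta_path.write_text(json.dumps(metadata, indent=2))
    return meta_path
```

Calling it with a list of rows and a list of column descriptions produces, for example, `samples.csv` and `samples.metadata.json` side by side, both in open, machine-readable formats.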
18. Data availability in research papers allows
Replication
Validation
New analysis
Better interpretation
Inclusion in meta-analyses
Facilitation of reproducibility of research
Closer scrutiny of published work
Better ‘bang for the buck’ out of research investment
20. “The evidence shows that the current research data policy ecosystem is
in critical need of standardization and harmonization”
How many journals have a research data policy?
Chart legend: Full Policy, Partial Policy, No Policy (values listed in the order they appear on the slide)
All Journals: 52.4, 23.2, 23.2
Science Journals: 64.8, 14.4, 18.4
Social Science Journals: 40, 32, 28
Data source: Linda Naughton, JISC Journal Research Data Policy Bank project presentation (n = 250)
Iain Hrynaszkiewicz
21. Different levels of openness in research data publishing:
1. Accessible only to an individual researcher/group
2. Accessible to others on (reasonable) request
3. Published as electronic supplementary material
4. Deposited in a general or institutional data repository (e.g. figshare)
5. Deposited in a subject/community specific data repository
Not all research data are Open Data
More open
22. Wiley data sharing survey
2886 responses (3.2% response rate) – 52% had shared/published data
Data publishing
• 67% via supplementary material in journals
• 28% via an institutional repository
• 19% use a discipline-specific data repository
• 6% use a general-purpose repository, such as Dryad or figshare
Data sharing (informal)
• 57% sharing at a conference
• 42% sharing on request via email, direct contact, etc.
• 37% via personal, institutional, or project website
Are researchers sharing research data?
Slide from Iain Hrynaszkiewicz
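The survey figures can be put in perspective with a little arithmetic. The sketch below assumes that the 3.2% response rate means responses divided by invitations, and that the publishing-channel percentages are shares of respondents; neither assumption is stated on the slide, and the variable names are illustrative.

```python
def implied_invitations(responses, response_rate):
    """Estimate how many people were invited, given responses and response rate."""
    return responses / response_rate


# Figures quoted on the slide.
responses = 2886
response_rate = 0.032   # 3.2%
shared_fraction = 0.52  # 52% had shared/published data

invited = implied_invitations(responses, response_rate)
sharers = responses * shared_fraction

# Data publishing channels (percent), as listed on the slide.
channels = {
    "supplementary material": 67,
    "institutional repository": 28,
    "discipline-specific repository": 19,
    "general-purpose repository": 6,
}
channel_total = sum(channels.values())

print(f"Implied invitations: about {invited:,.0f}")
print(f"Respondents who shared data: about {sharers:,.0f}")
print(f"Channel percentages sum to {channel_total}%")
```

Because the response rate is rounded, the invitation count is approximate. The channel percentages sum to more than 100, which suggests respondents could select more than one channel; that reading is an inference rather than something the slide states.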
23. Data management is largely
regarded by academics as:
• Boring
• Waste of time
• Expensive
• Hard
• Confusing
24. They need to be persuaded that it is not:
• Boring
• Waste of time
• Expensive
• Hard
• Confusing
but rather:
• Part of the job
• Time saving
• Cost effective
• Easy
• Rewarded
25. • Content types e.g. data articles and journals
• Credit and incentives e.g. data citation and data articles
• Encouraging reuse e.g. open licenses
• Improving data quality e.g. data peer review, community standards and
repositories
• Data discoverability e.g. repository partnerships, linking, integration with
submission systems and research data metadata
• Raising awareness e.g. editorials, outreach
• Guidance e.g. information for authors
• Policy – and its implementation
What are publishers doing about it?
Iain Hrynaszkiewicz
26. Journal data policy landscape
• Nothing stated
• Data sharing encouraged
• Data sharing implied as a condition of submission/publication, with mandates for specific data types (e.g. Nature pre-2016)
• Mandated data availability statements in every paper and mandates for specific data types (Royal Society, BioMed Central, Palgrave Communications, Nature from 2016)
• Mandated data sharing for all, with exceptions, with statement in paper (PLOS, BMJ)
• Mandated data sharing for all with statement & link to data (e.g. American Economic Review)
• Mandated open data and data citation as a condition of submission (e.g. F1000Research)
(The list runs from weaker to STRONGER policies.)
Adapted from Iain Hrynaszkiewicz
27. “PLOS journals require
authors to make
all data underlying the
findings described in
their manuscript fully
available without
restriction, with rare
exception”
28. References
• Naughton, L. & Kernohan, D., (2016). Making sense of journal research data policies. Insights. 29(1),
pp.84–89. DOI: http://doi.org/10.1629/uksg.284
• Lin J, Strasser C (2014) Recommendations for the Role of Publishers in Access to Data. PLoS Biol 12(10):
e1001975. doi:10.1371/journal.pbio.1001975
• Hrynaszkiewicz I, Li P, Edmunds SC. Open science and the role of publishers in reproducible research. In:
Stodden V, Leisch F, Peng, RD, editors. Implementing Reproducible Research. CRC Press; 2014. Public
(https://osf.io/35s9d/)
• https://scholarlykitchen.files.wordpress.com/2014/11/researcher-data-insights-infographic-final.pdf