1. Towards Open Research
practices, experiences, barriers and opportunities
3rd Research Data Network
St Andrews, 30 November 2016
Veerle Van den Eynden, UK Data Service, University of Essex
Gareth Knight, London School of Hygiene & Tropical Medicine
2. Our research
• Researchers funded by Wellcome Trust and ESRC: biomedical, clinical, population health, humanities, social sciences
• Current attitudes and practices related to sharing of:
– Publications
– Data
– Code
• Barriers that inhibit or prevent researchers from sharing
• Identification of actions that funders can take to encourage good practice and mitigate issues
• Survey (N=583 + 259), focus groups (N=22)
Van den Eynden, Veerle et al. (2016) Towards Open Research: Practices, Experiences, Barriers and Opportunities. Wellcome Trust. https://dx.doi.org/10.6084/m9.figshare.4055448
3. Data sharing practices
• 95% of respondents generate research data
• 51% / 55% of these made research data available in the last 5 years
• 4 / 2 datasets on average: full dataset or subset, e.g. with paper
• sharing increases with career length
• sharing varies by discipline
• 77% reuse existing data for: background, validation, methodology development & new analysis
4. Reasons to share data (Wellcome)
[Stacked bar chart: percentage of respondents (0% to 100%) rating each reason on a five-point scale: Not at all important, Slightly important, Moderately important, Very important, Extremely important]
My funder requires me to share my data (N=273)
Journal expects data underpinning findings to be accessible (N=273)
My research community expects data sharing (N=274)
It is good research practice to share research data (N=277)
It enables collaboration and contribution by other researchers (N=274)
It has public health benefits, e.g. disease outbreaks (N=265)
Ability to respond rapidly to public health emergencies (N=263)
Ethical obligation towards research participants to maximize benefits for society (N=266)
Contributes to academic credentials (N=273)
Enables validation and/or replication of my research (N=275)
Improved visibility for my research (N=273)
I can get credit and more citations by sharing data (N=267)
5. Reasons to share data (ESRC)
[Stacked bar chart: percentage of respondents (0% to 100%) rating each reason on a five-point scale: Not at all important, Slightly important, Moderately important, Very important, Extremely important]
My funder requires me to share my data (N=131)
Journal expects data to be accessible (N=132)
My research community expects data sharing (N=131)
It is good research practice to share research data (N=133)
Collaboration and contribution by other researchers (N=131)
It has public health benefits, e.g. disease outbreaks (N=125)
Ability to respond rapidly to public health emergencies (N=122)
Ethical obligation / maximize benefits for society (N=128)
Contributes to academic credentials (N=128)
Enables validation and/or replication of my research (N=129)
Improved visibility for my research (N=128)
I can get credit and more citations by sharing data (N=127)
Benefits from data sharing: collaborations, higher citation rates
Most report no direct benefits, but also no bad experiences
6. Barriers to data sharing (Wellcome)
[Stacked bar chart: percentage of respondents (0% to 100%) rating each barrier on a five-point scale: Not at all important, Slightly important, Moderately important, Very important, Extremely important]
I may lose publication opportunities if I share data (N=517)
Others may misuse or misinterpret my data (N=519)
I have insufficient skills to prepare the data (N=505)
It requires time/effort to prepare my data for deposit (N=520)
I do not have sufficient funding to prepare data for sharing (N=509)
I do not have permission (consent) from my research participants to share data (N=510)
Data contain confidential / sensitive information and cannot be de-identified (N=504)
My data are commercially sensitive or have commercial value (N=501)
There are third party rights in my data (N=499)
No suitable repository exists for my data (N=502)
Country-specific regulations do not allow sharing (N=486)
7. Barriers to data sharing (ESRC)
[Stacked bar chart: percentage of respondents (0% to 100%) rating each barrier on a five-point scale: Not at all important, Slightly important, Moderately important, Very important, Extremely important]
I may lose publication opportunities (N=231)
Others may misuse or misinterpret my data (N=229)
I have insufficient skills to prepare the data (N=227)
It requires time/effort to prepare data for deposit (N=233)
Insufficient funding to prepare data (N=232)
No consent from research participants to share data (N=232)
Confidential / sensitive data (N=229)
Commercially sensitive / has commercial value (N=218)
There are third party rights in my data (N=219)
No suitable repository exists for my data (N=220)
Country-specific regulations do not allow sharing (N=214)
9. Significant differences in motivations
MORE IMPORTANT / LESS IMPORTANT
• Extra funding to cover costs
– established researchers
– ~
– cell, development and physical science, genetic and molecular science, neuroscience and mental health, population health
– infection and immunobiology
• Enhanced academic reputation
– early career researchers
– ~
– researchers not sharing data now
• Knowing how other people use data
– early career researchers
– ~
– LMIC researchers
– ~
– cell, development and physical science, humanities, infection and immunobiology, population health
– genetic and molecular science
• Co-authorship on reuse papers
– early career researchers
– clinical, population health, social science researchers
– cell, development and physical science, neuroscience and mental health
– biomedical and humanities researchers, genetic and molecular science, infection and immunobiology
• Case study that showcases data
– LMIC researchers
– ~
– humanities, infection and immunobiology, population health
– cell, development and physical science, genetic and molecular science, neuroscience and mental health
• Data deposit leads to data paper publication
– early career researchers; LMIC researchers
– ~
– cell, development and physical science, infection and immunobiology, neuroscience and mental health
– genetic and molecular science, humanities and social sciences
MORE IMPORTANT / LESS IMPORTANT
• Considered favourably in funding and promotion decisions
– UK-based researchers
– ~
– cell, development and physical science, genetic and molecular science, neuroscience and mental health
– population health
• Evidence of data citation
– early career researchers
– researchers not sharing data now
• Ability to limit data access to specific purposes or individuals
– LMIC researchers
– ~
– clinical, population health and social science researchers
– biomedical researchers
• Assistance from institution or funder to prepare data
– clinical, population health and social science researchers
– biomedical and humanities researchers
• Nothing would motivate
– researchers not sharing data now
10. Code sharing practices
• 40% generate code
– Researchers performing surveys, secondary analysis & simulations are more likely to produce code
• 43% of these made code available in the last 5 years
– Researchers performing simulations, secondary analysis and experiments share the most code
– Researchers applying qualitative and survey methods share less
• 37% reuse existing code
– Obtained from colleagues/collaborators & community repositories
– Good documentation, a reputable source and open availability are key factors in code reuse
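As a rough worked example, the nested percentages on this slide can be combined into a single share of all respondents. This is a minimal sketch, assuming the 43% is a fraction of the 40% who generate code; the function name `share_of_all_respondents` is illustrative and does not come from the report.

```python
def share_of_all_respondents(generate_rate: float, share_rate: float) -> float:
    """Fraction of all respondents who both generate code and shared it.

    Assumes share_rate is measured only among those who generate code.
    """
    return generate_rate * share_rate


# Figures from the slide: 40% generate code, 43% of those made it available.
overall = share_of_all_respondents(0.40, 0.43)
print(f"{overall:.1%} of all respondents shared code")
```

Under that assumption, roughly 17% of all respondents shared code in the last 5 years.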
11. Reasons to share code (Wellcome)
[Stacked bar chart: percentage of respondents (0% to 100%) rating each reason on a five-point scale: Not at all important, Slightly important, Moderately important, Very important, Extremely important]
My funder requires me to share my code (N=97)
Journal expects code to be accessible (N=97)
My research community expects code sharing (N=97)
It is good research practice to share code (N=101)
To enable collaboration and contribution (N=98)
Contributes to my academic credentials (N=95)
Enables validation of my research (N=97)
Enables replication of my research (N=96)
Improved visibility for my research (N=95)
I can get credit and more citations by sharing code (N=91)
12. Code sharing benefits (Wellcome)
[Bar chart, horizontal axis 0 to 40: benefits respondents reported from sharing code]
Career benefits
More publications
Higher citation rate
New collaborations
More funding opportunities
Financial benefit
New patents
Improvements to public health
Use in health emergencies
None
Other
13. Code sharing barriers (Wellcome)
[Stacked bar chart: percentage of respondents (0% to 100%) rating each barrier on a five-point scale: Not at all important, Slightly important, Moderately important, Very important, Extremely important]
Desire to patent (N=210)
Protecting intellectual property (N=213)
Software and systems dependencies (N=213)
I may lose publication opportunities if I share code (N=210)
Others may misuse or misinterpret my code (N=211)
Insufficient skills to prepare the code for public use (N=213)
It requires time/effort to prepare my code for deposit (N=217)
Insufficient funding to prepare code for public use (N=211)
My code has commercial value (N=207)
There are third party rights in my code (N=206)
No suitable repository exists for my code (N=197)
14. Motivations for more code sharing (Wellcome)
[Bar chart, horizontal axis 0 to 60: factors that would motivate respondents to share more code]
Financial incentive from my institution
Extra funding to cover the costs
Enhanced academic reputation
Code access and metrics
Knowing how others use my code
Co-authorship on papers resulting from reuse
Case study that showcases my code
It is looked on more favourably in funding and promotion decisions
Evidence of code citation
Assistance from institution/funder staff to prepare code
Nothing motivates me
16. Data Sharing & Reuse
Policy development
• Provide guidelines on how to share 'difficult' data types, e.g. sensitive and large data
• Consider how contradictions between government and funder data sharing policies can be addressed
Rewards
• Ensure data sharing is recognised in career progress evaluation
• Facilitate opportunities for data creators to become co-authors on new publications based upon their data
Promotion
• Monitor use and showcase examples of best practice
• Provide networking/training opportunities for data creators and re-users
Infrastructure development
• Build a repository that offers free storage, supports granular access controls, and provides resource-specific features (e.g. for imaging data, large datasets)
Funding
• Consider a dedicated funding stream to cover data/code preparation for projects, and additional staff within the institution/project/support network to help with data preparation
17. Code Sharing & Reuse
Policy development
• Consider a code sharing mandate
• Include processing scripts, such as Stata .do files and batch files, in the interpretation of what counts as code
Rewards
• Recognise code sharing in funding decisions
• Encourage authors to cite code in research outputs
Promotion
• Monitor code reuse and showcase examples of code sharing best practice
• Provide networking/training opportunities for code developers and code re-users
Infrastructure development
• Invest in creation of deposit tools
• Consider setting up a long-term repository for research code (e.g. Wellcome GitLab), or offer guidance on platforms to use
Funding
• Consider additional funding for code sharing preparation during project life and ongoing maintenance over time
18. Further developments
• Wellcome Open Research platform
• Wellcome Open Research Pilot Project (Cambridge)
• Series of reports and reviews
19. Wellcome Trust, David Carr, Robert Kiley
Anca Vlad, UK Data Service
All researchers contributing wisdom via surveys and focus group discussions
Expert advisors: Barry Radler (University of Wisconsin), Carol Tenopir (University of Tennessee), David Leon (LSHTM),
Frank Manista (Jisc), Jimmy Whitworth (LSHTM) and Louise Corti (UK Data Service)
Editor's Notes
• Code sharing is more in its infancy than data sharing: less practised, fewer benefits, less problematic
• 40% of researchers generate code
• 43% of these share code
• No significant differences by discipline or career stage