This document discusses Science 2.0 and the shift towards more open and collaborative ways of conducting science. It gives three examples of Science 2.0 projects: Galaxy Zoo, in which over 150,000 volunteers classified galaxies; Synaptic Leap, which published all data and experiments online to collaborate on finding new drug treatments; and a study on government debt that was found to have coding errors after others accessed the original data. The document argues that Science 2.0 involves more than just open access, and includes data-intensive science, citizen science, open code, and open lab books/workflows. It discusses how different Science 2.0 practices are growing at different rates and the implications this shift has for scientific outputs, methods, and research policy.
Science 2.0, Brussels – Osimo, April 2013
1. Science 2.0: discussing the best
available evidence
David Osimo, Katarzyna Szkuta
Tech4i2 limited for DG RTD
23rd January 2013
2. Three stories
• Galaxy Zoo: let users classify galaxies – 150K volunteers
have already classified more than 10 million images of
galaxies, “as accurate as that done by astronomers”. 25+
scientific articles by the Galaxy Zoo project (from 2009)
• Synaptic Leap: to find an alternative drug treatment for
schistosomiasis with fewer side effects. All data and
experiments published in an Electronic Lab Notebook; a social
network activated. About 30 people, half from industry,
participated. Identified a new process and resolving agent.
• Excel-gate: Reinhart & Rogoff, 2010: “as countries see
debt/GDP going above 90%, growth slows dramatically”.
The paper was used as the main theoretical justification for
austerity. 2013: after obtaining the original Excel file,
Herndon et al. discovered a coding error, data gaps and
unconventional weighting.
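The Excel-gate story turns on a simple mechanism: a spreadsheet range that silently drops rows changes an average without any error being raised. A minimal sketch in Python, using hypothetical growth figures (not the actual Reinhart & Rogoff data):

```python
# Toy illustration (hypothetical numbers, not the actual Reinhart-Rogoff data):
# averaging growth rates for high-debt countries, with and without a
# spreadsheet-style range error that silently drops the last rows.

growth_rates = [2.1, 1.8, -0.1, 2.4, 1.5, 2.2, 1.9]  # hypothetical growth rates (%)

full_mean = sum(growth_rates) / len(growth_rates)

# A formula that should cover the whole column but stops two rows short
# excludes the final countries without any warning:
truncated = growth_rates[:-2]
buggy_mean = sum(truncated) / len(truncated)

print(f"correct mean: {full_mean:.2f}%")
print(f"buggy mean:   {buggy_mean:.2f}%")  # the two disagree, silently
```

The point is not the specific numbers but that nothing in the computation signals the omission, which is why access to the original file, not just the published result, was needed to catch it.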
5. An emerging ecosystem of services and standards
• Data-intensive: Figshare.com
• Citizen science: Sci-starter.com
• Open code: Runmycode.org
• Pre-print: ArXiv
• Open lab books / workflows: Myexperiment.org
• Open access: Roar.eprints.org
• Open data: Datadryad.org
• Alternative reputation systems: Altmetric.com
• Open annotation: Openannotation.org
• Scientific blogs: Researchgate.com
• Collaborative bibliographies: Mendeley.com
6. Growing at different speeds
Trend | Status | Data
Pre-print | Mature | 694,000 articles in arXiv
Open access | Fast growing | Exponential growth of OA journals; 8–10% of scientific output is OA
Data-intensive | Fast growing | 52% of science authors deal with datasets larger than 1 GB
Citizen science | Medium growth | 650K Zooniverse users; 500 similar projects on SciStarter
Open data | Medium growth | 20% of scientists share data; 15% of journals require data sharing
Reference sharing | Medium growth | 2 million users of Mendeley reference-sharing tools
Open code | Sketchy growth | 21% of JASA articles make code available; 7% of journals require code
Open notebook | Sketchy growth | Isolated projects
Natural sciences outrank social sciences across all trends
7. Where the data goes now
• >50M papers; 2M scientists; 2M papers/year
• The majority of data (90%?) is stored on local hard drives
• A small portion (1–2%?) is stored in small, topic-focused data
repositories (PDB: 88.3k; PetDB: 1.5k; MiRB: 25k; SedDB: 0.6k; TAIR: 72.1k)
• Some data (8%?) is stored in large, generic data repositories
(Dryad: 7,631 files; Dataverse: 0.6M; Datacite: 1.5M)
Source: Anita De Waard 2013
8. Deep implications
• New scientific outputs and players: nanopublications,
data and code; vertical disintegration of the value
chain
• Greater role for inductive methods: everything
becomes a Genome Project
• Scaling serendipity: Big linked data, collaborative
annotation, social networking and knowledge mining
detect unexpected correlations on a massive scale
• Better science: reproducible and truly falsifiable
research findings; earlier uncovering of mistakes
• More productive science: reusing data and products,
crowdsourcing work, reduced time-to-publication
9. Europe can lead
• European scientific publishers are leading experimentation
with new kinds of open and data-intensive services
E.g. the “Article of the Future” project and the AppsForScience
competition (Elsevier); Thieme (a small German publisher) and data integration
• Home to world-class Science 2.0 startups:
Mendeley and ResearchGate are global players in social networking
for scientists; Digital Science recently acquired Figshare
Mendeley is used by about 2 million researchers, covering 65 million
documents vs 49 million in Thomson Reuters’ commercial databases.
Elsevier just bought Mendeley for €50 million.
• Home to top citizen science initiatives
(GalaxyZoo was launched in Oxford; the ExCiteS group and the Citizen
Cyberscience Centre)
• Funding agencies are active in new mandates on openness
(e.g. Wellcome Trust, FP7) – open access, open data
10. BUT the institutional framework is
a bottleneck
• Researchers are reluctant to
share data and code [1], and
to provide open peer review
• Current career mechanisms
are “publish or perish”. No
reward for sharing.
• Publishing data and code
requires additional work
• Publishing intermediate
products can actually hinder
publication/patenting:
sharing is difficult in patent-intensive domains
• Funding mechanisms are too
rigid, roadmap-based and
evaluated on articles and
patents
[1] Wicherts et al., 2011; Research Information Network, 2008; Campbell, 2002
11. Institutional failure and the case for public intervention
• Contradictions emerge between individual and societal benefits
• Research funders (and publishers) have high leverage on
scientific institutions

BENEFITS | Individual researchers | Institutions | Business | Publishers | Societal benefits
Open access | ++ | + | + | -- | ++
Open data | -- | -- | -- | + | ++
Open code | -- | -- | -- | = | ++
Citizen science | + | = | + | = | +
Alternative reputation systems | + | - | + | - | +
Data-intensive | + | + | + | + | ++
Social media | + | = | = | = | +
12. How to grasp this opportunity?
• It’s not about adding a Science 2.0 top-down,
roadmap-based initiative to existing programmes
• It’s not about simply letting a thousand
flowers flourish bottom-up
• It’s about nudging the right institutional rearrangement
(Perez) and the right system of incentives for the
scientific value chain
13. Towards research policy 2.0
Recommendation | Inspiring example
Adopt more flexible reputation mechanisms for scientists | From 2013, NSF requires PIs to list research “products” rather than “publications”
Encourage sharing by regulation | Wellcome Trust mandatory data plan
Cover the costs of sharing intermediate outputs such as data | Gold open access publication costs to be covered in Horizon 2020
Develop innovative infrastructure, tools, methods and standards | Alternative reputation systems, Openannotation, Datadryad
Make IPR more flexible | Innocentive.com, Peertopatent.com
Increase open-ended funding | FET Open, UK Arts Council, inducement prizes
Collect better evidence | Dedicated data-gathering exercise (à la PEW)
14. Thanks
• Continue the discussion at
science20study.wordpress.com
• Collect evidence and cases at
groups.diigo.com/group/science-20
• Contact david.osimo@tech4i2.com ;
katarzyna.szkuta@tech4i2.com ; @osimod
16. Emerging impact:
a) more productive science
– Using the same datasets for multiple research projects: 50% of
Hubble papers came from data re-users [1].
– Crowdsourcing work: “thousands recruited in months versus years
and billions of data points per person, potential novel discovery in
the patterns of large data sets, and the possibility of near real-time
testing and application of new medical findings” [2].
– “Cut down the time it takes to go from lab to medicine by 10-15
years with Open Notebook Science”; “because of poor literature
analysis tools 20-25% of the work done in his synthetic chemistry
lab is unnecessary duplication or could be predicted to fail” [3].
– Faster circulation of high-quality ideas: 70% of publications
discussed in blogs are from high-impact journals.
– Open research solved one-third of a sample of problems that large
and well-known R&D-intensive firms had been unsuccessful in
solving internally [4].
[1] http://archive.stsci.edu/hst/bibliography/pubstat.html
[2] http://www.jmir.org/2012/2/e46/
[3] http://science.okfn.org/category/pubs/
[4] Lakhani et al., 2007
17. b) Better science
• Greater falsifiability (Popper): a move towards reproducible
science thanks to publishing data and code in addition to
the article
• Rapidly uncover mistaken findings (Climategate 2009 or
microarray-based clinical trials underway at Duke
University)
• Data sharing is associated with greater robustness of
findings [1]. Sharing data and notes applies to failures, as
well as successes
• Especially important for computational science
“Computational science cannot be elevated to a third
branch of the scientific method until it generates routinely
verifiable knowledge” [2]
[1] Wicherts et al., 2011
[2] Donoho, Stodden, et al. 2009
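The idea of “routinely verifiable knowledge” can be made concrete: if the data and the exact computation ship with the article, any reader can re-run it and compare against the reported figure. A minimal sketch, with hypothetical file contents and a hypothetical published value:

```python
# Minimal sketch of a reproducibility check (the CSV contents and the
# published value below are hypothetical): anyone with the shared data
# can re-run the exact computation behind a reported statistic.
import csv
import io

# Stand-in for a data file deposited alongside the article.
shared_csv = """country,debt_ratio,growth
A,95,1.2
B,102,0.8
C,91,1.6
"""

rows = list(csv.DictReader(io.StringIO(shared_csv)))
mean_growth = sum(float(r["growth"]) for r in rows) / len(rows)

PUBLISHED_VALUE = 1.2  # the figure reported in the (hypothetical) article
assert abs(mean_growth - PUBLISHED_VALUE) < 1e-9, "published result does not reproduce"
print(f"mean growth for high-debt countries: {mean_growth:.2f}%")
```

The check is trivial here, but the design point is that the assertion fails loudly if the shared data and the published claim drift apart, which is exactly the property an article alone cannot offer.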
18. c) Greater role of inductive
methods
• “The end of theory”: “Here’s the evidence, now
what is the hypothesis?”
• All science becomes computational. 38% of
scientists spend more than 1/5 of their time
developing software (Merali, 2010).
• Greater availability of data collection and
datasets increases the utility of inductive
methods. The Genome Project as a new paradigm
19. d) Scaling serendipity
• From penicillin to the theory of relativity, serendipity has
always been a core component of science
• Big linked data, collaborative annotation and knowledge
mining of OA articles make it possible to detect unexpected
correlations on a massive scale. Mendeley manages the
bibliographies of 2 million scientists and uses them to
suggest further reading.
• Emerging evidence that, for scholars, recommendation is
more important than search for finding references. Social
networking and recommendation systems allow scientists
to “stumble upon” new evidence
• Successful open-research solvers solved problems at the
boundary of or outside their fields of expertise [1]
[1] Lakhani et al., 2007
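One common way a reference manager can turn shared bibliographies into “further reading” suggestions is pair co-occurrence: papers that often sit together in the same personal libraries are likely related. A toy sketch of that idea (the libraries and paper names are invented, and this is not Mendeley's actual algorithm):

```python
# Toy co-occurrence recommender over personal reference libraries.
# The libraries and paper names below are invented for illustration.
from collections import Counter
from itertools import combinations

libraries = [
    {"paper_a", "paper_b", "paper_c"},
    {"paper_a", "paper_b", "paper_d"},
    {"paper_b", "paper_c", "paper_e"},
]

# Count how often each pair of papers appears in the same personal library.
co_occurrence = Counter()
for lib in libraries:
    for pair in combinations(sorted(lib), 2):
        co_occurrence[pair] += 1

def recommend(paper, top_n=3):
    """Rank other papers by how often they co-occur with `paper`."""
    scores = Counter()
    for (a, b), count in co_occurrence.items():
        if a == paper:
            scores[b] += count
        elif b == paper:
            scores[a] += count
    return [p for p, _ in scores.most_common(top_n)]

print(recommend("paper_a"))  # paper_b co-occurs twice, so it ranks first
```

At the scale of millions of bibliographies the same counting logic surfaces connections no single reader would stumble upon, which is the “scaling serendipity” argument in miniature.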
20. e) New outputs and players
#beyondthepdf
• Nanopublications, datasets, code
• Integration of data and code with articles
• Reproducible papers and books
21. Emerging policies
• Funders and publishers have high leverage on researchers
• Increasing push towards Open Access from funders
• Journals and funding agencies increasingly require data
submission and data management plans
• From 14 January 2013, NSF grant forms require PIs to list
research “products” rather than “publications”
• Alternative metrics emerge, such as altmetrics and
download statistics
22. Towards research policy 2.0
Features
• Simplified proposals
• Rewarding solutions, not proposals
• Multi-stage
• Open priorities
• Flexible and open-ended (allowing for serendipity)
• Peer-selection, reputation-based (funding not to the proposal but to the person)
• Multidisciplinarity by design
• Flexible IPR
• Short project time
• Accepting failure
• Transparency (open monitoring)
• Based on social network analysis
Examples
• Inducement prizes, e.g. http://www.heritagehealthprize.com
• Seed capital (http://www.ibbt.be/en/istart/our-istart-toolbox/iventure)
• ERC: http://erc.europa.eu
• SBIR: http://www.sbir.gov
• FET Open: http://cordis.europa.eu/fp7/ict/fetopen/home_en.html
• SME: http://cordis.europa.eu/fetch?CALLER=PROGLINK_PARTNERS&ACTION=D&DOC=1&CAT=PROG&QUERY=012e7c324da6:39b1:49a0957c&RCN=862
• IBBT: www.ibbt.be
• Arts Council: http://www.artscouncil.org.uk/funding/grants-arts
• Banca dell’innovazione / Innovation Bank: http://italianvalley.wired.it/news/altri/perche-ci-serve-una-banca-nazionale-dell-