This document discusses Science 2.0 and the shift towards more open and collaborative ways of conducting science. It gives three examples of Science 2.0 projects: Galaxy Zoo, in which over 150,000 volunteers classified galaxies; Synaptic Leap, which published all data and experiments online to collaborate on finding new drug treatments; and a study on government debt that was found to have coding errors after others accessed the original data. The document argues that Science 2.0 involves more than just open access, and includes data-intensive science, citizen science, open code, and open lab books/workflows. It discusses how different Science 2.0 practices are growing at different rates and the implications this shift has for scientific outputs, methods, and research policy.
Science 2.0, Brussels – Osimo, April 2013
1. Science 2.0: discussing the best
available evidence
David Osimo, Katarzyna Szkuta
Tech4i2 limited for DG RTD
23rd January 2013
2. Three stories
• Galaxy Zoo: let users classify galaxies – 150K volunteers
have already classified more than 10 million images of
galaxies, “as accurate as that done by astronomers”. 25+
scientific articles by the Galaxy Zoo project (from 2009)
• Synaptic Leap: to find an alternative drug treatment for
schistosomiasis with fewer side effects. All data and
experiments published in an Electronic Lab Notebook; a social
network activated. About 30 people, half from industry,
participated. Identified a new process and resolving agent.
• Excel-gate: Reinhart & Rogoff, 2010: “as countries see
debt/GDP going above 90%, growth slows dramatically”.
The paper was used as the main theoretical justification for
austerity. 2013: after obtaining the original Excel file,
Herndon et al. discovered a coding error, data gaps and
unconventional weighting.
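The Excel-gate story turns on a simple mechanism: a spreadsheet range that silently drops rows changes an average without any error being raised. A minimal sketch in Python, using hypothetical growth figures (not the actual Reinhart & Rogoff data):

```python
# Toy illustration (hypothetical numbers, not the actual Reinhart-Rogoff data):
# averaging growth rates for high-debt countries, with and without a
# spreadsheet-style range error that silently drops the last rows.

growth_rates = [2.1, 1.8, -0.1, 2.4, 1.5, 2.2, 1.9]  # hypothetical growth rates (%)

full_mean = sum(growth_rates) / len(growth_rates)

# A formula that should cover the whole column but stops two rows short
# excludes the final countries without any warning:
truncated = growth_rates[:-2]
buggy_mean = sum(truncated) / len(truncated)

print(f"correct mean: {full_mean:.2f}%")
print(f"buggy mean:   {buggy_mean:.2f}%")  # the two disagree, silently
```

The point is not the specific numbers but that nothing in the computation signals the omission, which is why access to the original file, not just the published result, was needed to catch it.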
5. An emerging ecosystem of services and standards
• Data-intensive: Figshare.com
• Citizen science: Sci-starter.com
• Open code: Runmycode.org
• Pre-print: ArXiv
• Open lab books / workflows: Myexperiment.org
• Open access: Roar.eprints.org
• Open data: Datadryad.org
• Alternative reputation systems: Altmetric.com
• Open annotation: Openannotation.org
• Scientific blogs: Researchgate.com
• Collaborative bibliographies: Mendeley.com
6. Growing at different speeds
Trend | Status | Data
Pre-print | Mature | 694,000 articles in arXiv
Open access | Fast growing | Exponential growth of OA journals; 8–10% of scientific output is OA
Data-intensive | Fast growing | 52% of science authors deal with datasets larger than 1 GB
Citizen science | Medium growth | 650K Zooniverse users; 500 similar projects on SciStarter
Open data | Medium growth | 20% of scientists share data; 15% of journals require data sharing
Reference sharing | Medium growth | 2 million users of Mendeley reference-sharing tools
Open code | Sketchy growth | 21% of JASA articles make code available; 7% of journals require code
Open notebook | Sketchy growth | Isolated projects
Natural sciences outrank social sciences across all trends
7. Where the data goes now
• >50M papers; 2M scientists; 2M papers/year
• The majority of data (90%?) is stored on local hard drives
• A small portion (1–2%?) is stored in small, topic-focused data
repositories (PDB: 88.3k; PetDB: 1.5k; MiRB: 25k; SedDB: 0.6k; TAIR: 72.1k)
• Some data (8%?) is stored in large, generic data repositories
(Dryad: 7,631 files; Dataverse: 0.6M; Datacite: 1.5M)
Source: Anita De Waard 2013
8. Deep implications
• New scientific outputs and players: nanopublications,
data and code; vertical disintegration of the value
chain
• Greater role for inductive methods: everything
becomes a Genome Project
• Scaling serendipity: Big linked data, collaborative
annotation, social networking and knowledge mining
detect unexpected correlations on a massive scale
• Better science: reproducible and truly falsifiable
research findings; earlier uncovering of mistakes
• More productive science: reusing data and products,
crowdsourcing work, reduced time-to-publication
9. Europe can lead
• European scientific publishers are leading experimentation
with new kinds of open and data-intensive services
E.g. the “Article of the Future” project and the AppsForScience
competition (Elsevier); Thieme (a small German publisher) and data integration
• Home to world-class Science 2.0 startups:
Mendeley and ResearchGate are global players in social networking
for scientists; Digital Science recently acquired Figshare
Mendeley is used by about 2 million researchers, covering 65 million
documents vs 49 million in Thomson Reuters’ commercial databases.
Elsevier just bought Mendeley for €50 million.
• Home to top citizen science initiatives
(GalaxyZoo was launched in Oxford; the ExCiteS group and the Citizen
Cyberscience Centre)
• Funding agencies are active in new mandates on openness
(e.g. Wellcome Trust, FP7) – open access, open data
10. BUT the institutional framework is
a bottleneck
• Researchers are reluctant to
share data and code [1], and
to provide open peer review
• Current career mechanisms
are “publish or perish”. No
reward for sharing.
• Publishing data and code
requires additional work
• Publishing intermediate
products can actually hinder
publication/patenting:
sharing is difficult in patent-intensive domains
• Funding mechanisms are too
rigid, roadmap-based and
evaluated on articles and
patents
[1] Wicherts et al., 2011; Research Information Network, 2008; Campbell, 2002
11. Institutional failure and the case for public intervention
• Contradictions emerge between individual and societal benefits
• Research funders (and publishers) have high leverage on
scientific institutions

BENEFITS | Individual researchers | Institutions | Business | Publishers | Societal benefits
Open access | ++ | + | + | -- | ++
Open data | -- | -- | -- | + | ++
Open code | -- | -- | -- | = | ++
Citizen science | + | = | + | = | +
Alternative reputation systems | + | - | + | - | +
Data-intensive | + | + | + | + | ++
Social media | + | = | = | = | +
12. How to grasp this opportunity?
• It’s not about adding a Science 2.0 top-down,
roadmap-based initiative to existing programmes
• It’s not about simply letting a thousand
flowers flourish bottom-up
• It’s about nudging the right institutional rearrangement
(Perez) and the right system of incentives for the
scientific value chain
13. Towards research policy 2.0
Recommendation | Inspiring example
Adopt more flexible reputation mechanisms for scientists | From 2013, NSF requires PIs to list research “products” rather than “publications”
Encourage sharing by regulation | Wellcome Trust mandatory data plan
Cover the costs of sharing intermediate outputs such as data | Gold open access publication costs to be covered in Horizon 2020
Develop innovative infrastructure, tools, methods and standards | Alternative reputation systems, Openannotation, Datadryad
Make IPR more flexible | Innocentive.com, Peertopatent.com
Increase open-ended funding | FET Open, UK Arts Council, inducement prizes
Collect better evidence | Dedicated data-gathering exercise (à la PEW)
14. Thanks
• Continue the discussion at
science20study.wordpress.com
• Collect evidence and cases at
groups.diigo.com/group/science-20
• Contact david.osimo@tech4i2.com ;
katarzyna.szkuta@tech4i2.com ; @osimod
16. Emerging impact:
a) more productive science
– Using the same datasets for multiple research projects: 50% of
Hubble papers came from data re-users [1].
– Crowdsourcing work: “thousands recruited in months versus years
and billions of data points per person, potential novel discovery in
the patterns of large data sets, and the possibility of near real-time
testing and application of new medical findings” [2].
– “Cut down the time it takes to go from lab to medicine by 10-15
years with Open Notebook Science”; “because of poor literature
analysis tools 20-25% of the work done in his synthetic chemistry
lab is unnecessary duplication or could be predicted to fail” [3].
– Faster circulation of high-quality ideas: 70% of publications
discussed in blogs are from high-impact journals.
– Open research solved one-third of a sample of problems that large
and well-known R&D-intensive firms had been unsuccessful in
solving internally [4].
[1] http://archive.stsci.edu/hst/bibliography/pubstat.html
[2] http://www.jmir.org/2012/2/e46/
[3] http://science.okfn.org/category/pubs/
[4] Lakhani et al., 2007
17. b) Better science
• Greater falsifiability (Popper): a move towards reproducible
science thanks to publishing data and code in addition to
the article
• Rapidly uncover mistaken findings (Climategate 2009 or
microarray-based clinical trials underway at Duke
University)
• Data sharing is associated with greater robustness of
findings [1]. Sharing data and notes applies to failures, as
well as successes
• Especially important for computational science
“Computational science cannot be elevated to a third
branch of the scientific method until it generates routinely
verifiable knowledge” [2]
[1] Wicherts et al., 2011
[2] Donoho, Stodden, et al. 2009
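The idea of “routinely verifiable knowledge” can be made concrete: if the data and the exact computation ship with the article, any reader can re-run it and compare against the reported figure. A minimal sketch, with hypothetical file contents and a hypothetical published value:

```python
# Minimal sketch of a reproducibility check (the CSV contents and the
# published value below are hypothetical): anyone with the shared data
# can re-run the exact computation behind a reported statistic.
import csv
import io

# Stand-in for a data file deposited alongside the article.
shared_csv = """country,debt_ratio,growth
A,95,1.2
B,102,0.8
C,91,1.6
"""

rows = list(csv.DictReader(io.StringIO(shared_csv)))
mean_growth = sum(float(r["growth"]) for r in rows) / len(rows)

PUBLISHED_VALUE = 1.2  # the figure reported in the (hypothetical) article
assert abs(mean_growth - PUBLISHED_VALUE) < 1e-9, "published result does not reproduce"
print(f"mean growth for high-debt countries: {mean_growth:.2f}%")
```

The check is trivial here, but the design point is that the assertion fails loudly if the shared data and the published claim drift apart, which is exactly the property an article alone cannot offer.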
18. c) Greater role of inductive
methods
• “The end of theory”: “Here’s the evidence, now
what is the hypothesis?”
• All science becomes computational. 38% of
scientists spend more than 1/5 of their time
developing software (Merali, 2010).
• Greater availability of data collection and
datasets increases the utility of inductive
methods. The Genome Project as a new paradigm
19. d) Scaling serendipity
• From penicillin to the theory of relativity, serendipity has
always been a core component of science
• Big linked data, collaborative annotation and knowledge
mining of OA articles make it possible to detect unexpected
correlations on a massive scale. Mendeley manages the
bibliographies of 2 million scientists and uses them to
suggest further reading.
• Emerging evidence that, for scholars, recommendation is
more important than search for finding references. Social
networking and recommendation systems allow scientists
to “stumble upon” new evidence
• Successful open-research solvers solved problems at the
boundary of or outside their fields of expertise [1]
[1] Lakhani et al., 2007
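One common way a reference manager can turn shared bibliographies into “further reading” suggestions is pair co-occurrence: papers that often sit together in the same personal libraries are likely related. A toy sketch of that idea (the libraries and paper names are invented, and this is not Mendeley's actual algorithm):

```python
# Toy co-occurrence recommender over personal reference libraries.
# The libraries and paper names below are invented for illustration.
from collections import Counter
from itertools import combinations

libraries = [
    {"paper_a", "paper_b", "paper_c"},
    {"paper_a", "paper_b", "paper_d"},
    {"paper_b", "paper_c", "paper_e"},
]

# Count how often each pair of papers appears in the same personal library.
co_occurrence = Counter()
for lib in libraries:
    for pair in combinations(sorted(lib), 2):
        co_occurrence[pair] += 1

def recommend(paper, top_n=3):
    """Rank other papers by how often they co-occur with `paper`."""
    scores = Counter()
    for (a, b), count in co_occurrence.items():
        if a == paper:
            scores[b] += count
        elif b == paper:
            scores[a] += count
    return [p for p, _ in scores.most_common(top_n)]

print(recommend("paper_a"))  # paper_b co-occurs twice, so it ranks first
```

At the scale of millions of bibliographies the same counting logic surfaces connections no single reader would stumble upon, which is the “scaling serendipity” argument in miniature.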
20. e) New outputs and players
#beyondthepdf
• Nanopublications, datasets, code
• Integration of data and code with articles
• Reproducible papers and books
21. Emerging policies
• Funders and publishers have high leverage on researchers
• Increasing push towards Open Access from funders
• Journals and funding agencies increasingly require data
submission and data management plans
• From 14 January 2013, NSF grant forms require PIs to list
research “products” rather than “publications”
• Alternative metrics emerge, such as altmetrics and
download statistics
22. Towards research policy 2.0
Features
• Simplified proposals
• Rewarding solutions, not proposals
• Multi-stage
• Open priorities
• Flexible and open-ended (allowing for serendipity)
• Peer-selection, reputation-based (funding not to the proposal but to the person)
• Multidisciplinarity by design
• Flexible IPR
• Short project time
• Accepting failure
• Transparency (open monitoring)
• Based on social network analysis
Examples
• Inducement prizes, e.g. http://www.heritagehealthprize.com
• Seed capital (http://www.ibbt.be/en/istart/our-istart-toolbox/iventure)
• ERC: http://erc.europa.eu
• SBIR: http://www.sbir.gov
• FET Open: http://cordis.europa.eu/fp7/ict/fetopen/home_en.html
• SME: http://cordis.europa.eu/fetch?CALLER=PROGLINK_PARTNERS&ACTION=D&DOC=1&CAT=PROG&QUERY=012e7c324da6:39b1:49a0957c&RCN=862
• IBBT: www.ibbt.be
• Arts Council: http://www.artscouncil.org.uk/funding/grants-arts
• Banca dell’innovazione / Innovation Bank: http://italianvalley.wired.it/news/altri/perche-ci-serve-una-banca-nazionale-dell-