The talk discusses how open science and open data could disrupt traditional academic institutions by deinstitutionalizing the rewards and metrics for scientific research. It argues that fully embracing open practices such as data sharing, collaboration, and publishing in open access venues is necessary for a more transparent and reproducible scientific process, but that many academic institutions have yet to adapt their reward systems to incentivize these activities. It outlines steps the scientific community can take to further this transition, such as developing community resources and standards, educating mentors, and recognizing the different roles institutions need to play in supporting open scholarship.
7. It Starts with the Metrics of
Success
[Adapted from Carole Goble]
WikiSym+OpenSym Aug 7, 2013 7
8. Committee on Academic Promotions
• What Counts
– Money
– Grants
– Papers
– Teaching
– Service
• What Does Not
– Sharing data
– Sharing software
– Open access
– Collaboration
– Patents
– Startups
Getting Ahead as a Computational Biologist in Academia. PLOS Comp. Biol.
9. The Era of Open Has the Potential to Deinstitutionalize
Daniel Hulshizer/Associated Press
10. Interim Solution: Use the Traditional Reward System
The Wikipedia Experiment – Topic Pages
• Identify areas of Wikipedia that relate to the journal and are missing or only stubs
• Develop a Wikipedia page in the sandbox
• Have a Topic Page Editor review the page
• Publish the copy of record with associated rewards
• Release the living version into Wikipedia
14. Business Models Rule
• The Internet demanded new business models to support scholarly communication
• Open access was one such sustainable model:
– Began with the community
– Was driven by new organizations (PLOS, BMC, F1000, eLife, Dryad, Mendeley etc.)
– Was NOT driven by academic institutions
– Was driven by policies and funders
15. One Metric of Change: Multidisciplinary Open Access Mega Journal
• This year PLOS ONE will publish over 30,000 papers!
16. This Disruption Got Us Thinking About…
• A paper as only one form of knowledge discovery
• The use of interaction and rich media from which to learn and actually do science
• Reproducibility
• Reward structures
• Better management of the research lifecycle
P.E. Bourne 2005. In the Future will a Biological Database Really be Different from a Biological Journal? PLOS Comp. Biol. 1(3) e34
18. Better Management of the Research Lifecycle is Not a New Concept
19. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research,” 1995
datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware
Morin et al. Shining Light into Black Boxes. Science 13 April 2012: 336(6078) 159-160
Ince et al. The case for open computer programs. Nature 482, 2012
[Carole Goble]
20. The Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
[Diagram of supporting components around the lifecycle: authoring tools, lab notebooks, data capture, software repositories, analysis tools, visualization, scholarly communication, commercial & public tools, Git-like resources by discipline, data journals, discipline-based metadata standards, community portals, institutional repositories, new reward systems, commercial repositories, training]
22. [Diagram, Carole Goble: automate: workflows, pipeline & service integrative frameworks; pool, share & collaborate; web systems; nanopub; semantics & ontologies; machine-readable documentation; scientific software engineering; CS; SE]
23. Why is This Important to Me Personally?
• My wife is being treated for stage 1 breast cancer
• This highlights for me the disparity between what is happening in the lab and what is happening in the clinic
– In the lab, cancer is a personalized and treatable condition
– In the clinic, we are still equally “poisoning” patients with drugs first introduced 10-20 years ago
26. Most Laboratories
• We are the long tail
• Goodbye to the student is goodbye to the data
• Very few of us have complied (or will comply) with the data management plans we write into grants
• Too much software is unusable
S. Veretnik, J.L. Fink, and P.E. Bourne 2008. Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136
27. Today’s Research Lifecycle is Digitally Fragmented at Best
• Proof:
– I can’t immediately reproduce the research in my own laboratory
• It took an estimated 280 hours for an average user to approximately reproduce the paper
– Workflows are maturing and becoming helpful
– Data and software versions and accessibility prevent exact reproducibility
Daniel Garijo et al. 2013. Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE, under review.
28. At the Same Time the Disruption Continues
29. G8 open data charter
http://opensource.com/government/13/7/open-data-charter-g8
30. And the Disruption Continues
• In the US alone:
– March 2012: OSTP commits $200M to Big Data
– OSTP demands sharing plans by August 2013
– GBMF/Sloan provide institutional awards for data science
– NCBI considers a data catalog and MyBibliography
31. Where Will It End? First We Should Ask What It Is We Wish to Accomplish
32. Here is What I Want – The Paper As Experiment
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
[Screenshot panels:]
1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
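The composite view in steps 2 to 4 can be illustrated with a small in-memory sketch. It assumes, hypothetically, that each stored paper already records which PDB entries its figures depict, and it uses a plain dict as a stand-in for the PDB; the classes `Figure` and `Paper`, the function `composite_view`, and the identifiers `1ABC`/`2XYZ` are placeholders for illustration, not part of any real PLoS or PDB interface.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    figure_id: str
    caption: str
    pdb_ids: list[str]  # structures the figure depicts (hypothetical annotation)


@dataclass
class Paper:
    paper_id: str
    paragraphs: list[str]
    figures: dict[str, Figure] = field(default_factory=dict)


def composite_view(paper: Paper, figure_id: str, structures: dict[str, dict]) -> dict:
    """Couple one paper figure with database records and back-links into the text.

    `structures` is an in-memory stand-in for a structure database such as the PDB.
    """
    figure = paper.figures[figure_id]
    view = {"paper": paper.paper_id, "caption": figure.caption, "structures": []}
    for pdb_id in figure.pdb_ids:
        view["structures"].append({
            "pdb_id": pdb_id,
            # database side: whatever the stand-in store holds for this entry
            "record": structures.get(pdb_id, {}),
            # literature side: indices of paragraphs that mention the entry
            "text_links": [i for i, text in enumerate(paper.paragraphs) if pdb_id in text],
        })
    return view


paper = Paper(
    paper_id="example-paper",
    paragraphs=["We solved 1ABC at 2.0 A resolution.", "Its binding site differs from 2XYZ.", "Methods."],
    figures={"F1": Figure("F1", "Structural overlay", ["1ABC", "2XYZ"])},
)
view = composite_view(paper, "F1", {"1ABC": {"resolution": 2.0}})
print(view["structures"])
```

Each structure in the view links both ways, to its database record and to the paragraphs that mention it, which is the journal/database coupling the slide describes; a real system would replace the dict lookup with a web services call.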
33. Here is What I Want – Knowledge Push
• Each evening the lab’s “Evernote” notebooks are scanned for commonalities from the day’s activities. These are seeds in a deep search of the web’s research lifecycles that have become available since the last search. Results are ranked and presented for consideration over coffee the next morning
http://www.discoveryinformaticsinitiative.org/diw2012
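The knowledge push loop (extract seeds, search only what is new, rank) can be sketched as below. This is a minimal illustration under stated assumptions: notebook entries are plain strings, a "commonality" is a term shared by at least two of the day's entries, the "deep search" is replaced by filtering an in-memory corpus by publication date, and ranking is simple seed overlap. The names `terms`, `extract_seeds`, `Item`, `knowledge_push`, and the stopword list are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Hypothetical minimal stopword list; a real system would use a proper term extractor.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "we", "with"}


def terms(text: str) -> set[str]:
    """Distinct lower-case word tokens in `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def extract_seeds(entries: list[str], min_entries: int = 2) -> set[str]:
    """Seeds = terms shared by at least `min_entries` of the day's notebook entries."""
    counts = Counter()
    for entry in entries:
        counts.update(terms(entry))  # each term counted once per entry
    return {term for term, n in counts.items() if n >= min_entries}


@dataclass
class Item:
    title: str
    text: str
    published: datetime


def knowledge_push(entries: list[str], corpus: list[Item], last_searched: datetime,
                   top_k: int = 5) -> list[tuple[int, str]]:
    """Rank items published since the last search by how many seeds they contain."""
    seeds = extract_seeds(entries)
    ranked = []
    for item in corpus:
        if item.published <= last_searched:
            continue  # only material that appeared since the last run
        score = len(seeds & terms(item.title + " " + item.text))
        if score:
            ranked.append((score, item.title))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by title
    return ranked[:top_k]


entries = ["kinase docking run failed", "new kinase docking parameters", "lab meeting notes"]
corpus = [
    Item("Kinase docking benchmark", "we compare docking tools", datetime(2013, 8, 6)),
    Item("Kinase inhibitors review", "clinical overview", datetime(2013, 8, 6)),
    Item("Older docking study", "kinase docking", datetime(2013, 1, 1)),
]
print(knowledge_push(entries, corpus, last_searched=datetime(2013, 8, 5)))
```

Here "kinase" and "docking" recur across the day's entries and become the seeds; the older study is skipped because it predates the last search, and the remaining items are ordered by how many seeds they share.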
34. Will End With …
• Infrastructure:
– Science, Nature, Cell and megajournals all “open access”
– An array of coupled institutional repositories
– A central repository – PubMed Central
– Open software in full support of the research lifecycle
– The research lifecycle in the cloud
35. Will End With …
• Sociologically:
– An end to “build it and they will come”
– Alternative metrics accepted by the community
– Alternative reward systems that recognize the realities of today’s scholarship, namely:
• Open data availability
• Software availability
• Collaborative research
36. We Have a Way to Go
Consider the Life Sciences
• Good News
– We have NCBI/EBI
– Publishers are starting to embrace data
– Workflows in support of the research lifecycle are catching on
• Bad News
– Sustainability remains a noun, not a verb
– Data are organized by type, not by questions asked (silos)
– Tenure committees are still in the dark ages
37. What Can We Do As a Community?
39. What I Have Learned About Trust 1/2
• Trust is like compound interest
• Comes from listening
• Comes from engaging the community in every aspect of the process
• Comes from data consistency and level of annotation
• Comes from responsiveness
• Comes from the quality of the delivery service
40. What I Have Learned About Trust 2/2
• Quality begets trust
– Quality requires data models/ontologies
• Quality requires people
– Annotators are the unsung heroes
• Trust requires provenance & versioning
• Trust requires explaining that all data and knowledge are not created equal
42. Think Globally, Act Locally
• Support emergent community commons/portals
• Be involved in the support and development of metadata standards
• Contribute to workflow development etc. to drive an open research lifecycle
• Educate your mentors on the importance of open science and scholarly communication
• Write software with an App model in mind
43. Understand That All Data/Knowledge Are NOT Created Equal
• We need to understand how data are used
• Sustainability is not about more money from the funding agencies; it’s about business models
• Reductionism is not a dirty word
• We need to do more with the long tail
On the Future of Genomic Data. Science 11 February 2011: vol. 331 no. 6018 728-729
44. Recognize That Institutions Must Play a Greater Role
• We need institutional data/knowledge sharing plans
• We need data/information scientists to be better recognized by institutions – it’s not all about papers – this implies new metrics
45. Learn from the App Store
• The App model
– Think of it as operating on a content base rather than on a mobile device
– Simple and consistent user interface
– Needs to pass some quality control
– Has a reward
• The App+ Model
– Apps interoperate through a generic workflow interface
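One way to read the App and App+ models together is sketched below, under loudly stated assumptions: the "content base" is a shared dict, each app declares which keys it reads and writes, "quality control" means the app must run on a sample and produce its declared outputs before it is admitted, the "reward" is approximated by counting how often each app is used, and the "generic workflow interface" is simply running admitted apps in sequence over the shared content. `App`, `AppStore`, `gc_fraction`, and `classify` are hypothetical names, not an existing platform's API.

```python
from dataclasses import dataclass
from typing import Callable

Content = dict[str, object]  # the shared content base apps operate on


@dataclass
class App:
    name: str
    inputs: tuple[str, ...]   # keys the app reads from the content base
    outputs: tuple[str, ...]  # keys the app promises to add
    run: Callable[[Content], Content]


class AppStore:
    def __init__(self) -> None:
        self.apps: dict[str, App] = {}
        self.credits: dict[str, int] = {}  # stand-in for a reward: uses per app

    def submit(self, app: App, sample: Content) -> bool:
        """Quality control: admit an app only if it runs on a sample and yields its declared outputs."""
        try:
            result = app.run(dict(sample))
        except Exception:
            return False
        if not all(key in result for key in app.outputs):
            return False
        self.apps[app.name] = app
        self.credits[app.name] = 0
        return True

    def workflow(self, names: list[str], content: Content) -> Content:
        """Generic workflow interface: run admitted apps in order over one content dict."""
        current = dict(content)
        for name in names:
            app = self.apps[name]
            missing = [key for key in app.inputs if key not in current]
            if missing:
                raise KeyError(f"{name} is missing inputs {missing}")
            current.update(app.run(current))
            self.credits[name] += 1  # credit the app each time it is used
        return current


def gc_fraction(c: Content) -> Content:
    seq = str(c["sequence"])
    return {"gc": sum(base in "GC" for base in seq) / len(seq)}


def classify(c: Content) -> Content:
    return {"label": "GC-rich" if c["gc"] > 0.5 else "AT-rich"}


store = AppStore()
store.submit(App("gc", ("sequence",), ("gc",), gc_fraction), {"sequence": "ATGC"})
store.submit(App("classify", ("gc",), ("label",), classify), {"gc": 0.6})
rejected = store.submit(App("broken", ("sequence",), ("length",), lambda c: {}), {"sequence": "AT"})
result = store.workflow(["gc", "classify"], {"sequence": "GGGCAT"})
print(result["label"], store.credits, rejected)
```

The design choice here is that apps never call each other directly; they only agree on named keys in the shared content, so any two admitted apps whose inputs and outputs line up can be chained by the same workflow call.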
46. In Summary
• Open science is a means to accelerate the rate of discovery
• Disruption has begun, but there is great inertia in the system
• All of us are stakeholders and capable of driving further positive change
• We need to get institutions and more scientists involved…