Philip Bourne argues that scholarly communication is broken because the scientific process is too slow. Major issues include long publication times, paywalls restricting access to critical information, and rewards that do not incentivize new forms of scholarship. To accelerate discovery, we need improved data sharing and interoperability, tools for data analysis and annotation, changed reward systems, and the embrace of new modes of scholarly communication enabled by the internet. Institutions have an important role to play in supporting new publishing workflows and in crowd-sourcing solutions that create an open electronic platform for scholarship.
1. Why Is Scholarly Communication Broken and What Can Be Done?
In Celebration of Open Access Week
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
UCSD Libraries, Oct. 18, 2010
2. Disclaimer
• I am a domain (life) scientist, not a computer or information scientist
• I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground
• I am part of the long tail
• I am naïve, but I am the majority
3. Agenda
• Motivation
• What needs to be done?
• A few examples
• The role of the institution
4. The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal
Motivation
http://knol.google.com/k/plos-currents-influenza#
By the time the paper is published, we could all be dead
5. [Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010, including 1RUZ (1918 H1 hemagglutinin) and 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir)]
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
In a time of crisis, fast access to accurate data, and to any knowledge of that data, is paramount
6. If that is not enough…
For some people, the scientific process may be too slow to save their lives
7. Josh Sommer – A Remarkable Young Man
Co-founder and Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
8. Chordoma
• A rare cancer of the bones of the skull base and spine
• No known drugs
• Treatment – surgical resection followed by intense radiation therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
12.
Adapted: http://sagecongress.org/Presentations/Sommer.pdf
“If I have seen further it is only by standing on the shoulders of giants.” (Isaac Newton)
From Josh’s point of view, the climb up just takes too long: more than 15 years and more than $850M, to be more precise
16. Now that we are all, hopefully, motivated, let us break this down into what, in my opinion, actually needs to be done
Here are a few big things…
What Needs to be Done?
17. A Few Things to Accelerate the Rate of Scientific Discovery
• Better communication, data and knowledge access, and new modes of discovery, which means:
– We need data and knowledge about that data to interoperate, i.e., we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
(The list runs from easy to hard.)
18. We Need Data and Knowledge About That Data to Interoperate
The Knowledge and Data Cycle (PLoS Comp. Biol. 2005 1(3) e34)
0. Full text of PLoS papers is stored in a database
1. A link brings up figures from the paper
2. Clicking a figure in the paper retrieves data from the PDB, which are analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
More generally:
1. The user clicks on content
2. Metadata and web services to the data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
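Steps 0 to 4 of the cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BioLit implementation: article text is assumed to be available as plain paragraphs, PDB IDs are spotted with a rough heuristic pattern, and the entry URL template and the `composite_view` helper are hypothetical. The result links each structure to its database entry and back to the paragraphs that mention it.

```python
import re
from collections import defaultdict

# Heuristic pattern for PDB identifiers: a digit 1-9 followed by three
# uppercase alphanumerics, at least one of which is a letter (this skips
# years such as "1918"). Real text mining would need more care than this.
PDB_ID = re.compile(r"\b[1-9](?=[A-Z0-9]{0,2}[A-Z])[A-Z0-9]{3}\b")

# Hypothetical link template for a structure entry page (an assumption).
ENTRY_URL = "https://www.rcsb.org/structure/{}"


def composite_view(paragraphs):
    """Map each PDB ID found in the text to its entry link and to the
    indices of the paragraphs that mention it (links in both directions)."""
    mentions = defaultdict(list)
    for index, text in enumerate(paragraphs):
        # Record each ID at most once per paragraph.
        for pdb_id in sorted(set(PDB_ID.findall(text))):
            mentions[pdb_id].append(index)
    return {
        pdb_id: {"entry": ENTRY_URL.format(pdb_id), "paragraphs": where}
        for pdb_id, where in sorted(mentions.items())
    }


if __name__ == "__main__":
    article = [
        "We compared the 1918 H1 hemagglutinin (1RUZ) with later strains.",
        "Zanamivir binds the 1918 neuraminidase (3B7E).",
        "Both 1RUZ and 3B7E were retrieved during the 2009 outbreak.",
    ]
    for pdb_id, view in composite_view(article).items():
        print(pdb_id, view["entry"], view["paragraphs"])
```

The back-links (paragraph indices) are what let a composite view jump from a structure to the pertinent blocks of literature text, as in step 4.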
19. We Need Data and Knowledge About That Data to Interoperate – What Is Stopping Us?
• Governance – publishers vs. database providers
• Reward
• Metadata standards for provenance, privacy, etc.
• Exemplars
• …
Caveat: each discipline is different; I speak very much from a biomedical sciences perspective
20. Certainly the Argument for Interoperability in the Biomedical Sciences is Strong
• PubMed contains 18,792,257 entries
• ~100,000 papers indexed per month
• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed
• 1078 databases reported in NAR 2008
• MetaBase (http://biodatabase.org) reports 2,651 entries edited 12,587 times
Data as of April 14, 2009
PLoS Comp. Biol. 2005 1(3) e34
22. Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389
24. Semantic Tagging and Widgets Are Powerful Tools for Integrating Data and Knowledge of That Data, but as Yet They Are Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
25. Semantic Tagging of Database Content in the Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
PLoS Comp. Biol. 6(2) e1000673
27. The Publishers are Starting to Do It
From Anita de Waard, Elsevier
28. This is Literature Post-processing; It is Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
29. Word 2007 Add-in for Authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, in OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
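The add-in's approach, recognizing terms as the author writes and embedding machine-readable tags, can be sketched as follows. This is a hypothetical illustration, not the add-in's code: the two-entry `LEXICON` stands in for OBO ontologies and database IDs, matching is exact and case-sensitive, and the `<term ref=...>` elements are plain XML rather than real OOXML custom markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon standing in for OBO ontology terms and database IDs.
# GO:0006915 is the Gene Ontology term "apoptotic process".
LEXICON = {
    "apoptosis": "GO:0006915",
    "1RUZ": "PDB:1RUZ",
}


def tag_paragraph(text, lexicon=LEXICON):
    """Return a <p> element in which recognized terms are wrapped in
    <term ref="..."> elements (element names are illustrative only)."""
    # Try longer terms first so that a short term cannot shadow a longer one.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")

    para = ET.Element("p")
    last = None  # most recent <term>; text after it goes in its .tail
    pos = 0
    for match in pattern.finditer(text):
        gap = text[pos:match.start()]
        if last is None:
            para.text = gap
        else:
            last.tail = gap
        last = ET.SubElement(para, "term", ref=lexicon[match.group(1)])
        last.text = match.group(1)
        pos = match.end()
    # Attach whatever follows the final match (or the whole text if none).
    if last is None:
        para.text = text[pos:]
    else:
        last.tail = text[pos:]
    return para


if __name__ == "__main__":
    tagged = tag_paragraph("Cells entered apoptosis; see 1RUZ.")
    print(ET.tostring(tagged, encoding="unicode"))
```

Because the tags travel inside the manuscript, the metadata are already in place before the article enters the publishing process, which is the point of slide 28.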
30. Challenges
• Authors
– Carrot: if one or more publishers fast-tracked papers that had semantic markup, it might catch on
• Publishers
– Carrot: competitive advantage
32. Reward Systems Need to Change: What Is Needed?
• Author disambiguation
• Auditing (identification and metrics) of all scholarship, which means new tools
• Seniors need to promote alternative forms of scholarship
• Juniors need to respond
Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia, PLoS Comp. Biol. (to appear)
34. What Are These Alternative Forms of Scholarship?
• Research [grants]
• Journal articles
• Conference papers
• Poster sessions
• Reviews
• Blogs
• Community service/data curation
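Auditing all forms of scholarship starts with recording each output, whatever its kind, against one author identifier. A minimal sketch, assuming a hypothetical `Output` record and `audit` function; no existing system or metric is implied:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """One piece of scholarship (hypothetical record layout)."""
    author_id: str  # persistent author identifier (assumed to exist)
    kind: str       # e.g. "journal article", "blog", "data curation"
    title: str


def audit(outputs, author_id):
    """Count one author's outputs by kind, giving every form equal standing."""
    return Counter(o.kind for o in outputs if o.author_id == author_id)


if __name__ == "__main__":
    records = [
        Output("A1", "journal article", "Hypothetical paper"),
        Output("A1", "blog", "Hypothetical post"),
        Output("A1", "data curation", "Hypothetical dataset"),
        Output("B2", "poster session", "Someone else's poster"),
    ]
    print(audit(records, "A1"))
```

The tally only works if every output carries the same author ID, which is why author disambiguation heads the list on slide 32.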
35. Ideally the ID Will Be Tagged to Every Piece of Scholarly Communication
I Am Not a Scientist, I Am a Number. PLoS Comp. Biol. 2008 4(12) e1000247
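The slides do not specify an ID scheme. One that later emerged for exactly this purpose is ORCID, whose IDs are 16 characters written in four hyphen-separated groups and ending in an ISO 7064 MOD 11-2 check character, so that mistyped IDs can be caught. The sketch below checks only that format and checksum; it is not an ORCID API client, and `is_valid_orcid` is a hypothetical helper name.

```python
def mod11_2_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for a string of decimal digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A value of 10 is written as "X".
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Validate an ID written as four hyphen-separated groups of four."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return mod11_2_check_digit(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

0000-0002-1825-0097, the sample ID used in ORCID's documentation, passes; changing its last digit makes it fail.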
37. The Truth About My Laboratory
• I have ?? mail folders!
• The intellectual memory of my laboratory is in those folders
• This is an unhealthy hub-and-spoke mentality
We Need Scientist Management Tools
38. The Truth About My Laboratory
• I generate way more negative than positive data, but where is it?
• Content management is a mess
– Slides, posters…
– Data, lab notebooks…
– Collaborations, journal clubs…
• Software is open, but where is it?
• Farewell is for the data too
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
http://artbyvida.com/portfolio.php
39. Many Great Tools Out There
[Figure: examples of existing tools, including Taverna]
40. Where I See the Problems
• The long tail is confused
• Lack of interoperability between the options
• The reward (publishing) is still removed from the available tools
41. Science is Increasingly a Digital Workflow
[Figure: the scientist's workflow, Idea → Experiment → Data → Conclusions → Publish, divided between the Laboratory and the Publisher]
The Role of the Institution
42. Maybe the Line is Somewhere Else?
[Figure: the same workflow, Idea → Experiment → Data → Conclusions → Publish, with the Laboratory, the Lab Notebook, the Institution and the Publisher as participants and the dividing line redrawn]
43. This Amounts to Publishing Workflows, but That Has Its Problems
• Workflows are not linear
• Workflows and papers do not map 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
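The first two problems (workflows are not linear, and workflows do not map 1:1 onto papers) become concrete when a workflow is modeled as a directed acyclic graph rather than a pipeline. A minimal sketch with a hypothetical workflow, using Python's standard `graphlib` module (Python 3.9 or later); `WORKFLOW` and `provenance` are illustrative names, not part of any publishing system:

```python
from graphlib import TopologicalSorter

# Hypothetical research workflow: each step maps to the steps it depends on.
# Two experiments branch from one idea, so the workflow is not a straight line.
WORKFLOW = {
    "idea": set(),
    "experiment A": {"idea"},
    "experiment B": {"idea"},
    "data A": {"experiment A"},
    "data B": {"experiment B"},
    "conclusion 1": {"data A"},
    "conclusion 2": {"data A", "data B"},
    "paper 1": {"conclusion 1"},
    "paper 2": {"conclusion 2"},
}


def provenance(graph, node):
    """Return every upstream step that a given output depends on."""
    seen, stack = set(), list(graph[node])
    while stack:
        step = stack.pop()
        if step not in seen:
            seen.add(step)
            stack.extend(graph[step])
    return seen


if __name__ == "__main__":
    # One valid order in which the steps could have been carried out.
    print(list(TopologicalSorter(WORKFLOW).static_order()))
    for paper in ("paper 1", "paper 2"):
        print(paper, sorted(provenance(WORKFLOW, paper)))
```

Here `data A` feeds both papers, so neither paper alone captures the workflow, which is one reason publishing workflows is harder than publishing papers.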
44. Solutions to Publishing Workflows?
• New organizations (university as publisher?)
• Appropriate reward system
• Shared governance
– author, institution, publisher
• Crowd sourcing the electronic printing press
45. Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF)
• Funded by DDCF, Microsoft, NCI and Sage Bionetworks
• Aims:
– Define user requirements
– Establish a specification document
– Open source the development effort
– Have a commitment from a publisher to publish a research object using the system
– Act as an exemplar for what can be done
46. Logistics
• UC San Diego
• Jan 19-21, 2011
• Under the auspices of W3C
• FoRC will have a follow-on meeting