1. The Future of Open Science
Philip E. Bourne
http://www.slideshare.net/pebourne/
4/08/14 NIAID Workshop on Open Science 1
2. The future depends on who you
ask
Here is my biased viewpoint
3. My Background/Bias
• RCSB PDB/IEDB Database Developer – Views on
community, quality, sustainability …
• PLOS Journal Co-founder – Open science
advocate
• Associate Vice Chancellor for Innovation –
Business models, interaction with the private
sector, sustainability
• Professor – Mentoring, reward system, value (or
not) of research
• NIH Strategist/Transformer - ??
4. Perhaps the first question to ask is:
What is an endpoint?
5. What is an Endpoint?
6. What Does The Democratization of
Science Imply?
• The obvious – participation by all
• Not so obvious
– More scrutiny
– New types of rewards
– More equal value placed on all participants
– The removal of artificial boundaries that corral
knowledge (through power and resources) within
silos that do not make sense as complexity
increases
7. Consider some personal examples that
illustrate these implications
8. More Scrutiny – Highlights
Lack of Reproducibility
• I can’t immediately reproduce the research
in my own laboratory:
• It took an estimated 280 hours for an average user
to approximately reproduce the paper
• Workflows are maturing and becoming helpful
• Data and software versions and accessibility
prevent exact reproducibility
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology:
The Case of the Tuberculosis Drugome PLOS ONE 8(11) e80278 .
9. Why New Types of
Rewards?
• I have a paper with 16,000 citations that no
one has ever read
• I have papers in PLOS ONE that have more
citations than ones in PNAS
• I have data sets I am proud of but few places to
put them
• I edited a journal but it did not count for much
10. Equal Value Placed
on Participants
• The UC System has Research Scientists (RS) &
Project Scientists (PS) as well as tenured
faculty -
– RS/PS have no senate rights yet:
– RS/PS frequently teach
– RS/PS frequently have more grant money
– RS/PS typically perform more service
– RS/PS are most of the data scientists you know
12. Institutional Boundaries
• Academia – Departments of physics, math,
biology, chemistry etc. persist but scholars
rarely confine themselves to these disciplines
• NIH – 27 institutes and centers, many
dedicated to specific diseases & conditions –
yet a specific gene undoubtedly transcends ICs
13. The Era of Open Has The Potential
to Deinstitutionalize
Daniel Hulshizer/Associated Press
14. An Example of That Potential:
The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
15. The Era of Open Has The Potential
to Deinstitutionalize
Daniel Hulshizer/Associated Press
16. I have argued that the democratization
of science is compelling
and that much has happened around
open literature, open software and
now open data
17. I Would Also Argue That This Process is
About to Accelerate
• Others provide a more
compelling argument:
– Google car
– 3D printers
– Waze
– Robotics
18. From the Second Machine Age
From: The Second Machine Age: Work, Progress, and Prosperity in a
Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
19. So what will this look like for an
institution?
Institutions will become digital enterprises
20. Components of The Academic Digital
Enterprise
• Consists of digital assets
– E.g. datasets, papers, software, lab notes
• Each asset is uniquely identified and has
provenance, including access control
– E.g. publishing simply involves changing the access
control
• Digital assets are interoperable across the
enterprise
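One way to picture such an asset is as a small record. The sketch below is purely illustrative and not taken from the talk: the class DigitalAsset, its fields, the Access levels, and the publish helper are all hypothetical names, and a real enterprise system would be far richer. It shows an asset with a unique identifier, a provenance trail, and an access level, where publishing amounts to nothing more than a change of access control.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class Access(Enum):
    # Hypothetical access levels, chosen only for this illustration.
    PRIVATE = "private"          # visible to the owner only
    INSTITUTION = "institution"  # visible inside the enterprise
    PUBLIC = "public"            # openly published


@dataclass
class DigitalAsset:
    kind: str    # e.g. "dataset", "paper", "software", "lab notes"
    owner: str
    access: Access = Access.PRIVATE
    # Each asset gets its own unique identifier on creation.
    asset_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Provenance is kept here as a simple ordered list of events.
    provenance: list = field(default_factory=list)

    def record(self, event: str) -> None:
        self.provenance.append(event)


def publish(asset: DigitalAsset) -> None:
    """Publishing only flips the access level and logs the change."""
    asset.record(f"access changed: {asset.access.value} -> {Access.PUBLIC.value}")
    asset.access = Access.PUBLIC


notes = DigitalAsset(kind="lab notes", owner="jane")
notes.record("created by jane")
publish(notes)
print(notes.access.value, notes.provenance)
```

Keeping the access level as a field on the asset, rather than moving the asset somewhere else, is what lets publishing be a one-line state change in this sketch.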
21. Life in the Academic Digital Enterprise
• Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors,
whose research profiles are on-line and well described, are automatically notified of Jane’s
potential based on a computer analysis of her scores against the background interests of the
neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research
rotation. During the rotation she enters details of her experiments related to understanding a
widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line
research space – an institutional resource where stakeholders provide metadata, including access
rights and provenance beyond that available in a commercial offering. According to Jane’s
preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a
graduate student in the chemistry department whose notebook reveals he is working on using
bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a
number of times in their notes, which is of interest to two very different disciplines – neurology and
environmental sciences. In the analog academic health center they would never have discovered
each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage.
The collaboration results in the discovery of a homologous human gene product as a putative target
in treating the neurodegenerative disorder. A new chemical entity is developed and patented.
Accordingly, by automatically matching details of the innovation with biotech companies worldwide
that might have potential interest, a licensee is found. The licensee hires Jack to continue working
on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the
license. The research continues and leads to a federal grant award. The students are employed,
further research is supported and in time societal benefit arises from the technology.
From What Big Data Means to Me JAMIA 2014 21:194
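The matching step in this scenario can be sketched in a few lines. Everything here is an assumption made for illustration: the notebook contents, the placeholder gene symbols, the function name shared_gene_matches, and the min_mentions threshold are hypothetical, and the source does not describe how such a system would actually be built. The sketch counts gene mentions per notebook and pairs researchers who both mention the same gene repeatedly.

```python
from collections import Counter
from itertools import combinations


def shared_gene_matches(notebooks, min_mentions=2):
    """Pair notebook owners who each mention the same gene at least min_mentions times.

    notebooks maps an owner name to a list of gene symbols mentioned in
    that owner's notes (assumed to have been extracted already).
    """
    counts = {owner: Counter(genes) for owner, genes in notebooks.items()}
    matches = []
    # Compare every pair of owners once, in a stable alphabetical order.
    for a, b in combinations(sorted(counts), 2):
        common = sorted(
            g for g in counts[a]
            if counts[a][g] >= min_mentions and counts[b][g] >= min_mentions
        )
        if common:
            matches.append((a, b, common))
    return matches


# Made-up example data; "GENE1" etc. are placeholders, not real gene symbols.
notebooks = {
    "jane": ["GENE1", "GENE1", "GENE2", "GENE1"],
    "jack": ["GENE1", "GENE3", "GENE1"],
    "joe": ["GENE2"],
}
print(shared_gene_matches(notebooks))
```

In practice the hard part would be extracting and normalizing the gene mentions and respecting each owner's access preferences, both of which this sketch leaves out.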
22. Life in the NIH Digital Enterprise
• Researcher x is made aware of researcher y through commonalities
in their data located in the data commons. Researcher x reviews the
grants profile of researcher y and publication history and impact
from those grants in the past 5 years and decides to contact her. A
fruitful collaboration ensues and they generate papers, data sets
and software. Metrics automatically pushed to company z for all
relevant NIH data and software in a specific domain with utilization
above a threshold indicate that their data and software are heavily
utilized and respected by the community. An open source version
remains, but the company adds services on top of the software for
the novice user and revenue flows back to the labs of researchers x
and y which is used to develop new innovative software for open
distribution. Researchers x and y come to the NIH training center
periodically to provide hands-on advice in the use of their new
version and their course is offered as a MOOC.
23. To get to that end point we have to
consider the complete digital research
lifecycle
24. The Digital Research Life Cycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
25. Tools and Resources Will Be Better
Coordinated
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software
• Analysis Tools
• Visualization
• Scholarly Communication
26. Through Interconnection Around a
Common Framework
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software
• Analysis Tools
• Visualization
• Scholarly Communication
27. New/Extended Support Structures Will
Emerge
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools:
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software
• Analysis Tools
• Visualization
• Scholarly Communication
Support structures:
• Commercial & Public Tools
• Git-like Resources By Discipline
• Data Journals
• Discipline-Based Metadata Standards
• Community Portals
• Institutional Repositories
• New Reward Systems
• Commercial Repositories
• Training
28. We Have a Ways to Go
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools:
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software
• Analysis Tools
• Visualization
• Scholarly Communication
Support structures:
• Commercial & Public Tools
• Git-like Resources By Discipline
• Data Journals
• Discipline-Based Metadata Standards
• Community Portals
• Institutional Repositories
• New Reward Systems
• Commercial Repositories
• Training
29. But Let's Not Forget NIH has
Contributed a Lot
• NLM/NCBI
• Individual IC support
• Open access policies – PubMed Central
• Emergent data sharing plans
• Big Data to Knowledge (BD2K)
• Office of the Associate Director for Data
Science
• ... and more to come
30. Call Out to Eric Green, and the Team…
bd2k.nih.gov
31. Interesting Observations So Far
• We need to start by
asking, how are we using
the data now?
• We have the why for data
sharing, but not the how
• Training is spotty
• Existing data resources
need attention
• Sometimes it is enough
for me to sit down
32. Office of Data Science
The Biomedical Research Digital Enterprise
Programmatic themes: Sustainability, Education, Innovation, Process
Deliverables: Data Commons, Training Center, BD2K, Review
Example features:
• Cloud – Data & Compute
• Search
• Security
• Reproducibility Standards
• App Store
• Hands-on
• MOOCs
• Community Engagement
• Data Science Centers
• Training Grants
• DDI
• Analysis
• Domain Support
• Data Resource Support
• Metrics
• Best Practices
• Evaluation
• Portfolio Analysis
Communication / Collaboration:
• To ICs
• To Researchers
• To Federal Agencies
• To International Partners
• To Computer Scientists
Scientific Data Council, External Advisory Board
04/03/14
33. One Possible End Point
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
From the user's side:
1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
34. Open Science Will:
• Lead to the democratization of science
• Change how institutions think and operate – they
will become digital enterprises
• Impact all aspects of the scholarly research
lifecycle
• Accelerate seek{ing} fundamental knowledge
about the nature and behavior of living systems
and the application of that knowledge to enhance
health, lengthen life, and reduce illness and
disability