Philip Bourne outlines his vision of transforming biomedical research into a digital enterprise by making data and other digital assets more open, interoperable, and accessible across boundaries, for example through the NIH's Big Data to Knowledge (BD2K) initiative. Better connecting scientists and their work would help address problems such as the slow pace of discovery and the non-reproducibility of research.
Towards Biomedical Research as a Digital Enterprise
1. Towards Biomedical Research as a Digital Enterprise
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
2/14/14
2014 ACMI Winter Symposium
2. My Background/Bias
• Limited Biomedical Informatics Experience – IAIMS,
Pharmacy Informatics
• RCSB PDB/IEDB Database Developer – Views on
community, quality, sustainability …
• PLOS Journal Co-founder – Open Science Advocate
• Associate Vice Chancellor for Innovation – Business
models, interaction with the private sector,
sustainability
• Professor – Mentoring, reward system, value (or not) of
research
3. Why Am I Here?
• In two weeks I will take on the NIH role of Associate Director for Data Science (ADDS):
– NIH Data Science Point Person
– Reports to NIH Director
– Lead the BD2K initiative
– Trans-NIH responsibilities for data
Eric Green, Acting
[Modified slide from Eric Green]
4. Disclaimer
• These comments are currently being made as
an employee of the University of California
system and reflect my own opinions.
5. I want to engage with this community to:
• Understand the most pressing problems
• Begin a dialog
• Inform you of what I am currently thinking
• Inform you of NIH initiatives that are underway or planned
• Have you change my thinking appropriately
6. The NIH Process Thus Far
An external advisory group provided a
valuable blueprint for what should be
done
acd.od.nih.gov/diwg.htm
7. Blueprint Recommendations
• Promote central and federated catalogs
– Establish minimal metadata framework
– Tools to facilitate data sharing
– Elaborate on existing data sharing policies
• Support methods and applications
– Fund all phases of software development
– Leverage lessons from National Centers
• Training
– More funding
– Enhance review of training apps
– Quantitative component to all awards
• On campus IT strategic plan
– Catalog of existing tools
– Informatics laboratory
– Ditto big data
• Sustainable funding commitment
acd.od.nih.gov/diwg.htm
8. Special Considerations for Phenotypic
Data Relevant to ACMI
• Definition: From cellular to human; sensitive
or non-sensitive
• Need:
– Provide transparency regarding current policies
– Develop a common language for appropriate data
access
– Establish the appropriate forum for setting policies
9. Some of the Phenotypic Data Issues
• Data Governance
– Needs a balance of technology and policy
solutions
• Data Sharing
– Query with or without data release
• Data Characterization
– Local vs standard nomenclature and associated
mapping
Aligns well with:
Hripcsak et al. J. Am. Med. Inform. Assoc. 2014 21:204-211
10. Let Me Outline, Then, in General Terms Where I See My Effort Being Spent Going Forward
http://pebourne.wordpress.com/2013/12/
11. ADDS Initial Thrusts
• How data are currently being used
• Lightweight metadata standards
• Data & software registries
• Expanded policies on data sharing, open source software
• Training programs & reward systems
• Institutional incentives
• Private sector incentives
• Data centers serving community needs
13. We Need to Start By Asking How Are
We Using the Data Now!
Only Then Can We Make Rational
Decisions About Data – Large or Small
14. How Data Are Used
[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]
• 3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
• 1RUZ: 1918 H1 Hemagglutinin
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Andreas Prlic]
15. We Need to Learn from Industries
Whose Livelihood Addresses the
Question of Use
16. ADDS Initial Thrusts – More Detail
• Now:
– Data centers (under review)
– Data science training grants (call out)
– Pilot data catalog consortium (call out)
– Genomic Data Sharing Policy (being finalized)
– Piloting “NIH-drive”
• What Is Planned:
– Extended public-private programs specifically for data science activities
– Interagency activities
– International exchange programs
– Cold Spring Harbor-like training facilities – bi-coastal?
– Programs for better data descriptions
– Reward institutions/communities
– Policies to get clinical trial data into the public domain
18. Pilot NIH-Drive
• Investigator A from the NCI makes frequent reference to the overexpression of genes x and y.
• Investigator B from the NHLBI makes frequent reference to the underexpression of genes x and y.
• Automatic notification of a potential common interest before publication or database deposition
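The matching step described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how NIH-drive works: the investigator labels, the gene names geneX/geneY/geneZ, the note text, and the "at least two mentions" threshold are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def gene_mentions(notes, genes_of_interest):
    """Count how often each gene of interest appears in free-text notes."""
    tokens = re.findall(r"[a-z0-9]+", notes.lower())
    counts = Counter(tokens)
    return {g: counts[g.lower()] for g in genes_of_interest if counts[g.lower()] > 0}


def common_interests(notes_by_investigator, genes_of_interest, min_mentions=2):
    """Return (investigator, investigator, shared genes) for every pair of
    investigators who both mention the same gene at least min_mentions times."""
    frequent = {
        who: {g for g, n in gene_mentions(text, genes_of_interest).items()
              if n >= min_mentions}
        for who, text in notes_by_investigator.items()
    }
    matches = []
    for a, b in combinations(sorted(frequent), 2):
        shared = frequent[a] & frequent[b]
        if shared:
            matches.append((a, b, sorted(shared)))
    return matches


# Hypothetical notes; all names and text are invented for illustration.
notes = {
    "investigator_a_nci": "Over expression of geneX and geneY. geneX again elevated; geneY elevated.",
    "investigator_b_nhlbi": "Under expression of geneX, geneY. Low geneX and low geneY in all samples.",
    "investigator_c": "Unrelated work on geneZ.",
}
matches = common_interests(notes, ["geneX", "geneY", "geneZ"])
for a, b, genes in matches:
    print(f"Notify {a} and {b}: shared interest in {', '.join(genes)}")
```

A real system would need gene-name normalization and respect for each investigator's sharing preferences; the sketch only shows the pairing logic.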
19. Let Me Bring Us Back to a More Far
Reaching View Embodied in the Title
of This Talk:
Towards Biomedical Research as a
Digital Enterprise
20. First Consider What We Do (or Wish
We Could Do) Every Day:
We take actions on digital data
increasingly across boundaries
21. Actions on Biomedical Data Imply:
• Ensuring data quality and hence trust
• Making data sustainable
• Making data open and accessible
• Making data findable
• Providing suitable metadata and annotation
• Making data queryable
• Making data analyzable
• Presenting data so as to maximize its value
• Rewarding good data practices
23. Boundaries on Biomedical Data Imply:
• Working across biological scales
• Working across biomedical disciplines
• Working across basic and clinical research and
practice
• Working across institutional boundaries
• Working across public and private sectors
• Working across national and international
borders
• Working across funding agencies
25. These Issues Have Been Around Almost As Long As Biomedical Informatics
The Good News is That “Big Data” Has Brought More Attention to the Problem
26. What Are Big Data?
• Large datasets from high throughput
experiments
• Large numbers of small datasets
• Data which are “ill-formed”
• The why (causality) is replaced by the what
• A signal that a fundamental change is taking
place – a tipping point?
27. That Change is Embodied in
The Digital Enterprise
• Consists of digital assets
– E.g. datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
– E.g. publishing simply involves changing the access control
• Digital assets are interoperable across the enterprise
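One way to picture a digital asset as described here is the minimal Python sketch below. It is an assumption-laden illustration rather than any NIH or BD2K design: the class name DigitalAsset, its fields, and the users "jane" and "jack" are hypothetical. It shows an asset with a unique identifier, a provenance log, and access control, where publishing changes nothing but the access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class DigitalAsset:
    """Hypothetical model of one asset in a digital enterprise."""
    kind: str    # e.g. "dataset", "paper", "software", "lab notes"
    title: str
    owner: str
    asset_id: str = field(default_factory=lambda: uuid4().hex)  # unique identifier
    readers: set = field(default_factory=set)                   # access control list
    public: bool = False
    provenance: list = field(default_factory=list)              # append-only event log

    def __post_init__(self):
        self.readers.add(self.owner)
        self._log("created", self.owner)

    def _log(self, event, actor):
        # Each provenance entry records when, what, and who.
        self.provenance.append((datetime.now(timezone.utc).isoformat(), event, actor))

    def can_read(self, user):
        return self.public or user in self.readers

    def publish(self, actor):
        """Publishing only changes access control; the content is untouched."""
        if actor != self.owner:
            raise PermissionError("only the owner may publish")
        self.public = True
        self._log("published", actor)


# Hypothetical usage: a lab notebook becomes readable by others once published.
notebook = DigitalAsset("lab notes", "Rotation notebook", owner="jane")
before = notebook.can_read("jack")
notebook.publish("jane")
after = notebook.can_read("jack")
print(before, after, [event for _, event, _ in notebook.provenance])
```

Keeping provenance as an append-only list means the publication event itself becomes part of the asset's history, which is the point of treating access changes as first-class actions.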
28. The Enterprise Is Almost Anything: Your Lab, Your Institution, the NIH…
29. Consider an Academic Institution As A
Digital Enterprise
• Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors,
whose research profiles are on-line and well described, are automatically notified of Jane’s
potential based on a computer analysis of her scores against the background interests of the
neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research
rotation. During the rotation she enters details of her experiments related to understanding a
widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line
research space – an institutional resource where stakeholders provide metadata, including access
rights and provenance beyond that available in a commercial offering. According to Jane’s
preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a
graduate student in the chemistry department whose notebook reveals he is working on using
bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a
number of times in their notes, which is of interest to two very different disciplines – neurology and
environmental sciences. In the analog academic health center they would never have discovered
each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage.
The collaboration results in the discovery of a homologous human gene product as a putative target
in treating the neurodegenerative disorder. A new chemical entity is developed and patented.
Accordingly, by automatically matching details of the innovation with biotech companies worldwide
that might have potential interest, a licensee is found. The licensee hires Jack to continue working
on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the
license. The research continues and leads to a federal grant award. The students are employed,
further research is supported and in time societal benefit arises from the technology.
From: What Big Data Means to Me. JAMIA 2014 21:194
30. The NIH is Starting to Think About the
Digital Enterprise, Witness…
bd2k.nih.gov
31. What Will Define the NIH Digital
Enterprise?
• NCBI/NLM
• Trans-NIH collaboration – a culture change
• Long-term NIH strategic planning
• The BD2K Initiative
• A “hub” of data science activities
• International cooperation
• Interagency cooperation
• Data sharing policies
• External forces…
32. External Forces: Science Will Continue
to Become More Open
• The public (and hence the politicians) demand it
• It’s the right thing to do
• It’s part of the modern psyche
• The scholarly enterprise is broken and more stakeholders are acknowledging it
33. Result: Discovery is Too Slow
[Josh Sommer]
http://sagecongress.org/Presentations/Sommer.pdf
35. Personal Evidence for a
Broken System
• I have a paper with 16,000 citations that no
one has ever read
• I have papers in PLOS ONE that have more
citations than ones in PNAS
• I have data sets I am proud of but no place to
put them
• I “can’t” reproduce work from my own lab…
36. Personal Evidence for a
Broken System
• I can’t immediately reproduce the research in my own laboratory:
– It took an estimated 280 hours for an average user to approximately reproduce the paper
– Workflows are maturing and becoming helpful
– Data and software versions and accessibility prevent exact reproducibility
Daniel Garijo et al. 2013. Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE 8(11): e80278.
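The point about data and software versions can be made concrete with a small Python sketch. It is not the method used in the cited study; it is a hedged illustration of one common practice: recording a manifest of interpreter version, package versions, and input checksums alongside each analysis run so that later drift is detectable. The function names, the file name targets.csv, and its contents are hypothetical.

```python
import hashlib
import platform
import sys
from importlib import metadata


def sha256_of(data: bytes) -> str:
    """Checksum used to detect any change in an input."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(inputs, packages):
    """Record interpreter, package versions, and input checksums for one run.

    inputs maps a name to the raw bytes of that input; packages is a list of
    distribution names whose installed versions should be recorded.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "inputs": {name: sha256_of(blob) for name, blob in inputs.items()},
    }


def changed_inputs(old, new):
    """Names of inputs whose checksum differs between two manifests."""
    return sorted(n for n in old["inputs"] if old["inputs"][n] != new["inputs"].get(n))


# Hypothetical example: the same file name, but the contents have drifted.
original = build_manifest({"targets.csv": b"geneX,1\n"}, ["pip"])
rerun = build_manifest({"targets.csv": b"geneX,2\n"}, ["pip"])
drift = changed_inputs(original, rerun)
print("Inputs that changed since the original run:", drift)
```

A manifest like this does not make a result reproducible by itself, but it turns "something changed" into a specific list of what changed.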
37. Politicians Demand It:
G8 open data charter
http://opensource.com/government/13/7/open-data-charter-g8
40. An Example of That External Force:
The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
43. There Still Needs to be a Reward System
The Wikipedia Experiment – Topic Pages
• Identify areas of Wikipedia that relate to the journal that are missing or stubs
• Develop a Wikipedia page in the sandbox
• Have a Topic Page Editor review the page
• Publish the copy of record with associated rewards
• Release the living version into Wikipedia
44. One Possible End Product of Open Science
[Figure: workflow linking journal content and the PDB]
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
[Figure: the same workflow as the user sees it]
1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
45. If This Vision of a Digital Enterprise Comes to Pass Based Upon:
• More open science
• Deinstitutionalization
• New modes of scholarly communication
• Changing rewards for scholarship
What Will Biomedical Research Look Like?
46. The Research Life Cycle will Persist
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
47. Tools and Resources Will Continue To Be Developed
[Diagram: tools placed along the research life cycle]
• Authoring Tools
• Lab Notebooks
• Data Capture
• Analysis Tools
• Software
• Scholarly Communication
• Visualization
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
48. Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework
[Diagram: the same tools, now linked along the research life cycle]
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software
• Analysis Tools
• Scholarly Communication
• Visualization
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
49. New/Extended Support Structures Will Emerge
[Diagram: tools along the research life cycle, with support structures beneath]
Tools: Authoring Tools, Data Capture, Lab Notebooks, Analysis Tools, Scholarly Communication, Software, Visualization
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Support structures:
• Commercial & Public Tools
• Discipline-Based Metadata Standards
• Community Portals
• Git-like Resources By Discipline
• Data Journals
• New Reward Systems
• Training
• Institutional Repositories
• Commercial Repositories
50. Change in the Way we Support the Research Lifecycle
[Diagram: tools along the research life cycle, with support structures beneath]
Tools: Authoring Tools, Data Capture, Lab Notebooks, Software, Analysis Tools, Scholarly Communication, Visualization
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Support structures:
• Commercial & Public Tools
• Discipline-Based Metadata Standards
• Community Portals
• Git-like Resources By Discipline
• Data Journals
• New Reward Systems
• Training
• Institutional Repositories
• Commercial Repositories
51. Conclusion:
Biomedical Research Will Increasingly
Become a Digital Enterprise in the Way
I Have Described
Agree/Disagree?
If Agree Where Should Resources be Put?
If Disagree What is Your Vision?
52. Provocative Questions Perhaps?
• Do BMIs see openness in the same way as computational biologists; if not, why not?
• Is there indeed perturbation in what it means
to be a research scholar and if so is that
disruption as prevalent in clinical research as
basic research?
• What would you do in my shoes?