Meeting the Computational Challenges Associated with Human Health
1. Meeting the Computational
Challenges Associated with Human
Health
Philip E. Bourne, PhD
Associate Director for Data Science
National Institutes of Health
SC14 New Orleans
November 20, 2014
2. We have come a long way in just one
researcher’s career
4. We Have Both Been Very Successful
World Climate Report 2011
http://www.cnet.com/news/china-unseats-u-s-in-supercomputer-ranking/
7. .. But There is Much to Do
Number of drugs is:
– Too few
– Too long to get to market
– Not personalized
Rare diseases are ignored
Clinical trials are too limited in the number of patients,
too expensive and not retroactive
Education & training does not match well to current
market needs
Research is not cost effective
– Not easily replicated
– Too slow to disseminate
…..
8. .. And there is much ferment in
research ..
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
9. .. and healthcare systems
http://www.genomicsengland.co.uk/the-100000-genomes-project/
10. But we can see the promise and much
of that promise is driven by the data
revolution
12. An Example of That Promise:
Comorbidity Network for 6.2M Danes
Over 14.9 Years
Jensen et al 2014 Nat Comm 5:4022
13. What is the NIH Doing to Fulfill
That Promise?
14. ADDS Mission
Statement
To foster an open ecosystem that
enables biomedical* research to be
conducted as a digital enterprise that
enhances health, lengthens life and
reduces illness and disability
* Includes biological, biomedical, behavioral, social,
environmental, and clinical studies that relate to understanding
health and disease.
15. Elements of The Ecosystem
Community Policy
Infrastructure
• Sustainability
• Collaboration
• Training
16. Elements of The Ecosystem
Community Policy
Infrastructure
• Sustainability
• Collaboration
• Training
Virtuous
Research
Cycle
17. Policies – Now & Forthcoming
Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
18. Policies - Forthcoming
Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
• Machine readable standard for data citation (done)
• Endorsement of data citation for inclusion in NIH
biosketch, grants, reports, etc.
• Example formats for human readable data citations
• Slowly work into NLM/NCBI workflow
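The slide pairs a machine readable standard with human readable example formats. A minimal sketch of how one record could serve both purposes is given here. The field names, the text layout and the sample values are hypothetical and do not represent the NIH or NLM/NCBI standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DataCitation:
    """Hypothetical data citation record; field names are illustrative only."""
    creators: list
    title: str
    repository: str
    identifier: str
    year: int

    def to_json(self):
        # Machine readable form: a JSON object with sorted keys.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        # Rebuild the record from its machine readable form.
        return cls(**json.loads(text))

    def to_text(self):
        # Human readable form; this layout is an assumption, not a standard.
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.year}). {self.title}. "
                f"{self.repository}. {self.identifier}")


# Placeholder values, not a real dataset.
citation = DataCitation(
    creators=["Doe J", "Roe R"],
    title="Example dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example",
    year=2014,
)
print(citation.to_json())
print(citation.to_text())
```

Keeping a single record and deriving both forms from it means the human readable citation cannot drift from the machine readable one.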
19. Infrastructure - The Commons
[Diagram labels: BD2K Center (x6), DDICC, Software, Standards, Labs (x4)]
20. What is the Commons?
A Conceptual Framework for sharing and
being FAIR:
– Finding
– Accessing
– Integrating
– Reusing
digital research objects with attribution
The Commons is agnostic of computing platform
21. The Commons
Digital Objects
(with UIDs)
Search
(indexed metadata)
Computing
Platform
The Commons
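The two elements named on this slide, digital objects with UIDs and search over indexed metadata, can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions, not a Commons implementation: the class names, the metadata keys and the use of random UUIDs as UIDs are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A research object (dataset, software, ...) carrying a unique ID."""
    name: str
    metadata: dict
    # Assumption: a random UUID stands in for whatever UID scheme is chosen.
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)


class Commons:
    """Toy registry: objects keyed by UID plus an exact-match metadata index."""

    def __init__(self):
        self.objects = {}  # uid -> DigitalObject
        self.index = {}    # (metadata key, lowercased value) -> set of uids

    def register(self, obj):
        """Store the object and index each of its metadata fields."""
        self.objects[obj.uid] = obj
        for key, value in obj.metadata.items():
            self.index.setdefault((key, str(value).lower()), set()).add(obj.uid)
        return obj.uid

    def search(self, **criteria):
        """Return objects whose metadata matches every key=value criterion."""
        if not criteria:
            return []
        hits = None
        for key, value in criteria.items():
            uids = self.index.get((key, str(value).lower()), set())
            hits = uids if hits is None else hits & uids
        return [self.objects[u] for u in sorted(hits)]


commons = Commons()
commons.register(DigitalObject("example-dataset",
                               {"type": "dataset", "topic": "genomics"}))
commons.register(DigitalObject("example-tool",
                               {"type": "software", "topic": "genomics"}))
print([o.name for o in commons.search(type="dataset", topic="genomics")])
```

Nothing in the sketch depends on where the objects are stored or computed on, which matches the statement that the Commons is agnostic of computing platform.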
23. The Commons: Compute Platforms
The Commons
Conceptual Framework
Public Cloud Platforms
– Google, AWS (Amazon), Microsoft (Azure), IBM, other?
Super Computing (HPC) Platforms
– Low access by NIH PIs
– Super Computing 2014: ADDS coordinating meeting with SC centers
– NERSC “Commons Pilot”
Other Platforms?
– In house compute solutions
– Private clouds, HPC
• Pharma
• The Broad
• Bionimbus
25. The Commons:
Research Objects, APIs and Search
The Commons
Conceptual Framework
Public Cloud
Platforms
Research Object IDs under discussion by the community
– BD2K centers, NCI Cloud pilots (Google & AWS supported)
– Large Public Data Sets, MODs
Search
– BD2K Data and Software Discovery Indices
– Google Search functions
Appropriate APIs being developed by the community, e.g.,
GA4GH
Use cases
26. The Commons:
Next Steps
The Commons
Conceptual Framework
Next Steps
– Currently identifying pilot projects
Public Cloud
Platforms
Interested? Speak with Vivien Bonazzi
27. Commons – Simple Implementation
Stack
App Store
APIs
Biomedical Data
Big Data Software
Scalable Hardware
29. Community: Training
Data Science Training Goals
1) Build an OPEN digital framework for data
science training:
NIH Data Science Workforce Development Center
2) Develop short-term training opportunities:
Courses, educational resources, etc.
3) Develop the discipline of biomedical data
science and support cross-training – OPEN
courseware
All goals have a diversity component and mandate
30. What Is Needed? – Some Examples
from Across the ICs
Homogenization of disparate large unstructured
datasets
Deriving structure from unstructured data
Feature mapping and comparison from image data
Visualization and analysis of multi-dimensional
phenotypic datasets
Causal modeling of large scale dynamic networks
and subsequent discovery
Utilize data that are sparsely and irregularly sampled
and noisy
BD2K can offer reference datasets and points of
domain expertise to explore these questions
31. Potential Outcomes
Mobility: improve the outcomes of surgeries in
children with cerebral palsy and gait pathology
Wellness: markers derived from constantly monitored
eHealth/mobile health devices – apply to smoking
cessation, weight loss
Cancer: further personalization of treatment
Mental Health: better identify factors that resist and
promote brain disease e.g., schizophrenia, bipolar
disorder, major depression, attention deficit
hyperactivity disorder (ADHD), obsessive compulsive
disorder (OCD), autism
Addiction: utilizing social media to track and treat
drug use and addiction
33. Associate Director for Data Science
Scientific Data Council, External Advisory Board
Programmatic Theme: Sustainability
– Deliverable: Commons
– Example Features:
• Cloud – Data & Compute
• Search
• Security
• Reproducibility Standards
• App Store
Programmatic Theme: Education
– Deliverable: Training
– Example Features:
• Coordinate
• Hands-on
• Syllabus
• MOOCs
• Community
Programmatic Theme: Innovation
– Deliverable: BD2K
– Example Features:
• Centers
• Training Grants
• Catalogs
• Standards
• Analysis
Programmatic Theme: Process
– Deliverable: Efficiency
– Example Features:
• Data Resource Support
• Metrics
• Best Practices
• Evaluation
• Portfolio Analysis
Programmatic Theme: Collaboration
– Deliverable: Partnerships
– Example Features:
• IC’s
• Researchers
• Federal Agencies
• International Partners
• Computer Scientists
The Biomedical Research Digital Enterprise
16 million hospital inpatient events (24.5% of total), 35 million outpatient clinic events (53.6% of total) and 14 million emergency
department events (21.9% of total