This document summarizes Mike Becich's presentation on building data commons at various institutions. It discusses initiatives by NIH to build data commons through funding of pilots involving model organism databases, human microbiome data, and NCI data commons. It outlines efforts by NLM to build repositories for data discovery through bioCADDIE and provides details on building blocks for data commons at the University of Pittsburgh involving genomic and health data. Finally, it summarizes funding opportunities across NIH to build data commons through archiving longitudinal and child health data.
Towards a Data Commons
1. ACMI 2017 Winter Symposium 1
Mike Becich, MD PhD
Department of Biomedical Informatics
Chair and University Distinguished Professor,
Associate Vice Chancellor for Informatics
Associate Director, U Pitt Cancer Inst, Clin Trans Sci Inst
University of Pittsburgh School of Medicine
NCI Board of Scientific Advisors
Towards a Data Commons
ACMI 2017 Winter Symposium
Duck Key, FL
2. ACMI 2017 Winter Symposium 2
Motivations
• Making Data Sharing Efficient (and Persistent)
• NIH Institutes/Centers (ICs) are funding “Commons”
– Precision Medicine and Data Science programs are drivers
• NLM’s Trans-NIH Biomedical Informatics Coordinating
Committee (BMIC):
• Data Sharing Repositories -
https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
• Common Data Elements (CDE) Resource Portal -
https://www.nlm.nih.gov/cde/index.html
• NCI, NIAID, NICHD and NLM have been most proactive
to date
3. ACMI 2017 Winter Symposium 3
NIH Initiatives
• NIH Data Commons Pilots –
https://datascience.nih.gov/commons
– Model Organism Database
– BD2K Centers Pilots (e.g. Pitt/Harvard)
– Human Microbiome Project
– NCI Data Commons
• Genomic Data Commons (GDC) – U Chicago – TCGA data
• Cloud Pilots - ongoing
• NIAID – BD2K Center for Enhanced Data
Annotation and Retrieval (Musen/Stanford)
4. ACMI 2017 Winter Symposium 4
Big Data To Knowledge (BD2K)
bioCADDIE and DataMed
• UCSD (Ohno-Machado) BD2K Data Discovery
Index Project – bioCADDIE
• DataMed v1.5 available
• Aims to let users search for and discover data
sets in a PubMed-like fashion
• Is this scalable to provide institutional
infrastructure?
6. ACMI 2017 Winter Symposium 6
PCORI CDRN and CTSA ACT start to unlock
Clinical Data from EHRs – Key Drivers
7. ACMI 2017 Winter Symposium 7
Further Fuel is Precision Medicine Initiative –
Adding Biospecimens, Mobile Sensors
8. ACMI 2017 Winter Symposium 8
Key Question – How to Pull It All Together
• Operationally, data sharing is an NIH requirement
• Most institutions (maybe all) don’t really treat data as
the valued asset it is – era of Data Science
• Most health science investigators are struggling due
to lack of access to scalable storage, high performance
computing and open source tool maintenance – the
day of supercomputing is here
• Hence, institutions (and BMI) need to support a “real”
plan for Research Data Management
• At Pitt, Data Commons = Research Data Management
9. ACMI 2017 Winter Symposium 9
Data Commons Infrastructure @ Pitt
Data Infrastructure Component | Awareness | Implementation & Deployment | Adoption | Comments
CRIS/Center for Research Computing | Yellow | Red | Red | Under discussion
DMPT Tool | Yellow | Yellow | Yellow | In progress
Box storage (small scale) sharing | Green | Green | Green | Not the type of “cloud computing” we need for research – simply storage and no HPC, software tools
Storage (large scale) | Red | Red | Red | Turn to PSC, SaM and commercial cloud provider(s) – need scale and flexibility
Data Catalogue | Red | Red | Red |
Metadata schema / ontologies | Red | Red | Red | No institutional data schema in place; disciplinary standards present in some areas
Analysis tools | Green (everyone knows this is needed) | Yellow | Red | Check licensing arrangements
Visualization tools | Yellow | Red | Red |
DOIs | Yellow | Red | Red |
Deposit | Red | Red | Red |
Repository/preservation | Red | Red | Red | Noted as a major gap
Tracking tools | Red | Red | Red |
Training | Yellow | Yellow | Yellow | 4 classes offered by HSLS
Advocacy/guides | Yellow | Yellow | Yellow | In development by ULS, HSLS, CSSD
10. ACMI 2017 Winter Symposium 10
Who’s at the table?
• School of Computing and Information (SCI):
– Department of Computer Science
– School of Information Science
• Dept of Information Culture & Data Stewardship (Liz Lyons - chair)
• Department of Biomedical Informatics
– CRIO for the Health Sciences – Recruiting Op TBN
• CIO & Computing Services and Systems Development
• Center for Research Computing – New Director TBN
• Pittsburgh Supercomputing Center
• Health Sciences & University (Pitt and CMU) Libraries
• Office of Research
11. ACMI 2017 Winter Symposium 11
Building Blocks – Pittsburgh Genome Research
Repository (Rebecca Jacobson – ACMI)
12. ACMI 2017 Winter Symposium 12
Building Blocks – BD2K - Center for Causal
Discovery (Greg Cooper - ACMI)
13. ACMI 2017 Winter Symposium 13
Building Blocks – Pittsburgh Health
Data Alliance (Becich – ACMI)
• Two Centers created:
• Center for Machine
Learning in Healthcare
• Led by Joe Marks in
CMU School of
Computer Science
• Center for Commercial
Applications (CCA)
• Led by Mike Becich
and Don Taylor
• $2M/yr in Early Stage
• $22M in follow on
funding for successful
projects
• Launched in July 2015
15. ACMI 2017 Winter Symposium 15
New National Funding Ops – NCI
• NCI – Cancer Immunology Data Commons
(CIDC) – linked to Cancer Immunologic
Monitoring and Analysis Centers (CIMAC)
• PDX Data Commons – Patient Derived
Xenografts – linked to PDX Trial Research
Network
• NCI Commons Credits for cloud HPC
16. ACMI 2017 Winter Symposium 16
NHLBI Data Commons
• TOPMed – Trans-Omics for Precision Medicine goals:
– Collect and assemble -omics (RNASeq, methylation,
metabolomics, epigenomics, and proteomics) data with
WGS and clinical outcomes data across diverse populations,
including those traditionally underrepresented in research.
– Build a data commons repository that the scientific
community can use for future research and to enable
precision medicine.
– Stimulate systems medicine approaches that help organize
data to ensure they are accessible and interpretable for
health and disease research.
– Promote discoveries about the fundamental mechanisms
that underlie HLBS disorders.
17. ACMI 2017 Winter Symposium 17
NIA Data Commons Efforts
• Archiving and Sharing of Longitudinal Data
Resources on Aging (U24)
– Foster data sharing and wider use of longitudinal
data for research on aging in the behavioral and
social sciences
– Share best practices in data and metadata
documentation, and disseminate information
about useful data sets to the research community
18. ACMI 2017 Winter Symposium 18
NICHD Data Commons Efforts
• Archiving and Documenting Child Health and Human
Development Data Sets
– Support archiving and documenting existing data sets in
order to enable secondary analysis of these data by the
scientific community
– Types of data include survey data, administrative data,
results of assays conducted on biospecimens, data from
clinical trials, and patient registries.
– Also included are archiving activities for data that are to be
added to existing data sets in order to enhance their
potential scientific impact, such as geographic information
systems (GIS), community-level, or registry data.
19. ACMI 2017 Winter Symposium 19
Common Goals of Commons
• “... efficient storage, manipulation, analysis, and
sharing of research output, from all parts of the
research lifecycle...” PE Bourne
– Funding opportunities are being launched across the NIH
– Time to fit your local, regional and national data sharing
and analysis needs
– Need “jumpstart” funding in research computing
infrastructure
– Sustainability possible if Offices of Research ensure “data
sharing” infrastructure is budgeted on each grant
20. ACMI 2017 Winter Symposium 20
Conclusions
Please join in this effort by e-mailing me – becich@pitt.edu
Provide Interest/Skills/Personal Goals – I will send you Pitt’s RoadMap
• Biomedical Informatics and the new home (NLM)
of the Data Science Program are key
• Influencers should assist the new NLM Director in
the four working groups of the NLM Strategic
Planning Process
• Innovations in the development of research
objects, integrative metadata development,
causal analytics and novel research computing
environments (supercomputing/cloud
computing/storage) will be key!