NIH Data Commons
NIH Data Storage Summit
October 20, 2017
Vivien Bonazzi Ph.D.
Senior Advisor for Data Science (NIH/OD)
Project Leader for the NIH Data Commons
What’s driving the need for a
Data Commons?
Challenges with the current state of data
✓ Generating large volumes of biomedical data
✓ Cheap to generate, costly to store on local servers
✓ Multiple copies of the same data in different locations
✓ Building data resources that cannot be easily found by others
✓ Data resources are not connected to each other and cannot share data or tools
✓ No standards and guidelines on how to share and access data
Convergence of factors
• Increasing recognition of the need to support data sharing
• Availability of digital technologies and infrastructures that support Data at scale
✓ Cloud: data storage, compute and sharing
✓ FAIR – Findable, Accessible, Interoperable, Reusable
• Understanding that data is a valuable resource that needs to be sustained
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
Findable
Accessible
Interoperable
Reusable
DATA has VALUE
DATA is CENTRAL to the Digital Economy
a signal of the coming Digital Economy
Scientific digital assets
Data
Software
Workflows
Documentation
Journal Articles
Organizations will be defined by their digital assets
The most successful organizations of the
future will be those that can
leverage their digital assets and transform
them into a digital enterprise
Data Commons
Enabling data driven science
Enable investigators to leverage all possible data and
tools in the effort to accelerate biomedical discoveries,
therapies and cures
by
driving the development of data infrastructure and data
science capabilities through collaborative research and
robust engineering
Developing a Data Commons
• Treats products of research – data, methods, tools, papers etc. as digital objects
✓ For this presentation: Data = Digital Objects
• These digital objects exist in a shared virtual space
✓ Find, Deposit, Manage, Share, and Reuse data, software, metadata and workflows
• Digital object compliance through FAIR principles:
✓ Findable
✓ Accessible (and usable)
✓ Interoperable
✓ Reusable
The Data Commons
is a platform
that allows transactions to occur
on FAIR data at scale
The Data Commons Platform
Compute Platform: Cloud
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
FAIR
App store/User Interface/Portal
PaaS
SaaS
IaaS
Other Data Commons
Data Commons Engagement
US Government Agencies & EU groups
Interoperability with other Commons
• Common goals – democratizing, collaborating & sharing data
• Reuse of currently available open source tools which support interoperability
✓ GA4GH, UCSC, GDC, NYGC
✓ May 2017 BioIT Commons Session
• Shared open standard APIs for data access and computing
• Ability to deploy and compute across multiple cloud environments
✓ Docker containers – Dockstore/Docker registry
• Workflow management, sharing and deployment
• Discoverability (indexing) of objects across cloud commons
• Global Unique Identifiers
• Common user authentication system
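As an illustration of the indexing and identifier items listed above, here is a minimal Python sketch. It is not the Commons implementation. The class names, record fields, cloud names, and bucket URLs are all hypothetical; the only assumption taken from the slides is that one identifier can point to copies of an object held in more than one cloud.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A data set, tool, or workflow (hypothetical record layout)."""
    name: str
    checksum: str
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    locations: dict = field(default_factory=dict)  # cloud name -> URL


class ObjectIndex:
    """In-memory stand-in for a shared indexing service."""

    def __init__(self):
        self._by_guid = {}

    def register(self, obj):
        """Store the object and return its identifier."""
        self._by_guid[obj.guid] = obj
        return obj.guid

    def add_location(self, guid, cloud, url):
        """Record that a copy of the object lives in the given cloud."""
        self._by_guid[guid].locations[cloud] = url

    def resolve(self, guid, preferred_cloud=None):
        """Return a URL for the object, preferring the caller's cloud."""
        locations = self._by_guid[guid].locations
        if not locations:
            raise LookupError(f"no copies registered for {guid}")
        if preferred_cloud in locations:
            return locations[preferred_cloud]
        # Otherwise fall back to the first cloud in alphabetical order.
        return locations[sorted(locations)[0]]

    def search(self, term):
        """Find identifiers whose object name contains the term."""
        term = term.lower()
        return [g for g, o in self._by_guid.items() if term in o.name.lower()]


# Example with made-up names and URLs.
index = ObjectIndex()
payload = b"example file contents"
obj = DigitalObject(name="Example expression matrix",
                    checksum=hashlib.sha256(payload).hexdigest())
guid = index.register(obj)
index.add_location(guid, "cloud-a", "s3://example-bucket/expression.tsv")
index.add_location(guid, "cloud-b", "gs://example-bucket/expression.tsv")
```

Calling `index.resolve(guid, preferred_cloud="cloud-b")` returns the copy in the caller's own cloud, so an analysis can run next to the data instead of moving it.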
The Good News
• Considerable agreement about the general approaches to be taken
• Many people are already addressing many of the problems:
✓ Data architectures/platforms
✓ Automated/semi-automated data access/authentication protocols
✓ Common metadata standards and templates
✓ Open tools and software
✓ Instantiation and initial metrics of Findability, Accessibility, Interoperability, and Reusability
✓ Relationships/agreements with Cloud Service Providers that leverage their interest in hosting NIH data
✓ Moving data to the cloud and operating in a cloud environment
The Challenges
• A need to “Bring it all Together” – Community endorsement of:
✓ Metadata standards/tools/approaches
✓ Crosswalks between equivalent terms/ontologies
✓ Robust, shared approaches to data access/authentication
✓ Best practices that will enable existing data to become FAIR and will guide generation of future datasets
• Rapidly evolving field makes approaches/tools/etc subject to change – approaches need to be adaptable
• Effort is required to adapt data to community standards and move data to the cloud
✓ How much does that cost and how long does it take?
• Lack of interoperability between cloud providers
The Challenges
• Making data FAIR comes with a cost
✓ How much does it actually cost?
✓ How can we minimize the cost?
✓ How do we determine whether any one set of data warrants the expense?
• What is the value added to the data by making it FAIR?
✓ What new science can be achieved?
✓ How can new derived data or new computational approaches be added to the dataset to enrich it?
• What are the limitations of FAIRness from dataset to dataset?
Development of an
NIH Data Commons Pilot
NIH Data Commons Pilot
allows access, use and sharing
of large, high value NIH data
in the cloud
NIH Data Commons Pilot
NIH Data Commons Structure
Cloud
Services: APIs, Containers, GUIDs, Indexing, Search, Auth
ACCESS
Scientific analysis tools/workflows
Data
“Reference” Data Sets
TOPMed, GTEx, MODs
FAIR
App store/User Interface/Portal/Workspace
PaaS
SaaS
IaaS
Operationalizing
the NIH Data Commons Pilot
NIH Data Commons Pilot : Implementation
Storage, NIH Marketplace, Metrics and Costs
Leveraging and extending relationships established as part of BD2K
to provide access to cloud storage and compute
Supplements: TOPMed, GTEx, MODs groups
Prepare (and move) data sets to the cloud for storage, access and
scientific use
Work collaboratively with the OT awardees to build towards data access
Data Commons OT Solicitation: Other Transaction
ROA: Research Opportunity Announcement
Developing the fundamental FAIR computational components to
support access, use and sharing of the 3 data sets above
NIH Data Commons Pilot Consortium
• Establishing a new NIH Marketplace
✓ Access to a sustainable cloud infrastructure for data science at NIH
✓ Over the next 18 months, NIH will establish its own NIH Cloud Marketplace
✓ Give Data Commons Pilot Consortium awardees the ability to acquire cloud storage and compute services
✓ Enable ICs to easily acquire cloud storage and compute services from commercial cloud providers, resellers, and integrators
✓ Building on existing relationships with CSPs
✓ Led by CIT with input from a Multi-IC working group
Storage, NIH Marketplace, Metrics and Costs
• Assessment and Evaluation
✓ What are the costs associated with cloud storage and usage?
✓ What are the business best practices?
✓ How should costs be paid?
✓ Who should pay them?
✓ How should highly used data be managed vs less used data?
✓ Are data producers supportive of this model?
✓ Are users (of all experience levels) able to access and use data effectively?
✓ How will we know if the Data Commons Pilot is successful?
✓ How to adjust to changing needs?
Storage, NIH Marketplace, Metrics and Costs
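One way to start on the question of highly used versus less used data is a back-of-the-envelope comparison of storage tiers. The Python sketch below is only illustrative: the tier names and every price are made-up placeholders, not quotes from any provider, and the cost model (a storage charge plus a per-GB retrieval charge) is an assumed simplification.

```python
# All prices are hypothetical placeholders: dollars per GB per month for
# storage, dollars per GB read for retrieval.
TIERS = {
    "frequent-access": {"storage": 0.020, "retrieval": 0.000},
    "infrequent-access": {"storage": 0.005, "retrieval": 0.030},
}


def monthly_cost(size_gb, read_gb, tier):
    """Storage charge plus retrieval charge for one month."""
    return size_gb * tier["storage"] + read_gb * tier["retrieval"]


def cheapest_tier(size_gb, read_gb, tiers=TIERS):
    """Return the name of the lowest-cost tier and the cost of every tier."""
    costs = {name: monthly_cost(size_gb, read_gb, t) for name, t in tiers.items()}
    return min(costs, key=costs.get), costs


# A 1,000 GB data set read twice over each month, and one rarely read.
busy_choice, busy_costs = cheapest_tier(size_gb=1000, read_gb=2000)
quiet_choice, quiet_costs = cheapest_tier(size_gb=1000, read_gb=100)
```

With these placeholder prices the heavily read data set is cheaper in the frequent-access tier and the rarely read one in the infrequent-access tier. Real figures would have to come from measured usage and actual provider pricing.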
Supplements to 3 Test Data Set Groups
• Administrative Supplements to TOPMed, GTEx and MODs
• PIs for each data set were requested to review the OT (ROA) and determine appropriate ways to interact
• Prepare (and move) data sets to the cloud for storage, access and scientific use
• Make community workflows and cloud based tools of popular analysis pipelines from the 3 datasets accessible
• Facilitate discovery and interpretation of the association of human and model organism genotypes and phenotypes
NIH Data Commons: OT ROA
• Key Capabilities – modular components
✓ Development of Community Supported FAIR Guidelines and Metrics
✓ Global Unique Identifiers (GUID) for FAIR biomedical data
✓ Open Standard APIs (interoperability & connectivity)
✓ Cloud Agnostic Architecture and Frameworks
✓ Cloud User Workspaces
✓ Research Ethics, Privacy, and Security (AUTH)
✓ Indexing and Search
✓ Scientific Use cases
✓ Training, Outreach, Coordination
• Stage 1: 180 day window
✓ Develop MVPs (Minimum Viable Products)
✓ Demonstrations of the Data Commons and its components
• Have one copy of each test data set in each cloud provider
✓ Understanding of the process required to achieve this
• Draft version of a single standard access control system
✓ Be able to access and use the data through the access control system
• Able to use a variety of analysis tools and pipelines on the 3 data sets in the cloud (driven by scientific use cases)
• Have a rudimentary ability to query across test data sets
✓ Display phenotype, expression and variant data aligned with a specific gene or genomic location
✓ Display model organism orthologs for a given set of human genes
• Draft FAIR guidelines and metrics
• Understand how each of the computational components that support the ability to access data fit together and what standards are needed
• Written plans of how and why these demonstrations should be extended into a full Pilot
NIH Data Commons Pilot: Outcomes
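The cross-data-set demonstrations listed for Stage 1 can be pictured with a small Python sketch. It is not the Pilot's design: the record layouts, gene symbols, and values are synthetic placeholders, and plain in-memory lists stand in for the three test data sets.

```python
# Synthetic records; none of these values come from TOPMed, GTEx, or the MODs.
VARIANTS = [
    {"gene": "GENE_A", "position": "chr1:100", "change": "A>G"},
    {"gene": "GENE_B", "position": "chr2:250", "change": "C>T"},
]
EXPRESSION = [
    {"gene": "GENE_A", "tissue": "liver", "level": 12.5},
    {"gene": "GENE_A", "tissue": "lung", "level": 3.1},
]
PHENOTYPES = [
    {"gene": "GENE_A", "organism": "mouse", "phenotype": "example phenotype"},
]
ORTHOLOGS = {
    "GENE_A": {"mouse": "Gene_a", "zebrafish": "gene_a"},
}


def records_for(gene, table):
    """Select the rows of one table that mention the gene."""
    return [row for row in table if row["gene"] == gene]


def gene_view(gene):
    """Align variant, expression, and phenotype records on one gene."""
    return {
        "gene": gene,
        "variants": records_for(gene, VARIANTS),
        "expression": records_for(gene, EXPRESSION),
        "phenotypes": records_for(gene, PHENOTYPES),
    }


def orthologs_for(genes):
    """Map each human gene to its listed model organism orthologs."""
    return {gene: ORTHOLOGS.get(gene, {}) for gene in genes}


view = gene_view("GENE_A")
orthologs = orthologs_for(["GENE_A", "GENE_B"])
```

The point of the sketch is the join key: once every data set exposes records by a shared gene identifier, a rudimentary query across them reduces to a lookup per source. Genes with no listed ortholog simply return an empty mapping.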
• Stage 2: 4 year period
✓ To extend and fully implement the Data Commons Pilot based on the design strategies and capabilities developed as part of Stage 1
• Review of MVP/demonstrations and written plans from Stage 1
• Goals and Milestones with clear and specific outcomes
• Evaluate, negotiate, and revise terms of existing awards
• Award additional OTs
NIH Data Commons Pilot: Outcomes
Acknowledgments
DPCPSI: Jim Anderson, Betsy Wilder, Vivien Bonazzi, Marie Nierras, Rachel Britt,
Sonyka Ngosso, Lora Kutkat, Kristi Faulk, Jen Lewis, Kate Nicholson,
Chris Darby, Tonya Scott
NHLBI: Gary Gibbons, Alastair Thomson, Teresa Marquette, Jeff Snyder,
Melissa Garcia, Maarten Lerkes, Ann Gawalt, Cashell Jaquish,
George Papanicolaou
NHGRI: Eric Green, Valentina di Francesco, Ajay Pillai, Simona Volpi, Ken Wiley
NIAID: Nick Weber
CIT: Andrea Norris
NLM: Patti Brennan
NCBI: Steve Sherry
Stay in Touch
QR Business Card
LinkedIn
@Vivien.Bonazzi
Slideshare
Blog
(Coming soon!)

NIH Data Summit - The NIH Data Commons

  • 1. NIH Data Commons NIH Data Storage Summit October 20, 2017 Vivien Bonazzi Ph.D. Senior Advisor for Data Science (NIH/OD) Project Leader for the NIH Data Commons
  • 2. What’s driving the need for a Data Commons?
  • 3. Challenges with the current state of data  Generating large volumes of biomedical data  Cheap to generate, costly to store on local servers  Multiple copies of the same data in different locations  Building data resources that cannot be easily found by others  Data resources are not connected to each other and cannot share data or tools  No standards and guidelines on how to share and access data
  • 4. Convergence of factors  Increasing recognition of the need to support data sharing  Availability of digital technologies and infrastructures that support Data at scale  Cloud: data storage, compute and sharing  FAIR – Findable Accessible Interoperable Reproducible  Understanding that data is a valuable resource that needs to be sustained
  • 5. https://gds.nih.gov/ Went into effect January 25, 2015 NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci- policies/genomic-data Requires public sharing of genomic data sets
  • 6.
  • 7.
  • 8.
  • 10. DATA has VALUE DATA is CENTRAL to the Digital Economy a signal of the coming Digital Economy
  • 11. Scientific digital assets Data Software Workflows Documentation Journal Articles Organizations will be defined by their digital assets
  • 12. The most successful organizations of the future will be those that can leverage their digital assets and transform them into a digital enterprise
  • 13. Data Commons Enabling data driven science Enable investigators to leverage all possible data and tools in the effort to accelerate biomedical discoveries, therapies and cures by driving the development of data infrastructure and data science capabilities through collaborative research and robust engineering
  • 14. Developing a Data Commons  Treats products of research – data, methods, tools, papers etc. as digital objects  For this presentation: Data = Digital Objects  These digital objects exist in a shared virtual space  Find, Deposit, Manage, Share, and Reuse data, software, metadata and workflows  Digital object compliance through FAIR principles:  Findable  Accessible (and usable)  Interoperable  Reusable
  • 15. The Data Commons is a platform that allows transactions to occur on FAIR data at scale
  • 16. The Data Commons Platform Compute Platform: Cloud Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data FAIR App store/User Interface/Portal PaaS SaaS IaaS
  • 18. Data Commons Engagement US Government Agencies & EU groups
  • 19. Interoperability with other Commons’  Common goals – democratizing, collaborating & sharing data  Reuse of currently available open source tools which support interoperability  GA4GH, UCSC, GDC, NYGC  May 2017 BioIT Commons Session  Shared open standard APIs for data access and computing  Ability to deploy and compute across multiple cloud environments  Docker containers – Dockerstore/Docker registry  Workflows management, sharing and deployment  Discoverability (indexing) objects across cloud commons  Global Unique identifiers  Common user authentication system
  • 20. The Good News  Considerable agreement about the general approaches to be taken  Many people are already addressing many of the problems:  Data architectures/platforms  Automated/semi-automated data access/authentication protocols  Common metadata standards and templates  Open tools and software  Instantiation and initial metrics of Findability, Accessibility, Interoperability, and Reusability  Relationships/agreements with Cloud Service Providers that leverage their interest in hosting NIH data  Moving data to the cloud and operating in a cloud environment
  • 21. The Challenges  A need to “Bring it all Together” – Community endorsement of:  Metadata standards/tools/approaches  Crosswalks between equivalent terms/ontologies  Robust, shared approaches to data access/authentication  Best practices that will enable existing data to become FAIR and will guide generation of future datasets  Rapidly evolving field makes approaches/tools/etc subject to change – approaches need to be adaptable  Effort is required to adapt data to community standards and move data to the cloud  How much does that cost and how long does it take?  Lack of interoperability between cloud providers
  • 22. The Challenges  Making data FAIR comes with a cost  How much does it actually cost?  How can we minimize the cost?  How do we determine whether any one set of data warrants the expense?  What is the value added to the data by making it FAIR?  What new science can be achieved?  How can new derived data or new computational approaches be added to the dataset to enrich it?  What are the limitations of FAIRness from dataset to dataset?
  • 23. Development of a NIH Data Commons Pilot
  • 24. NIH Data Commons Pilot allows access, use and sharing of large, high value NIH data in the cloud
  • 26. NIH Data Commons Structure 26 Cloud Services: APIs, Containers, GUIDs, Indexing, Search, Auth ACCESS Scientific analysis tools/workflows Data “Reference” Data Sets TOPMed, GTEx, MODs FAIR App store/User Interface/Portal/Workspace PaaS SaaS IaaS
  • 28. NIH Data Commons Pilot : Implementation Storage, NIH Marketplace, Metrics and Costs Leveraging and extending relationships established as part of BD2K to provide access cloud to storage and compute Supplements: TOPMed, GTEx, MODs groups Prepare (and move) data sets to the cloud for storage, access and scientific use Work collaboratively with the OT awardees to build towards data access Data Commons OT Solicitation: Other Transaction ROA: Research Opportunity Announcement Developing the fundamental FAIR computational components to support access, use and sharing of the 3 data sets above
  • 29. NIH Data Commons Pilot Consortium
  • 30.  Establishing a new NIH Marketplace  access to a sustainable cloud infrastructure for data science at NIH  Over the next 18 months, NIH will establish its own NIH Cloud Marketplace  Data Commons Pilot Consortium awardees ability to acquire cloud storage and compute services  Enable ICs to easily acquire cloud storage and storage services from commercial cloud providers, resellers, and integrators  Building on existing relationship with CSPs  Led by CIT with input from Multi-IC working group Storage, NIH Marketplace, Metrics and Costs
  • 31. Storage, NIH Marketplace, Metrics and Costs
      - Assessment and Evaluation
      - What are the costs associated with cloud storage and usage?
      - What are the business best practices?
      - How should costs be paid?
      - Who should pay them?
      - How should highly used data be managed vs less used data?
      - Are data producers supportive of this model?
      - Are users (of all experience levels) able to access and use data effectively?
      - How will we know if the Data Commons Pilot is successful?
      - How to adjust to changing needs?
  • 32. Supplements to 3 Test Data Set Groups
      - Administrative Supplements to TOPMed, GTEx and MODs
      - PIs for each data set were requested to review the OT (ROA) and determine appropriate ways to interact
      - Prepare (and move) data sets to the cloud for storage, access and scientific use
      - Make community workflows and cloud-based tools of popular analysis pipelines from the 3 data sets accessible
      - Facilitate discovery and interpretation of the association of human and model organism genotypes and phenotypes
  • 33. NIH Data Commons: OT ROA
      - Key Capabilities (modular components)
      - Development of Community Supported FAIR Guidelines and Metrics
      - Global Unique Identifiers (GUID) for FAIR biomedical data
      - Open Standard APIs (interoperability & connectivity)
      - Cloud Agnostic Architecture and Frameworks
      - Cloud User Workspaces
      - Research Ethics, Privacy, and Security (AUTH)
      - Indexing and Search
      - Scientific Use cases
      - Training, Outreach, Coordination
  • 34. NIH Data Commons Pilot: Outcomes
      - Stage 1: 180 day window
      - Develop MVPs (Minimum Viable Products)
      - Demonstrations of the Data Commons and its components
      - Have one copy of each test data set in each cloud provider
      - Understanding of the process required to achieve this
      - Draft version of a single standard access control system
      - Be able to access and use the data through the access control system
      - Able to use a variety of analysis tools and pipelines on the 3 data sets in the cloud (driven by scientific use cases)
      - Have a rudimentary ability to query across test data sets
      - Display phenotype, expression and variant data aligned with a specific gene or genomic location
      - Display model organism orthologs for a given set of human genes
      - Draft FAIR guidelines and metrics
      - Understand how each of the computational components that support the ability to access data fit together and what standards are needed
      - Written plans of how and why these demonstrations should be extended into a full Pilot
  • 35. NIH Data Commons Pilot: Outcomes
      - Stage 2: 4 year period
      - To extend and fully implement the Data Commons Pilot based on the design strategies and capabilities developed as part of Stage 1
      - Review of MVP/demonstrations and written plans from Stage 1
      - Goals and Milestones with clear and specific outcomes
      - Evaluate, negotiate, and revise terms of existing awards
      - Award additional OTs
  • 36. Acknowledgments
      - DPCPSI: Jim Anderson, Betsy Wilder, Vivien Bonazzi, Marie Nierras, Rachel Britt, Sonyka Ngosso, Lora Kutkat, Kristi Faulk, Jen Lewis, Kate Nicholson, Chris Darby, Tonya Scott
      - NHLBI: Gary Gibbons, Alastair Thomson, Teresa Marquette, Jeff Snyder, Melissa Garcia, Maarten Lerkes, Ann Gawalt, Cashell Jaquish, George Papanicolaou
      - NHGRI: Eric Green, Valentina di Francesco, Ajay Pillai, Simona Volpi, Ken Wiley
      - NIAID: Nick Weber
      - CIT: Andrea Norris
      - NLM: Patti Brennan
      - NCBI: Steve Sherry
  • 37. Stay in Touch QR Business Card LinkedIn @Vivien.Bonazzi Slideshare Blog (Coming soon!)

Editor's notes

  1. Current snapshot of Commons status
  2. Development of FAIR-ness Metrics
  3. The Data Commons is a federated way to provide access to, and sharing of, large, high value NIH data. The purpose of a cloud-based Data Commons is to make large data sets accessible and usable by the broader community. Having one copy of a large data set on the cloud means it is accessible by many researchers, and they don't need to copy the data set from NCBI (or other repositories) to the cloud every time they want to use it. One copy of a large data set on the cloud, accessed multiple times by many researchers who are only paying for the ability to compute on that data, is more cost and time effective than moving the same large data set to the cloud multiple times. A cloud-based Data Commons becomes much more powerful when (community-based) standardized methods and systems are adopted. These standards apply to the way the data and tools interact with each other and with the computing environment they sit within (i.e. the cloud), and to how data and tools are made accessible to the user. Standards specifically relate to the FAIR guidelines, APIs to access data, workflows and tools, and Docker containers for deployment of tools to the cloud. Standards are what enable a federated Commons. Standards create the basic ground rules and common language for interactions in the system.
  4. The Data Commons Framework describes the ecosystem that the OT solicitation is building towards. Each of the key capabilities described in the OT has a major role in the development of the ecosystem.
  5. Governance of the Commons can be found on slide XX
  6. The purpose of this slide is to give a sense that providing access to the data requires a series of modular, reusable components. I won't describe each KC (key capability), but I want to give the audience a sense that there are modular components that fit together to permit access.
  7. Multi-IC Working Group co-chairs for the Data Commons Pilot: Gary Gibbons, Eric Green, Patti Brennan, Jim Anderson, Andrea Norris
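
The cost argument in note 3 (one shared cloud copy versus every researcher moving the data set themselves) can be illustrated with a small calculation. This is only a sketch under stated assumptions: the function names, the cost model, and every price and size below are hypothetical placeholders, not NIH figures or any provider's actual rates. Compute costs are left out because researchers pay them in both scenarios.

```python
def per_researcher_copies(size_tb, researchers, months,
                          transfer_per_tb, storage_per_tb_month):
    """Cost when every researcher moves the data set to the cloud
    and keeps a private copy for the whole period (hypothetical model)."""
    transfer = size_tb * transfer_per_tb
    storage = size_tb * storage_per_tb_month * months
    return researchers * (transfer + storage)


def shared_copy(size_tb, months, transfer_per_tb, storage_per_tb_month):
    """Cost when the data set is moved once and a single copy is
    shared by all researchers (hypothetical model)."""
    transfer = size_tb * transfer_per_tb
    storage = size_tb * storage_per_tb_month * months
    return transfer + storage


if __name__ == "__main__":
    # All values below are made-up illustration inputs.
    params = dict(size_tb=100, months=12,
                  transfer_per_tb=10.0, storage_per_tb_month=20.0)
    many = per_researcher_copies(researchers=50, **params)
    one = shared_copy(**params)
    print(f"private copies: {many:,.0f}")
    print(f"shared copy:    {one:,.0f}")
    print(f"ratio:          {many / one:.0f}x")
```

Under this simple model the transfer and storage cost of private copies grows linearly with the number of researchers, while the shared copy stays constant. How the shared cost is paid, and by whom, is exactly what slide 31 leaves as an open question.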