Globus Genomics:
Democratizing NGS Analysis
Ravi K Madduri, University of Chicago and Argonne National Laboratory
madduri@uchicago.edu
@madduri

globus.org/genomics
RIP Fred Sanger
Frederick Sanger (13 August 1918 – 19 November 2013) was a British biochemist who won the Nobel Prize in Chemistry twice, at the time the only person to have done so. In 1958 he was awarded the Nobel Prize in Chemistry "for his work on the structure of proteins, especially that of insulin". In 1980, Walter Gilbert and Sanger shared half of the chemistry prize "for their contributions concerning the determination of base sequences in nucleic acids". He was the fourth person to have been awarded two Nobel Prizes, either individually or in tandem with others.

Our vision for a 21st-century discovery infrastructure
Provide more capability for more people at lower cost by delivering “Science as a service”
www.globus.org
Globus Genomics
[Architecture diagram: Globus Genomics on Amazon EC2, covering data management and data analysis]
Data Management
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints: public data, sequencing centers, research labs, local clusters/clouds, and Galaxy data libraries and storage
Data Analysis
• Galaxy-based workflow management system, with Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools
• Analytical tools (e.g., Picard, alignment, GATK, and variant calling on Fastq input against a reference genome) are automatically run on the scalable compute resources when possible
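The analysis path in the diagram (Fastq reads plus a reference genome flowing through Picard, alignment, GATK, and variant calling) can be read as a dependency graph that a workflow engine schedules. As a minimal sketch, with hypothetical step names and an assumed ordering in which alignment precedes Picard post-processing (for example duplicate marking), Python's standard-library `graphlib` (Python 3.9+) groups the steps into stages whose members have no mutual dependencies and could run concurrently:

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the set of steps it depends on.
PIPELINE = {
    "fastq_input": set(),
    "reference_genome": set(),
    "alignment": {"fastq_input", "reference_genome"},
    "picard_mark_duplicates": {"alignment"},
    "gatk_recalibration": {"picard_mark_duplicates", "reference_genome"},
    "variant_calling": {"gatk_recalibration", "reference_genome"},
}


def execution_stages(graph):
    """Group steps into stages; steps in one stage could be dispatched together."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished so dependents become ready
    return stages


for i, stage in enumerate(execution_stages(PIPELINE), start=1):
    print(f"stage {i}: {', '.join(stage)}")
# stage 1: fastq_input, reference_genome
# stage 2: alignment
# stage 3: picard_mark_duplicates
# stage 4: gatk_recalibration
# stage 5: variant_calling
```

In this linear pipeline only the two inputs share a stage; running the same graph once per sample would produce independent branches that a scheduler could spread across compute nodes.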
Core Capabilities
• Computational profiles for various analysis tools to provide optimal performance
• Resources can be provisioned on demand with Amazon Web Services (AWS) cloud-based infrastructure
• High-performance, reliable data movement is streamlined with integrated Globus file-transfer functionality
• Integrated Globus endpoints and campus login
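One way to read "computational profiles" is as a per-tool record of resource needs that is matched against available machine types when resources are provisioned on demand. The sketch below assumes that interpretation; the instance names, sizes, tool requirements, and hourly costs are all hypothetical placeholders, not AWS offerings or actual Globus Genomics settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str           # hypothetical label, not a real AWS instance name
    vcpus: int
    memory_gb: float
    hourly_cost: float  # placeholder rate in arbitrary units


@dataclass(frozen=True)
class ToolProfile:
    tool: str
    min_vcpus: int
    min_memory_gb: float


# Hypothetical catalog and per-tool profiles, for illustration only.
CATALOG = [
    InstanceType("small", 2, 8.0, 1.0),
    InstanceType("medium", 8, 32.0, 4.0),
    InstanceType("large", 32, 244.0, 16.0),
]

PROFILES = {
    "bwa": ToolProfile("bwa", 8, 16.0),
    "gatk": ToolProfile("gatk", 4, 64.0),
    "fastqc": ToolProfile("fastqc", 1, 2.0),
}


def choose_instance(profile, catalog):
    """Return the cheapest instance type that meets the tool's CPU and memory needs."""
    candidates = [
        it for it in catalog
        if it.vcpus >= profile.min_vcpus and it.memory_gb >= profile.min_memory_gb
    ]
    if not candidates:
        raise ValueError(f"no instance type fits the profile for {profile.tool}")
    return min(candidates, key=lambda it: it.hourly_cost)


for tool, profile in PROFILES.items():
    print(tool, "->", choose_instance(profile, CATALOG).name)
# bwa -> medium
# gatk -> large
# fastqc -> small
```

Choosing the cheapest fitting type is only one possible policy; a real provisioner might also weigh spot availability or data locality.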
• Pricing includes:
– Estimated compute
– Storage (one month)
– Globus Genomics platform usage
– Support
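The four pricing components above can be combined into a simple estimate. This is a sketch only: every rate is a caller-supplied placeholder, and the `estimate_cost` function and its parameters are hypothetical rather than part of any Globus Genomics pricing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    compute: float
    storage: float
    platform: float
    support: float

    @property
    def total(self) -> float:
        return round(self.compute + self.storage + self.platform + self.support, 2)


def estimate_cost(compute_hours, compute_rate_per_hour,
                  storage_gb, storage_rate_per_gb_month,
                  platform_fee, support_fee, storage_months=1):
    """Combine compute, storage (one month by default), platform usage, and support.

    All rates are placeholders supplied by the caller; none are actual prices.
    """
    if min(compute_hours, storage_gb, storage_months) < 0:
        raise ValueError("usage quantities must be non-negative")
    return CostEstimate(
        compute=round(compute_hours * compute_rate_per_hour, 2),
        storage=round(storage_gb * storage_rate_per_gb_month * storage_months, 2),
        platform=round(platform_fee, 2),
        support=round(support_fee, 2),
    )


# Made-up inputs, used only to exercise the arithmetic.
estimate = estimate_cost(compute_hours=100, compute_rate_per_hour=2.0,
                         storage_gb=500, storage_rate_per_gb_month=0.03,
                         platform_fee=100.0, support_fee=50.0)
print(estimate, estimate.total)
```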
Scalability

Security and Privacy
Globus Genomics compliance with the NCBI Database of Genotypes and Phenotypes (dbGaP) security best practices
Protecting the Security of Controlled Data on Servers
• All Globus Genomics servers are protected by Amazon security groups and by stateful packet inspection firewalls. Only necessary services are allowed.
• All relevant security patches are applied as soon as they are available.
• Globus Genomics and Globus provide sharing solutions that are secure and user-controlled.
• Globus Genomics uses the HTTPS and GridFTP protocols with authentication and encryption when transferring files.
• Data access is strictly restricted to individual users, and only users themselves can share their data with other users. We provide detailed instructions to our users on data security and access control.
• The data sharing and access policies on Globus Genomics are retained across all the systems involved.
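The access rule described above (data is private to its owner, and only the owner can extend access) can be modeled in a few lines. This is a toy sketch with a hypothetical `Dataset` class; it is not the Globus sharing API, which enforces these policies in its own service.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy model: data is private to its owner, who alone can grant or revoke access."""
    name: str
    owner: str
    readers: set = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.readers

    def share_with(self, requester: str, other_user: str) -> None:
        # Readers cannot re-share; only the owner grants access.
        if requester != self.owner:
            raise PermissionError(f"{requester} cannot share {self.name}")
        self.readers.add(other_user)

    def revoke(self, requester: str, other_user: str) -> None:
        if requester != self.owner:
            raise PermissionError(f"{requester} cannot change access to {self.name}")
        self.readers.discard(other_user)


ds = Dataset("sample1.bam", owner="alice")
ds.share_with("alice", "bob")
print(ds.can_read("bob"))  # True
try:
    ds.share_with("bob", "carol")  # a reader tries to re-share
except PermissionError as err:
    print("denied:", err)
```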
Accessibility
• Unified web interface for obtaining genomic data and applying computational tools to analyze it
• Easily integrate your own tools and scripts for analysis (CLI-based tools)
• Collection of tools (Tools Panel) that reflects good practices and community insights
• Access every step of analysis and intermediate results: view, download, visualize, reuse
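Galaxy integrates command-line tools through XML tool definitions whose command section is a template filled in with dataset paths and parameter values. The Python sketch below only mirrors that idea: the hypothetical `build_command` and `run_tool` helpers fill a command template, shell-quote each value, and run the result as an argument list without a shell. It is not Galaxy's API.

```python
import shlex
import subprocess
import sys


def build_command(template: str, **params) -> list:
    """Fill a template such as 'mytool --in {input} --out {output}'.

    Each value is shell-quoted before substitution, so paths with spaces survive
    the split back into an argument list.
    """
    quoted = {key: shlex.quote(str(value)) for key, value in params.items()}
    return shlex.split(template.format(**quoted))


def run_tool(template: str, **params) -> str:
    """Run the filled-in command and return its stdout; raise on a non-zero exit."""
    cmd = build_command(template, **params)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Usage: wrap a trivial "tool" (the current Python interpreter) as a stand-in
# for a user-supplied CLI program.
print(run_tool("{python} -c {code}", python=sys.executable, code="print('hello')"))
```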
Reproducibility and Reuse
• Track provenance and ensure repeatability of each analysis step:
– Input datasets, tools used, parameter values, and output datasets
• Annotate each step or collection of steps to track and reproduce results
• Intuitive Workflow Editor to create or modify complex workflows and use them as templates (reusable and reproducible)
• Publish and share metadata, histories, and workflows at multiple levels
• Store public and generated datasets as Data Libraries, e.g., the hg19 reference genome
• Shared datasets and workflows can be imported by other users for reuse
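The provenance items listed above (input datasets, tools used, parameter values, output datasets) are enough to fingerprint an analysis step. As a minimal sketch, assuming content checksums stand in for dataset identity, the hypothetical `StepRecord` below hashes a canonical JSON form of the record, so an identical rerun yields the same fingerprint and any change to the tool, version, parameters, or inputs yields a different one. Galaxy keeps this kind of information in its histories; this code is an independent illustration, not Galaxy's data model.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class StepRecord:
    tool: str
    tool_version: str
    parameters: dict
    input_digests: dict   # dataset name -> SHA-256 of its contents
    output_digests: dict  # dataset name -> SHA-256 of its contents

    def fingerprint(self) -> str:
        """Stable hash of the whole record; sort_keys makes key order irrelevant."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return sha256_of(canonical.encode("utf-8"))


reads = b"@read1\nACGT\n+\nIIII\n"  # toy FASTQ content
step = StepRecord("hypothetical_aligner", "1.0", {"threads": 8},
                  {"reads.fastq": sha256_of(reads)}, {})
rerun = StepRecord("hypothetical_aligner", "1.0", {"threads": 8},
                   {"reads.fastq": sha256_of(reads)}, {})
changed = StepRecord("hypothetical_aligner", "1.0", {"threads": 16},
                     {"reads.fastq": sha256_of(reads)}, {})

print(step.fingerprint() == rerun.fingerprint())    # True
print(step.fingerprint() == changed.fingerprint())  # False
```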
Collaboration
• Users from different institutions can come together and meet in the middle
• Jointly create and share analytical pipelines
• Securely share data
• Verify results

Collaboration
Globus Genomics facilitates meta-analysis across sequence datasets.
Large-scale meta-analysis has been hugely important in driving the success of GWAS. With GWAS, investigators could simply share summary results without losing much, but for sequencing, we do much better when we jointly call the samples and reanalyze. There are only a few places that can do this at scale now, and creating resources that allow groups to come together spontaneously to do this is hugely important. This is a really important opportunity for the scientific community to have a very distributed approach to large-scale analysis from the control perspective, while still being centralized and cost-effective from the hardware and software end. It is straightforward for groups to come together to do meta-analysis over many large sequence datasets in which data can be secured so that raw data are not directly shared (but rather each group maintains control over access to their raw data) but variant calls can be made over all data with a shared pipeline that can then be used to conduct analysis over all of the new variant calls. Power to the people!!
-- Nancy Cox, PhD
University of Chicago
Standard Pipelines: Whole Genome and Exome

Standard Pipeline: RNA-Seq

Standard Pipeline: ChIP-Seq

Sustainability
• Our goal is to build a service that lives beyond a funded proposal
• Two pricing options and multiple usage tiers
– Targeted users include individual research groups and bioinformatics cores
– Platform pricing (includes only a subscription to the Globus Genomics platform)
– Bundled pricing (includes the Globus Genomics platform subscription and AWS usage costs)
Recent Results

1 Computation Institute, University of Chicago, Chicago, IL, USA. 2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA. 3 Section of Genetic Medicine, University of Chicago, Chicago, IL.

Challenges in Next-Gen Sequencing Analysis

Parallel Workflows on Globus Genomics

High-Performance, Reusable Consensus Calling Pipeline
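The consensus calling pipeline is not described in detail here. As a generic illustration, and not necessarily the authors' method, one common way to build a consensus is majority voting: keep a variant only if at least a chosen number of independent callers report it. The caller names and calls below are hypothetical toy data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Iterable[Variant]],
                    min_support: int) -> List[Variant]:
    """Return variants reported by at least `min_support` distinct callers, sorted."""
    support: Counter = Counter()
    for calls in calls_by_caller.values():
        # set() ensures each caller votes at most once per variant.
        support.update(set(calls))
    return sorted(v for v, n in support.items() if n >= min_support)


# Hypothetical caller names and toy calls for illustration.
CALLS = {
    "caller_a": [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")],
    "caller_b": [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")],
    "caller_c": [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")],
}

print(consensus_calls(CALLS, min_support=2))
# [('chr1', 100, 'A', 'G'), ('chr1', 200, 'C', 'T')]
```

Because each caller's run is independent, the callers can execute in parallel before this merge step. In practice, calls from different tools usually need normalization (for example, left-aligning indels) before exact matching like this is meaningful.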
Coming Soon!
• Integration with Globus Catalog Service
– Better metadata management
– End-to-end, integrated views

• Integration with Amazon S3 and Glacier
• Business Associate Agreements with Amazon
• HIPAA compliance with a third-party audit and certification
• Multi-factor authentication
• Integration with Figshare and ORCID

Globus Genomics
Data and Analysis Commons
Sustainable Service for Science
• For more information on Globus Genomics and to sign up for a free trial: www.globus.org/genomics
• For more information on Globus: www.globus.org
Our work is supported by:
U.S. Department of Energy

Thank you!

@madduri

globus.org/genomics

Weitere ähnliche Inhalte

Was ist angesagt?

Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openlyFAIRDOM
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...FAIRDOM
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsCarole Goble
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Todd Vision
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Todd Vision
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?Robert Grossman
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...Todd Vision
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnTodd Vision
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Todd Vision
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...GigaScience, BGI Hong Kong
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 

Was ist angesagt? (20)

Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openly
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
 
ROHub
ROHubROHub
ROHub
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
Phylogenetics: Making publication-quality tree figures
Phylogenetics: Making publication-quality tree figuresPhylogenetics: Making publication-quality tree figures
Phylogenetics: Making publication-quality tree figures
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 

Andere mochten auch

翻轉醫療-人類基因大數據解密
翻轉醫療-人類基因大數據解密翻轉醫療-人類基因大數據解密
翻轉醫療-人類基因大數據解密Chung-Tsai Su
 
Next Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesNext Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesChung-Tsai Su
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3BITS
 
Big Data and Genomics
Big Data and GenomicsBig Data and Genomics
Big Data and GenomicsAl Costa
 
Big Process for Big Data
Big Process for Big DataBig Process for Big Data
Big Process for Big DataIan Foster
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
Effective ansible
Effective ansibleEffective ansible
Effective ansibleWu Bigo
 
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)joseplaborda
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 

Andere mochten auch (20)

翻轉醫療-人類基因大數據解密
翻轉醫療-人類基因大數據解密翻轉醫療-人類基因大數據解密
翻轉醫療-人類基因大數據解密
 
Next Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesNext Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and Opportunities
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3
 
Big Data and Genomics
Big Data and GenomicsBig Data and Genomics
Big Data and Genomics
 
Big Process for Big Data
Big Process for Big DataBig Process for Big Data
Big Process for Big Data
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Supporting Barack Obama for President
Supporting Barack Obama for PresidentSupporting Barack Obama for President
Supporting Barack Obama for President
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
HL7: Clinical Decision Support
HL7: Clinical Decision SupportHL7: Clinical Decision Support
HL7: Clinical Decision Support
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Public.Cdsc.Middleton
Public.Cdsc.MiddletonPublic.Cdsc.Middleton
Public.Cdsc.Middleton
 
Effective ansible
Effective ansibleEffective ansible
Effective ansible
 
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
Leap Motion Development (Rohan Puri)
Leap Motion Development (Rohan Puri)Leap Motion Development (Rohan Puri)
Leap Motion Development (Rohan Puri)
 
Raskar UIST Keynote 2015 November
Raskar UIST Keynote 2015 NovemberRaskar UIST Keynote 2015 November
Raskar UIST Keynote 2015 November
 
Coded Photography - Ramesh Raskar
Coded Photography - Ramesh RaskarCoded Photography - Ramesh Raskar
Coded Photography - Ramesh Raskar
 
What is SIGGRAPH NEXT? Intro by Ramesh Raskar
What is SIGGRAPH NEXT? Intro by Ramesh RaskarWhat is SIGGRAPH NEXT? Intro by Ramesh Raskar
What is SIGGRAPH NEXT? Intro by Ramesh Raskar
 
What is Media in MIT Media Lab, Why 'Camera Culture'
What is Media in MIT Media Lab, Why 'Camera Culture'What is Media in MIT Media Lab, Why 'Camera Culture'
What is Media in MIT Media Lab, Why 'Camera Culture'
 
Google Glass Breakdown
Google Glass BreakdownGoogle Glass Breakdown
Google Glass Breakdown
 

Ähnlich wie Globus Genomics: Democratizing NGS Analysis

Streamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchStreamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchIan Foster
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
 
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus PosterNIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus PosterGlobus
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 
Open PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsOpen PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsopen_phacts
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformGlobus
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theoryC. Tobin Magle
 
(SEC313) Security & Compliance at the Petabyte Scale
(SEC313) Security & Compliance at the Petabyte Scale(SEC313) Security & Compliance at the Petabyte Scale
(SEC313) Security & Compliance at the Petabyte ScaleAmazon Web Services
 
Building bioinformatics resources for the global community
Building bioinformatics resources for the global communityBuilding bioinformatics resources for the global community
Building bioinformatics resources for the global communityExternalEvents
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker, Inc.
 
The Cancer Genomics Cloud (CGC) Pilots NIH IC Show and Tell
The Cancer Genomics Cloud (CGC) Pilots   NIH IC Show and TellThe Cancer Genomics Cloud (CGC) Pilots   NIH IC Show and Tell
The Cancer Genomics Cloud (CGC) Pilots NIH IC Show and TellSteve Tsang
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSouth Tyrol Free Software Conference
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)Michael Atkins
 

Ähnlich wie Globus Genomics: Democratizing NGS Analysis (20)

Streamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchStreamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer research
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus PosterNIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
 
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an example
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
Open PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsOpen PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow tools
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
(SEC313) Security & Compliance at the Petabyte Scale
(SEC313) Security & Compliance at the Petabyte Scale(SEC313) Security & Compliance at the Petabyte Scale
(SEC313) Security & Compliance at the Petabyte Scale
 
Building bioinformatics resources for the global community
Building bioinformatics resources for the global communityBuilding bioinformatics resources for the global community
Building bioinformatics resources for the global community
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce Hoff
 
The Cancer Genomics Cloud (CGC) Pilots NIH IC Show and Tell
The Cancer Genomics Cloud (CGC) Pilots   NIH IC Show and TellThe Cancer Genomics Cloud (CGC) Pilots   NIH IC Show and Tell
The Cancer Genomics Cloud (CGC) Pilots NIH IC Show and Tell
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free software
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03
 


Globus Genomics: Democratizing NGS Analysis

  • 1. Globus Genomics: Democratizing NGS Analysis Ravi K Madduri, University of Chicago and Argonne National Laboratory madduri@uchicago.edu @madduri globus.org/genomics
  • 2. RIP Fred Sanger Frederick Sanger (13 August 1918 – 19 November 2013) was a British biochemist who won the Nobel Prize in Chemistry twice, the only person to have done so at the time. In 1958 he was awarded the Nobel Prize in Chemistry "for his work on the structure of proteins, especially that of insulin". In 1980, Walter Gilbert and Sanger shared half of the chemistry prize "for their contributions concerning the determination of base sequences in nucleic acids". He was the fourth person to have been awarded two Nobel Prizes, either individually or in tandem with others. globus.org/genomics
  • 3. Our vision for a 21st-century discovery infrastructure: Provide more capability for more people at lower cost by delivering “Science as a service” www.globus.org globus.org/genomics
  • 5. Globus Genomics (architecture diagram). Data Management: Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints (public data, sequencing centers, research labs, local sequencing centers, cluster/cloud, and storage). Data Analysis: a Galaxy-based workflow management system, with Globus integrated with Galaxy: • Web-based UI • Drag-and-drop workflow creation • Easily modify workflows with new tools. Galaxy Data Libraries hold inputs such as FASTQ files and the reference genome. Pipeline stages shown: alignment, Picard, GATK, variant calling. Analytical tools are automatically run on the scalable compute resources when possible. Globus Genomics runs on Amazon EC2. globus.org/genomics
  • 6. Core Capabilities • Computational profiles for various analysis tools to provide optimal performance • Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure • High-performance, reliable data movement is streamlined with integrated Globus file-transfer functionality • Integrated Globus endpoints and campus login globus.org/genomics
  • 7. Pricing includes: • Estimated compute • Storage (one month) • Globus Genomics platform usage • Support globus.org/genomics
  • 9. Security and Privacy: Globus Genomics complies with the NCBI Database of Genotypes and Phenotypes (dbGaP) security best practices. Protecting the Security of Controlled Data on Servers: • All Globus Genomics servers are protected by Amazon Security Groups and by stateful packet inspection firewalls; only necessary services are allowed. • All relevant security patches are applied as soon as they are available. • Globus Genomics and Globus provide sharing solutions that are secure and user-controlled. • Globus Genomics uses the HTTPS and GridFTP protocols with authentication and encryption when transferring files. • Data access is strictly restricted to individual users, and only users can share their data with other users. • We provide detailed instructions to our users on data security and access control. • The data sharing and access policies on Globus Genomics are retained across all the systems involved. globus.org/genomics
  • 10. Accessibility • Unified web interface for obtaining genomic data and applying computational tools to analyze the data • Easily integrate your own tools and scripts for analysis (CLI-based tools) • Collection of tools (Tools Panel) that reflects good practices and community insights • Access every step of analysis and intermediate results: view, download, visualize, reuse globus.org/genomics
  • 11. Reproducibility and Reuse • Track provenance and ensure repeatability of each analysis step: input datasets, tools used, parameter values, and output datasets • Annotate each step or collection of steps to track and reproduce results • Intuitive workflow editor to create or modify complex workflows and use them as templates, making them reusable and reproducible • Publish and share metadata, histories, and workflows at multiple levels • Store public and generated datasets as Data Libraries, e.g., the hg19 reference genome • Shared datasets and workflows can be imported by other users for reuse globus.org/genomics
  • 12. Collaboration • Users from different institutions can come together and meet in the middle • Jointly create and share analytical pipelines • Securely share data • Verify results globus.org/genomics
  • 13. Collaboration Globus Genomics facilitates meta-analysis across sequence datasets. Large-scale meta-analysis has been hugely important in driving the success of GWAS. With GWAS, investigators could simply share summary results without losing much, but for sequencing, we do much better when we jointly call the samples and reanalyze. There are only a few places that can do this at scale now, and creating resources that allow groups to come together spontaneously to do this is hugely important. This is a really important opportunity for the scientific community to have a very distributed approach to large-scale analysis from the control perspective, while still being centralized and cost-effective from the hardware and software end. It is straightforward for groups to come together to do meta-analysis over many large sequence datasets in which data can be secured so that raw data are not directly shared (but rather each group maintains control over access to their raw data) but variant calls can be made over all data with a shared pipeline that can then be used to conduct analysis over all of the new variant calls. Power to the people!! -- Nancy Cox, PhD, University of Chicago globus.org/genomics
  • 14. Standard Pipelines: Whole Genome and Exome globus.org/genomics
  • 17. Sustainability • Our goal is to build a service that lives beyond a funded proposal • Two pricing options and multiple usage tiers – Targeted users include individual research groups and bioinformatics cores – Platform pricing (includes only the subscription to the Globus Genomics platform) – Bundled pricing (includes the Globus Genomics platform subscription and AWS usage costs) globus.org/genomics
  • 19. ¹Computation Institute, University of Chicago, Chicago, IL, USA; ²Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA; ³Section of Genetic Medicine, University of Chicago, Chicago, IL. • Challenges in Next-Gen Sequencing Analysis • Parallel Workflows on Globus Genomics • High Performance, Reusable Consensus Calling Pipeline globus.org/genomics
  • 21. Coming Soon! • Integration with Globus Catalog Service – Better metadata management – End-to-end, integrated views • Integration with Amazon S3 and Glacier • Business Associate Agreements with Amazon • HIPAA compliance with a third-party audit and certification • Multi-factor authentication • Integration with Figshare and ORCID globus.org/genomics
  • 22. Globus Genomics Data and Analysis Commons Sustainable Service for Science globus.org/genomics
  • 23. • More information on Globus Genomics, and to sign up for a free trial: www.globus.org/genomics • More information on Globus: www.globus.org globus.org/genomics
  • 24. Our work is supported by: U.S. DEPARTMENT OF ENERGY globus.org/genomics
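Slide 5 shows Galaxy chaining tools from FASTQ input through alignment, Picard, and GATK variant calling, with tools run on scalable compute "when possible". A minimal sketch of how a workflow engine can work out which steps may run concurrently, assuming a hypothetical wiring of those stages (the slide does not give the exact dependencies) and using Python's standard `graphlib` module rather than anything Galaxy itself exposes:

```python
from graphlib import TopologicalSorter

# Hypothetical wiring of the stages named on slide 5; each stage maps to the
# set of stages whose outputs it consumes.
PIPELINE: dict[str, set[str]] = {
    "fastq_input": set(),
    "reference_genome": set(),
    "alignment": {"fastq_input", "reference_genome"},
    "picard": {"alignment"},
    "gatk_variant_calling": {"picard", "reference_genome"},
}


def parallel_batches(pipeline: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into batches; every stage in a batch has all of its inputs
    ready, so stages within one batch could be dispatched to separate nodes."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking dependents
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(PIPELINE), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

With one FASTQ file the chain is mostly sequential; adding more samples as independent `fastq_input`/`alignment` branches would widen the batches, which is where on-demand compute helps.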
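Slide 9 states that data access is restricted to individual users and that only users can share their data with others. The sketch below models that owner-controlled rule in a few lines. The class and method names are hypothetical and are not the Globus or Globus Genomics API; the real services also handle authentication and identity, which this toy omits.

```python
class SharedDataset:
    """Hypothetical owner-controlled dataset: access starts with the owner only,
    and only the owner can grant or revoke access for other users."""

    def __init__(self, name: str, owner: str) -> None:
        self.name = name
        self.owner = owner
        self._readers: set[str] = {owner}

    def _require_owner(self, requester: str) -> None:
        if requester != self.owner:
            raise PermissionError(f"{requester} does not own {self.name}")

    def share(self, requester: str, grantee: str) -> None:
        self._require_owner(requester)
        self._readers.add(grantee)

    def revoke(self, requester: str, grantee: str) -> None:
        self._require_owner(requester)
        if grantee == self.owner:
            raise ValueError("the owner always keeps access")
        self._readers.discard(grantee)

    def can_read(self, user: str) -> bool:
        return user in self._readers


if __name__ == "__main__":
    exome = SharedDataset("exome_batch_1", owner="alice")
    exome.share("alice", "bob")
    print(exome.can_read("bob"))  # True: the owner granted access
    try:
        exome.share("bob", "carol")  # a reader cannot re-share
    except PermissionError as err:
        print(err)
```

Keeping the grant decision with the owner alone is what lets groups collaborate (slide 13) without handing control of raw data to anyone else.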
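Slide 11 lists what provenance tracking records for each step: input datasets, tools used, parameter values, and output datasets. One way to make "repeatability" checkable is to fingerprint those fields. The sketch below is a hypothetical record format, not Galaxy's history schema. It identifies datasets by SHA-256 checksum, treats a rerun as reproducing a step when the computation and output checksums both match, and deliberately excludes free-text annotations from the fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    """SHA-256 of a dataset's bytes, used to identify inputs and outputs."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class StepRecord:
    """Hypothetical provenance record for one analysis step."""
    tool: str
    tool_version: str
    parameters: dict[str, object]
    input_checksums: list[str]
    output_checksums: list[str] = field(default_factory=list)
    annotation: str = ""  # free-text note; not part of the fingerprint

    def fingerprint(self) -> str:
        """Identify the computation: same tool, version, parameters, and inputs
        give the same fingerprint regardless of parameter key order."""
        payload = json.dumps(
            {
                "tool": self.tool,
                "version": self.tool_version,
                "parameters": self.parameters,
                "inputs": self.input_checksums,
            },
            sort_keys=True,  # canonical key order, applied recursively
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_repeatable(original: StepRecord, rerun: StepRecord) -> bool:
    """A rerun reproduces a step if it ran the same computation and
    produced byte-identical outputs."""
    return (
        original.fingerprint() == rerun.fingerprint()
        and original.output_checksums == rerun.output_checksums
    )
```

A usage sketch with a hypothetical tool name: `StepRecord("aligner", "1.0", {"reference": "hg19"}, [checksum(b"reads")])` yields the same fingerprint however often it is recorded, so a shared workflow reimported by another user (slide 11) can be checked step by step against the original history.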
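Slides 7 and 17 describe two pricing options: platform pricing covers only the Globus Genomics subscription, while bundled pricing adds AWS usage (estimated compute plus one month of storage) to the same invoice. The arithmetic below makes that distinction concrete. Every rate is a hypothetical placeholder, not a real Globus Genomics or AWS price. It also assumes that under platform pricing the AWS costs are paid separately, and it does not model support or usage tiers.

```python
from dataclasses import dataclass

# All rates are hypothetical placeholders for illustration only; they are not
# Globus Genomics or Amazon Web Services prices.
SUBSCRIPTION_PER_MONTH = 500.0  # platform subscription (hypothetical)
AWS_PER_CORE_HOUR = 0.05        # compute rate (hypothetical)
AWS_PER_GB_MONTH = 0.03         # storage rate (hypothetical)


@dataclass
class MonthlyUsage:
    core_hours: float
    storage_gb_months: float


def aws_cost(usage: MonthlyUsage) -> float:
    """Estimated compute plus one month of storage."""
    return (usage.core_hours * AWS_PER_CORE_HOUR
            + usage.storage_gb_months * AWS_PER_GB_MONTH)


def platform_pricing(usage: MonthlyUsage) -> dict[str, float]:
    """Subscription only; AWS usage is assumed to be paid separately."""
    return {"invoice": SUBSCRIPTION_PER_MONTH,
            "paid_separately": aws_cost(usage)}


def bundled_pricing(usage: MonthlyUsage) -> dict[str, float]:
    """Subscription and AWS usage on one invoice."""
    return {"invoice": SUBSCRIPTION_PER_MONTH + aws_cost(usage),
            "paid_separately": 0.0}


if __name__ == "__main__":
    month = MonthlyUsage(core_hours=1000, storage_gb_months=100)
    print(platform_pricing(month))
    print(bundled_pricing(month))
```

Under these assumptions the two options cost the same in total; what differs is which invoice carries the AWS usage.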

Speaker notes

  1. Questions remain: What capabilities? Where does time go? How do we turn them into usable solutions? How do we scale from thousands to millions? How do we incentivize contributions? Long tail.
  2. Here are some of the areas where we have active projects. Much of our legacy is in the physical sciences, but increasingly we are finding ourselves working in the life sciences.
  3. Total: over 350K core hours in the last 6 months.
  4. Joint work with Dr. Subha’s group at Lombardi Cancer Center; currently running the workflow at scale on breast and colorectal cancer data.