SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Downloaden Sie, um offline zu lesen
Files, Tools, and Bioinformatics in the Cloud


   Thomas Keane

   Vertebrate Resequencing Informatics
   WTSI
   thomas.keane@sanger.ac.uk




Vertebrate Resequencing Informatics   17th November, 2009
DATA is the problem!

 NGS means large volumes of raw data
     Previously SRF (~8-10bytes per bp), now BAM (~1.6bytes per bp)
 How much data can a sequencing machine produce?
     20Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11Tbp/year
     Small sequencing center: 4 machines?
     44Tbp per year!
 Raw data in BAM: 70Tbytes                             SV Calling: SVMerge
 Processed calls much smaller
     1000G pilot VCF < 1Gbyte

        Alignment + BAM improvement




Vertebrate Resequencing Informatics   17th November, 2009
Simplistic Model: Cloud as compute resource




                                                                               Processes
                                                                  1. Align


                            SRF/Fastq/BAM
                            (2Mbps/sec)                           Variant calling (n x SNP callers, n indel
                                                                  callers, SV callers)
Sequencing Center/Institute                                   BAM + VCF
                                                              (2Mbps/sec)


                                           BAM                                      3,240 days
                                           VCF                                      to upload!


  Vertebrate Resequencing Informatics   17th November, 2009
Move the raw data generation to the compute




                                                                Variant calling (n x SNP callers, n indel
                                                                callers, SV callers)



Sequencing Center/Institute
                                            VCF
                         BAM
                         VCF

    Vertebrate Resequencing Informatics   17th November, 2009
Large Collaborative Projects: Cloud centric model




                                                            VCF

                                               Analysis groups
Vertebrate Resequencing Informatics   17th November, 2009

Weitere ähnliche Inhalte

Andere mochten auch

Cloud Computing And Mobility
Cloud Computing And MobilityCloud Computing And Mobility
Cloud Computing And Mobilitygowense
 
Cloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureCloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureAppistry
 
Trend and Future of Cloud Computing
Trend and Future of Cloud ComputingTrend and Future of Cloud Computing
Trend and Future of Cloud Computinghybrid cloud
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing StudyNorth Bridge
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computingHaslina
 
2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study2016 Future of Cloud Computing Study
2016 Future of Cloud Computing StudyNorth Bridge
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple pptAgarwaljay
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computingRkrishna Mishra
 

Andere mochten auch (10)

Cloud Computing And Mobility
Cloud Computing And MobilityCloud Computing And Mobility
Cloud Computing And Mobility
 
Cloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the FutureCloud Computing and the Datacenter of the Future
Cloud Computing and the Datacenter of the Future
 
Trend and Future of Cloud Computing
Trend and Future of Cloud ComputingTrend and Future of Cloud Computing
Trend and Future of Cloud Computing
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computing
 
2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study2016 Future of Cloud Computing Study
2016 Future of Cloud Computing Study
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computing
 
cloud computing ppt
cloud computing pptcloud computing ppt
cloud computing ppt
 

Ähnlich wie Next generation sequencing in cloud computing era

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data PreprocessingcursoNGS
 
Large Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesLarge Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesThomas Keane
 
ULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKaravind m t
 
Creating a SNP calling pipeline
Creating a SNP calling pipelineCreating a SNP calling pipeline
Creating a SNP calling pipelineDan Bolser
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialThomas Keane
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas
 
PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!Kara Van Malssen
 
IPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandIPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandSwiss IPv6 Council
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...Academia Sinica
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace VisualizationPTIHPA
 
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time SchedulingProviding Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time Schedulingtcucinotta
 
ProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyDaniel Bimschas
 
Distributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldDistributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldSATOSHI TAGOMORI
 
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationCambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationAdvantec Distribution
 

Ähnlich wie Next generation sequencing in cloud computing era (20)

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data Preprocessing
 
Large Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and ChallengesLarge Scale Resequencing: Approaches and Challenges
Large Scale Resequencing: Approaches and Challenges
 
ULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORKULTRA WIDE BAND BODY AREA NETWORK
ULTRA WIDE BAND BODY AREA NETWORK
 
Creating a SNP calling pipeline
Creating a SNP calling pipelineCreating a SNP calling pipeline
Creating a SNP calling pipeline
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing Tutorial
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)
 
PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!PBCore, METS, PREMIS, MODS, METSRights...oh my!
PBCore, METS, PREMIS, MODS, METSRights...oh my!
 
IPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH SwitzerlandIPv6 strategy for deployment at ETH Switzerland
IPv6 strategy for deployment at ETH Switzerland
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
Cambium networks ugps_specification
Cambium networks ugps_specificationCambium networks ugps_specification
Cambium networks ugps_specification
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ...
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
 
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time SchedulingProviding Performance Guarantees to Virtual Machines using Real-Time Scheduling
Providing Performance Guarantees to Virtual Machines using Real-Time Scheduling
 
ProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.netProcessOne Push Platform: pubsub.p1pp.net
ProcessOne Push Platform: pubsub.p1pp.net
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
 
Distributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] worldDistributed Stream Processing in the real [Perl] world
Distributed Stream Processing in the real [Perl] world
 
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specificationCambium networks pmp_430_access_point_(5.4_g_hz)_specification
Cambium networks pmp_430_access_point_(5.4_g_hz)_specification
 

Mehr von Thomas Keane

Multiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsMultiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsThomas Keane
 
Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Thomas Keane
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2Thomas Keane
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Thomas Keane
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingThomas Keane
 
Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Thomas Keane
 
Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Thomas Keane
 
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...Thomas Keane
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Thomas Keane
 
Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Thomas Keane
 

Mehr von Thomas Keane (10)

Multiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotationsMultiple mouse reference genomes and strain specific gene annotations
Multiple mouse reference genomes and strain specific gene annotations
 
Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)Mousegenomes tk-wtsi (1)
Mousegenomes tk-wtsi (1)
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1
 
Mouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-EditingMouse Genomes Project + RNA-Editing
Mouse Genomes Project + RNA-Editing
 
Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...Assessing the impact of transposable element variation on mouse phenotypes an...
Assessing the impact of transposable element variation on mouse phenotypes an...
 
Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...Enhanced structural variant and breakpoint detection using SVMerge by integra...
Enhanced structural variant and breakpoint detection using SVMerge by integra...
 
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010
 
Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010
 

Kürzlich hochgeladen

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Next generation sequencing in cloud computing era

  • 1. Files, Tools, and Bioinformatics in the Cloud Thomas Keane Vertebrate Resequencing Informatics WTSI thomas.keane@sanger.ac.uk Vertebrate Resequencing Informatics 17th November, 2009
  • 2. DATA is the problem! NGS means large volumes of raw data   Previously SRF (~8-10bytes per bp), now BAM (~1.6bytes per bp) How much data can a sequencing machine produce?   20Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11Tbp/year   Small sequencing center: 4 machines?   44Tbp per year! Raw data in BAM: 70Tbytes SV Calling: SVMerge Processed calls much smaller   1000G pilot VCF < 1Gbyte Alignment + BAM improvement Vertebrate Resequencing Informatics 17th November, 2009
  • 3. Simplistic Model: Cloud as compute resource Processes 1. Align SRF/Fastq/BAM (2Mbps/sec) Variant calling (n x SNP callers, n indel callers, SV callers) Sequencing Center/Institute BAM + VCF (2Mbps/sec) BAM 3,240 days VCF to upload! Vertebrate Resequencing Informatics 17th November, 2009
  • 4. Move the raw data generation to the compute Variant calling (n x SNP callers, n indel callers, SV callers) Sequencing Center/Institute VCF BAM VCF Vertebrate Resequencing Informatics 17th November, 2009
  • 5. Large Collaborative Projects: Cloud centric model VCF Analysis groups Vertebrate Resequencing Informatics 17th November, 2009