Introduction of NGS Data Analysis on Hadoop 
Chung-Tsai Su 
SPN Architect, Core Tech 
Trend Micro 
2014/10/31 @CSIE.NTU 
10/31/2014 Confidential | Copyright 2012 Trend Micro Inc. 1
Q&A 
http://setmoney.blob.core.windows.net/newsimages/2014/09/04/136352-XXL.jpg
http://www.genome.gov/sequencingcosts/ 
NGS Era
NGS Pipeline 
High-Level Workflow of NGS 
Raw Reads (.fq) → Read Mapping → Sequence Alignment/Mapping (.sam/.bam) → Variant Calling → Variant Calling file (.vcf)
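The workflow starts from raw reads in FASTQ (.fq) files. As a minimal sketch, assuming the common four-line-per-record FASTQ layout (header starting with `@`, sequence, `+` separator, quality string), the reads could be loaded like this. The names `FastqRecord` and `read_fastq` are illustrative and do not come from the slides.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # read identifier (header line without the leading '@')
    sequence: str  # base calls
    quality: str   # per-base quality string, same length as sequence


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in " + header.strip())
        yield FastqRecord(header[1:].strip(), sequence, quality)


if __name__ == "__main__":
    # Tiny in-memory example; a real run would open a .fq file instead.
    example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.quality)
```

Yielding records one at a time keeps memory use flat, which matters because whole-genome FASTQ files are far too large to load at once.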
NGS Data Analysis Pipeline 
• GATK best practice 
https://www.broadinstitute.org/gatk/guide/best-practices?bpm=DNAseq
illumina solution 
http://systems.illumina.com/content/dam/illumina-marketing/documents/products/brochures/brochure_sequencing_systems_portfolio.pdf
The First $1,000 Genome – illumina HiSeq X Ten 
http://systems.illumina.com/systems/hiseq-x-sequencing-system.html
Expectation of Data Processing Power for illumina HiSeq X Ten
• A cluster of 10 HiSeq X instruments
• Capable of sequencing up to 18,000 whole human genomes each year
– Has a run cycle of ~3 days and produces ~150 genomes per run cycle
– Running the industry-standard BWA+GATK analysis pipeline on a reasonably high-end compute server (dual Intel Xeon E5-2697 v2 CPUs, 12-core, 2.7 GHz, 96 GB DRAM) takes ~24 hours per genome
– To achieve the required throughput of 150 genomes every three days, at least 50 of these servers are required
• Data processing should meet a target of ~28 minutes per genome for mapping, aligning, sorting, de-duplication and variant calling
http://www.edicogenome.com/dragen/
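The figures on this slide follow from simple arithmetic, which can be checked with a short sketch. The constants are the approximate values quoted on the slide; the function names are illustrative only.

```python
import math

GENOMES_PER_RUN = 150   # slide: ~150 genomes per run cycle
RUN_CYCLE_DAYS = 3      # slide: ~3 day run cycle
HOURS_PER_GENOME = 24   # slide: ~24 hours per genome on one server


def servers_needed(genomes_per_run, run_cycle_days, hours_per_genome):
    """Servers required to finish one run's genomes before the next run ends."""
    genomes_per_server = run_cycle_days * 24 / hours_per_genome
    return math.ceil(genomes_per_run / genomes_per_server)


def minutes_per_genome(genomes_per_run, run_cycle_days):
    """Time budget per genome if genomes are processed one after another."""
    return run_cycle_days * 24 * 60 / genomes_per_run


def genomes_per_year(genomes_per_run, run_cycle_days):
    """Yearly output assuming back-to-back run cycles all year."""
    return genomes_per_run * 365 / run_cycle_days


if __name__ == "__main__":
    print(servers_needed(GENOMES_PER_RUN, RUN_CYCLE_DAYS, HOURS_PER_GENOME))  # 50
    print(minutes_per_genome(GENOMES_PER_RUN, RUN_CYCLE_DAYS))
    print(genomes_per_year(GENOMES_PER_RUN, RUN_CYCLE_DAYS))
```

Three days is 4,320 minutes, so 150 genomes leave about 28.8 minutes each, which the slide states as ~28 minutes. Back-to-back runs give about 18,250 genomes per year, consistent with the quoted "up to 18,000".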
Literature Survey 
Literature 
• CloudBurst, 2009 
• CloudAligner, 2011 
• DistMap, 2013 
Algorithm of CloudBurst 
Seed-and-Extend Algorithm
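The slide names the seed-and-extend algorithm without spelling it out. The following is a minimal single-process sketch of the general idea, not CloudBurst's actual MapReduce implementation. It relies on the pigeonhole argument: if a read aligns with at most k mismatches, then at least one of k+1 non-overlapping seeds cut from the read must match the reference exactly. All function names are hypothetical.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Map every substring of length seed_len to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def count_mismatches(a, b, limit):
    """Count differing positions, stopping early once limit is exceeded."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                break
    return mismatches


def seed_and_extend(read, reference, max_mismatches):
    """Return sorted (position, mismatches) pairs for end-to-end alignments."""
    seed_len = len(read) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("read too short for the requested mismatch budget")
    index = build_seed_index(reference, seed_len)
    hits = {}
    for i in range(max_mismatches + 1):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        # Seed step: exact lookup of this piece of the read.
        for ref_pos in index.get(seed, []):
            start = ref_pos - offset  # where the whole read would begin
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            # Extend step: verify the full read around the seed hit.
            window = reference[start:start + len(read)]
            mismatches = count_mismatches(read, window, max_mismatches)
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


if __name__ == "__main__":
    # The read differs from reference[8:15] in one base.
    print(seed_and_extend("TTGAGCA", "ACGTACGTTTGACCA", 1))  # [(8, 1)]
```

In a MapReduce setting, the seed lookup corresponds naturally to grouping reference and read seeds by key, and the extension to the per-key work done afterwards; here both steps run in one loop for brevity.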
Performance of CloudBurst
[Figure: Running Time vs Number of Reads on Chr 1. x-axis: Millions of Reads (0 to 8); y-axis: Runtime (s); legend: 0, 1, 2, 3, 4]
Speedup over Serial RMAP
EECS 584 – Fall 2013
[Figure: Speedup over serial RMAP. x-axis: Number of Mismatches (0 to 4); y-axis: Speedup; series: chr1, chr22]
Speedup on EC2
[Figure: Running Time on EC2, High-CPU Medium Instance Cluster. x-axis: Number of Cores (24, 48, 72, 96); y-axis: Running time (s)]
Overhead of Disk I/O 
Architecture of CloudAligner 
Seed-and-Extend Algorithm
Performance on Small Data 
Performance on Large Data 
Performance on Amazon EMR 
Comparison with CloudBurst and CloudAligner 
Workflow of DistMap 
Evaluation of Read Mapping tools 
Comparison of DistMap and other tools for distributed mapping
Market Movement 
Hardware Solution - The World’s First NGS Bioinformatics Processor
http://www.bina.com/product.html
Architecture of bina Technology 
http://www.bina.com/technology.html
https://www.dnanexus.com/images/usecases/dnanexus_CHARGE_prod1.png
Summary 
• NGS opens a new chapter in the Big Data era
• More CS experts are needed to solve its scalability and performance issues
• More data scientists are also needed to discover the secrets and insights of the human genome
http://technews.tw/2014/08/02/gene-big-data/ 
Q&A 

Introduction to Prompt Engineering (Focusing on ChatGPT)
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 

A Survey of NGS Data Analysis on Hadoop

  • 1. Introduction of NGS Data Analysis on Hadoop Chung-Tsai Su SPN Architect, Core Tech Trend Micro 2014/10/31 @CSIE.NTU 10/31/2014 Confidential | Copyright 2012 Trend Micro Inc. 1
  • 2. Q&A http://setmoney.blob.core.windows.net/newsimages/2014/09/04/136352-XXL.jpg
  • 4. NGS Pipeline
  • 5. High-Level Workflow of NGS: Raw Reads (.fq) → Read Mapping → Sequence Alignment/Mapping (.sam/.bam) → Variant Calling → Variant Calling file (.vcf)
  • 6. NGS Data Analysis Pipeline • GATK best practice https://www.broadinstitute.org/gatk/guide/best-practices?bpm=DNAseq
  • 7. illumina solution http://systems.illumina.com/content/dam/illumina-marketing/documents/products/brochures/brochure_sequencing_systems_portfolio.pdf
  • 8. The First $1,000 Genome – illumina HiSeq X Ten http://systems.illumina.com/systems/hiseq-x-sequencing-system.html
  • 9. Expectation of Data Processing Power for illumina HiSeq X Ten (http://www.edicogenome.com/dragen/)
      • A cluster of 10 HiSeq X instruments
      • Capable of sequencing up to 18,000 whole human genomes each year
          – Has a run cycle of ~3 days and produces ~150 genomes each run cycle
          – Running the industry-standard BWA+GATK analysis pipeline on a reasonably high-end compute server (Dual Intel Xeon E5-2697v2 CPU – 12 core, 2.7 GHz with 96 GB DRAM) takes ~24 hours per genome.
          – To achieve the required throughput of 150 genomes every three days, at least 50 of these servers are required.
      • Should meet a target of ~28 minutes for the completion of the mapping, aligning, sorting, de-duplication and variant calling of each genome.
  • 10. Literature Survey
  • 11. Literature • CloudBurst, 2009 • CloudAligner, 2011 • DistMap, 2013
  • 13. Algorithm of CloudBurst: Seed-and-Extend Algorithm
  • 14. Performance of CloudBurst: Scalability [Figure: Running Time vs Number of Reads on Chr 1; x-axis: Millions of Reads, y-axis: Runtime (s)]
  • 15. Speedup over Serial RMAP [Figure: Speedup vs Number of Mismatches; series: chr1, chr22]
  • 16. Speedup on EC2 [Figure: Running Time on EC2 High-CPU Medium Instance Cluster; x-axis: Number of Cores (24, 48, 72, 96), y-axis: Running time (s)]
  • 18. Overhead of Disk I/O
  • 19. Architecture of CloudAligner: Seed-and-Extend Algorithm
  • 20. Performance on Small Data
  • 21. Performance on Large Data
  • 22. Performance on Amazon EMR
  • 23. Comparison with CloudBurst and CloudAligner
  • 25. Workflow of DistMap
  • 26. Evaluation of Read Mapping tools
  • 27. Comparison of DistMap and other tools for distributed mapping
  • 28. Market Movement
  • 29. Hardware Solution - The World’s First NGS Bioinformatics Processor
  • 30. http://www.bina.com/product.html
  • 31. Architecture of bina Technology http://www.bina.com/technology.html
  • 32. https://www.dnanexus.com/images/usecases/dnanexus_CHARGE_prod1.png
  • 33. Summary • NGS opens a new chapter in the Big Data era • More CS experts are needed to solve scalability and performance issues • More data scientists are also needed to discover the secrets and insights of the human genome
  • 34. http://technews.tw/2014/08/02/gene-big-data/
  • 35. Q&A
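
The sizing figures on slide 9 follow from simple arithmetic, which the sketch below reproduces. The function and constant names are illustrative assumptions, not part of the slides; only the input values (150 genomes per run, a 3-day run cycle, 24 hours per genome on one server) come from the slide.

```python
import math

# Input figures taken from slide 9.
GENOMES_PER_RUN = 150    # genomes produced per run cycle
RUN_CYCLE_DAYS = 3       # length of one run cycle in days
HOURS_PER_GENOME = 24    # BWA+GATK wall time per genome on one server


def servers_needed(genomes_per_run, run_cycle_days, hours_per_genome):
    """Servers required so that analysis keeps pace with sequencing."""
    run_hours = run_cycle_days * 24
    return math.ceil(genomes_per_run * hours_per_genome / run_hours)


def minutes_per_genome(genomes_per_run, run_cycle_days):
    """Time budget per genome if genomes are processed one after another."""
    return run_cycle_days * 24 * 60 / genomes_per_run


def genomes_per_year(genomes_per_run, run_cycle_days, days_per_year=365):
    """Yearly output assuming back-to-back run cycles (an idealization)."""
    return genomes_per_run * days_per_year / run_cycle_days


if __name__ == "__main__":
    print(servers_needed(GENOMES_PER_RUN, RUN_CYCLE_DAYS, HOURS_PER_GENOME))
    print(minutes_per_genome(GENOMES_PER_RUN, RUN_CYCLE_DAYS))
    print(genomes_per_year(GENOMES_PER_RUN, RUN_CYCLE_DAYS))
```

With the slide's inputs this gives 50 servers, a budget of 28.8 minutes per genome, and 18,250 genomes per year, which is consistent with the slide's rounded figures of ~28 minutes and up to 18,000 genomes.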

Editor's Notes

  1. From the figure, we can see that CloudAligner is 60 to 80% faster than CloudBurst.
  2. We mapped different subsets of the accession SRR035459 to human chromosome 22 (50 Mbp), allowing up to 3 mismatches. From the figure, we can see that the execution time of both CloudBurst and CloudAligner is proportional to the number of reads, and that CloudAligner outperforms CloudBurst by 35 to 67%.
  3. With CloudBurst, the limitation of its approach is the network bandwidth. With CloudAligner, the limitation is the computation power of the workers in Hadoop. Consequently, if we run CloudAligner on a cluster of legacy machines with a high-speed network, we will probably lose the performance advantage over CloudBurst.
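
Slides 13 and 19 name seed-and-extend as the mapping algorithm, and note 2 maps reads while allowing up to 3 mismatches. The following is a minimal single-machine sketch of that idea, not the MapReduce implementation of CloudBurst or CloudAligner. It assumes the common pigeonhole scheme: a read with at most k mismatches is split into k+1 non-overlapping seeds, so at least one seed must match the reference exactly. All function names are hypothetical.

```python
from collections import defaultdict


def build_index(reference, seed_len):
    """Map every substring of length seed_len to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def seed_and_extend(read, reference, max_mismatches):
    """Return sorted (start, mismatches) pairs for all valid alignments."""
    # Pigeonhole: with at most k mismatches, one of k+1 seeds is exact.
    seed_len = len(read) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("read is too short for this many mismatches")
    index = build_index(reference, seed_len)
    hits = {}
    for i in range(max_mismatches + 1):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        # Seed step: exact lookup of the seed in the reference index.
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue  # the full read would not fit at this position
            if start in hits:
                continue  # already verified via another seed
            # Extend step: count mismatches over the whole read.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    print(seed_and_extend("TAGCCGTT", reference, max_mismatches=1))
```

In a Hadoop setting, the seed lookup is the part that is typically distributed by seed key, while the extend step is the per-candidate computation; how each tool splits that work is described in the respective papers rather than in this sketch.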