Implementation
AllPathsLG-46513 was profiled on the unpaired data sets listed in the table of species and read counts.
On BioU, profiling used the memusage script by Liu Yongchao (University of Mainz); on Blacklight, it
used the job accounting scripts.
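The per-step measurement described above can be illustrated with a minimal sketch. It is not the memusage script or the Blacklight job accounting scripts, whose internals the poster does not describe; it only assumes a Unix system and a hypothetical helper, `profile_step`, that runs one step as a child process and records wall time, CPU time, and peak resident memory.

```python
import resource
import subprocess
import time


def profile_step(cmd):
    """Run one pipeline step (hypothetical helper) and report its resource use."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU time charged to child processes during this call (user + system).
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": proc.returncode,
        "elapsed_s": elapsed,
        "cpu_s": cpu,
        # ru_maxrss is the largest peak RSS among all children waited for so
        # far, not only this one. Kilobytes on Linux, bytes on macOS.
        "peak_rss": after.ru_maxrss,
    }
```

Calling `profile_step` once per module invocation would give one row per step, which is the granularity the charts on this poster use.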
Abstract
Next Generation Sequencers (NGS) achieve high
throughput by parallelizing the sequencing
process, producing millions of sequences in a
relatively short amount of time. Because NGS is
still relatively new, the methods used to assemble
its data have not been fully explored from an
optimization perspective. One such assembler is
ALLPATHS-LG, whose performance profiling is the
focus of this poster.
To carry out the profiling, the CPU and memory
usage of each step of the program was measured
using profilers. The profiling highlighted which
steps took the most time and, where possible,
each step was optimized accordingly. To maximize
the efficiency and throughput of the program as a
whole, the steps with the highest I/O, memory, and
CPU time were given the highest priority, with the
goal of reducing the time needed for sequence
assembly.
Background
NGS data output has increased at a rate that
outpaces Moore’s law, more than doubling each
year since the technology was introduced. In 2007,
a single sequencing run could produce a maximum
of around one gigabase (Gb) of data. By 2011, that
figure had nearly reached a terabase (Tb) of data
in a single sequencing run, nearly a 1000× increase
in four years. With the ability to rapidly generate
large volumes of sequencing data, NGS enables
researchers to move quickly from an idea to full
data sets in a matter of hours or days. Researchers
can now sequence more than five human
genomes in a single run, producing data in roughly
one week, for a reagent cost of less than $5,000
per genome. Optimizing the sequence assembly
code will help cut both time and cost.
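As a quick check on the growth figures quoted above, the short calculation below converts the 2007 and 2011 per-run outputs into an average annual growth factor. It treats "nearly a terabase" as exactly 1000 Gb, so the result is an upper estimate rather than a measured value.

```python
# Figures quoted in the Background section (rounded).
gb_2007 = 1.0      # about one gigabase per run in 2007
gb_2011 = 1000.0   # nearly one terabase per run in 2011, taken here as 1000 Gb
years = 2011 - 2007

fold_increase = gb_2011 / gb_2007
# Constant yearly factor that compounds to the same overall increase.
annual_factor = fold_increase ** (1 / years)

print(f"{fold_increase:.0f}x over {years} years")
print(f"about {annual_factor:.2f}x per year")  # about 5.62x per year
```

An average factor above 5 per year is consistent with the statement that output has more than doubled each year.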
Analysis
Profiling the code on BioU and Blacklight
identified seven routines that consume large
amounts of CPU time, as shown in the graphs.
These modules also have the most I/O associated
with them, which makes them natural targets for
tuning.
To get the most out of optimization, several
factors, such as elapsed time, memory used, and
I/O, have to be taken into account together.
Modules such as FindErrors, AlignReads, and
CommonPather are good candidates for optimization.
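One way to take several factors into account at once is to scale each metric by its maximum and sum the results. The sketch below is an assumption about how such a ranking could be built, not the method used for this poster. The function name `rank_modules`, the equal weighting, and the toy numbers are all hypothetical.

```python
def rank_modules(metrics):
    """Rank modules by a combined score.

    metrics maps a metric name to {module: value}. Each metric is divided by
    its largest value so it falls in [0, 1], then the metrics are summed with
    equal weight (an arbitrary choice made for this sketch).
    """
    scores = {}
    for values in metrics.values():
        top = max(values.values())
        for module, value in values.items():
            share = value / top if top else 0.0
            scores[module] = scores.get(module, 0.0) + share
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy numbers for illustration only; these are NOT measurements from the poster.
toy = {
    "elapsed_s": {"step_a": 120.0, "step_b": 30.0, "step_c": 60.0},
    "memory_mb": {"step_a": 500.0, "step_b": 2000.0, "step_c": 1000.0},
    "io_mb": {"step_a": 800.0, "step_b": 100.0, "step_c": 400.0},
}

for module, score in rank_modules(toy):
    print(f"{module}: {score:.2f}")
```

With real data, the weights would need to reflect which resource is actually scarce on the target machine.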
Acknowledgements
• This research was supported by the NIH Grants
  T36-GM-095335 and 2-P41-RR06
• Alexander J. Ropelewski
• Dr. Bienvenido Velez
• Pittsburgh Supercomputing Center
References
• Sante Gnerre, Iain MacCallum, Dariusz
  Przybylski, Filipe J. Ribeiro, Joshua N. Burton,
  Bruce J. Walker, Ted Sharpe, Giles Hall,
  Terrance P. Shea, Sean Sykes, Aaron M. Berlin,
  Daniel Aird, Maura Costello, Riza Daza, Louise
  Williams, Robert Nicol, Andreas Gnirke, Chad
  Nusbaum, Eric S. Lander, and David B. Jaffe.
  High-quality draft assemblies of mammalian
  genomes from massively parallel sequence
  data. PNAS [Online] 2010.
• Gperftools.
  https://code.google.com/p/gperftools/wiki/GooglePerformanceTools
  June 13, 2013.
Parallel Benchmarking and Performance Profiling of de novo Genome Assembly Algorithms
Appropriate for NGS Data
Jan Salomon (1), Alex Ropelewski (2), Bienvenido Velez (3)
(1) Electrical and Computer Engineering Department, University of Puerto Rico, Mayaguez
(2) Pittsburgh Supercomputing Center, Pittsburgh, PA
BioU Results
Species                              Number of Fragment Reads  Fragment Read Length  Number of Jump Reads  Jump Read Length
Bifidobacterium bifidum NCIMB 41171  1096991                   101                   1193262               93
Neisseria gonorrhoeae FA19           1748810                   101                   902879                101
Coprobacillus sp. D6                 1271918                   101                   1775443               101
Enterococcus casseliflavus 899205    1588485                   101                   1265671               101
Eubacterium sp. 3_1_31               826347                    93                    828826                93
[Figure: Combined Elapsed Time Per Step. Axes: Time Taken (seconds), 0 to 3000; AllPathsLG Module. Modules shown: PostPatcher, TagCircularScaffolds, KPatch, UnibaseCopyNumber3, CleanCorrectedReads, CleanAssembly, CloseUnipathGaps, RebuildAssemblyFiles, FixLocal, CommonPather, FindErrors, Other (<97), UnipathPatcher, LocalizeReadsLG, AlignReads.]
[Figure: Combined VMRSS (MB). Axes: Memory Used (MB), 0 to 50000; AllPathsLG Module. Modules shown: ShaveUnipathGraph, UnipathPatcher, CloseUnipathGaps, RemoveDodgyReads, FixLocal, CleanCorrectedReads, MergeNeighborhoods1, AlignReads, FindErrors, UnibaseCopyNumber3, LocalizeReadsLG, Other (<3.8MB), CommonPather.]
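Assuming the VMRSS in the chart title refers to the VmRSS field that Linux reports in /proc/<pid>/status, a value like the ones plotted could be sampled as sketched below. The helper names `vmrss_mb` and `read_vmrss_mb` are hypothetical, and this is not the memusage script used on BioU.

```python
def vmrss_mb(status_text):
    """Return VmRSS in MB from the text of /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line has the form "VmRSS:    123456 kB".
            kilobytes = int(line.split()[1])
            return kilobytes / 1024.0
    return None


def read_vmrss_mb(pid):
    """Sample VmRSS for a live process on Linux; None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as handle:
            return vmrss_mb(handle.read())
    except OSError:
        return None
```

Polling `read_vmrss_mb` at a fixed interval while a module runs and keeping the maximum gives a peak resident memory figure per module.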
Blacklight Results
Blacklight I/O Profiling Results
[Figure: Logical I/O Reads. Axes: Size (MB), 0 to 500000; AllPathsLG Module. Modules shown: SamplePairedRea, UnipathPatcher, FixSomeIndels, MergeNeighborho, FixLocal, RemoveHighCNAli, ShaveUnipathGra, KPatch, AlignReads, CleanCorrectedR, UnibaseCopyNumb, RecoverUnipaths, LocalizeReadsLG, FindErrors, CommonPather, Other (<54000).]
[Figure: Logical I/O Written. Axes: Size (MB), 0 to 800000; AllPathsLG Module. Modules shown: SamplePairedRea, RemoveHighCNAli, FixSomeIndels, KPatch, MergeNeighborho, UnipathPatcher, ShaveUnipathGra, AlignReads, CleanCorrectedR, RecoverUnipaths, FixLocal, LocalizeReadsLG, FindErrors, UnibaseCopyNumb, CommonPather, Other (<83000).]

Command Name     Characters Read  Characters Written
AlignReads       110550.85        110970.454
CleanCorrectedR  113703.52        114114.204
CommonPather     369853.39        372239.993
FindErrors       224893.88        225584.826
FixLocal         66690.56         69286.932
FixSomeIndels    59103.36         60129.073
KPatch           109716.77        109789.758
LocalizeReadsLG  163704.31        164067.357
MergeNeighborho  69287.47         69671.636
Other            881481.07        889387.44
RecoverUnipaths  150995.8         151131.303
RemoveHighCNAli  73932.16         74184.158
SamplePairedRea  53789.79         54281.473
ShaveUnipathGra  83679.33         83973.843
UnibaseCopyNumb  122950.22        124257.897
UnipathPatcher   53471.03         54533.504
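The I/O table above can be turned into a ranking with a few lines of code. The sketch below copies the table values as printed, keeping the truncated command names, and sorts the named modules by characters read plus characters written. Summing the two columns and leaving out the aggregate "Other" row are choices made for this illustration, not something the poster specifies.

```python
# (characters read, characters written), copied from the Blacklight I/O table.
io_table = {
    "AlignReads": (110550.85, 110970.454),
    "CleanCorrectedR": (113703.52, 114114.204),
    "CommonPather": (369853.39, 372239.993),
    "FindErrors": (224893.88, 225584.826),
    "FixLocal": (66690.56, 69286.932),
    "FixSomeIndels": (59103.36, 60129.073),
    "KPatch": (109716.77, 109789.758),
    "LocalizeReadsLG": (163704.31, 164067.357),
    "MergeNeighborho": (69287.47, 69671.636),
    "Other": (881481.07, 889387.44),
    "RecoverUnipaths": (150995.8, 151131.303),
    "RemoveHighCNAli": (73932.16, 74184.158),
    "SamplePairedRea": (53789.79, 54281.473),
    "ShaveUnipathGra": (83679.33, 83973.843),
    "UnibaseCopyNumb": (122950.22, 124257.897),
    "UnipathPatcher": (53471.03, 54533.504),
}


def rank_by_total_io(table, exclude=("Other",)):
    """Sort modules by read + written, largest first, skipping aggregate rows."""
    totals = [
        (name, read + written)
        for name, (read, written) in table.items()
        if name not in exclude
    ]
    return sorted(totals, key=lambda row: row[1], reverse=True)


for name, total in rank_by_total_io(io_table)[:3]:
    print(name, round(total, 3))
```

On these figures the three largest named modules are CommonPather, FindErrors, and LocalizeReadsLG.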
Future Work
Future work will involve profiling at a finer
level of detail than the coarse method described
in this poster, as well as exploring code
optimizations for the most resource-intensive
modules.
[Figure: Elapsed Time. Axes: Time Taken (seconds), 0 to 1600; AllPathsLG Module. Modules shown: FixSomeIndels, SamplePairedReadStats, RemoveDodgyReads, ValidateAllPathsInputs, MakeScaffoldsLG, LocalizeReadsLG, RemoveDodgyReads, UnipathPatcher, CloseUnipathGaps, CleanCorrectedReads, UnibaseCopyNumber3, CommonPather, AlignReads, FixLocal, FindErrors, Other (<110).]
[Figure: Memory Used. Axes: Memory Used (MB), 0 to 50000; AllPathsLG Module. Modules shown: RemoveHighCNAligns, SamplePairedReadStats, ErrorCorrectJump, FixSomeIndels, UnipathPatcher, ShaveUnipathGraph, FixLocal, RemoveDodgyReads, CloseUnipathGaps, AlignReads, CleanCorrectedReads, FindErrors, UnibaseCopyNumber3, LocalizeReadsLG, Other (<2457), CommonPather.]
[Figure, shown twice with identical labels: Percentage of Time Taken of Top 7 Modules. Axes: Time Taken (percentage), 0% to 100%; DataSet (bifido, clap19, copro, entero, eubac). Modules: AlignReads, CleanCorrectedReads, CloseUnipathGaps, CommonPather, FindErrors, UnibaseCopyNumber3, UnipathPatcher.]
