Computing Nodes Performance Analysis
ABSTRACT
References
[1] Reference for L’Ecuyer et al.: https://cran.r-project.org/web/views/HighPerformanceComputing.html
[2] Table 1: Loop unrolling across HPC using the array job technique: https://scl-wiki.fda.gov/wiki/images/6/6f/MS_Tasks_Parallelization-V2_FINAL.pdf
[3] Sample Python project for Table 2: https://rusife.github.io/Performance-analysis-for-HPC-cluster-system-/
[4] Sample Python project for Table 3: https://rusife.github.io/Performance-Analysis-for-Network-Storage/
The run times of the applications are drastically reduced after applying the techniques based on the array job facility of the job schedulers.
Application performance measurement graphs help the HPC team with preventive maintenance and system capacity management.
Network storage performance plots help the HPC team further analyze the performance of InfiniBand, 10 Gb, and 1 Gb Ethernet as the number of nodes increases. In this case, 10 Gb performance levels off at about 10 and 8 Gb/s, while 1 Gb performance continues to improve roughly linearly as nodes are added.
Future Work
Research is needed to automate the setup and convergence phases of the array-job-based scaling techniques. Experimental and theoretical work is needed to estimate the real speedup S(N) = T(1)/T(N) achieved on N nodes compared to the ideal linear speedup S(N) = N.
Scaling Techniques Comparison

Multi-threading, OpenMP
  Advantages: More than one execution thread working in parallel within a node.
  Disadvantages:
  • Scaling is limited to the cores of one computing node.

MPI
  Advantages: More than one execution thread working in parallel on more than one node.
  Disadvantages:
  • Increased likelihood of I/O and load-balancing problems.
  • In practice, all computational resources requested by all parallel threads must be available for an MPI application to start; this may lead to job starvation.
  • No checkpointing.

Single loop parallelization
  Advantages: All of the above.
  Disadvantages:
  • Does not parallelize multilevel nested loops.

Scientific workflows, MapReduce, Spark, Hadoop
  Advantages: Series of computational or data manipulation tasks run on more than one node in a scalable manner.
  Disadvantages:
  • Incomplete approach for scaling and parallelizing Modeling and Simulation (M&S) applications.
  • Not integrated with Son of Grid Engine and similar widely used job schedulers.

Array job
  Advantages: All of the above.
  Disadvantages:
  • Setup and convergence phases.
System Performance Plots
The Python program was further enhanced to process additional sets of data files for measuring DataDirect storage read/write performance. The challenge in this task was reading data from input files in which the records were mixed with notes written by the data producer (the application). To read the data and obtain the resulting graph, the data were first filtered by the first two and the last columns; the three important columns were then derived from the resulting data frame and plotted with the Seaborn Python package, as sketched below.
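A minimal sketch of this filtering and plotting step follows. The input file name, the rule used to recognize record lines, and the column names are illustrative assumptions; the actual DataDirect log format is not shown in this work.

```python
import io
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Keep only record lines; the log mixes data rows with free-text notes from the
# producing application. Treating a row as data when its first field is numeric
# is an illustrative heuristic, not the actual filtering rule.
with open("ddn_read_log.txt") as fh:
    rows = [line for line in fh if line.split(",")[0].strip().isdigit()]

# Hypothetical column names for the filtered records.
df = pd.read_csv(io.StringIO("".join(rows)),
                 names=["nodes", "block_kb", "threads", "throughput_gbs"])

# Derive the three columns of interest and plot them with Seaborn.
plot_df = df[["nodes", "block_kb", "throughput_gbs"]]
sns.lineplot(data=plot_df, x="nodes", y="throughput_gbs", hue="block_kb")
plt.xlabel("Number of nodes")
plt.ylabel("Read throughput (GB/s)")
plt.savefig("ddn_read_performance.png", dpi=150)
```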
Disk Performance Analysis
Scaling Simulation Loops
Why Array jobs?
Traditional software parallelization (scaling) techniques include multi-threading, OpenMP, the Message Passing Interface (MPI), single loop parallelization, scientific workflows, MapReduce, Spark, and Hadoop. The Scaling Techniques Comparison table lists the advantages and disadvantages of these techniques.
Array-job-based techniques combine the advantages of the traditional techniques. Their disadvantage is the introduction of setup and convergence phases, but these overheads are insignificant compared to those associated with the traditional techniques. Some specific advantages include:
(a) Natural checkpointing: every task of an array job is independent, so system failures affect only subsets of the tasks.
(b) Automated identification of incomplete partial result files by counting and sorting the numbers of lines in the partial result files.
(c) Automated identification of missing partial result files by comparing the list of expected partial result file names with the actual ones.
(d) Rerunning only the failed tasks; a verification sketch covering (b)-(d) follows this list.
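The following is a minimal sketch of checks (b)-(d), assuming a hypothetical naming convention result_<task_id>.txt, an assumed expected line count per task, and an assumed results directory; none of these details are specified in this work.

```python
import glob
import os

N_TASKS = 1000          # number of array job tasks (assumed)
EXPECTED_LINES = 500    # lines a complete partial result file should contain (assumed)
RESULT_DIR = "results"  # directory holding the partial result files (assumed)

expected = {f"result_{t}.txt" for t in range(1, N_TASKS + 1)}
actual = {os.path.basename(p)
          for p in glob.glob(os.path.join(RESULT_DIR, "result_*.txt"))}

# (c) Missing partial result files: expected names that never appeared.
missing = sorted(expected - actual)

# (b) Incomplete partial result files: present but with too few lines.
def line_count(path):
    with open(path) as fh:
        return sum(1 for _ in fh)

incomplete = sorted(name for name in actual
                    if line_count(os.path.join(RESULT_DIR, name)) < EXPECTED_LINES)

# (d) Rerun only the failed tasks, e.g. by resubmitting these task IDs
# through the scheduler's array-task selection option.
failed_ids = sorted(int(name.split("_")[1].split(".")[0])
                    for name in missing + incomplete)
print("Rerun task IDs:", ",".join(map(str, failed_ids)))
```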
Application Scaling Techniques
We use the array job facility of the cluster job schedulers to scale applications across the computing nodes of the clusters and adapt L’Ecuyer’s RngStream package [1] to provide a quality Random Number Generator (RNG) for the massively parallel computations. The simulation iterations of an application are divided into subsets of iterations, which are delegated as array job tasks to the computing nodes of the clusters. Since the tasks are independent, the array job starts as soon as resources are available for even a single task. This avoids the job starvation problem encountered when using the most dominant software parallelization technique, the Message Passing Interface.
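As an illustration only, a task script might derive its iteration subset from the SGE_TASK_ID environment variable that Son of Grid Engine sets for each array task. The iteration counts, the array size, and the result file naming in this sketch are assumptions, not details from the actual applications.

```python
import os

TOTAL_ITERATIONS = 1_000_000   # total simulation iterations (assumed)
N_TASKS = 1000                 # array job size, e.g. submitted as tasks 1-1000 (assumed)

def run_iteration(i):
    """Placeholder for one application-specific simulation iteration."""
    return i * i

task_id = int(os.environ["SGE_TASK_ID"])   # 1-based array task index set by the scheduler
chunk = TOTAL_ITERATIONS // N_TASKS
start = (task_id - 1) * chunk
stop = TOTAL_ITERATIONS if task_id == N_TASKS else start + chunk

# Each task writes its own partial result file, which enables the
# completeness checks and selective reruns described above.
with open(f"result_{task_id}.txt", "w") as out:
    for i in range(start, stop):
        out.write(f"{i},{run_iteration(i)}\n")
```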
The quality of simulation applications strongly depends on the quality of the random numbers employed across tasks running independently on different computing nodes. Studies show that traditional RNGs are not adequate for parallel simulations. L’Ecuyer’s RngStream package provides 2^64 independent random number streams, each with a period of 2^127. The unique task ID of each task is used to select the unique stream for that task.
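As a stand-in sketch of the task-ID-to-stream idea (the work itself uses L’Ecuyer’s RngStream [1], not the generator shown here), NumPy’s counter-based Philox generator can be keyed by the array task ID so that each task draws from its own statistically independent stream:

```python
import os
import numpy as np

# The array task ID selects the stream; distinct Philox keys give distinct,
# non-overlapping streams. This is only an illustrative substitute for RngStream.
task_id = int(os.environ["SGE_TASK_ID"])
rng = np.random.Generator(np.random.Philox(key=task_id))

samples = rng.random(10)   # independent uniform draws for this task's iterations
print(samples)
```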
The FDA/CDRH High-Performance Computing (HPC) team provides training and expert consultations to HPC users on migrating and scaling scientific applications on the HPC clusters. These applications include large-scale modeling and simulation, genomic analysis, computational physics and chemistry, molecular and fluid dynamics, and others that overwhelm even the most powerful scientific computers and workstations. This work presents software scaling techniques for simulation programs written in the C/C++ and Java programming languages. The Python programming language was also used for the performance analysis.
A Python program was created to help analyze system performance. An input file named hostnum.csv contains the node names with their unique IDs, and three other CSV input data files contain the system performance logs accumulated while running test application programs on the Betsy cluster. Based on these files, the Python program produces plots which show that nodes bc111-bc120 do better on the BLASTX and HPL problems but perform significantly worse on the JAGS problem for June 2019. Likewise, the other nodes showing differences are bc121-bc168. It is well known that different applications behave differently, but this plot gives a striking example of how different these problems are. The plots can be zoomed in to reveal more detailed information about system performance.
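A minimal sketch of the plotting step is shown below. Only hostnum.csv is named in the text; the performance-log file names and the column names used here are illustrative assumptions.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# hostnum.csv: node names and their unique IDs (named in the text).
# The log file names and the "node", "benchmark", "runtime_s" columns are assumptions.
hosts = pd.read_csv("hostnum.csv")
logs = pd.concat(pd.read_csv(f) for f in
                 ["perf_log_1.csv", "perf_log_2.csv", "perf_log_3.csv"])

df = logs.merge(hosts, on="node")
sns.barplot(data=df, x="node", y="runtime_s", hue="benchmark")
plt.xticks(rotation=90)
plt.ylabel("Run time (s)")
plt.tight_layout()
plt.savefig("node_performance.png", dpi=150)
```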
Conclusion
Scaling Applications on High Performance Computing Clusters and
Analysis of the System Performance
Rusif Eyvazli1, Sabrina Mikayilova2, Mike Mikailov3, Stuart Barkley4
FDA/CDRH/OSEL/DIDSR
US Food and Drug Administration, Silver Spring, MD, USA
{1Rusif.Eyvazli, 2Sabrina.Mikayilova, 3Mike.Mikailov, 4Stuart.Barkley}@fda.hhs.gov
Acknowledgements
The authors appreciate the great support of FDA/CDRH/OSEL/DIDSR management, mentors Dr. Mike Mikailov and Stuart Barkley, and all HPC team members, including Fu-Jyh Luo.
Table 1: Loop unrolling across HPC using the array job technique [2]
Table 2: Performance analysis for HPC cluster system [3]
Table 3: Disk read performance analysis for DataDirect Network storage used by FDA [4]