Project Report

Comp 7850 - Advances in Parallel Computing

All Pair Shortest Path Algorithm – Parallel Implementation and Analysis

Inderjeet Singh

7667292

December 16, 2011
Abstract


There are many algorithms for finding all-pair shortest paths; the most popular and efficient
of them is the Floyd-Warshall algorithm. This report presents a parallel version of the
algorithm based on a one-dimensional, row-wise decomposition of the adjacency matrix. The
algorithm is implemented with both MPI and OpenMP. The results show that the parallel
algorithm is considerably effective for large graph sizes and that the MPI implementation
outperforms the OpenMP implementation.



1. Introduction


Finding the shortest path between two objects or all objects in a graph is a common task in
solving many day-to-day and scientific problems. Shortest path algorithms find application in
many fields such as social networks, bioinformatics, aviation, routing protocols, and Google
Maps. They can be classified into two types: single-source shortest paths and all-pair
shortest paths. There are several algorithms for finding the all-pair shortest paths, among
them the Floyd-Warshall algorithm and Johnson's algorithm.

The all-pair shortest path algorithm considered here, also known as the Floyd-Warshall
algorithm, was developed in 1962 by Robert Floyd [1]. It follows the dynamic-programming
methodology. The algorithm is used for graph analysis and finds the shortest path lengths
between all pairs of vertices (nodes) in a graph. The input is a weighted directed graph whose
edges may have positive or negative weights (but no negative cycles). The algorithm as used
here returns only the shortest path lengths; it does not reconstruct the actual paths as
sequences of nodes.

Sequential algorithm

    for k = 0 to N-1
        for i = 0 to N-1
            for j = 0 to N-1
                I_ij(k+1) = min(I_ij(k), I_ik(k) + I_kj(k))
            endfor
        endfor
    endfor

    The Floyd-Warshall sequential algorithm [1]
The sequential pseudo-code given above requires N^3 comparisons. For each value of k, the
index of the intermediate node currently being considered, the algorithm compares the
distance between nodes i and j that passes through node k with the current distance between
i and j, and keeps the minimum of the two. After the last iteration this minimum is the
shortest distance between nodes i and j. The time complexity of the algorithm is Θ(N^3) and
its space complexity is Θ(N^2). The algorithm takes the adjacency matrix as input and
incrementally improves its estimate of the shortest path between every pair of nodes until the
path lengths are minimal.
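
For concreteness, a minimal C sketch of this sequential kernel is shown below. It is
illustrative rather than the exact project code: it assumes the distance matrix is stored as a
single row-major int array of size N*N, with the value 999 used for "no connection" as in
Table 1.

    /* In-place Floyd-Warshall on an n x n distance matrix stored row-major. */
    void floyd_warshall_seq(int *dist, int n)
    {
        for (int k = 0; k < n; k++)           /* intermediate node considered */
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    int through_k = dist[i * n + k] + dist[k * n + j];
                    if (through_k < dist[i * n + j])
                        dist[i * n + j] = through_k;   /* keep the shorter path */
                }
    }

After the call, dist[i*n + j] holds the length of the shortest path from node i to node j.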

In this project I have implemented the parallel version of the all-pair shortest path algorithm
in both MPI and OpenMP. The results show that the parallel version gives speedup over the
sequential one, but the benefits are most noticeable for large datasets.



2. Problem and Solution


Parallelizing the all-pair shortest path algorithm is a challenging task because the problem is
dynamic in nature: each iteration of the outer k-loop depends on the result of the previous
one. The algorithm can nevertheless be parallelized by using a one-dimensional, row-wise
decomposition of the intermediate matrix I. This decomposition allows the use of at most N
processors. Each task executes the parallel pseudo-code stated below.

      0     1   999     1     5
      9     0     3     2   999
    999   999     0     4   999
    999   999     2     0     3
      3   999   999   999     0

    Table 1: 5-node graph represented as a 5*5 adjacency matrix.
    Nodes with no connection have weight 999.
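
For illustration, this matrix can be written directly as a C array, with a named constant
standing in for the 999 "no connection" weight (a sketch only; in the project the matrix is
generated randomly or read from a file instead):

    #define N 5
    #define NO_EDGE 999   /* weight used when two nodes are not connected */

    int adj[N][N] = {
        {       0,       1, NO_EDGE,       1,       5 },
        {       9,       0,       3,       2, NO_EDGE },
        { NO_EDGE, NO_EDGE,       0,       4, NO_EDGE },
        { NO_EDGE, NO_EDGE,       2,       0,       3 },
        {       3, NO_EDGE, NO_EDGE, NO_EDGE,       0 },
    };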




Parallel algorithm

    for k = 0 to N-1
        for i = local_i_start to local_i_end
            for j = 0 to N-1
                I_ij(k+1) = min(I_ij(k), I_ik(k) + I_kj(k))
            endfor
        endfor
    endfor

    The Floyd-Warshall parallel algorithm [3]

Here local_i_start and local_i_end are the row indexes assigned to a task by the partition of
the adjacency matrix, i.e. each task receives N/P consecutive rows.

In the kth step of the algorithm each task or processor requires, in addition to its local block
of rows, the kth row of the same matrix I. Hence, the task currently holding the kth row must
broadcast it to all other tasks. This broadcast can be performed along a tree structure in
log P steps, and a message of N values is broadcast N times in total. The overall time
complexity is the sum of computation and communication: roughly N^3/P for the computation
plus N log P (t_s + t_w N) for the communication, where t_s is the message startup time and
t_w is the per-word transfer time [3].
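
A minimal MPI sketch of this scheme follows. It is an illustration under simplifying
assumptions (N divisible by the number of processes P, each process holding a contiguous
block of N/P rows); the function and variable names are not taken from the project code.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* dist: this process's block of rows (n/p rows of length n, row-major). */
    void floyd_warshall_mpi(int *dist, int n, int rank, int p)
    {
        int rows_per_proc = n / p;                 /* assumes n divisible by p */
        int local_i_start = rank * rows_per_proc;
        int *kth_row = malloc(n * sizeof(int));    /* buffer for the broadcast row */

        for (int k = 0; k < n; k++) {
            int owner = k / rows_per_proc;         /* process that holds row k */
            if (rank == owner)
                memcpy(kth_row, &dist[(k - local_i_start) * n], n * sizeof(int));

            /* tree broadcast of the kth row: log p communication steps */
            MPI_Bcast(kth_row, n, MPI_INT, owner, MPI_COMM_WORLD);

            for (int i = 0; i < rows_per_proc; i++)
                for (int j = 0; j < n; j++) {
                    int through_k = dist[i * n + k] + kth_row[j];
                    if (through_k < dist[i * n + j])
                        dist[i * n + j] = through_k;
                }
        }
        free(kth_row);
    }

Each process updates only its own rows; the only communication in every k-iteration is the
broadcast of row k, which matches the communication term in the complexity above.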




Figure 1: Parallel version of Floyd's algorithm based on a one-dimensional decomposition of the I matrix. In
 (a), the data allocated to a single task are shaded: a contiguous block of rows. In (b), the data required by
           this task in the kth step of the algorithm are shaded: its own block and the kth row ([3])




3. Implementation


For the implementation I have used both MPI and OpenMP. The sequential and parallel (MPI
and OpenMP) programs are written in C. For creating the adjacency matrix I have used a
random number generator; both the MPI and OpenMP implementations also work with a file as
input. In terms of hardware, the Helium cluster in the Computer Science department of the
University of Manitoba was used.
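
The two pieces just mentioned are sketched below in C: a randomly generated adjacency
matrix and the OpenMP variant, which parallelizes the i-loop of the kernel across threads (the
outer k-loop cannot be parallelized because each iteration depends on the previous one). The
weight range and the names are illustrative assumptions, not the exact project code.

    #include <stdlib.h>
    #include <omp.h>

    /* Fill an n x n matrix with random edge weights in 1..20 and a zero diagonal
     * (the weight range is an illustrative assumption). */
    void random_adjacency(int *dist, int n)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                dist[i * n + j] = (i == j) ? 0 : rand() % 20 + 1;
    }

    /* OpenMP Floyd-Warshall: all threads share the matrix and split the rows. */
    void floyd_warshall_omp(int *dist, int n)
    {
        for (int k = 0; k < n; k++) {
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    int through_k = dist[i * n + k] + dist[k * n + j];
                    if (through_k < dist[i * n + j])
                        dist[i * n + j] = through_k;
                }
        }
    }

With a zero diagonal, row k and column k are not modified during iteration k, so the threads
never overwrite a value that another thread still needs to read within the same iteration.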

There are a total of six nodes: one head node and five computing nodes. The head node is a
Sun Fire X4200 machine powered by one dual-core AMD Opteron 252 2.6 GHz processor with
2 GB of RAM. All nodes run the CentOS 5 Linux distribution as the operating system. Each of
the five computing nodes features eight dual-core AMD Opteron 885 2.6 GHz processors, for a
maximum of 80 cores in total. They all follow the ccNUMA SMP (cache-coherent Non-Uniform
Memory Access, Symmetric Multiprocessing) computing model. Each computing node has
32 GB of RAM.


4. Results


The graphs below show execution time versus number of processes for different data sizes,
speedup versus number of processes for a fixed load, and efficiency versus number of
processes for a fixed load. Figures 2-4 show the results for the MPI implementation,
Figures 5-7 the results for the OpenMP implementation, and Figures 8-10 the performance
differences between the OpenMP and MPI implementations.
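
Speedup and efficiency are used here in their usual sense: for P processes or threads,
Speedup(P) = T_sequential / T_parallel(P) and Efficiency(P) = Speedup(P) / P.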




[Chart: execution time in seconds vs. number of processors (2 to 80), one curve per data size
N = 64, 256, 512, 1024, 2048]

Figure 2: Execution Time vs. No. of Processors for different data sizes


[Chart: speedup vs. number of processors (2 to 80), N = 2048]

Figure 3: Speedup vs. No. of Processors for node size of 2048




[Chart: efficiency (%) vs. number of processes (2 to 80), N = 2048]

Figure 4: Efficiency vs. No. of Processes for node size of 2048


[Chart: execution time in seconds vs. number of threads (2 to 48), one curve per data size
N = 64, 256, 1024, 2048]

Figure 5: Execution Time vs. No. of Threads for different graph sizes


[Chart: speedup vs. number of threads (2 to 48), N = 2048]

Figure 6: Speedup vs. No. of Threads for node size of 2048


[Chart: efficiency (%) vs. number of threads (2 to 48), N = 2048]

Figure 7: Efficiency vs. No. of Threads for node size of 2048

[Chart: execution time in seconds vs. number of processors/threads (2 to 48), OpenMP vs. MPI,
N = 2048]

Figure 8: Execution Time vs. No. of Threads/Processes (OpenMP and MPI)

[Chart: speedup vs. number of processors/threads (2 to 48), OpenMP vs. MPI, N = 2048]

Figure 9: Speedup vs. No. of Threads/Processes (OpenMP and MPI)
[Chart: efficiency (%) vs. number of processes/threads (2 to 48), OpenMP vs. MPI, N = 2048]

Figure 10: Efficiency vs. No. of Threads/Processes (OpenMP and MPI)



5. Evaluation and Analysis


As can be seen from Figure 2, the execution time for a node size of 2048 declines sharply as
the number of processes increases, and it flattens out to a nearly constant value by 80
processes. For the small size of 256 nodes the execution time rises with the number of
processes. This is because the communication time outweighs the computation time for small
node sizes: with more processes the data becomes more fragmented, and hence more
communication is needed. Figure 3 shows that speedup increases with the number of
processors but levels off towards the end for a fixed node size. There is no speedup until the
graph size reaches 256 nodes. Figure 4 shows that efficiency decreases as the number of
processes increases for a fixed load, which is again explained by the additional communication
and the larger number of data partitions.

Figure 5 shows that execution time increases with the data size, but that increasing the
number of threads does not reduce it; instead the execution time stabilizes around 85 seconds.
Figure 6 shows that the maximum speedup is 2.04, reached with 32 threads for a fixed graph
of 2048 nodes; speedup increases at first but then flattens out. The maximum efficiency,
96.26%, is reached with two threads for a dataset size of 2048 (Figure 7), and efficiency is
only 4.25% with 48 threads.



Comparing the performance of MPI and OpenMP, the graphs show that MPI is more suitable
for the all-pair shortest path algorithm. Figure 8 shows a clear difference in execution times.
The OpenMP speedup stays close to two irrespective of the number of threads, while the MPI
speedup keeps increasing (Figure 9). In terms of efficiency, OpenMP degrades quickly as the
number of threads grows, while MPI still maintains some efficiency as the number of
processes increases. Overall, MPI is more efficient than OpenMP.



6. Conclusions and Future Work


From these observations I believe that the parallel implementation of the all-pair shortest
path algorithm is best used for large datasets or graphs; speedup is only observable with large
adjacency matrices. The results also make it clear that the MPI implementation is better than
the OpenMP implementation for this algorithm. I think the reason is MPI's distributed-memory
approach, in which each process works on its partition of the dataset in fast local memory,
compared with the shared-memory access of OpenMP.

For future work I would like to analyze the parallel all-pair shortest path algorithm on
real-life datasets, such as publicly available social-network graphs. I would also like to
implement a hybrid MPI+OpenMP version of the algorithm.




7. References


  1. Robert W. Floyd. 1962. Algorithm 97: Shortest Path. Commun. ACM 5, 6 (June 1962), 345.
     DOI=10.1145/367766.368168 http://doi.acm.org/10.1145/367766.368168
  2. Floyd-Warshall algorithm, http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm,
     last accessed 30 October 2011
  3. Case Study: Shortest-Path Algorithms,
     http://www.mcs.anl.gov/~itf/dbpp/text/node35.html#figgrfl3, last accessed 30 October 2011



