Predicting the Time of Oblivious BSP
González J.A. (1), León C. (1), Piccoli F. (2), Printista M. (2), Roda J.L. (1), Rodríguez C. (1), Sande F. (1)
(1) Dpto. de Estadística, Investigación Operativa y Computación, Universidad de La Laguna, Tenerife, Canary Islands, Spain
(2) Universidad Nacional de San Luis, Ejército de los Andes 950, San Luis, Argentina
Outline
Bulk Synchronous Parallel Model (BSP)
[Figure: processing nodes, each with a microprocessor, cache memory, DRAM memory and network interface, connected through an interconnection network]
Oblivious BSP Model (OBSP)
h_PS: OBSP packet size
$$T(h) = \begin{cases} g\,h + L_b, & h \ge h_{PS} \\ g_0\,h + L_{b0}, & h < h_{PS} \end{cases}$$
[Figure: communication time T(h) against message size h, with slope g and intercept L_b for h ≥ h_PS, and slope g_0 and intercept L_b0 for h < h_PS]
Paderborn University BSP Library
Reference: Olaf Bonorden, Ben Juurlink, Ingo von Otte, Ingo Rieping. The Paderborn University BSP (PUB) Library - Design, Implementation and Performance. 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP), San Juan, Puerto Rico, April 12 - April 16, 1999.
OBSP Cost Analysis
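BSP Model vs OBSP Model
BSP: $T_{BSP} = 4w + 2(g\,h + L)$, for $h \ge h_{PS}$
OBSP: $\Phi_{2,i} = 3w + 2(g\,h + L_b)$, for $h \ge h_{PS}$
[Figure: BSP and OBSP timelines for processors P0 and P1, with computation blocks w and 2w, communication g*h, and synchronisation cost L (BSP) or L_b (OBSP)]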
FFT Analysis using the OBSP Model
[Figure: FFT execution on processors P0 to P3, with phases labelled Division, bsp_partition, seq_fft, Combination and bsp_done. Machines: X^(0) = {0,1,2,3}; X_0^(1) = {0,1}, X_1^(1) = {2,3}; X_k^(2) = {k}, k = 0,..,3. Superstep annotations: w_{1,i}, g*h_{1,i} + L_b and w_{2,i} for the outer machine, with the corresponding superscripted terms for the submachines]
OBSP Prediction Accuracy
[Table: OBSP parameter values on the CRAY T3E. g is in bytes per second]
[Table: Real and OBSP predicted time for the FFT algorithm on the CRAY T3E]
[Table: Real and OBSP predicted time for the RAP algorithm on the CRAY T3E]
Labels recovered from the slide: N=1000, M=1000; N=2048; p=16
PBS 209152 Items. CRAY T3E
Conclusions & Future Works
OBSP Cost Analysis Example
[Figure: timeline for processors P0 and P1 with computation blocks w and 2w, communication g*h, and oblivious latency L_b]
BSP Cost Analysis Example
[Figure: timeline for processors P0 and P1 with computation blocks w and 2w, communication g*h, and synchronisation latency L]

PREDICTING THE TIME OF OBLIVIOUS PROGRAMS. Euromicro 2001


Editor's Notes

  1. Good afternoon, ladies and gentlemen. In this paper we propose a parallel computing model that extends the well-known Bulk Synchronous Parallel (BSP) model to algorithms that do not require global barrier synchronisation, and that handles newer programming features such as processor-partition operations and oblivious synchronisation. This last feature gives the model its name: the Oblivious BSP (OBSP).
  2. The presentation starts with a brief introduction to the BSP model, after which I will present the Oblivious BSP model. A methodology for predicting execution time is shown using a trivial example. I will then show the preliminary results obtained using the OBSP model to predict the execution time of two algorithms: the FFT, an example of data parallelism, and the RAP, which is solved by a communication-intensive pipeline algorithm. To conclude, I will mention current and future work along this line.
  3. The Bulk Synchronous Parallel model was proposed by Prof. Valiant in 1990. It considers a parallel machine made up of a set of p processors with private memory, interconnected through a global communication network, together with a mechanism for synchronising the processors. The BSP model is characterised by the following parameters: the communication gap g, defined as the time to transmit a unit packet, which reflects the per-processor bandwidth; and the latency L, which is the time needed to synchronise all processors. Both values depend on the number of processors p. A BSP computation is organised into supersteps, each consisting of local computation, inter-process communication, and a global synchronisation. The execution time of a superstep s is the sum of three terms: the largest amount of work performed by any processor during the superstep, w_s; the communication time g*h_s, where h_s is the largest number of packets sent or received by any processor during the superstep; and the time L required by the global synchronisation.
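The superstep cost just described can be sketched in a few lines of Python. This is only an illustration: the function names `bsp_superstep_time` and `bsp_total_time` and all numeric values are assumptions made for the example, not part of the talk or of any BSP library.

```python
def bsp_superstep_time(work, packets, g, L):
    """Cost of one BSP superstep: w_s + g * h_s + L.

    work[i]    -- local work done by processor i in this superstep
    packets[i] -- packets sent or received by processor i in this superstep
    g, L       -- BSP gap and synchronisation latency (illustrative values)
    """
    w_s = max(work)      # slowest processor bounds the computation phase
    h_s = max(packets)   # busiest processor bounds the communication phase
    return w_s + g * h_s + L


def bsp_total_time(supersteps, g, L):
    """Total time of a BSP program given as a list of (work, packets) pairs."""
    return sum(bsp_superstep_time(w, h, g, L) for w, h in supersteps)


# Two processors, two supersteps: one processor does w = 10, the other 2w = 20,
# and the roles swap in the second superstep; each superstep moves h = 4 packets.
steps = [([10, 20], [4, 4]), ([20, 10], [4, 4])]
total = bsp_total_time(steps, g=2, L=5)  # 4*w + 2*(g*h + L)
```

Because every superstep ends in a global barrier, each one is charged the work of its slowest processor, regardless of which processor that is.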
  4. The OBSP model extends the BSP model to deal with oblivious synchronisation and processor-partition operations. When the number of messages that a processor is due to receive in a superstep is known, a zero-cost synchronisation mechanism can be used to reduce the synchronisation overhead. An oblivious synchronisation blocks a processor until the expected number of messages has been received. A partition operation splits the current set of processors into several subsets, each of which acts as an autonomous BSP machine with its own processor numbering and synchronisation points. The communication capabilities of the OBSP machine are characterised by the following parameters: the gap g; the synchronising latency L; the oblivious latency L_b; and the special values g_0 and L_b0 for small packet sizes.
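The OBSP slide gives the communication time as a piecewise-linear function of the message size h, switching from (g_0, L_b0) to (g, L_b) at the packet size h_PS. A minimal sketch of that function follows; the `OBSPParams` container, the function name and the parameter values are hypothetical and are not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OBSPParams:
    """Hypothetical container for the OBSP communication parameters."""
    g: float      # gap for messages of size >= h_ps
    L_b: float    # oblivious latency for messages of size >= h_ps
    g0: float     # gap for small messages
    L_b0: float   # oblivious latency for small messages
    h_ps: float   # OBSP packet size (threshold between the two regimes)


def obsp_comm_time(h, params):
    """T(h) = g*h + L_b if h >= h_PS, else g_0*h + L_b0."""
    if h >= params.h_ps:
        return params.g * h + params.L_b
    return params.g0 * h + params.L_b0


# Illustrative values only.
params = OBSPParams(g=2.0, L_b=10.0, g0=3.0, L_b0=1.0, h_ps=8)
large = obsp_comm_time(8, params)   # uses g and L_b
small = obsp_comm_time(4, params)   # uses g_0 and L_b0
```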
  5. The Paderborn University BSP library (PUB) is a parallel C library based on the BSP model. In addition to the most common BSP features, PUB provides routines for oblivious synchronisation, partition operations, and collective communications.
  6. In an OBSP prediction analysis we assume that: 1) supersteps are numbered starting at 1; 2) all processors perform the same number of supersteps, R; and 3) because processors can be in different supersteps at the same time, a processor in its superstep s can send a message to another processor that is still in a previous superstep. The system ensures that the communication does not take effect until the receiving processor finishes its superstep s. Instead of using a global barrier, the OBSP model defines the incoming partners of each processor, Ω_{s,i}, as the set of processors that send a message to that processor, together with the processor itself. h_{s,i} denotes the maximum number of packets communicated by a processor. Φ_{s,i} denotes the time at which processor i finishes superstep s, and is given by the recursive formulas on the slide. When a partition operation is performed, this scheme is applied recursively within each submachine.
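The recursive formulas themselves did not survive in this transcript. The sketch below therefore uses an assumed recurrence, chosen because it reproduces the value Φ_{2,i} = 3*w + 2*(g*h + L_b) that the slides give for the two-processor example: Φ_{s,i} = max over j in Ω_{s,i} of (Φ_{s-1,j} + w_{s,j}) + g*h_{s,i} + L_b, with Φ_{0,i} = 0. The function name `obsp_times` and the input layout are hypothetical.

```python
def obsp_times(work, senders, h, g, L_b):
    """Finishing times phi[s][i] under the ASSUMED recurrence
    phi[s][i] = max_{j in omega} (phi[s-1][j] + work[s][j]) + g*h[s][i] + L_b.

    work[s][i]    -- local work of processor i in superstep s (0-based lists)
    senders[s][i] -- set of processors that send a message to i in superstep s
    h[s][i]       -- packets communicated, as seen by processor i
    """
    p = len(work[0])
    prev = [0.0] * p          # phi for "superstep 0": everyone starts at time 0
    phi = []
    for w_s, send_s, h_s in zip(work, senders, h):
        cur = []
        for i in range(p):
            omega = set(send_s[i]) | {i}   # incoming partners plus i itself
            # i can only finish once every partner has computed and sent
            ready = max(prev[j] + w_s[j] for j in omega)
            cur.append(ready + g * h_s[i] + L_b)
        phi.append(cur)
        prev = cur
    return phi


# Two-processor example: P0 works w then 2w, P1 works 2w then w;
# P0 sends to P1 in the first superstep, P1 sends to P0 in the second.
w, hh, g, L_b = 10, 4, 2, 1
phi = obsp_times(
    work=[[w, 2 * w], [2 * w, w]],
    senders=[[set(), {0}], [{1}, set()]],
    h=[[hh, hh], [hh, hh]],
    g=g,
    L_b=L_b,
)
```

With these inputs both processors finish their second superstep at 3*w + 2*(g*h + L_b), which is the only check available against the source; for other communication patterns the recurrence should be treated as unverified.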
  7. In this slide I compare both execution models using a trivial example. In the first superstep one processor performs local computation and sends a message to the other processor, which has twice as much work to do. Then they synchronise, and the second superstep is the symmetric case. Using the BSP model, the maximum amount of local computation in each superstep is 2w, so the total computing time is given by T_BSP = 4*w + 2*(g*h + L). Using the OBSP model, the first processor can proceed to the second superstep while the second processor is still in the first one. The system buffers the message until the receiving processor is ready to receive it. This overlap reduces the total execution time to Φ_{2,i} = 3*w + 2*(g*h + L_b).
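As a short worked step, subtracting the two totals shown on the slide makes the source of the saving explicit. This is plain arithmetic on the slide's formulas and assumes the same w, g and h in both models:

```latex
\begin{aligned}
T_{BSP} - \Phi_{2,i}
  &= \bigl[4w + 2(g\,h + L)\bigr] - \bigl[3w + 2(g\,h + L_b)\bigr] \\
  &= w + 2\,(L - L_b)
\end{aligned}
```

One unit of work w is recovered by overlapping the two supersteps, and each of the two synchronisations costs L_b instead of L.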
  8. This figure represents the FFT execution under the OBSP model. Coloured blocks correspond to local computation, and black blocks denote inter-processor communication. The blue lines on the right denote the supersteps performed by a machine X^(j), while the black lines mark the computing and communication parts of every superstep. In the original set of processors, each processor performs some local computation that includes a partition into two subsets, which solve the transforms of the odd and even components. This partition process continues until only one processor remains in each submachine. Each of these innermost submachines performs a single superstep to compute a sequential transform, and then rejoins the outer machine. The local computation in the first superstep includes the work performed by the inner submachine. The superstep finishes with a data exchange, and the second superstep consists of combining the transformed odd and even signals.
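The divide, transform and combine structure described here can be mimicked sequentially. The sketch below is not the authors' program and makes no PUB calls: `partitioned_fft` and `combine` are hypothetical names, each "submachine" is just a recursive call, and the data exchange between the two halves is implicit. Only `seq_fft` borrows its name from a label on the slide.

```python
import cmath


def combine(even, odd):
    """Butterfly step: merge the transforms of the even and odd components."""
    n = 2 * len(even)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def seq_fft(x):
    """Sequential radix-2 FFT; len(x) must be a power of two."""
    if len(x) == 1:
        return [complex(x[0])]
    return combine(seq_fft(x[0::2]), seq_fft(x[1::2]))


def partitioned_fft(x, p):
    """Simulate the recursive partition of p 'processors' (p a power of two,
    p <= len(x)). Each split hands the even and odd components to one half
    of the machine; a single-processor submachine runs seq_fft."""
    if p == 1:
        return seq_fft(x)                       # innermost submachine
    even = partitioned_fft(x[0::2], p // 2)     # first half of the machine
    odd = partitioned_fft(x[1::2], p // 2)      # second half of the machine
    return combine(even, odd)                   # second superstep: combination


signal = [1, 2, 3, 4, 0, -1, 2.5, 7]
spectrum = partitioned_fft(signal, p=4)
```

In a real OBSP run the two recursive calls would execute concurrently on disjoint processor subsets, which is what the model's recursive cost analysis accounts for.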
  9. Preliminary results have been obtained on a CRAY T3E. The first table shows the values of the model parameters for this machine. Note that the values for small packet sizes are not available. The second table shows the measured time and the OBSP predicted time for the FFT algorithm with an input vector of 2 million elements. The prediction accuracy is quite good: percentage errors are below 3% for the overall algorithm. After this paper was accepted, some experiments were carried out with a fine-grain, communication-intensive pipeline algorithm that solves the RAP. Percentage errors are larger than in the previous example, but we point out that this algorithm uses small message sizes and that the model parameters used are g and L_b, since the small-packet values g_0 and L_b0 were not available.
  11. In conclusion, we have proposed a new parallel computing model that extends the BSP model to work with oblivious synchronisation and partition operations. Preliminary results show that its prediction accuracy is as good as that of the BSP model. In future work we want to obtain the parameter values for small message sizes, and to extend the analysis to other algorithms and parallel platforms.
  12. In the first superstep, processor 1 has twice as much work to do as processor 0. Processor 1 receives a message from processor 0, so its Ω set includes both processors. If h is the amount of communicated data, the Φ values for each processor are ... Processor 0 starts its second superstep while processor 1 is still in the previous one. The system buffers the message to ensure that it is delivered when the receiving processor asks for it. Processor 1 has less work to do in the second superstep, so it sends the message back and finishes.