Chapter 3: Principles of Scalable Performance

•   Performance measures
•   Speedup laws
•   Scalability principles
•   Scaling up vs. scaling down



                EENG-630 Chapter 3   1
Performance metrics and measures

•   Parallelism profiles
•   Asymptotic speedup factor
•   System efficiency, utilization and quality
•   Standard performance measures



Degree of parallelism (DOP)
• Reflects the extent to which software parallelism
  matches hardware parallelism
• Discrete time function: for each time period,
  the number of processors used
• Parallelism profile is a plot of the DOP as a
  function of time
• The ideal profile assumes unlimited resources

Factors affecting parallelism profiles
•   Algorithm structure
•   Program optimization
•   Resource utilization
•   Run-time conditions
•   Realistically limited by # of available
    processors, memory, and other
    nonprocessor resources

Average parallelism variables
• n – homogeneous processors
• m – maximum parallelism in a profile
• Δ – computing capacity of a single
  processor (execution rate only, no
  overhead)
• DOP=i – # processors busy during an
  observation period

Average parallelism
• Total amount of work performed is
  proportional to the area under the profile
  curve
$$ W = \Delta \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt $$

$$ W = \Delta \sum_{i=1}^{m} i \cdot t_i $$

  where t_i is the total time during which DOP = i

Average parallelism

$$ A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt $$

$$ A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right) $$

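The discrete forms of W and A above can be checked with a short Python sketch. The profile values and the function names are hypothetical, chosen only for illustration; the profile is assumed to be given as a mapping from DOP i to the total time t_i spent at that DOP.

```python
from typing import Dict


def total_work(profile: Dict[int, float], delta: float = 1.0) -> float:
    """W = delta * sum(i * t_i): the area under the parallelism profile."""
    return delta * sum(i * t for i, t in profile.items())


def average_parallelism(profile: Dict[int, float]) -> float:
    """A = sum(i * t_i) / sum(t_i)."""
    total_time = sum(profile.values())
    if total_time <= 0:
        raise ValueError("profile must cover a positive amount of time")
    return sum(i * t for i, t in profile.items()) / total_time


# Hypothetical profile: 2 time units at DOP 1, 3 at DOP 2, 1 at DOP 4.
profile = {1: 2.0, 2: 3.0, 4: 1.0}
print(total_work(profile))           # 1*2 + 2*3 + 4*1 = 12.0
print(average_parallelism(profile))  # 12 / 6 = 2.0
```
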
Example: parallelism profile and average parallelism

Asymptotic speedup
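$$ T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta} $$

$$ T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\Delta} $$

  (T(1) and T(∞) are response times)

$$ S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i} = A \text{ in the ideal case} $$
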
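A minimal Python sketch of the asymptotic speedup, assuming the workload is given as a mapping from DOP i to W_i. The numbers are hypothetical: they correspond to W_i = i·Δ·t_i with Δ = 1 and times t = (2, 3, 1) at DOP 1, 2 and 4, so the result should equal the average parallelism of that profile.

```python
from typing import Dict


def asymptotic_speedup(work: Dict[int, float], delta: float = 1.0) -> float:
    """S_inf = T(1) / T(inf) for work[i] = W_i executed at DOP i."""
    t_one = sum(w / delta for w in work.values())             # T(1)
    t_inf = sum(w / (i * delta) for i, w in work.items())     # T(inf)
    return t_one / t_inf


# Hypothetical workload: W_1 = 2, W_2 = 6, W_4 = 4.
work = {1: 2.0, 2: 6.0, 4: 4.0}
print(asymptotic_speedup(work))  # 12 / (2 + 3 + 1) = 2.0
```

Note that Δ cancels in the ratio, so the speedup depends only on how the work is distributed over the DOP values.
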
Performance measures
• Consider n processors executing m
  programs in various modes
• Want to define the mean performance of
  these multimode computers:
  – Arithmetic mean performance
  – Geometric mean performance
  – Harmonic mean performance


Arithmetic mean performance
Arithmetic mean execution rate (assumes equal weighting):

$$ R_a = \sum_{i=1}^{m} R_i / m $$

Weighted arithmetic mean execution rate:

$$ R_a^* = \sum_{i=1}^{m} f_i R_i $$

     - proportional to the sum of the inverses of
       execution times

Geometric mean performance
Geometric mean execution rate:

$$ R_g = \prod_{i=1}^{m} R_i^{1/m} $$

Weighted geometric mean execution rate:

$$ R_g^* = \prod_{i=1}^{m} R_i^{f_i} $$

  - does not summarize the real performance since it does
    not have the inverse relation with the total time

Harmonic mean performance

Mean execution time per instruction for program i:

$$ T_i = 1 / R_i $$

Arithmetic mean execution time per instruction:

$$ T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i} $$

Harmonic mean performance
Harmonic mean execution rate:

$$ R_h = 1 / T_a = \frac{m}{\sum_{i=1}^{m} (1 / R_i)} $$

Weighted harmonic mean execution rate:

$$ R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)} $$

  - corresponds to total # of operations divided by
    the total time (closest to the real performance)

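The three means can be compared side by side with a small Python sketch. The rates are hypothetical, the function names are illustrative, and the weights f_i are assumed to sum to 1 (equal weights 1/m when omitted).

```python
import math
from typing import List, Optional, Sequence


def _weights(m: int, weights: Optional[Sequence[float]]) -> List[float]:
    """Return f_i; default to equal weighting 1/m."""
    if weights is None:
        return [1.0 / m] * m
    if len(weights) != m or not math.isclose(sum(weights), 1.0):
        raise ValueError("need one weight per rate, summing to 1")
    return list(weights)


def arithmetic_mean(rates: Sequence[float], weights=None) -> float:
    f = _weights(len(rates), weights)
    return sum(fi * ri for fi, ri in zip(f, rates))


def geometric_mean(rates: Sequence[float], weights=None) -> float:
    f = _weights(len(rates), weights)
    return math.prod(ri ** fi for fi, ri in zip(f, rates))


def harmonic_mean(rates: Sequence[float], weights=None) -> float:
    f = _weights(len(rates), weights)
    return 1.0 / sum(fi / ri for fi, ri in zip(f, rates))


# Hypothetical rates for two programs, e.g. 1 and 4 million instructions/s.
rates = [1.0, 4.0]
print(arithmetic_mean(rates))  # about 2.5
print(geometric_mean(rates))   # about 2.0
print(harmonic_mean(rates))    # about 1.6
```

If both programs execute one million instructions, the total time is 1 + 0.25 = 1.25 s for two million instructions, i.e. 1.6 million instructions per second, which is the harmonic mean.
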
Harmonic Mean Speedup
• Ties the various modes of a program to the
  number of processors used
• Program is in mode i if i processors are used
• Sequential execution time T_1 = 1/R_1 = 1

$$ S = T_1 / T^* = \frac{1}{\sum_{i=1}^{n} f_i / R_i} $$

Harmonic Mean Speedup Performance

Amdahl’s Law
• Assume R_i = i, w = (α, 0, 0, …, 0, 1 − α)
• System is either sequential, with probability
  α, or fully parallel, with probability 1 − α

$$ S_n = \frac{n}{1 + (n - 1)\alpha} $$

• Implies S_n → 1/α as n → ∞

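As a quick check, Amdahl's formula can be recovered numerically from the harmonic mean speedup S = 1 / Σ f_i/R_i by plugging in R_i = i and the weight vector (α, 0, …, 0, 1 − α). The sketch below uses hypothetical values n = 10 and α = 0.1; the function names are illustrative.

```python
from typing import Sequence


def harmonic_mean_speedup(f: Sequence[float], rates: Sequence[float]) -> float:
    """S = 1 / sum(f_i / R_i), assuming T_1 = 1/R_1 = 1."""
    if len(f) != len(rates):
        raise ValueError("need one weight per mode")
    return 1.0 / sum(fi / ri for fi, ri in zip(f, rates))


def amdahl_speedup(n: int, alpha: float) -> float:
    """S_n = n / (1 + (n - 1) * alpha)."""
    return n / (1.0 + (n - 1) * alpha)


n, alpha = 10, 0.1                                    # hypothetical values, n >= 2
weights = [alpha] + [0.0] * (n - 2) + [1.0 - alpha]   # w = (alpha, 0, ..., 0, 1 - alpha)
rates = [float(i) for i in range(1, n + 1)]           # R_i = i

print(harmonic_mean_speedup(weights, rates))  # about 5.26
print(amdahl_speedup(n, alpha))               # about 5.26
print(amdahl_speedup(10**6, alpha))           # approaches 1/alpha = 10
```
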
Speedup Performance




System Efficiency
• O(n) is the total # of unit operations
• T(n) is execution time in unit time steps
• T(n) < O(n) and T(1) = O(1)

$$ S(n) = T(1) / T(n) $$

$$ E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)} $$

Redundancy and Utilization
• Redundancy signifies the extent of
  matching software and hardware parallelism

$$ R(n) = O(n) / O(1) $$

• Utilization indicates the percentage of
  resources kept busy during execution

$$ U(n) = R(n) E(n) = \frac{O(n)}{n\,T(n)} $$

Quality of Parallelism
• Directly proportional to the speedup and
  efficiency and inversely related to the
  redundancy
• Upper-bounded by the speedup S(n)

$$ Q(n) = \frac{S(n) E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)} $$

Example of Performance
• Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 log n,
  and T(n) = 4n^3/(n + 3)
  S(n)    =     (n + 3)/4
  E(n)    =     (n + 3)/(4n)
  R(n)    =     (n + log n)/n
  U(n)    =     (n + 3)(n + log n)/(4n^2)
  Q(n)    =     (n + 3)^2 / (16(n + log n))

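The closed forms in this example can be verified numerically from the definitions of S, E, R, U and Q. The Python sketch below is illustrative only; it assumes a base-2 logarithm (the slide does not state the base, and the identities hold for any base as long as the same one is used throughout).

```python
import math


def performance_metrics(n, t1, tn, o1, on):
    """Return (S, E, R, U, Q) computed from the definitions."""
    s = t1 / tn      # speedup     S(n) = T(1) / T(n)
    e = s / n        # efficiency  E(n) = S(n) / n
    r = on / o1      # redundancy  R(n) = O(n) / O(1)
    u = r * e        # utilization U(n) = R(n) E(n)
    q = s * e / r    # quality     Q(n) = S(n) E(n) / R(n)
    return s, e, r, u, q


def example_metrics(n):
    """Metrics for the workload given in the example."""
    lg = math.log2(n)                  # assumption: log means log base 2
    t1 = o1 = n ** 3
    on = n ** 3 + n ** 2 * lg
    tn = 4 * n ** 3 / (n + 3)
    return performance_metrics(n, t1, tn, o1, on)


def closed_forms(n):
    """The closed-form expressions listed in the example."""
    lg = math.log2(n)
    return ((n + 3) / 4,
            (n + 3) / (4 * n),
            (n + lg) / n,
            (n + 3) * (n + lg) / (4 * n ** 2),
            (n + 3) ** 2 / (16 * (n + lg)))


for n in (2, 8, 64):
    print(n, example_metrics(n), closed_forms(n))
```

For n = 8 both computations give S ≈ 2.75, E ≈ 0.344, R = 1.375, U ≈ 0.473 and Q ≈ 0.688.
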
Standard Performance Measures
• MIPS and Mflops
  – Depends on instruction set and program used
• Dhrystone results
  – Measure of integer performance
• Whetstone results
  – Measure of floating-point performance
• TPS and KLIPS ratings
  – Transaction performance and reasoning power

Parallel Processing Applications
•   Drug design
•   High-speed civil transport
•   Ocean modeling
•   Ozone depletion research
•   Air pollution
•   Digital anatomy


Application Models for Parallel Computers
• Fixed-load model
  – Constant workload
• Fixed-time model
  – Demands constant program execution time
• Fixed-memory model
  – Limited by the memory bound



Algorithm Characteristics
• Deterministic vs. nondeterministic
• Computational granularity
• Parallelism profile
• Communication patterns and
  synchronization requirements
• Uniformity of operations
• Memory requirement and data structures
Isoefficiency Concept
• Relates workload to machine size n needed
  to maintain a fixed efficiency
$$ E = \frac{w(s)}{w(s) + h(s, n)} $$

  where w(s) is the workload and h(s, n) is the overhead

• The smaller the power of n in the isoefficiency
  function, the more scalable the system

Isoefficiency Function
• To maintain a constant E, w(s) should grow
  in proportion to h(s,n)

$$ w(s) = \frac{E}{1 - E} \times h(s, n) $$

• C = E/(1 − E) is constant for fixed E

$$ f_E(n) = C \times h(s, n) $$

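A minimal Python sketch of the isoefficiency relation: given a target efficiency E and an overhead h, it computes the workload w = C × h that keeps E fixed, then confirms the efficiency. The overhead model h(n) = n log2 n is purely hypothetical and stands in for whatever overhead a real algorithm and machine would exhibit.

```python
import math


def efficiency(workload: float, overhead: float) -> float:
    """E = w / (w + h)."""
    return workload / (workload + overhead)


def required_workload(target_efficiency: float, overhead: float) -> float:
    """w = C * h with C = E / (1 - E)."""
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target efficiency must lie strictly between 0 and 1")
    c = target_efficiency / (1.0 - target_efficiency)
    return c * overhead


def hypothetical_overhead(n: int) -> float:
    """Assumed overhead model h(n) = n * log2(n); not from the slides."""
    return n * math.log2(n)


for n in (2, 4, 8, 16):
    h = hypothetical_overhead(n)
    w = required_workload(0.8, h)
    # The workload grows with h while the efficiency stays at about 0.8.
    print(n, w, efficiency(w, h))
```
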
Speedup Performance Laws
• Amdahl’s law
  – for fixed workload or fixed problem size
• Gustafson’s law
  – for scaled problems (problem size increases
    with increased machine size)
• Memory-bounded speedup model
  – for scaled problems bounded by memory
    capacity

Amdahl’s Law
• As the # of processors increases, the fixed load
  is distributed to more processors
• Minimal turnaround time is primary goal
• Speedup factor is upper-bounded by a
  sequential bottleneck
• Two cases:
   DOP < n
   DOP ≥ n

Fixed Load Speedup Factor
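• Case 1: DOP = i > n

$$ t_i(n) = \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil $$

• Case 2: DOP = i < n

$$ t_i(n) = t_i(\infty) = \frac{W_i}{i\Delta} $$

• Since ⌈i/n⌉ = 1 whenever i ≤ n, both cases combine into

$$ T(n) = \sum_{i=1}^{m} \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil $$

$$ S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil} $$
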
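The fixed-load speedup can be sketched in a few lines of Python. The workload values are hypothetical and the function name is illustrative; Δ is omitted because it cancels in the ratio T(1)/T(n).

```python
from typing import Dict


def fixed_load_speedup(work: Dict[int, float], n: int) -> float:
    """S_n = sum(W_i) / sum((W_i / i) * ceil(i / n)) for work[i] = W_i."""
    if n < 1:
        raise ValueError("need at least one processor")
    t_one = sum(work.values())
    # -(-i // n) is an integer ceiling of i / n without float rounding.
    t_n = sum((w / i) * -(-i // n) for i, w in work.items())
    return t_one / t_n


# Hypothetical workload: W_1 = 2, W_2 = 6, W_4 = 4.
work = {1: 2.0, 2: 6.0, 4: 4.0}
for n in (1, 2, 4, 8):
    print(n, fixed_load_speedup(work, n))
```

With this workload the speedup is 1 on one processor, 12/7 on two, and saturates at 2 once n reaches the maximum DOP of 4; adding more processors does not help.
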
Gustafson’s Law
• With Amdahl’s Law, the workload cannot
  scale to match the available computing
  power as n increases
• Gustafson’s Law fixes the time, allowing
  the problem size to increase with higher n
• Not saving time, but increasing accuracy


Fixed-time Speedup
• As the machine size increases, the workload
  increases and the parallelism profile changes
• In general, W_i' > W_i for 2 ≤ i ≤ m' and
  W_1' = W_1
• Assume T(1) = T'(n)

Gustafson’s Scaled Speedup

$$ \sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n) $$

$$ S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + nW_n}{W_1 + W_n} $$

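The step from the general sum to the closed form can be spelled out as a short derivation. It is a sketch under two assumptions that the slide does not state explicitly: the program runs only in the sequential mode (i = 1) and the fully parallel mode (i = n), and the term Q(n), read here as overhead, is taken to be zero.

$$ W_1' = W_1, \qquad W_n' = nW_n, \qquad W_i' = 0 \ \text{otherwise} $$

$$ T'(n) = \frac{W_1'}{1} \left\lceil \frac{1}{n} \right\rceil + \frac{W_n'}{n} \left\lceil \frac{n}{n} \right\rceil = W_1 + W_n = T(1) $$

$$ S_n' = \frac{W_1' + W_n'}{W_1 + W_n} = \frac{W_1 + nW_n}{W_1 + W_n} $$

With normalized W_1 = α and W_n = 1 − α this becomes S_n' = n − α(n − 1), which grows linearly in n instead of saturating at 1/α.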
Memory Bounded Speedup Model
• Idea is to solve largest problem, limited by
  memory space
• Results in a scaled workload and higher accuracy
• Each node can handle only a small subproblem for
  distributed memory
• Using a large # of nodes collectively increases the
  memory capacity proportionally


Fixed-Memory Speedup
• Let M be the memory requirement and W
  the computational workload: W = g(M)
• g*(nM) = G(n) g(M) = G(n) W_n

$$ S_n^* = \frac{\sum_{i=1}^{m^*} W_i^*}{\sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)} = \frac{W_1 + G(n) W_n}{W_1 + G(n) W_n / n} $$

Relating Speedup Models
• G(n) reflects the increase in workload as
  memory increases n times
• G(n) = 1 : Fixed problem size (Amdahl)
• G(n) = n : Workload increases n times when
  memory increased n times (Gustafson)
• G(n) > n : workload increases faster than
  the memory requirement

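The three cases can be compared with one Python function for the closed form S*_n = (W_1 + G(n)W_n) / (W_1 + G(n)W_n/n). The workload values and the growth function G(n) = n^1.5 in the third case are hypothetical, picked only to illustrate G(n) > n.

```python
from typing import Callable


def memory_bounded_speedup(w1: float, wn: float, n: int,
                           growth: Callable[[int], float]) -> float:
    """S*_n = (W_1 + G(n) W_n) / (W_1 + G(n) W_n / n)."""
    g = growth(n)
    return (w1 + g * wn) / (w1 + g * wn / n)


# Hypothetical workload: 10% sequential, 90% parallel, on 10 processors.
w1, wn, n = 1.0, 9.0, 10

amdahl = memory_bounded_speedup(w1, wn, n, lambda k: 1.0)        # G(n) = 1
gustafson = memory_bounded_speedup(w1, wn, n, lambda k: float(k))  # G(n) = n
faster = memory_bounded_speedup(w1, wn, n, lambda k: k ** 1.5)   # G(n) > n (assumed)

print(amdahl)     # about 5.26, the fixed-load (Amdahl) speedup
print(gustafson)  # about 9.1, the fixed-time (Gustafson) speedup
print(faster)     # larger still
```
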
Scalability Metrics
• Machine size (n) : # of processors
• Clock rate (f) : determines basic machine cycle
• Problem size (s) : amount of computational
  workload. Directly proportional to T(s,1).
• CPU time (T(s,n)) : actual CPU time for
  execution
• I/O demand (d) : demand in moving the
  program, data, and results for a given run
Scalability Metrics
• Memory capacity (m) : max # of memory words
  demanded
• Communication overhead (h(s,n)) : amount of
  time for interprocessor communication,
  synchronization, etc.
• Computer cost (c) : total cost of hardware and
  software resources required
• Programming overhead (p) : development
  overhead associated with an application program

Speedup and Efficiency
• The problem size is the independent
  parameter
$$ S(s, n) = \frac{T(s, 1)}{T(s, n) + h(s, n)} $$

$$ E(s, n) = \frac{S(s, n)}{n} $$

Scalable Systems
• Ideally, if E(s,n)=1 for all algorithms and
  any s and n, system is scalable
• Practically, consider the scalability of a machine
  (subscript I denotes the ideal machine)

$$ \Phi(s, n) = \frac{S(s, n)}{S_I(s, n)} = \frac{T_I(s, n)}{T(s, n)} $$

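The speedup, efficiency and scalability definitions can be tried out with a short Python sketch. All timing values are hypothetical. The sketch assumes that the ideal machine has no overhead and that the real machine's time is its compute time plus overhead, so that the ratio of speedups equals the ratio of times.

```python
def speedup(t_seq: float, t_par: float, overhead: float) -> float:
    """S(s, n) = T(s, 1) / (T(s, n) + h(s, n))."""
    return t_seq / (t_par + overhead)


def efficiency(t_seq: float, t_par: float, overhead: float, n: int) -> float:
    """E(s, n) = S(s, n) / n."""
    return speedup(t_seq, t_par, overhead) / n


def scalability(s_real: float, s_ideal: float) -> float:
    """Phi(s, n) = S(s, n) / S_I(s, n)."""
    return s_real / s_ideal


# Hypothetical run: 100 time units sequentially; on n = 4 processors,
# 25 units of computation plus 5 units of communication overhead.
t_seq, t_par, h, n = 100.0, 25.0, 5.0, 4

s_real = speedup(t_seq, t_par, h)       # 100 / 30
s_ideal = speedup(t_seq, t_par, 0.0)    # 100 / 25 on the assumed ideal machine
print(s_real, efficiency(t_seq, t_par, h, n))
print(scalability(s_real, s_ideal))     # equals T_I / T = 25 / 30
```
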

Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Parallel computing chapter 3

  • 1. Chapter 3: Principles of Scalable Performance • Performance measures • Speedup laws • Scalability principles • Scaling up vs. scaling down EENG-630 Chapter 3 1
  • 2. Performance metrics and measures • Parallelism profiles • Asymptotic speedup factor • System efficiency, utilization and quality • Standard performance measures
  • 3. Degree of parallelism • Reflects the matching of software and hardware parallelism • Discrete time function – measure, for each time period, the # of processors used • Parallelism profile is a plot of the DOP as a function of time • Ideally have unlimited resources
  • 4. Factors affecting parallelism profiles • Algorithm structure • Program optimization • Resource utilization • Run-time conditions • Realistically limited by # of available processors, memory, and other nonprocessor resources
  • 5. Average parallelism variables • n – homogeneous processors • m – maximum parallelism in a profile • $\Delta$ – computing capacity of a single processor (execution rate only, no overhead) • DOP = i – # of processors busy during an observation period
  • 6. Average parallelism • Total amount of work performed is proportional to the area under the profile curve • $W = \Delta \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt$ • $W = \Delta \sum_{i=1}^{m} i \cdot t_i$
  • 7. Average parallelism • $A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt$ • $A = \left( \sum_{i=1}^{m} i \cdot t_i \right) / \left( \sum_{i=1}^{m} t_i \right)$
  • 8. Example: parallelism profile and average parallelism
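  • 9. Asymptotic speedup • Response time on one processor: $T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}$ • Response time with unlimited processors: $T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\Delta}$ • $S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i} = A$ in the ideal case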
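As a rough illustration of the average parallelism and asymptotic speedup formulas, the sketch below evaluates both for a made-up parallelism profile and shows that they agree. The profile values, the function names, and the choice of Δ = 1 are assumptions for illustration only; they do not come from the slides.

```python
def average_parallelism(profile):
    """A = (sum of i * t_i) / (sum of t_i); profile maps DOP i -> time t_i."""
    total_time = sum(profile.values())
    if total_time <= 0:
        raise ValueError("profile must cover a positive time span")
    return sum(i * t for i, t in profile.items()) / total_time


def asymptotic_speedup(profile, delta=1.0):
    """S_inf = T(1) / T(inf), with W_i = delta * i * t_i."""
    work = {i: delta * i * t for i, t in profile.items()}
    t_one = sum(w / delta for w in work.values())             # one processor
    t_inf = sum(w / (i * delta) for i, w in work.items())     # unlimited processors
    return t_one / t_inf


# Hypothetical profile: 2 time units at DOP 1, 3 at DOP 2, 4 at DOP 4, 1 at DOP 8.
profile = {1: 2.0, 2: 3.0, 4: 4.0, 8: 1.0}
print(average_parallelism(profile))
print(asymptotic_speedup(profile))
```

Both calls return the same value here because, with unlimited processors and no overhead, the time spent at each DOP is exactly the t_i of the profile.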
  • 10. Performance measures • Consider n processors executing m programs in various modes • Want to define the mean performance of these multimode computers: – Arithmetic mean performance – Geometric mean performance – Harmonic mean performance
  • 11. Arithmetic mean performance • Arithmetic mean execution rate (assumes equal weighting): $R_a = \sum_{i=1}^{m} R_i / m$ • Weighted arithmetic mean execution rate: $R_a^* = \sum_{i=1}^{m} f_i R_i$ • Proportional to the sum of the inverses of execution times
  • 12. Geometric mean performance • Geometric mean execution rate: $R_g = \prod_{i=1}^{m} R_i^{1/m}$ • Weighted geometric mean execution rate: $R_g^* = \prod_{i=1}^{m} R_i^{f_i}$ • Does not summarize the real performance since it does not have the inverse relation with the total time
  • 13. Harmonic mean performance • Mean execution time per instruction for program i: $T_i = 1 / R_i$ • Arithmetic mean execution time per instruction: $T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}$
  • 14. Harmonic mean performance • Harmonic mean execution rate: $R_h = 1 / T_a = \frac{m}{\sum_{i=1}^{m} (1 / R_i)}$ • Weighted harmonic mean execution rate: $R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)}$ • Corresponds to total # of operations divided by the total time (closest to the real performance)
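To make the difference between the three means concrete, here is a minimal sketch that compares them against the rate actually achieved when every program executes the same number of instructions. The two rates and the instruction count are invented for illustration, and the function names are not from the slides.

```python
import math


def arithmetic_mean_rate(rates):
    return sum(rates) / len(rates)


def geometric_mean_rate(rates):
    return math.prod(rates) ** (1.0 / len(rates))


def harmonic_mean_rate(rates):
    return len(rates) / sum(1.0 / r for r in rates)


# Hypothetical execution rates (e.g. in MIPS) of two programs.
rates = [10.0, 100.0]
instructions_each = 1000.0  # assumed equal instruction count per program

# Rate actually achieved: total operations divided by total time.
total_time = sum(instructions_each / r for r in rates)
true_rate = len(rates) * instructions_each / total_time

print(arithmetic_mean_rate(rates), geometric_mean_rate(rates),
      harmonic_mean_rate(rates), true_rate)
```

Under the equal-instruction-count assumption, only the harmonic mean coincides with the achieved rate; the arithmetic and geometric means both overstate it.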
  • 15. Harmonic Mean Speedup • Ties the various modes of a program to the number of processors used • Program is in mode i if i processors used • Sequential execution time $T_1 = 1 / R_1 = 1$ • $S = T_1 / T^* = \frac{1}{\sum_{i=1}^{n} f_i / R_i}$
  • 16. Harmonic Mean Speedup Performance
  • 17. Amdahl’s Law • Assume $R_i = i$, $w = (\alpha, 0, 0, \ldots, 0, 1 - \alpha)$ • System is either sequential, with probability $\alpha$, or fully parallel with prob. $1 - \alpha$ • $S_n = \frac{n}{1 + (n - 1)\alpha}$ • Implies $S_n \to 1/\alpha$ as $n \to \infty$
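The step from the harmonic mean speedup to Amdahl's law can be checked numerically. The sketch below plugs the weight vector (α, 0, …, 0, 1 − α) with R_i = i into the harmonic mean speedup and compares the result with the closed form. The function names and the values n = 16, α = 0.1 are illustrative assumptions.

```python
def harmonic_mean_speedup(weights):
    """S = 1 / sum(f_i / R_i) with R_i = i; weights[i - 1] holds f_i."""
    return 1.0 / sum(f / i for i, f in enumerate(weights, start=1))


def amdahl_speedup(n, alpha):
    """Closed form S_n = n / (1 + (n - 1) * alpha)."""
    return n / (1.0 + (n - 1) * alpha)


n, alpha = 16, 0.1
# Sequential with probability alpha, fully parallel (mode n) otherwise.
weights = [alpha] + [0.0] * (n - 2) + [1.0 - alpha]

print(harmonic_mean_speedup(weights))
print(amdahl_speedup(n, alpha))
print(amdahl_speedup(10**9, alpha))  # approaches 1 / alpha for very large n
```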
  • 18. Speedup Performance
  • 19. System Efficiency • O(n) is the total # of unit operations • T(n) is execution time in unit time steps • T(n) < O(n) and T(1) = O(1) • $S(n) = T(1) / T(n)$ • $E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}$
  • 20. Redundancy and Utilization • Redundancy signifies the extent of matching software and hardware parallelism: $R(n) = O(n) / O(1)$ • Utilization indicates the percentage of resources kept busy during execution: $U(n) = R(n) E(n) = \frac{O(n)}{n\,T(n)}$
  • 21. Quality of Parallelism • Directly proportional to the speedup and efficiency and inversely related to the redundancy • Upper-bounded by the speedup S(n) • $Q(n) = \frac{S(n) E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}$
  • 22. Example of Performance • Given $O(1) = T(1) = n^3$, $O(n) = n^3 + n^2 \log n$, and $T(n) = 4n^3 / (n + 3)$ • $S(n) = (n + 3) / 4$ • $E(n) = (n + 3) / (4n)$ • $R(n) = (n + \log n) / n$ • $U(n) = (n + 3)(n + \log n) / (4n^2)$ • $Q(n) = (n + 3)^2 / (16(n + \log n))$
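The closed forms in this example can be reproduced from the definitions of speedup, efficiency, redundancy, utilization and quality. The following sketch does so for n = 8. The base of the logarithm is not stated on the slide, so base 2 is assumed here; the function and variable names are likewise illustrative.

```python
import math


def performance_metrics(n, t1, tn, o1, on):
    """Return (S, E, R, U, Q) from the definitions on the preceding slides."""
    s = t1 / tn          # speedup
    e = s / n            # efficiency
    r = on / o1          # redundancy
    u = r * e            # utilization
    q = s * e / r        # quality of parallelism
    return s, e, r, u, q


n = 8
log_n = math.log2(n)                 # assumption: log taken to base 2
t1 = o1 = n ** 3
on = n ** 3 + n ** 2 * log_n
tn = 4 * n ** 3 / (n + 3)

s, e, r, u, q = performance_metrics(n, t1, tn, o1, on)

# Closed forms from the slide, for comparison.
closed = (
    (n + 3) / 4,
    (n + 3) / (4 * n),
    (n + log_n) / n,
    (n + 3) * (n + log_n) / (4 * n ** 2),
    (n + 3) ** 2 / (16 * (n + log_n)),
)
print((s, e, r, u, q))
print(closed)
```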
  • 23. Standard Performance Measures • MIPS and Mflops – Depend on instruction set and program used • Dhrystone results – Measure of integer performance • Whetstone results – Measure of floating-point performance • TPS and KLIPS ratings – Transaction performance and reasoning power
  • 24. Parallel Processing Applications • Drug design • High-speed civil transport • Ocean modeling • Ozone depletion research • Air pollution • Digital anatomy
  • 25. Application Models for Parallel Computers • Fixed-load model – Constant workload • Fixed-time model – Demands constant program execution time • Fixed-memory model – Limited by the memory bound
  • 26. Algorithm Characteristics • Deterministic vs. nondeterministic • Computational granularity • Parallelism profile • Communication patterns and synchronization requirements • Uniformity of operations • Memory requirement and data structures
  • 27. Isoefficiency Concept • Relates workload to machine size n needed to maintain a fixed efficiency • With workload w(s) and overhead h(s, n): $E = \frac{w(s)}{w(s) + h(s, n)}$ • The smaller the power of n, the more scalable the system
  • 28. Isoefficiency Function • To maintain a constant E, w(s) should grow in proportion to h(s,n): $w(s) = \frac{E}{1 - E} \times h(s, n)$ • C = E/(1 − E) is constant for fixed E: $f_E(n) = C \times h(s, n)$
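A small sketch of the isoefficiency relation: given a target efficiency and an overhead function, compute the workload that keeps efficiency constant as the machine grows. The overhead model h(s, n) = n log2 n is a purely hypothetical choice made for this example, as are the function names and the target of 0.8.

```python
import math


def efficiency(w, h):
    """E = w / (w + h) for workload w and overhead h."""
    return w / (w + h)


def isoefficiency_workload(target_e, h):
    """Workload needed to hold efficiency at target_e: w = E / (1 - E) * h."""
    if not 0.0 < target_e < 1.0:
        raise ValueError("target efficiency must lie strictly between 0 and 1")
    return target_e / (1.0 - target_e) * h


def overhead(n):
    """Hypothetical overhead model, assumed independent of problem size."""
    return n * math.log2(n)


target = 0.8
workloads = {n: isoefficiency_workload(target, overhead(n)) for n in (2, 4, 8, 16)}
for n, w in workloads.items():
    print(n, w, efficiency(w, overhead(n)))
```

With this overhead model the required workload grows as n log n, which is the kind of growth rate the isoefficiency function is meant to expose.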
  • 29. Speedup Performance Laws • Amdahl’s law – for fixed workload or fixed problem size • Gustafson’s law – for scaled problems (problem size increases with increased machine size) • Memory-bounded speedup model – for scaled problems bounded by memory capacity
  • 30. Amdahl’s Law • As the # of processors increases, the fixed load is distributed to more processors • Minimal turnaround time is primary goal • Speedup factor is upper-bounded by a sequential bottleneck • Two cases: – DOP < n – DOP ≥ n
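  • 31. Fixed Load Speedup Factor • Case 1: DOP = i ≥ n: $t_i(n) = \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil$ • Case 2: DOP = i < n: $t_i(n) = t_i(\infty) = \frac{W_i}{i\Delta}$ • $T(n) = \sum_{i=1}^{m} \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil$ • $S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil}$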
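The fixed-load speedup formula, including the ceiling term that covers both cases, can be sketched as follows. The workload values are invented for illustration, and communication and other overheads are ignored, as in the formula itself.

```python
def fixed_load_speedup(work, n):
    """S_n = sum(W_i) / sum((W_i / i) * ceil(i / n)); work maps DOP i -> W_i."""
    t_one = sum(work.values())
    # -(-i // n) is integer ceiling division, i.e. ceil(i / n).
    t_n = sum(w / i * -(-i // n) for i, w in work.items())
    return t_one / t_n


# Hypothetical workload: W_i is the work done while the DOP equals i.
work = {1: 2.0, 2: 6.0, 4: 16.0, 8: 8.0}

for n in (1, 2, 4, 8, 16):
    print(n, fixed_load_speedup(work, n))
```

For i < n the ceiling equals 1, so the term reduces to the unlimited-processor time; once n reaches the maximum DOP in the profile, adding processors no longer changes the speedup.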
  • 32. Gustafson’s Law • With Amdahl’s Law, the workload cannot scale to match the available computing power as n increases • Gustafson’s Law fixes the time, allowing the problem size to increase with higher n • Not saving time, but increasing accuracy
  • 33. Fixed-time Speedup • As the machine size increases, have increased workload and new profile • In general, $W_i' > W_i$ for $2 \le i \le m'$ and $W_1' = W_1$ • Assume $T(1) = T'(n)$
  • 34. Gustafson’s Scaled Speedup • $\sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)$ • $S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + n W_n}{W_1 + W_n}$
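A minimal sketch of the fixed-time idea for a two-mode workload (a serial part W_1 and a fully parallel part W_n): scale only the parallel part by n, confirm that the scaled problem on n processors takes as long as the original on one, and compute the scaled speedup. The overhead Q(n) is assumed to be zero, and the names and values are illustrative.

```python
def scaled_workload(w1, wn, n):
    """Fixed-time scaling: the serial part stays, the parallel part grows n times."""
    return w1, n * wn


def execution_time(w1, wn, n, delta=1.0):
    """Two-mode time: serial work on one processor, parallel work spread over n."""
    return w1 / delta + wn / (n * delta)


def gustafson_speedup(w1, wn, n):
    """S'_n = (W_1 + n * W_n) / (W_1 + W_n), overhead neglected."""
    w1_scaled, wn_scaled = scaled_workload(w1, wn, n)
    return (w1_scaled + wn_scaled) / (w1 + wn)


w1, wn, n = 0.1, 0.9, 64
w1_scaled, wn_scaled = scaled_workload(w1, wn, n)

print(execution_time(w1, wn, 1))                 # original problem, 1 processor
print(execution_time(w1_scaled, wn_scaled, n))   # scaled problem, n processors
print(gustafson_speedup(w1, wn, n))
```

The two execution times match, which is the fixed-time condition; the speedup then grows almost linearly in n instead of saturating at 1/W_1.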
  • 35. Memory Bounded Speedup Model • Idea is to solve largest problem, limited by memory space • Results in a scaled workload and higher accuracy • Each node can handle only a small subproblem for distributed memory • Using a large # of nodes collectively increases the memory capacity proportionally
  • 36. Fixed-Memory Speedup • Let M be the memory requirement and W the computational workload: W = g(M) • $g^*(nM) = G(n)\,g(M) = G(n)\,W_n$ • $S_n^* = \frac{\sum_{i=1}^{m^*} W_i^*}{\sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)} = \frac{W_1 + G(n) W_n}{W_1 + G(n) W_n / n}$
  • 37. Relating Speedup Models • G(n) reflects the increase in workload as memory increases n times • G(n) = 1 : Fixed problem size (Amdahl) • G(n) = n : Workload increases n times when memory increased n times (Gustafson) • G(n) > n : Workload increases faster than the memory requirement
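The relationship between the three models can be seen by evaluating the two-mode fixed-memory formula with different choices of G(n). In the sketch below, G(n) = n^1.5 is a hypothetical example of workload growing faster than memory; it and the workload fractions are assumptions for illustration, and the overhead Q(n) is taken as zero.

```python
def memory_bounded_speedup(w1, wn, n, growth):
    """S*_n = (W_1 + G(n) * W_n) / (W_1 + G(n) * W_n / n), overhead neglected."""
    g = growth(n)
    return (w1 + g * wn) / (w1 + g * wn / n)


w1, wn, n = 0.1, 0.9, 16

amdahl = memory_bounded_speedup(w1, wn, n, lambda k: 1.0)            # G(n) = 1
gustafson = memory_bounded_speedup(w1, wn, n, lambda k: float(k))    # G(n) = n
faster = memory_bounded_speedup(w1, wn, n, lambda k: k ** 1.5)       # G(n) > n

print(amdahl, gustafson, faster)
```

With G(n) = 1 the expression collapses to the fixed-load form n / (1 + (n − 1)α) when W_1 + W_n = 1 and α = W_1, and with G(n) = n it collapses to the fixed-time form (W_1 + n W_n) / (W_1 + W_n).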
  • 38. Scalability Metrics • Machine size (n) : # of processors • Clock rate (f) : determines basic machine cycle • Problem size (s) : amount of computational workload. Directly proportional to T(s,1). • CPU time (T(s,n)) : actual CPU time for execution • I/O demand (d) : demand in moving the program, data, and results for a given run
  • 39. Scalability Metrics • Memory capacity (m) : max # of memory words demanded • Communication overhead (h(s,n)) : amount of time for interprocessor communication, synchronization, etc. • Computer cost (c) : total cost of hardware and software resources required • Programming overhead (p) : development overhead associated with an application program
  • 40. Speedup and Efficiency • The problem size is the independent parameter • $S(s, n) = \frac{T(s, 1)}{T(s, n) + h(s, n)}$ • $E(s, n) = \frac{S(s, n)}{n}$
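As a brief numerical illustration of how overhead lowers speedup and efficiency, the sketch below evaluates both with and without an overhead term. The timing values are made up for the example.

```python
def speedup(t_seq, t_par, overhead):
    """S(s, n) = T(s, 1) / (T(s, n) + h(s, n))."""
    return t_seq / (t_par + overhead)


def efficiency(t_seq, t_par, overhead, n):
    """E(s, n) = S(s, n) / n."""
    return speedup(t_seq, t_par, overhead) / n


# Hypothetical run: 100 time units sequentially, 25 on n = 4 processors.
n, t_seq, t_par = 4, 100.0, 25.0

print(speedup(t_seq, t_par, 0.0), efficiency(t_seq, t_par, 0.0, n))   # no overhead
print(speedup(t_seq, t_par, 5.0), efficiency(t_seq, t_par, 5.0, n))   # h = 5
```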
  • 41. Scalable Systems • Ideally, if E(s,n) = 1 for all algorithms and any s and n, system is scalable • Practically, consider the scalability of a machine: $\Phi(s, n) = \frac{S(s, n)}{S_I(s, n)} = \frac{T_I(s, n)}{T(s, n)}$