For Dummies
From a Dummy


Ngobrol Ilmiah PPIS #1
16 December 2012
M. Alfian Amrizal
Tohoku University
• Introduction to Parallel Computing
• GPU as an Accelerator




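Classical science: Nature, Observation, Theory, Physical Experiments
Modern science: the same, plus Numerical Simulations

[Figure: diagram of the two approaches; photo of the SX-9 supercomputer (Tohoku University). Image sources: blogs.sundaymercury.net, conserve-energy-future.com]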
• Quantum chemistry
• Cosmology
• CFD (computational fluid dynamics)
• Medicine
• Material design

[Image sources: scidacreview.org, physicsworld.com, autoevolution.com, albertkents.com, solid.me.tut.ac.jp]
• Supercomputer
         –      The most powerful computers that can be built [2]
         –      First computer, “ENIAC” ⇒ 350 multiplications/sec (1946)
         –      Today's supercomputers: > 1,000,000,000 x ENIAC
         –      Today's processor speed: only ~ 1,000,000 x ENIAC (?)

         ⇒ “Parallel computing” makes up the difference

[Image sources: cbc.ca, datacenterknowledge.com, allvoices.com]
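The gap between the two figures on this slide can be worked out directly. The sketch below only restates the slide's rough orders of magnitude (including the figure the slide itself marks with a question mark); the function name `parallelism_factor` is a hypothetical helper introduced for illustration.

```python
# Rough figures quoted on the slide (orders of magnitude only).
SUPERCOMPUTER_VS_ENIAC = 1_000_000_000   # "> 1,000,000,000 x ENIAC"
PROCESSOR_VS_ENIAC = 1_000_000           # "~ 1,000,000 x ENIAC (?)"


def parallelism_factor(system_speedup: int, processor_speedup: int) -> float:
    """Speedup that a faster single processor alone cannot explain."""
    return system_speedup / processor_speedup


factor = parallelism_factor(SUPERCOMPUTER_VS_ENIAC, PROCESSOR_VS_ENIAC)
print(f"Left for parallelism to explain: about {factor:,.0f}x")
```

Under these numbers, roughly a factor of a thousand has to come from using many processors at once rather than from making one processor faster.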
CPU: The brain of the
computer; all data is
processed here

Memory: The computer's
scratch pad; programs
are loaded and run here

GPU: For graphics
processing; used as an
accelerator in HPC

Storage: Holds data
and program files
• The free lunch is over!!
  – Heat
  – Power restrictions
  – Transistor size
  ⇒ CPUs aren't getting any faster
• Multicomputers: distributed-memory parallel computers
• Multicore: shared-memory parallel computers
  (e.g. dual core, quad core, etc.)

[Figure labels: Core1, Core2]
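The difference between the two memory models can be sketched in a few lines. This is an illustration under stated assumptions, not how a real cluster is programmed: both variants use Python threads on one machine, and a `queue.Queue` merely stands in for the network of a multicomputer. All function names are hypothetical.

```python
import queue
import threading

DATA = list(range(1, 101))   # 1..100
NUM_WORKERS = 4


def chunks(data, n):
    """Split data into up to n contiguous parts of nearly equal size."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_all(worker, parts):
    """Start one thread per part and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_sum(data, n):
    """Multicore style: every worker updates one shared variable."""
    total = [0]                  # shared state visible to all threads
    lock = threading.Lock()      # protects the shared total

    def worker(part):
        partial = sum(part)
        with lock:
            total[0] += partial

    run_all(worker, chunks(data, n))
    return total[0]


def message_passing_sum(data, n):
    """Multicomputer style: workers share nothing and send their results."""
    mailbox = queue.Queue()      # stand-in for the interconnect network
    parts = chunks(data, n)

    def worker(part):
        mailbox.put(sum(part))   # explicit "send" of a private result

    run_all(worker, parts)
    return sum(mailbox.get() for _ in parts)


print(shared_memory_sum(DATA, NUM_WORKERS), message_passing_sum(DATA, NUM_WORKERS))
```

In the shared-memory version the programmer has to guard the common variable; in the message-passing version the programmer has to move the data explicitly.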
• Trends in HPC system design
     –    More nodes/processors/cores
     –    Deep memory hierarchies
     –    Non-uniform interconnect network
     –    Accelerators (today's topic)

[Figure: two system diagrams whose boxes are labelled N, P, C, M, and A (apparently nodes, processors, cores, memory, and accelerators)]
          Good old days! One proc. / node, one core / proc., uniform network…
          Today: too complicated … How can we fully exploit the potential?
• Programmers need to learn both hardware and software

[Figure credit: Markus Pueschel]
• We need powerful computers
• CPU speed can no longer be increased
• Go parallel:
  – Multicomputer
  – Multicore
• System complexity requires programmers
  to learn both hardware and software
• Introduction to Parallel Computing
• GPU as an Accelerator
• Power is the problem
  – System size is limited by the power budget
• Heterogeneous systems are promising
  – CPU + accelerator (= GPU)
  – CPUs and GPUs each have their own strengths and
    weaknesses
  – CPU: few cores, high frequency (~GHz)
  – GPU: ~1000 cores, lower frequency (~hundreds of MHz)
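A back-of-the-envelope calculation shows why many slow cores can still win on raw throughput. The core counts and clock rates below are hypothetical round numbers chosen to match the "few cores, ~GHz" versus "1000 cores, ~MHz" contrast; they do not describe any specific product, and one operation per core per cycle is a simplifying assumption.

```python
def peak_ops_per_second(cores: int, clock_hz: float, ops_per_cycle: int = 1) -> float:
    """Idealized peak rate: every core completes ops_per_cycle operations each cycle."""
    return cores * clock_hz * ops_per_cycle


# Hypothetical example values, not measurements.
cpu_peak = peak_ops_per_second(cores=8, clock_hz=3.0e9)       # few fast cores
gpu_peak = peak_ops_per_second(cores=1000, clock_hz=700e6)    # many slow cores

print(f"CPU peak: {cpu_peak:.2e} ops/s")
print(f"GPU peak: {gpu_peak:.2e} ops/s")
print(f"GPU / CPU: {gpu_peak / cpu_peak:.1f}x")
```

The peak figure is only reachable when the work can actually be spread over all cores, which is the subject of the following slides.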
• Graphics Processing Unit (GPU)
      – Originally developed for quickly generating 2D and
        3D graphics, images, and video
      – Highly parallel processor
      – GPUs are more power-efficient than CPUs [3]

*Image from nvidia.com
• CPU and GPU are very different
  processors
  – CPU: latency-oriented design (= speculative)
  – GPU: throughput-oriented design (= parallel)

[Figure: CPU vs GPU]
[Figure: timelines. CPU: task 1, task 2, task 3, task 4 placed one after another along the time axis. GPU: task 1 to task 4 stacked, running at the same time.]
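The CPU and GPU timelines can be turned into a small model that separates latency (when the first task finishes) from throughput (when all tasks are finished). The task durations are hypothetical; the only assumption is that a GPU core takes longer per task than a CPU core.

```python
def serial_finish_times(durations):
    """One core runs the tasks back to back; returns each task's finish time."""
    finish, clock = [], 0
    for d in durations:
        clock += d
        finish.append(clock)
    return finish


def parallel_finish_times(durations):
    """One core per task, all starting at time 0; each finishes after its own duration."""
    return list(durations)


CPU_TASK_TIME = 1   # hypothetical: a fast core
GPU_TASK_TIME = 3   # hypothetical: a slower core

cpu = serial_finish_times([CPU_TASK_TIME] * 4)
gpu = parallel_finish_times([GPU_TASK_TIME] * 4)

print("first task done: CPU", min(cpu), "GPU", min(gpu))
print("all tasks done:  CPU", max(cpu), "GPU", max(gpu))
```

With these numbers the CPU delivers the first result sooner, while the GPU finishes the whole batch sooner.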
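• Speculative execution by branch prediction is
      effective in shortening execution time, but
      it makes the hardware complicated

    int A = 2;
    int B = 3;
    int C = A + B;
    int D = A * B;
    int E = A - B;
    if ( C > 4 )    /* outcome is unknown until C has been computed */
    {
        A = 0;      /* run speculatively if the branch is predicted taken */
    }
    B = 0;

[Figure: pipeline holding E, D, and C, followed by a "?" slot for the instruction after the branch]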
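To make "branch prediction" concrete, here is a minimal sketch of a 2-bit saturating counter, a classic textbook predictor. It is an assumption chosen for illustration; the slide does not say which scheme a particular CPU uses, and the branch history below is invented.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 2):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor) -> int:
    """Replay a branch history and count how often the guess was wrong."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical history for a branch like `if ( C > 4 )`:
# taken nine times in a row, then not taken once.
history = [True] * 9 + [False]
print(count_mispredictions(history, TwoBitPredictor()))
```

Each wrong guess means speculatively executed work has to be thrown away, which is part of the hardware complexity the slide refers to.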
• CPUs have large cache memories and
  control units
• GPUs devote more hardware resources
  to ALUs
• Many simple cores
  – No speculation features
     • Simple cores make it possible to put more cores on a chip
     • Fast context switches, thanks to the simple core design

[Figure: timeline for GPU Core A. While one thread waits on a memory access, a context switch lets another thread's computation ("comp.") run, so computation and memory access overlap in time.]
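The benefit of fast context switching can be estimated with a simple model. It assumes every thread alternates a fixed computation phase with a fixed memory wait and that switching threads costs nothing; the cycle counts are hypothetical.

```python
def core_utilization(threads: int, comp_cycles: int, mem_cycles: int) -> float:
    """Fraction of time the core computes when `threads` threads are interleaved."""
    return min(1.0, threads * comp_cycles / (comp_cycles + mem_cycles))


def threads_to_hide_latency(comp_cycles: int, mem_cycles: int) -> int:
    """Smallest thread count that keeps the core busy (ceiling division)."""
    return -(-(comp_cycles + mem_cycles) // comp_cycles)


COMP = 10   # hypothetical cycles of computation per phase
MEM = 90    # hypothetical cycles spent waiting on memory per phase

for n in (1, 2, 5, 10):
    print(n, "thread(s):", f"{core_utilization(n, COMP, MEM):.0%}", "busy")
print("threads needed:", threads_to_hide_latency(COMP, MEM))
```

With a single thread the core idles most of the time; with enough threads to switch between, the memory waits are hidden behind other threads' computation.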
• CPU and GPU are very different
  processors
  – They have their own strengths and weaknesses
    • CPUs have a few big cores to shorten execution
      time
    • GPUs have many simple cores to increase
      throughput
  – CPU for serial execution, GPU for parallel
    execution
[1] Levin, E. “Grand challenges to computational
science.” Communications of the ACM
32(12):1456-1457, December 1989.

[2] Kaufmann, William J. III, and Larry L. Smarr.
Supercomputing and the Transformation of Science.

[3] Nvidia. “Doing more with less of a scarce
resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html

Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies
