Massively Parallel Computing
                          CS 264 / CSCI E-292
Lecture #1: Introduction | January 25th, 2011




                Nicolas Pinto (MIT, Harvard)
                       pinto@mit.edu
...
Distant Students
Take a picture with...
   a friend
   "I like his dog"
   "I like cool hardware"
   your mom
Send it to:
pinto@mit.edu
Today
Outline
Outline
Massively Parallel Computing

[Mind-map slide: "MPC" at the center, linking Supercomputing, Many-core Computing,
 High-Throughput Computing, Cloud Computing, and Human? “Computing”]
Massively Parallel Computing
[Mind-map slide, repeated as a section divider: MPC links Supercomputing, Many-core
 Computing, High-Throughput Computing, Cloud Computing, and Human? “Computing”]
http://www.youtube.com/watch?v=jj0WsQYtT7M
Modeling & Simulation

•   Physics, astronomy, molecular dynamics, finance, etc.

•   Data and processing intensive

•   Requires high-performance computing (HPC)

•   Driving HPC architecture development
CS 264 (2009)
Top Dog (2008)

•   Roadrunner, LANL
    •   #1 on top500.org in 2008 (now #7)

    •   1.105 petaflop/s

    •   3000 nodes with dual-core AMD Opteron processors

    •   Each node connected via PCIe to two IBM Cell processors

    •   Nodes are connected via Infiniband 4x DDR
http://www.top500.org/lists/2010/11
Tianhe-1A at NSC Tianjin

   2.507 petaflop/s
   7168 Tesla M2050 GPUs




1 Petaflop/s = ~1M high-end laptops = ~world population
with hand calculators 24/7/365 for ~16 years
                                             Slide courtesy of Bill Dally (NVIDIA)
http://news.cnet.com/8301-13924_3-20021122-64.html
What $100+ million
     can buy you...




Roadrunner (#7)   Jaguar (#2)
Roadrunner (#7)




       http://www.lanl.gov/roadrunner/
Jaguar (#2)
Who uses HPC?
Massively Parallel Computing
[Mind-map slide, repeated as a section divider: MPC links Supercomputing, Many-core
 Computing, High-Throughput Computing, Cloud Computing, and Human? “Computing”]
Cloud Computing?
Buzzword ?
Careless Computing?
Response from the legend:


...
http://techcrunch.com/2010/12/14/stallman-cloud-computing-careless-computing/
Cloud Utility Computing?
         for CS264
http://code.google.com/appengine/
http://aws.amazon.com/ec2/
http://www.nilkanth.com/my-uploads/2008/04/comparingpaas.png
Web Data Explosion
How much Data?

•   Google processes 24 PB / day, 8 EB / year (’10)

•   Wayback Machine has 3 PB, 100 TB/month (’09)

•   Facebook user data: 2.5 PB, 15 TB/day (’09)

•   Facebook photos: 15 B, 3 TB/day (’09) - 90 B (now)

•   eBay user data: 6.5 PB, 50 TB/day (’09)

•   “all words ever spoken by human beings” ~ 42 ZB


                                     Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/
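
As a quick sanity check, the daily rates above compound to the yearly totals quoted on the slide; a minimal sketch (all figures are the slide's own 2009/2010 estimates):

    # Back-of-the-envelope check of the slide's numbers (a sketch; figures
    # are the slide's own estimates, not authoritative).
    PB = 10**15          # bytes in a petabyte (decimal units)
    EB = 10**18          # bytes in an exabyte

    google_per_day = 24 * PB
    print(google_per_day * 365 / EB)       # ~8.8 EB/year, matching "8 EB / year"

    facebook_photos_per_day = 3 * 10**12   # 3 TB/day of photos ('09)
    print(facebook_photos_per_day * 365 / PB)  # ~1.1 PB of new photos per year
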
“640k ought to be enough for anybody.”
    - Bill Gates, 1981 (just a rumor)
Disk Throughput
•   Average Google job size: 180 GB
•   1 SATA HDD = 75 MB / sec
•   Time to read 180 GB off disk: 45 mins
•   Solution: parallel reads
•   1000 HDDs = 75 GB / sec
•   Google’s solutions: BigTable, MapReduce, etc.
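
The arithmetic behind this slide, as a small sketch (using the slide's round numbers; real drives and jobs vary):

    # Rough check of the "Disk Throughput" numbers above.
    job_gb = 180                    # average Google job size (slide figure)
    hdd_mb_per_s = 75               # one SATA HDD

    single_disk_s = job_gb * 1000 / hdd_mb_per_s
    print(single_disk_s / 60)       # ~40 minutes, i.e. the slide's "~45 mins"

    n_disks = 1000
    print(single_disk_s / n_disks)          # ~2.4 s with 1000 disks in parallel
    print(n_disks * hdd_mb_per_s / 1000)    # aggregate bandwidth: ~75 GB/s
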
Cloud Computing
• Clear trend: centralization of computing
  resources in large data centers
• Q: What do Oregon, Iceland, and
  abandoned mines have in common?
• A: Fiber, juice, and space
• Utility computing!
Massively Parallel Computing
[Mind-map slide, repeated as a section divider: MPC links Supercomputing, Many-core
 Computing, High-Throughput Computing, Cloud Computing, and Human? “Computing”]
Instrument Data Explosion

Sloan Digital Sky Survey

                                    ATLUM / Connectome Project
Another example?
     hint: Switzerland
CERN in 2005....
CERN Summer School 2005

CERN Summer School 2005

     bad taste party...

CERN Summer School 2005

     pitchers...
LHC
LHC




      Maximilien Brice, © CERN
LHC




      Maximilien Brice, © CERN
CERN’s Cluster

      ~5000 nodes (‘05)
CERN Summer School 2005

presentations...
Slide courtesy of Hanspeter Pfister




   Diesel Powered HPC



                           Life Support…




Murchison Widefield Array
How much Data?

•   NOAA has ~1 PB climate data (‘07)

•   MWA radio telescope: 8 GB/sec of data

•   Connectome: 1 PB / mm³ of brain tissue
    (1 EB for 1 cm³)

•   CERN’s LHC will generate 15 PB a year (‘08)
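
A few unit conversions behind these figures, as a sketch (all inputs are the slide's own estimates):

    # Unit conversions behind the "How much Data?" slide.
    GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18

    mwa_rate = 8 * GB                      # MWA: 8 GB/sec
    print(mwa_rate * 86400 / PB)           # ~0.7 PB/day if recorded continuously

    connectome_per_mm3 = 1 * PB            # 1 PB per mm^3 of brain tissue
    print(connectome_per_mm3 * 1000 / EB)  # 1 cm^3 = 1000 mm^3 -> 1 EB, as above

    lhc_per_year = 15 * PB                 # LHC: 15 PB / year
    print(lhc_per_year / TB / 365)         # ~41 TB per day on average
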
High Flops / Watt
Massively Parallel Computing
[Mind-map slide, repeated as a section divider: MPC links Supercomputing, Many-core
 Computing, High-Throughput Computing, Cloud Computing, and Human? “Computing”]
Computer Games
•   PC gaming business:
    •   $15B / year market (2010)

    •   $22B / year in 2015 ?

    •   WOW: $1B / year

•   NVIDIA Shipped 1B GPUs since 1993:
    •   10 years to ship 200M GPUs (1993-2003)

•   1/3 of all PCs have more than one GPU

•   High-end GPUs sell for around $300

•   Now used for science applications
CryEngine 2, CRYTEK
Many-Core Processors




Intel Core i7-980X Extreme: 6 cores, 1.17B transistors
NVIDIA GTX 580 SC: 512 cores, 3B transistors
                    http://en.wikipedia.org/wiki/Transistor_count
Data Throughput

[Chart (David Kirk, NVIDIA): GPUs exploit massive data parallelism and shine on
 huge data; CPUs exploit instruction-level parallelism and shine when the data
 fits in cache]
3 of Top5 Supercomputers

[Bar chart comparing the performance (Gflops) of the top five systems]

                                                                         Bill Dally, NVIDIA
Personal Supercomputers




~4 Teraflops
@ 1500 Watts
Disruptive Technologies
•   Utility computing
    •   Commodity off-the-shelf (COTS) hardware

    •   Compute servers with 100s-1000s of processors

•   High-throughput computing
    •   Mass-market hardware

    •   Many-core processors with 100s-1000s of cores

    •   High compute density / high flops/W
Green HPC



NVIDIA/NCSA
 Green 500 Entry
Green HPC
NVIDIA/NCSA Green 500 Entry

 128 nodes, each with:
    1x Core i3 530 (2 cores, 2.93 GHz => 23.4 GFLOP peak)
    1x Tesla C2050 (14 cores, 1.15 GHz => 515.2 GFLOP peak)
    4x QDR Infiniband
    4 GB DRAM
 Theoretical Peak Perf: 68.95 TF
 Footprint: ~20 ft^2 => 3.45 TF/ft^2
 Cost: $500K (street price) => 137.9 MF/$
 Linpack: 33.62 TF, 36.0 kW => 934 MF/W
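
The derived figures on this slide follow directly from its raw numbers; a small sketch reproducing them:

    # Reproducing the derived figures on the Green 500 slide above
    # (a sketch; all inputs are the slide's own).
    nodes = 128
    cpu_peak = 23.4   # GFLOPS, Core i3 530 (2 cores x 2.93 GHz x 4 flops/cycle)
    gpu_peak = 515.2  # GFLOPS, Tesla C2050

    peak_tf = nodes * (cpu_peak + gpu_peak) / 1000
    print(peak_tf)                       # ~68.9 TF theoretical peak

    print(peak_tf / 20)                  # ~3.45 TF per ft^2 (20 ft^2 footprint)
    print(peak_tf * 1e6 / 500_000)       # ~137.9 MFLOPS per dollar ($500K street)

    linpack_tf, power_kw = 33.62, 36.0
    print(linpack_tf * 1e6 / (power_kw * 1000))   # ~934 MFLOPS per watt
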
One more thing...
Massively Parallel Computing
[Mind-map slide, repeated as a section divider: MPC links Supercomputing, Many-core
 Computing, High-Throughput Computing, Cloud Computing, and Human? “Computing”]
Massively Parallel Computing
[Mind-map slide, repeated as a section divider: MPC links Supercomputing, Many-core
 Computing, High-Throughput Computing, Cloud Computing, and Human? “Computing”]
Massively Parallel Human
     Computing ???
•   “Crowdsourcing”

•   Amazon Mechanical Turk
    (artificial artificial intelligence)

•   Wikipedia

•   Stackoverflow

•   etc.
What is this course about?
What is this course about?
  Massively parallel processors
  •   GPU computing with CUDA

  Cloud computing
  •   Amazon’s EC2 as an example of utility
      computing
  •   MapReduce, the “back-end” of cloud
      computing
Less like Rodin...
More like Bob...
Outline
wikipedia.org
Anant Agarwal, MIT
Power Cost
• Power ∝ Voltage² × Frequency

• Frequency ∝ Voltage
• Power ∝ Frequency³




                                 Jack Dongarra
Power Cost

              Cores   Freq   Perf   Power   Perf/W
  CPU           1      1      1       1       1
“New” CPU       1     1.5    1.5     3.3    0.45x
Multicore       2     0.75   1.5     0.8    1.88x




                                            Jack Dongarra
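
The table follows from the scaling rules on the previous Power Cost slide (Power ∝ Frequency³ per core); a minimal sketch, with constants normalized to the baseline CPU:

    # Power ~ V^2 * f and f ~ V, so per-core power ~ f^3.
    def perf_power(cores, freq):
        perf = cores * freq          # throughput scales with cores x frequency
        power = cores * freq ** 3    # each core's power scales with freq cubed
        return perf, power

    for name, cores, freq in [("CPU", 1, 1.0), ('"New" CPU', 1, 1.5), ("Multicore", 2, 0.75)]:
        perf, power = perf_power(cores, freq)
        print(name, perf, round(power, 2), round(perf / power, 2))
    # CPU        1.0  1.0   1.0
    # "New" CPU  1.5  3.38  0.44   (slide rounds power to 3.3, Perf/W to 0.45x)
    # Multicore  1.5  0.84  1.78   (slide rounds power to 0.8, giving 1.88x)
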
Problem with Buses




                     Anant Agarwal, MIT
Problem with Memory




                http://www.OpenSparc.net/
Problem with Disks




               64 MB / sec

                     Tom’s Hardware
Good News
•   Moore’s Law marches on
•   Chip real-estate is essentially free
•   Many-core architectures are commodities
•   Space for new innovations
Bad News
•   Power limits improvements in clock speed
•   Parallelism is the only route to improve
    performance
•   Computation / communication ratio will get
    worse
•   More frequent hardware failures?
Bad
News
A “Simple” Matter of
          Software
•   We have to use all the cores efficiently
•   Careful data and memory management
•   Must rethink software design
•   Must rethink algorithms
•   Must learn new skills!
•   Must learn new strategies!
•   Must learn new tools...
Our mantra: always use the right tool !
Outline
Instructor: Nicolas Pinto




            The Rowland Institute at Harvard
            HARVARD UNIVERSITY
~50% of [the cortex] is for vision!
Everyone
 knows
  that...
The Approach
Reverse and Forward Engineering the Brain
The Approach
Reverse and Forward Engineering the Brain




     REVERSE                 FORWARD
       Study                       Build
    Natural System            Artificial System
brain = 20 petaflops!
http://vimeo.com/7945275
“If you want to have good ideas
 you must have many ideas.”

“Most of them will be wrong,
 and what you have to learn is
 which ones to throw away.”

                     Linus Pauling
                    (double Nobel Prize winner)
High-throughput
       Screening
The curse of speed
...and the blessing of massively parallel computing



   thousands of big models

   large amounts of unsupervised
   learning experience
The curse of speed
...and the blessing of massively parallel computing

  No off-the-shelf solution? DIY!
  Engineering (Hardware/SysAdmin/Software)   Science


  Leverage non-scientific high-tech
  markets and their $billions of R&D...
  Gaming: Graphics Cards (GPUs), PlayStation 3
  Web 2.0: Cloud Computing (Amazon, Google)
Build your own!
The blessing of GPUs
    DIY GPU pr0n (since 2006)   Sony Playstation 3s (since 2007)
speed (in billion floating point operations per second)

  Q9450 (Matlab/C)        [2008]      0.3
  Q9450 (C/SSE)           [2008]      9.0
  7900GTX (OpenGL/Cg)     [2006]     68.2
  PS3/Cell (C/ASM)        [2007]    111.4
  8800GTX (CUDA1.x)       [2007]    192.7
  GTX280 (CUDA2.x)        [2008]    339.3
  GTX480/Fermi (CUDA3.x)  [2010]    974.3

  >1000x speedup is game changing...

                                     Pinto, Doukhan, DiCarlo, Cox PLoS 2009
                                          Pinto, Cox GPU Comp. Gems 2011
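
The speedups behind the ">1000x" claim, computed from the gflops values read off the chart above (a sketch):

    gflops = {
        "Q9450 (Matlab/C)":     0.3,
        "Q9450 (C/SSE)":        9.0,
        "7900GTX (OpenGL/Cg)": 68.2,
        "PS3/Cell (C/ASM)":   111.4,
        "8800GTX (CUDA1.x)":  192.7,
        "GTX280 (CUDA2.x)":   339.3,
        "GTX480 (CUDA3.x)":   974.3,
    }
    baseline = gflops["Q9450 (Matlab/C)"]
    for name, g in gflops.items():
        print(f"{name:22s} {g:7.1f} gflops  {g / baseline:7.0f}x vs Matlab/C")
    # GTX480 vs Matlab/C: ~3200x, hence ">1000x speedup is game changing"
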
Tired Of Waiting For Your Computations?

  Supercomputing on your desktop:
  Programming the next generation of cheap and
  massively parallel hardware using CUDA

  This IAP has been designed to give students extensive
  hands-on experience in using a new potentially disruptive
  technology. This technology enables the masses having
  access to supercomputing capabilities.

  We will introduce students to the CUDA programming
  language developed by NVIDIA Corp., which has been an
  essential step towards simplifying and unifying the
  programming of massively parallel chips.

  This IAP is supported by generous contributions from
  NVIDIA Corp., The Rowland Institute at Harvard, and MIT
  (OEIT, BCS, EECS) and will be featuring talks given by
  experts from various fields.

                             6.963 (IAP 09)
Co-Instructor: Hanspeter Pfister
Visual Computing
•   Large image & video collections



•   Physically-based modeling



•   Face modeling and recognition



•   Visualization
VolumePro 500
                Released
                 1999
GPGPU
Connectome
NSF CDI Grant ’08-’11
NVIDIA CUDA Center
   of Excellence
TFs
•   Claudio Andreoni (MIT Course 18)
•   Dwight Bell (Harvard DCE)
•   Krunal Patel (Accelereyes)
•   Jud Porter (Harvard SEAS)
•   Justin Riley (MIT OEIT)
•   Mike Roberts (Harvard SEAS)
Claudio Andreoni
(MIT Course 18)
Dwight Bell
(Harvard DCE)
Krunal Patel
(Accelereyes)
Jud Porter
(Harvard SEAS)
Justin Riley
(MIT OEIT)
Mike Roberts
(Harvard SEAS)
About You
About you...

•   Undergraduate ? Graduate ?

•   Programming ? >5 years ? <2 years ?

•   CUDA ? MPI ? MapReduce ?

•   CS ? Life Sc ? Applied Sc ? Engineering ? Math ? Physics ?

•   Humanities ? Social Sc ? Economy ?
Outline
CS 264 Goals
•   Have fun!
•   Learn basic principles of parallel computing
•   Learn programming with CUDA
•   Learn to program a cluster of GPUs (e.g. MPI)
•   Learn basics of EC2 and MapReduce
•   Learn new learning strategies, tools, etc.
•   Implement a final project
Experimental Learning Strategy
   Repeat, repeat, repeat!
Memory “recall”
Lectures

• Theory, Architecture, Patterns ?
• Act I: GPU Computing
• Act II: Cloud Computing
• Act III: Guest Lectures
Lectures “Format”
• 2x ~ 45min regular “lectures”
• ~ 15min “Clinic”
 •   we’ll be here to fix your problems

• ~ 5 min: Life and Code “Hacking”:
 •   GTD Zen
 •   Presentation Zen
 •   Ninja Programming Tricks & Tools, etc.
 •   Interested? email staff+spotlight@cs264.org
Act I: GPU Computing

•   Introduction to GPU Computing
•   CUDA Basics
•   CUDA Advanced
•   CUDA Ninja Tricks !
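
For a first taste of where Act I is headed, here is a minimal sketch of launching a tiny CUDA kernel from Python with PyCUDA (PyCUDA itself appears in a later guest lecture); the kernel and array sizes are illustrative only:

    import numpy as np
    import pycuda.autoinit                      # creates a context on the first GPU
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void scale(float *a, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        a[i] *= s;                              // each thread handles one element
    }
    """)
    scale = mod.get_function("scale")

    a = np.random.randn(256).astype(np.float32)
    expected = a * 2.0
    scale(drv.InOut(a), np.float32(2.0), block=(256, 1, 1), grid=(1, 1))
    assert np.allclose(a, expected)             # GPU result matches the CPU one
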
Performance / Effort: 3D Filterbank Convolution

            Performance (gflops)    Development Time (hours)

Matlab              0.3                      0.5

C/SSE               9.0                     10.0

PS3               111.4                     30.0

GT200             339.3                     10.0
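
Another way to read this table is performance per hour of development effort; a small sketch (both columns are the slide's own estimates):

    results = {          # name: (gflops, development hours)
        "Matlab": (0.3, 0.5),
        "C/SSE":  (9.0, 10.0),
        "PS3":    (111.4, 30.0),
        "GT200":  (339.3, 10.0),
    }
    for name, (gf, hours) in results.items():
        print(f"{name:7s} {gf:7.1f} gflops / {hours:4.1f} h = "
              f"{gf / hours:6.1f} gflops per dev-hour")
    # GT200 delivers ~34 gflops per hour of development vs ~0.6 for Matlab
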
Empirical results...

                                             Performance (gflops)

  Q9450 (Matlab/C) [2008]      0.3

    Q9450 (C/SSE) [2008]       9.0

     7900GTX (Cg) [2006]      68.2

  PS3/Cell (C/ASM) [2007]    111.4

 8800GTX (CUDA1.x) [2007]    192.7

  GTX280 (CUDA2.x) [2008]    339.3

  GTX480 (CUDA3.x) [2010]    974.3

                             >1000x speedup is game changing...
Act II: Cloud Computing

•   Introduction to utility computing
•   EC2 & starcluster (Justin Riley, MIT OEIT)
•   Hadoop (Zak Stone, SEAS)
•   MapReduce with GPU Jobs on EC2
Amazon’s Web Services
•   Elastic Compute Cloud (EC2)
    •   Rent computing resources by the hour

    •   Basic unit of accounting = instance-hour

    •   Additional costs for bandwidth

•   You’ll be getting free AWS credits for course
    assignments
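
What billing by the instance-hour means in practice, as a minimal sketch; the hourly rate and cluster size below are hypothetical examples, not actual AWS prices:

    hourly_rate = 0.10          # assumed price per instance-hour (illustrative)
    instances = 20              # assumed cluster size for one experiment
    hours = 3                   # wall-clock hours each instance runs

    instance_hours = instances * hours
    print(instance_hours, "instance-hours ->",
          "$%.2f" % (instance_hours * hourly_rate))
    # 60 instance-hours -> $6.00, plus bandwidth charges for data in/out
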
MapReduce
•   Functional programming meets distributed
    processing
•   Processing of lists with <key, value> pairs
•   Batch data processing infrastructure
•   Move the computation where the data is
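
The canonical MapReduce example is word count; a minimal sketch of the programming model in plain Python (illustrating the <key, value> flow, not Hadoop's API):

    from collections import defaultdict

    def map_fn(_, line):                 # input pair: (offset, line of text)
        for word in line.split():
            yield (word, 1)              # emit one <word, 1> pair per occurrence

    def reduce_fn(word, counts):         # all values for one key arrive together
        yield (word, sum(counts))

    def run(records):
        groups = defaultdict(list)       # the framework's shuffle/sort phase
        for key, value in records:
            for k, v in map_fn(key, value):
                groups[k].append(v)
        for k in sorted(groups):
            yield from reduce_fn(k, groups[k])

    print(dict(run(enumerate(["the cat", "the dog", "the cat sat"]))))
    # {'cat': 2, 'dog': 1, 'sat': 1, 'the': 3}
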
Act III: Guest Lectures
•   Andreas Klöckner (NYU): OpenCL & PyOpenCL
•   John Owens (UC Davis): fundamental algorithms/
    data structures and irregular parallelism
•   Nathan Bell (NVIDIA): Thrust
•   Duane Merrill* (Virginia Tech): Ninja Tricks
•   Mike Bauer* (Stanford): Sequoia
•   Greg Diamos (Georgia Tech): Ocelot
•   Other lecturers* from Google, Yahoo, Sun, Intel,
    NCSA, AMD, Cloudera, etc.
Labs
•   Led by TF(s)
•   Work on an interesting small problem
•   From skeleton code to solution
•   Hands-on
53 Church St.
53 Church St.
53 Church St.
53 Church St., Room 104
53 Church St., Rm 104
   Thu, Fri 7:35-9:35 pm
53 Church St., Room 105
53 Church St., Rm 105
NVIDIA Fx4800 Quadro
             •   MacPro

             •   NVIDIA Fx4800
                 Quadro, 1.5 GB
Resonance @ SEAS
           •   Quad-core Intel Xeon
               host, 3 GHz, 8 GB

           •   8 Tesla S1070s (32
               GPUs, 4 GB each)

           •   16 quad-core Intel
               Xeons, 2 GHz, 16 GB

           •   http://community.crimsongrid.harvard.edu/getting-started/resources/resonance-cuda-host
What do you
            need to know?
•   Programming (ideally in C / C++)
    •   See HW 0

•   Basics of computer systems
    •   CS 61 or similar
Homeworks
•   Programming assignments
•   “Issue Spotter” (code debug & review, Q&A)
•   Contribution to the community
    (OSS, Wikipedia, Stackoverflow, etc.)
•   Due: Fridays at 11 pm EST
    •   Hard deadline - 2 “bonus” days
Office Hours
•   Led by a TF
•   104 @ 53 Church St
    (check website and news feed)
Participation
•   HW0 (this week)
•   Mandatory attendance for guest lectures
•   forum.cs264.org
    •   Answer questions, help others

    •   Post relevant links and discussions (!)
Final Project
•   Implement a substantial project
•   Pick from a list of suggested projects or design
    your own
•   Milestones along the way (idea, proposal, etc.)
•   In-class final presentations
•   $500+ prize for the best project
Grading

•   On a 0-100 scale
    •   Participation:   10%

    •   Homework:        50%

    •   Final project:   40%
www.cs264.org
•   Detailed schedule (soon)
•   News blog w/ RSS feed
•   Video feeds
•   Forum (forum.cs264.org)
•   Academic honesty policy
•   HW0 (due Fri 2/4)
Thank you!
one more thing
         from WikiLeaks?
Is this course for me ???
This course is not for you...
  •   If you’re not genuinely interested in the topic
  •   If you can’t cope with uncertainty,
      unpredictability, poor documentation, and
      immature software
  •   If you’re not ready to do a lot of programming
  •   If you’re not open to thinking about computing in
      new ways
  •   If you can’t put in the time

                                             Slide after Jimmy Lin, iSchool, Maryland
Otherwise...
It will be a richly rewarding experience!
Guaranteed?!
Be Patient

  Be Flexible

Be Constructive


                  http://davidzinger.wordpress.com/2007/05/page/2/
It would be a win-win-win situation!




(The Office Season 2, Episode 27: Conflict Resolution)
Hypergrowth ?
Acknowledgements
•   Hanspeter Pfister & Henry Leitner, DCE
•   TFs
•   Rob Parrott & IT Team, SEAS
•   Gabe Russell & Video Team, DCE
•   NVIDIA, esp. David Luebke
•   Amazon
COME
Next?

•   Fill out the survey: http://bit.ly/enrb1r
•   Get ready for HW0 (Lab 1 & 2)
•   Subscribe to http://forum.cs264.org
•   Subscribe to RSS feed: http://bit.ly/eFIsqR
