16. Massively Parallel Computing
[Overview diagram: "MPC" at the center, surrounded by Supercomputing, Many-core, High-Throughput Computing, Cloud Computing, and Human "Computing"?]
17. Massively Parallel Computing
[Overview diagram repeated; see slide 16]
19. Modeling & Simulation
• Physics, astronomy, molecular dynamics, finance, etc.
• Data and processing intensive
• Requires high-performance computing (HPC)
• Driving HPC architecture development
20. Top Dog (2008)
• Roadrunner, LANL
• #1 on top500.org in 2008 (now #7)
• 1.105 petaflop/s
• 3000 nodes with dual-core AMD Opteron processors
• Each node connected via PCIe to two IBM Cell processors
• Nodes are connected via InfiniBand 4x DDR
22. Tianhe-1A
at NSC Tianjin
2.507 petaflop/s
7168 Tesla M2050 GPUs
1 petaflop/s = ~1M high-end laptops = ~world population with hand calculators 24/7/365 for ~16 years
Slide courtesy of Bill Dally (NVIDIA)
31. Massively Parallel Computing
[Overview diagram repeated; see slide 16]
45. How much Data?
• Google processes 24 PB / day, 8 EB / year (’10)
• Wayback Machine: 3 PB, 100 TB/month (’09)
• Facebook user data: 2.5 PB, 15 TB/day (’09)
• Facebook photos: 15 B, 3 TB/day (’09); now 90 B
• eBay user data: 6.5 PB, 50 TB/day (’09)
• “All words ever spoken by human beings”: ~42 ZB
Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/
46. “640K ought to be enough for anybody.”
- attributed to Bill Gates (1981); just a rumor
47. Disk Throughput
• Average Google job size: 180 GB
• 1 SATA HDD = 75 MB / sec
• Time to read 180 GB off disk: 45 mins (see the back-of-the-envelope sketch below)
• Solution: parallel reads
• 1000 HDDs = 75 GB / sec
• Google’s solutions: BigTable, MapReduce, etc.
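A quick back-of-the-envelope check of the arithmetic above, sketched in plain C (illustrative only; it uses the slide's 180 GB job size and 75 MB/s per-disk rate, and the single-disk result of roughly 40 minutes lands in the same ballpark as the slide's ~45 min):

// Back-of-the-envelope disk throughput (illustrative; figures from the bullets above)
#include <stdio.h>

int main(void) {
    double job_bytes = 180e9;   /* average Google job size: 180 GB        */
    double disk_rate = 75e6;    /* one SATA HDD: ~75 MB/s sequential read */
    int    num_disks = 1000;

    double one_disk  = job_bytes / disk_rate;               /* single-disk read time */
    double all_disks = job_bytes / (disk_rate * num_disks); /* parallel read time    */

    printf("1 disk:    %.0f s (~%.0f min)\n", one_disk, one_disk / 60.0); /* ~2400 s, ~40 min */
    printf("%d disks: %.1f s\n", num_disks, all_disks);                   /* ~2.4 s           */
    return 0;
}

With 1,000 disks reading in parallel the same job streams in a couple of seconds, which is the point of systems like MapReduce that schedule work across many spindles.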
48. Cloud Computing
• Clear trend: centralization of computing
resources in large data centers
• Q: What do Oregon, Iceland, and
abandoned mines have in common?
• A: Fiber, juice, and space
• Utility computing!
49. Massively Parallel Computing
[Overview diagram repeated; see slide 16]
50. Instrument Data Explosion
Sloan Digital Sky Survey
ATLUM / Connectome Project
61. Diesel Powered HPC
Murchison Widefield Array
Life Support…
Slide courtesy of Hanspeter Pfister
62. How much Data?
• NOAA has ~1 PB climate data (‘07)
• MWA radio telescope: 8 GB/sec of data
• Connectome: 1 PB per mm³ of brain tissue (1 EB for 1 cm³)
• CERN’s LHC will generate 15 PB a year (‘08)
64. Massively Parallel Computing
[Overview diagram repeated; see slide 16]
65. Computer Games
• PC gaming business:
• $15B / year market (2010)
• $22B / year in 2015?
• WOW: $1B / year
• NVIDIA shipped 1B GPUs since 1993:
• 10 years to ship 200M GPUs (1993-2003)
• 1/3 of all PCs have more than one GPU
• High-end GPUs sell for around $300
• Now used for science applications
76. Massively Parallel Computing
[Overview diagram repeated; see slide 16]
77. Massively Parallel Computing
[Overview diagram repeated; see slide 16]
78. Massively Parallel Human
Computing ???
• “Crowdsourcing”
• Amazon Mechanical Turk
(artificial artificial intelligence)
• Wikipedia
• Stackoverflow
• etc.
80. What is this course about?
Massively parallel processors
• GPU computing with CUDA
Cloud computing
• Amazon’s EC2 as an example of utility
computing
• MapReduce, the “back-end” of cloud
computing
93. Good News
• Moore’s Law marches on
• Chip real-estate is essentially free
• Many-core architectures are commodities
• Space for new innovations
94. Bad News
• Power limits improvements in clock speed
• Parallelism is the only route to improve
performance
• Computation / communication ratio will get
worse
• More frequent hardware failures?
96. A “Simple” Matter of Software
• We have to use all the cores efficiently
• Careful data and memory management
• Must rethink software design
• Must rethink algorithms
• Must learn new skills!
• Must learn new strategies!
• Must learn new tools...
107. “ If you want to have good ideas
you must have many ideas. ”
“ Most of them will be wrong,
and what you have to learn is
which ones to throw away. ”
Linus Pauling
(double Nobel Prize Winner)
110. The curse of speed
...and the blessing of massively parallel computing
thousands of big models
large amounts of unsupervised
learning experience
111. The curse of speed
...and the blessing of massively parallel computing
No off-the-shelf solution? DIY!
Engineering (Hardware/SysAdmin/Software) Science
Leverage non-scientific high-tech
markets and their $billions of R&D...
Gaming: Graphics Cards (GPUs), PlayStation 3
Web 2.0: Cloud Computing (Amazon, Google)
114. The blessing of GPUs
DIY GPU pr0n (since 2006)
Sony PlayStation 3s (since 2007)
115. speed
(in billion floating point operations per second)
Q9450 (Matlab/C) [2008] 0.3
Q9450 (C/SSE) [2008] 9.0
7900GTX (OpenGL/Cg) [2006] 68.2
PS3/Cell (C/ASM) [2007] 111.4
8800GTX (CUDA1.x) [2007] 192.7
GTX280 (CUDA2.x) [2008] 339.3
GTX480 (CUDA3.x, Fermi) [2010] 974.3
>1000X speedup is game changing...
Pinto, Doukhan, DiCarlo, Cox PLoS 2009
Pinto, Cox GPU Comp. Gems 2011
116. Tired Of Waiting For Your Computations?
Supercomputing on your desktop: programming the next generation of cheap and massively parallel hardware using CUDA (6.963, IAP ’09).
This IAP has been designed to give students extensive hands-on experience in using a new, potentially disruptive technology. This technology enables the masses to have access to supercomputing capabilities. We will introduce students to the CUDA programming language developed by NVIDIA Corp., which has been an essential step towards simplifying and unifying the programming of massively parallel chips.
This IAP is supported by generous contributions from NVIDIA Corp., The Rowland Institute at Harvard, and MIT (OEIT, BCS, EECS), and will be featuring talks given by experts from various fields.
136. CS 264 Goals
• Have fun!
• Learn basic principles of parallel computing
• Learn programming with CUDA
• Learn to program a cluster of GPUs (e.g. MPI)
• Learn basics of EC2 and MapReduce
• Learn new learning strategies, tools, etc.
• Implement a final project
139. Lectures “Format”
• 2x ~ 45min regular “lectures”
• ~ 15min “Clinic”
• we’ll be here to fix your problems
• ~ 5 min: Life and Code “Hacking”:
• GTD Zen
• Presentation Zen
• Ninja Programming Tricks & Tools, etc.
• Interested? email staff+spotlight@cs264.org
140. Act I: GPU Computing
• Introduction to GPU Computing
• CUDA Basics (a minimal kernel sketch follows this list)
• CUDA Advanced
• CUDA Ninja Tricks !
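As a taste of what “CUDA Basics” will cover, here is a minimal vector-add sketch (a hypothetical example; the kernel name, problem size, and launch configuration are invented for illustration and are not course code):

// vec_add.cu -- minimal CUDA sketch (hypothetical example, not course code)
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Allocate and fill host arrays.
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device arrays and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);   // expect 3.0
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}

One thread per element, an explicit launch configuration, and host/device memory copies: these are the kinds of basics Act I starts from before moving on to the advanced material.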
141. 3D Filterbank Convolution: Performance / Effort
Performance (gflop/s) vs. development time (hours):
Matlab: 0.3 gflop/s, 0.5 h
C/SSE: 9.0 gflop/s, 10.0 h
PS3: 111.4 gflop/s, 30.0 h
GT200: 339.3 gflop/s, 10.0 h
142. Empirical results...
Performance (gflop/s)
Q9450 (Matlab/C) [2008] 0.3
Q9450 (C/SSE) [2008] 9.0
7900GTX (Cg) [2006] 68.2
PS3/Cell (C/ASM) [2007] 111.4
8800GTX (CUDA1.x) [2007] 192.7
GTX280 (CUDA2.x) [2008] 339.3
GTX480 (CUDA3.x) [2010] 974.3
>1000X speedup is game changing...
143. Act II: Cloud Computing
• Introduction to utility computing
• EC2 & starcluster (Justin Riley, MIT OEIT)
• Hadoop (Zak Stone, SEAS)
• MapReduce with GPU Jobs on EC2
144. Amazon’s Web Services
• Elastic Compute Cloud (EC2)
• Rent computing resources by the hour
• Basic unit of accounting = instance-hour
• Additional costs for bandwidth
• You’ll be getting free AWS credits for course
assignments
145. MapReduce
• Functional programming meets distributed
processing
• Processing of lists with <key, value> pairs (see the word-count sketch after this list)
• Batch data processing infrastructure
• Move the computation where the data is
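To make the <key, value> model concrete, here is a conceptual single-machine word-count sketch in C++ (illustrative only; the function names are invented, and a real job would be written against a framework such as Hadoop, which also handles distribution, shuffling, and fault tolerance):

// wordcount_sketch.cpp -- conceptual map/reduce on one machine (hypothetical example)
#include <cstdio>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// map: one input record -> a list of <key, value> pairs
std::vector<std::pair<std::string, int>> map_fn(const std::string &line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream in(line);
    std::string word;
    while (in >> word) out.push_back({word, 1});   // emit <word, 1>
    return out;
}

// reduce: one key plus all of its values -> a single aggregated value
int reduce_fn(const std::string & /*key*/, const std::vector<int> &values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int main() {
    std::vector<std::string> input = {"the quick brown fox", "the lazy dog"};

    // "Shuffle" phase: group every emitted value by its key.
    std::map<std::string, std::vector<int>> groups;
    for (const auto &line : input)
        for (const auto &kv : map_fn(line))
            groups[kv.first].push_back(kv.second);

    // Reduce phase: one call per distinct key.
    for (const auto &g : groups)
        printf("%s\t%d\n", g.first.c_str(), reduce_fn(g.first, g.second));
    return 0;
}

A framework runs many map and reduce tasks in parallel on the machines that already hold the input splits, which is the “move the computation where the data is” idea from the slide.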
146. Act III: Guest Lectures
• Andreas Klöckner (NYU): OpenCL & PyOpenCL
• John Owens (UC Davis): fundamental algorithms/
data structures and irregular parallelism
• Nathan Bell (NVIDIA): Thrust
• Duane Merrill* (Virginia Tech): Ninja Tricks
• Mike Bauer* (Stanford): Sequoia
• Greg Diamos (Georgia Tech): Ocelot
• Other lecturers* from Google, Yahoo, Sun, Intel,
NCSA, AMD, Cloudera, etc.
147. Labs
• Led by TF(s)
• Work on an interesting small problem
• From skeleton code to solution
• Hands-on
156. What do you need to know?
• Programming (ideally in C / C++)
• See HW 0
• Basics of computer systems
• CS 61 or similar
157. Homeworks
• Programming assignments
• “Issue Spotter” (code debug & review, Q&A)
• Contribution to the community
(OSS, Wikipedia, Stackoverflow, etc.)
• Due: Fridays at 11 pm EST
• Hard deadline - 2 “bonus” days
158. Office Hours
• Led by a TF
• 104 @ 53 Church St
(check website and news feed)
159. Participation
• HW0 (this week)
• Mandatory attendance for guest lectures
• forum.cs264.org
• Answer questions, help others
• Post relevant links and discussions (!)
160. Final Project
• Implement a substantial project
• Pick from a list of suggested projects or design
your own
• Milestones along the way (idea, proposal, etc.)
• In-class final presentations
• $500+ prize for the best project
161. Grading
• On a 0-100 scale
• Participation: 10%
• Homework: 50%
• Final project: 40%
162. www.cs264.org
• Detailed schedule (soon)
• News blog w/ RSS feed
• Video feeds
• Forum (forum.cs264.org)
• Academic honesty policy
• HW0 (due Fri 2/4)
167. This course is not for you...
• If you’re not genuinely interested in the topic
• If you can’t cope with uncertainty,
unpredictability, poor documentation, and
immature software
• If you’re not ready to do a lot of programming
• If you’re not open to thinking about computing in
new ways
• If you can’t put in the time
Slide after Jimmy Lin, iSchool, Maryland
175. Acknowledgements
• Hanspeter Pfister & Henry Leitner, DCE
• TFs
• Rob Parrott & IT Team, SEAS
• Gabe Russell & Video Team, DCE
• NVIDIA, esp. David Luebke
• Amazon
177. Next?
• Fill out the survey: http://bit.ly/enrb1r
• Get ready for HW0 (Lab 1 & 2)
• Subscribe to http://forum.cs264.org
• Subscribe to RSS feed: http://bit.ly/eFIsqR