Annotated slides of Siggraph 2012 course "Graphics Programming on the Web with WebCL"
The full course is available at http://www.khronos.org/webgl/wiki/Presentations#SIGGRAPH_2012_Course_.22Graphics_Programming_for_the_Web.22
Graphics Programming for the Web with WebCL
2. Graphics Programming on the Web with WebCL
Mikaël Bourges-Sévenier, Motorola Mobility
August 9, 2012
3. Over 32000 planks ;-)
Blender/Bullet/SmallLuxGPU
OpenCL
By Alain Ducharme “Phymec”
http://www.youtube.com/watch?v=143k1fqPukk
4. Motivation
For compute intensive web applications
• Games: physics, special effects
• Computational photography
• Scientific simulations
• Augmented reality
• …
Use many devices for general computations
• CPU, GPU, DSP, FPGA…
5. Motivation
GPUs provide exponential GFLOPS growth every year vs. CPUs
[Figure: GFLOPS growth of GPUs vs. CPUs. Source: NVidia CUDA/OpenCL C programming guide, Chapter 1. Introduction]
6. Content
Motivation and Goals
General-Purpose computations on GPU (GPGPU)
• From WebGL to WebCL
• The need for more general data-parallel computations
WebCL overview
• A JavaScript API over OpenCL
• OpenCL concepts
• WebCL API
WebCL programming
• Pure computations
• WebGL interoperability
8. WebGL pipeline
Programmable vertex & fragment shaders
[Diagram: Application ➞ vertices (3D) ➞ Vertex processing (Vertex Shader) ➞ vertices (screen) ➞ Rasterizer ➞ fragments ➞ Fragment processing (Fragment Shader, with Textures and Samplers) ➞ pixels ➞ Frame Buffer; the processing stages run on the GPU]
9. General Purpose computations on GPU
With clever mapping of algorithms to GL pipeline
• Textures as data buffers
• Texture coordinates as computational domain
• Vertex coordinates as computational range
Vertex shaders: Scatter (write values)
• to start computations
• scatter operations
Fragment shaders: Gather (read values)
• for algorithm steps
• gather operations
10. GPGPU with GL limitations
Hard to map algorithms to graphics pipeline
Hard to do scatter operations
Shader instances can NOT directly communicate with one another
…
GPGPU with GL is hack-ish
CL is made for GPGPU, not graphics
12. WebCL overview
WebCL brings parallel computing to the Web through a secure JavaScript binding to OpenCL 1.1 (2011)
Open standard, royalty-free
Platform independent
Device independent
Being standardized by Khronos
• First public working draft April 2012
• http://www.khronos.org/webcl/
13. OpenCL overview
Features
C-based cross-platform API
Kernels use a subset of C99 and extensions
• Vector extensions (<type>N)
• No recursion, no function pointers
• No dynamic memory (malloc, free…), no standard libc methods (memcpy…)
Well-defined numerical accuracy for both integers and floats
Rich set of built-in functions (e.g. as in GLSL, and more)
• But no random method
Close to the hardware
• Control over memory use
• Control over thread scheduling
14. OpenCL Device Model
A host is connected to one or more Compute devices
Compute device
• A collection of one or more compute units (~ cores)
A compute unit is composed of one or more processing elements (~ threads)
Processing elements execute code as SIMD or SPMD
[Diagram: Host (PC) connected to Compute Devices (GPU, CPU, DSP, FPGA…); each Compute Device contains Compute Units (Cores); each Compute Unit contains Processing Elements (Threads)]
15. OpenCL Execution Model
Kernel
• Basic unit of executable code (~ DLL entry point)
• Data-parallel or task-parallel
Program
• Collection of kernels and functions called by kernels
• Analogous to a dynamic library (DLL)
Command Queue
• Control operations on OpenCL objects (memory transfers, kernel execution, synchronization)
• Commands queued in order
• Execution in-order or out-of-order
• Applications may use multiple command-queues per device
Work-item
• An execution of a kernel by a processing element (~ thread)
Work-group
• A collection of work-items that execute on a single compute unit (~ core)
[Diagram: a Context containing a GPU and a CPU, each with its own Queue]
16. OpenCL Work-group 2D analogy
[Diagram: a Global domain divided into tiles; each tile is a Local domain]
# work-items = # pixels
# work-groups = # tiles
Work-group size = tileW * tileH
All threads in a workgroup run synchronously
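The tile analogy can be made concrete with a little arithmetic. The following is a minimal sketch in plain JavaScript (no WebCL required); the function names `describeNDRange` and `locate` are hypothetical helpers introduced here for illustration, and the sketch assumes the image size is an exact multiple of the tile size.

```javascript
// Counts work-items and work-groups for an image split into tiles.
// Assumption: width and height are exact multiples of tileW and tileH.
function describeNDRange(width, height, tileW, tileH) {
  if (width % tileW !== 0 || height % tileH !== 0) {
    throw new Error("image size must be a multiple of the tile size");
  }
  return {
    workItems: width * height,                       // # work-items = # pixels
    workGroups: (width / tileW) * (height / tileH),  // # work-groups = # tiles
    workGroupSize: tileW * tileH                     // work-items per work-group
  };
}

// Maps a global pixel coordinate to its tile (group) and its position in the tile (local).
function locate(x, y, tileW, tileH) {
  return {
    group: [Math.floor(x / tileW), Math.floor(y / tileH)],
    local: [x % tileW, y % tileH]
  };
}
```

For example, `describeNDRange(64, 32, 16, 16)` describes a 64 x 32 image cut into 16 x 16 tiles, and `locate(17, 5, 16, 16)` tells which tile pixel (17, 5) falls into.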
17. OpenCL Memory Model
On Host
• CPU RAM
On Compute Device
• Global memory = GPU RAM
• Constant memory = cached global memory
• Texture memory = cached global memory optimized for streaming reads
• Local memory = high-speed memory shared among work-items of a work-group (~ L1 cache)
• Private memory = registers of a work-item, very fast memory
Memory management is explicit
• App must move data host ➞ global ➞ local and back
[Diagram: Host with Host Memory, connected through command queues and API calls to a Compute Device; the Compute Device holds Global Memory / Constant and Texture Caches and Workgroups 1 to N, each with its Local Memory and Work-Items 1 to M, each with its Private Memory]
18. OpenCL Kernel
Defined on an N-dimensional computation domain
A kernel is executed at each point of the computation domain
// In JavaScript
function multiply(a, b, n) {
  var c = [];
  for (var i = 0; i < n; ++i)
    c[i] = a[i] * b[i];
  return c;
}
// In OpenCL C99
/**
 * @param a, b, c are buffers in global memory
 * @param n number of elements in a, b, and c
 */
__kernel
void multiply(__global const float *a,
              __global const float *b,
              __global float *c,
              unsigned int n)
{
  unsigned int tid = get_global_id(0);  // thread number
  if (tid >= n) return;                 // make sure we don't go past the end of the buffers
  c[tid] = a[tid] * b[tid];
}
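To see what "executed at each point of the computation domain" means without a CL device, the kernel launch can be imitated in plain JavaScript. This is only a sketch: `runKernel1D` is a hypothetical stand-in for the runtime, not a WebCL call, and it visits the indices in reverse order on purpose, to stress that a kernel must not rely on any particular execution order.

```javascript
// Hypothetical stand-in for a 1-D kernel launch: calls `kernel` once per index.
// The reversed order is deliberate; work-items are independent of each other.
function runKernel1D(globalSize, kernel) {
  for (let id = globalSize - 1; id >= 0; --id) {
    kernel(id);
  }
}

// Same computation as the slide's kernel: only the loop body is written by hand.
function multiply(a, b, n) {
  const c = new Float32Array(n);
  runKernel1D(n, function (tid) {
    c[tid] = a[tid] * b[tid];
  });
  return c;
}
```

Calling `multiply(new Float32Array([1, 2, 3]), new Float32Array([4, 5, 6]), 3)` yields the element-wise products regardless of the order in which the indices are visited.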
19. WebCL API
Same OO model as OpenCL, with JS classes
WebCL is a global object
[Class diagram, organized into Platform layer, Compiler layer and Runtime layer: WebCL; WebCLPlatform, WebCLDevice, WebCLExtension; WebCLContext, which holds many (*) WebCLProgram, WebCLMemoryObject {abstract}, CommandQueue, Event and Sampler; WebCLKernel; WebCLBuffer, WebCLImage]
21. WebCL sequence (host side)
Create context
Compile kernels
Setup command-queues
Setup kernels arguments
Execute commands
Read results
[Flowchart, with steps marked as Platform layer, Compiler layer or Runtime layer]
• Select Platform
• Select Device
• Create Context
• Load and compile kernels on devices
• Create buffers to store data on devices
• Create command queues for each device
• Update kernels arguments
• Send data to devices using their command queues
• Send commands to devices using their command queues
• Get data from devices using their command queues
• Release resources
22. WebCL sequence (host side)
// create the OpenCL context
try {
  clContext = WebCL.createContext({
    deviceType: WebCL.DEVICE_TYPE_GPU
  });
}
catch(err) {
  throw "Error: Failed to create context! " + err;
}
var devices = clContext.getInfo(WebCL.CONTEXT_DEVICES);
if (!devices) {
  throw "Error: Failed to retrieve compute devices for context!";
}
23. WebCL sequence (host side)
<script id="multiply_script" type="x-webcl">
__kernel
void multiply(__global const float *a,
              __global const float *b,
              __global float *c,
              unsigned int n)
{
  unsigned int tid = get_global_id(0);  // thread number
  if (tid >= n) return;                 // make sure we don't go past the end of the buffers
  c[tid] = a[tid] * b[tid];
}
</script>
// Create the compute program from the source buffer (text)
clProgram = clContext.createProgram(getSource("multiply_script"));
// Build the program executable
try {
  clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
} catch (err) {
  throw "Error: Failed to build program executable!\n"
      + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
}
clKernel = clProgram.createKernel("multiply");
24. WebCL sequence (host side)
BUFFER_SIZE = 10;
// The kernel takes float buffers, so the host arrays hold 32-bit floats
var A = new Float32Array(BUFFER_SIZE);
var B = new Float32Array(BUFFER_SIZE);
// store data in A and B
…
var size = BUFFER_SIZE * Float32Array.BYTES_PER_ELEMENT;  // size in bytes
// Create buffers for A and B (host contents are copied when the write commands are enqueued)
var aBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);
var bBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);
// Create buffer for C to read results
var cBuffer = clContext.createBuffer(WebCL.MEM_WRITE_ONLY, size);
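Buffer sizes are given in bytes, not in elements, which is easy to get wrong. The sketch below uses only standard JavaScript typed arrays (no WebCL) to show the computation; it assumes 32-bit floats because the kernel on these slides takes float pointers, and the sample values stored in `A` are arbitrary.

```javascript
// Host-side data for a buffer of BUFFER_SIZE floats.
const BUFFER_SIZE = 10;
const A = new Float32Array(BUFFER_SIZE);
for (let i = 0; i < BUFFER_SIZE; ++i) {
  A[i] = i * 0.5;  // arbitrary sample values
}

// Number of bytes to request for the device buffer: elements * bytes per element.
const sizeInBytes = BUFFER_SIZE * Float32Array.BYTES_PER_ELEMENT;

// The typed array reports the same figure directly.
const sameSize = (sizeInBytes === A.byteLength);
```

Using `A.byteLength` instead of a hand-computed size is one way to keep the host array and the device buffer in agreement if the element type changes later.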
25. WebCL sequence (host side)
// Create command queue
clQueue = clContext.createCommandQueue(devices[0]);
// enqueue buffers
clQueue.enqueueWriteBuffer(aBuffer, false, 0, size, A);
clQueue.enqueueWriteBuffer(bBuffer, false, 0, size, B);
// Set kernel args
clKernel.setArg(0, aBuffer);
clKernel.setArg(1, bBuffer);
clKernel.setArg(2, cBuffer);
clKernel.setArg(3, BUFFER_SIZE, WebCL.type.UINT);
Kernel signature:
__kernel
void multiply(__global const float *a,
              __global const float *b,
              __global float *c,
              unsigned int n);
26. WebCL sequence (host side)
// Execute (enqueue) kernel
clQueue.enqueueNDRangeKernel(clKernel,
    null,           // global work offset
    [BUFFER_SIZE],  // global work size
    [2]);           // local work size
Note: Use local work size = [] or null (default) to let the driver choose the best values.
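When an explicit local work size is given, OpenCL 1.1 requires the global work size to be a multiple of it. A common approach, sketched below in plain JavaScript, is to round the global size up and rely on the `if (tid >= n) return;` guard in the kernel to ignore the padding work-items. The helper names `roundUpGlobalSize` and `multiplyPadded` are hypothetical and exist only for this illustration.

```javascript
// Smallest multiple of localSize that is >= n.
function roundUpGlobalSize(n, localSize) {
  return Math.ceil(n / localSize) * localSize;
}

// Emulates the guarded multiply kernel over a padded 1-D range.
function multiplyPadded(a, b, n, localSize) {
  const c = new Float32Array(n);
  const globalSize = roundUpGlobalSize(n, localSize);
  let skipped = 0;
  for (let tid = 0; tid < globalSize; ++tid) {
    if (tid >= n) {      // same guard as `if (tid >= n) return;` in the kernel
      skipped += 1;
      continue;
    }
    c[tid] = a[tid] * b[tid];
  }
  return { c: c, globalSize: globalSize, skipped: skipped };
}
```

With 10 elements and a local size of 4, the range is padded to 12 work-items and the two extra ones do nothing.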
27. WebCL sequence (host side)
// get results and block while getting them
var C = new Float32Array(BUFFER_SIZE);
clQueue.enqueueReadBuffer(cBuffer,
    true,  // blocking call
    0, size, C);
28. Example: Matrix multiplication
“Hello World of CL”
C = A x B
N x N matrices
[Diagram: matrices A, B and C]
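As a reference point for C = A x B, here is a minimal sequential sketch in JavaScript. It assumes row-major storage in flat typed arrays, which the slides do not specify, and `matMulNaive` is a name chosen here. In a CL version, the body of the two outer loops would become the kernel, with one work-item per element of C.

```javascript
// Sequential N x N matrix product, row-major flat arrays (an assumption).
function matMulNaive(A, B, N) {
  const C = new Float32Array(N * N);
  for (let row = 0; row < N; ++row) {
    for (let col = 0; col < N; ++col) {
      // This inner part is what a single work-item (row, col) would compute.
      let sum = 0;
      for (let k = 0; k < N; ++k) {
        sum += A[row * N + k] * B[k * N + col];
      }
      C[row * N + col] = sum;
    }
  }
  return C;
}
```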
29. Example: Matrix multiplication
Optimization
N x N matrices
C divided into m x m tiles
With
• m = N/P
• P = # threads per workgroup (16)
[Diagram: matrices A, B and C, with C divided into tiles]
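The tiling idea can be sketched sequentially in JavaScript. This is not the course's kernel: it is an illustration under stated assumptions, namely row-major flat arrays, a tile edge `T` that divides `N`, and two small scratch arrays (`tileA`, `tileB`) that stand in for the local memory a work-group would share. Each (tileRow, tileCol) pair plays the role of one work-group.

```javascript
// Tiled N x N matrix product. Assumptions: row-major flat arrays, N % T === 0.
function matMulTiled(A, B, N, T) {
  if (N % T !== 0) throw new Error("N must be a multiple of T");
  const C = new Float32Array(N * N);
  const tileA = new Float32Array(T * T);  // stand-in for local memory
  const tileB = new Float32Array(T * T);  // stand-in for local memory
  for (let tr = 0; tr < N; tr += T) {
    for (let tc = 0; tc < N; tc += T) {      // one (tr, tc) pair ~ one work-group
      for (let tk = 0; tk < N; tk += T) {
        // Stage one tile of A and one tile of B into the scratch arrays.
        for (let i = 0; i < T; ++i) {
          for (let j = 0; j < T; ++j) {
            tileA[i * T + j] = A[(tr + i) * N + (tk + j)];
            tileB[i * T + j] = B[(tk + i) * N + (tc + j)];
          }
        }
        // Accumulate the partial products for this tile of C.
        for (let i = 0; i < T; ++i) {
          for (let j = 0; j < T; ++j) {
            let sum = 0;
            for (let k = 0; k < T; ++k) {
              sum += tileA[i * T + k] * tileB[k * T + j];
            }
            C[(tr + i) * N + (tc + j)] += sum;
          }
        }
      }
    }
  }
  return C;
}
```

The point of the staging step is that every value loaded into a tile is reused T times from fast memory instead of being fetched T times from global memory.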
30. Example: Comparison with sequential
MacBook Pro (early 2011), OSX 10.8
CPU: Intel Core i7, 2.2GHz, 4 cores
GPU: AMD Radeon HD 6750M, 1 GB, 480 SPU, 600 MHz, 576 GFLOPS
[Chart: speedup factor (y-axis, 0 to 250) at 128, 256, 512, 1024 and 2048 (x-axis) for OpenMP, CL (CPU), CL (GPU) and CL (GPU opt)]
31. WebCL / WebGL interop
WebCL context created from WebGL context
Configure shared CL objects from GL counterparts
Sync GL and CL
• Flush GL, acquire GL object
• Execute CL
• Release CL object, flush CL
Vertex arrays, textures, render-buffers can be shared with CL
[Flowchart]
Initialization
• Initialize WebGL
• Initialize WebCL
• Configure shared CL-GL data
Rendering loop (per frame)
• Set kernels args
• Enqueue commands
• Execute kernels
• Update Scene
• Render scene
32. WebCL / WebGL interop
// Create WebGL context
var gl = canvas.getContext("experimental-webgl");
// Init GL
…
// create the OpenCL context
try {
  clContext = WebCL.createContext({
    deviceType: WebCL.DEVICE_TYPE_GPU,
    shareGroup: gl
  });
}
catch(err) {
  throw "Error: Failed to create context! " + err;
}
33. WebCL / WebGL interop (texture)
// Create OpenGL texture object
gl.activeTexture(gl.TEXTURE0);
glTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, glTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, TextureWidth, TextureHeight, 0,
              gl.RGBA, gl.UNSIGNED_BYTE, null);
gl.bindTexture(gl.TEXTURE_2D, null);
// Create the compute program from the source buffer (text)
clProgram = clContext.createProgram(getSource("multiply_script"));
// Build the program executable
try {
  clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
} catch (err) {
  throw "Error: Failed to build program executable!\n"
      + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
}
clKernel = clProgram.createKernel("multiply");
34. Demo: GL Texture update with CL
Based on Evgeny Demidov 2D ink droplet
WebGL ~26 fps
WebCL ~124 fps
35. WebCL / WebGL interop (vbo)
// create buffer object
glVBO = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, glVBO);
// initialize buffer object
var sizeInBytes = mesh_width * mesh_height * 4 * Float32Array.BYTES_PER_ELEMENT;
gl.bufferData(gl.ARRAY_BUFFER, sizeInBytes, gl.DYNAMIC_DRAW);
// create OpenCL buffer from GL VBO
clVBO = clContext.createFromGLBuffer(WebCL.MEM_WRITE_ONLY, glVBO);
// set kernel args values
clKernel.setArg(0, clVBO);
clKernel.setArg(1, mesh_width, WebCL.type.UINT);
clKernel.setArg(2, mesh_height, WebCL.type.UINT);
37. WebCL/WebGL interop (host side)
// Sync GL and acquire buffer from GL
gl.flush();
clQueue.enqueueAcquireGLObjects(clTexture);
// Set global and local work sizes for kernel
var local = null;
var global = [TextureWidth, TextureHeight];
try {
  clQueue.enqueueNDRangeKernel(clKernel, null, global, local);
} catch (err) {
  throw "Failed to enqueue kernel! " + err;
}
// Release GL texture
clQueue.enqueueReleaseGLObjects(clTexture);
clQueue.flush();
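The acquire / execute / release pattern above has to be repeated every frame, so it can be convenient to wrap it. The sketch below is one possible way to do that, not part of WebCL: `withGLObjects` is a hypothetical helper, and `makeRecorders` builds stand-in objects that only log the call order so the sketch runs without WebGL or WebCL. The method names on the stand-ins are the ones used on the slide. Releasing in a `finally` block is a design choice made here so that GL gets the shared object back even if enqueueing the kernel throws.

```javascript
// Hypothetical helper: brackets CL work on a shared GL object with acquire/release.
function withGLObjects(gl, clQueue, clObject, enqueueWork) {
  gl.flush();                                   // let GL finish with the object first
  clQueue.enqueueAcquireGLObjects(clObject);
  try {
    enqueueWork(clQueue);                       // e.g. enqueue the kernel here
  } finally {
    clQueue.enqueueReleaseGLObjects(clObject);  // always hand the object back to GL
    clQueue.flush();
  }
}

// Stand-ins that record the call order (assumption: for illustration only).
function makeRecorders() {
  const log = [];
  const gl = { flush: function () { log.push("gl.flush"); } };
  const clQueue = {
    enqueueAcquireGLObjects: function () { log.push("acquire"); },
    enqueueReleaseGLObjects: function () { log.push("release"); },
    flush: function () { log.push("cl.flush"); }
  };
  return { log: log, gl: gl, clQueue: clQueue };
}
```

In a real page, `gl` and `clQueue` would be the WebGL context and the WebCL command queue created on the earlier slides.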
39. Perspectives
WebCL enables GPGPU applications in Web browsers
• Careful usage of architecture can lead to impressive speedup
• With WebGL interoperability, rich graphics Web applications are now possible
DRAFT WebCL specification
• Quite stable JavaScript API
• Focusing on more security and robustness
40. WebCL Open process and Resources
Khronos open process to engage Web community
Public specification drafts, mailing lists, forums
• http://www.khronos.org/webcl/
• webcl_public@khronos.org
Nokia open source prototype for Firefox in May 2011 (LGPL)
• http://webcl.nokiaresearch.com
Samsung open source prototype for WebKit in July 2011 (BSD)
• http://code.google.com/p/webcl/
Motorola open source prototype for NodeJS in March 2012 (BSD)
• https://github.com/Motorola-Mobility/node-webcl
41. Thank you!
43. Start learning Now!
OpenCL Programming Guide - The “Red Book” of OpenCL
• http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi/dp/0321749642
OpenCL in Action
• http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Computations/dp/1617290173/
Heterogeneous Computing with OpenCL
• http://www.amazon.com/Heterogeneous-Computing-with-OpenCL-ebook/dp/B005JRHYUS
The OpenCL Programming Book
• http://www.fixstars.com/en/opencl/book/
Editor's notes
This demonstration does not run in a browser; it uses OpenCL to speed up the physics computations for the positions of all the planks. Our goal with WebCL is to be able, one day, to perform such computations in your web browser.
While CPU tend to have 2 to 32 cores, GPU have much more.
Historically, when GPU became programmable, people try to use vertex and fragment shader programs to perform more general computations than rendering vector graphics.
Scatter and gather are fundamental operations for GPGPU. Typically, scatter is difficult because, in a graphics pipeline, the fragment shader is called to write one output value. One can still perform scatter through clever use of vertex shaders, and newer versions of graphics APIs and drivers provide specific methods for scatter. Gather is a no-brainer since it can be achieved by reading textures.
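As a rough illustration of the two access patterns, here is a minimal sketch in plain JavaScript. It is not WebCL or shader code, and the function names `gather` and `scatter` and the sample arrays are hypothetical, chosen only for this example.

```javascript
// Gather: each output element READS from a computed input location.
// out[i] = input[indices[i]]
function gather(input, indices) {
  const out = new Float32Array(indices.length);
  for (let i = 0; i < indices.length; i++) {
    out[i] = input[indices[i]];
  }
  return out;
}

// Scatter: each input element WRITES to a computed output location.
// out[indices[i]] = input[i]
function scatter(input, indices, outLength) {
  const out = new Float32Array(outLength);
  for (let i = 0; i < input.length; i++) {
    out[indices[i]] = input[i];
  }
  return out;
}

const sgData = new Float32Array([10, 20, 30, 40]);
const sgIndices = [3, 0, 2, 1];
const gathered = gather(sgData, sgIndices);       // [40, 10, 30, 20]
const scattered = scatter(sgData, sgIndices, 4);  // [20, 40, 30, 10]
```

In the gather case the write position is fixed by the loop index, which is the situation a fragment shader is in; in the scatter case the write position depends on the data.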
To understand work-groups and work-items, suppose you have a matrix or an image to process. The image can be decomposed into tiles, and each tile can be processed independently. A tile would be a work-group. Inside this work-group, each pixel would be processed by a work-item. Unlike typical CPU multithreading, all work-items (or threads) execute synchronously, thanks to the SIMD nature of GPUs. This has an important consequence: if each thread executes the same number of operations, then they will all complete at the same time. But if one thread takes longer than the others, e.g. due to branch divergence (like an if…else clause), then the other threads will wait until it finishes its operations, thereby reducing the effective computational throughput.
Developers must manage memory explicitly. For the best performance, move data closer to the cores. However, be aware that the closer memory is to the cores, the less of it is available.
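The tile analogy can be sketched in plain JavaScript as follows. This is only an illustration: the 16x16 tile size, the function names `locate` and `lockstepCost`, and the cost numbers are assumptions, and the cost function is a deliberately simplified model of "the group waits for its slowest work-item".

```javascript
// Tile (work-group) dimensions in pixels. 16x16 is an assumed example size.
const TILE_W = 16;
const TILE_H = 16;

// For a pixel (x, y), find which tile (work-group) it belongs to and
// where it sits inside that tile (its work-item position).
function locate(x, y) {
  return {
    group: [Math.floor(x / TILE_W), Math.floor(y / TILE_H)],
    local: [x % TILE_W, y % TILE_H],
  };
}

// Simplified lockstep model: a group finishes only when its slowest
// work-item finishes, so the group's cost is the maximum item cost.
function lockstepCost(itemCosts) {
  return Math.max(...itemCosts);
}

const where = locate(37, 5);
// where.group is [2, 0], where.local is [5, 5]

const uniformCost = lockstepCost([4, 4, 4, 4]);   // 4
const divergentCost = lockstepCost([4, 4, 4, 9]); // 9
```

In this model, a single work-item that takes a longer branch raises the cost of the whole group from 4 to 9.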
For 1-dimensional problems, in a sequential language like JavaScript, one would use a for loop to iterate across the 1D array. With CL, we tell the device to iterate over a 1D domain and only provide the body of the loop. When CL calls the kernel, it provides functions to query which index (i.e. thread) is executing the kernel. By extension, for 2D problems, in JavaScript, we would have 2 nested for loops. CL’s work-items iterate over the 2D domain, and the (x,y) index of the thread calling a kernel is provided by get_global_id(dimension), with dimension = 0 (1st dimension) or 1 (2nd dimension).
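The contrast between a loop and a kernel can be emulated in plain JavaScript. The sketch below is an assumption-laden illustration rather than WebCL code: `runOverDomain` is a hypothetical stand-in for the runtime, and the `getGlobalId` callback only mimics OpenCL's get_global_id(dimension). A real device would run the kernel calls in parallel rather than one after another.

```javascript
// Sequential version: the loop and its body live together.
function squareSequential(input) {
  const out = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = input[i] * input[i];
  }
  return out;
}

// Kernel-style version: only the loop body is written. The index comes
// from asking which work-item is running.
function squareKernel(getGlobalId, input, out) {
  const i = getGlobalId(0);
  out[i] = input[i] * input[i];
}

// Hypothetical stand-in for the runtime: calls the kernel once per index
// of a 1D or 2D domain.
function runOverDomain(globalSize, kernel, ...args) {
  const [nx, ny = 1] = globalSize;
  for (let y = 0; y < ny; y++) {
    for (let x = 0; x < nx; x++) {
      const id = [x, y];
      kernel((dim) => id[dim], ...args);
    }
  }
}

const loopInput = new Float32Array([1, 2, 3, 4]);
const loopExpected = squareSequential(loopInput);
const loopResult = new Float32Array(loopInput.length);
runOverDomain([loopInput.length], squareKernel, loopInput, loopResult);

// 2D case: each work-item reads its (x, y) position from dimensions 0 and 1.
const GRID_W = 3, GRID_H = 2;
const grid = new Float32Array(GRID_W * GRID_H);
function fillKernel(getGlobalId, out, width) {
  const x = getGlobalId(0);
  const y = getGlobalId(1);
  out[y * width + x] = 10 * y + x;
}
runOverDomain([GRID_W, GRID_H], fillKernel, grid, GRID_W);
// grid is [0, 1, 2, 10, 11, 12]
```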
The WebCL object cannot be constructed with the new operator. It is like the Math object of JavaScript.
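The Math side of this analogy can be checked in any JavaScript engine. The sketch below exercises only Math; the behavior of the WebCL object is as stated in the note and is not tested here.

```javascript
// Math is a plain namespace object, not a constructor:
// its functions are called directly on it.
const root = Math.sqrt(16); // 4

// Attempting to construct it throws a TypeError.
let constructError = null;
try {
  new Math();
} catch (e) {
  constructError = e;
}
const threwTypeError = constructError instanceof TypeError; // true
```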
----- Meeting Notes (8/2/12 16:54) ----- Kernels can come from anywhere.
While we explain how to set up a simple vector multiplication kernel, the same approach applies to matrices too. Matrix multiplication is probably what I would call the best “Hello World” example for compute languages.
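To make the matrix case concrete, here is a minimal sketch in plain JavaScript of how the work could be split so that one work-item computes one output element. The names `matMulElement` and `matMul` are hypothetical, the matrices are assumed square and row-major, and the nested loops merely stand in for a 2D launch.

```javascript
// One work-item's job: compute a single element C[row][col] of C = A * B
// for n x n row-major matrices.
function matMulElement(row, col, A, B, C, n) {
  let sum = 0;
  for (let k = 0; k < n; k++) {
    sum += A[row * n + k] * B[k * n + col];
  }
  C[row * n + col] = sum;
}

// Stand-in for a 2D launch: one call per (row, col) pair.
function matMul(A, B, n) {
  const C = new Float32Array(n * n);
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      matMulElement(row, col, A, B, C, n);
    }
  }
  return C;
}

const matA = new Float32Array([1, 2, 3, 4]);
const matB = new Float32Array([5, 6, 7, 8]);
const matC = matMul(matA, matB, 2); // [19, 22, 43, 50]
```

Because no output element depends on another, every call to `matMulElement` could run at the same time.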
To optimize computations, recall the work-group/work-item analogy we explained earlier with an image. We said that work-groups are tiles on which work-items operate. Here we do exactly that, with P work-items (or threads) per work-group. Use WebCLKernel.getWorkGroupInfo(WebCL.KERNEL_PREFERRED_WORKGROUP_SIZE) to find out what this number is for a device. It is typically a power of 2 such as 16, 32, or 64.
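In OpenCL 1.x the global size must be a multiple of the work-group size, so a problem size that does not divide evenly by P is usually padded. A minimal sketch of that arithmetic in plain JavaScript follows; `roundUp`, `planLaunch`, and the sample numbers are hypothetical and not part of any WebCL API.

```javascript
// Round n up to the next multiple of `multiple`.
function roundUp(n, multiple) {
  return Math.ceil(n / multiple) * multiple;
}

// Given a problem size and a work-group size P, compute the padded
// global size and the resulting number of work-groups.
function planLaunch(problemSize, P) {
  const globalSize = roundUp(problemSize, P);
  return { globalSize, localSize: P, numGroups: globalSize / P };
}

const plan = planLaunch(1000, 64);
// plan is { globalSize: 1024, localSize: 64, numGroups: 16 }
```

With padding, the kernel needs a bounds check so that the extra work-items (indices 1000 to 1023 here) do nothing.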
However, because that CPU is hyperthreaded, it is seen as 8 cores rather than 4. On this machine, the preferred workgroup size multiple is 1 for CPU and 64 for GPU. The maximum workgroup size is 128 for CPU and 256 for GPU. So we set the local workgroup size to 128x1 (=128) for CPU and 16x16 (=256, divisible by 64) for GPU. As you can see, the performance of CL on CPU is pretty good, and even better than GPU for small matrix sizes (less than 512x512). As the matrix size grows, the CPU performance remains constant but the GPU performance grows exponentially, as expected. Note: the OpenMP code uses the same tiling optimization as the GPU code, with 8 threads. If you recall the video at the beginning of this course, the physics engine is essentially doing matrix/vector multiplication for 32k items. With these results, a tremendous speedup can be achieved compared to a CPU approach.
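The arithmetic behind these two choices can be checked with a short sketch in plain JavaScript. The helper `checkLocalSize` is hypothetical; the numbers passed to it are the ones quoted in this note.

```javascript
// Check a local work-group shape against two device limits: the total
// number of work-items must not exceed maxSize, and should be a
// multiple of preferredMultiple.
function checkLocalSize(shape, maxSize, preferredMultiple) {
  const total = shape.reduce((a, b) => a * b, 1);
  return {
    total,
    withinMax: total <= maxSize,
    alignsWithPreferred: total % preferredMultiple === 0,
  };
}

// CPU: 128x1, maximum 128, preferred multiple 1.
const cpuCheck = checkLocalSize([128, 1], 128, 1);
// GPU: 16x16, maximum 256, preferred multiple 64.
const gpuCheck = checkLocalSize([16, 16], 256, 64);
// Both report withinMax: true and alignsWithPreferred: true.
```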
This example comes from the Nvidia CUDA/OpenCL SDK. A sphere is rendered by GL, but the vertex positions are modified by CL with some noise to create this cool effect.
This is the recommended way to synchronize the GL and CL queues. There is a more efficient way that uses GL and CL events rather than flushing the queues, but synchronization with events is an advanced subject that we don’t have time to discuss in this course; you can find presentations on it online.
The Khronos web site has a wiki with links to all these WebCL implementation prototypes. On this web site, you will also find links to this presentation, course notes, and updates. All examples in this course were done with node-webcl from Motorola and rendered with node-webgl, both of which are freely available on GitHub. While this is not an implementation within a web browser, it uses the same JavaScript engine as the Chrome/Chromium browsers, i.e. Google's V8 engine. We use nodejs for server-side processing, and the same code is being ported to the Chrome browser. Using nodejs, we can prototype new features quickly before adding them to browsers.