Introduction to OpenCL
How to select OpenCL devices, initialise a compute context, allocate device memory,
compile and run kernels, output results

OpenCL Workshop | December 1, 2010 | Brisbane, Australia
Tomasz Bednarz, CESRE
OpenCL is a trademark of Apple, Inc.

Welcome to Open Computing Language (OpenCL™)
•  N-Body Simulation Demo
•  Khronos Group and OpenCL standard
•  OpenCL Anatomy
•  Platform Model
•  Execution Model
•  Memory Model

•  Short Introduction to OpenCL Programming
•  OpenCL C language
•  Supported data types
•  Synchronisation primitives

•  Additional information and resources

CSIRO. Introduction to OpenCL. OpenCL Workshop at the OzViz 2010, Brisbane, December 2010.
N-Body Simulation: demo
N-Body Simulation

Lars Nyland, Mark Harris, Jan Prins “Fast N-Body Simulation with CUDA”. In Hubert
Nguyen, editor, GPU Gems 3, chapter 31, pages 677-695, Addison Wesley 2007.

•  Applications
    •  Molecular dynamics
    •  Astronomical and astrophysical simulations
    •  Fluid dynamics simulation
    •  Radiosity (radiometric transfer)

•  N² interactions to compute per time-step
    •  For the brute-force all-pairs approach discussed here

•  Highly parallel
•  High arithmetic intensity

[Figure caption: Two of these galaxies attract each other.]
N-Body Simulation (http://developer.nvidia.com/gpugems3)
•  N-body simulation models the motion of particles subject to forces arising from the pairwise interactions between all particles in the system
•  Typical example: simulation of stars in a galaxy subject to the gravitational force

•  Given N bodies with an initial position x_i and velocity v_i for 1 ≤ i ≤ N, the force f_ij on body i caused by its gravitational attraction to body j is given by the following:

$$ f_{ij} = G \, \frac{m_i m_j}{\lVert r_{ij} \rVert^{2}} \cdot \frac{r_{ij}}{\lVert r_{ij} \rVert}, \qquad r_{ij} = x_j - x_i $$

$$ F_i = \sum_{\substack{1 \le j \le N \\ j \ne i}} f_{ij} = G m_i \sum_{\substack{1 \le j \le N \\ j \ne i}} \frac{m_j \, r_{ij}}{\lVert r_{ij} \rVert^{3}} $$

where m_i and m_j are the masses of bodies i and j.
•  The acceleration is computed as:

$$ a_i = \frac{F_i}{m_i} $$
N-Body Simulation
•  As bodies approach each other, the force between them grows without bound, so a softening factor ε² > 0 may be added:

$$ F_i \approx G m_i \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}} $$

•  The softening factor limits the magnitude of the force between the bodies, which is desirable for numerical integration of the system state
•  The condition j ≠ i is no longer needed: r_ii = 0, so the self-interaction term vanishes when ε² > 0
•  Acceleration:

$$ a_i = \frac{F_i}{m_i} \approx G \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}} $$
N-Body Simulation: parallel concept

[Figure: grid of pairwise interactions; the outer loop runs over particles i, the inner loop over particles j, and each cell is a single interaction between i and j]

•  Particles i and j interact with each other
•  OpenCL can be used to compute the acceleration on all bodies in parallel
•  N/p work-groups of p work-items process p bodies at a time
•  Every work-item loads all other body positions from off-chip memory
    •  N² loads, so the kernel is bandwidth bound and performs poorly

•  Optimization (using tiles) to be presented in the afternoon session
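The outer-loop/inner-loop structure described on this slide can be sketched on the CPU as follows. This is only a serial reference under stated assumptions, not the workshop's kernel: the type `Vec3` and the function `allPairsAcceleration` are hypothetical names, and double precision is used for simplicity. In an OpenCL version, each iteration of the outer loop would correspond to one work-item.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration only; not taken from the workshop code.
struct Vec3 { double x, y, z; };

// Brute-force all-pairs acceleration with softening:
//   a_i = G * sum_j m_j * r_ij / (|r_ij|^2 + eps2)^(3/2),  with r_ij = x_j - x_i
std::vector<Vec3> allPairsAcceleration(const std::vector<Vec3>& pos,
                                       const std::vector<double>& mass,
                                       double G, double eps2)
{
    assert(pos.size() == mass.size());
    assert(eps2 > 0.0);  // softening keeps the j == i term finite (it is zero)
    std::vector<Vec3> acc(pos.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < pos.size(); ++i) {        // outer loop (i)
        for (std::size_t j = 0; j < pos.size(); ++j) {    // inner loop (j)
            const double rx = pos[j].x - pos[i].x;
            const double ry = pos[j].y - pos[i].y;
            const double rz = pos[j].z - pos[i].z;
            const double d2 = rx * rx + ry * ry + rz * rz + eps2;
            const double invD3 = 1.0 / (d2 * std::sqrt(d2));  // d2^(-3/2)
            acc[i].x += G * mass[j] * rx * invD3;
            acc[i].y += G * mass[j] * ry * invD3;
            acc[i].z += G * mass[j] * rz * invD3;
        }
    }
    return acc;
}
```

Every iteration of the inner loop reads one body position, which is where the N² loads mentioned above come from.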
N-Body Simulation: body-body force calculation

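$$ F_i \approx G m_i \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}} $$

$$ a_i = \frac{F_i}{m_i} \approx G \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}} $$

http://developer.download.nvidia.com/compute/opencl/sdk/website/samples.html#oclNbody
http://developer.apple.com/library/mac/#samplecode/OpenCL_NBody_Simulation_Example/Introduction/Intro.html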
N-Body Simulation: demo
The Khronos Group
http://www.khronos.org/opencl/

What is OpenCL?
OpenCL - Open Computing Language: open, royalty-free standard for programming
heterogeneous parallel computing at the intersection of GPU and multi-core CPU capabilities.

•  CPUs: multiple cores driving performance increases
    •  Multi-processor programming, threading libraries, e.g. OpenMP
•  GPUs: increasingly general-purpose data-parallel computing
    •  Graphics APIs and shading languages, vendor compute APIs
•  Emerging intersection: heterogeneous computing
What is OpenCL?
Roadmap convergence

•  Desktop Visual Computing
    •  OpenGL 4.0 and OpenGL ES 2.0 are both streamlined, programmable pipelines. GL and ES working groups are working on convergence. WebGL is a positive pressure for portable 3D content for all platforms.
    •  OpenGL and OpenCL have direct interoperability. OpenCL objects can be created from OpenGL textures, buffer objects and renderbuffers.
•  Parallel computing and visualisation
    •  OpenCL: the center of a visual computing ecosystem with parallel computations, 3D, video, audio, and image processing on desktop, embedded and mobile systems
•  Mobile Visual Computing
    •  Compute, graphics and AV APIs interoperate through EGL.
    •  Streamlined APIs for mobile and embedded graphics, media and compute acceleration
•  Hundreds of man-years invested by industry experts in a coordinated ecosystem

[Figure: Desktop 3D Ecosystem, with areas labelled: cross-platform desktop 3D, 3D for Web, heterogeneous parallel programming, embedded 3D, surface and synch abstraction, streaming media and image processing]

Based on http://www.khronos.org/opencl/
OpenCL Timeline
•  OpenCL 1.0 was released six months after the proposal was created
•  OpenCL shipped first on Apple's Mac OS X Snow Leopard
•  18-month cadence between OpenCL 1.0 and OpenCL 1.1
•  Backward compatible to protect software investment

Timeline:
•  June 2008: OpenCL working group is proposed by Apple. Draft spec is contributed to Khronos.
•  December 2008: Khronos publicly releases OpenCL 1.0 as a royalty-free specification.
•  May 2009: Khronos releases OpenCL 1.0 conformance tests to ensure high-quality implementations.
•  2nd half 2009: Multiple conformant implementations ship across diverse OS and platforms.
•  June 2010: OpenCL 1.1 spec is released and first implementations ship.

Based on http://www.khronos.org/opencl/
OCL Quick Reference Cards

http://www.khronos.org/files/opencl-quick-reference-card.pdf
Design goals of OpenCL
•  Enable all compute resources in system
    •  CPUs, GPUs, and other processors enabled as peers
    •  Data- and task-parallel compute model

•  Efficient parallel programming model
    •  ANSI C99 based kernel language

•  Low-level abstraction
    •  Abstracts the specifics of the underlying hardware
    •  High-performance, but device independent

•  Define precision requirements for all floating-point computations
    •  Consistent results on all platforms and devices

•  Interoperability with graphics APIs
    •  Dedicated support for OpenGL, OpenGL ES and DirectX

•  Drive future hardware requirements
    •  Applicable to both consumer and HPC applications
OpenCL Platform Model
It's a heterogeneous world
•  Platform model encapsulates compute resources
•  A modern platform includes:
    •  One or more CPUs
    •  One or more GPUs
    •  Optional accelerators (e.g. DSPs)
    •  Other?

Using OpenCL, programmers write a single portable program that uses ALL resources in the heterogeneous platform

Based on http://www.khronos.org/opencl/
OpenCL Platform Model
•  One Host connected to one or more Compute Devices
    •  A Compute Device can be a CPU, GPU or other processor

•  Each Compute Device is composed of one or more Compute Units
    •  A Compute Unit can be a core, multi-processor, etc.

•  Each Compute Unit is further divided into one or more Processing Elements
    •  Processing Elements execute code as SIMD or SPMD

[Figure: a HOST connected to several COMPUTE DEVICES; each compute device contains several COMPUTE UNITS, and each compute unit contains PROCESSING ELEMENTS]
Anatomy of OpenCL Application
OpenCL Application
•  Host code
    •  Written in C/C++
    •  Executes on the host
•  Device code
    •  Written in OpenCL C
    •  Executes on the device

[Figure: a HOST connected to COMPUTE DEVICES, each made up of COMPUTE UNITS]

•  Host code sends commands to the devices:
    •  To transfer data between host memory and device memories
    •  To execute device code
Anatomy of OpenCL Application
•  Serial code executes in a Host (CPU) thread
•  Parallel code executes in many Device (GPU) threads across multiple processing elements

[Figure: an OCL application alternating between serial code (Host = CPU) and parallel code (Device = GPU)]
OpenCL Execution Model
OpenCL Execution Model
•  An OpenCL application runs on a Host, which submits work to the Compute Devices
•  Work item: the basic unit of work on an OpenCL device
•  Kernel: the code for a work item, which is basically a C function
•  Program: collection of kernels and other functions (analogous to a dynamic library). Managed by the host.
•  Context: the environment within which work-items execute, which includes devices and their memories and command queues (contains all resources for computation)
•  Command queue: a queue used by the Host application to submit work to a Device (kernel execution instances)
    •  Work is queued in-order; each queue targets a single device
    •  Work can be executed in-order or out-of-order
    •  Events are used for synchronisation

[Figure: a CONTEXT spanning a GPU and a CPU, holding MEMORY, queues and COMMANDS]
OpenCL Execution Model
•  Portable execution model that allows a kernel to execute at each point in a problem domain (N-dimensional computational domain) → decomposition of a task into work-items

Traditional loop as a function in C:

    void
    addVector(const float *A,
              const float *B,
              float *C,
              int N)
    {
        int index;

        for (index = 0; index < N; index++)
            C[index] = A[index] + B[index];
    }

OpenCL C kernel:

    __kernel void
    addVector(__global const float *A,
              __global const float *B,
              __global float *C,
              int N)
    {
        int index = get_global_id(0);

        if (index < N)
            C[index] = A[index] + B[index];
    }

Work item: the basic unit of work on an OpenCL device
Kernel: the code for a work item, which is basically a C function
Kernel Execution on Platform Model
[Figure: mapping of the execution model onto the platform model: Work-Item ↔ compute element, Work-Group ↔ compute unit, kernel execution instance ↔ compute device]

•  Each work-item is executed by a compute element (a processing element of the platform model)
•  Each work-group is executed on a compute unit
•  Several concurrent work-groups can reside on one compute unit, depending on the work-group's memory requirements and the compute unit's memory resources
•  Each kernel is executed on a compute device
Benefits of Work-Groups
•  Automatic scalability across devices with different numbers of compute units
    •  Work-groups can execute in any order, concurrently or sequentially

•  Efficient cooperation between work-items of the same work-group
    •  Fast shared memory and synchronization

•  Independence between work-groups gives scalability:
    •  A kernel scales across any number of compute units

[Figure: a kernel launch of work-groups 0 to 7 distributed over a device with 2 compute units (Unit 0, Unit 1) and over a device with 4 compute units (Unit 0 to Unit 3)]
Work-group synchronisation
•  Always define the best N-dimensional index space (NDRange) for your algorithms (currently 1D, 2D and 3D index spaces are supported)
•  Kernels are executed across a global domain of work-items
•  Work-items are single points of execution and are grouped into local work-groups
    •  Global dimensions: 1024x1024 (whole problem space)
    •  Local dimensions: 32x32 (work-group)

•  Synchronisation between work-items is possible only within work-groups: barriers and memory fences
•  Cannot synchronise outside of work-groups

[Figure: 1024 x 1024 global index space divided into work-groups]
Work-items and work-groups
•  A kernel is a function executed at each point of a problem domain (for each work-item)
•  Number of work-items = 4096 (16 work-groups, 256 work-items each):

    __kernel void
    addVector(__global const float *A,
              __global const float *B,
              __global float *C,
              int N)
    {
        int index = get_global_id(0);

        if (index < N)
            C[index] = A[index] + B[index];
    }

[Figure: a 1D NDRANGE on the DEVICE split into WORK GROUPS 0 … 15, each holding WORK ITEMS 0 … 255. Sizes: get_global_size(0) = 4096, get_num_groups(0) = 16, get_local_size(0) = 256. Example values labelled on individual items: get_group_id(0) = 2, get_global_id(0) = 1792, get_local_id(0) = 255]
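The relationship between the work-item functions on this slide can be sketched with plain integer arithmetic. The helper names below are hypothetical, and the sketch assumes a 1D NDRange with a zero global offset, in which case the global ID equals group ID × local size + local ID.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper names; they mirror the OpenCL work-item functions
// for a 1D NDRange with zero global offset (an assumption of this sketch).
struct WorkItemIndex {
    std::size_t groupId;   // cf. get_group_id(0)
    std::size_t localId;   // cf. get_local_id(0)
};

// Split a global ID into (group ID, local ID).
WorkItemIndex splitGlobalId(std::size_t globalId, std::size_t localSize)
{
    assert(localSize > 0);
    return WorkItemIndex{globalId / localSize, globalId % localSize};
}

// Recombine: global ID = group ID * local size + local ID.
std::size_t globalIdOf(WorkItemIndex idx, std::size_t localSize)
{
    assert(idx.localId < localSize);
    return idx.groupId * localSize + idx.localId;
}

// Number of work-groups, cf. get_num_groups(0).
std::size_t numGroups(std::size_t globalSize, std::size_t localSize)
{
    assert(localSize > 0 && globalSize % localSize == 0);
    return globalSize / localSize;
}
```

With the slide's sizes, `numGroups(4096, 256)` gives 16, and a global ID of 1792 splits into group 7, local ID 0.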
Work-items and work-groups in 2D
•  Number of work-items to execute: 128 x 128 = 16384 (a kernel is executed at each point of a problem domain)

[Figure: a 2D NDRANGE on the DEVICE of size get_global_size(0) x get_global_size(1). WORK GROUPS are labelled (get_group_id(0), get_group_id(1)), from 0,0 to 7,7. Each work-group has size get_local_size(0) x get_local_size(1), and its WORK ITEMS are labelled (get_local_id(0), get_local_id(1)), with labels running up to 15,0 and 0,15. Every work-item also has a global index (get_global_id(0), get_global_id(1))]
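The same index arithmetic extends to two dimensions, one dimension at a time. The sketch below uses hypothetical names (`Index2D`, `groupIdOf`, `localIdOf`, `flatten`) and assumes a zero global offset; the 16 x 16 work-group size used in the usage note is an assumption, not something the slide states explicitly. `flatten` shows one common way (row-major) to turn a 2D global ID into an offset into a 1D buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2D index type: x is dimension 0, y is dimension 1.
struct Index2D { std::size_t x, y; };

// cf. (get_group_id(0), get_group_id(1))
Index2D groupIdOf(Index2D globalId, Index2D localSize)
{
    assert(localSize.x > 0 && localSize.y > 0);
    return Index2D{globalId.x / localSize.x, globalId.y / localSize.y};
}

// cf. (get_local_id(0), get_local_id(1))
Index2D localIdOf(Index2D globalId, Index2D localSize)
{
    assert(localSize.x > 0 && localSize.y > 0);
    return Index2D{globalId.x % localSize.x, globalId.y % localSize.y};
}

// Row-major offset of a 2D global ID into a 1D buffer of the given width.
std::size_t flatten(Index2D globalId, std::size_t globalWidth)
{
    assert(globalId.x < globalWidth);
    return globalId.y * globalWidth + globalId.x;
}
```

For a 128 x 128 problem with 16 x 16 work-groups, the work-item with global ID (37, 100) belongs to work-group (2, 6) and has local ID (5, 4).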
OpenCL Memory Model
OpenCL Memory Model
•  Address spaces
    •  Private: read/write access for work-item only
    •  Local: read/write access for entire work-group
    •  Global/Constant: visible to all work-groups
    •  Host: accessible by the CPU

•  Synchronisation
    •  All synchronisation for all memory accesses must be done explicitly

•  Memory management is explicit
    •  You must move data from host → global → local … and back

[Figure: a Compute Device containing Compute Units 1 … N. Each compute unit has its own Local Memory and PEs running Work Items 1 … J, each with its own Private Memory. All compute units share Global/Constant Memory. The Host has separate Host Memory]
OpenCL Programming

•  How to define the platform
•  How to execute code on the platform
•  How to move data around in memory
•  How to write (and build) programs
Host application
OpenCL Language and API Highlights
•  Platform Layer API (called from host)
    •  Abstraction layer for diverse computational resources
    •  Query, select and initialise compute devices
    •  Create compute contexts and work-queues

•  Runtime API (called from host)
    •  Launch compute kernels
    •  Set kernel execution configuration
    •  Manage scheduling, compute, and memory resources

•  OpenCL language
    •  To write C-based compute kernels for execution on a compute device
    •  Includes a rich set of built-in functions
    •  Can be compiled JIT/online or offline
OpenCL Language Highlights
    __kernel void
    addVector(__global const float *A,
              __global const float *B,
              __global float *C,
              int N)
    {
        int index = get_global_id(0);

        if (index < N)
            C[index] = A[index] + B[index];
    }

•  Function qualifiers
    •  __kernel qualifier declares a function as a kernel

•  Address space qualifiers
    •  __global, __local, __constant, __private

•  Work-item functions
    •  get_work_dim(), get_global_id(), get_local_id(), get_group_id(), get_local_size()

•  Image functions
    •  Images must be accessed through built-in functions
    •  Reads/writes performed through sampler objects from host or defined in source

•  Synchronisation functions
    •  Barriers: all work-items within a work-group must execute the barrier function before any work-item in the work-group can continue
OpenCL Framework: Overview
•  Platform layer: platform query and context creation
•  Compiler for OpenCL C
•  Runtime: memory management and command execution within a context

[Figure: a CONTEXT spanning a CPU and a GPU, containing:
    PROGRAMS: kernel source compiled to a GPU binary and a CPU binary
    KERNELS: addVector with arg[0], arg[1] and arg[2] values
    MEMORY OBJECTS: BUFFERS and IMAGES
    COMMAND QUEUES: an IN-ORDER QUEUE and an OUT-OF-ORDER QUEUE feeding the COMPUTE DEVICE
 Steps: COMPILE CODE → CREATE ARGS AND DATA → SEND TO EXECUTION]

Kernel source shown in the figure:

    __kernel void
    addVector(__global float *A,
              __global const float *B,
              __global float *C)
    {
        int i = get_global_id(0);
        C[i] = A[i] + B[i];
    }
OpenCL Framework: Objects Types
•  cl_platform_id    – identifier for a specific platform
•  cl_device_id      – identifier for a specific compute device
•  cl_context        – handle for a compute context
•  cl_command_queue  – handle for a command queue (for a compute device)
•  cl_mem            – handle for a memory resource (managed by context)
•  cl_program        – handle for a program resource (library of kernels)
•  cl_kernel         – handle for a compute kernel

•  All object types are opaque handles
    •  Enables cross-platform compatibility for complex data types

•  All objects are reference counted
    •  When the reference count reaches zero, the object is deallocated
OpenCL Framework: Platform Layer
•  To query platform information:
    •  clGetPlatformIDs() → obtain the list of platforms available
    •  clGetPlatformInfo() → platform profile, version, name, vendor, extensions

•  To query devices:
    •  clGetDeviceIDs() → obtain the list of devices available on a platform
    •  clGetDeviceInfo() → type, capabilities, vendor, name, etc.

•  Create an OpenCL context for one or more devices

[Figure: a Context (cl_context) holds:
    one or more devices (cl_device_id)
    memory and device code shared by these devices (cl_mem, cl_program)
    command queues to send commands to these devices (cl_command_queue)]
Context creation: platform IDs
•  Simple example: get the first platform ID:

    // get first OpenCL platform ID available
    cl_platform_id platform;
    err = clGetPlatformIDs(1, &platform, NULL);

    cl_int clGetPlatformIDs(
        cl_uint num_entries,
        cl_platform_id *platforms,
        cl_uint *num_platforms)

    If platforms or num_platforms is NULL, that argument is ignored.

•  Get all platform IDs:

    // get number of OpenCL platforms available
    cl_int err;
    cl_uint num_platforms;
    std::vector<cl_platform_id> platformIDs;
    err = clGetPlatformIDs(0, NULL, &num_platforms);  // num_entries is a cl_uint, so pass 0
    if (err != CL_SUCCESS) { … }
    platformIDs.resize(num_platforms);
    // get all OpenCL platform IDs
    err = clGetPlatformIDs(num_platforms, &platformIDs[0], NULL);
Context creation: device IDs
•  Simple: get the first GPU associated with the platform:

    cl_device_id device;
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_int clGetDeviceIDs(
        cl_platform_id platform,
        cl_device_type device_type,
        cl_uint num_entries,
        cl_device_id *devices,
        cl_uint *num_devices)

    Device types:
        CL_DEVICE_TYPE_CPU
        CL_DEVICE_TYPE_GPU
        CL_DEVICE_TYPE_ACCELERATOR
        CL_DEVICE_TYPE_DEFAULT
        CL_DEVICE_TYPE_ALL

•  Get all device IDs:

    cl_uint nDevices;
    cl_device_type deviceType;  // set to one of the device types listed above before use
    std::vector<cl_device_id> deviceIDs;

    // Note: behaviour with a NULL platform is implementation-defined.
    if (platformIDs.size() == 0) {
        // get number of device IDs for default platform
        err = clGetDeviceIDs(NULL, deviceType, 0, NULL, &nDevices);
    } else {
        // get number of device IDs for selected platform
        err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, 0, NULL, &nDevices);
    }
    deviceIDs.resize(nDevices);
    if (platformIDs.size() == 0) {
        // get default device IDs of default platform
        err = clGetDeviceIDs(NULL, deviceType, nDevices, &deviceIDs[0], NULL);
    } else {
        // get device IDs of selected platform
        err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, nDevices, &deviceIDs[0], NULL);
    }
Context creation
•  Simple example: create a context object

    cl_context context;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

•  Create an OpenCL context for several devices:

    cl_int err;
    cl_context context;
    context = clCreateContext(NULL, (cl_uint)deviceIDs.size(), &deviceIDs[0], NULL, NULL, &err);
    if (err != CL_SUCCESS) { … }

    cl_context clCreateContext(
        const cl_context_properties *properties,
        cl_uint num_devices,
        const cl_device_id *devices,
        void (CL_CALLBACK *pfn_notify)(const char *errinfo,
                                       const void *private_info,
                                       size_t cb,
                                       void *user_data),
        void *user_data,
        cl_int *errcode_ret)

    cl_context_properties names:
        CL_CONTEXT_PLATFORM
        CL_CONTEXT_D3D10_DEVICE_KHR
        CL_GL_CONTEXT_KHR
        CL_EGL_DISPLAY_KHR
        …
Error Handling and Resource Deallocation
•  Error handling:
    •  All host functions return an error code
    •  Context error callback
        •  The callback function may be called asynchronously by OpenCL, and it is the application's responsibility to ensure that the callback function is thread-safe

•  Resource deallocation
    •  Reference counting API: clRetain*(), clRelease*()
        •  clRetainContext();
        •  clReleaseContext();
        •  clRetainMemObject();
        •  clReleaseMemObject();
        •  clRetainKernel();
        •  clReleaseKernel();
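The retain/release pattern can be illustrated with a toy counter. This is a sketch of the semantics only: `ToyObject`, `createToy`, `retainToy` and `releaseToy` are hypothetical names and not part of the OpenCL API, and a flag stands in for actual deallocation. It assumes, as in OpenCL, that creating an object gives it an initial reference count of one.

```cpp
#include <cassert>

// Toy handle, NOT an OpenCL type: illustrates retain/release semantics only.
struct ToyObject {
    unsigned refCount;
    bool alive;
};

// Creation hands the caller one reference.
ToyObject createToy() { return ToyObject{1u, true}; }

// Retain adds a reference, cf. clRetain*().
void retainToy(ToyObject& o)
{
    assert(o.alive);
    ++o.refCount;
}

// Release drops one reference, cf. clRelease*(); the object is
// deallocated (here: marked dead) when the count reaches zero.
void releaseToy(ToyObject& o)
{
    assert(o.alive && o.refCount > 0);
    if (--o.refCount == 0)
        o.alive = false;  // stands in for deallocation
}
```

Every retain must be balanced by a release; the object stays alive until the last holder lets go.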
OpenCL C
•  Derived from ISO C99
•  Features added to the language:
    •  Work-items and work-groups
    •  Vector types
    •  Synchronisation
    •  Address space qualifiers
•  Also includes a large set of built-in functions:
    •  Image manipulation
    •  Work-item manipulation
    •  Math functions
OpenCL C
Language restrictions:
•  No functions defined in C99 standard headers
•  No recursion supported
•  Pointers to functions are not permitted
•  Pointers to pointers allowed within a kernel, but not as an argument
•  No variable-length arrays and structures
•  Bit fields are not supported
•  Writes to a pointer to a type less than 32 bits are not supported*
•  Double types are not supported, but reserved
•  3D image writes are not supported

* Some restrictions are addressed through extensions
OpenCL C Optional Extensions
•  Extensions are optional features exposed through OpenCL
•  The OpenCL working group has already approved many extensions to the OpenCL specification:
    •  Double precision floating-point types
    •  Built-in functions to support doubles
    •  Atomic functions*
    •  Byte-addressable stores (write to pointers to types < 32 bits)*
    •  3D image writes
    •  Built-in functions to support half types

* New core features in OpenCL 1.1
OpenCL C: Data Types
•  Scalar data types
    •  char, uchar, short, ushort, int, uint, long, ulong, float
    •  bool, intptr_t, ptrdiff_t, size_t, uintptr_t, void, half (storage)

•  Image and related types
    •  image2d_t, image3d_t, sampler_t, event_t

•  Vector data types
    •  Vector lengths 2, 3*, 4, 8, 16 (char2, ushort4, int8, float16, double2^, …)
    •  Endian safe
    •  Aligned at vector length
    •  Vector operations
    •  Built-in functions

* New core features in OpenCL 1.1
^ Double is an optional type in OpenCL
OpenCL C: Synchronisation Primitives
•  Built-in functions to order memory operations and synchronise execution:
    •  mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)
        •  Waits until all reads/writes to local and/or global memory made by the calling work-item prior to mem_fence() are visible to all work-items in the work-group
    •  barrier(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)
        •  Waits until all work-items in the work-group have reached this point and calls mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)

•  Used to coordinate accesses to local or global memory shared among work-items
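Why a barrier is needed can be seen in a small serial emulation of one work-group that reverses a tile through shared storage. The function `reverseTile` is a hypothetical name, and the vector `local` merely stands in for `__local` memory. The two loops correspond to the code before and after a `barrier(CLK_LOCAL_MEM_FENCE)` call: no work-item may start the second phase until every work-item has finished the first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial emulation of one work-group reversing a tile via local memory.
// Each loop iteration plays the role of one work-item (lid = local ID).
std::vector<int> reverseTile(const std::vector<int>& in)
{
    const std::size_t localSize = in.size();
    assert(localSize > 0);
    std::vector<int> local(localSize);  // stands in for __local memory
    std::vector<int> out(localSize);

    // Phase 1: every work-item stores its own element.
    for (std::size_t lid = 0; lid < localSize; ++lid)
        local[lid] = in[lid];

    // In a kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here, so that
    // all stores above are complete and visible before any load below.

    // Phase 2: every work-item reads the element written by its mirror.
    for (std::size_t lid = 0; lid < localSize; ++lid)
        out[lid] = local[localSize - 1 - lid];

    return out;
}
```

Without the barrier, a work-item in phase 2 could read a slot that its mirror work-item has not written yet.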
OpenCL Runtime

•  Command queue creation and management
•  Device memory allocation and management
•  Device code compilation and execution
•  Event creation and management (synchronisation and profiling)
Kernel Compilation
•  We use a cl_program object, which encapsulates some source code and its last successful build (it may contain several kernel functions):
    •  clCreateProgramWithSource() → creates a program object for a context, and loads the source code specified by the strings array into the program object
    •  clCreateProgramWithBinary() → creates a program object and loads the binary into it
    •  clBuildProgram() → compiles and links a program executable from program source or binary

•  We'll also use a cl_kernel object, which encapsulates the values of the kernel's arguments used when the kernel is executed:
    •  clCreateKernel() → creates a kernel object from a successfully compiled program
    •  clSetKernelArg() → sets the argument value for a specific argument of a kernel
Kernel Compilation
•  Write a kernel:

    const char* src =
        "__kernel void vectorMul(__global const float *a,\n"
        "                        __global const float *b,\n"
        "                        __global float *c,\n"
        "                        int numElements)\n"
        "{\n"
        "    int i = get_global_id(0);\n"
        "    if (i < numElements)\n"
        "        c[i] = a[i]*b[i];\n"
        "}\n";

•  Create program:

    cl_program program =
        clCreateProgramWithSource(context, 1, &src, NULL, NULL);

    cl_program clCreateProgramWithSource(
        cl_context context,
        cl_uint count,
        const char **strings,
        const size_t *lengths,
        cl_int *errcode_ret)

•  Build program and create kernel:

    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, "vectorMul", NULL);

    cl_int clBuildProgram(
        cl_program program,
        cl_uint num_devices,
        const cl_device_id *device_list,
        const char *options,
        void (CL_CALLBACK *pfn_notify)(cl_program program, void *user_data),
        void *user_data)

    Build options (passed in the options string): -cl-opt-disable, -cl-mad-enable, …

    cl_kernel clCreateKernel(
        cl_program program,
        const char *kernel_name,
        cl_int *errcode_ret)

•  Set kernel arguments:

    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&devSrcA);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&devSrcB);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*)&devDst);
    clSetKernelArg(kernel, 3, sizeof(cl_int), (void*)&numElements);
Memory Objects
•  Memory objects (cl_mem) are categorized into two types:
    •  Buffer objects
    •  Image objects

•  Memory objects can be copied to host memory, from host memory, or to other memory objects
•  Kernels take memory objects as input, and output to one or more memory objects
•  Regions of a memory object can be accessed by the host by mapping them into the host address space
Memory Objects: Buffer Object
•  A buffer object stores a one-dimensional collection of elements (1D array)
•  Elements of a buffer object can be:
    •  Scalar data type (such as int, float)
    •  Vector data type
    •  User-defined structure

•  Elements in a buffer are stored sequentially and can be accessed through a pointer by a kernel executing on a device
•  Data is stored in the same format as it is accessed by the kernel
Memory Objects: Image Object
•  An image object stores a two- or three-dimensional texture, frame-buffer or image
•  Can be created from an existing OpenGL texture or render-buffer
•  The elements of an image object are selected from a list of predefined image formats
•  Image elements are always a 4-component vector (each component can be a float or signed/unsigned integer) in a kernel
•  Accessed within the device via built-in functions (storage format not exposed to the application)
•  Sampler objects are used to configure how built-in functions sample images (addressing modes, filtering modes)
Command Queue
•  Memory, program and kernel objects → created using a context
•  Operations on objects are performed using a command-queue
•  The command-queue is used to schedule commands for execution on a device
    •  Enqueuing functions: clEnqueue*()
    •  Multiple queues can execute on the same device

•  Modes of execution:
    •  In-order: each command in the queue executes only when the preceding command has completed (including memory writes)
    •  Out-of-order: no guaranteed order of completion for commands
    •  CL_QUEUE_PROFILING_ENABLE: enable or disable profiling of commands in the command-queue
Command Queue
•  Create a command queue for a specific device

    cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);

    cl_command_queue clCreateCommandQueue(
        cl_context context,
        cl_device_id device,
        cl_command_queue_properties properties,
        cl_int *errcode_ret)

•  Properties
    •  CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE determines whether commands in the command-queue are executed in-order or out-of-order. If set, the commands are executed out-of-order.
    •  CL_QUEUE_PROFILING_ENABLE enables or disables profiling of commands in the command-queue. If set, the profiling of commands is enabled.
Data Transfer between Host and Device
•  Create buffers on host and device

    size_t size = 100000*sizeof(int);
    int *host_buffer = (int*)malloc(size);
    cl_mem devSrcA =
        clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
    cl_mem devSrcB =
        clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
    …

    cl_mem clCreateBuffer(
        cl_context context,
        cl_mem_flags flags,
        size_t size,
        void *host_ptr,
        cl_int *errcode_ret)

    Flags: CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, …

•  Write to buffer objects from host memory

    clEnqueueWriteBuffer(queue, devSrcA,
        CL_FALSE, 0, size, host_buffer, 0, NULL, NULL);
    …

    cl_int clEnqueueWriteBuffer(
        cl_command_queue queue,
        cl_mem buffer,
        cl_bool blocking_write,
        size_t offset,
        size_t size,
        const void *ptr,
        cl_uint num_events_in_wait_list,
        const cl_event *event_wait_list,
        cl_event *event)

•  Read from buffer object to host memory

    clEnqueueReadBuffer(queue, devDst,
        CL_TRUE, 0, size, host_buffer, 0, NULL, NULL);
    …
Kernel Invocation over NDRange
•  Host code invokes a kernel over an index space NDRange (1D, 2D or 3D)
•  Work-group dimensionality matches work-item dimensionality
•  Set number of work-items in a work-group
size_t localWorkSize = 256;
size_t numWorkGroups = (N + localWorkSize - 1) / localWorkSize; // round up
size_t globalWorkSize = numWorkGroups * localWorkSize; // must be divisible by localWorkSize

•  Enqueue kernel
clEnqueueNDRangeKernel(
    queue, kernel, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL);
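
The round-up arithmetic on this slide can be checked on the host without any OpenCL calls. The sketch below is plain C, not part of the original slides; the helper names round_up_global_size and padding_work_items are hypothetical. It also shows a consequence of rounding up: when N is not a multiple of the work-group size, the index space contains extra work-items with IDs at or beyond N, so a kernel would normally need a bounds guard such as `if (get_global_id(0) < N)`.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of local_size, mirroring the slide's
 * numWorkGroups * localWorkSize computation. Hypothetical helper name. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0); /* a work-group must contain at least one work-item */
    size_t num_groups = (n + local_size - 1) / local_size; /* integer round up */
    return num_groups * local_size;
}

/* Number of padding work-items whose global ID is >= n. These are the
 * work-items a kernel should skip with a bounds check. Hypothetical helper. */
size_t padding_work_items(size_t n, size_t local_size)
{
    return round_up_global_size(n, local_size) - n;
}
```

For example, with N = 1000 and a work-group size of 256 this gives a global size of 1024, of which 24 work-items fall outside the data.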


cl_int clEnqueueNDRangeKernel(
    cl_command_queue queue,
    cl_kernel kernel,
    cl_uint work_dim,
    const size_t *global_work_offset,
    const size_t *global_work_size,
    const size_t *local_work_size,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)

Command Synchronisation
•  Queue barrier command: clEnqueueBarrier()
•  Commands after the barrier start executing only after all commands before the barrier have completed

•  Events: a cl_event object can be associated with each command
•  Commands return events and obey event wait lists
•  clEnqueue*(…, num_events_in_wait_list, *event_wait_list, *event);

•  Any command (or clWaitForEvents()) can wait on events before executing
•  An event object can be queried to track the execution status of the associated command and to get profiling information

•  Some clEnqueue*() calls can optionally block
•  clEnqueueReadBuffer(…, CL_TRUE, …);
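
As a small illustration of the profiling information mentioned above: on a queue created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo() can return CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END for an event as cl_ulong counters in nanoseconds. The sketch below is plain C and covers only the conversion step; it assumes the two timestamps have already been queried, uses uint64_t as a stand-in for cl_ulong, and the helper name elapsed_ms is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pair of device timestamps, given in nanoseconds, into the
 * elapsed time in milliseconds. uint64_t stands in for cl_ulong here. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns); /* the end timestamp must not precede the start */
    return (double)(end_ns - start_ns) / 1.0e6; /* 1 ms = 1,000,000 ns */
}
```

The event should have completed (for instance after clWaitForEvents()) before its timestamps are queried.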

Synchronisation: Queues & Events
•  You must explicitly synchronise between queues
•  Multiple devices each have their own queue (possibly multiple queues per device)
•  Use events to synchronise kernel executions between queues

OpenCL Resources
•  OpenCL at Khronos
•  http://www.khronos.org/opencl (spec, registry, man, forums, reference card)

•  NVIDIA OpenCL website, forum
•  http://www.nvidia.com/object/cuda_opencl_new.html
•  http://developer.nvidia.com/object/opencl.html (drivers, profiler, code samples)

•  AMD Developer Central
•  http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx

•  Intel OpenCL SDK
•  http://software.intel.com/en-us/articles/intel-opencl-sdk/

•  IBM OpenCL Development Kit for Linux on Power
•  http://www.alphaworks.ibm.com/tech/opencl

•  OpenCL Studio
•  http://www.opencldev.com (develop, visualize, prototype UIs)

Earth Science and Resource Engineering
Tomasz P Bednarz
3D Visualisation Engineer
Mining Technology Team
Mobile: +61 429 153 274
Email: tomasz.bednarz(_at_)csiro.au
Web: www.tomaszbednarz.com

Acknowledgments
Mark Harris, Derek Gerstmann, Mike Houston, Justin Hensley, Jason Young, Dominik Behr, Con Caris,
John Taylor, Khronos Group, AMD, NVIDIA and all others for sharing publicly their GPGPU knowledge
(this presentation is based on)

Thank you …

Contact us
Phone: 1300 363 400 or +61 3 9545 2176
Email: enquiries@csiro.au Web: www.csiro.au

Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

KĂźrzlich hochgeladen (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Introduction to OpenCL, 2010

  • 1. Introduction to OpenCL: how to select OpenCL devices, initialise a compute context, allocate device memory, compile and run kernels, and output results. OpenCL Workshop | December 1, 2010 | Brisbane, Australia. Tomasz Bednarz, CESRE.
  • 2. OpenCL is a trademark of Apple, Inc. Welcome to Open Computing Language (OpenCL™) • N-Body Simulation Demo • Khronos Group and OpenCL standard • OpenCL Anatomy • Platform Model • Execution Model • Memory Model • Short Introduction to OpenCL Programming • OpenCL C language • Supported data types • Synchronisation primitives • Additional information and resources. CSIRO. Introduction to OpenCL. OpenCL Workshop at the OzViz 2010, Brisbane, December 2010.
  • 4. N-Body Simulation. Lars Nyland, Mark Harris, Jan Prins, “Fast N-Body Simulation with CUDA”. In Hubert Nguyen, editor, GPU Gems 3, chapter 31, pages 677-695, Addison Wesley 2007. • Applications: molecular dynamics; astronomical and astrophysical simulations; fluid dynamics simulation; radiosity (radiometric transfer) • N^2 interactions to compute per time-step, for the brute-force all-pairs approach discussed here • Highly parallel • High arithmetic intensity. Figure caption: Two of these galaxies attract each other.
  • 5. N-Body Simulation (http://developer.nvidia.com/gpugems3) • N-body simulation models the motion of particles subject to a force due to the particle-particle interactions between all particles in the system • Typical example: simulation of stars in a galaxy subject to the gravitational force • Given N bodies with an initial position $x_i$ and velocity $v_i$ for $1 \le i \le N$, the force $f_{ij}$ on body $i$ caused by its gravitational attraction to body $j$ is $f_{ij} = G \frac{m_i m_j}{\|r_{ij}\|^2} \cdot \frac{r_{ij}}{\|r_{ij}\|}$, where $m_i$ and $m_j$ are the masses of bodies $i$ and $j$ and $r_{ij} = x_j - x_i$ • The total force on body $i$ is $F_i = \sum_{1 \le j \le N,\, j \ne i} f_{ij} = G m_i \sum_{1 \le j \le N,\, j \ne i} \frac{m_j r_{ij}}{\|r_{ij}\|^3}$ • The acceleration is computed as $a_i = F_i / m_i$
  • 6. N-Body Simulation • As bodies approach each other, the force between them grows without bound, so a softening factor $\epsilon^2 > 0$ may be added: $F_i \approx G m_i \sum_{1 \le j \le N} \frac{m_j r_{ij}}{(\|r_{ij}\|^2 + \epsilon^2)^{3/2}}$ • The softening factor limits the magnitude of the force between the bodies, which is desirable for numerical integration of the system state • Acceleration: $a_i = \frac{F_i}{m_i} \approx G \sum_{1 \le j \le N} \frac{m_j r_{ij}}{(\|r_{ij}\|^2 + \epsilon^2)^{3/2}}$
  • 7. N-Body Simulation: parallel concept. Diagram: a grid of single interactions between particle i (outer loop) and particle j (inner loop). • Particles i, j interact with each other • OpenCL can be used to compute the acceleration on all bodies in parallel • N/p work-groups of p work-items process p bodies at a time • Every work-item loads all other body positions from off-chip memory • N^2 loads: bandwidth bound, hence poor performance • Optimization (using tiles) to be presented in the afternoon session
  • 8. N-Body Simulation: body-body force calculation. $F_i \approx G m_i \sum_{1 \le j \le N} \frac{m_j r_{ij}}{(\|r_{ij}\|^2 + \epsilon^2)^{3/2}}$, $a_i = \frac{F_i}{m_i} \approx G \sum_{1 \le j \le N} \frac{m_j r_{ij}}{(\|r_{ij}\|^2 + \epsilon^2)^{3/2}}$. Samples: http://developer.download.nvidia.com/compute/opencl/sdk/website/samples.html#oclNbody http://developer.apple.com/library/mac/#samplecode/OpenCL_NBody_Simulation_Example/Introduction/Intro.html
  • 13. What is OpenCL? (http://www.khronos.org/opencl/) OpenCL (Open Computing Language) is an open, royalty-free standard for programming heterogeneous parallel computing at the intersection of GPU and multi-core CPU capabilities. • CPUs: multiple cores driving performance increases; multi-processor programming, threading libraries (e.g. OpenMP) • GPUs: increasingly general-purpose data-parallel computing; graphics APIs and shading languages, vendor compute APIs • Emerging intersection: heterogeneous computing. Courtesy of
  • 14. What is OpenCL? Roadmap convergence. OpenCL is the center of a visual computing ecosystem with parallel computations, 3D, video, audio, and image processing on desktop, embedded and mobile systems. • Desktop visual computing: OpenGL and OpenCL have direct interoperability. OpenCL objects can be created from OpenGL textures, buffer objects and renderbuffers. • Roadmap convergence: OpenGL 4.0 and OpenGL ES 2.0 are both streamlined, programmable pipelines. The GL and ES working groups are working on convergence. WebGL is a positive pressure for portable 3D content on all platforms. • Mobile visual computing: compute, graphics and AV APIs interoperate through EGL; streamlined APIs for mobile and embedded graphics, media and compute acceleration. • Hundreds of man-years invested by industry experts in a coordinated ecosystem. Diagram labels: desktop 3D ecosystem; cross-platform desktop 3D; 3D for Web; heterogeneous parallel programming; parallel computing and visualisation; embedded 3D; surface and synch abstraction; streaming media and image processing. Based on http://www.khronos.org/opencl/
  • 15. OpenCL Timeline • OpenCL 1.0 was released six months after the proposal was created • OpenCL ships first on Apple's Mac OS X Snow Leopard • 18-month cadence between OpenCL 1.0 and OpenCL 1.1 • Backward compatible to protect software investment. Timeline: June 2008: OpenCL working group is proposed by Apple; draft spec is contributed to Khronos. December 2008: Khronos publicly releases OpenCL 1.0 as a royalty-free specification. May 2009: Khronos releases OpenCL 1.0 conformance tests to ensure high-quality implementations. 2nd half 2009: multiple conformant implementations ship across diverse OS and platforms. June 2010: OpenCL 1.1 spec is released and first implementations ship. Based on http://www.khronos.org/opencl/
  • 16. OCL Quick Reference Cards http://www.khronos.org/files/opencl-quick-reference-card.pdf
  • 17. Design goals of OpenCL • Enable all compute resources in system • CPUs, GPUs, and other processors enabled as peers • Data- and task-parallel compute model • Efficient parallel programming model • ANSI C99 based kernel language • Low-level abstraction • Abstracts the specifics of the underlying hardware • High-performance, but device independent • Define precision requirements for all floating-point computations • Consistent results on all platforms and devices • Interoperability with graphics APIs • Dedicated support for OpenGL, OpenGL ES and DirectX • Drive future hardware requirements • Applicable to both consumer and HPC applications
  • 19. It’s a heterogeneous world • Platform model encapsulates compute resources • A modern platform includes: one or more CPUs; one or more GPUs; optional accelerators (e.g. DSPs); other? • Using OpenCL, programmers write a single portable program that uses ALL resources in the heterogeneous platform. Based on http://www.khronos.org/opencl/
  • 20. OpenCL Platform Model • One Host connected to one or more Compute Devices • A Compute Device can be a CPU, GPU or other processor • Each Compute Device is composed of one or more Compute Units • A Compute Unit may be a core, multi-processor, etc. • Each Compute Unit is further divided into one or more Processing Elements • Processing Elements execute code as SIMD or SPMD. Diagram: a Host connected to several Compute Devices, each containing Compute Units made up of Processing Elements.
  • 21. Anatomy of OpenCL Application. An OpenCL application consists of host code (written in C/C++, executes on the host) and device code (written in OpenCL C, executes on the device). • Host code sends commands to the Devices: to transfer data between host memory and device memories; to execute device code. Diagram: a Host connected to Compute Devices, each made up of Compute Units.
  • 22. Anatomy of OpenCL Application • Serial code executes in a Host (CPU) thread • Parallel code executes in many Device (GPU) threads across multiple processing elements. Diagram: an OCL application alternates serial code (Host = CPU) and parallel code (Device = GPU).
  • 24. OpenCL Execution Model • An OpenCL application runs on a Host, which submits work to the Compute Devices • Work-item: the basic unit of work on an OpenCL device • Kernel: the code for a work-item, essentially a C function • Program: collection of kernels and other functions (analogous to a dynamic library), managed by the host • Context: the environment within which work-items execute, which includes devices and their memories and command queues (contains all resources for computation) • Command queue: a queue used by the Host application to submit work (kernel execution instances) to a Device • Work is queued in-order, one queue per device • Work can be executed in-order or out-of-order • Events are used for synchronisation. Diagram: a context containing the GPU and CPU, their memory, and the GPU and CPU command queues.
  • 25. OpenCL Execution Model • Portable execution model that allows a kernel to execute at each point in a problem domain (N-dimensional computational domain) → decomposition of a task into work-items • Traditional loop as a function in C: void addVector(const float *A, const float *B, float *C, int N) { int index; for (index = 0; index < N; index++) C[index] = A[index] + B[index]; } • OpenCL C kernel: __kernel void addVector(__global const float *A, __global const float *B, __global float *C, int N) { int index = get_global_id(0); if (index < N) C[index] = A[index] + B[index]; } • Work-item: the basic unit of work on an OpenCL device • Kernel: the code for a work-item, essentially a C function
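To make the loop-versus-kernel contrast on the slide above concrete, here is a serial C sketch that emulates a 1D kernel launch: the loop lives in the "runtime" (`launch_add_vector`), and the per-work-item body only sees its own index. All names here are hypothetical stand-ins, not OpenCL API calls. The sketch assumes the common practice of rounding the global size up to a multiple of the work-group size, which is one reason a kernel carries the `if (index < N)` guard.

```c
#include <stddef.h>

/* Body of the kernel for one work-item; gid stands in for get_global_id(0). */
static void add_vector_item(size_t gid, const float *A, const float *B,
                            float *C, size_t n)
{
    if (gid < n)                 /* guard: the global size may be padded past n */
        C[gid] = A[gid] + B[gid];
}

/* Smallest multiple of local_size that is >= n. */
size_t round_up(size_t n, size_t local_size)
{
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Serial stand-in for a 1D kernel launch: one call per point of the index space.
   A real device would run these calls in parallel and in no particular order. */
void launch_add_vector(const float *A, const float *B, float *C,
                       size_t n, size_t local_size)
{
    size_t global_size = round_up(n, local_size);
    for (size_t gid = 0; gid < global_size; gid++)
        add_vector_item(gid, A, B, C, n);
}
```

With `n = 5` and `local_size = 4`, the emulated launch covers 8 indices, and the guard keeps indices 5 to 7 from touching memory.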
  • 26. Kernel Execution on Platform Model. Mapping: work-item → compute element; work-group → compute unit; kernel execution instance → compute device. • Each work-item is executed by a compute element • Each work-group is executed on a compute unit • Several concurrent work-groups can reside on one compute unit, depending on the work-group's memory requirements and the compute unit's memory resources • Each kernel is executed on a compute device
  • 27. Benefits of Work-Groups • Automatic scalability across devices with different numbers of compute units • Work-groups can execute in any order, concurrently or sequentially • Efficient cooperation between work-items of the same work-group • Fast shared memory and synchronization • Independence between work-groups gives scalability: a kernel scales across any number of compute units. Diagram: the same kernel launch of work-groups 0 to 7 is distributed over a device with 2 compute units (units 0 and 1) and over a device with 4 compute units (units 0 to 3).
  • 28. Work-group synchronisation • Always define the best N-dimensional index space (NDRange) for your algorithms (currently 1D, 2D and 3D index spaces are supported) • Kernels are executed across a global domain of work-items • Work-items are single points of execution and are grouped into local work-groups • Global dimensions: 1024x1024 (whole problem space) • Local dimensions: 32x32 (work-group) • Synchronisation between work-items is possible only within work-groups, using barriers and memory fences • Cannot synchronise outside of work-groups
  • 29. Work-items and work-groups • A kernel is a function executed at each point of a problem domain (for each work-item) • Number of work-items = 4096 (16 work-groups, 256 work-items each) • Kernel: __kernel void addVector(__global const float *A, __global const float *B, __global float *C, int N) { int index = get_global_id(0); if (index < N) C[index] = A[index] + B[index]; } • Diagram labels: NDRANGE of work-groups 0, 1, 2, 3, 4, … 15 on the DEVICE; work-items 0, 1, … 255 in a WORK GROUP; get_global_size(0) = 4096; get_num_groups(0) = 16; get_local_size(0) = 256; get_group_id(0) = 2; get_local_id(0) = 255; get_global_id(0) = 1792
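The built-in index functions named on the slide above are related by simple arithmetic. For a 1D range with a zero global offset, get_global_id(0) equals get_group_id(0) * get_local_size(0) + get_local_id(0). The plain C sketch below reproduces that relation on the host; the struct and function names are hypothetical, and it assumes the global size is an exact multiple of the local size, as in the 4096 / 256 example.

```c
#include <stddef.h>

/* Hypothetical pair of indices identifying one work-item in a 1D NDRange. */
typedef struct { size_t group_id, local_id; } WorkItemIndex;

/* Split a global index into (work-group, position inside the work-group). */
WorkItemIndex decompose(size_t global_id, size_t local_size)
{
    WorkItemIndex w = { global_id / local_size, global_id % local_size };
    return w;
}

/* Inverse mapping: global index from group and local indices (zero offset). */
size_t compose(size_t group_id, size_t local_id, size_t local_size)
{
    return group_id * local_size + local_id;
}

/* Number of work-groups, assuming global_size is a multiple of local_size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}
```

For example, with 256 work-items per group, global index 1792 is the first work-item (local index 0) of work-group 7, and the last work-item of the range is `compose(15, 255, 256)`, which is 4095.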
  • 30. Work-items and work-groups in 2D • Number of work-items to execute: 128 x 128 = 16384 (a kernel is executed at each point of a problem domain) • Diagram: the NDRANGE on the DEVICE is a 2D grid of work-groups indexed by get_group_id(0),get_group_id(1), from 0,0 to 7,7; each WORK GROUP is a 2D grid of WORK ITEMS indexed by get_local_id(0),get_local_id(1), from 0,0 to 15,15; get_global_id(0),get_global_id(1) index work-items across the whole NDRange; the extents are get_global_size(0) x get_global_size(1) and get_local_size(0) x get_local_size(1)
  • 32. OpenCL Memory Model • Address spaces: Private: read/write access for a work-item only; Local: read/write access for the entire work-group; Global/Constant: visible to all work-groups; Host: accessible by the CPU • Synchronisation: all synchronisation for all memory accesses must be done explicitly • Memory management is explicit: you must move data from host → global → local … and back. Diagram: each work-item (1 to J) runs on a PE with its own private memory; each compute unit (1 to N) has local memory; the compute device has global/constant memory; the host has host memory.
  • 33. OpenCL Programming • How to define the platform • How to execute code on the platform • How to move data around in memory • How to write (and build) programs
  • 35. OpenCL Language and API Highlights • Platform Layer API (called from host): abstraction layer for diverse computational resources; query, select and initialise compute devices; create compute contexts and work-queues • Runtime API (called from host): launch compute kernels; set kernel execution configuration; manage scheduling, compute, and memory resources • OpenCL language: to write C-based compute kernels for execution on a compute device; includes a rich set of built-in functions; can be compiled JIT/online or offline
  • 36. OpenCL Language Highlights • Example kernel: __kernel void addVector(__global const float *A, __global const float *B, __global float *C, int N) { int index = get_global_id(0); if (index < N) C[index] = A[index] + B[index]; } • Function qualifiers: the __kernel qualifier declares a function as a kernel • Address space qualifiers: __global, __local, __constant, __private • Work-item functions: get_work_dim(), get_global_id(), get_local_id(), get_group_id(), get_local_size() • Image functions: images must be accessed through built-in functions; reads/writes performed through sampler objects from host or defined in source • Synchronisation functions: barriers, where all work-items within a work-group must execute the barrier function before any work-item in the work-group can continue
  • 37. OpenCL Framework: Overview • Platform layer: platform query and context creation • Compiler for OpenCL C • Runtime: memory management and command execution within a context. Diagram: a CONTEXT spanning a CPU and a GPU holds PROGRAMS (kernel source compiled into a GPU binary and a CPU binary), KERNELS (addVector with arg[0], arg[1] and arg[2] values), MEMORY OBJECTS (buffers, images) and COMMAND QUEUES (in-order queue, out-of-order queue). Workflow: compile code → create args and data → send to execution on the compute device. Kernel shown: __kernel void addVector(__global const float *A, __global const float *B, __global float *C) { int i = get_global_id(0); C[i] = A[i] + B[i]; }
  • 38. OpenCL Framework: Object Types • cl_platform_id: identifier for a specific platform • cl_device_id: identifier for a specific compute device • cl_context: handle for a compute context • cl_command_queue: handle for a command queue (for a compute device) • cl_mem: handle for a memory resource (managed by context) • cl_program: handle for a program resource (library of kernels) • cl_kernel: handle for a compute kernel • All object types are opaque handles • Enables cross-platform compatibility for complex data types • All objects are reference counted and garbage collected • When the reference count reaches zero, the object is deallocated
  • 39. OpenCL Framework: Platform Layer • To query platform information: clGetPlatformIDs() → obtain the list of platforms available; clGetPlatformInfo() → platform profile, version, name, vendor, extensions • To query devices: clGetDeviceIDs() → obtain the list of devices available on a platform; clGetDeviceInfo() → type, capabilities, vendor, name, etc. • Create an OpenCL context for one or more devices. Diagram: a context (cl_context) groups one or more devices (cl_device_id), the memory and device code shared by these devices (cl_mem, cl_program), and the command queues used to send commands to these devices (cl_command_queue).
  • 40. Context creation: platform IDs • Simple example, get the platform ID: /* get first OpenCL platform ID available */ cl_platform_id platform; err = clGetPlatformIDs(1, &platform, NULL); • Signature: cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms) (pointer arguments that are NULL are ignored) • Get all platform IDs: /* get number of OpenCL platforms available */ cl_int err; cl_uint num_platforms; std::vector<cl_platform_id> platformIDs; err = clGetPlatformIDs(0, NULL, &num_platforms); if (err != CL_SUCCESS) { … } platformIDs.resize(num_platforms); /* get all OpenCL platform IDs */ err = clGetPlatformIDs(num_platforms, &platformIDs[0], NULL);
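The "get all platform IDs" code on the slide above follows a two-call idiom: first ask only for the count, then allocate and ask for the entries. The sketch below shows the idiom in plain C without requiring an OpenCL installation. `fake_get_ids` is a hypothetical stand-in that only mimics the calling convention of clGetPlatformIDs (the IDs 101 to 103 are made up), so treat this as an illustration of the pattern rather than real OpenCL host code.

```c
#include <stdlib.h>

/* Hypothetical stand-in with the same calling convention as clGetPlatformIDs:
   writes at most num_entries IDs into ids (if non-NULL) and stores the total
   number available in *num_ids (if non-NULL). Returns 0 on success. */
static int fake_get_ids(unsigned num_entries, int *ids, unsigned *num_ids)
{
    static const int available[] = { 101, 102, 103 };   /* made-up IDs */
    const unsigned total = 3;

    if (ids == NULL && num_ids == NULL) return -1;   /* nothing requested */
    if (ids != NULL && num_entries == 0) return -1;  /* buffer with no room */
    if (ids != NULL)
        for (unsigned i = 0; i < num_entries && i < total; i++)
            ids[i] = available[i];
    if (num_ids != NULL)
        *num_ids = total;
    return 0;
}

/* Two-call idiom: query the count, allocate, then fetch. Caller frees the result. */
int *get_all_ids(unsigned *count)
{
    if (fake_get_ids(0, NULL, count) != 0 || *count == 0)
        return NULL;
    int *ids = malloc(*count * sizeof *ids);
    if (ids == NULL)
        return NULL;
    if (fake_get_ids(*count, ids, NULL) != 0) {
        free(ids);
        return NULL;
    }
    return ids;
}
```

Note that the first call passes 0, not NULL, for the entry count: that argument is an unsigned integer, not a pointer.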
  • 41. Context creation: device IDs • Simple example, get the first GPU associated with the platform: cl_device_id device; err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL); • Signature: cl_int clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) • Device types: CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_DEFAULT, CL_DEVICE_TYPE_ALL • Get all device IDs: cl_uint nDevices; cl_device_type deviceType; std::vector<cl_device_id> deviceIDs; if (platformIDs.size() == 0) { /* get number of device IDs for default platform */ err = clGetDeviceIDs(NULL, deviceType, 0, NULL, &nDevices); } else { /* get number of device IDs for selected platform */ err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, 0, NULL, &nDevices); } deviceIDs.resize(nDevices); if (platformIDs.size() == 0) { /* get default device IDs of default platform */ err = clGetDeviceIDs(NULL, deviceType, nDevices, &deviceIDs[0], NULL); } else { /* get device IDs of selected platform */ err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, nDevices, &deviceIDs[0], NULL); }
  • 42. Context creation • Simple example, create a context object: cl_context context; context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL); • Create an OpenCL context for several devices: cl_int err; cl_context context; context = clCreateContext(NULL, deviceIDs.size(), &deviceIDs[0], NULL, NULL, &err); if (err != CL_SUCCESS) { … } • Signature: cl_context clCreateContext(const cl_context_properties *properties, cl_uint num_devices, const cl_device_id *devices, void (CL_CALLBACK *pfn_notify)(const char *errinfo, const void *private_info, size_t cb, void *user_data), void *user_data, cl_int *errcode_ret) • cl_context_properties enum: CL_CONTEXT_PLATFORM, CL_CONTEXT_D3D10_DEVICE_KHR, CL_GL_CONTEXT_KHR, CL_EGL_DISPLAY_KHR, …
  • 43. Error Handling and Resource Deallocation • Error handling: all host functions return an error code; context error callback • The callback function may be called asynchronously by OpenCL, and it is the application's responsibility to ensure that the callback function is thread-safe • Resource deallocation: reference counting API, clRetain*() and clRelease*(): clRetainContext(); clReleaseContext(); clRetainMemObject(); clReleaseMemObject(); clRetainKernel(); clReleaseKernel();
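The retain/release scheme on the slide above can be illustrated with a tiny reference-counted handle in plain C. Every name below (`Handle`, `handle_create`, and so on) is hypothetical and only mirrors the behaviour described for clRetain*() and clRelease*(): creation returns an object with a count of one, each retain adds one, each release subtracts one, and the object is freed when the count reaches zero.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for an OpenCL handle. */
typedef struct { unsigned refs; } Handle;

static int live_objects = 0;    /* bookkeeping for this sketch only */

/* Creation hands back an object that already holds one reference. */
Handle *handle_create(void)
{
    Handle *h = malloc(sizeof *h);
    if (h != NULL) {
        h->refs = 1;
        live_objects++;
    }
    return h;
}

/* Analogous to clRetain*(): another owner now shares the object. */
void handle_retain(Handle *h)
{
    h->refs++;
}

/* Analogous to clRelease*(): the object is freed when the last owner lets go. */
void handle_release(Handle *h)
{
    if (--h->refs == 0) {
        live_objects--;
        free(h);
    }
}

int handle_live_count(void)
{
    return live_objects;
}
```

The practical rule that follows is that every create or retain call needs exactly one matching release, otherwise the object leaks or is freed while still in use.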
  • 44. OpenCL C • Derived from ISO C99 • Features added to the language: work-items and work-groups; vector types; synchronisation; address space qualifiers • Also includes a large set of built-in functions: image manipulation; work-item manipulation; math functions
  • 45. OpenCL C Language Restrictions: • No functions defined in C99 standard headers • No recursion supported • Pointers to functions are not permitted • Pointers to pointers allowed within a kernel, but not as an argument • No variable length arrays and structures • Bit fields are not supported • Writes to a pointer to a type less than 32 bits are not supported* • Double types are not supported, but reserved • 3D image writes are not supported. *Some restrictions are addressed through extensions
  • 46. OpenCL C Optional Extensions • Extensions are optional features exposed through OpenCL • The OpenCL working group has already approved many extensions to the OpenCL specification: double precision floating-point types; built-in functions to support doubles; atomic functions*; byte-addressable stores (write to pointers to types < 32 bits)*; 3D image writes; built-in functions to support half types. * New core features in OpenCL 1.1
  • 47. OpenCL C: Data Types • Scalar data types: char, uchar, short, ushort, int, uint, long, ulong, float; bool, intptr_t, ptrdiff_t, size_t, uintptr_t, void, half (storage) • Image types: image2d_t, image3d_t, sampler_t, event_t • Vector data types: vector lengths 2, 3*, 4, 8, 16 (char2, ushort4, int8, float16, double2^, …); endian safe; aligned at vector length; vector operations; built-in functions. * New core feature in OpenCL 1.1. ^ Double is an optional type in OpenCL
  • 48. OpenCL C: Synchronisation Primitives • Built-in functions to order memory operations and synchronise execution: • mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE): waits until all reads/writes to local and/or global memory made by the calling work-item prior to mem_fence() are visible to all threads in the work-group • barrier(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE): waits until all work-items in the work-group have reached this point and calls mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE) • Used to coordinate accesses to local or global memory shared among work-items
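To show where barriers matter, the following plain C sketch emulates one work-group summing its slice of a global array through a local tile. It is a serial emulation, not kernel code: each inner loop over `lid` stands for all work-items of the group executing the same statement, and the comments mark where barrier(CLK_LOCAL_MEM_FENCE) would be needed in a real kernel. `LOCAL_SIZE`, `group_sum` and the tree-reduction scheme are assumptions chosen for illustration.

```c
#include <stddef.h>

#define LOCAL_SIZE 8   /* hypothetical work-group size; must be a power of two */

/* Serial emulation of one work-group summing LOCAL_SIZE consecutive floats. */
float group_sum(const float *global_in, size_t group_id)
{
    float local_tile[LOCAL_SIZE];   /* stands in for __local memory */

    /* Phase 1: every work-item copies one element from global to local. */
    for (size_t lid = 0; lid < LOCAL_SIZE; lid++)
        local_tile[lid] = global_in[group_id * LOCAL_SIZE + lid];
    /* barrier here: all loads must be visible before neighbours are read */

    /* Phase 2: tree reduction, halving the number of active work-items. */
    for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; lid++)
            local_tile[lid] += local_tile[lid + stride];
        /* barrier here: finish this level before starting the next */
    }
    return local_tile[0];           /* work-item 0 would write this to global */
}
```

Within one level, each active work-item reads an element that no other work-item writes at that level, so the barriers are only needed between levels.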
  • 49. OpenCL Runtime
      •  Command-queue creation and management
      •  Device memory allocation and management
      •  Device code compilation and execution
      •  Event creation and management (synchronisation and profiling)
  • 50. Kernel Compilation
      •  We use a cl_program object, which encapsulates some source code and its last successful build (it may contain several kernel functions):
          •  clCreateProgramWithSource(): creates a program object for a context and loads the source code specified by the strings array into the program object
          •  clCreateProgramWithBinary(): creates a program object and loads the binary into it
          •  clBuildProgram(): compiles and links a program executable from the program source or binary
      •  We'll also use a cl_kernel object, which encapsulates the values of the kernel's arguments used when the kernel is executed:
          •  clCreateKernel(): creates a kernel object from a successfully compiled program
          •  clSetKernelArg(): sets the argument value for a specific argument of a kernel
  • 51. Kernel Compilation
      •  Write a kernel:
            const char *src =
                "__kernel void vectorMul(__global const float *a,\n"
                "                        __global const float *b,\n"
                "                        __global float *c,\n"
                "                        int numElements)\n"
                "{\n"
                "    int i = get_global_id(0);\n"
                "    if (i < numElements)\n"
                "        c[i] = a[i]*b[i];\n"
                "}\n";
      •  Create program:
            cl_program program =
                clCreateProgramWithSource(context, 1, &src, NULL, NULL);
      •  Build program and create kernel:
            clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
            cl_kernel kernel = clCreateKernel(program, "vectorMul", NULL);
      •  Set kernel arguments:
            clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&devSrcA);
            clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&devSrcB);
            clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*)&devDst);
            clSetKernelArg(kernel, 3, sizeof(cl_int), (void*)&numElements);
      •  Function prototypes:
            cl_program clCreateProgramWithSource(
                cl_context context,
                cl_uint count,
                const char **strings,
                const size_t *lengths,
                cl_int *errcode_ret)
            cl_int clBuildProgram(
                cl_program program,
                cl_uint num_devices,
                const cl_device_id *device_list,
                const char *options,
                void (CL_CALLBACK *pfn_notify)(cl_program program, void *user_data),
                void *user_data)
            cl_kernel clCreateKernel(
                cl_program program,
                const char *kernel_name,
                cl_int *errcode_ret)
      •  Build options (passed in the options string): -cl-opt-disable, -cl-mad-enable, …
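When checking the output of the vectorMul kernel above, a CPU reference can be handy. The following is a minimal sketch in plain C under stated assumptions: `vector_mul_reference` is a hypothetical helper (not an OpenCL call), and its loop index stands in for get_global_id(0), including indices beyond numElements that a padded NDRange would produce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API: CPU reference for the vectorMul
 * kernel. The loop index i plays the role of get_global_id(0). global_size
 * may exceed num_elements, as happens when the NDRange is padded up to a
 * multiple of the work-group size. */
void vector_mul_reference(const float *a, const float *b, float *c,
                          size_t num_elements, size_t global_size)
{
    assert(a && b && c);
    for (size_t i = 0; i < global_size; ++i) {
        if (i < num_elements)   /* same bounds guard as the kernel */
            c[i] = a[i] * b[i];
    }
}
```

Comparing the buffer read back from the device against the array filled by this function, element by element, is one simple way to validate a first kernel.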
  • 52. Memory Objects
      •  Memory objects (cl_mem) are categorised into two types:
          •  Buffer objects
          •  Image objects
      •  Memory objects can be copied to host memory, from host memory, or to other memory objects
      •  Kernels take memory objects as input, and output to one or more memory objects
      •  Regions of a memory object can be accessed by the host by mapping them into the host address space
  • 53. Memory Objects: Buffer Object
      •  A buffer object stores a one-dimensional collection of elements (1D array)
      •  Elements of a buffer object can be:
          •  A scalar data type (such as int or float)
          •  A vector data type
          •  A user-defined structure
      •  Elements in a buffer are stored sequentially and can be accessed through a pointer by a kernel executing on a device
      •  Data is stored in the same format as it is accessed by the kernel
  • 54. Memory Objects: Image Object
      •  An image object stores a two- or three-dimensional texture, frame-buffer or image
      •  Can be created from an existing OpenGL texture or render-buffer
      •  The elements of an image object are selected from a list of predefined image formats
      •  In a kernel, image elements are always a 4-component vector (each component can be a float or a signed/unsigned integer)
      •  Accessed within the device via built-in functions (the storage format is not exposed to the application)
      •  Sampler objects are used to configure how built-in functions sample images (addressing modes, filtering modes)
  • 55. Command Queue
      •  Memory, program and kernel objects are created using a context
      •  Operations on these objects are performed using a command-queue
      •  The command-queue is used to schedule commands for execution on a device
      •  Enqueuing functions: clEnqueue*()
      •  Multiple queues can execute on the same device
      •  Modes of execution:
          •  In-order: each command in the queue executes only when the preceding command has completed (including memory writes)
          •  Out-of-order: no guaranteed order of completion for commands
      •  CL_QUEUE_PROFILING_ENABLE: enables or disables profiling of commands in the command-queue
  • 56. Command Queue
      •  Create a command queue for a specific device:
            cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);
      •  Function prototype:
            cl_command_queue clCreateCommandQueue(
                cl_context context,
                cl_device_id device,
                cl_command_queue_properties properties,
                cl_int *errcode_ret)
      •  Properties
          •  CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE determines whether the commands in the command-queue are executed in-order or out-of-order. If set, the commands are executed out-of-order.
          •  CL_QUEUE_PROFILING_ENABLE enables or disables profiling of commands in the command-queue. If set, profiling of commands is enabled.
  • 57. Data Transfer between Host and Device
      •  Create buffers on host and device:
            size_t size = 100000*sizeof(int);
            int *host_buffer = (int*)malloc(size);
            cl_mem devSrcA = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
            cl_mem devSrcB = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
            …
      •  Write to buffer objects from host memory:
            clEnqueueWriteBuffer(queue, devSrcA,
                CL_FALSE, 0, size, host_buffer, 0, NULL, NULL);
            …
      •  Read from a buffer object to host memory:
            clEnqueueReadBuffer(queue, devDst,
                CL_TRUE, 0, size, host_buffer, 0, NULL, NULL);
            …
      •  Function prototypes:
            cl_mem clCreateBuffer(
                cl_context context,
                cl_mem_flags flags,
                size_t size,
                void *host_ptr,
                cl_int *errcode_ret)
            cl_int clEnqueueWriteBuffer(
                cl_command_queue queue,
                cl_mem buffer,
                cl_bool blocking_write,
                size_t offset,
                size_t size,
                const void *ptr,
                cl_uint num_events_in_wait_list,
                const cl_event *event_wait_list,
                cl_event *event)
      •  Memory flags: CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, …
  • 58. Kernel Invocation over NDRange
      •  Host code invokes a kernel over an index space, the NDRange (1D, 2D or 3D)
      •  Work-group dimensionality matches work-item dimensionality
      •  Set the number of work-items in a work-group:
            size_t localWorkSize = 256;
            size_t numWorkGroups = (N+localWorkSize-1)/localWorkSize; // round up
            size_t globalWorkSize = numWorkGroups * localWorkSize; // must be divisible by localWorkSize
      •  Enqueue the kernel:
            clEnqueueNDRangeKernel(
                queue, kernel, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL);
      •  Function prototype:
            cl_int clEnqueueNDRangeKernel(
                cl_command_queue queue,
                cl_kernel kernel,
                cl_uint work_dim,
                const size_t *global_work_offset,
                const size_t *global_work_size,
                const size_t *local_work_size,
                cl_uint num_events_in_wait_list,
                const cl_event *event_wait_list,
                cl_event *event)
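The round-up arithmetic on the slide above can be isolated into a small function. This is a minimal sketch in plain C; `round_up_global_size` is a hypothetical helper name, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the smallest multiple of local_size that is
 * greater than or equal to n, i.e. a global work size that covers n
 * elements and is divisible by the work-group size. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t num_groups = (n + local_size - 1) / local_size;  /* ceiling division */
    return num_groups * local_size;
}
```

For example, with N = 100000 elements and a work-group size of 256 this gives 391 work-groups and a global size of 100096. The 96 extra work-items have no element to process, which is why the vectorMul kernel guards its store with `if (i < numElements)`.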
  • 59. Command Synchronisation
      •  Queue barrier command: clEnqueueBarrier()
          •  Commands after the barrier start executing only after all commands before the barrier have completed
      •  Events: a cl_event object can be associated with each command
          •  Commands return events and obey event wait lists
          •  clEnqueue*(…, num_events_in_wait_list, *event_wait_list, *event);
          •  Any command (or clWaitForEvents()) can wait on events before executing
          •  An event object can be queried to track the execution status of the associated command and to get profiling information
      •  Some clEnqueue*() calls can optionally be blocking
          •  clEnqueueReadBuffer(…, CL_TRUE, …);
  • 60. Synchronisation: Queues & Events
      •  You must explicitly synchronise between queues
      •  Multiple devices each have their own queue (possibly multiple queues per device)
      •  Use events to synchronise kernel executions between queues
  • 61. OpenCL Resources
      •  OpenCL at Khronos
          •  http://www.khronos.org/opencl (spec, registry, man pages, forums, reference card)
      •  NVIDIA OpenCL website, forum
          •  http://www.nvidia.com/object/cuda_opencl_new.html
          •  http://developer.nvidia.com/object/opencl.html (drivers, profiler, code samples)
      •  AMD Developer Central
          •  http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx
      •  Intel OpenCL SDK
          •  http://software.intel.com/en-us/articles/intel-opencl-sdk/
      •  IBM OpenCL Development Kit for Linux on Power
          •  http://www.alphaworks.ibm.com/tech/opencl
      •  OpenCL Studio
          •  http://www.opencldev.com (develop, visualize, prototype UIs)
  • 62. Earth Science and Resource Engineering
      Tomasz P Bednarz, 3D Visualisation Engineer, Mining Technology Team
      Mobile: +61 429 153 274
      Email: tomasz.bednarz(_at_)csiro.au
      Web: www.tomaszbednarz.com
      Acknowledgments: Mark Harris, Derek Gerstmann, Mike Houston, Justin Hensley, Jason Young, Dominik Behr, Con Caris, John Taylor, Khronos Group, AMD, NVIDIA and all others for publicly sharing their GPGPU knowledge, on which this presentation is based
      Thank you …
      Contact us
      Phone: 1300 363 400 or +61 3 9545 2176
      Email: enquiries@csiro.au
      Web: www.csiro.au