INTRODUCTION
TO OPENCL
Unai Lopez

Intelligent Systems Group

Department of Computer Architecture & Technology

University of the Basque Country
Outline
1)  Introduction

2)  Programming Basics


3)  “Hello World”


4)  Final remarks
OpenCL
•  Standard for the development of data parallel applications

•  Mostly used to develop GPGPU applications:
 General-Purpose computing on Graphics Processing Units

•  A GPU comprises hundreds of compute cores

   [Figure: NVIDIA GTX 285 (240 compute cores); NVIDIA GT200b architecture]

•  Specialized for massively data parallel computation
OpenCL & GPGPU
•  GPGPU: Use the GPU's computing power to
 run massively parallel applications

•  Parallel applications achieve large speedups in Molecular
 Dynamics, Image Processing, Evolutionary Computation, ...

•  All cases based on data parallelism:
 each thread processes a subset of the data

•  For example, a vector addition, one thread per element:

        C = A + B

   Thread ID    0   1   2   3   4   5   6   7   8   9   10   11
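The idea that each thread processes a subset of the data can be sketched in plain C. This is only an illustration, not OpenCL code: the names chunk_t, chunk_for_thread, add_chunk and vector_add_chunked are hypothetical, and the threads are emulated by a serial loop, assuming an even split of the index range.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) owned by one thread. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_t;

/* Split n elements among num_threads threads as evenly as possible.
 * The first (n % num_threads) threads receive one extra element. */
static chunk_t chunk_for_thread(size_t n, size_t num_threads, size_t tid)
{
    assert(num_threads > 0 && tid < num_threads);
    size_t base  = n / num_threads;
    size_t extra = n % num_threads;
    chunk_t c;
    c.begin = tid * base + (tid < extra ? tid : extra);
    c.end   = c.begin + base + (tid < extra ? 1 : 0);
    return c;
}

/* Work done by one thread: add only its own subset of A and B into C. */
static void add_chunk(const int *a, const int *b, int *c, chunk_t range)
{
    for (size_t i = range.begin; i < range.end; ++i)
        c[i] = a[i] + b[i];
}

/* Serial stand-in for running all threads (assumption: on a real
 * device the chunks would be processed concurrently). */
static void vector_add_chunked(const int *a, const int *b, int *c,
                               size_t n, size_t num_threads)
{
    for (size_t tid = 0; tid < num_threads; ++tid)
        add_chunk(a, b, c, chunk_for_thread(n, num_threads, tid));
}
```

With 12 elements and 12 threads every chunk holds exactly one element, which matches the vector addition picture; with fewer threads each one handles a contiguous block.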
OpenCL
•  Furthermore, OpenCL provides portability:
  same code can run on different architectures

•  For example:

   Device               Cores   Clock
   Intel Core i5 CPU        4   2.5 GHz
   STI Cell B/E             8   3.2 GHz
   Intel Xeon Phi          50   1 GHz
   AMD HD 6950 GPU       1408   800 MHz
OpenCL
•  Provides the following abstraction:
 A compute device is composed of compute units




•  OpenCL platform: Host + Compute Devices

•  Each manufacturer provides an SDK:
   •  NVIDIA SDK for GPUs
   •  AMD APP for CPUs/GPUs
   •  Intel for CPUs
   •  IBM for PowerPC and Cell B/E
Programming Basics
•  Kernel: function that defines the behavior of each thread


•  For example, kernel for vector addition:
  __kernel void sumKernel (
  __global int* a, __global int* b, __global int* c)
  {
     int i = get_global_id(0);
     c[i] = a[i] + b[i];
  }


•  Written in OpenCL C: based on C99 plus a set of built-in functions, e.g.:
   •  get_global_id: returns the global index of the calling thread (work-item)
   •  barrier: synchronizes the threads of a work-group
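To see what the kernel means without an OpenCL runtime, the launch can be emulated on the host. The sketch below is plain C, not OpenCL C: sum_kernel_body and launch_1d are hypothetical names, and the gid parameter stands in for the value that get_global_id(0) would return for each thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel body: gid is the index that
 * get_global_id(0) would return on the device. */
static void sum_kernel_body(size_t gid, const int *a, const int *b, int *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Emulates a one-dimensional launch with global_size threads.
 * A device may run them in any order and many at the same time;
 * this loop simply runs them one after another. */
static void launch_1d(size_t global_size, const int *a, const int *b, int *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        sum_kernel_body(gid, a, b, c);
}
```

Because each thread writes only c[gid], the result does not depend on the order in which the indices are visited, which is what makes the problem data parallel.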
Programming Basics
•  An OpenCL application consists of:
   •  Kernel file (OpenCL C): problem computation
   •  Host code (C): kernel management


•  Basic host application flow:
   1.  Load and compile the kernel
   2.  Copy data from host to device (e.g. from CPU to GPU)
   3.  Execute the kernel
   4.  Copy data from device to host
   5.  Release kernels and data from device memory
•  Commands are executed through a command queue on each device
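The host flow above can be walked through with a plain-C emulation. This is a sketch under stated assumptions, not real host code: the "device" is ordinary heap memory, vector_add_flow and sum_kernel_body are hypothetical names, and the comments name the OpenCL call that would normally perform each step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the device kernel; gid plays the role of get_global_id(0). */
static void sum_kernel_body(size_t gid, const int *a, const int *b, int *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Emulated host flow for C = A + B.
 * Returns 0 on success, -1 if a buffer cannot be allocated. */
static int vector_add_flow(const int *host_a, const int *host_b,
                           int *host_c, size_t n)
{
    assert(n > 0);
    size_t bytes = n * sizeof(int);

    /* Step 1 (load and compile the kernel) has no plain-C equivalent:
     * sum_kernel_body is compiled together with this file. */

    /* Allocate "device" buffers (real code: clCreateBuffer). */
    int *dev_a = malloc(bytes);
    int *dev_b = malloc(bytes);
    int *dev_c = malloc(bytes);
    if (!dev_a || !dev_b || !dev_c) {
        free(dev_a); free(dev_b); free(dev_c);
        return -1;
    }

    /* Step 2: copy inputs host -> device (real code: clEnqueueWriteBuffer). */
    memcpy(dev_a, host_a, bytes);
    memcpy(dev_b, host_b, bytes);

    /* Step 3: run the kernel once per element
     * (real code: clEnqueueNDRangeKernel). */
    for (size_t gid = 0; gid < n; ++gid)
        sum_kernel_body(gid, dev_a, dev_b, dev_c);

    /* Step 4: copy the result device -> host (real code: clEnqueueReadBuffer). */
    memcpy(host_c, dev_c, bytes);

    /* Step 5: release device memory (real code: clReleaseMemObject). */
    free(dev_a); free(dev_b); free(dev_c);
    return 0;
}
```

In a real application steps 2 to 4 are enqueued on the device's command queue rather than executed directly, so the host may have to wait for them to finish before reading the result.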
Programming Basics
•  Host code: programmed using OpenCL API

•  API Calls, such as:
   •  clCreateProgramWithSource: Load kernel source from char*
   •  clBuildProgram: Compile kernel
   •  clSetKernelArg: Set a kernel argument
   •  clEnqueueWriteBuffer / clEnqueueReadBuffer: Copy data to / from the device
   •  clEnqueueNDRangeKernel: Launch kernel on the device


•  API Types, such as:
   •  cl_mem: Handle to a device memory object
   •  cl_program: Program object (holds the kernel source and its compiled binaries)
   •  cl_float / cl_int / cl_uint: Fixed-size redefinitions of C types
Hello World

•  Implementation of simple vector addition in OpenCL


•  Checks for default platform and device in the system


•  Modify Makefile with proper paths in each system


•  Run: vectorAdd <size_of_vector>
Final Remarks
•  OpenCL does not provide performance portability

•  Alternative to NVIDIA CUDA:
 Programming paradigm for NVIDIA GPU cards

•  Combinable with other parallel programming models:
   •  OpenMP for SMPs / MPI for MPPs


•  Large ecosystem of related GPGPU programming tools, e.g. OpenACC:
 Develop GPGPU applications using directives
           #pragma acc kernels
           for (i = 0; i < N; i++)
              c[i] = b[i] + a[i];
More about OpenCL
•  Before starting to develop, take a look at:
   •  Contexts, command queues, events, ...
•  Documentation
   •  Khronos Group: Maintainers of OpenCL
   •  OpenCL Best practices guide in CUDA/AMD SDKs
   •  Programming Massively Parallel Processors (Book for CUDA)


•  OpenCL sample applications:
   •  Most SDKs include example OpenCL applications
   •  Rodinia: http://lava.cs.virginia.edu/wiki/rodinia
   •  Parboil: http://impact.crhc.illinois.edu/parboil.aspx
INTRODUCTION
TO OPENCL
Unai Lopez – ulopez009@ehu.es

Intelligent Systems Group

Department of Computer Architecture & Technology

University of the Basque Country
