ENHANCING OPENCL PERFORMANCE IN
COREL AFTERSHOT™ PRO WITH HSA
COREL AFTERSHOT™ PRO
 What is Corel AfterShot™ Pro?

 Corel AfterShot™ Pro is photo workflow software
 Non-destructive photo editing of JPEG, TIFF, and Raw formats from hundreds of cameras
 Photo Management
 Batch Processing of modified files

2 | Enhancing OpenCL Performance in Corel AfterShot™ Pro with HSA | NOVEMBER 19, 2013
AfterShot Pro
Basics
INSIDE AFTERSHOT

Architectural Features:
‒ Task Scheduling
‒ Tile Processing

AFTERSHOT TASK MANAGEMENT
 Work is broken down into Tasks. Tasks typically:
‒ Contain execution logic (code)
‒ May store resultant data
‒ Track whether they are complete

 The Task Scheduler:
‒ Allocates a worker thread per CPU core
‒ Runs Tasks based on priority
‒ Allows Tasks to block on each other

[Figure: A Simple Task Dependency Graph. Nodes: Disk, File Reader, JPEG Decoder, Photo Thumbnail. Legend: Task Dependency, Data]
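The Task model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Task` and `run_all`, the "lower number runs first" priority convention, and the placeholder payloads are all hypothetical, and the loop is single-threaded for determinism, whereas the scheduler described in the slides uses one worker thread per CPU core.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(eq=False)  # compare tasks by identity, not by field values
class Task:
    name: str
    run: Callable[..., Any]              # execution logic
    priority: int = 0                    # assumption: lower number runs first
    deps: List["Task"] = field(default_factory=list)
    result: Any = None                   # resultant data, filled in after run
    done: bool = False                   # completion flag


def run_all(tasks):
    """Run each task once: a task is ready when all its dependencies are done,
    and among ready tasks the one with the best priority goes first."""
    order = []
    pending = list(tasks)
    while pending:
        ready = [t for t in pending if all(d.done for d in t.deps)]
        if not ready:
            raise RuntimeError("dependency cycle or dependency not scheduled")
        task = min(ready, key=lambda t: t.priority)
        task.result = task.run(*[d.result for d in task.deps])
        task.done = True
        order.append(task.name)
        pending.remove(task)
    return order


# Hypothetical chain modeled on the figure: File Reader -> JPEG Decoder -> Thumbnail.
reader = Task("file_reader", lambda: b"jpeg-bytes", priority=2)
decoder = Task("jpeg_decoder", lambda data: f"pixels({len(data)} bytes)",
               priority=1, deps=[reader])
thumb = Task("thumbnail", lambda px: f"thumb of {px}", priority=0, deps=[decoder])

order = run_all([thumb, decoder, reader])
print(order)
```

Even though the thumbnail task has the best priority, it cannot run until the decoder and the reader have completed, which is the "block on each other" behavior in miniature.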
PROCESSING WITH TILES
 The standard, simpler approach is to process each image as one large monolithic buffer
 In AfterShot, images are instead broken down into tiles for processing
 Tiling provides faster screen updates: only the visible parts of the image are computed
 Tiling allows more effective memory management
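A minimal sketch of the "only compute the visible parts" idea, assuming square 512 x 512 tiles (the tile size that appears later in the deck) and a hypothetical 4096 x 3072 image. The function name `tiles_for_region` and the viewport coordinates are illustrative, not taken from AfterShot.

```python
def tiles_for_region(x0, y0, x1, y1, tile=512):
    """Return (col, row) indices of the tiles touched by the half-open
    pixel rectangle [x0, x1) x [y0, y1)."""
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


# Whole image versus a hypothetical on-screen viewport.
all_tiles = tiles_for_region(0, 0, 4096, 3072)
visible = tiles_for_region(1000, 600, 2000, 1200)
print(len(all_tiles), len(visible))  # 48 6
```

With these numbers a screen update touches 6 of 48 tiles, so most of the image never has to be processed for display.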

PROCESSING WITH TILES CONTINUED
 The Image Processing Pipeline is made
up of several discrete steps [or filters]
 To process a single tile:
‒ Load the input data (e.g. Raw or JPEG data)
‒ Apply each Filter step in turn

 Generally, we only need the output of
the last step, the top Tile in the Stack
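The per-tile pipeline can be sketched as a stack of intermediate results. This is a toy model under assumptions: a tile is a flat list of numbers, and the two filters (`double_exposure`, `clamp`) are hypothetical stand-ins for real image Filters.

```python
def process_tile(raw_tile, filters):
    """Apply each filter in turn, keeping every intermediate tile on a stack.
    The last element is the top Tile in the Stack."""
    stack = [raw_tile]
    for apply_filter in filters:
        stack.append(apply_filter(stack[-1]))
    return stack


# Hypothetical filters operating on a tiny three-pixel "tile".
def double_exposure(tile):
    return [p * 2 for p in tile]


def clamp(tile):
    return [min(p, 255) for p in tile]


stack = process_tile([10, 100, 200], [double_exposure, clamp])
final_tile = stack[-1]
print(final_tile)  # [20, 200, 255]
```

Only `final_tile` is needed for display; the lower entries of the stack matter once filters start depending on neighboring tiles.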

[Figure: Tile stack, from Raw Data to Final Image]
ADVANCED TILE PROCESSING
 Some Image Filters require a radius of pixels
as input
 Partially processed neighbor Tiles must
complete before the main Tile can continue

 Intermediate Tiles must be stored in memory
so they do not rerun
 Example Filters:
‒ Sharpening
‒ Lens Correction
‒ Noise Reduction
‒ Cropping

Requires multiple source tiles
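To make the neighbor requirement concrete, the sketch below lists which source tiles a radius filter needs and caches intermediate tiles so they are computed only once. The helper names (`source_tiles`, `TileCache`), the 8 x 6 tile grid, and the 20-pixel radius are assumptions for illustration.

```python
import math


def source_tiles(col, row, radius, cols, rows, tile=512):
    """Tiles whose pixels lie within `radius` of tile (col, row), clipped to the grid."""
    reach = math.ceil(radius / tile)  # how many tiles the radius spills into
    return [(c, r)
            for r in range(max(0, row - reach), min(rows, row + reach + 1))
            for c in range(max(0, col - reach), min(cols, col + reach + 1))]


class TileCache:
    """Stores intermediate tiles in memory so a shared neighbor never reruns."""

    def __init__(self, compute):
        self.compute = compute
        self.store = {}
        self.runs = 0

    def get(self, key):
        if key not in self.store:
            self.runs += 1
            self.store[key] = self.compute(key)
        return self.store[key]


cache = TileCache(lambda key: f"intermediate{key}")
for target in [(3, 2), (4, 2)]:          # two adjacent output tiles
    for key in source_tiles(*target, radius=20, cols=8, rows=6):
        cache.get(key)

print(cache.runs)  # 12
```

Each target needs a 3 x 3 block of source tiles, but the two blocks share six tiles, so 12 intermediate tiles are computed instead of 18.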
OpenCL™ in
AfterShot Pro
ACCELERATING AFTERSHOT WITH OPENCL™
Goals for the AfterShot Pro OpenCL port
 Offload image processing from Tiles
 Work within the existing System
‒ Contain changes to a few critical modules
‒ Maintain full CPU utilization
‒ Integrate OpenCL Events into the Task System

GETTING WORK TO OPENCL
 Identify the longest running image Filter functions and replace them with OpenCL kernels
 Do not block CPU threads; use OpenCL event callbacks instead
 Processing becomes asynchronous

 Limit total work in flight to conserve memory
 Marshal data automatically
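The callback-driven, bounded-in-flight pattern can be illustrated without OpenCL. The sketch below is an analogy only: it uses a Python thread pool in place of a device queue, `add_done_callback` in place of an OpenCL event callback, and the class name `BoundedAsyncQueue` and its limits are hypothetical.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class BoundedAsyncQueue:
    def __init__(self, max_in_flight, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._cond = threading.Condition(threading.RLock())
        self._waiting = deque()          # jobs parked until a slot frees up
        self._max = max_in_flight
        self.in_flight = 0
        self.peak = 0

    def submit(self, fn, arg, on_done):
        """Never blocks the caller: start the job now or park it for later."""
        with self._cond:
            if self.in_flight >= self._max:
                self._waiting.append((fn, arg, on_done))
            else:
                self._start(fn, arg, on_done)

    def _start(self, fn, arg, on_done):
        # Caller must hold self._cond.
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        future = self._pool.submit(fn, arg)
        future.add_done_callback(lambda f: self._finished(f, on_done))

    def _finished(self, future, on_done):
        try:
            on_done(future.result())     # completion callback
        finally:
            with self._cond:
                self.in_flight -= 1
                if self._waiting:        # a slot is free: start parked work
                    self._start(*self._waiting.popleft())
                self._cond.notify_all()

    def drain(self):
        """Wait until all submitted work has completed, then shut down."""
        with self._cond:
            self._cond.wait_for(lambda: self.in_flight == 0 and not self._waiting)
        self._pool.shutdown()


results = []
queue = BoundedAsyncQueue(max_in_flight=2)
for tile_id in range(6):
    queue.submit(lambda t: t * t, tile_id, results.append)
queue.drain()
print(sorted(results), queue.peak)
```

Only `drain` blocks, and only at the very end; `submit` returns immediately, and the in-flight cap bounds how many sets of buffers would be alive at once.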

CAVEATS OF ASYNCHRONOUS OPENCL PROCESSING
 High Buffer Usage
‒ Each kernel that runs needs input, output, and possibly scratch buffers.
‒ Buffers must “stick around” until the kernels complete
‒ Multiple chains of kernels are needed to keep the GPU busy

[Figure: a chain of Kernels 1 to 5 connected by Buffers]

Processing one 512 x 512 image requires multiple 3 MB buffers resident in device memory (VRAM)
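The 3 MB figure is consistent with 12 bytes per pixel. The slides do not state the pixel format, so the sketch below assumes three 32-bit float channels; the buffers-per-chain and chains-in-flight counts are likewise hypothetical values chosen only to show how quickly device memory adds up.

```python
MIB = 1024 * 1024


def buffer_bytes(width, height, channels=3, bytes_per_channel=4):
    """Size of one uncompressed buffer. The channel layout is an assumption."""
    return width * height * channels * bytes_per_channel


tile_bytes = buffer_bytes(512, 512)

# Hypothetical load: 6 buffers per kernel chain, 4 chains in flight.
buffers_per_chain = 6
chains_in_flight = 4
total_bytes = tile_bytes * buffers_per_chain * chains_in_flight

print(tile_bytes // MIB, total_bytes // MIB)  # 3 72
```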

CAVEATS OF ASYNCHRONOUS OPENCL PROCESSING – CONTINUED
 Dependencies Must Be Resolved in Advance
‒ For best performance all kernels in a chain should be enqueued together
‒ The state of all dependencies must be known before the first kernel is queued
‒ Difficult to track
‒ Compromise: only use OpenCL for Filters with simple linear dependencies

Kernel chaining and asynchronous execution provide excellent GPU utilization.

OpenCL
Challenges
LARGE RADIUS IMAGE FILTERS
 Several image processing operations require neighbor pixels. In AfterShot, image Filters
fall into one of two categories:
‒ Normal: only requires the local Tile
‒ Large Radius: requires multiple Tiles

LARGE RADIUS IMAGE FILTERS ARE DIFFICULT
Large Radius AfterShot Filters are particularly difficult to implement in OpenCL
 Large Radius filters will “break” kernel chaining
 An extra layer of Intermediate Tiles must be resident, which will:
‒ Exhaust Device Memory, or
‒ Cause excessive bus transfers, hurting performance
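One way to picture how a Large Radius filter "breaks" kernel chaining is to split a filter list into runs that could be enqueued together. This is a hypothetical sketch, not AfterShot's scheduler: the function name `split_into_chains` and the Normal filter names are invented, and Sharpening is borrowed from the earlier list of example radius filters.

```python
def split_into_chains(filters):
    """Group consecutive Normal filters into one chain; every Large Radius
    filter ends the current chain and stands alone."""
    chains, current = [], []
    for name, is_large_radius in filters:
        if is_large_radius:
            if current:
                chains.append(current)
                current = []
            chains.append([name])
        else:
            current.append(name)
    if current:
        chains.append(current)
    return chains


pipeline = [
    ("exposure", False),        # hypothetical Normal filter
    ("white_balance", False),   # hypothetical Normal filter
    ("sharpening", True),       # Large Radius
    ("curves", False),          # hypothetical Normal filter
]
chains = split_into_chains(pipeline)
print(chains)
```

Instead of one chain of four kernels, the pipeline falls apart into three separate submissions, and the tiles at each break have to stay resident for the neighbors that need them.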

And the solution is…

LARGE RADIUS FILTERS - NO

Don’t do it.
 Large Radius filters are possible but at great development cost
 Performance would ultimately depend on tricky optimizations
 Large radius filters were left to run on the CPU

AFTERSHOT OPENCL RESULTS
 Approximately 70% of image processing work was moved off of the CPU cores*
 Batch processing speed improved by 3.5x*
 Maintains 100% utilization on 8 CPU cores*
 Only a mid-level GPU is required

 Supported on Windows, Linux, and OS X

AfterShot Pro with OpenCL was a success

*measured on developer’s system

OpenCL 2.0
SVM
OPENCL 2.0 SHARED VIRTUAL MEMORY

OpenCL 2.0 introduces Shared Virtual Memory (SVM)
 Basic [Coarse Grain] SVM
‒ Host and kernels can share pointers

 Advanced [Fine Grain] SVM is available on some hardware
‒ Host and kernels can operate concurrently on the same memory

 Fine Grain System SVM
‒ Kernels can access the entire host process’s address space, so they can read or write malloc buffers
‒ System SVM can greatly simplify buffer management in an OpenCL application

AfterShot
Redux
RECONSIDERING LARGE RADIUS FILTERS
 Large Radius OpenCL filters were dropped as an AfterShot feature. The reasons were
both technical and resource related
 Can System SVM make Large Radius AfterShot filters feasible? Signs point to yes
‒ No Device Memory required for Intermediate buffers
‒ Input streams from SVM, no buffer transfers
‒ Behavior more in line with Software [non-OpenCL] filters
‒ Dependencies could be resolved just as they would for a Software filter

LOCAL CONTRAST – A LARGE RADIUS AFTERSHOT FILTER
 The next version of AfterShot Pro will contain a new Local Contrast filter.
‒ GPU accelerated on systems with OpenCL and SVM.
‒ Increases image contrast in detailed areas while leaving large constant areas unchanged
‒ The effect is achieved through a large radius Unsharp Mask (10-20% of the overall image width)
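The Unsharp Mask idea can be shown on a one-dimensional signal: blur the input, then push each sample away from its blurred value. This is a sketch under assumptions, not the shipped filter: the slides do not say which blur is used, so a simple box blur stands in, and the `amount` parameter and sample values are hypothetical.

```python
def box_blur(signal, radius):
    """Average each sample with its neighbors within `radius` (clipped at the ends)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def unsharp_mask(signal, radius, amount=0.5):
    """output = input + amount * (input - blurred input)"""
    blurred = box_blur(signal, radius)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


flat = [100.0] * 8                    # a large constant area
edge = [0.0] * 4 + [100.0] * 4        # a step between dark and bright

flat_out = unsharp_mask(flat, radius=2)
edge_out = unsharp_mask(edge, radius=2)
print(flat_out == flat, edge_out[3], edge_out[4])  # True -20.0 120.0
```

The constant area comes back unchanged, while the step of 100 grows to 140 across the edge. At 10 to 20% of the image width, the radius on a hypothetical 4096-pixel-wide image would be roughly 410 to 820 pixels, far more than one 512-pixel tile, which is why this filter falls in the Large Radius category.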

SETTING UP A KERNEL TO USE SVM MEMORY

LOADING SVM MEMORY FROM INSIDE THE KERNEL

LOCAL CONTRAST RESULTS
 System SVM simplified Local Contrast
‒ No complicated buffer management
‒ No clever optimizations were required to hide Device memory transfers
‒ Additional memory pressure is similar to a software filter

 Performance is good. The OpenCL code runs in ¼ the time of the optimized software
filter*

*measured on developer’s system

THANK YOU
Questions

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). OpenCL and
the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other names are for informational purposes only and may be trademarks of
their respective owners.
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14AMD Developer Central
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 

Mehr von AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 

Kürzlich hochgeladen

KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 

Kürzlich hochgeladen (20)

KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 

HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael Wootton

  • 1. ENHANCING OPENCL PERFORMANCE IN COREL AFTERSHOT™ PRO WITH HSA
  • 2. COREL AFTERSHOT™ PRO
      What is Corel AfterShot™ Pro?
        ‒ Corel AfterShot™ Pro is photo workflow software
        ‒ Non-destructive photo editing of JPEG, TIFF, and Raw formats from hundreds of cameras
        ‒ Photo Management
        ‒ Batch Processing of modified files
      2 | Enhancing OpenCL Performance in Corel AfterShot™ Pro with HSA | NOVEMBER 19, 2013
  • 4. INSIDE AFTERSHOT
      Architectural Features:
        ‒ Task Scheduling
        ‒ Tile Processing
  • 5. AFTERSHOT TASK MANAGEMENT
      Work is broken down into Tasks. Tasks typically:
        ‒ Contain execution logic (code)
        ‒ May store resultant data
        ‒ Track whether they are complete
      The Task Scheduler:
        ‒ Allocates a worker thread per CPU core
        ‒ Runs Tasks based on priority
        ‒ Allows Tasks to block on each other
      [Figure: A Simple Task Dependency Graph. Labels: Disk, File Reader, JPEG Decoder, Photo Thumbnail, Task Dependency, Data]
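The Task model on this slide can be sketched in a few lines of C++. This is a minimal, single-threaded illustration and not AfterShot's scheduler: the `Task` struct, `runTask`, and `runAll` are hypothetical names, and the real scheduler uses one worker thread per CPU core, which is omitted here to keep the dependency and priority logic visible.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical Task type: holds code, an optional result, and a completion flag.
struct Task {
    std::string name;
    int priority = 0;                         // larger value runs first
    std::vector<std::shared_ptr<Task>> deps;  // Tasks this Task blocks on
    std::function<std::string()> work;        // execution logic
    std::string result;                       // resultant data, if any
    bool complete = false;
};

using TaskPtr = std::shared_ptr<Task>;

// Runs a Task after everything it depends on, recording the execution order.
inline void runTask(const TaskPtr& t, std::vector<std::string>& order) {
    if (t->complete) return;
    for (const TaskPtr& d : t->deps) runTask(d, order);  // "block" on dependencies
    if (t->work) t->result = t->work();
    t->complete = true;
    order.push_back(t->name);
}

// Single-threaded stand-in for the scheduler: always picks the
// highest-priority pending Task; its dependencies are pulled in first.
inline std::vector<std::string> runAll(const std::vector<TaskPtr>& tasks) {
    auto lower = [](const TaskPtr& a, const TaskPtr& b) { return a->priority < b->priority; };
    std::priority_queue<TaskPtr, std::vector<TaskPtr>, decltype(lower)> ready(lower, tasks);
    std::vector<std::string> order;
    while (!ready.empty()) {
        TaskPtr t = ready.top();
        ready.pop();
        runTask(t, order);
    }
    return order;
}
```

With a low-priority "File Reader" Task and a high-priority "JPEG Decoder" Task that depends on it, `runAll` still runs the reader first, because the decoder blocks on its dependency before executing.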
  • 6. PROCESSING WITH TILES
      The standard, simpler approach is to use large monolithic images
      In AfterShot, images are broken down into tiles for processing
      Tiling provides faster screen updates: only the visible parts of the image are computed
      Tiling allows more effective memory management
  • 7. PROCESSING WITH TILES CONTINUED
      The Image Processing Pipeline is made up of several discrete steps (or Filters)
      To process a single tile:
        ‒ Load the input data (e.g. raw or JPEG data)
        ‒ Apply each Filter step in turn
      Generally, we only need the output of the last step, the top Tile in the Stack
      [Figure labels: Raw Data, Final Image]
  • 8. ADVANCED TILE PROCESSING
      Some Image Filters require a radius of pixels as input
      Partially processed neighbor Tiles must complete before the main Tile can continue
      Intermediate Tiles must be stored in memory so they do not rerun
      Example Filters:
        ‒ Sharpening
        ‒ Lens Correction
        ‒ Noise Reduction
        ‒ Cropping
      [Figure caption: Requires multiple source tiles]
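To make the "radius of pixels" requirement concrete, the sketch below lists which source Tiles a Filter would have to read to produce one output Tile. The function name `sourceTiles`, its parameters, and the assumption that the image is a whole number of square Tiles are illustrative choices, not details from the slides.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Returns the (column, row) indices of every Tile overlapping the area a
// Filter reads when producing Tile (tx, ty) with the given pixel radius.
// Assumes the image is exactly tilesWide x tilesHigh square Tiles.
inline std::vector<std::pair<int, int>> sourceTiles(int tx, int ty, int tileSize, int radius,
                                                    int tilesWide, int tilesHigh) {
    // Pixel bounds of the Tile grown by the radius, clamped to the image.
    const int x0 = std::max(0, tx * tileSize - radius);
    const int y0 = std::max(0, ty * tileSize - radius);
    const int x1 = std::min(tilesWide * tileSize - 1, (tx + 1) * tileSize - 1 + radius);
    const int y1 = std::min(tilesHigh * tileSize - 1, (ty + 1) * tileSize - 1 + radius);

    std::vector<std::pair<int, int>> tiles;
    for (int row = y0 / tileSize; row <= y1 / tileSize; ++row)
        for (int col = x0 / tileSize; col <= x1 / tileSize; ++col)
            tiles.emplace_back(col, row);
    return tiles;
}
```

With a radius of zero the result is just the Tile itself. Any non-zero radius on an interior Tile pulls in all eight neighbors, each of which must be partially processed and kept in memory before the main Tile can continue.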
  • 10. ACCELERATING AFTERSHOT WITH OPENCL™
      Goals for the AfterShot Pro OpenCL port:
      Offload image processing from Tiles
      Work within the existing System
        ‒ Contain changes to a few critical modules
        ‒ Maintain full CPU utilization
        ‒ Integrate OpenCL Events into the Task System
  • 11. GETTING WORK TO OPENCL
      Identify the longest-running image Filter functions and replace them with OpenCL kernels
      Do not block CPU threads; use OpenCL event callbacks
      Processing becomes asynchronous
      Limit total work in flight to conserve memory
      Marshal data automatically
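One way to "limit total work in flight" without blocking CPU threads is a small counter that starts queued work only when a slot is free and is notified from a completion callback. The class below is a hedged sketch: `InFlightLimiter` is a hypothetical name, and it is deliberately single-threaded. In a real OpenCL host, `onComplete` would be triggered from a callback registered with `clSetEventCallback`, which may run on another thread, so the counters would need a mutex.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical limiter: at most `maxInFlight` pieces of work run at once.
class InFlightLimiter {
public:
    explicit InFlightLimiter(std::size_t maxInFlight) : max_(maxInFlight) {}

    // Queue work. `start` launches it asynchronously; the caller must
    // arrange for onComplete() to be called when that work finishes.
    void submit(std::function<void()> start) {
        pending_.push_back(std::move(start));
        pump();
    }

    // Called from the completion callback; frees a slot and starts more work.
    void onComplete() {
        if (inFlight_ > 0) --inFlight_;
        pump();
    }

    std::size_t inFlight() const { return inFlight_; }
    std::size_t pending() const { return pending_.size(); }

private:
    // Start queued work while slots are available.
    void pump() {
        while (inFlight_ < max_ && !pending_.empty()) {
            std::function<void()> start = std::move(pending_.front());
            pending_.pop_front();
            ++inFlight_;
            start();
        }
    }

    std::size_t max_;
    std::size_t inFlight_ = 0;
    std::deque<std::function<void()>> pending_;
};
```

Submitting five items to a limiter with two slots starts two immediately and holds three. Each `onComplete` call then starts exactly one more, so memory use is bounded by the slot count and not by the length of the queue.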
  • 12. CAVEATS OF ASYNCHRONOUS OPENCL PROCESSING
      High Buffer Usage
        ‒ Each kernel that runs needs input, output, and possibly scratch buffers
        ‒ Buffers must “stick around” until the kernels complete
        ‒ Multiple chains of kernels are needed to keep the GPU busy
      [Figure: a chain of Kernel 1 through Kernel 5 connected by seven Buffers]
      Processing one 512 x 512 image requires multiple 3 MB buffers resident in device memory (VRAM)
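The 3 MB figure can be checked with simple arithmetic. The sketch below assumes three channels of 32-bit floats per pixel. The slide does not state the pixel format, so that is a guess, chosen because it reproduces the 3 MB number exactly. The chain count in the usage note is likewise hypothetical, while the seven buffers per chain come from the slide's diagram.

```cpp
#include <cstddef>

// Bytes for one tightly packed image buffer.
constexpr std::size_t bufferBytes(std::size_t width, std::size_t height,
                                  std::size_t channels, std::size_t bytesPerChannel) {
    return width * height * channels * bytesPerChannel;
}

// Device memory held while `chains` kernel chains are in flight, each
// keeping `buffersPerChain` buffers alive until its kernels complete.
constexpr std::size_t residentBytes(std::size_t chains, std::size_t buffersPerChain,
                                    std::size_t oneBuffer) {
    return chains * buffersPerChain * oneBuffer;
}
```

Under these assumptions `bufferBytes(512, 512, 3, 4)` is 3,145,728 bytes (3 MiB). Four chains of seven buffers each would keep 84 MiB resident, which shows how quickly asynchronous chaining consumes VRAM.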
  • 13. CAVEATS OF ASYNCHRONOUS OPENCL PROCESSING – CONTINUED
      Dependencies Must Be Resolved in Advance
        ‒ For best performance, all kernels in a chain should be enqueued together
        ‒ The state of all dependencies must be known before the first kernel is queued
        ‒ Difficult to track
        ‒ Compromise: only use OpenCL for Filters with simple linear dependencies
      Kernel chaining and asynchronous execution provide excellent GPU utilization.
  • 15. LARGE RADIUS IMAGE FILTERS
      Several image processing operations require neighbor pixels. In AfterShot, image Filters fall into one of two categories:
        ‒ Normal: only requires the local Tile
        ‒ Large Radius: requires multiple Tiles
  • 16. LARGE RADIUS IMAGE FILTERS ARE DIFFICULT
      Large Radius AfterShot Filters are particularly difficult to implement in OpenCL
      Large Radius filters will “break” kernel chaining
      An extra layer of Intermediate Tiles must be resident, which will:
        ‒ Exhaust Device Memory, or
        ‒ Cause excessive bus transfers, hurting performance
      And the solution is…
  • 17. LARGE RADIUS FILTERS - NO
      Don’t do it.
      Large Radius filters are possible but at great development cost
      Performance would ultimately depend on tricky optimizations
      Large Radius filters were left to run on the CPU
  • 18. AFTERSHOT OPENCL RESULTS
      Approximately 70% of image processing work was moved off of the CPU cores*
      Batch processing speed improved by 3.5x*
      Maintains 100% utilization on 8 CPU cores*
      Only a mid-level GPU is required
      Supported on Windows, Linux, and OS X
      AfterShot Pro with OpenCL was a success
      *measured on developer’s system
  • 20. OPENCL 2.0 SHARED VIRTUAL MEMORY
      OpenCL 2.0 introduces Shared Virtual Memory (SVM)
      Basic [Coarse Grain] SVM
        ‒ Host and kernels can share pointers
      Advanced [Fine Grain] SVM is available on some hardware
        ‒ Host and kernels can operate concurrently on the same memory
      Fine Grain System SVM
        ‒ Kernels can access the entire host process’ address space. Kernels can read or write malloc buffers
        ‒ System SVM can greatly simplify buffer management in an OpenCL application
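An application has to find out which of these SVM levels a device offers before choosing a code path. The sketch below decodes the capability bitfield without depending on the OpenCL headers. The `kSvm...` constants and `bestSvmLevel` are hypothetical names, and the bit values are mirrored from the Khronos `cl.h` definitions of `CL_DEVICE_SVM_COARSE_GRAIN_BUFFER`, `CL_DEVICE_SVM_FINE_GRAIN_BUFFER`, and `CL_DEVICE_SVM_FINE_GRAIN_SYSTEM`. Real code would include the header and obtain the bitfield by querying `CL_DEVICE_SVM_CAPABILITIES` with `clGetDeviceInfo`.

```cpp
#include <cstdint>
#include <string>

// Bit values mirrored from cl.h (cl_device_svm_capabilities); real code
// should use the header's macros instead of these stand-ins.
constexpr std::uint64_t kSvmCoarseGrainBuffer = 1u << 0;  // CL_DEVICE_SVM_COARSE_GRAIN_BUFFER
constexpr std::uint64_t kSvmFineGrainBuffer   = 1u << 1;  // CL_DEVICE_SVM_FINE_GRAIN_BUFFER
constexpr std::uint64_t kSvmFineGrainSystem   = 1u << 2;  // CL_DEVICE_SVM_FINE_GRAIN_SYSTEM

// Returns the most capable SVM level advertised in `caps`.
inline std::string bestSvmLevel(std::uint64_t caps) {
    if (caps & kSvmFineGrainSystem)   return "fine-grain system";
    if (caps & kSvmFineGrainBuffer)   return "fine-grain buffer";
    if (caps & kSvmCoarseGrainBuffer) return "coarse-grain buffer";
    return "none";
}
```

Only the "fine-grain system" result allows kernels to read or write ordinary `malloc` memory, which is the level the later Large Radius discussion relies on.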
  • 22. RECONSIDERING LARGE RADIUS FILTERS
      Large Radius OpenCL filters were dropped as an AfterShot feature. The reasons were both technical and resource related
      Can System SVM make Large Radius AfterShot filters feasible? Signs point to yes
        ‒ No Device Memory required for Intermediate buffers
        ‒ Input streams from SVM, no buffer transfers
        ‒ Behavior more in line with Software [non-OpenCL] filters
        ‒ Dependencies could be resolved just as they would for a Software filter
  • 23. LOCAL CONTRAST – A LARGE RADIUS AFTERSHOT FILTER
      The next version of AfterShot Pro will contain a new Local Contrast filter
        ‒ GPU accelerated on systems with OpenCL and SVM
        ‒ Increases image contrast in detailed areas while leaving large constant areas unchanged
        ‒ The effect is achieved through a large radius Unsharp Mask (10-20% of the overall image width)
  • 24. SETTING UP A KERNEL TO USE SVM MEMORY
  • 25. LOADING SVM MEMORY FROM INSIDE THE KERNEL
  • 26. LOCAL CONTRAST RESULTS
      System SVM simplified Local Contrast
        ‒ No complicated buffer management
        ‒ No clever optimizations were required to hide Device memory transfers
        ‒ Additional memory pressure is similar to a software filter
      Performance is good. The OpenCL code runs in ¼ the time of the optimized software filter*
      *measured on developer’s system
  • 27. THANK YOU
      Questions
  • 28. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other names are for informational purposes only and may be trademarks of their respective owners.