6. PHOTO EDITING PIPELINE
RAW PROCESSING
[Diagram: IMG_0077.CR2 is read by the RAW Decoder and feeds Photo Retouch (Preview Size) and Photo Retouch (Full Scale Size); the JPEG Encoder writes NEW.JPG. RAW Decoder blocks appear repeatedly along the pipeline. Key: highlighted blocks mark the area for potential performance improvement.]
Camera Model            RAW Decode time (single photo)
Canon 1D-X              7.347 seconds
Canon 1Ds MK3           8.400 seconds
Panasonic DMC FZ100     7.916 seconds
Phase One P25           10.475 seconds
Phase One P30           12.495 seconds
Phase One P45           13.049 seconds
Samsung NX10            6.263 seconds
Samsung NX100           5.280 seconds
Sony A700               5.522 seconds
Sony F828               6.996 seconds
Test Tool
PhotoDirector 5
‒ RAW Decoder
  ‒ Decoder elapsed time is long for complex RAW formats
‒ RAW decode is necessary during all stages in the editing pipeline
  ‒ When generating FULL SCALE preview
  ‒ When entering Retouch module for the first time
  ‒ When resuming from previous editing
  ‒ When exporting to JPG/TIFF files
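To put the decode times in context, the following minimal sketch summarizes the measurements from the table and estimates the cumulative decode cost per photo. It assumes, purely for illustration, that each of the four listed stages triggers exactly one full decode of the same photo; the slides do not state how often each stage actually runs. The function name cumulative_decode_seconds is hypothetical.

```python
# Decode times in seconds (single photo), copied from the table on this slide.
decode_seconds = {
    "Canon 1D-X": 7.347,
    "Canon 1Ds MK3": 8.400,
    "Panasonic DMC FZ100": 7.916,
    "Phase One P25": 10.475,
    "Phase One P30": 12.495,
    "Phase One P45": 13.049,
    "Samsung NX10": 6.263,
    "Samsung NX100": 5.280,
    "Sony A700": 5.522,
    "Sony F828": 6.996,
}

# The four points in the pipeline where the slide says a RAW decode is needed.
STAGES = [
    "full scale preview",
    "first entry into Retouch",
    "resume from previous editing",
    "export to JPG/TIFF",
]


def cumulative_decode_seconds(per_decode, decodes=len(STAGES)):
    """ASSUMPTION: every stage costs one full decode of the same photo."""
    return per_decode * decodes


mean_seconds = sum(decode_seconds.values()) / len(decode_seconds)
slowest = max(decode_seconds, key=decode_seconds.get)
fastest = min(decode_seconds, key=decode_seconds.get)

print(f"mean decode time: {mean_seconds:.3f} s")
print(f"slowest: {slowest}, fastest: {fastest}")
for camera, seconds in decode_seconds.items():
    total = cumulative_decode_seconds(seconds)
    print(f"{camera}: {total:.3f} s across {len(STAGES)} decodes")
```

Under that assumption, even the fastest camera in the table spends more than 20 seconds per photo on decoding alone, which is why the decoder is singled out as the optimization target.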
6 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
Test Platform
CPU: AMD A10-4655M
RAM: 4GB
OS: Windows 7 32-bit
7. PHOTO EDITING PIPELINE
OPENCL AND MEMORY MANAGEMENT
[Diagram: pipeline from IMG_0077.CR2 to NEW.JPG with the blocks RAW Decoder (GPU), Photo Retouch (CPU & GPU), RAW Decoder (GPU), and JPEG Encoder (CPU). Frame Buffers move between HOST Memory and DEVICE Memory through MAP and UN-MAP operations.]
Performance can be improved by utilizing GPU compute power (OpenCL 1.x)
‒ Improve RAW decode performance
‒ Improve EDITING (Retouch) performance
‒ OpenCL 1.x is great, however…
8. MEMORY SPACE AND PERFORMANCE
RELATIVE KERNEL VS. BUFFER PERFORMANCE ANALYSIS
OpenCL 1.x can speed up processing substantially, yet it creates new challenges
‒ Buffering between HOST and DEVICE creates overhead
  ‒ Sometimes the overhead takes up a large portion of execution time
‒ DEVICE memory space is limited
  ‒ 512MB can hold only one 36MP photo, or two 24MP photos
  ‒ Creates more reads and writes between HOST and DEVICE memories
[Diagram: a 36MP Frame Buffer is tiled to fit into 512MB of DEVICE memory, resulting in more reads and more writes.]
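The capacity figures above can be checked with a short calculation. This is a sketch under a stated assumption: the slides do not give the pixel format, so 8 bytes per pixel (for example, 4 channels at 16 bits each) is assumed here because it reproduces the "one 36MP or two 24MP" claim. The helper names and the working_copies parameter are hypothetical.

```python
import math

DEVICE_BYTES = 512 * 1024 * 1024  # 512MB of DEVICE memory, from the slide
BYTES_PER_PIXEL = 8               # ASSUMPTION: 4 channels x 16 bits, not stated in the slides


def frame_bytes(megapixels, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of one uncompressed frame buffer."""
    return int(megapixels * 1_000_000) * bytes_per_pixel


def frames_that_fit(megapixels, device_bytes=DEVICE_BYTES):
    """How many whole frame buffers of this size fit in DEVICE memory."""
    return device_bytes // frame_bytes(megapixels)


def tiles_needed(megapixels, working_copies, device_bytes=DEVICE_BYTES):
    """Minimum number of equal tiles so that `working_copies` tile-sized
    buffers (e.g. one input and one output) fit on the DEVICE at once."""
    return math.ceil(working_copies * frame_bytes(megapixels) / device_bytes)


for mp in (24, 36):
    size_mb = frame_bytes(mp) / 1_000_000
    print(f"{mp}MP: {size_mb:.0f} MB per frame, {frames_that_fit(mp)} frame(s) fit")

# A kernel that reads one 36MP buffer and writes another cannot hold both
# on the DEVICE, so the image has to be split into tiles.
print("tiles for 36MP with input + output buffers:", tiles_needed(36, 2))
```

Each additional tile means another round of HOST to DEVICE writes and DEVICE to HOST reads, which is the extra traffic the slide refers to.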
10. OPTIMIZING PERFORMANCE WITH AMD HSA
THE ADVANTAGE OF ADOPTING HSA WITH OPENCL
[Diagram: pipeline from IMG_0077.CR2 to NEW.JPG with the blocks RAW Decoder, Photo Retouch, RAW Decoder, and JPEG Encoder. Frame Buffers are drawn between HOST Memory and DEVICE Memory.]
Using AMD HSA to improve performance over OpenCL 1.x
‒ Shared virtual memory removes the boundary between CPU and GPU
‒ Reduces the overhead of moving data
‒ Uses the AMD APU platform to achieve true heterogeneous computing
11. 3 LEVELS OF SHARED VIRTUAL MEMORY
CHOOSING SHARED VIRTUAL MEMORY
3 levels of Shared Virtual Memory support (can be configured during initialization)
‒ Coarse Grain Buffer
  ‒ Ability to share virtual pointers between HOST and DEVICE
‒ Fine Grain Buffer
  ‒ Ability to share buffer space between HOST and DEVICE
‒ Fine Grain System Buffer
  ‒ Ability to allow DEVICE to access entire HOST address space
  ‒ Eliminates the need to specify explicit SVM pointers
Coding Complexity
‒ Complexity: Coarse Grain > Fine Grain > Fine Grain System
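As an illustration of how an application might pick among the three levels at initialization, the sketch below decodes a capability bitfield. It assumes that the levels on this slide correspond to the shared virtual memory capability bits of OpenCL 2.0 (cl_device_svm_capabilities); the slides themselves do not name the API. In a real program the value would come from a device query such as clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES; here it is passed in as a plain integer, and the function names are hypothetical.

```python
# Bit values of cl_device_svm_capabilities from the OpenCL 2.0 headers.
CL_DEVICE_SVM_COARSE_GRAIN_BUFFER = 1 << 0
CL_DEVICE_SVM_FINE_GRAIN_BUFFER = 1 << 1
CL_DEVICE_SVM_FINE_GRAIN_SYSTEM = 1 << 2

# Ordered from least to most capable, using the names on the slide.
LEVELS = [
    (CL_DEVICE_SVM_COARSE_GRAIN_BUFFER, "Coarse Grain Buffer"),
    (CL_DEVICE_SVM_FINE_GRAIN_BUFFER, "Fine Grain Buffer"),
    (CL_DEVICE_SVM_FINE_GRAIN_SYSTEM, "Fine Grain System Buffer"),
]


def supported_levels(caps):
    """Return the names of all SVM levels whose bit is set in `caps`."""
    return [name for bit, name in LEVELS if caps & bit]


def most_capable_level(caps):
    """Return the most capable supported level, or None if SVM is unavailable."""
    levels = supported_levels(caps)
    return levels[-1] if levels else None


# Hypothetical device reporting coarse and fine grain buffer support only.
caps = CL_DEVICE_SVM_COARSE_GRAIN_BUFFER | CL_DEVICE_SVM_FINE_GRAIN_BUFFER
print(supported_levels(caps))
print(most_capable_level(caps))
```

The most capable level is not necessarily the fastest one for a given workload, so a check like this only establishes which options are available to experiment with.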
12. COARSE GRAIN SHARED BUFFER
OPENCL BUFFER VS. HSA BUFFER
Because PhotoDirector's existing code base does not contain excessive pointers, we are able to choose the buffer type that gives the best performance
[Diagram: Standard OCL Buffers vs. HSA Coarse Grain Buffers, each showing Buffer 1 and Buffer 2 on HOST and DEVICE.]
14. AMD HSA BUFFER TYPES
RELATIVE PERFORMANCE COMPARISON
[Chart: Performance Index of Applying Hue Change to RAW Photo, Coarse Grain vs. Fine Grain]
Our proof-of-concept code showed a potential performance difference
‒ Good potential performance when using Coarse Grain Buffers
‒ Results show roughly a 2x difference between the Coarse Grain and Fine Grain implementations
Test Tool
PhotoDirector 5 Testbed
Test Platform
CPU: AMD KAVERI
RAM: 4GB
OS: Windows 7 64-bit
16. KEY TAKEAWAY
AMD HSA SHOWS GREAT POTENTIAL
AMD HSA shows great potential for photo editing applications such as CyberLink PhotoDirector
‒ Many more photo editing tasks can leverage the performance advantage of AMD HSA platforms
‒ It is important to experiment and work with the most suitable HSA buffer type
‒ Potential performance improvements for parallelizable and memory-intensive applications