CE-4030: OPTIMIZING PHOTO EDITING APPLICATION
FOR AMD HETEROGENEOUS SYSTEM ARCHITECTURE
CYBERLINK MARKETING MANAGER
STANLEY LAM
AGENDA

Why Photo Editing Application – PhotoDirector?
Photo Editing Pipelines (RAW processing)
How Does AMD HSA Help in Photo Editing?
Proof of Concept: HSA Performance Showcase
Key Takeaways
2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
Why Photo Editing Software
– PhotoDirector?
WHY PHOTO EDITING SOFTWARE?
THE RIGHT APPLICATION FOR HSA

CyberLink Multimedia Software
‒ Media Playback: PowerDVD
‒ Video Editing: PowerDirector
‒ Photo Editing: PhotoDirector

Why Photo Editing Software?
‒ Many editing tasks can be parallelized
‒ Processing / decoding RAW files is time consuming
‒ RAW image editing can be both computationally and memory intensive

How Does AMD HSA Help in Photo Editing?
‒ Utilize GPU compute units to speed up performance
‒ Eliminate overheads and memory copy bottlenecks between HOST and DEVICE memories

Model                    Resolution (MP)   Width   Height   MEM Space (bytes)
Nikon D3S                24                6034    4012     193,667,264
Nikon D4                 24                6048    4032     195,084,288
Nikon D70S               24                6034    4028     194,439,616
Nikon D800E              36                7378    4924     290,634,176
Canon Eos 600D           36                7360    4912     289,218,560
Canon Eos 5D Mark Iii    21                5616    3744     168,210,432
Nikon D90                21                5616    3744     168,210,432
Canon Eos 20D            22                5760    3840     176,947,200
Canon Eos 7D             20                5472    3648     159,694,848
Samsung Nx11             20                5472    3648     159,694,848
Samsung Dslr-A700        20                5472    3648     159,694,848
Sony Slt-A77V            24                6000    4000     192,000,000
Sony Dslr-A850           24                6000    4000     192,000,000
Sony Dslr-A900           24                6048    4032     195,084,288
Sony Nex-5N              24                6048    4032     195,084,288
Sony Dsc-Rx100           24                6000    4000     192,000,000
Sony Dsc-Rx1             20                5472    3648     159,694,848
Sony Dsc-F828            24                6000    4000     192,000,000
Pentax K-5 Ii            40                7264    5440     316,129,280
Phase One P 20           22                4096    5456     178,782,208
Phase One P 30           22                4096    5456     178,782,208
Phase One P40+           32                6526    4904     256,028,032
Phase One P 45+          39                7246    5444     315,577,792
Phase One P65+           39                7246    5444     315,577,792
Phase One Dslr-A100      60                8984    6732     483,842,304
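
The MEM Space figures are consistent with width x height x 8 bytes per pixel. The slide does not state the pixel format, so the 8-byte figure (for example, four 16-bit channels) is an inference from the numbers, and the function name below is hypothetical. A minimal sketch that reproduces three of the values:

```python
# Assumption: 8 bytes per pixel (e.g. 4 channels x 16 bits). This is inferred
# from the MEM Space figures, not stated on the slide.
BYTES_PER_PIXEL = 8


def mem_space(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Bytes needed for one uncompressed full-resolution frame buffer."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    for w, h in [(6048, 4032), (7360, 4912), (8984, 6732)]:
        print(f"{w} x {h}: {mem_space(w, h):,} bytes")
    # 6048 x 4032: 195,084,288 bytes
    # 7360 x 4912: 289,218,560 bytes
    # 8984 x 6732: 483,842,304 bytes
```

At this pixel size a single 24MP frame already takes close to 200MB, which is why buffer placement matters in the rest of the deck.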
Photo Editing Pipeline
PHOTO EDITING PIPELINE
RAW PROCESSING
[Diagram: IMG_0077.CR2 -> RAW Decoder -> Photo Retouch (Preview Size and Full Scale Size) -> JPEG Encoder -> NEW.JPG]

KEY area for potential performance improvement
‒ RAW Decoder
‒ Decoder elapsed time is long for complex RAW formats

RAW Decode is necessary during all stages in the editing pipeline
‒ When generating FULL SCALE preview
‒ When entering Retouch module for the first time
‒ When resuming from previous editing
‒ When exporting to JPG/TIFF files

Camera Model           RAW Decode time (single photo)
Canon 1D-X             7.347 seconds
Canon 1Ds MK3          8.400 seconds
Panasonic DMC FZ100    7.916 seconds
Phase One P25          10.475 seconds
Phase One P30          12.495 seconds
Phase One P45          13.049 seconds
Samsung NX10           6.263 seconds
Samsung NX100          5.280 seconds
Sony A700              5.522 seconds
Sony F828              6.996 seconds

Test Tool
PhotoDirector 5

Test Platform
CPU: AMD A10-4655M
RAM: 4GB
OS: Windows 7 32-bit

PHOTO EDITING PIPELINE
OPENCL AND MEMORY MANAGEMENT

[Diagram: IMG_0077.CR2 -> RAW Decoder (GPU) -> Photo Retouch (CPU & GPU) -> JPEG Encoder (CPU) -> NEW.JPG. Frame buffers sit in both HOST memory and DEVICE memory and are exchanged through MAP / UN-MAP operations.]

Performance can be improved by utilizing GPU compute power (OpenCL 1.x)
‒ Improve RAW decode performance
‒ Improve EDITING (Retouch) performance
‒ OpenCL 1.x is great, however…

MEMORY SPACE AND PERFORMANCE
RELATIVE KERNEL VS. BUFFER PERFORMANCE ANALYSIS

OpenCL 1.x can speed up performance substantially, yet it creates new challenges
‒ Buffering between HOST and DEVICE creates overheads
‒ Sometimes the overheads take up a large portion of execution time

‒ DEVICE memory space is limited
‒ 512MB can only hold one 36MP photo, or two 24MP photos
‒ Creates more reads and writes between HOST and DEVICE memories

[Diagram: a 36MP photo is tiled to fit a 512MB DEVICE frame buffer, which leads to more reads and more writes]
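
The 512MB claim can be checked with simple arithmetic. The sketch below assumes 8 bytes per pixel (inferred from the MEM Space figures earlier in the deck, not stated by the author) and reads 512MB as 512 MiB. The function names, and the idea of counting "buffers in flight", are hypothetical. `min_tiles` is only a lower bound, because it ignores any overlap between neighbouring tiles.

```python
import math

DEVICE_BYTES = 512 * 1024 * 1024  # the slide's 512MB, read here as 512 MiB (assumption)
BYTES_PER_PIXEL = 8               # assumption inferred from the MEM Space figures


def frame_bytes(width: int, height: int) -> int:
    """Bytes for one uncompressed full-resolution frame."""
    return width * height * BYTES_PER_PIXEL


def frames_that_fit(width: int, height: int, device_bytes: int = DEVICE_BYTES) -> int:
    """How many whole frames of this size fit in DEVICE memory at once."""
    return device_bytes // frame_bytes(width, height)


def min_tiles(width: int, height: int, buffers_in_flight: int,
              device_bytes: int = DEVICE_BYTES) -> int:
    """Lower bound on the tile count when `buffers_in_flight` frame-sized
    buffers (e.g. one input and one output) must share DEVICE memory."""
    needed = frame_bytes(width, height) * buffers_in_flight
    return max(1, math.ceil(needed / device_bytes))


if __name__ == "__main__":
    print(frames_that_fit(7360, 4912))                 # 36MP frame -> 1
    print(frames_that_fit(6000, 4000))                 # 24MP frame -> 2
    print(min_tiles(7360, 4912, buffers_in_flight=2))  # -> 2
```

Under these assumptions a 36MP edit that needs both an input and an output buffer cannot stay resident on the DEVICE in one piece, so the work has to be split into tiles, and each tile adds its own HOST-to-DEVICE write and DEVICE-to-HOST read.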
How Does AMD HSA Help in
Photo Editing?
OPTIMIZING PERFORMANCE WITH AMD HSA
THE ADVANTAGE OF ADOPTING HSA WITH OPENCL

[Diagram: IMG_0077.CR2 -> RAW Decoder -> Photo Retouch -> JPEG Encoder -> NEW.JPG, with frame buffers in HOST memory and DEVICE memory]

Using AMD HSA to improve performance over OpenCL 1.x
‒ Shared virtual memory breaks down the border between CPU and GPU
‒ Reduces the overhead of moving data
‒ Uses the AMD APU platform to achieve true heterogeneous computing

3 LEVELS OF SHARED VIRTUAL MEMORY
CHOOSING SHARED VIRTUAL MEMORY

3 Levels of Shared Virtual Memory support (can be configured during initialization)
‒ Coarse Grain Buffer
‒ Ability to share virtual pointers between HOST and DEVICE

‒ Fine Grain Buffer
‒ Ability to share buffer space between HOST and DEVICE

‒ Fine Grain System Buffer
‒ Ability to allow DEVICE to access entire HOST address space
‒ Eliminates the need to specify explicit SVM pointers

Coding Complexity
‒ Complexity: Coarse Grain > Fine Grain > Fine Grain System
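
One way to see the complexity ordering is to count the host-side steps each level needs around a single kernel launch. The sketch below is a plain Python model, not real API code. It assumes the three levels behave like the OpenCL 2.0 SVM types: coarse-grain buffers need a dedicated allocation (clSVMAlloc) plus map/unmap around host access (clEnqueueSVMMap / clEnqueueSVMUnmap), fine-grain buffers need only the allocation, and fine-grain system needs neither. The slide does not name the API, so that mapping and every name in the code are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SvmLevel:
    name: str
    needs_svm_alloc: bool   # buffer must come from a dedicated SVM allocator
    needs_map_unmap: bool   # host access must be bracketed by map/unmap


# Assumed behaviour, following OpenCL 2.0 SVM semantics.
COARSE_GRAIN = SvmLevel("Coarse Grain Buffer", True, True)
FINE_GRAIN = SvmLevel("Fine Grain Buffer", True, False)
FINE_GRAIN_SYSTEM = SvmLevel("Fine Grain System Buffer", False, False)


def host_steps(level: SvmLevel) -> List[str]:
    """Host-side steps for one write -> kernel -> read round trip."""
    steps: List[str] = []
    if level.needs_svm_alloc:
        steps.append("allocate SVM buffer")
    if level.needs_map_unmap:
        steps.append("map buffer for host write")
    steps.append("host writes input pixels")
    if level.needs_map_unmap:
        steps.append("unmap buffer")
    steps.append("launch kernel with the shared pointer")
    steps.append("wait for kernel to finish")
    if level.needs_map_unmap:
        steps.append("map buffer for host read")
    steps.append("host reads result")
    if level.needs_map_unmap:
        steps.append("unmap buffer")
    if level.needs_svm_alloc:
        steps.append("free SVM buffer")
    return steps


if __name__ == "__main__":
    for level in (COARSE_GRAIN, FINE_GRAIN, FINE_GRAIN_SYSTEM):
        print(f"{level.name}: {len(host_steps(level))} host steps")
    # Coarse Grain Buffer: 10 host steps
    # Fine Grain Buffer: 6 host steps
    # Fine Grain System Buffer: 4 host steps
```

The step counts follow the same order as the slide: Coarse Grain > Fine Grain > Fine Grain System.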

COARSE GRAIN SHARED BUFFER
OPENCL BUFFER VS. HSA BUFFER

PhotoDirector’s existing code base does not contain excessive pointers, so we are able to choose the buffer type that gives the best performance

[Diagram: Standard OCL Buffers vs. HSA Coarse Grain Buffers; each panel shows Buffer 1 and Buffer 2 on HOST and DEVICE]

Proof of concept:
HSA Performance Showcase
AMD HSA BUFFER TYPES
RELATIVE PERFORMANCE COMPARISON
Performance Index of Applying Hue Change to RAW Photo
[Chart: performance index bars for Coarse Grain and Fine Grain buffers]

Our proof-of-concept code showed a potential performance difference
‒ Good potential performance when using Coarse Grain Buffers
‒ Results show roughly a 2x difference between the Coarse Grain and Fine Grain implementations

Test Tool
PhotoDirector 5 Testbed

Test Platform
CPU: AMD KAVERI
RAM: 4GB
OS: Windows 7 64-bit

Key Takeaways
KEY TAKEAWAY
AMD HSA SHOWS GREAT POTENTIAL

AMD HSA shows great potential for photo editing applications – CyberLink PhotoDirector
‒ Many more photo editing tasks can leverage the performance advantage of AMD HSA platforms
‒ It’s important to experiment and work with the most suitable HSA buffer type
‒ Potential performance improvements for parallelizable and memory-intensive applications

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names
are for informational purposes only and may be trademarks of their respective owners.

17 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL

Weitere ähnliche Inhalte

Was ist angesagt?

PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
AMD Developer Central
 

Was ist angesagt? (20)

GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
 
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
 
GS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
GS-4139, RapidFire for Cloud Gaming, by Dmitry KozlovGS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
GS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
 
CE-4114, Screen Mirror, a unified screen mirroring solution that utilizes AMD...
CE-4114, Screen Mirror, a unified screen mirroring solution that utilizes AMD...CE-4114, Screen Mirror, a unified screen mirroring solution that utilizes AMD...
CE-4114, Screen Mirror, a unified screen mirroring solution that utilizes AMD...
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
GS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
GS-4151, Developing Thief with new AMD technology, by Jurjen KatsmanGS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
GS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansGS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
 
CC-4009, "Optimizing Hadoop Deployments with SeaMicro SM15000" by Satheesh Na...
CC-4009, "Optimizing Hadoop Deployments with SeaMicro SM15000" by Satheesh Na...CC-4009, "Optimizing Hadoop Deployments with SeaMicro SM15000" by Satheesh Na...
CC-4009, "Optimizing Hadoop Deployments with SeaMicro SM15000" by Satheesh Na...
 
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
 
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsWT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
 
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe ClavelIS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 

Ähnlich wie CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley Lam

AMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory Access
AMD
 
dassault-systemes-catia-application-scalability-guide
dassault-systemes-catia-application-scalability-guidedassault-systemes-catia-application-scalability-guide
dassault-systemes-catia-application-scalability-guide
Jason Kyungho Lee
 
AMD 2014 Mobility APU Lineup Announcement
AMD 2014 Mobility APU Lineup AnnouncementAMD 2014 Mobility APU Lineup Announcement
AMD 2014 Mobility APU Lineup Announcement
AMD
 

Ähnlich wie CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley Lam (20)

Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
 
CE-4117, HSA Optimizations and Impact on end User Experiences for AfterShot P...
CE-4117, HSA Optimizations and Impact on end User Experiences for AfterShot P...CE-4117, HSA Optimizations and Impact on end User Experiences for AfterShot P...
CE-4117, HSA Optimizations and Impact on end User Experiences for AfterShot P...
 
ROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlowROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlow
 
Building Efficient Edge Nodes for Content Delivery Networks
Building Efficient Edge Nodes for Content Delivery NetworksBuilding Efficient Edge Nodes for Content Delivery Networks
Building Efficient Edge Nodes for Content Delivery Networks
 
AMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat PresentationAMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat Presentation
 
Media and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
Media and entertainment workload comparison: HP Z8 vs. Apple Mac ProMedia and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
Media and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
 
AMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory Access
 
dassault-systemes-catia-application-scalability-guide
dassault-systemes-catia-application-scalability-guidedassault-systemes-catia-application-scalability-guide
dassault-systemes-catia-application-scalability-guide
 
System z Technology Summit Streamlining Utilities
System z Technology Summit Streamlining UtilitiesSystem z Technology Summit Streamlining Utilities
System z Technology Summit Streamlining Utilities
 
5 Things You Need to Know About Enterprise Fl
 5 Things You Need to Know About Enterprise Fl 5 Things You Need to Know About Enterprise Fl
5 Things You Need to Know About Enterprise Fl
 
AMD 2014 Mobility APU Lineup Announcement
AMD 2014 Mobility APU Lineup AnnouncementAMD 2014 Mobility APU Lineup Announcement
AMD 2014 Mobility APU Lineup Announcement
 
OpenEye Optix Network Cameras
OpenEye Optix Network CamerasOpenEye Optix Network Cameras
OpenEye Optix Network Cameras
 
Well Behaved Mobile Apps on AIR - Performance Related
Well Behaved Mobile Apps on AIR - Performance RelatedWell Behaved Mobile Apps on AIR - Performance Related
Well Behaved Mobile Apps on AIR - Performance Related
 
All about Azure workshop deck
All about Azure workshop deckAll about Azure workshop deck
All about Azure workshop deck
 
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
 
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
AMD AND OPEN COMPUTE
AMD AND OPEN COMPUTEAMD AND OPEN COMPUTE
AMD AND OPEN COMPUTE
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 

Mehr von AMD Developer Central

Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
AMD Developer Central
 

Mehr von AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 

Kürzlich hochgeladen

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley Lam

  • 1. CE-4030: OPTIMIZING PHOTO EDITING APPLICATION FOR AMD HETEROGENEOUS SYSTEM ARCHITECTURE CYBERLINK MARKETING MANAGER STANLEY LAM
  • 2. AGENDA Why Photo Editing Application – PhotoDirector? Photo Editing Pipelines (RAW processing) How AMD HSA helps in Photo Editing? Proof of Concept: HSA Performance Showcase Key Takeaways 2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
  • 3. Why Photo Editing Software – PhotoDirector?
  • 4. WHY PHOTO EDITING SOFTWARE? THE RIGHT APPLICATION FOR HSA CyberLink Multimedia Software ‒ Media Playback: PowerDVD ‒ Video Editing: PowerDirector ‒ Photo Editing: PhotoDirector Nikon D3S Resolution (M) 24 6034 4012 Nikon D4 24 6048 4032 Nikon D70S 24 6034 4028 Nikon D800E 36 7378 4924 Model Width Height 7360 4912 5616 3744 21 5616 3744 Canon Eos 600D ‒ Many editing tasks can be parallelize ‒ Processing / Decoding RAW files is time consuming ‒ RAW image editing can be both computational & memory intensive 36 21 Canon Eos 5D Mark Iii Why Photo Editing Software? Nikon D90 Canon Eos 20D 22 5760 3840 Canon Eos 7D 20 5472 3648 3648 Samsung Nx11 20 5472 Samsung Dslr-A700 20 5472 3648 Sony Slt-A77V 24 6000 4000 Sony Dslr-A850 24 6000 4000 Sony Dslr-A900 24 6048 4032 Sony Nex-5N 24 6048 4032 Sony Dsc-Rx100 24 6000 4000 Sony Dsc-Rx1 How AMD HSA helps in Photo Editing? ‒ Utilize GPU compute units to speed up performance ‒ Eliminate overheads and memory copy bottlenecks between HOST and DEVICE memories 20 5472 3648 Sony Dsc-F828 24 6000 4000 Pentax K-5 Ii 40 7264 5440 Phase One P 20 22 4096 5456 Phase One P 30 22 4096 5456 Phase One P40+ 32 6526 4904 Phase One P 45+ 39 7246 5444 4 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL Phase One P65+ 39 7246 5444 Phase One Dslr-A100 60 8984 6732 MEM Space 193,667,264 195,084,288 194,439,616 290,634,176 289,218,560 168,210,432 168,210,432 176,947,200 159,694,848 159,694,848 159,694,848 192,000,000 192,000,000 195,084,288 195,084,288 192,000,000 159,694,848 192,000,000 316,129,280 178,782,208 178,782,208 256,028,032 315,577,792 315,577,792 483,842,304
  • 6. PHOTO EDITING PIPELINE: RAW PROCESSING
     [Diagram: IMG_0077.CR2, RAW Decoder, Photo Retouch (Preview Size), Photo Retouch (Full Scale Size), JPEG Encoder, NEW.JPG. Key: area for potential performance improvement]
     RAW Decoder
     ‒ Decoder elapsed time is long for complex RAW formats
     ‒ RAW decode is necessary during all stages in the editing pipeline
       ‒ When generating FULL SCALE preview
       ‒ When entering Retouch module for the first time
       ‒ When resuming from previous editing
       ‒ When exporting to JPG/TIFF files

     Camera Model          RAW Decode Time (single photo)
     Canon 1D-X            7.347 seconds
     Canon 1Ds MK3         8.400 seconds
     Panasonic DMC FZ100   7.916 seconds
     Phase One P25         10.475 seconds
     Phase One P30         12.495 seconds
     Phase One P45         13.049 seconds
     Samsung NX10          6.263 seconds
     Samsung NX100         5.280 seconds
     Sony A700             5.522 seconds
     Sony F828             6.996 seconds

     Test Tool: PhotoDirector 5
     Test Platform: CPU: AMD A10-4655M, RAM: 4GB, OS: Windows 7 32-bit
  • 7. PHOTO EDITING PIPELINE: OPENCL AND MEMORY MANAGEMENT
     [Diagram: IMG_0077.CR2, RAW Decoder (GPU), Photo Retouch (CPU & GPU), JPEG Encoder (CPU), NEW.JPG; frame buffers are MAPPED and UN-MAPPED between HOST memory and DEVICE memory]
     Performance can be improved by utilizing GPU compute power (OpenCL 1.x)
     ‒ Improve RAW decode performance
     ‒ Improve EDITING (Retouch) performance
     ‒ OpenCL 1.x is great, however…
  • 8. MEMORY SPACE AND PERFORMANCE: RELATIVE KERNEL VS. BUFFER PERFORMANCE ANALYSIS
     OpenCL 1.x can speed up performance substantially, yet it creates new challenges
     ‒ Buffering between HOST and DEVICE creates overheads
       ‒ Sometimes the overheads take up a large portion of execution time
     ‒ DEVICE memory space is limited
       ‒ 512MB can only hold one 36MP photo, or two 24MP photos
       ‒ Creates more reads and writes between HOST and DEVICE memories
     [Diagram: a 36MP frame buffer tiled to fit 512MB of DEVICE memory, leading to more reads and more writes]
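The capacity claim on this slide can be checked against the frame sizes listed on slide 4 (290,634,176 for a 36MP photo, 193,667,264 for a 24MP photo). The sketch below assumes 512MB means 512 × 1024 × 1024 bytes; `frames_that_fit` and `passes_needed` are hypothetical helper names, and the "input plus output buffer" working set is an illustrative assumption rather than something the slides specify.

```python
# Assumption: "512MB" is read as 512 MiB of DEVICE memory.
DEVICE_BYTES = 512 * 1024 * 1024

# Frame sizes copied from the MEM Space column on slide 4.
FRAME_36MP = 290_634_176
FRAME_24MP = 193_667_264


def frames_that_fit(frame_bytes, device_bytes=DEVICE_BYTES):
    """How many whole frame buffers of this size fit in DEVICE memory."""
    return device_bytes // frame_bytes


def passes_needed(working_set_bytes, device_bytes=DEVICE_BYTES):
    """Minimum number of tiles (passes) if the working set must be split
    into pieces that each fit in DEVICE memory (ceiling division)."""
    return -(-working_set_bytes // device_bytes)


if __name__ == "__main__":
    print(frames_that_fit(FRAME_36MP))    # one 36MP frame fits
    print(frames_that_fit(FRAME_24MP))    # two 24MP frames fit
    # Hypothetical working set: one input and one output buffer at 36MP.
    print(passes_needed(2 * FRAME_36MP))  # must be split into two tiles
```

Each extra tile implies another round of HOST-to-DEVICE writes and DEVICE-to-HOST reads, which is the overhead the slide attributes to limited DEVICE memory.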
  • 9. How AMD HSA helps in Photo Editing?
  • 10. OPTIMIZING PERFORMANCE WITH AMD HSA: THE ADVANTAGE OF ADOPTING HSA WITH OPENCL
     [Diagram: IMG_0077.CR2, RAW Decoder, Photo Retouch, JPEG Encoder, NEW.JPG; frame buffers shared across HOST memory and DEVICE memory]
     Using AMD HSA to improve performance over OpenCL 1.x
     ‒ Shared virtual memory breaks down the border between CPU and GPU
     ‒ Reduces the overhead of moving data
     ‒ Use the AMD APU platform to achieve true Heterogeneous Computing
  • 11. 3 LEVELS OF SHARED VIRTUAL MEMORY: CHOOSING SHARED VIRTUAL MEMORY
     3 levels of Shared Virtual Memory (SVM) support (can be configured during initialization)
     ‒ Coarse Grain Buffer
       ‒ Ability to share virtual pointers between HOST and DEVICE
     ‒ Fine Grain Buffer
       ‒ Ability to share buffer space between HOST and DEVICE
     ‒ Fine Grain System Buffer
       ‒ Ability to allow DEVICE to access entire HOST address space
       ‒ Eliminates the need to specify explicit SVM pointers
     Coding Complexity
     ‒ Complexity: Coarse Grain > Fine Grain > Fine Grain System
  • 12. COARSE GRAIN SHARED BUFFER: OPENCL BUFFER VS. HSA BUFFER
     Because PhotoDirector’s existing code base does not contain excessive pointers, we are able to choose the buffer type that gives the best performance
     [Diagram: Standard OCL Buffers vs. HSA Coarse Grain Buffers, each showing Buffer 1 and Buffer 2 on HOST and DEVICE]
  • 13. Proof of concept: HSA Performance Showcase
  • 14. AMD HSA BUFFER TYPES: RELATIVE PERFORMANCE COMPARISON
     [Chart: Performance Index of Applying Hue Change to RAW Photo, Coarse Grain vs. Fine Grain]
     Our proof-of-concept code showed a potential performance difference
     ‒ Good potential performance when using Coarse Grain Buffers
     ‒ Results show a roughly 2x difference between the Coarse Grain and Fine Grain implementations
     Test Tool: PhotoDirector 5 Testbed
     Test Platform: CPU: AMD KAVERI, RAM: 4GB, OS: Windows 7 64-bit
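The benchmark operation on this slide, a hue change, is a per-pixel transform: each output pixel depends only on the matching input pixel, which is what makes it a natural fit for GPU compute units. The sketch below is a plain CPU illustration in Python using the standard `colorsys` module. It is not PhotoDirector's kernel, and the function names and 8-bit RGB tuples are assumptions chosen for brevity.

```python
import colorsys


def shift_hue(pixel, degrees):
    """Rotate the hue of one 8-bit (r, g, b) pixel by the given angle."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360.0) % 1.0  # hue is a fraction of a full turn
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def shift_hue_image(pixels, degrees):
    """Apply the hue shift to every pixel independently.

    No pixel reads its neighbours, so the loop body could run as one
    work-item per pixel on a GPU.
    """
    return [shift_hue(pixel, degrees) for pixel in pixels]


if __name__ == "__main__":
    # Pure red rotated by 120 degrees becomes pure green.
    print(shift_hue_image([(255, 0, 0), (128, 128, 128)], 120))
```

Grey pixels have zero saturation, so the hue shift leaves them unchanged; only the frame buffer traffic differs between buffer types, not the per-pixel math.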
  • 16. KEY TAKEAWAY: AMD HSA SHOWS GREAT POTENTIAL
     AMD HSA shows great potential for photo editing applications such as CyberLink PhotoDirector
     ‒ Many more photo editing tasks can leverage the performance advantage of AMD HSA platforms
     ‒ It’s important to experiment and work with the most suitable HSA buffer type
     ‒ Potential performance improvements for parallelizable and memory-intensive applications
  • 17. DISCLAIMER & ATTRIBUTION
     The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
     AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
     ATTRIBUTION
     © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.