ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL
BUDI PURNOMO
URI SHOMRONI
GNANABASKARAN MUTHUMANI

ABOUT OpenCL™
OpenCL™ is FUN!
• Parallel compute programming language
• Exposes the massively multithreaded GPU
• A lot of horsepower, optimized for parallel computing
• Order-of-magnitude performance improvement!

2 | ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL | NOVEMBER 13, 2013

OpenCL™ DEBUGGING AND PROFILING CHALLENGES
However,
• Debugging and profiling parallel processing applications is hard
• On-time delivery of robust (bug-free) OpenCL™ applications is challenging
• It is almost impossible to optimize an OpenCL™-based application to fully utilize the available parallel processing system resources

OpenCL™ DEBUGGING AND PROFILING CHALLENGES
OpenCL™ is a “Black Box”
• The application enqueues OpenCL™ commands
• The OpenCL™ runtime executes the commands
• Using a host profiler and debugger, the developer cannot
  ‒ Debug and profile the OpenCL™ kernels
  ‒ See the execution details
  ‒ View runtime loads
[Diagram labels: Application, OpenCL™]

AMD CodeXL
• APU and GPU Debugging Functionality
  ‒ OpenCL™ and OpenGL API-Level
  ‒ OpenCL™ Kernel Source Code
• APU, CPU and GPU Profiling
• OpenCL™ Static Kernel Analysis
• Provides the information a developer needs to help find bugs and optimize the application’s performance
• Integrated into Microsoft® Visual Studio®
• Standalone application for Windows® and Linux®

GPU Debugging

GPU DEBUGGING WITH AMD CodeXL | DEMO
AMDTTEAPOT
• Sample provided with CodeXL tools suite
  ‒ API-Level debugging
    ‒ Pinpointing OpenCL™ Errors
  ‒ Entering Kernel debugging
    ‒ Locals and Watch views
    ‒ Kernel Source breakpoints
  ‒ Finding problematic work items
    ‒ OpenCL™ Kernel Multiwatch view

GPU DEBUGGING WITH AMD CodeXL | VIEWS
API CALLS HISTORY VIEW
• Displays OpenCL™ and OpenGL API calls
  ‒ Supports function calls from OpenCL™ up to version 1.2 and OpenGL up to version 4.3
  ‒ Function parameters
  ‒ Object links in properties
  ‒ API calls are divided per Compute / Render context.
  ‒ Calls history recording to an HTML log file

GPU DEBUGGING WITH AMD CodeXL | VIEWS
CODEXL EXPLORER
• Displays OpenCL™ and OpenGL allocated objects
  ‒ Object hierarchy and counts
  ‒ Object properties
  ‒ For objects with data / sources: double-click to open a main view
  ‒ Displays detected memory leaks if "Break on Memory Leaks" is selected.
  

GPU DEBUGGING WITH AMD CodeXL | VIEWS
SOURCE AND CALL STACK VIEWS
• Displays host code, OpenCL™ kernel source, and OpenGL shader source
  ‒ Set source-level breakpoints in OpenCL™ kernels
  ‒ Display host thread and OpenCL™ kernel wavefront call stacks
  ‒ Visual Studio® integration
  

GPU DEBUGGING WITH AMD CodeXL | VIEWS
OBJECT VIEWS
• Displays image, buffer and texture data
  ‒ Image view for OpenCL™ images and OpenGL textures and render buffers
    ‒ 3D image support with layer selection slider
    ‒ Non-RGB images mapped to grayscale range, with selection of minimum and maximum values clearly displaying out-of-range values
  ‒ Data view for all objects
    ‒ Channel order / type selection for buffer data
    ‒ Connection to image view for objects that support it
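
The grayscale mapping of non-RGB data can be illustrated with a short Python sketch. This is only an assumption about how such a mapping might work, not CodeXL's implementation: the function name `to_grayscale`, the 0..255 output range, and the use of `None` to mark out-of-range values are all hypothetical.

```python
def to_grayscale(values, lo, hi):
    """Map values in [lo, hi] linearly to 0..255 and flag values outside the range.

    Returns a list of (gray_level, out_of_range) pairs. Out-of-range values get
    gray_level None so a viewer could draw them in a distinct highlight color.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    result = []
    for v in values:
        if v < lo or v > hi:
            result.append((None, True))       # outside the selected min/max
        else:
            gray = round(255 * (v - lo) / (hi - lo))
            result.append((gray, False))      # inside the range, scaled linearly
    return result


# Hypothetical single-channel data with the range slider set to 0..100.
pixels = [-5, 0, 20, 100, 140]
for value, (gray, flagged) in zip(pixels, to_grayscale(pixels, 0, 100)):
    print(value, gray, "OUT OF RANGE" if flagged else "")
```

Narrowing `lo` and `hi` stretches the contrast of the remaining values, which is the effect a minimum/maximum selection has on a grayscale view.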
  

GPU DEBUGGING WITH AMD CodeXL | VIEWS
LOCALS AND WATCH VIEWS
• Display OpenCL™ kernel variables
  ‒ Structure and vector types support
  ‒ Global and Private memory array dereferencing
    ‒ Local and Constant memory support planned for future releases
  ‒ Visual Studio® integration
  

GPU DEBUGGING WITH AMD CodeXL | VIEWS
MULTIWATCH VIEWS
• Display a single OpenCL™ kernel variable value across the current work items
  ‒ Image and Data visualization
  ‒ Range slider, like Object image view
  ‒ Current work item is highlighted and can be changed by double-clicking the data view.
  

GPU DEBUGGING WITH AMD CodeXL | FEATURES
NEW IN CODEXL 1.3
• Remote debugging
  ‒ Debug capabilities on a remote machine
    ‒ API-level debugging
    ‒ Kernel debugging
  ‒ Requires a CodeXL agent running on the target machine
    ‒ The agent is included as an option in the CodeXL installer
    ‒ Same agent for remote GPU debugging and remote GPU profiling
  ‒ Currently only supports Windows-to-Windows and Linux-to-Linux debugging
• OpenCL™ API support increased up to OpenCL™ 1.2
  ‒ New API functions
  ‒ New deprecated functions and behaviors
• OpenGL API support increased up to OpenGL 4.3
  ‒ New API functions and tokens
  ‒ New shader types
  

GPU DEBUGGING WITH AMD CodeXL | FEATURES
UPCOMING RELEASES
• Hardware-based kernel debugging
  ‒ Current implementation retrieves hardware values but performs kernel playback for breakpoint implementation
    ‒ Display data for the entire grid
    ‒ Optimized for small- and medium-sized kernels
    ‒ Does not support debugging kernels that can't be replayed consistently (such as kernels using atomics)
  ‒ New implementation will use hardware breakpoints
    ‒ Display data according to the wavefronts executed in the actual hardware
    ‒ Faster for large kernels
    ‒ Stop and resume wavefront execution
    ‒ Can break a running kernel
    ‒ Can support debugging persistent kernels (attach to kernel)
    ‒ Will allow data breakpoints
  ‒ Working development build in the demo area!
  

GPU Profiling

GPU PROFILING WITH AMD CodeXL
KEY FEATURES
• Analyze and profile OpenCL™ host and device code
  ‒ Collect application trace mode
  ‒ Collect GPU performance counter mode
• Views:
  ‒ API trace: View API calls with inputs and outputs
  ‒ Timeline visualization: View host and device synchronization issues
  ‒ Summary pages: Find the top bottleneck
  ‒ Warnings/Errors: View performance suggestions
  ‒ Kernel occupancy: Find the kernel resource bottleneck
  ‒ Performance counter: View the kernel performance bottleneck
• Does not require source or project modifications to the application
• Does not even require the application source code

GPU PROFILING WITH AMD CodeXL | Views
API TRACE
• Analyze and profile OpenCL™ applications
  ‒ View API input arguments and output results
  ‒ Find API hotspots
  ‒ Determine top ten data transfer and kernel execution operations
  ‒ Identify failed API calls, resource leaks and best practices

GPU PROFILING WITH AMD CodeXL | Views
TIMELINE VISUALIZATION
• Visualize host and device execution in a timeline chart
  ‒ View number of OpenCL™ contexts and command queues created and the relationships between these items
  ‒ View data transfer operations and kernel executions on the device
  ‒ Determine proper synchronization and load balancing
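
One way to reason about load balancing from timeline data is to measure how much of a time window the device was actually busy. The Python sketch below merges overlapping (start, end) intervals and computes a utilization ratio. The function names and the sample timestamps are hypothetical, and the intervals are assumed to lie inside the window; this illustrates the idea rather than describing CodeXL's timeline.

```python
def busy_time(intervals):
    """Total time covered by (start, end) intervals, counting overlaps once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps or touches the current run: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def device_utilization(intervals, window_start, window_end):
    """Fraction of the window during which at least one operation was running."""
    return busy_time(intervals) / (window_end - window_start)


# Hypothetical device timeline in milliseconds: a transfer, an overlapping
# kernel, then an idle gap before a second kernel.
device_ops = [(0, 4), (2, 6), (8, 10)]
print(busy_time(device_ops))
print(device_utilization(device_ops, 0, 10))
```

A low ratio alongside long host-side waits would suggest the host and device are serializing on synchronization points instead of overlapping work.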
  	
  

GPU PROFILING WITH AMD CodeXL | Views
SUMMARY PAGES
• Find top bottlenecks
  ‒ I/O bound
  ‒ Compute bound

GPU PROFILING WITH AMD CodeXL | Views
WARNING AND ERROR MESSAGES
• Provide performance improvement suggestions
• Detect errors in an OpenCL™ application

GPU PROFILING WITH AMD CodeXL | Views
PERFORMANCE COUNTER
• Analyze the OpenCL™ kernel execution for AMD APUs and GPUs
  ‒ Collect GPU Performance Counters
    ‒ The number of ALU, global and local memory instructions executed
    ‒ GPU utilization and memory access characteristics
  ‒ Show the kernel resource usages
  ‒ View the AMD intermediate language (AMD IL) and hardware disassembly (ISA)

GPU PROFILING WITH AMD CodeXL | Views
KERNEL OCCUPANCY
• Estimate OpenCL™ kernel occupancy for AMD APUs and GPUs
  ‒ Visual indication of the limiting kernel resources for number of wavefronts in flight
  ‒ View the maximum number of wavefronts in flight limited by
    ‒ Work group size
    ‒ Number of allocated scalar or vector registers
    ‒ Amount of allocated LDS
  ‒ View the maximum resource limit for the GPU device

GPU PROFILING WITH AMD CodeXL | DEMO
Optimizing AMD teapot application
• Finding and fixing non-optimized kernel launch parameters
  ‒ API Trace and Warning and Error Messages View
• Visualizing host device synchronization
  ‒ Timeline Visualization
• Navigating to find the top bottleneck
  ‒ Summary Pages View
• Optimizing the kernel
  ‒ Kernel Occupancy and GPU Performance Counter View

Static Kernel Analysis

STATIC KERNEL ANALYSIS WITH AMD CodeXL
KEY FEATURES
• Compile, analyze and disassemble an OpenCL™ kernel for AMD APUs, GPUs and CPUs.
  ‒ View AMD IL and hardware disassembly (ISA)
  ‒ View compilation warning and error messages
• Generate offline compilation of OpenCL™ kernel binary
• View compiler statistics and estimate performance
• Only requires the OpenCL™ kernel source code as an input
• Does not require a GPU in the system

STATIC KERNEL ANALYSIS WITH AMD CodeXL | FEATURES
NEW IN CODEXL 1.3
• Integrated into AMD CodeXL standalone and Visual Studio® extension
• Brand new user experience
  ‒ View OpenCL™ kernel source, IL and ISA simultaneously
  ‒ View overview
    ‒ Generate analysis for SI and CI families of GPUs
    ‒ Estimated cycle count with ISA branch execution classification
  ‒ Navigate compilation and analysis results in tree view
• Support compilation for the latest AMD APUs, GPUs and CPUs
  

CPU Profiling

CPU PROFILING WITH AMD CodeXL
• Identify and investigate CPU performance hot-spots
• Profiles C, C++, FORTRAN, Java, .NET, OpenCL™ applications
• Profiles software components
  ‒ Applications, Libraries, Dynamically loaded modules
  ‒ OS Kernel modules
• Profile modes
  ‒ Per Process (target application and its children)
  ‒ System Wide Profiling
• Uses HW Performance Monitoring counters
  ‒ Low overhead
• No change to source code required
  ‒ Symbolic information required to attribute the performance data at function/source level
  

CPU PROFILING WITH AMD CodeXL
• Profiling Types
  ‒ Time-based profiling
  ‒ Event-based profiling
  ‒ Instruction Based Sampling (IBS)
  ‒ Cache Line Utilization
  ‒ Call Graph
• Pre-defined profile configuration of HW performance events
  ‒ Assess Performance
  ‒ Investigate Data Access
  ‒ Investigate Branching

CPU PROFILING WITH AMD CodeXL
• Performance data are displayed in configurable views
  ‒ Samples attributed at Process and Modules level
  ‒ Drill down to Functions, Source code and Instructions level

CPU PROFILING WITH AMD CodeXL
• Call Graph view displays the parents and children of hottest function calls

CPU PROFILING WITH AMD CodeXL
• Identify Hotspots
  ‒ Where the application spends its time
  ‒ Source level/algorithm related performance issues
  ‒ Use Time-based profiling
• Identify the cause
  ‒ How well the application is using the CPU and Memory resources
  ‒ Performance bottlenecks due to the micro-architectural constraints
  ‒ Use Event-based profiling or Instruction Based Sampling
• Precise instruction level profiling
  ‒ Use Instruction Based Sampling
• Cache-Line Utilization - Data access pattern

CPU Profiling Demo

AMD CodeXL SUMMARY
• Powerful APU and GPU Debugging
  ‒ OpenCL™ API Level
  ‒ OpenCL™ Kernel Source Code
• APU/GPU and CPU Profiling
  ‒ Identify “hot spots” with inefficient code
• Static Kernel Analysis
  ‒ Compile, analyze and disassemble OpenCL™ kernel
  ‒ Generate offline compilation of OpenCL™ kernel binary
• Integrated into Microsoft® Visual Studio®
• Standalone application for Windows® and Linux®
• Free download at http://developer.amd.com

Thank you!
Questions?

Contact us:
• Budi.Purnomo@amd.com
• Uri.Shomroni@amd.com
• Gnanabaskaran.Muthumani@amd.com

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, the AMD Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. Microsoft, Windows and Visual Studio are trademarks of Microsoft Corp. Linux is a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben Gaster
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
 

PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri Shomroni

  • 1. ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL
      BUDI PURNOMO, URI SHOMRONI, GNANABASKARAN MUTHUMANI
  • 2. ABOUT OpenCL™
      OpenCL™ is FUN!
      • Parallel compute programming language
      • Exposes the massively multithreaded GPU
      • A lot of horsepower, optimized for parallel computing
      • Order-of-magnitude performance improvement!
      2 | ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL | NOVEMBER 13, 2013
  • 3. OpenCL™ DEBUGGING AND PROFILING CHALLENGES
      However,
      • Debugging and profiling parallel processing applications is hard
      • On-time delivery of robust (bug-free) OpenCL™ applications is challenging
      • It is almost impossible to optimize an OpenCL™-based application to fully utilize the available parallel processing system resources
  • 4. OpenCL™ DEBUGGING AND PROFILING CHALLENGES
      OpenCL™ is a “Black Box”
      • The application enqueues OpenCL™ commands
      • The OpenCL™ runtime executes the commands
      • Using a host profiler and debugger, the developer cannot
          ‒ Debug and profile the OpenCL™ kernels
          ‒ See the execution details
          ‒ View runtime loads
      [Diagram labels: Application, OpenCL™]
  • 5. AMD CodeXL
      • APU and GPU Debugging Functionality
          ‒ OpenCL™ and OpenGL API-Level
          ‒ OpenCL™ Kernel Source Code
      • APU, CPU and GPU Profiling
      • OpenCL™ Static Kernel Analysis
      • Provides the information a developer needs to help find bugs and optimize the application’s performance
      • Integrated into Microsoft® Visual Studio®
      • Standalone application for Windows® and Linux®
  • 7. GPU DEBUGGING WITH AMD CodeXL | DEMO
      AMDTTEAPOT
      • Sample provided with CodeXL tools suite
          ‒ API-Level debugging
          ‒ Pinpointing OpenCL™ Errors
          ‒ Entering Kernel debugging
          ‒ Locals and Watch views
          ‒ Kernel Source breakpoints
          ‒ Finding problematic work items
          ‒ OpenCL™ Kernel Multiwatch view
  • 8. GPU DEBUGGING WITH AMD CodeXL | VIEWS
      API CALLS HISTORY VIEW
      • Displays OpenCL™ and OpenGL API calls
          ‒ Supports function calls from OpenCL™ up to version 1.2 and OpenGL up to version 4.3
          ‒ Function parameters
          ‒ Object links in properties
          ‒ API calls are divided per Compute / Render context
          ‒ Calls history recording to an HTML log file
  • 9. GPU DEBUGGING WITH AMD CodeXL | VIEWS
      CODEXL EXPLORER
      • Displays OpenCL™ and OpenGL allocated objects
          ‒ Object hierarchy and counts
          ‒ Object properties
          ‒ For objects with data / sources: double-click to open a main view
          ‒ Displays detected memory leaks if "Break on Memory Leaks" is selected
  • 10. GPU DEBUGGING WITH AMD CodeXL | VIEWS
      SOURCE AND CALL STACK VIEWS
      • Displays host code, OpenCL™ kernel source, and OpenGL shader source
          ‒ Set source-level breakpoints in OpenCL™ kernels
          ‒ Display host thread and OpenCL™ kernel wavefront call stacks
          ‒ Visual Studio® integration
  • 11. GPU DEBUGGING WITH AMD CodeXL | VIEWS
      OBJECT VIEWS
      • Displays image, buffer and texture data
          ‒ Image view for OpenCL™ images and OpenGL textures and render buffers
          ‒ 3D image support with layer selection slider
          ‒ Non-RGB images mapped to grayscale range, with selection of minimum and maximum values clearly displaying out-of-range values
          ‒ Data view for all objects
          ‒ Channel order / type selection for buffer data
          ‒ Connection to image view for objects that support it
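The grayscale mapping mentioned for non-RGB images on slide 11 can be pictured with a small sketch. This is not CodeXL's implementation; the function name `to_grayscale`, the 0..255 output range, and the clamping rule are assumptions made for illustration only.

```python
def to_grayscale(values, lo, hi):
    """Map raw values onto 0..255 gray levels and flag out-of-range values.

    Hypothetical helper: `lo` and `hi` stand in for the user-selected
    minimum and maximum of the displayed range.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels = []
    out_of_range = []
    for v in values:
        clamped = min(max(v, lo), hi)  # clamp into the selected range
        levels.append(round(255 * (clamped - lo) / (hi - lo)))
        out_of_range.append(v < lo or v > hi)  # would be highlighted in a viewer
    return levels, out_of_range


levels, flags = to_grayscale([-5, 0, 20, 100, 200], lo=0, hi=100)
print(levels)  # gray level per value
print(flags)   # True where the raw value fell outside [lo, hi]
```

Narrowing `lo` and `hi` stretches the contrast of the values of interest, and the flags mark values a viewer would need to show distinctly.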
  • 12. GPU DEBUGGING WITH AMD CodeXL | VIEWS
      LOCALS AND WATCH VIEWS
      • Display OpenCL™ kernel variables
          ‒ Structure and vector types support
          ‒ Global and Private memory array dereferencing
          ‒ Local and Constant memory support planned for future releases
          ‒ Visual Studio® integration
  • 13. GPU DEBUGGING WITH AMD CodeXL | VIEWS
      MULTIWATCH VIEWS
      • Display a single OpenCL™ kernel variable value across the current work items
          ‒ Image and Data visualization
          ‒ Range slider, like Object image view
          ‒ Current work item is highlighted and can be changed by double-clicking the data view
  • 14. GPU DEBUGGING WITH AMD CodeXL | FEATURES
      NEW IN CODEXL 1.3
      • Remote debugging
          ‒ Debug capabilities on a remote machine
          ‒ API-level debugging
          ‒ Kernel debugging
          ‒ Requires a CodeXL agent running on the target machine
          ‒ The agent is included as an option in the CodeXL installer
          ‒ Same agent for remote GPU debugging and remote GPU profiling
          ‒ Currently only supports Windows-to-Windows and Linux-to-Linux debugging
      • OpenCL™ API support increased up to OpenCL™ 1.2
          ‒ New API functions
          ‒ New deprecated functions and behaviors
      • OpenGL API support increased up to OpenGL 4.3
          ‒ New API functions and tokens
          ‒ New shader types
  • 15. GPU DEBUGGING WITH AMD CodeXL | FEATURES
      UPCOMING RELEASES
      • Hardware-based kernel debugging
          ‒ Current implementation retrieves hardware values but performs kernel playback for breakpoint implementation
          ‒ Display data for the entire grid
          ‒ Optimized for small- and medium-sized kernels
          ‒ Does not support debugging kernels that can't be replayed consistently (such as kernels using atomics)
          ‒ New implementation will use hardware breakpoints
          ‒ Display data according to the wavefronts executed in the actual hardware
          ‒ Faster for large kernels
          ‒ Stop and resume wavefront execution
          ‒ Can break a running kernel
          ‒ Can support debugging persistent kernels (attach to kernel)
          ‒ Will allow data breakpoints
          ‒ Working development build in the demo area!
  • 17. GPU PROFILING WITH AMD CodeXL
      KEY FEATURES
      • Analyze and profile OpenCL™ host and device code
          ‒ Collect application trace mode
          ‒ Collect GPU performance counter mode
      • Views:
          ‒ API trace: View API calls with inputs and outputs
          ‒ Timeline visualization: View host and device synchronization issues
          ‒ Summary pages: Find the top bottleneck
          ‒ Warnings/Errors: View performance suggestions
          ‒ Kernel occupancy: Find the kernel resource bottleneck
          ‒ Performance counter: View the kernel performance bottleneck
      • Does not require source or project modifications to the application
      • Does not even require the application source code
  • 18. GPU PROFILING WITH AMD CodeXL | Views
      API TRACE
      • Analyze and profile OpenCL™ applications
          ‒ View API input arguments and output results
          ‒ Find API hotspots
          ‒ Determine top ten data transfer and kernel execution operations
          ‒ Identify failed API calls, resource leaks and best practices
  • 19. GPU PROFILING WITH AMD CodeXL | Views
      TIMELINE VISUALIZATION
      • Visualize host and device execution in a timeline chart
          ‒ View number of OpenCL™ contexts and command queues created and the relationships between these items
          ‒ View data transfer operations and kernel executions on the device
          ‒ Determine proper synchronization and load balancing
  • 20. GPU PROFILING WITH AMD CodeXL | Views
      SUMMARY PAGES
      • Find top bottlenecks
          ‒ I/O bound
          ‒ Compute bound
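The I/O-bound versus compute-bound distinction on slide 20 can be sketched as a comparison of time spent in data transfers against time spent in kernels. The slides do not describe how CodeXL's summary pages decide this, so the record format, the function name `classify_bottleneck`, and the 50% threshold below are all hypothetical.

```python
def classify_bottleneck(records, threshold=0.5):
    """Label a trace as I/O bound or compute bound.

    `records` is an iterable of (kind, start_ns, end_ns) tuples, where kind
    is "transfer" or "kernel". This layout is an assumption for the sketch,
    not a CodeXL export format.
    """
    totals = {"transfer": 0, "kernel": 0}
    for kind, start_ns, end_ns in records:
        if kind not in totals:
            raise ValueError(f"unknown record kind: {kind!r}")
        totals[kind] += end_ns - start_ns
    busy = totals["transfer"] + totals["kernel"]
    if busy == 0:
        return "no device work recorded", totals
    transfer_share = totals["transfer"] / busy
    label = "I/O bound" if transfer_share > threshold else "compute bound"
    return label, totals


trace = [("transfer", 0, 800), ("kernel", 800, 1000), ("transfer", 1000, 1400)]
label, totals = classify_bottleneck(trace)
print(label, totals)
```

A real profiler would also account for overlapping transfers and kernels; this sketch simply sums durations.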
  • 21. GPU PROFILING WITH AMD CodeXL | Views
      WARNING AND ERROR MESSAGES
      • Provide performance improvement suggestions
      • Detect errors in an OpenCL™ application
  • 22. GPU PROFILING WITH AMD CodeXL | Views
      PERFORMANCE COUNTER
      • Analyze the OpenCL™ kernel execution for AMD APUs and GPUs
          ‒ Collect GPU Performance Counters
          ‒ The number of ALU, global and local memory instructions executed
          ‒ GPU utilization and memory access characteristics
          ‒ Show the kernel resource usages
          ‒ View the AMD intermediate language (AMD IL) and hardware disassembly (ISA)
  • 23. GPU PROFILING WITH AMD CodeXL | Views
      KERNEL OCCUPANCY
      • Estimate OpenCL™ kernel occupancy for AMD APUs and GPUs
          ‒ Visual indication of the limiting kernel resources for number of wavefronts in flight
          ‒ View the maximum number of wavefronts in flight limited by
          ‒ Work group size
          ‒ Number of allocated scalar or vector registers
          ‒ Amount of allocated LDS
          ‒ View the maximum resource limit for the GPU device
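The occupancy limits listed on slide 23 (work group size, scalar and vector registers, LDS) can be illustrated with a simplified per-compute-unit model. This is a sketch, not CodeXL's calculation: the default hardware figures (64-wide wavefronts, 10 wavefronts per SIMD, 4 SIMDs per compute unit, 256 vector and 512 scalar registers per SIMD, 64 KB of LDS) are assumed GCN-like values that do not come from the slides, and the model ignores register allocation granularity and other device-specific rules.

```python
import math


def wavefronts_in_flight(work_group_size, vgprs_per_item, sgprs_per_wave,
                         lds_bytes_per_group, *, wave_size=64,
                         max_waves_per_simd=10, simds_per_cu=4,
                         vgprs_per_simd=256, sgprs_per_simd=512,
                         lds_bytes_per_cu=64 * 1024):
    """Estimate wavefronts in flight per compute unit and the limiting resource.

    All keyword defaults are assumed hardware limits for illustration only.
    """
    if work_group_size <= 0:
        raise ValueError("work_group_size must be positive")
    waves_per_group = math.ceil(work_group_size / wave_size)
    hw_cap = max_waves_per_simd * simds_per_cu
    limits = {"hardware maximum": hw_cap}
    # Wavefronts are scheduled in whole work groups.
    limits["work group size"] = (hw_cap // waves_per_group) * waves_per_group
    if vgprs_per_item > 0:
        limits["vector registers"] = (vgprs_per_simd // vgprs_per_item) * simds_per_cu
    if sgprs_per_wave > 0:
        limits["scalar registers"] = (sgprs_per_simd // sgprs_per_wave) * simds_per_cu
    if lds_bytes_per_group > 0:
        limits["LDS"] = (lds_bytes_per_cu // lds_bytes_per_group) * waves_per_group
    limiter = min(limits, key=limits.get)  # first resource with the smallest limit
    return limits[limiter], limiter, limits


waves, limiter, limits = wavefronts_in_flight(256, 64, 48, 8192)
print(waves, limiter)  # wavefronts in flight and the resource that limits them
print(limits)          # per-resource limits, useful for spotting the next bottleneck
```

In this model, reducing the resource named by `limiter` is the only change that raises the estimate, which is the kind of question the occupancy view is meant to answer.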
  • 24. GPU PROFILING WITH AMD CodeXL | DEMO
      Optimizing AMD teapot application
      • Finding and fixing non-optimized kernel launch parameters
          ‒ API Trace and Warning and Error Messages View
      • Visualizing host device synchronization
          ‒ Timeline Visualization
      • Navigating to find the top bottleneck
          ‒ Summary Pages View
      • Optimizing the kernel
          ‒ Kernel Occupancy and GPU Performance Counter View
  • 26. STATIC KERNEL ANALYSIS WITH AMD CodeXL
      KEY FEATURES
      • Compile, analyze and disassemble an OpenCL™ kernel for AMD APUs, GPUs and CPUs
          ‒ View AMD IL and hardware disassembly (ISA)
          ‒ View compilation warning and error messages
      • Generate offline compilation of OpenCL™ kernel binary
      • View compiler statistics and estimate performance
      • Requires only the OpenCL™ kernel source code as input
      • Does not require a GPU in the system
  • 27. STATIC KERNEL ANALYSIS WITH AMD CodeXL | FEATURES
      NEW IN CODEXL 1.3
      • Integrated into AMD CodeXL standalone and Visual Studio® extension
      • Brand new user experience
          ‒ View OpenCL™ kernel source, IL and ISA simultaneously
          ‒ View overview
          ‒ Generate analysis for SI and CI families of GPUs
          ‒ Estimated cycle count with ISA branch execution classification
          ‒ Navigate compilation and analysis results in tree view
      • Support compilation for the latest AMD APUs, GPUs and CPUs
  • 29. CPU PROFILING WITH AMD CodeXL
      • Identify and investigate CPU performance hot-spots
      • Profiles C, C++, FORTRAN, Java, .NET, OpenCL™ applications
      • Profiles software components
          ‒ Applications, Libraries, Dynamically loaded modules
          ‒ OS Kernel modules
      • Profile modes
          ‒ Per Process (target application and its children)
          ‒ System Wide Profiling
      • Uses HW Performance Monitoring counters
          ‒ Low overhead
      • No change to source code required
          ‒ Symbolic information required to attribute the performance data at function/source level
  • 30. CPU PROFILING WITH AMD CodeXL
      • Profiling Types
          ‒ Time-based profiling
          ‒ Event-based profiling
          ‒ Instruction Based Sampling (IBS)
          ‒ Cache Line Utilization
          ‒ Call Graph
      • Pre-defined profile configuration of HW performance events
          ‒ Assess Performance
          ‒ Investigate Data Access
          ‒ Investigate Branching
  • 31. CPU PROFILING WITH AMD CodeXL
      • Performance data are displayed in configurable views
          ‒ Samples attributed at Process and Modules level
          ‒ Drill down to Functions, Source code and Instructions level
  • 32. CPU PROFILING WITH AMD CodeXL
      • Call Graph view displays the parents and children of hottest function calls
  • 33. CPU PROFILING WITH AMD CodeXL
      • Identify Hotspots
          ‒ Where the application spends its time
          ‒ Source level/algorithm related performance issues
          ‒ Use Time-based profiling
      • Identify the cause
          ‒ How well the application is using the CPU and Memory resources
          ‒ Performance bottlenecks due to the micro-architectural constraints
          ‒ Use Event-based profiling or Instruction Based Sampling
      • Precise instruction level profiling
          ‒ Use Instruction Based Sampling
      • Cache-Line Utilization: Data access pattern
  • 35. AMD CodeXL SUMMARY
      • Powerful APU and GPU Debugging
          ‒ OpenCL™ API Level
          ‒ OpenCL™ Kernel Source Code
      • APU/GPU and CPU Profiling
          ‒ Identify “hot spots” with inefficient code
      • Static Kernel Analysis
          ‒ Compile, analyze and disassemble OpenCL™ kernel
          ‒ Generate offline compilation of OpenCL™ kernel binary
      • Integrated into Microsoft® Visual Studio®
      • Standalone application for Windows® and Linux®
      • Free download at http://developer.amd.com
  • 36. Thank you! Questions?
      Contact us:
      • Budi.Purnomo@amd.com
      • Uri.Shomroni@amd.com
      • Gnanabaskaran.Muthumani@amd.com
  • 37. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
      The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
      AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, the AMD Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. Microsoft, Windows and Visual Studio are trademarks of Microsoft Corp. Linux is a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.