FAST MODAL ANALYSIS WITH NX NASTRAN AND GPUS
LEONARD HOFFNUNG
SIEMENS PLM SOFTWARE

INTRODUCTION
INTRO / FREQ RESP / MODES / CONCLUSIONS

ABOUT NX NASTRAN

• Industry standard finite element package from Siemens PLM
• Analysis options include:
  ‒ Stress, vibration, structural failure
  ‒ Heat transfer, acoustics, rotor dynamics, and more
• Advanced numerical capabilities and proven scalability:
  ‒ Problem sizes approaching 1 billion dofs
  ‒ SMP to 24 cores
  ‒ DMP to 2048 nodes

3 | FAST MODAL ANALYSIS WITH NX NASTRAN AND GPUS | NOVEMBER 12, 2013 | CONFIDENTIAL

MODAL FREQUENCY RESPONSE OVERVIEW
NASTRAN SOL 111

• Bread and butter industrial computation: modal frequency response
• Widely used in automotive & aerospace to determine response under varying excitations
  ‒ Optimize weight, rigidity
  ‒ Minimize noise, resonance
• Two phase calculation more efficient than direct:
  ‒ Modal analysis
  ‒ Frequency response calculation

MODAL FREQUENCY RESPONSE
COMPUTATIONAL STEPS

• Eigensolution -- $h$ normal modes of $f \times f$ structural matrices:
    $K_{ff}\,\Phi_{fh} = M_{ff}\,\Phi_{fh}\,\Lambda_{hh}$
• Frequency response -- $h \times h$ complex linear solution at each of $n_{\mathrm{resp}}$ frequencies:
    $(K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh})\,x_k = b_k, \quad k = 1, \dots, n_{\mathrm{resp}}$
• All parameters large in typical customer usage:
  ‒ $f$-size 10-30M for model fidelity
  ‒ $h$-size 10-60K for modal accuracy
  ‒ $n_{\mathrm{resp}}$ 20K for detailed response graph
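
The slide does not spell out how the $h \times h$ matrices of the second step come from the $f \times f$ matrices of the first. A sketch of the usual link, assuming standard modal projection with mass-normalized modes (the full-size damping matrix $B_{ff}$ is a name introduced here for illustration and does not appear in the slides):

```latex
\[
K_{hh} = \Phi_{fh}^{T} K_{ff}\,\Phi_{fh}, \qquad
M_{hh} = \Phi_{fh}^{T} M_{ff}\,\Phi_{fh}, \qquad
B_{hh} = \Phi_{fh}^{T} B_{ff}\,\Phi_{fh}
\]
% Premultiplying the eigenproblem K_ff Phi = M_ff Phi Lambda by Phi^T gives
\[
\Phi_{fh}^{T} K_{ff}\,\Phi_{fh}
  = \left(\Phi_{fh}^{T} M_{ff}\,\Phi_{fh}\right)\Lambda_{hh}
\]
% so under the assumed normalization Phi^T M_ff Phi = I:
\[
M_{hh} = I, \qquad K_{hh} = \Lambda_{hh}
\]
```

Under these assumptions $K_{hh}$ and $M_{hh}$ are diagonal, but $B_{hh}$ is in general full, which is consistent with each frequency needing a dense complex solve.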
  

PERFORMANCE CASE STUDY
PR MODEL – FREQUENCY RESPONSE COST

• Shell dominated SOL 111 model
  ‒ 245K degrees of freedom ($f$-size)
  ‒ 1200 eigenpairs ($h$-size)
  ‒ 20K frequency responses ($n_{\mathrm{resp}}$)
• Eigensolution time: 30 minutes
• Frequency response: 127 minutes
• Frequency response cost $O(n_{\mathrm{resp}} \cdot h^3)$
  ‒ Estimated run time in decades as $h \to 60\mathrm{K}$

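
The "decades" estimate for the PR model can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes the $O(n_{\mathrm{resp}} \cdot h^3)$ model holds exactly and ignores every other overhead; the function name `scaled_minutes` is made up for this illustration.

```python
# Rough extrapolation of frequency response cost, assuming the
# O(nresp * h^3) model holds exactly (an idealization).

MINUTES_PER_YEAR = 60 * 24 * 365.25


def scaled_minutes(base_minutes, base_h, new_h, base_nresp, new_nresp):
    """Scale a measured run time by (nresp ratio) * (h ratio)^3."""
    return base_minutes * (new_nresp / base_nresp) * (new_h / base_h) ** 3


# Measured on the PR model: 127 minutes at h = 1200, nresp = 20K.
estimate = scaled_minutes(127.0, 1200, 60_000, 20_000, 20_000)
years = estimate / MINUTES_PER_YEAR
print(f"{estimate:.0f} minutes, about {years:.1f} years")
```

Growing $h$ from 1200 to 60K multiplies the cost by $50^3 = 125{,}000$, which turns about two hours into roughly three decades under this model.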
  
PERFORMANCE CASE STUDY
CUSTOMER BENCHMARK

• More typical industrial model:
  ‒ 11 million degrees of freedom ($f$-size)
  ‒ Shell dominated model
  ‒ Approximately 3000 eigenpairs ($h$-size)
  ‒ 300 frequency responses ($n_{\mathrm{resp}}$)
• Frequency response expensive, but modal calculation still expensive even with RDMODES:
  ‒ Modal calculation: 375 minutes
  ‒ Frequency response time: 22 minutes
• Need to improve performance in both phases

FREQUENCY RESPONSE

FREQUENCY RESPONSE IMPLEMENTATION
DETAILS OF ORIGINAL METHOD

• NX Nastran implementation uses symmetric $LDL^T$ factorization and forward-backward substitution:
    For $k = 1, \dots, n_{\mathrm{resp}}$
        Assemble $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
        Factor $A = LDL^T$
        Solve $x_k = A^{-1} b_k = L^{-T} D^{-1} L^{-1} b_k$
    End for
• NX Nastran sparse factorization difficult to adapt to GPU:
  ‒ Disk oriented
  ‒ Tuned for sparse matrices
  ‒ Symmetric pivoting required for stability (indefiniteness)

FREQUENCY RESPONSE IMPLEMENTATION
DETAILS OF REVISED METHOD

• For GPU code, use LU factorization instead:
    For $k = 1, \dots, n_{\mathrm{resp}}$
        Assemble $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
        Factor $A = LU$
        Solve $x_k = A^{-1} b_k = U^{-1} L^{-1} b_k$
    End for
• OpenCL port of LAPACK zgesv available with clMAGMA and clBLAS
  ‒ In core storage
  ‒ Dense oriented (okay for this application)
  ‒ Benefit mainly in factorization step (cubic operation count)
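
The loop above can be sketched on the CPU in plain Python. This is only an illustration of the assemble/factor/solve pattern, not the clMAGMA or NX Nastran code: it uses Gaussian elimination with partial pivoting (an LU factorization with the forward substitution folded into the elimination), and the 3x3 matrices, load vector, and frequencies are hypothetical values.

```python
# Dense complex solve repeated over a frequency sweep (pure-Python sketch).

def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]         # work on copies
    x = b[:]
    for j in range(n):
        # Bring the largest remaining entry of column j onto the diagonal.
        p = max(range(j, n), key=lambda r: abs(a[r][j]))
        a[j], a[p] = a[p], a[j]
        x[j], x[p] = x[p], x[j]
        for r in range(j + 1, n):
            m = a[r][j] / a[j][j]
            for c in range(j, n):
                a[r][c] -= m * a[j][c]
            x[r] -= m * x[j]          # forward substitution step
    for j in reversed(range(n)):      # back substitution with U
        s = sum(a[j][c] * x[c] for c in range(j + 1, n))
        x[j] = (x[j] - s) / a[j][j]
    return x


def assemble(k, b, m, omega):
    """A = K + i*omega*B - omega^2*M, entry by entry."""
    n = len(k)
    return [[k[r][c] + 1j * omega * b[r][c] - omega**2 * m[r][c]
             for c in range(n)] for r in range(n)]


# Hypothetical 3x3 modal matrices (illustrative values only).
K = [[4.0, 1.0, 0.0], [1.0, 9.0, 1.0], [0.0, 1.0, 16.0]]
B = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]]
M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
load = [1.0 + 0j, 0j, 0j]
omegas = [0.5, 1.0, 1.5]

responses = []
for omega in omegas:                  # one independent solve per frequency
    A = assemble(K, B, M, omega)
    responses.append(lu_solve(A, load))
```

Each pass through the loop is independent of the others, which is what makes the sweep easy to parallelize or to hand off one dense system at a time.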
  

	
  

FREQUENCY RESPONSE IMPLEMENTATION
LINEAR SOLVER SELECTION STRATEGY

• Original NX Nastran sparse symmetric solver
  ‒ Spills to disk, requires minimal memory
  ‒ Minimizes flops by utilizing symmetry
  ‒ Takes advantage of sparsity
• Improved SMP method (system462=1 in NXN9.0)
  ‒ In core, based on LAPACK zsytrf/zsytrs
  ‒ Efficient parallelization of $n_{\mathrm{resp}}$ loop
  ‒ Large memory requirements
• OpenCL method (to appear in NXN9 MP)
  ‒ In core, based on clMAGMA zgesv (LU factorization)
  ‒ Utilizing GPU for best performance

FREQUENCY RESPONSE
INITIAL PERFORMANCE COMPARISON

• Test machine:
  ‒ Magny-Cours 2.1 GHz, 24 cores
  ‒ 32GB memory
  ‒ 4GB Tahiti GPU
• GPU roughly 40% faster than 24-way SMP

Model   Modes
e10k    1785
e20k    3631
e30k    5576
e40k    7646

[Figure: frequency response wall time (h:mm:ss, axis from 0:00:00 to 2:24:00) for models e10k, e20k, e30k, e40k; series: serial, smp=8, smp=24, GPU]

FREQUENCY RESPONSE – FURTHER IMPROVEMENTS
SINGLE PRECISION ARITHMETIC

• Use single precision on GPU for improved performance
  ‒ Higher flop rate (typically 4-5 times)
  ‒ Lower memory utilization (larger dimension problems possible)
  ‒ Better scaling with larger systems
  ‒ Single precision disadvantage: lower precision
  ‒ Accuracy acceptable for most engineering purposes (largest relative error of $10^{-5}$)

[Figure: relative error on a logarithmic axis from 1 down to 1E-08; series: Double precision, Single precision]
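
For a sense of scale, the error from merely storing a value in single precision can be measured with the standard library. This is a sketch with hypothetical sample values; it shows only the rounding of individual numbers, not the error of a full factorization and solve, which can accumulate well above this floor.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relative_error(exact, approx):
    return abs(approx - exact) / abs(exact)


# Hypothetical response magnitudes, chosen only for illustration.
values = [0.1, 3.14159265358979, 1234.56789, 2.5e-4]
errors = [relative_error(v, to_single(v)) for v in values]
worst = max(errors)
print(f"worst rounding error: {worst:.2e}")
```

Single-value rounding stays below about $6 \times 10^{-8}$ (half the single precision machine epsilon), so a reported worst case of $10^{-5}$ leaves room for error growth over the whole solution.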
  
FREQUENCY RESPONSE – FURTHER IMPROVEMENTS
SINGLE PRECISION ACCURACY AND PERFORMANCE

• 40-50% reduction in run time
• Largest example only possible in single precision

Model   Modes
e10k    1785
e20k    3631
e30k    5576
e40k    7646
e60k    12088

[Figure: run time (h:mm:ss, axis from 0:00:00 to 0:17:17) for models e10k, e20k, e30k, e40k, e60k; series: Double, Single]

FREQUENCY RESPONSE – FURTHER IMPROVEMENTS
MATRIX SUMMATION ON GPU

• Perform addition of matrices at each frequency on GPU (assembly step)
    $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
• I.e. store $K_{hh}$, $B_{hh}$, $M_{hh}$ in GPU buffers and sum using zaxpy/caxpy kernels:
    $A := K_{hh}$
    $A := A + i\,\omega_k B_{hh}$
    $A := A - \omega_k^2 M_{hh}$
• Minimizes data transfer to/from main memory
• Additional GPU memory consumption
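
The three-step summation maps directly onto the BLAS axpy update $y := \alpha x + y$. Below is a CPU-side sketch in plain Python that mirrors the sequence of updates; the helper names and the 2x2 matrices are hypothetical, and a real implementation would call the library's axpy kernels on device buffers instead.

```python
# axpy-style assembly of A = K + i*omega*B - omega^2*M on flat buffers.

def axpy(alpha, x, y):
    """In-place y := alpha * x + y on flat lists (the BLAS axpy update)."""
    for i in range(len(y)):
        y[i] += alpha * x[i]


def assemble_with_axpy(k, b, m, omega):
    a = [complex(v) for v in k]       # A := K
    axpy(1j * omega, b, a)            # A := A + i*omega*B
    axpy(-omega**2, m, a)             # A := A - omega^2*M
    return a


# Hypothetical 2x2 matrices stored row-major as flat buffers.
K = [4.0, 1.0, 1.0, 9.0]
B = [0.1, 0.0, 0.0, 0.2]
M = [1.0, 0.0, 0.0, 1.0]

A = assemble_with_axpy(K, B, M, omega=2.0)
```

Only the scalar factors change from one frequency to the next, so the three source matrices can stay resident in device memory for the whole sweep.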
  
FREQUENCY RESPONSE – FURTHER IMPROVEMENTS
MATRIX SUMMATION ON GPU PERFORMANCE

• Double precision best result (e30k):
  ‒ Time reduced 30% from 6:52 to 4:50
  ‒ 2x faster than best CPU time
• Single precision best result (e40k):
  ‒ Time reduced 22% from 6:23 to 4:58
  ‒ 4x faster than best CPU time
• Best scaling with largest problems
  ‒ Limited by GPU memory

[Figure: run time (h:mm:ss, axis from 0:00:00 to 0:12:58) for models e10k, e20k, e30k, e40k; series: Double, Double + zaxpy, Single, Single + caxpy]
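
The quoted percentages follow from the minutes:seconds timings. A small helper, written here only as a checking aid (the function names are made up for this sketch):

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_reduction(before, after):
    """Percentage by which run time drops going from `before` to `after`."""
    t0, t1 = to_seconds(before), to_seconds(after)
    return 100.0 * (t0 - t1) / t0


double_gain = percent_reduction("6:52", "4:50")   # e30k, double precision
single_gain = percent_reduction("6:23", "4:58")   # e40k, single precision
print(f"double: {double_gain:.0f}%, single: {single_gain:.0f}%")
```

Both values round to the figures on the slide: 412 s to 290 s is a 30% reduction, and 383 s to 298 s is a 22% reduction.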
  
MODAL ANALYSIS

MODAL ANALYSIS WITH RDMODES
OVERVIEW

• RDMODES – proprietary high-performance approximate eigensolver
• Tuned for typical customer use cases:
  ‒ Larger models (10 million+ dofs)
  ‒ Many modes (300+)
  ‒ Accelerated computation when few output dofs required
  ‒ Sufficient accuracy for frequency response calculations
• Performance up to 20x faster than Lanczos
• Demonstrated DMP scalability to 2048 nodes

MODAL ANALYSIS WITH RDMODES
COST BREAKDOWN

• RDMODES method comprised of multiple smaller operations – five areas listed below
• Costs for customer benchmark:
  ‒ 11 million dofs
  ‒ Shell dominated
  ‒ 3000 modes below 400 Hz
  ‒ 300 frequency responses
• Dense operations good candidates for GPU
  ‒ Factorization, eigensolution

Operation                       Wall time
Sparse factorization            18:40
Dense factorization             24:00
Sparse eigensolution            9:33
Dense eigensolution             65:00
Reduced (dense) eigensolution   21:16
Total                           250:06

RDMODES FACTORIZATION
CLASSIFICATION

• Fairly large quantity of each type
• Sparse factorizations:
  ‒ Typically too large to treat efficiently as dense
  ‒ NXN multifrontal solver very efficient
  ‒ Efficient sparse solution on GPU difficult (active research)
• Dense factorizations:
  ‒ Model dependent, typically small
  ‒ Symmetric positive definite, may use clMAGMA dposv
  ‒ Candidate for GPU

RDMODES FACTORIZATION
DENSE FACTORIZATION COST COMPARISON

• Dense factorization wall times
  ‒ Costs include factorization and miscellaneous assembly
• As with frequency response, GPU suitable above threshold
  ‒ Threshold of 5000 for this example
• Dense in core methods helpful
• GPU ineffective for this model
  ‒ (all linear solutions relatively small)

[Figure: "Dense factorization times", wall time (h:mm:ss, axis from 0:00:00 to 0:25:55) for Serial and SMP=24; series: NXN, LAPACK, GPU]

RDMODES EIGENSOLUTION
CLASSIFICATION

• Sparse eigensolutions:
  ‒ Large number
  ‒ Sparse, relatively large dimension
  ‒ Inexpensive with NXN sparse eigensolvers
• Dense eigensolutions:
  ‒ Large number
  ‒ Dense, small-medium dimension
  ‒ Candidate for GPU
• Reduced eigensolution:
  ‒ Only one instance
  ‒ Dense, fairly large, many modes
  ‒ Strong candidate for GPU

RDMODES EIGENSOLUTION
DENSE SOLUTION METHODS

• Householder type solution for real symmetric problem (dsyev):
  ‒ Reduce to tridiagonal: $Q^T A Q = T$
  ‒ Eigenvalues of tridiagonal: $Z^T T Z = \Lambda$
  ‒ Compute eigenvectors: $\Phi = QZ$
  ‒ Then $A\Phi = \Phi\Lambda$
• Efficient choice for dense problems, and/or many eigenvectors needed
  ‒ High memory consumption
• Transform generalized eigenvalue problem as follows:
  ‒ Factor: $M = LL^T$
  ‒ Solve: $L^{-1} K L^{-T} X = X\Lambda$
  ‒ Generalized eigensolution: $K(L^{-T}X) = M(L^{-T}X)\Lambda$

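
The last line of the generalized eigenvalue transformation follows from the "Solve" line in two steps. A short derivation, using only the symbols from the slide and assuming $M$ is symmetric positive definite so that the Cholesky factor $L$ exists and is invertible:

```latex
\begin{align*}
L^{-1} K L^{-T} X &= X \Lambda
  && \text{(standard symmetric problem)} \\
K L^{-T} X &= L X \Lambda
  && \text{(premultiply by } L\text{)} \\
K \left(L^{-T} X\right) &= L L^{T} \left(L^{-T} X\right) \Lambda
  && \text{(insert } L^{T} L^{-T} = I\text{)} \\
K \left(L^{-T} X\right) &= M \left(L^{-T} X\right) \Lambda
  && \text{(since } M = L L^{T}\text{)}
\end{align*}
```

So the eigenvalues $\Lambda$ are unchanged and the generalized eigenvectors are recovered as $L^{-T}X$ by one triangular solve.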
  
RDMODES EIGENSOLUTION
DENSE EIGENSOLUTION SCALABILITY

• Dimensions range from 2800 to 8800
  ‒ Dense problems, modes variable
• GPU beneficial for larger sizes
• Total times (serial) -- 50% reduction:
  ‒ 56:29 (all Lanczos)
  ‒ 15:30 (all LAPACK)
  ‒ 7:29 (using GPU)
• Total times (SMP) – 36% reduction:
  ‒ 52:22 (all Lanczos)
  ‒ 4:41 (all LAPACK)
  ‒ 3:00 (using GPU)

[Figure: two panels, Serial and SMP=24; solution time on a logarithmic axis (0:00:01 to 2:24:00) against problem dimension (2000, 4000, 8000); series: Lanczos, LAPACK, GPU]

RDMODES EIGENSOLUTION
GPU SUPPORT

• Householder methods well suited (as expected)
• Larger dimension dense problems benefit from the GPU
  ‒ And are the most time consuming
• Send most expensive problems to GPU
• Threshold set to 3800 for this test
  ‒ Note: optimal threshold depends on hardware and SMP
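
The threshold rule amounts to a size-based dispatch. A minimal sketch, assuming the comparison is "at or above the threshold" (the slides do not say whether the boundary is inclusive) and using hypothetical function names and problem sizes:

```python
def choose_backend(dimension, threshold=3800):
    """Return 'gpu' for problems at or above the threshold, else 'cpu'."""
    return "gpu" if dimension >= threshold else "cpu"


def partition(dimensions, threshold=3800):
    """Split a batch of dense eigenproblem sizes by backend."""
    plan = {"cpu": [], "gpu": []}
    for n in dimensions:
        plan[choose_backend(n, threshold)].append(n)
    return plan


# Hypothetical batch of sizes within the 2800 to 8800 range quoted earlier.
plan = partition([2800, 3500, 3800, 5200, 8800])
```

Keeping the threshold as a parameter reflects the note that the best value depends on the hardware and the SMP setting.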
  

RDMODES EIGENSOLUTION
MOST SIGNIFICANT COST COMPONENTS

• Reduced eigensolution
  ‒ Not ideally suited to NXN Lanczos eigensolver
  ‒ Unique, but large (14K dofs)
  ‒ Many eigenvectors needed
  ‒ GPU 30% speedup (both SMP and serial)
• GPU in RDMODES conclusions
  ‒ Dense and reduced eigensolutions benefit
  ‒ Threshold for dense eigensolution
  ‒ Dense factorization benefits from LAPACK: little additional benefit on GPU
• Sparse methods not supported yet

[Figure: "Reduced Eigensolution", wall time (h:mm:ss, axis from 0:00:00 to 0:57:36) for Serial and SMP=24; series: NXN, LAPACK, GPU]

RDMODES AND FREQUENCY RESPONSE
BENCHMARK PERFORMANCE RESULTS

• SMP=24, customer benchmark
• Compared to NXN system:
  ‒ Frequency response 3x faster
  ‒ Reduced eigensolution 2.8x faster
  ‒ Factorization 28% faster
  ‒ Dense eigensolution 9x faster
  ‒ 30% reduction in total run time
• Compared to LAPACK:
  ‒ Frequency response 3x faster
  ‒ Reduced eigensolution 2x faster
  ‒ 10% reduction in total run time

[Figure: total run time (h:mm:ss, axis from 0:00:00 to 8:24:00) for NXN, LAPACK, and GPU, broken down into Frequency response, Reduced eigensolution, Dense eigensolution, Factorization, Other]

RDMODES EIGENSOLUTION
SINGLE PRECISION

• Performance advantages with single precision eigensolution
  ‒ As with linear solution in frequency response, single precision faster on GPU
  ‒ Lower GPU memory consumption (larger problems)
• Dense eigensolutions (customer benchmark) – 35-40% speedup:

          Double precision   Single precision
Serial    7:01               4:16
SMP=24    3:41               2:23

• Reduced eigensolution also benefits – 20% speedup:
  ‒ 3:05 to 2:29

CONCLUSIONS	
  
• Significant benefit with GPU for certain computation types
  ‒ Frequency response calculation 2x-3x faster, dense eigensolution 2x faster
  ‒ Additional 35-50% improvement possible with single precision
  ‒ 30% lower turnaround time for typical customer benchmark
• Efficient dense matrix algebra on GPU with clMath, clMAGMA
• Many thanks to: Ben-Shan Liao, Wei Zhang (Siemens PLM), Antoine Reymond (AMD)

Thank you!

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.


PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
SEED - Halcyon Architecture
SEED - Halcyon ArchitectureSEED - Halcyon Architecture
SEED - Halcyon Architecture
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 

PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung

  • 1. FAST MODAL ANALYSIS WITH NX NASTRAN AND GPUS. LEONARD HOFFNUNG, SIEMENS PLM SOFTWARE
  • 2. INTRODUCTION: INTRO / FREQ RESP / MODES / CONCLUSIONS
  • 3. ABOUT NX NASTRAN
      - Industry-standard finite element package from Siemens PLM
      - Analysis options include:
          ‒ Stress, vibration, structural failure
          ‒ Heat transfer, acoustics, rotor dynamics, and more
      - Advanced numerical capabilities and proven scalability:
          ‒ Problem sizes approaching 1 billion DOFs
          ‒ SMP to 24 cores
          ‒ DMP to 2048 nodes
      3 | FAST MODAL ANALYSIS WITH NX NASTRAN AND GPUS | NOVEMBER 12, 2013 | CONFIDENTIAL
  • 4. MODAL FREQUENCY RESPONSE OVERVIEW: NASTRAN SOL 111
      - Bread-and-butter industrial computation: modal frequency response
      - Widely used in automotive & aerospace to determine response under varying excitations
          - Optimize weight, rigidity
          - Minimize noise, resonance
      - Two-phase calculation is more efficient than a direct solution:
          - Modal analysis
          - Frequency response calculation
  • 5. MODAL FREQUENCY RESPONSE: COMPUTATIONAL STEPS
      - Eigensolution: $h$ normal modes of $f \times f$ structural matrices:
          $K_{ff} \Phi_{fh} = M_{ff} \Phi_{fh} \Lambda_{hh}$
      - Frequency response: $h \times h$ complex linear solution at each of $n_{resp}$ frequencies:
          $(K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh})\, x_k = b_k, \quad k = 1, \dots, n_{resp}$
      - All parameters are large in typical customer usage:
          - $f$-size 10-30M for model fidelity
          - $h$-size 10-60K for modal accuracy
          - $n_{resp}$ 20K for detailed response graph
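The two computational steps on slide 5 can be sketched on a small dense problem. This is only an illustration with NumPy/SciPy, not the NX Nastran implementation: the function name `modal_frequency_response`, the random test matrices, and the stiffness-proportional damping `B = 0.02 * K` are all assumptions made for the example.

```python
import numpy as np
from scipy.linalg import eigh

def modal_frequency_response(K, M, B, F, omegas, h):
    """Two-phase modal frequency response on small dense matrices (illustrative)."""
    f = K.shape[0]
    # Phase 1: eigensolution K Phi = M Phi Lambda, keeping the h lowest modes.
    lam, Phi = eigh(K, M, subset_by_index=[0, h - 1])
    # Project the f x f structural matrices and the load onto the modal basis.
    Khh = Phi.T @ K @ Phi
    Mhh = Phi.T @ M @ Phi
    Bhh = Phi.T @ B @ Phi
    b = Phi.T @ F
    # Phase 2: one h x h complex linear solve per response frequency.
    X = np.empty((f, len(omegas)), dtype=complex)
    for k, w in enumerate(omegas):
        A = Khh + 1j * w * Bhh - w**2 * Mhh
        X[:, k] = Phi @ np.linalg.solve(A, b)  # map modal solution back to f-size
    return lam, X

# Hypothetical test data: a random SPD "stiffness", identity "mass".
rng = np.random.default_rng(0)
f = 6
R = rng.standard_normal((f, f))
K = R @ R.T + f * np.eye(f)
M = np.eye(f)
B = 0.02 * K
F = rng.standard_normal(f)
omegas = np.array([0.5, 1.0, 2.0])
lam, X = modal_frequency_response(K, M, B, F, omegas, h=f)
```

With `h = f` the modal basis is complete, so the result coincides with a direct solve at each frequency; with `h < f` it is a truncated-basis approximation, and the savings come from the per-frequency systems being `h x h` rather than `f x f`.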
  • 6. PERFORMANCE CASE STUDY: PR MODEL – FREQUENCY RESPONSE COST
      - Shell-dominated SOL 111 model
          - 245K degrees of freedom ($f$-size)
          - 1200 eigenpairs ($h$-size)
          - 20K frequency responses ($n_{resp}$)
      - Eigensolution time: 30 minutes
      - Frequency response: 127 minutes
      - Frequency response cost is $O(n_{resp} \cdot h^3)$
          - Estimated run time in decades as $h \to 60\text{K}$
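The "decades" estimate on slide 6 follows from the cubic cost model. The arithmetic below is a rough check that assumes pure $O(n_{resp} \cdot h^3)$ scaling from the measured 127 minutes at $h = 1200$, with $n_{resp}$ held at 20K; the helper name `extrapolate_minutes` is made up for this example, and real run times would also depend on memory and I/O effects that the model ignores.

```python
def extrapolate_minutes(t_ref_min, h_ref, h_new, nresp_ref=1, nresp_new=1):
    """Scale a measured run time under the O(nresp * h^3) cost model."""
    return t_ref_min * (nresp_new / nresp_ref) * (h_new / h_ref) ** 3

# Measured on the PR model: 127 minutes at h = 1200 (nresp unchanged at 20K).
minutes = extrapolate_minutes(127.0, 1200, 60_000)
years = minutes / (60 * 24 * 365.25)
print(f"{minutes:,.0f} minutes is about {years:.1f} years")
```

Going from $h = 1200$ to $h = 60\text{K}$ multiplies $h$ by 50 and therefore the cost by $50^3 = 125{,}000$, which turns about two hours into roughly three decades.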
  • 7. PERFORMANCE CASE STUDY: CUSTOMER BENCHMARK
      - More typical industrial model:
          - 11 million degrees of freedom ($f$-size)
          - Shell-dominated model
          - Approximately 3000 eigenpairs ($h$-size)
          - 300 frequency responses ($n_{resp}$)
      - Frequency response is expensive, but the modal calculation is still expensive even with RDMODES:
          - Modal calculation: 375 minutes
          - Frequency response time: 22 minutes
      - Need to improve performance in both phases
  • 8. FREQUENCY RESPONSE
  • 9. FREQUENCY RESPONSE IMPLEMENTATION: DETAILS OF ORIGINAL METHOD
      - NX Nastran implementation uses symmetric $LDL^T$ factorization and forward-backward substitution:
          For $k = 1, \dots, n_{resp}$
              Assemble $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
              Factor $A = LDL^T$
              Solve $x_k = A^{-1} b_k = L^{-T} D^{-1} L^{-1} b_k$
          End for
      - NX Nastran sparse factorization is difficult to adapt to the GPU:
          - Disk oriented
          - Tuned for sparse matrices
          - Symmetric pivoting required for stability (indefiniteness)
  • 10. FREQUENCY RESPONSE IMPLEMENTATION: DETAILS OF REVISED METHOD
      - For GPU code, use LU factorization instead:
          For $k = 1, \dots, n_{resp}$
              Assemble $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
              Factor $A = LU$
              Solve $x_k = A^{-1} b_k = U^{-1} L^{-1} b_k$
          End for
      - OpenCL port of LAPACK zgesv available with clMAGMA and clBLAS
          - In-core storage
          - Dense oriented (okay for this application)
          - Benefit mainly in factorization step (cubic operation count)
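The frequency loops on slides 9 and 10 differ only in the factorization used. The CPU-only sketch below contrasts them with SciPy: `lu_factor`/`lu_solve` stand in for the zgesv-style LU path, and `solve(..., assume_a="sym")` stands in for a symmetric-indefinite solve. Neither is the NX Nastran or clMAGMA code, and the function names and test matrices are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve

def sweep_lu(Khh, Bhh, Mhh, b, omegas):
    """Frequency loop with a general LU factorization (partial pivoting)."""
    X = np.empty((Khh.shape[0], len(omegas)), dtype=complex)
    for k, w in enumerate(omegas):
        A = Khh + 1j * w * Bhh - w**2 * Mhh   # assemble
        lu, piv = lu_factor(A)                # factor A = P L U
        X[:, k] = lu_solve((lu, piv), b)      # forward and backward substitution
    return X

def sweep_symmetric(Khh, Bhh, Mhh, b, omegas):
    """Same loop, exploiting the complex symmetry (A == A.T) of each system."""
    X = np.empty((Khh.shape[0], len(omegas)), dtype=complex)
    for k, w in enumerate(omegas):
        A = Khh + 1j * w * Bhh - w**2 * Mhh
        X[:, k] = solve(A, b, assume_a="sym")  # symmetric-indefinite solve
    return X

# Hypothetical modal matrices for a quick comparison.
rng = np.random.default_rng(1)
h = 5
R = rng.standard_normal((h, h))
Khh = R @ R.T + h * np.eye(h)
Mhh = np.eye(h)
Bhh = 0.05 * Khh
b = rng.standard_normal(h)
omegas = np.linspace(0.5, 3.0, 4)
X_lu = sweep_lu(Khh, Bhh, Mhh, b, omegas)
X_sym = sweep_symmetric(Khh, Bhh, Mhh, b, omegas)
```

Both loops give the same responses. LU ignores the symmetry and so performs roughly twice the factorization flops of a symmetric method, but it needs only row pivoting and maps onto readily available dense GPU kernels.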
  • 11. FREQUENCY RESPONSE IMPLEMENTATION: LINEAR SOLVER SELECTION STRATEGY
      - Original NX Nastran sparse symmetric solver
          - Spills to disk, requires minimal memory
          - Minimizes flops by utilizing symmetry
          - Takes advantage of sparsity
      - Improved SMP method (system462=1 in NXN9.0)
          - In core, based on LAPACK zsytrf/zsytrs
          - Efficient parallelization of the $n_{resp}$ loop
          - Large memory requirements
      - OpenCL method (to appear in NXN9 MP)
          - In core, based on clMAGMA zgesv (LU factorization)
          - Utilizing GPU for best performance
  • 12. FREQUENCY RESPONSE: INITIAL PERFORMANCE COMPARISON
      - Test machine:
          - Magny-Cours 2.1 GHz, 24 cores
          - 32GB memory
          - 4GB Tahiti GPU
      - GPU roughly 40% faster than 24-way SMP
      - Test models:
          Model   Modes
          e10k    1785
          e20k    3631
          e30k    5576
          e40k    7646
      - [Chart: frequency response run time (h:mm:ss) for e10k, e20k, e30k, e40k; series: serial, smp=8, smp=24, GPU]
  • 13. FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: SINGLE PRECISION ARITHMETIC
      - Use single precision on GPU for improved performance
          - Higher flop rate (typically 4-5 times)
          - Lower memory utilization (larger dimension problems possible)
          - Better scaling with larger systems
          - Single precision disadvantage: lower precision
          - Accuracy acceptable for most engineering purposes (largest relative error of $10^{-5}$)
      - [Chart: relative error on a log scale from 1 down to 1E-08; series: double precision, single precision]
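The accuracy trade-off on slide 13 can be explored on the CPU by solving the same complex system in double and in single precision and comparing the results. The sketch below uses NumPy only; the function name and the well-conditioned random test system are assumptions, so the error it reports is not comparable to the $10^{-5}$ figure measured on the real models.

```python
import numpy as np

def single_vs_double_relative_error(A, b):
    """Relative difference between complex64 and complex128 solutions of A x = b."""
    x64 = np.linalg.solve(A, b)  # double precision reference
    x32 = np.linalg.solve(A.astype(np.complex64), b.astype(np.complex64))
    return np.linalg.norm(x32 - x64) / np.linalg.norm(x64)

# Hypothetical, well-conditioned stand-in for one frequency response system.
rng = np.random.default_rng(2)
n = 200
R = rng.standard_normal((n, n))
K = R @ R.T + n * np.eye(n)
M = np.eye(n)
B = 0.05 * K
omega = 1.0
A = K + 1j * omega * B - omega**2 * M
b = rng.standard_normal(n)
err = single_vs_double_relative_error(A, b)
print(f"relative error of single precision solve: {err:.2e}")
```

The single precision error grows with the condition number of the system, which is why the acceptable-accuracy claim on the slide is tied to the measured models rather than being a general guarantee.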
  • 14. FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: SINGLE PRECISION ACCURACY AND PERFORMANCE
      - 40-50% reduction in run time
      - Largest example only possible in single precision
      - Test models:
          Model   Modes
          e10k    1785
          e20k    3631
          e30k    5576
          e40k    7646
          e60k    12088
      - [Chart: frequency response run time (h:mm:ss) for e10k, e20k, e30k, e40k, e60k; series: Double, Single]
  • 15. FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: MATRIX SUMMATION ON GPU
      - Perform the addition of matrices at each frequency on the GPU (assembly step):
          $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
      - I.e. store $K_{hh}$, $B_{hh}$, $M_{hh}$ in GPU buffers and sum using zaxpy (double precision) or caxpy (single precision) kernels:
          $A := K_{hh}$
          $A := A + i\,\omega_k B_{hh}$
          $A := A - \omega_k^2 M_{hh}$
      - Minimizes data transfer to/from main memory
      - Additional GPU memory consumption
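The three-step summation on slide 15 can be mimicked on the CPU with in-place NumPy updates. This is only a stand-in for the GPU axpy kernels: the function name `assemble_inplace` and the test matrices are assumptions, and NumPy still creates a scaled temporary for each update, which a true axpy kernel would avoid.

```python
import numpy as np

def assemble_inplace(A, Khh, Bhh, Mhh, omega):
    """Overwrite A with Khh + i*omega*Bhh - omega**2*Mhh in three axpy-style steps."""
    np.copyto(A, Khh)           # A := K_hh
    A += (1j * omega) * Bhh     # A := A + i*omega*B_hh
    A -= (omega ** 2) * Mhh     # A := A - omega^2 * M_hh
    return A

# Hypothetical modal matrices; the buffer A is allocated once and reused per frequency.
rng = np.random.default_rng(3)
h = 4
R = rng.standard_normal((h, h))
Khh = R @ R.T + h * np.eye(h)
Mhh = np.eye(h)
Bhh = 0.05 * Khh
A = np.empty((h, h), dtype=complex)
omega = 1.5
assemble_inplace(A, Khh, Bhh, Mhh, omega)
```

Keeping $K_{hh}$, $B_{hh}$ and $M_{hh}$ resident and rebuilding only `A` mirrors the trade-off on the slide: less data movement per frequency in exchange for three extra matrices in device memory.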
  • 16. FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: MATRIX SUMMATION ON GPU PERFORMANCE
      - Double precision best result (e30k):
          - Time reduced 30% from 6:52 to 4:50
          - 2x faster than best CPU time
      - Single precision best result (e40k):
          - Time reduced 22% from 6:23 to 4:58
          - 4x faster than best CPU time
      - Best scaling with largest problems
          - Limited by GPU memory
      - [Chart: frequency response run time (h:mm:ss) for e10k, e20k, e30k, e40k; series: Double, Double + zaxpy, Single, Single + caxpy]
  • 17. MODAL ANALYSIS
  • 18. MODAL ANALYSIS WITH RDMODES: OVERVIEW
      - RDMODES: proprietary high-performance approximate eigensolver
      - Tuned for typical customer use cases:
          - Larger models (10 million+ dofs)
          - Many modes (300+)
          - Accelerated computation when few output dofs required
          - Sufficient accuracy for frequency response calculations
      - Performance up to 20x faster than Lanczos
      - Demonstrated DMP scalability to 2048 nodes
  • 19. MODAL ANALYSIS WITH RDMODES: COST BREAKDOWN
      - The RDMODES method comprises multiple smaller operations; five areas are listed below
      - Costs for customer benchmark:
          - 11 million dofs
          - Shell dominated
          - 3000 modes below 400 Hz
          - 300 frequency responses
      - Wall time by operation:
          Operation                       Wall time
          Sparse factorization            18:40
          Dense factorization             24:00
          Sparse eigensolution            9:33
          Dense eigensolution             65:00
          Reduced (dense) eigensolution   21:16
          Total                           250:06
      - Dense operations are good candidates for GPU
          - Factorization, eigensolution
  • 20. RDMODES FACTORIZATION: CLASSIFICATION
      - Fairly large quantity of each type
      - Sparse factorizations:
          - Typically too large to treat efficiently as dense
          - NXN multifrontal solver very efficient
          - Efficient sparse solution on GPU difficult (active research)
      - Dense factorizations:
          - Model dependent, typically small
          - Symmetric positive definite, may use clMAGMA dposv
          - Candidate for GPU
  • 21. RDMODES FACTORIZATION: DENSE FACTORIZATION COST COMPARISON
      - Dense factorization wall times
          - Costs include factorization and miscellaneous assembly
      - As with frequency response, GPU suitable above a threshold
          - Threshold of 5000 for this example
      - Dense in-core methods helpful
      - GPU ineffective for this model
          - (all linear solutions relatively small)
      - [Chart: "Dense factorization times" (h:mm:ss); series: NXN, LAPACK, GPU; groups: Serial, SMP=24]
  • 22. RDMODES EIGENSOLUTION: CLASSIFICATION
      - Sparse eigensolutions:
          - Large number
          - Sparse, relatively large dimension
          - Inexpensive with NXN sparse eigensolvers
      - Dense eigensolutions:
          - Large number
          - Dense, small-medium dimension
          - Candidate for GPU
      - Reduced eigensolution:
          - Only one instance
          - Dense, fairly large, many modes
          - Strong candidate for GPU
  • 23. RDMODES EIGENSOLUTION: DENSE SOLUTION METHODS
      - Householder-type solution for the real symmetric problem (dsyev):
          - Reduce to tridiagonal: $Q^T A Q = T$
          - Eigenvalues of tridiagonal: $Z^T T Z = \Lambda$
          - Compute eigenvectors: $\Phi = QZ$
          - Then $A \Phi = \Phi \Lambda$
      - Efficient choice for dense problems, and/or when many eigenvectors are needed
          - High memory consumption
      - Transform the generalized eigenvalue problem as follows:
          - Factor: $M = L L^T$
          - Solve: $L^{-1} K L^{-T} X = X \Lambda$
          - Generalized eigensolution: $K (L^{-T} X) = M (L^{-T} X) \Lambda$
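The reduction of the generalized problem to standard form on slide 23 can be written out with SciPy. The sketch below assumes a dense symmetric $K$ and a symmetric positive definite $M$; the function name `generalized_eigh_via_cholesky` and the random test matrices are made up for the example, and `scipy.linalg.eigh` stands in for the dsyev-style Householder solver.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigh

def generalized_eigh_via_cholesky(K, M):
    """Solve K Phi = M Phi Lambda by reducing to a standard symmetric problem."""
    L = cholesky(M, lower=True)                  # factor M = L L^T
    Y = solve_triangular(L, K, lower=True)       # Y = L^{-1} K
    C = solve_triangular(L, Y.T, lower=True)     # C = L^{-1} K L^{-T}
    C = 0.5 * (C + C.T)                          # remove round-off asymmetry
    lam, X = eigh(C)                             # standard problem C X = X Lambda
    Phi = solve_triangular(L, X, lower=True, trans="T")  # Phi = L^{-T} X
    return lam, Phi

# Hypothetical SPD test matrices.
rng = np.random.default_rng(4)
n = 8
R = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
K = R @ R.T + n * np.eye(n)
M = S @ S.T + n * np.eye(n)
lam, Phi = generalized_eigh_via_cholesky(K, M)
```

The returned eigenvectors are mass-normalized ($\Phi^T M \Phi = I$) because $X$ is orthonormal, which is the normalization typically wanted for modal frequency response.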
  • 24. RDMODES EIGENSOLUTION: DENSE EIGENSOLUTION SCALABILITY
      - Dimensions range from 2800 to 8800
          - Dense problems, modes variable
      - GPU beneficial for larger sizes
      - Total times (serial), 50% reduction:
          - 56:29 (all Lanczos)
          - 15:30 (all LAPACK)
          - 7:29 (using GPU)
      - Total times (SMP), 36% reduction:
          - 52:22 (all Lanczos)
          - 4:41 (all LAPACK)
          - 3:00 (using GPU)
      - [Charts: eigensolution time (h:mm:ss, log scale) versus problem dimension (2000, 4000, 8000); panels: Serial, SMP=24; series: Lanczos, LAPACK, GPU]
  • 25. RDMODES EIGENSOLUTION: GPU SUPPORT
      - Householder methods well suited (as expected)
      - Larger dimension dense problems benefit from the GPU
          - And are the most time consuming
      - Send the most expensive problems to the GPU
      - Threshold set to 3800 for this test
          - Note: optimal threshold depends on hardware and SMP
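The size-based routing on slide 25 amounts to a simple dispatch rule. The sketch below is hypothetical: the function name `choose_backend`, the backend labels, and the choice to send a problem exactly at the threshold to the GPU are assumptions. Only the threshold of 3800 and the 2800 to 8800 size range come from the slides; the intermediate sizes are illustrative.

```python
def choose_backend(dimension, threshold=3800):
    """Route a dense eigenproblem by size: large ones to the GPU, the rest to the CPU."""
    # Assumption: problems exactly at the threshold go to the GPU.
    return "gpu" if dimension >= threshold else "cpu"

# Illustrative problem dimensions within the reported 2800-8800 range.
sizes = [2800, 3500, 4200, 8800]
plan = {n: choose_backend(n) for n in sizes}
print(plan)
```

Because the best threshold depends on the hardware and the SMP level, exposing it as a parameter rather than a constant keeps the rule tunable per machine.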
  • 26. RDMODES EIGENSOLUTION: MOST SIGNIFICANT COST COMPONENTS
      - Reduced eigensolution
          - Not ideally suited to NXN Lanczos eigensolver
          - Unique, but large (14K dofs)
          - Many eigenvectors needed
          - GPU 30% speedup (both SMP and serial)
      - GPU in RDMODES conclusions
          - Dense and reduced eigensolutions benefit
          - Threshold for dense eigensolution
          - Dense factorization benefits from LAPACK: little additional benefit on GPU
      - Sparse methods not supported yet
      - [Chart: "Reduced Eigensolution" wall time (h:mm:ss); series: NXN, LAPACK, GPU; groups: Serial, SMP=24]
  • 27. RDMODES AND FREQUENCY RESPONSE: BENCHMARK PERFORMANCE RESULTS
      - SMP=24, customer benchmark
      - Compared to NXN system:
          - Frequency response 3x faster
          - Reduced eigensolution 2.8x faster
          - Factorization 28% faster
          - Dense eigensolution 9x faster
          - 30% reduction in total run time
      - Compared to LAPACK:
          - Frequency response 3x faster
          - Reduced eigensolution 2x faster
          - 10% reduction in total run time
      - [Chart: stacked total run time (h:mm:ss) for NXN, LAPACK, GPU; components: Frequency response, Reduced eigensolution, Dense eigensolution, Factorization, Other]
  • 28. RDMODES EIGENSOLUTION: SINGLE PRECISION
      - Performance advantages with single precision eigensolution
          - As with linear solution in frequency response, single precision faster on GPU
          - Lower GPU memory consumption (larger problems)
      - Dense eigensolutions (customer benchmark), 35-40% speedup:
                    Double precision   Single precision
          Serial    7:01               4:16
          SMP=24    3:41               2:23
      - Reduced eigensolution also benefits, 20% speedup:
          - 3:05 to 2:29
  • 29. CONCLUSIONS
  • 30. CONCLUSIONS
      - Significant benefit with GPU for certain computation types
          - Frequency response calculation 2x-3x faster, dense eigensolution 2x faster
          - Additional 35-50% improvement possible with single precision
          - 30% lower turnaround time for typical customer benchmark
      - Efficient dense matrix algebra on GPU with clMath, clMAGMA
      - Many thanks to: Ben-Shan Liao, Wei Zhang (Siemens PLM), Antoine Reymond (AMD)
      Thank you!
  • 31. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
      The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
      AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.