1. Preliminary XSX fact finding
(this is not about 100% correctness,
it is about the journey of connecting the dots), so don't take this as fact!!!
Plus RDNA1 slides and cache/data sizes
by @blueisviolet
2. AMD Slide / Navi Fact
In Navi, two Compute Units (CUs) form a Workgroup Processor (WGP), and five WGPs form a
Shader Array
Important links, RDNA1 (Navi 10):
https://www.amd.com/system/files/documents/rdna-whitepaper.pdf
https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf
https://gpuopen.com/rdna-shader-instruction-set-architecture-document-now-available/
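As a sanity check on the hierarchy above, here is a small sketch of how the Navi 10 shader layout multiplies out (figures from the public RDNA whitepaper; the variable names are mine, for illustration only):

```python
# Navi 10 shader hierarchy: 2 CUs per WGP, 5 WGPs per shader array,
# 2 shader arrays per shader engine, 2 shader engines.
shader_engines = 2
shader_arrays_per_engine = 2
wgps_per_shader_array = 5
cus_per_wgp = 2            # two Compute Units form a Workgroup Processor
simd32_per_cu = 2

cus = shader_engines * shader_arrays_per_engine * wgps_per_shader_array * cus_per_wgp
simds = cus * simd32_per_cu
print(cus, simds)          # 40 CUs, 80 SIMD32 units on Navi 10
```

The product matches Navi 10's advertised 40 CUs, which is a useful cross-check when counting groups on a die shot.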
3. AMD Slide / Navi Fact
Some interesting figures per SIMD32 block:
- VGPR file: 128KB
- SGPR file: 10KB
Also, hmmm, so each SIMD32 provides: 32 ALUs + 2 DP units + 8 transcendental? units
32KB I$ per 4 CUs (2 WGPs)
16KB K$ (scalar data cache) per 2 WGPs
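A quick sanity check on the register figures above: the 128KB VGPR file works out to 1024 registers x 32 lanes x 4 bytes (a known RDNA figure; the 10KB SGPR size is taken from the slide as-is):

```python
# Per-SIMD32 VGPR file: 1024 registers x 32 lanes x 4 bytes = 128KB.
lanes = 32
vgprs_per_simd = 1024
vgpr_bytes = vgprs_per_simd * lanes * 4      # 131072 bytes
assert vgpr_bytes == 128 * 1024

simds_per_wgp = 4                            # 2 CUs x 2 SIMD32 each
vgpr_per_wgp_kb = (vgpr_bytes // 1024) * simds_per_wgp
print(vgpr_per_wgp_kb)                       # 512KB of VGPRs per WGP
```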
4. AMD Slide / Navi Fact
In Navi, two Compute Units (CUs) form a Workgroup Processor, and five of those form
a Shader Array
7. Zoomed in: in a normal CU, blocks 4, 5, 6 and 7 usually look different.
It looks like 10 groups; and instead of 4 CUs per group like Navi, it seems to be 6 here.
From what we know, the per-WGP layout can apparently be customized. Also remember the X1 Hot Chips talk:
on the higher-level diagram they showed the CUs per 6.
8. Another thing, from GitHub: it is said there are 3 GDS and 6
LDS per group. The X1 Hot Chips slide (per 6) is attached again.
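If the die-shot reading above is right (10 groups, 6 CUs per group instead of Navi's 4), the physical totals would multiply out as follows. This is pure speculation following the notes, not a confirmed XSX spec:

```python
# Speculative group count from the die-shot reading above.
groups = 10
cus_per_group = 6          # vs 4 per group on Navi
physical_cus = groups * cus_per_group
print(physical_cus)        # 60 CUs physical, per this reading
```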