4. Chart: IPC % Uplift (linear scale) vs. L3 Cache Size (1x, 2x, 4x, 8x, 16x, 32x)
Chart: Silicon Area Scaling by Function (Logic, SRAM, Analog) across 28nm, 16nm, 10nm, 7nm, and 5nm nodes
But SRAM doesn't scale as fast as logic
* Naffziger, VLSI Short Course, 2020
* J. Wuu, ISSCC, 2022
5. Chiplet integration figures of merit (CHIPLET-1 / CHIPLET-2 / CHIPLET-3):
• Latency of chiplet-to-chiplet channels (ns)
• Energy per unit bit transferred between chiplets (pJ/bit)
• B.W. density of chiplet-to-chiplet data communication (Gb/sec/sq.mm)
• Thermal density & effectiveness of heat extraction (Qja: °C/W)
• Efficient power/ground distribution (Z)
• Cost of packaging ($)
• Testability
• Reliability
• Support for speed matching of chiplets
8. Packaging cross-sections: Die 1 and Die 2 on an organic substrate with C4 bumps; micro bumps; EFB; Cu-Cu interconnects
9. Based on AMD engineering internal analysis, May 2021. See endnotes.
* R. Swaminathan, Hot Chips Tutorials, HC33, 2021.
10. • Face-to-back stacking approach used for seamless inter-portability of bottom die design.
• Chip-on-Wafer stacking scheme used to enable different die sizes in the stack.
Diagram: CCD die shown in packaging without 3D stacking and in packaging with 3D stacking (X3D)
12. * L. Su, "High-Performance Computing: Services and Products Essential to our Daily Lives," Computex, 2021.
13. “Zen 3” x86-64 CPU Core Complex Die (CCD)
• TSMC 7nm technology
• 8 cores per Core Complex (CCX)
• 32MB shared L3 Cache
• 81mm²
• AMD 3D V-Cache support integrated from Day 1
AMD 3D V-Cache extended L3 Die (L3Die)
• TSMC 7nm Technology
• 64MB L3 Cache Extension
• 41mm²
AMD 3D V-Cache Structural Dies
• Structural support for thinned CCD
• Thermal dissipation for CPU cores
14. X3D cross-section: top die and bottom die joined at the die interface, showing metal layers M11, M12, M13, vias, Al pads, Bond Pad Metal (BPM), TSVs, and silicon
• TSMC SoIC process
• Cu-Cu Hybrid bonding using Bond Pad Metal (BPM)
• TSV pitch = Hybrid bond pitch
• Bond Pad Via (BPV) connects BPM to M13
• Die to Wafer bonding process
• Face to back integration scheme
• 9um minimum TSV pitch
• MCM Package with C4 bump attach to substrate
15. Cross-section showing HB interface after 1000hrs of HTOL
Cu-Cu inter-diffused interface is ultra robust
Successfully passed various JEDEC-specified package-level reliability tests
17. AMD 3D V-Cache™ supports L3 Cache extension for both server and desktop product families
AMD 3rd Gen EPYC™ Server CPU
AMD RYZEN™ 7 5800X3D Gaming CPU
18. ~15% faster gaming at 1080p high
Chart: AMD RYZEN™ 7 5800X3D with AMD 3D V-Cache™ vs. AMD RYZEN™ 9 5900X — uplifts of up to 1.36X, 1.24X, 1.21X, 1.16X, and 1.09X, plus one tie, across Watch Dogs®…, Far Cry® 6, Gears 5™, Final Fantasy™ XIV, Shadow of the…, and CS:GO™
SEE ENDNOTES: R5K-106
19. World's fastest gaming processor
Chart: AMD RYZEN™ 7 5800X3D with AMD 3D V-Cache™ vs. CORE i9 12900K — up to 1.17X, 1.08X, 1.06X, 1.01X, and 0.98X, plus one tie, across Watch Dogs®…, Far Cry® 6, Gears 5™, Final Fantasy™ XIV, Shadow of the…, and CS:GO™
SEE ENDNOTES: R5K-107
20. ~66% faster RTL verification (SYNOPSYS® VCS®)
3RD GEN AMD EPYC™ 16-CORE WITH AMD 3D V-CACHE™: 40.6 JOBS/HOUR
3RD GEN AMD EPYC™ 16-CORE WITHOUT AMD 3D V-CACHE™: 24.4 JOBS/HOUR
RESULTS MAY VARY. SEE ENDNOTES: MLNX-001R
It is generally well known that a larger L3 cache significantly improves IPC (instructions per cycle) for a given device. As the chart from John Wuu shows, the relationship between IPC uplift and L3 cache size is fairly linear.
However, L3 cache (SRAM) does not scale as fast as logic, with an inflection point occurring at the 10nm node. To get larger on-die SRAMs/L3 caches, the SoC die size has to grow significantly.
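As a rough back-of-envelope (a sketch only, in Python, using the die areas quoted later on slide 13 and assuming, purely for illustration, that an extra 64MB of L3 would cost about the same silicon area if placed on-die):

# Back-of-envelope: area cost of adding 64MB of L3 on-die vs. stacking it.
# Die areas are the ones quoted on slide 13 (TSMC 7nm); the on-die estimate
# is an assumption for illustration, not AMD data.
ccd_area_mm2 = 81.0      # "Zen 3" CCD with 32MB shared L3
l3_die_area_mm2 = 41.0   # 3D V-Cache L3 die carrying 64MB

growth = l3_die_area_mm2 / ccd_area_mm2
print(f"Adding 64MB on-die would grow the CCD by ~{growth:.0%} "
      f"(~{ccd_area_mm2 + l3_die_area_mm2:.0f} mm^2 total); "
      f"stacking keeps the footprint at {ccd_area_mm2:.0f} mm^2.")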
So the question was how to achieve this IPC uplift in our devices without increasing the SoC footprint. We started looking at various packaging platforms that could be used to put an L3 cache die on top of an existing die while providing the right PPAC (power, performance, area, and cost) advantage.
When we started looking at packaging platforms to get a larger L3 cache without a significant increase in SoC die size, various figures of merit were considered. These apply generally to any heterogeneous integration concept.
Thermal conductivity/heat extraction, energy efficiency, low latency, bandwidth, power/ground distribution (PDN), speed matching, KGD/testability, cost, and reliability were evaluated across the packaging platforms available to us.
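Purely as an organizational aid (a hypothetical sketch, not an AMD tool; the metric names and units come from slide 5, while the platform names and the compare helper are invented here), the evaluation can be thought of as a table of figures of merit per platform:

# Hypothetical sketch: the figures of merit from slide 5 expressed as a
# comparison table. Platform names and values are placeholders to be filled
# in from engineering analysis; none are real data.
FIGURES_OF_MERIT = {
    "latency_ns":         "Latency of chiplet-to-chiplet channels (ns)",
    "energy_pj_per_bit":  "Energy per bit transferred between chiplets (pJ/bit)",
    "bw_density":         "B/W density of chiplet-to-chiplet communication (Gb/sec/sq.mm)",
    "theta_ja":           "Thermal density & heat extraction (Qja: degC/W)",
    "pdn_impedance":      "Efficient power/ground distribution (Z)",
    "packaging_cost_usd": "Cost of packaging ($)",
    "testability":        "Testability (KGD)",
    "reliability":        "Reliability",
    "speed_matching":     "Support for speed matching of chiplets",
}

def compare(platforms):
    """Print each platform's value (or 'n/a') for every figure of merit."""
    for key, description in FIGURES_OF_MERIT.items():
        row = ", ".join(f"{name}={vals.get(key, 'n/a')}" for name, vals in platforms.items())
        print(f"{description}: {row}")

# Example usage with placeholder platform names and no real values:
compare({"2D/C4": {}, "2.5D micro-bump": {}, "3D hybrid bond": {}})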
The platform we chose was TSMC's 3DFabric technology.
This platform enables a novel process called hybrid bonding, and the extended L3 V-Cache was enabled and developed on this platform in deep partnership with TSMC.
Hybrid bonding is fundamentally a two-phase bonding approach, where the initial bond is created between dielectric layers via van der Waals forces. At this stage, Cu-Cu contact is not yet made. The actual interconnects are formed in the second phase, when the assembly is annealed, causing solid-state diffusion and forming the Cu-Cu bonds.
This chart shows the interconnect pitch differences between 2D/C4 architectures (at 130um bump pitch), 2.5D micro-bump architectures (at 50um bump pitch), and 3D TSV-based hybrid-bond interconnects (at 9um bond pitch).
Due to this higher packing density from TSVs and hybrid bonding, we also achieve a 3x improvement in interconnect energy efficiency relative to micro-bump architectures, providing the best PPAC benefits to enable the highest-performance computing products.
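A quick illustrative calculation (a sketch under the simple assumption of a square grid of connections, not an official density methodology) of how areal interconnect density scales with the pitches quoted here and the 36um micro-bump pitch mentioned below:

# Illustrative: areal interconnect density scales roughly as 1/pitch^2 for a
# square grid of connections. Pitches are the ones quoted in the talk; the
# grid assumption is a simplification for illustration.
pitches_um = {
    "C4 (2D)": 130,
    "micro-bump (2.5D)": 50,
    "scaled micro-bump": 36,
    "hybrid bond (3D V-Cache)": 9,
}

for name, pitch in pitches_um.items():
    per_mm2 = (1000 / pitch) ** 2  # connections per square millimetre
    print(f"{name:26s} {pitch:4d} um pitch -> ~{per_mm2:8.0f} connections/mm^2")

# (36/9)^2 = 16, consistent with the >16x interconnect-density figure quoted
# later relative to a 36um micro-bump pitch.
print("hybrid bond vs. 36um micro-bump:", (36 / 9) ** 2, "x")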
Now, let us compare AMD's 3D V-Cache technology with the current best-in-class micro-bump 3D architecture.
Solder-based micro-bump technology with tall TSVs builds on traditional solder-based packaging and can scale from 50um to 36um (perhaps a bit lower), which is adequate for specific applications.
AMD's 3D chiplet architecture, by contrast, uses silicon-fab-like manufacturing methods, with TSVs built to back-end design rules and Cu-only interconnects, with no solder present.
This is a transformational point in the industry's advanced packaging journey, where interconnect technologies are now being enabled using silicon-fab-based techniques.
As a result of this extreme scaling, we achieve >3x higher interconnect energy efficiency and >16x higher interconnect density, as well as better signal and power performance, compared to micro-bump 3D architectures.
Now, we know hybrid bonding can enable more energy-efficient interconnects, higher bandwidth, and other performance advantages.
To actually enable it in our product, one of the key bounding boxes for this development was that the technology had to work with existing designs and act as a drop-in solution.
This inter-portability between products was critical to the success of this technology. We achieved this milestone by enabling a face-to-back (F2B) die stacking scheme: the only difference between a stacked-die product and a non-stacked-die product is the presence of TSVs and the stacked dies.
We also enabled a Chip-on-Wafer (CoW) stacking scheme to get all the advantages of known good dies (KGDs) and retain flexibility on die sizes.
The 3D chiplet architecture has been carefully engineered to deliver the highest bandwidth in the lowest silicon area, using direct Cu-Cu hybrid bonding plus TSVs for die-to-die communication.
The architecture and silicon floorplan were also carefully engineered for optimized thermal performance. Thermally aware floorplanning allowed us to place the 3D 64MB SRAM over the SRAM cells of the CCD, keeping thermal density low (the stacked die sits over just the L3 on the CCD and avoids overlapping the cores).
We enabled structural silicon to support the thinned dies and to allow heat to escape from the higher-density cores of the CCD, illustrating how 3D stacking can be done in a thermally friendly manner.
This slide shows details of the various components that form part of the 3D V-Cache platform.
The first demonstration of this technology was done on the "Zen 3" x86-64 CPU CCD. The CCDs are fabricated on TSMC's 7nm platform, and, as I said earlier, 3D V-Cache support was integrated from day one of the "Zen 3" architecture.
The cache die is also fabricated on TSMC's N7 node, giving us an extra 64MB L3 cache extension.
As mentioned earlier, this is enabled using foundry BEOL-based design rules, delivering best-in-class interconnect density and PPAC compared to any other commercially available platform.