1. OpenPOWER topics 2019/10/23 (Yutaka Kawai, IBM)
2. © 2019 IBM Corporation
Agenda
OpenPOWER Foundation: overview
OpenPOWER Summit: report
• CAPI / OpenCAPI Accelerated PostgreSQL
• Open ISA Experiments
OpenCAPI: overview
• SNAP
• POWER9 Final Addition
• OpenCAPI
4. OpenPOWER Foundation Mission
“Through the growing open ecosystem of the POWER Architecture and its associated technologies, the
OpenPOWER Foundation facilitates its Members to share expertise, investment and intellectual property
to serve the evolving needs of all end users.”
IT consumption models are expanding: Artificial Intelligence, Custom Hyperscale Data Centers, Hybrid Cloud, Open Solutions.
[Chart: price/performance, 2000 to 2020 (lower is better). Moore's Law gains from technology and processors alone are flattening; continued gains require full-stack acceleration across firmware/OS, accelerators, software, storage, and network.]
Full system stack innovation required: IT innovation can no longer come from just the processor.
6. A Revolution Looks Like (© 2018 OpenPOWER Foundation)
350+ Members, 35 Countries, 80+ ISVs
Member categories: Software; Implementation / HPC / Research; Chip / SOC; I/O / Storage / Acceleration; Boards / Systems; System / Integration
7. OpenPOWER Ecosystem - breadth of solutions
Sub-US$1,500 developer systems to TCO-competitive server solutions, right up to the world's fastest
supercomputers. All are part of the OpenPOWER Ecosystem, and all run open software stacks from firmware to apps.
10. OpenPOWER Announcements at North American Summit - II
● IBM releases a proof-of-concept POWER ISA-compliant FPGA soft core
○ Allows anyone to experiment with the POWER ISA, from researchers and hobbyists to chip
manufacturers and hardware accelerator vendors
○ A MicroPython port was also announced
○ Within just two weeks, additional FPGAs were already supported
○ Zephyr IoT kernel support is under development
○ Linux kernel support is expected by year end
Refer to:
https://openpowerfoundation.org/the-next-step-in-the-openpower-foundation-journey/
POWER Instruction Set Architecture (ISA)
12. Agenda
OpenPOWER Foundation: overview
OpenPOWER Summit: report
• CAPI / OpenCAPI Accelerated PostgreSQL
• Open ISA Experiments
OpenCAPI: overview
• SNAP
• POWER9 Final Addition
• OpenCAPI
13. OpenPOWER Summit
• Summits are held regionally: US (architecture), EU (business), China (engineering)
• OpenPOWER Summit North America (San Diego, August 2019): the Foundation's move under the Linux Foundation was announced
• CAPI / OpenCAPI Accelerated PostgreSQL
− https://static.sched.com/hosted_files/openpowerna19/….pdf
• Open ISA Experiments
− https://static.sched.com/hosted_files/openpowerna19/…/Blanchard_….pdf
14. PostgreSQL Accelerated with Regular Expression Matching
© 2018 IBM Corporation | IBM Confidential
§ Two user interfaces:
§ UDF (User Defined Function):
SELECT psql_regex_capi(table, pattern, attr_id);
§ PostgreSQL Hooks/Plugins (standard SQL):
SELECT * FROM table WHERE pkt ~ pattern;
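For intuition, the hook-based query is an ordinary per-row POSIX regex match; a minimal CPU-side sketch of what it computes (the rows, pattern, and column names here are illustrative, not from the talk):

```python
import re

# Sketch of: SELECT * FROM table WHERE pkt ~ pattern
# Each row is (ID, packet); the accelerator evaluates the match per row.
rows = [
    (1, "GET /index.html HTTP/1.1"),
    (2, "POST /login HTTP/1.1"),
    (3, "malformed packet payload"),
]
pattern = re.compile(r"HTTP/1\.[01]")

matching_ids = [row_id for row_id, pkt in rows if pattern.search(pkt)]
print(matching_ids)  # -> [1, 2]
```

The FPGA engines offload exactly this inner loop, which on the CPU is a full scan with a regex test per 1024-byte packet.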
15. Overall Architecture: Multi-Threading and Multi-Engine
[Diagram: Host memory (user space) holds the PostgreSQL buffer cache with relations (Relation 0 … Relation N) made up of pages; worker threads (Thread 0 … Thread N) memcpy tuples from pages into packet buffers (Packet Buffer 0 … Packet Buffer N). Packets transfer to the FPGA via CAPI/OpenCAPI with virtual-address pointers. On the FPGA, an AXI/PSL or AXI/TLX bridge and an AXI interconnect feed a Job Manager whose job queue holds packet-buffer pointers and a result buffer; regex engines (RegEx 0 … RegEx N) sit behind it, with general-query and other modules under construction, configured over a local configuration bus. Results (Result 0 … Result N) return as PostgreSQL query results to DB clients. Legend: FPGA engines / user-space buffers for CAPI; PostgreSQL internal data structures.]
16. Performance Evaluation: Environment Setups
Host Server: Romulus, 2-socket POWER9, 22 cores, 512 GB memory, ordinary SATA hard disk
FPGA Card: Xilinx VU9P
CAPI: CAPI 2.0 + SNAP 2.0
CAPI-Regex: 8 16x1 engines, 16 packet pipelines per engine; 2 64x1 engines, 64 pipelines per engine; all running at 225 MHz
PostgreSQL: version 11.2, compiled from source
• shared_buffers (memory the server uses for shared memory buffers): 4 GB / 1 GB
• max_worker_processes (maximum number of background processes): 176 (the max value supported by Romulus)
• max_parallel_workers (maximum number of workers for parallel queries): 176 (the max value supported by Romulus)
Queries: CAPI-Regex in UDF mode; SELECT * FROM table WHERE pkt ~ pattern
Test Table: synthetic tables; each row contains 2 columns (ID, packet), where packet is a 1024-byte random string; the number of rows varies between tables
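The PostgreSQL settings above correspond to a postgresql.conf fragment like the following (a sketch; values are taken from the table, 4 GB buffer-cache case):

```
shared_buffers = 4GB            # also evaluated with 1GB
max_worker_processes = 176      # max value supported by the Romulus host
max_parallel_workers = 176
```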
17. Performance Comparison with CPU
✓ The CPU version runs as-is; the CAPI version runs with 8 threads on 8 16x1 regex engines with the optimal number of jobs per thread
✓ CAPI-regex can be ~5x to ~10x faster than the best PostgreSQL built-in functions (CPU multi-threading enabled)
✓ At most 4 threads are enabled for CPU multi-threading when the table size is larger than 128,000 rows
✓ Buffer cache size impacts the CPU version but has little effect on the CAPI version
Query Time Comparison Between CAPI-regex and CPU
Relative query time (lower is better; CPU BC 4GB = 1.00), table size in 1024-byte rows:

                    512k    256k    128k    64k     32k     16k     8k      4k
regex_capi BC 1GB   0.23    0.25    0.06    0.06    0.07    0.07    0.09    0.15
regex_capi BC 4GB   0.20    0.25    0.06    0.05    0.06    0.07    0.09    0.14
CPU BC 1GB          1.03    1.02    1.00    0.92    1.00    1.05    1.08    1.07
CPU BC 4GB          1.03    1.00    1.00    1.00    1.00    1.00    1.00    1.00
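Dividing the CPU row by the regex_capi row gives the per-size speedup; a quick check against the 1 GB buffer-cache rows of the table:

```python
# Values copied from the relative-query-time table (1 GB buffer cache rows).
capi_1gb = [0.23, 0.25, 0.06, 0.06, 0.07, 0.07, 0.09, 0.15]
cpu_1gb = [1.03, 1.02, 1.00, 0.92, 1.00, 1.05, 1.08, 1.07]

speedups = [round(c / a, 1) for c, a in zip(cpu_1gb, capi_1gb)]
print(min(speedups), max(speedups))
```

The per-point ratios run from roughly 4x on the largest tables to over 16x around 128k rows, bracketing the ~5x to ~10x headline figure.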
18. Open ISA Experiments
Microwatt
• A tiny POWER core
• Implements an OpenPOWER ISA scalar fixed-point subset
• Written in VHDL 2008
− ghdl open-source simulation tools
− Xilinx Vivado for FPGA synthesis
• https://github.com/antonblanchard/microwatt
19. Open ISA Experiments
MicroPython
• A small embedded Python interpreter
− https://micropython.org/
• Michael Neuling ported it to the Microwatt core
− No modifications to generic code; mostly platform-specific code (application start-up, console, etc.)
21. Agenda
OpenPOWER Foundation: overview
OpenPOWER Summit: report
• CAPI / OpenCAPI Accelerated PostgreSQL
• Open ISA Experiments
OpenCAPI: overview
• SNAP
• POWER9 Final Addition
• OpenCAPI
22. An application without CAPI
SNAP Framework built on Power™ CAPI technology (2017, IBM Corporation)
[Diagram: six CPU cores in front of a memory subsystem (virtual addresses); the App reaches an external device interface through a device driver (DD); the variables, input data, and output data exist in the app's memory, in the device driver's storage area, and on the device.]
• An application calls a device driver to utilize an accelerator or any device outside the chip
• The device driver performs a memory-mapping operation
→ 3 versions of the data (not coherent)
→ 1000s of instructions in the device driver
23. An application with CAPI
[Diagram: six POWER8 cores in front of a coherent memory subsystem (virtual addresses); the App's variables, input data, and output data exist once; the FPGA attaches over PCIe through the PSL.]
→ 1 coherent version of the data
→ No device driver call/instructions
• The CPU is unloaded: no device driver runs, and the accelerator does the application work
• The FPGA shares memory with the cores
24. Effect of CAPI hardware vs. PCI-E Device Driver

Typical I/O Model Flow:
DD Call (300 instructions) → Copy or Pin Source Data (10,000 instructions) → MMIO Notify Accelerator (3,000 instructions) → Acceleration (application dependent) → Poll / Interrupt Completion (1,000 instructions) → Copy or Unpin Result Data (1,000 instructions) → Ret. From DD Completion
≈7.9 µs before plus ≈4.9 µs after the acceleration: total ~13 µs for data prep

Flow with a CAPI Model:
Shared Mem. Notify Accelerator (400 instructions, 0.3 µs) → Acceleration (application dependent, equal to above) → Shared Memory Completion (100 instructions, 0.06 µs)
Total 0.36 µs
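A quick arithmetic check of these totals shows why removing the device-driver path matters: ~13 µs shrinks to 0.36 µs, roughly a 35x reduction in data-prep overhead.

```python
# Totals from the flow comparison above (per-step times from the slide).
typical_us = 7.9 + 4.9      # device-driver model: prep before + after acceleration
capi_us = 0.3 + 0.06        # CAPI model: shared-memory notify + completion

print(typical_us, capi_us, round(typical_us / capi_us))
```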
25. Recall CAPI technology connections
Proprietary hardware and designs to enable coherent acceleration
Operating system enablement:
• little-endian Linux
• kernel driver (cxl)
• user library (libcxl)
Customer application and accelerator
[Diagram: POWER8 processor with the CAPP on chip, connected over PCIe to an FPGA carrying the PSL and the AFU; memory is coherent; the App sits on libcxl over the cxl driver in the OS.]
▪ PSLSE models the red-outlined area (PSL, CAPP, cxl, libcxl)
▪ Re-implements the libcxl API calls
▪ Models memory access
▪ Provides hardware ports to the AFU
▪ Enables co-simulation of AFU and App
▪ Publicly available on GitHub
26. OpenCAPI technology connections
Proprietary hardware and reference designs to enable coherent acceleration
Operating system enablement:
• little-endian Linux
• reference kernel driver (ocxl)
• reference user library (libocxl)
Customer application and accelerator
[Diagram: POWER9 processor with the NPU (w/ CAPP function), TL, and DL on chip, connected over 25G phy links to an FPGA carrying the DLx, TLx, PSL, and AFU; memory is coherent; the App sits on libocxl over the ocxl driver in the OS.]
▪ OCSE models the red-outlined area
▪ OCSE enables AFU and App co-simulation if the reference libocxl and reference TLx/DLx are used
▪ OCSE dependencies/assumptions:
– Fixed reference TLx/AFU interface
– Fixed reference libocxl user API
▪ Will be available to consortium members
TERMS:
OpenCAPI Simulation Environment (OCSE)
OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL)
cited from https://www.kernel.org/doc/html/latest/userspace-api/accelerators/ocxl.html
27. Comparison of IBM CAPI Implementations

Feature                        | CAPI 1.0       | CAPI 2.0        | OpenCAPI 3.0                | OpenCAPI 4.0
Processor Generation           | POWER8         | POWER9          | POWER9                      | Future
CAPI Logic Placement           | FPGA/ASIC      | FPGA/ASIC       | NA: DL/TL on host, DLx/TLx on endpoint FPGA/ASIC | NA: DL/TL on host, DLx/TLx on endpoint FPGA/ASIC
Interface / Lanes / Lane rate  | PCIe Gen3, x8/x16, 8 Gb/s | PCIe Gen4, 2 x (dual x8), 16 Gb/s | Direct 25G, x8, 25 Gb/s | Direct 25G+, x4/x8/x16/x32, 25+ Gb/s
Address Translation on CPU     | No             | Yes             | Yes                         | Yes
Native DMA from Endpoint Accelerator | No       | Yes             | Yes                         | Yes
Home Agent Memory on OpenCAPI Endpoint with Load/Store Access | No | No | Yes                   | Yes
Native Atomic Ops to Host Processor Memory from Accelerator | No | Yes | Yes                   | Yes
Accelerator -> HW Thread Wake-up | No           | Yes             | Yes                         | Yes
Low-latency small message push (128B writes to accelerator) | MMIO 4/8B only | MMIO 4/8B only | MMIO 4/8B only | Yes
Host Memory Caching Function on Accelerator | Real Address Cache in PSL | Real Address Cache in PSL | No | Effective Address Cache in Accelerator

Note: Direct 25G removes the PCIe layers to reduce latency significantly.
28. SNAP framework
October 25th 2018, Power™ Coherent Acceleration Processor Interface (CAPI)
Application on Host: processes A, B, and C each run a software program in a slave context on the SNAP library over libcxl and the cxl driver, each with its own job queue.
Acceleration on FPGA: a PSL/AXI bridge provides host DMA, control, and MMIO; a Job Manager and job queue dispatch work to the Hardware Action (C/C++ or RTL); AXI also attaches on-card DRAM, NVMe, and network (TBD). HDK: CAPI PSL or BSP.
Quick and easy developing:
• Use a High-Level Synthesis tool to convert C/C++ to RTL, or directly use RTL
• Programming is based on the SNAP library and the AXI interface
• AXI is an industry standard for on-chip interconnection (https://www.arm.com/products/system-ip/amba-specifications)
29. Scatter-gather memory access
Results: POWER9, CAPI 2.0, 2.154 GHz, 512 MB RAM; FPGA card FW609 + S241 (VU9P, Gen3 x16); SNAP.
- The CAPI way saves the time for the software gather, with a relatively small penalty as K grows

N = 1024 blocks, block size = 2 kBytes; all times in µs:

How scattered | Traditional way                | CAPI way
              | SW gather |  DMA   |  Sum     | Verilog | HLS
-RK1          |  309.3    | 183.5  | 492.8    | 171.65  | 173.3
-RK4          |  319.05   | 186.05 | 505.1    | 180.9   | 180.9
-RK16         |  305.1    | 185.7  | 490.8    | 184.6   | 186.95
-RK64         |  320.6    | 186.85 | 507.45   | 186.3   | 187.5
-RK256        |  318.3    | 185.65 | 503.95   | 218.55  | 215.35
-RK1024       |  333      | 189.15 | 522.15   | 236.85  | 224.95
-RK4096       |  324.4    | 189.35 | 513.75   | 241.15  | 225.55
-RK16384      |  307.4    | 185.75 | 493.15   | 240.9   | 224.9

(-RK1 = contiguous; larger K = more scattered)
- Once tuned (using pragmas), HLS can compete with Verilog coding
- ~190 µs to transfer 2 MiB: speed ≈ 11.04 GB/s
R = random; K is the dispersion factor of the blocks:
Allocate 2 MB in a K * 2 MB memory area
→ K=1: all blocks contiguous
→ K=2: 2 MB allocated amongst 4 MB
→ K=4: 2 MB allocated amongst 8 MB
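The K-dispersion layout can be sketched in a few lines (a simulation of the benchmark's block placement only, not the actual SNAP test code; names are illustrative):

```python
import random

def scatter_offsets(n_blocks=1024, block_bytes=2048, k=4, seed=0):
    """Sketch of the -RK<k> layout: place n_blocks non-overlapping blocks
    at random slots inside an area k times the 2 MiB payload."""
    rng = random.Random(seed)
    slots = rng.sample(range(n_blocks * k), n_blocks)  # k=1 leaves no freedom
    return sorted(slot * block_bytes for slot in slots)

offsets = scatter_offsets(k=4)
payload = 1024 * 2048          # 2 MiB gathered per run
area = 4 * payload             # K = 4: 2 MB spread over 8 MB
assert all(0 <= off < area for off in offsets)

# Effective transfer speed quoted on the slide: ~2 MiB in ~190 us.
print(round(payload / 190e-6 / 1e9, 2))  # ≈ 11.04 GB/s
```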
31. Agenda
OpenPOWER Foundation: overview
OpenPOWER Summit: report
• CAPI / OpenCAPI Accelerated PostgreSQL
• Open ISA Experiments
OpenCAPI: overview
• SNAP
• POWER9 Final Addition
• OpenCAPI
32. Proposed POWER Processor Technology and I/O Roadmap
(Statement of Direction, Subject to Change)

POWER7 Architecture
• 2010 POWER7: 8 cores, 45nm, new micro-architecture, new process technology; up to 65 GB/s sustained memory bandwidth; PCIe Gen2; no advanced I/O
• 2012 POWER7+: 8 cores, 32nm, enhanced micro-architecture, new process technology; up to 65 GB/s; PCIe Gen2; no advanced I/O

POWER8 Architecture
• 2014 POWER8: 12 cores, 22nm, new micro-architecture, new process technology; up to 210 GB/s; PCIe Gen3; CAPI 1.0
• 2016 POWER8 w/ NVLink: 12 cores, 22nm, enhanced micro-architecture with NVLink; up to 210 GB/s; PCIe Gen3; 20 GT/s, 160 GB/s advanced I/O; CAPI 1.0, NVLink

POWER9 Architecture
• 2017 P9 SO: 12/24 cores, 14nm, new micro-architecture, direct-attach memory, new process technology; up to 150 GB/s; PCIe Gen4 x48; 25 GT/s, 300 GB/s; CAPI 2.0, OpenCAPI 3.0, NVLink
• 2018 P9 SU: 12/24 cores, 14nm, enhanced micro-architecture, buffered memory; up to 210 GB/s; PCIe Gen4 x48; 25 GT/s, 300 GB/s; CAPI 2.0, OpenCAPI 3.0, NVLink
• 2020 P9 AIO: 12/24 cores, 14nm, enhanced micro-architecture, new memory subsystem; up to 650 GB/s; PCIe Gen4 x48; 25 GT/s, 300 GB/s; CAPI 2.0, OpenCAPI 4.0, NVLink (focus of today's talk)

POWER10
• 2021 P10: TBA cores, new micro-architecture, new process technology; up to 800 GB/s; PCIe Gen5; 32 & 50 GT/s; TBA

(Per entry: sustained memory bandwidth; standard I/O interconnect; advanced I/O signaling; advanced I/O architecture.)
33. Final Addition to the POWER9 Processor Family: Advanced I/O (AIO)
"The Bandwidth Beast": 2 TB/s raw signaling bandwidth shared by 6 attach protocols

[Die diagram: 24 SMT4 cores in pairs sharing L2, backed by 10 MB L3 regions; PowerAXON (x48) and memory signaling (8x8 OMI) at top and bottom; PCIe Gen4 signaling (x48), local SMP signaling (3x30), and the SMP and accelerator interconnect along the sides.]

Processor Chip Details
• 728 mm2 (25.3 x 28.8 mm)
• 8 billion transistors
• Up to 24 SMT4 cores
• Up to 120 MB eDRAM L3 cache

Semiconductor Technology
• 14nm finFET
• Improved device performance, reduced energy
• eDRAM
• 17-layer metal stack

High Bandwidth Signaling
• 25 GT/s low-energy differential: PowerAXON, OMI memory
• 16 GT/s low-energy differential: local SMP
• 16 GT/s PCIe Gen4

Open Memory Interface (OMI)
• 16 channels x8 at 25 GT/s
• 650 GB/s peak 1:1 r/w bandwidth
• Technology agnostic
• Offered w/ Microchip DDR4 buffer (410 GB/s peak bandwidth)

PowerAXON 25 GT/s Attach
• Up to 16-socket glue-less SMP (4x24 SMP added to 3x30 local)
• Up to x48 NVIDIA NVLink GPU attach
• Up to x48 OpenCAPI 4.0 coherent accelerator / memory attach

Industry Standard I/O Attach
• x48 PCIe Gen4 at 16 GT/s
• Up to x16 CAPI 2.0 coherent accelerator / storage attach
34. POWER Systems Memory Strategy
Connect all memory technologies to Power systems through OpenCAPI and OMI
Why?
– It is a high-speed interface that allows the flexibility to attach any new and emerging memory technology, including persistent memories like storage class memory (SCM)
[Diagrams: P9 with OpenCAPI: the CPU attaches DDR through a switch and NVMe through OpenCAPI. P9' (Axone) / P10 with OpenCAPI and OMI: the CPU attaches memory through OMI channels and NVMe through OpenCAPI.]
35. Hybrid Memory Subsystem (HMS)
• Hybrid memory subsystem using low-latency NAND and DRAM
– Exclusive partnership for low-latency NAND media, and with BittWare for the design of the 250-HMS accelerator card
– Low-latency NAND provides capacity and persistence; DRAM is used for caching to lower average latency
• Capabilities
– SCM on OpenCAPI using load/store memory semantics
– Competitive latency and bandwidth at reduced cost for systems with high-capacity memory requirements
• Target Applications
– Primary: cost reduction for in-memory applications and databases with predominantly sequential, mostly read-only processing
36. OpenCAPI 4.0: Asymmetric Open Accelerator Attach
Roadmap of Capabilities and Host Silicon Delivery

Accelerator Protocol      | CAPI 1.0           | CAPI 2.0            | OpenCAPI 3.0        | OpenCAPI 4.0         | OpenCAPI 5.0
First Host Silicon        | POWER8 (GA 2014)   | POWER9 SO (GA 2017) | POWER9 SO (GA 2017) | POWER9 AIO (GA 2020) | POWER10 (GA 2021)
Functional Partitioning   | Asymmetric         | Asymmetric          | Asymmetric          | Asymmetric           | Asymmetric
Host Architecture         | POWER              | POWER               | Any                 | Any                  | Any
Cache Line Size Supported | 128B               | 128B                | 64/128/256B         | 64/128/256B          | 64/128/256B
Attach Vehicle            | PCIe Gen3 tunneled | PCIe Gen4 tunneled  | 25G (open), native DL/TL | 25G (open), native DL/TL | 32/50G (open), native DL/TL
Address Translation       | On Accelerator     | Host                | Host (secure)       | Host (secure)        | Host (secure)
Native DMA to Host Mem    | No                 | Yes                 | Yes                 | Yes                  | Yes
Atomics to Host Mem       | No                 | Yes                 | Yes                 | Yes                  | Yes
Host Thread Wake-up       | No                 | Yes                 | Yes                 | Yes                  | Yes
Host Memory Attach Agent  | No                 | No                  | Yes                 | Yes                  | Yes
Low Latency Short Msg     | 4B/8B MMIO         | 4B/8B MMIO          | 4B/8B MMIO          | 128B push            | 128B push
Posted Writes to Host Mem | No                 | No                  | No                  | Yes                  | Yes
Caching of Host Mem       | RA Cache           | RA Cache            | No                  | VA Cache             | VA Cache
39. CPU-GPU interconnect: NVLink
• Intel CPU: GPUs attach over PCIe at 32 GB/s
• IBM POWER CPU: GPUs attach over NVLink at 150 GB/s (GPU-to-GPU links are also 150 GB/s)
• About 4.7x the CPU-GPU bandwidth
40. Summit supercomputer
• Built from POWER9 (14nm) nodes with NVLink-attached NVIDIA GPUs
• Roughly 200 petaflops peak performance
* https://www-03.ibm.com/press/jp/ja/pressrelease/53461.wss
41. Summit system specifications
• Each node: 2 POWER9 CPUs and 6 NVIDIA V100 GPUs, attached via NVLink 2.0
• GPU memory: 16 GB HBM2 per GPU, plus DDR4 system memory per node
• Dual-rail EDR InfiniBand interconnect between nodes
42. WISTRON “MiHawk”
24 x NVMe = 96 lanes Gen3 PCIe = 48 lanes Gen4 PCIe = 32 lanes OpenCAPI 3.0
Image Source: Wistron
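The lane equivalence above is about bandwidth, and can be checked with rough per-lane rates (assumed figures, not from the slide: PCIe Gen3 at 8 GT/s with 128b/130b encoding, Gen4 doubling it, OpenCAPI lanes at 25 Gb/s with encoding overhead ignored):

```python
# Back-of-envelope GB/s per lane under the assumptions stated above.
gen3 = 8 * (128 / 130) / 8   # PCIe Gen3: 8 GT/s, 128b/130b -> ~0.985 GB/s
gen4 = 2 * gen3              # PCIe Gen4 doubles the per-lane rate
ocapi = 25 / 8               # OpenCAPI 3.0: 25 Gb/s raw -> 3.125 GB/s

print(round(96 * gen3), round(48 * gen4), round(32 * ocapi))
```

All three configurations land near 100 GB/s of aggregate bandwidth, which is why 24 x4 NVMe drives can be fed equally well by any of them.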
43. WISTRON “MiHawk”
24 x NVMe = 96 lanes Gen3 PCIe = 48 lanes Gen4 PCIe = 32 lanes OpenCAPI 3.0
Image Source: Wistron
OpenCAPI!