Allan Cantle - 6/14/2021
Decoupling Compute from Memory, Storage & IO with OMI
An Open Source Hardware Initiative
Nomenclature : Read “Processor” as CPU and/or Accelerator
Overview
• Why Decouple Compute from Memory, Storage & IO?
• Top Down Systems Perspective and Introduction of OCP HPC Concepts
• Introduction to the Open Memory Interface, OMI
• Decoupling Compute with OCP OAM-HPC Module & OMI
Why Decouple Compute from Memory, Storage & IO?
Rethinking Computing Architecture...
• Because the Data is at the heart of Computing Architecture Today
• Compute is rapidly becoming a Commodity
• Intel i386 = 276K Transistors = $0.01 retail!
• Power Efficiency and Cost are Today’s Primary Drivers
• Distribute the Compute efficiently : It’s not the Center of Attention anymore
• Compute therefore needs 1 simple interface for easy re-use everywhere
So, Let’s Redefine Computing Architecture
Back to First Principles with a Primary Focus on Power & Latency
• In Computing
• Latency ≈ Time taken to Move Data
• More Clock Cycles = More Latency = More Power
• More Distance = More Latency = More Power
• Hence Power can be seen as a proxy for Latency & Vice Versa
• A Focus on Power & Latency will beget Performance
• Heterogeneous Processors effectively do this for Specific algorithm types
• For HPC, we now need to Focus our energy at the System Architecture Level
Cache Memory’s Bipolar relationship with Power
Its Implementation Needs Rethinking
• Cache is Very Good for Power Efficiency
  • Minimizes data movement on repetitively used data
• Cache is Very Bad for Power Efficiency
  • Unnecessary Cache thrashing for data that’s touched once or not at all
  • Many layers of Cache = multiple copies burning more power
• Conclusion
  • Hardware Caching must be efficiently managed at the application level
Today’s Processor Interface Choices
From a Latency / Power Perspective
• Processor Designers must allocate % beachfront for each interface
• Choices are increasing at the moment, making the decision harder!
• DDR / HBM / OMI : Memory Access, ~50ns*†
• SMP / OpenCAPI / Memory Inception : Memory Access, 150ns to 300ns*†
• CXL / PCIe / GenZ / CCIX : Interface only, <500ns* (Switchless)
• Ethernet / Infiniband : Network Interface only, >500ns*
Diagram labels: CPU / Accel connected to a CPU / Accel or a Memory Pool
*Latencies are round trip approximations only : † Includes DDR Memory Read access time
Simplify to a Shared Memory Interface?
Successfully Decouple Processors from Memory, Storage & IO
• One Standardized, Low Latency, Low Power, Processor Interface
• Graceful increase in latency and power beyond local memory
• Processor Companies can focus on their core expertise and Support ALL Domain Specific Use Cases
Diagram labels: CPU / Accel (x3) or Memory Pool; ~50ns*† links; 150ns - 300ns*†; <500ns*; >500ns*
*Latencies are round trip approximations only : † Includes Memory Read access time
Disaggregated Racks to Hyper-converged Chiplets
Classic server being torn in opposite directions!
• Disaggregated Rack : Software Composable; Power Ignored; Rack Interconnect >20pJ/bit; Rack Volume >53K Cubic Inches; Poor Latency
• Classic Server Node : Baseline Physical Composability; Power Baseline; Node Interconnect 5-10pJ/bit; Node Volume >800 Cubic Inches; Baseline Latency
• Hyper-converged Chiplets (SIP) : Expensive Physical Composability; Power Optimized; Chiplet Interconnect <1pJ/bit; SIP Volume <1 Cubic Inch; Optimal Latency
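To make the pJ/bit figures on this slide concrete, the sketch below converts energy per bit into link power for a fixed data rate. The pJ/bit values are representative points taken from the slide's ranges; the 64 GBytes/s traffic level and every name in the code are assumptions chosen for illustration.

```python
# Illustrative only: the pJ/bit values are points within the slide's ranges;
# the 64 GBytes/s traffic figure and all names are assumptions for this sketch.

PJ = 1e-12  # joules per picojoule


def link_power_watts(bandwidth_gbytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power needed to move data at a given rate and energy cost per bit."""
    bits_per_second = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * PJ


# One representative point per tier (the node tier spans 5-10 pJ/bit).
tiers = {
    "rack (20 pJ/bit)": 20.0,
    "node (10 pJ/bit)": 10.0,
    "node (5 pJ/bit)": 5.0,
    "chiplet (1 pJ/bit)": 1.0,
}

traffic_gbytes_per_s = 64.0  # assumed sustained traffic

for name, pj_per_bit in tiers.items():
    watts = link_power_watts(traffic_gbytes_per_s, pj_per_bit)
    print(f"{name}: {watts:.2f} W")
```

Under these assumptions the same traffic costs roughly 10 W at rack level, 2.5 to 5 W at node level and about 0.5 W at chiplet level, which is the gap the deck is pointing at.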
An OCP OAM & EDSFF Inspired solution?
Bringing the benefits of Disaggregation and Chiplets together
• OCP OAM-HPC Module, populated with E3.S, NIC-3.0 & Cable IO, compared against the Rack, Node and Chiplet (SIP) tiers:
  • Software & Physical Composability
  • Power Optimized : Flexible Chiplet Interconnect 1-2pJ/bit
  • Optimal Latency
  • Module Volume <150 Cubic Inches
Fully Composable Compute Node Module
Leveraged from OCP’s OAM Module - nicknamed OAM-HPC
• Modular, Flexible and Composable Module - Protocol Agnostic!
• Memory, Storage & IO interchangeable depending on Application Need
• Processor must use HBM or have Serially Attached Memory
OCP OAM-HPC Module Top & Bottom View
OAM-HPC Module Common Bottom View for all types of Processor Implementations
• 16x EDSFF TA-1002 4C/4C+ Connectors + 8x Nearstack x8 Connectors
• Total of 320x Transceivers
OAM-HPC Standard could Support Today’s Processors, e.g.
• NVIDIA Ampere
• Google TPU
• IBM POWER10
• Xilinx FPGAs
• Intel FPGAs
• Graphcore IPU
Example OAM-HPC Module Bottom View Populated with
• 8x E3.S Modules
• 2x OCP NIC 3.0 Modules
• 4x TA1002 4C Cables
• 8x Nearstack x8 Cables
OMI in E3.S
OMI Memory IO is finally going Serial!
• Bringing Memory into the composable world of Storage and IO with E3.S
• DDR DIMM
• OMI in DDIMM Format : Introduced in August 2019
• CXL.mem in E3.S : Introduced in May 2021; CXL x16, DDR5 Channel
• GenZ in E3.S : Introduced in 2020; GenZ x16, DDR4 Channel
• OMI in E3.S : Proposed in 2020; Dual OMI x8, DDR4/5 Channel
IBM POWER10 OCP-HPC Modules Example
OCP-HPC Module Block Schematic
• 288x of 320x Transceiver Lanes in Total
  • 32x PCIe Lanes
  • 128x OMI Lanes
  • 128x SMP / OpenCAPI Lanes
Block schematic labels:
• IBM POWER10 Single Chiplet Package
• EDSFF TA-1002 4C / 4C+ Connector
• Nearstack PCIe x8 Connector
• E3.S : Up to 512GByte Dual OMI Channel DDR5 Module
• E3.S : x4 NVMe SSD
• NIC 3.0 : x16
• Cabled PCIe x8 IO
• Cabled SMP / OpenCAPI
• Legend : 8 Lane OMI Channel; SMP / OpenCAPI Channel; PCIe-G5 Channel
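The lane counts quoted for the module can be cross-checked with a few lines of arithmetic. The sketch below assumes x16 per EDSFF 4C/4C+ connector and x8 per Nearstack connector; those per-connector widths and all names in the code are assumptions made here, while the connector counts and the POWER10 lane usage come from the slides.

```python
# Per-connector lane widths are assumptions for this sketch:
# x16 for an EDSFF 4C/4C+ connector, x8 for a Nearstack x8 connector.
LANES_PER_4C = 16
LANES_PER_NEARSTACK = 8

# Connector counts from the OAM-HPC common bottom view.
total_lanes = 16 * LANES_PER_4C + 8 * LANES_PER_NEARSTACK

# Lanes used by the IBM POWER10 example, as listed on the slide.
used = {"PCIe": 32, "OMI": 128, "SMP / OpenCAPI": 128}
used_lanes = sum(used.values())
spare_lanes = total_lanes - used_lanes

# Each OMI channel is 8 lanes wide.
omi_channels = used["OMI"] // 8

print(f"total lanes:  {total_lanes}")
print(f"used lanes:   {used_lanes}")
print(f"spare lanes:  {spare_lanes}")
print(f"OMI channels: {omi_channels}")
```

With these widths the total matches the 320 transceivers quoted for the module, and the POWER10 example accounts for 288 of them.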
Dense Modularity = Power Saving Opportunity
A Potential Flexible Chiplet Level Interconnect
• Distance from Processor Die Bump to E3.S ASIC <5 Inches (128mm) - Worst Case Manhattan Distance
• Opportunity to reduce PHY Channel to 5-10dB, 1-2pJ/bit - Similar to XSR
• Opportunity to use the OAM-HPC & E3.S Modules as Processor & ASIC Package Substrates
• Better Power Integrity and Signal Integrity
Figure dimension labels: 24mm, 67mm, 19mm, 18mm; 26mm x 26mm = 676mm2
Introduction to OMI - Open Memory Interface?
OMI = Bandwidth of HBM at DDR Latency, Capacity & Cost
• DDR4/5
  • Low Bandwidth per Die Area/Beachfront
  • Not Physically Composable
• HBM
  • Inflexible & Expensive
  • Capacity Limited
• CXL.Mem, OpenCAPI, CCIX
  • Higher Latency, Far Memory
• GenZ
  • Data Center Level Far Memory
Chart: DRAM Capacity (TBytes, log scale) versus Memory Bandwidth (TBytes/s, log scale) for DDR4, DDR5, OMI and HBM2E
Memory Interface Comparison
OMI, the ideal Processor Shared Memory Interface!
Comparison to OMI - In Production since 2019

Specification                LRDIMM DDR4       DDR5              HBM2E (8-High)    OMI
Protocol                     Parallel          Parallel          Parallel          Serial
Signalling                   Single-Ended      Single-Ended      Single-Ended      Differential
I/O Type                     Duplex            Duplex            Simplex           Simplex
Lanes/Channel (Read/Write)   64                32                512R/512W         8R/8W
Lane Speed                   3,200MT/s         6,400MT/s         3,200MT/s         32,000MT/s
Channel Bandwidth (R+W)      25.6GBytes/s      25.6GBytes/s      400GBytes/s       64GBytes/s
Latency                      41.5ns            ?                 60.4ns            45.5ns
Driver Area / Channel        7.8mm2            3.9mm2            11.4mm2           2.2mm2
Bandwidth/mm2                3.3GBytes/s/mm2   6.6GBytes/s/mm2   35GBytes/s/mm2    33.9GBytes/s/mm2
Max Capacity / Channel       64GB              256GB             16GB              256GB
Connection                   Multi Drop        Multi Drop        Point-to-Point    Point-to-Point
Data Resilience              Parity            Parity            Parity            CRC

Similar Bandwidth/mm2 provides an opportunity for an HBM Memory with an OMI Interface on its logic layer. Brings Flexibility and Capacity options to Processors with HBM Interfaces!
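The Channel Bandwidth row can be approximately reproduced from the Lanes/Channel and Lane Speed rows. The sketch below assumes one bit per lane per transfer and counts read and write lanes together for the simplex interfaces; the function and variable names are illustrative and not from the deck.

```python
def channel_bandwidth_gbytes(data_lanes: int, mt_per_s: int) -> float:
    """Lanes x transfers/s gives bits/s; divide by 8 for bytes, 1000 for G."""
    return data_lanes * mt_per_s / 8 / 1000


# (total data lanes, lane speed in MT/s) taken from the comparison table.
rows = {
    "LRDIMM DDR4": (64, 3200),
    "DDR5": (32, 6400),
    "HBM2E (8-High)": (512 + 512, 3200),  # read + write lanes
    "OMI": (8 + 8, 32000),                # read + write lanes
}

for name, (lanes, speed) in rows.items():
    print(f"{name}: {channel_bandwidth_gbytes(lanes, speed):.1f} GBytes/s")
```

This yields 25.6, 25.6, 409.6 and 64.0 GBytes/s. The HBM2E result is slightly above the 400 GBytes/s listed in the table; the other three match.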
OMI Today on IBM’s POWER10 Die
• POWER10 : 18B Transistors on Samsung 7nm - 602 mm2 (~24.26mm x ~24.82mm)
• Die photo courtesy of Samsung Foundry, Scale 1mm : 20pts
• OMI Memory PHY Area
  • 2 Channels = 1.441mm x 2.626mm = 3.78mm2
  • Or 1.441mm x 1.313mm / Channel = 1.89mm2 / Channel
  • Or 30.27mm2 for 16x Channels
• Peak Bandwidth per Channel = 32Gbits/s * 8 * 2 (Tx + Rx) = 64 GBytes/s
• Peak Bandwidth per Area = 64 GBytes/s / 1.89mm2 = 33.9 GBytes/s/mm2
• Maximum DRAM Capacity per OMI DDIMM = 256GB
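The per-channel and per-area figures on this slide follow from the lane rate and the measured PHY dimensions. A minimal sketch of that arithmetic is shown below; the variable names are illustrative, and the area is rounded to two decimals as the slide appears to do.

```python
# Inputs quoted on the slide.
lane_rate_gbits = 32          # Gbit/s per lane
lanes_per_direction = 8       # x8 OMI channel
directions = 2                # Tx + Rx
phy_width_mm = 1.441
phy_height_mm = 1.313         # per channel

# Peak bandwidth of one channel, in GBytes/s.
peak_gbytes_per_s = lane_rate_gbits * lanes_per_direction * directions / 8

# PHY area per channel and for 16 channels.
area_per_channel_mm2 = phy_width_mm * phy_height_mm
area_16_channels_mm2 = 16 * area_per_channel_mm2

# Bandwidth density, using the area rounded to 2 decimals (1.89 mm2).
bandwidth_per_mm2 = peak_gbytes_per_s / round(area_per_channel_mm2, 2)

print(f"peak bandwidth:  {peak_gbytes_per_s:.0f} GBytes/s per channel")
print(f"PHY area:        {area_per_channel_mm2:.2f} mm2 per channel")
print(f"16 channels:     {area_16_channels_mm2:.2f} mm2")
print(f"bandwidth/area:  {bandwidth_per_mm2:.1f} GBytes/s/mm2")
```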
32Gb/s x8 OMI Channel
• OMI Buffer Chip : 30dB @ <5pJ/bit
• 2.5W per 64GBytes/s Tx + Rx OMI Channel, at each end
• DDR5 @ 4000 MTPS, 16Gbit Monolithic Memory, JEDEC configurations
  • 32GByte 1U OMI DDIMM
  • 64GByte 2U OMI DDIMM
  • 256GByte 4U OMI DDIMM
• Same TA-1002 EDSFF Connector
• 2019’s 25.6Gbit/s DDR4 OMI DDIMM
• Locked ratio to the DDR Speed
  • 21.33Gb/s x8 - DDR4-2667
  • 25.6Gb/s x8 - DDR4/5-3200
  • 32Gb/s x8 - DDR5-4000
  • 38.4Gb/s - DDR5-4800
  • 42.66Gb/s - DDR5-5333
  • 51.2Gb/s - DDR5-6400
• Serdes Phy Latency <2ns (without wire), Mesochronous clocking
• E3.S and Other Potential Emerging EDSFF Media Formats : Up to 512GByte, Dual OMI Channel
• OMI Phy
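The speed pairs listed under "Locked ratio to the DDR Speed" are all consistent with an OMI lane rate of 8x the DDR transfer rate. The sketch below checks that, and also checks the 2.5W figure against the <5pJ/bit bound. The 8:1 ratio is inferred from the quoted pairs rather than stated in the deck, and the names are illustrative.

```python
# Ratio inferred from the slide's speed pairs; treat it as an assumption.
OMI_TO_DDR_RATIO = 8


def omi_lane_rate_gbits(ddr_mt_per_s: float) -> float:
    """OMI lane rate in Gbit/s for a given DDR transfer rate in MT/s."""
    return ddr_mt_per_s * OMI_TO_DDR_RATIO / 1000


# DDR speed grade -> OMI lane rate quoted on the slide (Gbit/s).
slide_pairs = {2667: 21.33, 3200: 25.6, 4000: 32.0,
               4800: 38.4, 5333: 42.66, 6400: 51.2}

for ddr, quoted in slide_pairs.items():
    derived = omi_lane_rate_gbits(ddr)
    print(f"DDR-{ddr}: derived {derived:.2f} Gbit/s, slide {quoted} Gbit/s")

# Power check: one 32 Gbit/s x8 channel, Tx + Rx, at the 5 pJ/bit bound.
channel_gbits = 32 * 8 * 2
power_watts = channel_gbits * 1e9 * 5e-12
print(f"channel power at 5 pJ/bit: {power_watts:.2f} W")
```

The derived rates agree with the slide to within rounding, and 512 Gbit/s at 5 pJ/bit comes to about 2.56 W, in line with the quoted 2.5W per 64GBytes/s.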
OMI Bandwidth vs SPFLOPs
OMI Helping to Address Memory Bound Applications
• Tailoring OPS : Bytes/s : Bytes Capacity to Application Needs
• Maximum Reticle Size Die @ 7nm : 826mm2 (~32.18mm x ~25.66mm)
  • Theoretical Maximum of 80 OMI Channels, OMI Bandwidth = 5.1 TBytes/s
  • NVidia Ampere Max Reticle Size Die : ~30 SPTFLOPs
• 117mm2 Die (10.8mm x 10.8mm)
  • 28 OMI Channels = 1.8TByte/s
  • 2 SPTFLOPs
• Die Size shrink = 7x
• OMI Bandwidth reduction = 2.8x
• SPFLOPS reduction = 15x
To Scale 10pts : 1mm
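The ratios on this slide can be reproduced from the channel counts and die areas, and doing so also gives the bytes-per-FLOP balance that the "Tailoring OPS : Bytes/s" bullet refers to. A minimal sketch, assuming 64 GBytes/s per OMI channel as quoted earlier in the deck; the dictionary layout and names are illustrative.

```python
GBYTES_PER_OMI_CHANNEL = 64  # peak Tx + Rx per channel, from the POWER10 slide


def omi_bandwidth_tbytes(channels: int) -> float:
    """Aggregate peak OMI bandwidth in TBytes/s."""
    return channels * GBYTES_PER_OMI_CHANNEL / 1000


# Figures quoted on the slide for the two dies.
large = {"area_mm2": 826, "channels": 80, "sp_tflops": 30}
small = {"area_mm2": 117, "channels": 28, "sp_tflops": 2}

large_bw = omi_bandwidth_tbytes(large["channels"])
small_bw = omi_bandwidth_tbytes(small["channels"])

die_shrink = large["area_mm2"] / small["area_mm2"]
bandwidth_reduction = large_bw / small_bw
flops_reduction = large["sp_tflops"] / small["sp_tflops"]

# Memory bandwidth available per single-precision FLOP.
large_bytes_per_flop = large_bw / large["sp_tflops"]
small_bytes_per_flop = small_bw / small["sp_tflops"]

print(f"bandwidth: {large_bw:.2f} vs {small_bw:.2f} TBytes/s")
print(f"die shrink {die_shrink:.1f}x, bandwidth reduction "
      f"{bandwidth_reduction:.1f}x, FLOPS reduction {flops_reduction:.0f}x")
print(f"bytes/FLOP: {large_bytes_per_flop:.2f} vs {small_bytes_per_flop:.2f}")
```

On these numbers the small die keeps about 0.9 bytes/s per FLOP against roughly 0.17 for the reticle-sized die, which is why it suits memory bound applications.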
OAM-HPC Configuration Examples - Modular, Flexible & Composable
1, 2 or 3 Port - Shared Memory OMI Chiplet Buffers
Diagram labels:
• OAM-HPC Module : Maximum Reticle Size Processor with 80 OMI XSR Channels
• HBM : 8x Channel, OMI Enabled Logic Layer
• OMI 1 or 2 port Buffer Chiplet : XSR-NRZ PHY, OMI DLX, OMI TLX; 1 or 2 Port Shared Memory Controller; Optional In Buffer Near Memory Processor
• 2 Port OMI Buffer Chiplet with integrated Shared Memory : XSR-NRZ PHY, OMI DLX, OMI TLX; XSR, DLX, TLX; A Buffer for each Fabric Standard
• OMI 3 port Buffer Chiplet : XSR-NRZ PHY, OMI DLX, OMI TLX; 3 Port Shared Memory Controller; Optional Near Memory Processor Chiplet
• EDSFF 4C Connector : Medium Reach OMI Interconnect; OMI MR Extender Buffer, <500ps round trip delay
• E3.S Module
• Nearstack Connector : Fabric Interconnect, e.g. Ethernet / Infiniband; Passive Fabric Cable
• EDSFF 4C Connector : Fabric Interconnect, e.g. OpenCAPI / CXL / GenZ / CCIX; Protocol Specific Active Fabric Cable
• OCP-NIC-3.0 Module : Fabric Interconnect, e.g. OpenCAPI / CXL / GenZ / Ethernet / Infiniband
OCP Accelerator Infrastructure, OAI Chassis
8x Cold Plate Mounted OCP OAM-HPC Modules
Pluggable into OCP OAI Chassis
Summary
• Redefine Computing Architecture
• With a Focus on Power and Latency
• Shared Memory Centric Architecture
• Leverage Open Memory Interface, OMI
• Dense OCP Modular Platform Approach
Interested? - How to Get Involved
From Silicon Startups to Open Hardware Enthusiasts alike
• Step 1 - Join the OpenCAPI consortium & OCP HPC Sub-Project
• Step 2 - Replace DDR with Standard OMI interfaces on your next processor design
• Step 3 - Add low power OMI PHYs to spare beachfront on your Processors
• Step 4 - Build OAM-HPC Modules around your large Processor Devices
• Step 5 - Help the community build Specific 2 & 3 port OMI chiplet buffers
• Step 6 - Help the community build OMI Buffer enabled E3.S & NIC 3.0 modules etc.
Questions?
Contact me at a.cantle@nallasway.com
Join OpenCAPI Consortium at https://opencapi.org
Join OCP HPC Sub-Project Workgroup at https://www.opencompute.org/wiki/HPC

 
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degreeyuu sss
 

Decoupling Compute from Memory, Storage and IO with OMI

  • 1. Allan Cantle - 6/14/2021 Decoupling Compute from Memory, Storage & IO with OMI An Open Source Hardware Initiative Nomenclature : Read “Processor” as CPU and/or Accelerator
  • 2. Overview • Why Decouple Compute from Memory, Storage & IO? • Top Down Systems Perspective and Introduction of OCP HPC Concepts • Introduction to the Open Memory Interface, OMI • Decoupling Compute with OCP OAM - HPC Module & OMI
  • 3. Why Decouple Compute from Memory, Storage & IO? Rethinking Computing Architecture….. • Because the Data is at the heart of Computing Architecture Today • Compute is rapidly becoming a Commodity • Intel i386 = 276K Transistors = $0.01 retail! • Power Efficiency and Cost are Today’s Primary Drivers • Distribute the Compute efficiently : It’s not the Center of Attention anymore • Compute therefore needs 1 simple interface for easy re-use everywhere
  • 4. So, Let’s Redefine Computing Architecture Back to First Principles with a Primary Focus on Power & Latency • In Computing • Latency ≈ Time taken to Move Data • More Clock Cycles = More Latency = More Power • More Distance = More Latency = More Power • Hence Power can be seen as a proxy for Latency & Vice Versa • A Focus on Power & Latency will beget Performance • Heterogeneous Processors effectively do this for Specific algorithm types • For HPC, we now need to Focus our energy at the System Architecture Level
  • 5. Cache Memory’s Bipolar relationship with Power Its Implementation Needs Rethinking • Cache is Very Good for Power Efficiency • Minimizes data movement on repetitively used data • Cache is very Bad for Power Efficiency • Unnecessary Cache thrashing for data that’s touched once or not at all • Many layers of Cache = multiple copies burning more power • Conclusion • Hardware Caching must be efficiently managed at the application level
  • 6. Today’s Processor interface Choices From a Latency / Power Perspective • Processor Designers must allocate % beachfront for each interface • Choices are increasing at the moment making the decision harder! [Diagram labels: CPU / Accel; CPU / Accel or Memory Pool] • DDR / HBM / OMI : Memory Access ~50ns*† • SMP / OpenCAPI / Memory Inception : Memory Access 150ns to 300ns*† • CXL / PCIe / GenZ / CCIX : Interface only <500ns* (Switchless) • Ethernet / InfiniBand : Network Interface only >500ns* *Latencies are round trip approximations only : † Includes DDR Memory Read access time
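The latency classes on this slide can be read as a simple lookup. The sketch below is only an illustration: the figures are the slide's round-trip approximations, while the helper name `classes_within` and the choice to treat each class by its worst-case figure are assumptions made here, not part of the presentation.

```python
import math

# Round-trip latency classes as listed on the slide (approximations only).
# Each entry: (interfaces, worst-case latency in ns, what the figure covers).
# math.inf stands in for the open-ended ">500ns" network class.
LATENCY_CLASSES = [
    ("DDR / HBM / OMI", 50, "memory access, includes DDR read time"),
    ("SMP / OpenCAPI / Memory Inception", 300, "memory access, includes DDR read time"),
    ("CXL / PCIe / GenZ / CCIX", 500, "interface only, switchless"),
    ("Ethernet / InfiniBand", math.inf, "network interface only"),
]


def classes_within(budget_ns):
    """Return the interface classes whose worst-case figure fits the budget."""
    return [name for name, worst_ns, _ in LATENCY_CLASSES if worst_ns <= budget_ns]


if __name__ == "__main__":
    for budget in (100, 300, 1000):
        print(budget, "ns ->", classes_within(budget))
```

Note that the CXL / PCIe / GenZ / CCIX figure covers the interface only, so a real memory access over those links would add the memory read time on top.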
  • 7. Simplify to a Shared Memory Interface? Successfully Decouple Processors from Memory, Storage & IO • One Standardized, Low Latency, Low Power, Processor Interface • Graceful increase in latency and power beyond local memory • Processor Companies can focus on their core expertise and Support ALL Domain Specific Use Cases [Diagram labels: CPU / Accel; CPU / Accel or Memory Pool; ~50ns*† (repeated); 150ns - 300ns*†; <500ns*; >500ns*] *Latencies are round trip approximations only : † Includes Memory Read access time
  • 8. Overview • Why Decouple Compute from Memory, Storage & IO? • Top Down Systems Perspective and Introduction of OCP HPC Concepts • Introduction to the Open Memory Interface, OMI • Decoupling Compute with OCP OAM - HPC Module & OMI
  • 9. Disaggregated Racks to Hyper-converged Chiplets Classic server being torn in opposite directions! • Rack : Software Composable, Power Ignored, Rack Interconnect >20pJ/bit, Rack Volume >53K Cubic Inches, Poor Latency • Node : Baseline Physical Composability, Power Baseline, Node Interconnect 5 - 10pJ/bit, Node Volume >800 Cubic Inches, Baseline Latency • SIP : Expensive Physical composability, Power Optimized, Chiplet Interconnect <1pJ/bit, SIP Volume <1 Cubic Inch, Optimal Latency
  • 10. An OCP OAM & EDSFF Inspired solution? Bringing the benefits of Disaggregation and Chiplets together • Rack : Software Composable, Power Ignored, Rack Interconnect >20pJ/bit, Rack Volume >53K Cubic Inches, Poor Latency • Node : Baseline Physical Composability, Power Baseline, Node Interconnect 5 - 10pJ/bit, Node Volume >800 Cubic Inches, Baseline Latency • SIP : Expensive Physical composability, Power Optimized, Chiplet Interconnect <1pJ/bit, SIP Volume <1 Cubic Inch, Optimal Latency • OCP OAM - HPC Module Populated with E3.S, NIC 3.0, & Cable IO : Software & Physical Composability, Power Optimized, Flexible Chiplet Interconnect 1 - 2pJ/bit, Module Volume <150 Cubic Inches, Optimal Latency
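The pJ/bit figures above translate directly into link power once a bandwidth is fixed, since power is energy per bit times bits per second. A minimal sketch follows, assuming a 64 GBytes/s link (chosen here only because it matches the OMI channel bandwidth quoted elsewhere in this deck); the function name is hypothetical.

```python
def link_power_watts(energy_pj_per_bit, bandwidth_gbytes_per_s):
    """Power needed to move data at the given rate and energy cost per bit."""
    bits_per_second = bandwidth_gbytes_per_s * 1e9 * 8
    return energy_pj_per_bit * 1e-12 * bits_per_second


# Assumption: 64 GBytes/s, used purely as an illustrative link bandwidth.
BANDWIDTH_GBYTES_PER_S = 64

# Bound values taken from the slide's interconnect classes.
INTERCONNECTS = [
    ("Rack interconnect (>20 pJ/bit)", 20),
    ("Node interconnect, upper bound (10 pJ/bit)", 10),
    ("Node interconnect, lower bound (5 pJ/bit)", 5),
    ("OAM - HPC flexible chiplet interconnect (2 pJ/bit)", 2),
    ("Chiplet interconnect (<1 pJ/bit)", 1),
]

if __name__ == "__main__":
    for label, pj_per_bit in INTERCONNECTS:
        watts = link_power_watts(pj_per_bit, BANDWIDTH_GBYTES_PER_S)
        print(f"{label}: {watts:.2f} W")
```

At 5 pJ/bit this gives about 2.56 W for 64 GBytes/s, which is in line with the "2.5W per 64GBytes/s" figure quoted for the OMI channel later in the deck.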
  • 11. Fully Composable Compute Node Module Leveraged from OCP’s OAM Module - nicknamed OAM - HPC • Modular, Flexible and Composable Module - Protocol Agnostic! • Memory, Storage & IO interchangeable depending on Application Need • Processor must use HBM or have Serially Attached Memory • OCP OAM - HPC Module Top & Bottom View • OAM - HPC Module Common Bottom View for all types of Processor Implementations : 16x EDSFF TA-1002 4C/4C+ Connectors + 8x Nearstack x8 Connectors, Total of 320x Transceivers • OAM - HPC Standard could Support Today’s Processors e.g. NVIDIA Ampere, Google TPU, IBM POWER10, Xilinx FPGAs, Intel FPGAs, Graphcore IPU • Example OAM - HPC Module Bottom View Populated with 8x E3.S Modules, 2x OCP NIC 3.0 Modules, 4x TA-1002 4C Cables & 8x Nearstack x8 Cables
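The 320-transceiver total can be reproduced from the connector counts. The sketch below assumes each 4C/4C+ connector carries 16 lanes and each Nearstack x8 connector carries 8; the slide states only the connector counts and the total, so the per-connector lane counts are assumptions that happen to match it.

```python
# Assumptions (not stated on the slide): lanes carried per connector type.
LANES_PER_4C_CONNECTOR = 16
LANES_PER_NEARSTACK_X8 = 8


def total_transceivers(edsff_4c_connectors=16, nearstack_x8_connectors=8):
    """Total transceiver lanes on the module's bottom side."""
    return (edsff_4c_connectors * LANES_PER_4C_CONNECTOR
            + nearstack_x8_connectors * LANES_PER_NEARSTACK_X8)


if __name__ == "__main__":
    print(total_transceivers())
```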
  • 12. OMI in E3.S OMI Memory IO is finally going Serial! • Bringing Memory into the composable world of Storage and IO with E3.S DDR DIMM OMI in DDIMM Format CXL.mem in E3.S Introduced in August 2019 Introduced in May 2021 Proposed in 2020 GenZ in E3.S Introduced in 2020 Dual OMI x8 DDR4/5 Channel CXL x16 DDR5 Channel GenZ x16 DDR4 Channel
  • 13. IBM POWER10 OCP - HPC Modules Example OCP - HPC Module Block Schematic • 288x of 320x Transceiver Lanes in Total : 32x PCIe Lanes, 128x OMI Lanes, 128x SMP / OpenCAPI Lanes • IBM POWER10 Single Chiplet Package • Legend : 8 Lane OMI Channel, SMP / OpenCAPI Channel, PCIe-G5 Channel • Connectors : EDSFF TA-1002 4C / 4C+ Connector, Nearstack PCIe x8 Connector • Attached modules and cables : E3.S Up to 512GByte Dual OMI Channel DDR5 Module, E3.S x4 NVMe SSD, NIC 3.0 x16 Cabled / PCIe x8 IO Cabled SMP / OpenCAPI
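The lane budget on this slide can be checked with a few lines of arithmetic. The sketch below uses the slide's lane counts; the 64 GBytes/s per-channel figure is taken from the deck's OMI bandwidth slide, and the variable names are illustrative only.

```python
TOTAL_LANES = 320              # transceiver lanes available on the module
LANES_PER_OMI_CHANNEL = 8      # "8 Lane OMI Channel" from the slide legend
OMI_CHANNEL_GBYTES_PER_S = 64  # per-channel peak (Tx + Rx) quoted elsewhere in the deck

# Lane allocation for the POWER10 example.
lane_use = {"PCIe": 32, "OMI": 128, "SMP / OpenCAPI": 128}

lanes_used = sum(lane_use.values())
lanes_spare = TOTAL_LANES - lanes_used
omi_channels = lane_use["OMI"] // LANES_PER_OMI_CHANNEL
omi_peak_gbytes_per_s = omi_channels * OMI_CHANNEL_GBYTES_PER_S

if __name__ == "__main__":
    print(f"{lanes_used} of {TOTAL_LANES} lanes used, {lanes_spare} spare")
    print(f"{omi_channels} OMI channels, {omi_peak_gbytes_per_s} GBytes/s peak")
```

The 16 OMI channels this yields agree with the "16x Channels" PHY area figure given for the POWER10 die.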
  • 14. Dense Modularity = Power Saving Opportunity A Potential Flexible Chiplet Level Interconnect • Distance from Processor Die Bump to E3.S ASIC <5 Inches (128mm) - Worst Case Manhattan Distance • Opportunity to reduce PHY Channel to 5 - 10dB, 1 - 2pJ/bit - Similar to XSR • Opportunity to use the OAM - HPC & E3.S Modules as Processor & ASIC Package Substrates • Better Power Integrity and Signal Integrity 24mm 67mm 26mm x 26mm 676mm2 19mm 18mm
  • 15. Overview • Why Decouple Compute from Memory, Storage & IO? • Top Down Systems Perspective and Introduction of OCP HPC Concepts • Introduction to the Open Memory Interface, OMI • Decoupling Compute with OCP OAM - HPC Module & OMI
  • 16. Introduction to OMI - Open Memory Interface? OMI = Bandwidth of HBM at DDR Latency, Capacity & Cost • DDR4/5 • Low Bandwidth per Die Area/Beachfront • Not Physically Composable • HBM • Inflexible & Expensive • Capacity Limited • CXL.Mem, OpenCAPI, CCIX • Higher Latency, Far Memory • GenZ • Data Center Level Far Memory [Plot: DRAM Capacity, TBytes (log scale) against Memory Bandwidth, TBytes/s (log scale), with regions for DDR4, DDR5, OMI and HBM2E] Comparison to OMI - In Production since 2019
  • 17. Memory Interface Comparison OMI, the ideal Processor Shared Memory Interface!
      Specification                | LRDIMM DDR4     | DDR5            | HBM2E (8-High)  | OMI
      Protocol                     | Parallel        | Parallel        | Parallel        | Serial
      Signalling                   | Single-Ended    | Single-Ended    | Single-Ended    | Differential
      I/O Type                     | Duplex          | Duplex          | Simplex         | Simplex
      Lanes / Channel (Read/Write) | 64              | 32              | 512R/512W       | 8R/8W
      Lane Speed                   | 3,200MT/s       | 6,400MT/s       | 3,200MT/s       | 32,000MT/s
      Channel Bandwidth (R+W)      | 25.6GBytes/s    | 25.6GBytes/s    | 400GBytes/s     | 64GBytes/s
      Latency                      | 41.5ns          | ?               | 60.4ns          | 45.5ns
      Driver Area / Channel        | 7.8mm2          | 3.9mm2          | 11.4mm2         | 2.2mm2
      Bandwidth/mm2                | 3.3GBytes/s/mm2 | 6.6GBytes/s/mm2 | 35GBytes/s/mm2  | 33.9GBytes/s/mm2
      Max Capacity / Channel       | 64GB            | 256GB           | 16GB            | 256GB
      Connection                   | Multi Drop      | Multi Drop      | Point-to-Point  | Point-to-Point
      Data Resilience              | Parity          | Parity          | Parity          | CRC
      Similar Bandwidth/mm2 provides an opportunity for an HBM Memory with an OMI Interface on its logic layer. Brings Flexibility and Capacity options to Processors with HBM Interfaces!
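The Bandwidth/mm2 row follows from dividing channel bandwidth by driver area. The sketch below recomputes it from the table's own figures; the dictionary layout and names are illustrative. For DDR4, DDR5 and HBM2E the results round to the quoted values. For OMI, 64 / 2.2 gives about 29.1, whereas the quoted 33.9 appears to correspond to the 1.89mm2 per-channel PHY area given on the POWER10 die slide.

```python
# (channel bandwidth in GBytes/s, driver area per channel in mm2), from the table.
INTERFACES = {
    "LRDIMM DDR4": (25.6, 7.8),
    "DDR5": (25.6, 3.9),
    "HBM2E (8-High)": (400.0, 11.4),
    "OMI": (64.0, 2.2),
}


def bandwidth_density(bandwidth_gbytes_per_s, area_mm2):
    """Channel bandwidth per unit of driver area, in GBytes/s/mm2."""
    return bandwidth_gbytes_per_s / area_mm2


density = {name: bandwidth_density(bw, area) for name, (bw, area) in INTERFACES.items()}

if __name__ == "__main__":
    for name, value in density.items():
        print(f"{name}: {value:.1f} GBytes/s/mm2")
    # OMI recomputed with the 1.89 mm2 per-channel PHY area from the die slide.
    print(f"OMI @ 1.89 mm2: {bandwidth_density(64.0, 1.89):.1f} GBytes/s/mm2")
```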
  • 18. OMI Today on IBM’s POWER10 Die • POWER10 : 18B Transistors on Samsung 7nm - 602 mm2, ~24.26mm x ~24.82mm (Die photo courtesy of Samsung Foundry, Scale 1mm : 20pts) • OMI Memory PHY Area : 2 Channels 1.441mm x 2.626mm = 3.78mm2, or 1.441mm x 1.313mm / Channel = 1.89mm2 / Channel, or 30.27mm2 for 16x Channels • Peak Bandwidth per Channel = 32Gbits/s * 8 * 2(Tx + Rx) = 64 GBytes/s • Peak Bandwidth per Area = 64 GBytes/s / 1.89mm2 = 33.9 GBytes/s/mm2 • Maximum DRAM Capacity per OMI DDIMM = 256GB • 32Gb/s x8 OMI Channel : 30dB @ <5pJ/bit, 2.5W per 64GBytes/s Tx + Rx OMI Channel At each end • Serdes Phy Latency : <2ns (without wire) • Mesochronous clocking • OMI Phy • OMI Buffer Chip • DDR5 @ 4000 MTPS • 16Gbit Monolithic Memory JEDEC configurations : 32GByte 1U OMI DDIMM, 64GByte 2U OMI DDIMM, 256GByte 4U OMI DDIMM • Same TA-1002 EDSFF Connector • 2019’s 25.6Gbit/s DDR4 OMI DDIMM • Locked ratio to the DDR Speed : 21.33Gb/s x8 - DDR4-2667, 25.6Gb/s x8 - DDR4/5-3200, 32Gb/s x8 - DDR5-4000, 38.4Gb/s - DDR5-4800, 42.66Gb/s - DDR5-5333, 51.2Gb/s - DDR5-6400 • E3.S, Other Potential Emerging EDSFF Media Formats : Up to 512GByte Dual OMI Channel
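The channel arithmetic on this slide can be sketched as follows. The listed lane rates are all consistent with a lane rate of 8x the DDR transfer rate, so the sketch assumes that ratio; the slide itself only says the ratio is locked. Function names are hypothetical, and the 1.89mm2 area is the slide's rounded per-channel figure.

```python
def omi_lane_rate_gbps(ddr_mtps, ratio=8):
    """OMI lane rate in Gbit/s for a DDR speed in MT/s (ratio of 8 is assumed)."""
    return ddr_mtps * ratio / 1000


def channel_bandwidth_gbytes(lane_gbps, lanes=8, directions=2):
    """Peak channel bandwidth in GBytes/s, counting Tx and Rx together."""
    return lane_gbps * lanes * directions / 8  # 8 bits per byte


PHY_AREA_MM2_PER_CHANNEL = 1.89  # rounded per-channel PHY area from the slide

if __name__ == "__main__":
    for ddr in (2667, 3200, 4000, 4800, 5333, 6400):
        lane = omi_lane_rate_gbps(ddr)
        bw = channel_bandwidth_gbytes(lane)
        print(f"DDR-{ddr}: {lane:.2f} Gb/s per lane, {bw:.1f} GBytes/s per x8 channel")
    peak = channel_bandwidth_gbytes(omi_lane_rate_gbps(4000))
    print(f"Peak per area: {peak / PHY_AREA_MM2_PER_CHANNEL:.1f} GBytes/s/mm2")
```

With DDR5-4000 this reproduces the 32Gb/s lane rate, the 64 GBytes/s channel peak and the 33.9 GBytes/s/mm2 density quoted above.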
  • 19. OMI Bandwidth vs SPFLOPs OMI Helping to Address Memory Bound Applications • Tailoring OPS : Bytes/s : Bytes Capacity to Application Needs • Maximum Reticle Size Die @ 7nm : 826mm2, ~32.18mm x ~25.66mm, Theoretical Maximum of 80 OMI Channels, OMI Bandwidth = 5.1 TBytes/s, NVIDIA Ampere Max Reticle Size Die ~30 SP TFLOPS • 117mm2 Die : 10.8mm x 10.8mm, 28 OMI Channels = 1.8TByte/s, 2 SP TFLOPS • Die Size shrink = 7x, OMI Bandwidth reduction = 2.8x, SPFLOPS reduction = 15x • To Scale 10pts : 1mm
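The ratios on this slide can be reproduced from the two die configurations. The sketch below assumes 64 GBytes/s per OMI channel (the deck's peak figure) and reads both compute figures as single-precision TFLOPS; the bytes-per-FLOP comparison at the end is an added illustration of the slide's point, not a figure from the deck.

```python
OMI_CHANNEL_GBYTES_PER_S = 64  # assumed per-channel peak, from the deck's OMI slides

# Figures from the slide; sp_tflops values are read as SP TFLOPS (an assumption).
large_die = {"area_mm2": 826, "omi_channels": 80, "sp_tflops": 30}
small_die = {"area_mm2": 117, "omi_channels": 28, "sp_tflops": 2}


def bandwidth_tbytes(die):
    """Aggregate OMI bandwidth in TBytes/s."""
    return die["omi_channels"] * OMI_CHANNEL_GBYTES_PER_S / 1000


def bytes_per_flop(die):
    """Memory bandwidth available per single-precision FLOP."""
    return bandwidth_tbytes(die) / die["sp_tflops"]


if __name__ == "__main__":
    print(f"Large die: {bandwidth_tbytes(large_die):.2f} TBytes/s")
    print(f"Small die: {bandwidth_tbytes(small_die):.2f} TBytes/s")
    print(f"Die size shrink: {large_die['area_mm2'] / small_die['area_mm2']:.2f}x")
    print(f"Bandwidth reduction: {large_die['omi_channels'] / small_die['omi_channels']:.2f}x")
    print(f"SP FLOPS reduction: {large_die['sp_tflops'] / small_die['sp_tflops']:.0f}x")
    print(f"Bytes/FLOP, large vs small: {bytes_per_flop(large_die):.2f} vs {bytes_per_flop(small_die):.2f}")
```

Under these assumptions the smaller die ends up with roughly five times more memory bandwidth per FLOP, which is the sense in which OMI helps memory-bound applications.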
  • 20. Overview • Why Decouple Compute from Memory, Storage & IO? • Top Down Systems Perspective and Introduction of OCP HPC Concepts • Introduction to the Open Memory Interface, OMI • Decoupling Compute with OCP OAM - HPC Module & OMI
  • 21. OAM - HPC Configuration Examples - Modular, Flexible & Composable 1, 2 or 3 Port - Shared Memory OMI Chiplet Buffers HBM 8x Channel OMI Enabled Logic Layer HBM 8 Channel OMI Enabled Logic Layer EDSFF 4C Connector Medium Reach OMI Interconnect OMI MR Extender Buffer <500ps round trip delay Nearstack Connector Fabric Interconnect e.g. Ethernet / InfiniBand Passive Fabric Cable E3.S Module 1 or 2 Port Shared Memory Controller Optional In Buffer Near Memory Processor XSR - NRZ PHY OMI DLX OMI TLX OMI 1 or 2 port Buffer Chiplet OAM-HPC Module Maximum Reticle Size Processor with 80 OMI XSR Channels EDSFF 4C Connector Fabric Interconnect e.g. OpenCAPI / CXL / GenZ / CCIX Protocol Specific Active Fabric Cable 2 Port OMI Buffer Chiplet with integrated Shared Memory A Buffer for each Fabric Standard XSR DLX TLX XSR - NRZ PHY OMI DLX OMI TLX XSR - NRZ PHY OMI DLX OMI TLX Optional Near Memory Processor Chiplet XSR - NRZ PHY OMI DLX OMI TLX 3 Port Shared Memory Controller OMI 3 port Buffer Chiplet OCP-NIC-3.0 Module Fabric Interconnect e.g. OpenCAPI / CXL / GenZ / Ethernet / InfiniBand
  • 23. 8x Cold Plate Mounted OCP OAM - HPC Modules Pluggable into OCP OAI Chassis • Summary • Redefine Computing Architecture • With a Focus on Power and Latency • Shared Memory Centric Architecture • Leverage Open Memory Interface, OMI • Dense OCP Modular Platform Approach
  • 24. Interested? - How to Get Involved From Silicon Startups to Open Hardware Enthusiasts alike • Step 1 - Join the OpenCAPI consortium & OCP HPC Sub-Project • Step 2 - Replace DDR with Standard OMI interfaces on next processor design • Step 3 - Add low power OMI PHYs to spare beachfront on your Processors • Step 4 - Build OAM - HPC Modules around your large Processor Devices • Step 5 - Help community build Specific 2 & 3 port OMI chiplet buffers • Step 6 - Help community build OMI Buffer enabled E3.S & NIC 3.0 modules etc
  • 25. Questions? Contact me at a.cantle@nallasway.com Join OpenCAPI Consortium at https://opencapi.org Join OCP HPC Sub-Project Workgroup at https://www.opencompute.org/wiki/HPC