Harnessing light to power new possibilities
Advantages of Optical CXL
for Disaggregated Compute Architectures
Ron Swartzentruber
Director of Engineering
2
Agenda
• Memory-centric shift in the data center
• AI Large Language Model growth
• Need for optical CXL technology
• Case study: OPT inference benefits using optical CXL
© Lightelligence, Inc.
3
Disaggregation is the Future for Datacenter
[Figure: on the left, Virtual Machines 0 to 3 are hosted on Physical Machines 0 and 1, each of which is left with a stranded resource; on the right, Virtual Machines 0 to 4 draw on CPU cores, DRAM, and accelerators spread across Physical Machines 0, 1, 2, and so on. Callouts: Flexible, Manageable, Economical, Open.]
4
AI trends
• AI and Large Language Models will continue to grow and consume more compute
• Disaggregated memory architectures are required to continue scaling
• Optical interconnects are required to extend reach
Source: https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8
Source: https://hc34.hotchips.org/assets/program/tutorials/CXL/Hot%20Chips%202022%20CXL%20MemoryChallenges.pdf
5
Optical Interconnect Latency
[Figure labels: 100s of ns; 100s of μs]
6
CXL is the Predominant Standard for Disaggregation

                  Cache coherence   Latency   Memory decoupling
CXL               Yes               ~100 ns   Supported
RDMA (Ethernet)   No                ~3 μs     Not supported

[Figure: a CXL 2.0 switch with a standardized fabric manager connects hosts H1 to H# with devices D1 to D#; the links are labeled CXL 2.0, except for one labeled CXL 1.0]
7
Optical CXL is Required for Scaling

[Figure: attenuation (dB) vs. propagation distance (m), assuming AWG26 wire and a PCIe 5.0 signal]
               1 m     10 m
Copper (dB)    -4      -40
Optics (dB)     0      -0.003

[Figure: cabling needed to support 4x PCIe 5.0 x16]
Copper: 32 cables, each > 6 mm in diameter (CAT8); bundle > 30 mm
Optics: 16 fibers, each 0.125 mm in diameter; bundle < 1 mm
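To make the attenuation figures on this slide concrete, the short sketch below converts the dB values into the fraction of launched signal power that survives. It assumes the chart reads as copper at about -4 dB (1 m) and -40 dB (10 m) and optics at about 0 dB (1 m) and -0.003 dB (10 m); the function and variable names are illustrative and not part of the original deck.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a gain or loss in decibels to a linear power ratio."""
    return 10 ** (db / 10)


# Attenuation values as read from the slide (assumed mapping of labels to curves).
attenuation_db = {
    "copper": {1: -4.0, 10: -40.0},
    "optics": {1: 0.0, 10: -0.003},
}

for medium, points in attenuation_db.items():
    for distance_m, db in points.items():
        ratio = db_to_power_ratio(db)
        print(f"{medium:6s} {distance_m:2d} m: {db:8.3f} dB -> {ratio:.4%} of launched power")
```

Under these readings, copper retains about one ten-thousandth of the signal power at 10 m, while the optical link loses well under a tenth of a percent, which is the motivation for moving CXL onto fiber once links leave the rack.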
8
Optical CXL in the Datacenter
Break Through the Rack!
[Figure: compute connected to memory banks]
9
Case study: LLM Inference
• 2U Supermicro server
• 2x AMD Genoa CXL 1.1 CPUs
• MemVerge Memory Tiering and Pooling Software
• 2x Micron 256GB Memory Expanders, each with a CXL/PCIe Gen5 x8 link
• Nvidia GPU running LLM inference
• All VMs have access to CXL memory
• Secure application, encrypted data
[Figure: a server with 2x CXL 1.1 CPUs and a Photowave card, and a memory expansion module with a Photowave card and two CXL memory expanders]
10
LLM Model List

Model      Weight memory   KV-cache per sample   Activation per sample   Context length
           (float16)       (float16)             (float16)
OPT-1.3B   2.4 GB          0.095 GB              0.002 GB                512
OPT-13B    23.921 GB       0.397 GB              0.005 GB                512
OPT-30B    55.803 GB       0.667 GB              0.007 GB                512
OPT-66B    122.375 GB      1.143 GB              0.009 GB                512
OPT-175B   325 GB          2.285 GB              0.012 GB                512

KV-cache size: data_type * dimension * num_layers * batch_size * context_len * 2
e.g., for OPT-1.3B in FP16: 2 bytes * 2048 * 24 * 1 * 512 * 2 = 100,663,296 bytes
Activation size: data_type * dimension * batch_size * context_len
Entire OPT-66B model fits within one 128GB CXL memory expander
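The two sizing formulas on this slide can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, the OPT-1.3B dimensions (hidden size 2048, 24 layers) are the ones used in the slide's worked example, and the trailing factor of 2 is taken to cover the key and value tensors.

```python
GIB = 2 ** 30  # bytes per GiB, used only for display


def kv_cache_bytes(dtype_bytes, hidden_dim, num_layers, batch_size, context_len):
    """KV-cache size per the slide's formula; the final factor of 2 covers keys and values."""
    return dtype_bytes * hidden_dim * num_layers * batch_size * context_len * 2


def activation_bytes(dtype_bytes, hidden_dim, batch_size, context_len):
    """Activation size per the slide's formula."""
    return dtype_bytes * hidden_dim * batch_size * context_len


# OPT-1.3B in FP16 (2 bytes per value), one sample, context length 512.
kv = kv_cache_bytes(2, 2048, 24, 1, 512)
act = activation_bytes(2, 2048, 1, 512)

print(f"KV cache:   {kv:,} bytes ({kv / GIB:.3f} GiB)")
print(f"Activation: {act:,} bytes ({act / GIB:.3f} GiB)")
```

The KV-cache result reproduces the 100,663,296 bytes in the slide's example, and both values land close to the OPT-1.3B row of the table.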

CXL: 882MB/s, System Memory: 857MB/s, Disk: 582MB/s, MemVerge: 493MB/s

CXL: 2365MB/s, System Memory: 2609MB/s, Disk: 1887MB/s, MemVerge: 2173MB/s
11
Results
OPT-66B model results

                               Disk (NVMe)   CXL Memory   System Memory   MemVerge 60:40 Policy
Decode throughput (tokens/s)   1.984         4.868        6.216           6.237
Decode latency (s)             338.7         138.2        108.1           107.7

CXL memory vs. disk (NVMe): ~2.4x
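The ~2.4x callout follows directly from the OPT-66B decode numbers on this slide. The sketch below recomputes the ratios; the dictionary layout and the `speedup` helper are illustrative, and only the measured values come from the slide.

```python
# Decode results for OPT-66B as reported on the slide.
results = {
    "Disk (NVMe)": {"tokens_per_s": 1.984, "latency_s": 338.7},
    "CXL Memory": {"tokens_per_s": 4.868, "latency_s": 138.2},
    "System Memory": {"tokens_per_s": 6.216, "latency_s": 108.1},
    "MemVerge 60:40 Policy": {"tokens_per_s": 6.237, "latency_s": 107.7},
}


def speedup(config: str, baseline: str, metric: str = "tokens_per_s") -> float:
    """Ratio of a metric between two offloading configurations."""
    return results[config][metric] / results[baseline][metric]


print(f"CXL vs. NVMe throughput:            {speedup('CXL Memory', 'Disk (NVMe)'):.2f}x")
print(f"CXL vs. system memory throughput:   {speedup('CXL Memory', 'System Memory'):.2f}x")
print(f"MemVerge 60:40 vs. system memory:   {speedup('MemVerge 60:40 Policy', 'System Memory'):.3f}x")
```

Pure CXL offloading reaches roughly 78% of system-memory throughput, while the MemVerge 60:40 tiering policy is on par with system memory.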
12
PHOTOWAVE™ OPTICAL CXL MEMORY EXPANDER
[Figure: demo dashboard]
Run mode: CXL (options shown: CXL, Disk)
Inference engine: FlexGen
Hardware: AMD Genoa CPU; Samsung CXL 128GB; Nvidia GPU: 1x A10 24GB
Parameters: weights 122.375GB; KV cache 109.688GB
OPT-66B model progress: 99%
Gauges: GPU utilization, GPU memory utilization, CPU utilization, memory utilization, CXL memory utilization, decode throughput (tokens/s)
Gauge readings as extracted: 95%, 77%, 51%, 27%, 77%
13
Summary of Results
CXL memory offloading is efficient and beneficial
• LLM inference case study
• Allows use of lower cost memory
Similar performance compared to pure system memory
1.9x TCO improvement with inexpensive GPUs at similar throughput
2.4x performance advantage compared to SSD/NVMe disk offloading
14
Photowave™ Form Factors
Product suite: Low Profile PCIe Card, OCP 3.0 SFF Card, Active Optical Cables
Features:
• CXL 2.0/PCIe Gen5 x16
• Jitter reduction, SI cleanup
• Sideband signals over optics
• x8, x4 or x2 bifurcation
• End-to-end latency:
  • Card: under 20 ns + TOF (time of flight)
  • AOC: 1 ns + TOF
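The "+ TOF" term in the latency figures is the time light spends in the fiber, which grows with cable length. The sketch below estimates it under one stated assumption that is not in the deck: a group index of about 1.468, typical of standard single-mode silica fiber. The 20 ns and 1 ns fixed terms come from the slide (20 ns being the stated upper bound for the card); the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_GROUP_INDEX = 1.468  # assumption: typical single-mode silica fiber, not from the slide


def time_of_flight_ns(length_m: float, group_index: float = FIBER_GROUP_INDEX) -> float:
    """Propagation delay through a fiber of the given length, in nanoseconds."""
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


def end_to_end_latency_ns(fixed_ns: float, length_m: float) -> float:
    """Fixed device latency plus time of flight."""
    return fixed_ns + time_of_flight_ns(length_m)


for length_m in (1, 10, 50):
    card = end_to_end_latency_ns(20.0, length_m)  # card: under 20 ns + TOF
    aoc = end_to_end_latency_ns(1.0, length_m)    # AOC: 1 ns + TOF
    print(f"{length_m:3d} m: card <= {card:6.1f} ns, AOC ~ {aoc:6.1f} ns")
```

With this index the fiber adds roughly 5 ns per metre, so at rack-to-rack distances the time of flight, rather than the Photowave hardware, dominates the added latency.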
15
Endnotes
Hardware configuration
Super Micro Server
• AMD EPYC 9124 16-Core CPU
• Samsung DDR5 4800 MT/s
• MEM0 size: 256GB
• MEM1 size: 256GB
• Bandwidth: 307GB/s
Nvidia GPU
• Gen4 x16, DMEM size: 24GB
• Bandwidth: 32GB/s
Samsung NVMe
• Gen4 x4, MEM size: 1.92TB
• Bandwidth: 8GB/s
Micron CXL Memory
• Gen5 x8, MEM size: 256GB
• Bandwidth: 32GB/s
Algorithm & Software
• LLM: OPT-66B
• Batch size = 24
• Context length = 512
• Output length = 8
• FlexGen
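The bandwidth figures in these endnotes line up with raw link rates, which the sketch below reproduces. Two points are assumptions rather than statements from the deck: the PCIe numbers are per direction and ignore encoding and protocol overhead, and the DDR5 figure is matched by assuming 8 populated memory channels of 8 bytes each. Function names are illustrative.

```python
# Per-lane transfer rate in GT/s for each PCIe generation used in the setup.
PCIE_GT_PER_S = {4: 16, 5: 32}


def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Raw per-direction PCIe bandwidth in GB/s (ignores encoding and protocol overhead)."""
    return PCIE_GT_PER_S[gen] * lanes / 8  # 8 bits per byte


def ddr_bandwidth_gb_s(mt_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s for a given transfer rate and channel count."""
    return mt_per_s * bytes_per_transfer * channels / 1000


print("Nvidia GPU, Gen4 x16:       ", pcie_bandwidth_gb_s(4, 16), "GB/s")
print("Samsung NVMe, Gen4 x4:      ", pcie_bandwidth_gb_s(4, 4), "GB/s")
print("Micron CXL, Gen5 x8:        ", pcie_bandwidth_gb_s(5, 8), "GB/s")
print("DDR5-4800, 8 ch (assumed):  ", ddr_bandwidth_gb_s(4800, 8), "GB/s")
```

On these raw numbers the CXL expander link matches the GPU's own PCIe link and offers four times the bandwidth of the NVMe drive.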
Q1 Memory Fabric Forum: Advantages of Optical CXL for Disaggregated Compute Architectures

  • 1. Harnessing light to power new possibilities. Advantages of Optical CXL for Disaggregated Compute Architectures. Ron Swartzentruber, Director of Engineering.
  • 2. Agenda
      • Memory-centric shift in the data center
      • AI Large Language Model growth
      • Need for optical CXL technology
      • Case study: OPT inference benefits using optical CXL
  • 3. Disaggregation is the Future for Datacenter
      [Diagram: Physical Machines 0 and 1 hosting Virtual Machines 0 to 3, each machine with a stranded resource, contrasted with disaggregated pools of CPU cores, DRAM and accelerators across Physical Machines 0 to 2 serving Virtual Machines 0 to 4. Keywords: flexible, manageable, economical, open.]
  • 4. AI trends
      • AI and Large Language Models will continue to grow and consume more compute
      • Disaggregated memory architectures are required in order to continue to scale
      • Optical interconnects are required to extend reach
      Source: https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8
      Source: https://hc34.hotchips.org/assets/program/tutorials/CXL/Hot%20Chips%202022%20CXL%20MemoryChallenges.pdf
  • 5. Optical Interconnect Latency
      [Figure: latency scale with labels "100s of ns" and "100s of μs".]
  • 6. CXL is the Predominant Standard for Disaggregation
                        Cache coherence   Latency    Memory decouple
      CXL               Yes               ~100 ns    Supported
      RDMA (Ethernet)   No                ~3 μs      Not supported
      [Diagram: a CXL 2.0 switch with a standardized fabric manager connects hosts H1 to H# (labelled CXL 2.0) to devices D1 to D# (D1 labelled CXL 1.0, the others CXL 2.0).]
  • 7. Optical CXL is Required for Scaling
      [Chart: attenuation (dB) versus propagation distance (m), assuming AWG26 wire and a PCIe 5.0 signal.]
                  1 m       10 m
      Copper      -4 dB     -40 dB
      Optics      0 dB      -0.003 dB
      Cabling needed to support 4x PCIe 5.0 x16:
      • Copper: 32 cables with diameter > 6 mm (CAT8), bundle > 30 mm
      • Optics: 16 fibers with diameter of 0.125 mm, bundle < 1 mm
  • 8. Optical CXL in the Datacenter
      Break Through the Rack!
      [Diagram: compute connected to memory banks.]
  • 9. Case study: LLM Inference
      • 2U Supermicro server
      • 2x AMD Genoa CXL 1.1 CPUs
      • MemVerge Memory Tiering and Pooling Software
      • 2x Micron 256GB Memory Expanders, each with a CXL/PCIe Gen5 x8 link
      • Nvidia GPU running LLM inference
      • All VMs have access to CXL memory
      • Secure application, encrypted data
      [Diagram: a server with 2x CXL 1.1 CPUs and a Photowave card, linked to a memory expansion module containing a Photowave card and two CXL memory expanders.]
  • 10. LLM Model List
      Model      Weight memory   KV cache per sample   Activation per sample   Context length
                 (float16)       (float16)             (float16)
      OPT-1.3B   2.4 GB          0.095 GB              0.002 GB                512
      OPT-13B    23.921 GB       0.397 GB              0.005 GB                512
      OPT-30B    55.803 GB       0.667 GB              0.007 GB                512
      OPT-66B    122.375 GB      1.143 GB              0.009 GB                512
      OPT-175B   325 GB          2.285 GB              0.012 GB                512
      KV-cache size = data_type * dimension * num_layers * batch_size * context_len * 2
      Example for OPT-1.3B in FP16: 2 bytes * 2048 * 24 * 1 * 512 * 2 = 100,663,296 bytes
      Activation size = data_type * dimension * batch_size * context_len
      The entire OPT-66B model fits within one 128GB CXL memory expander.
  • 11. Results: OPT-66B model
                                     Disk (NVMe)   CXL Memory   System Memory   MemVerge 60:40 Policy
      Decode throughput (tokens/s)   1.984         4.868        6.216           6.237
      Decode latency (s)             338.7         138.2        108.1           107.7
      Chart annotation: ~2.4x
      Chart annotations (series not labelled in the extracted text):
      • CXL: 882MB/s, System Memory: 857MB/s, Disk: 582MB/s, MemVerge: 493MB/s
      • CXL: 2365MB/s, System Memory: 2609MB/s, Disk: 1887MB/s, MemVerge: 2173MB/s
  • 12. PHOTOWAVE™ OPTICAL CXL MEMORY EXPANDER
  • 13. PHOTOWAVE™ OPTICAL CXL MEMORY EXPANDER
      [Demo dashboard]
      • Hardware: AMD Genoa CPU, Samsung CXL 128GB, NVIDIA GPU: 1x A10 24GB
      • Parameters: OPT-66B model, inference engine: FlexGen, weights: 122.375GB, KV cache: 109.688GB, run mode: CXL (selectable: CXL, NVMe disk)
      • Progress: 99%
      • Gauges: GPU utilization, GPU memory utilization, CPU utilization, memory utilization, CXL memory utilization, decode throughput (tokens/s)
      • Readings shown: 95%, 77%, 51%, 27%, 77%
  • 14. Summary of Results
      CXL memory offloading is efficient and beneficial
      • LLM inference case study
      • Allows use of lower cost memory
      • Similar performance compared to pure system memory
      • 1.9x TCO improvement with inexpensive GPUs at similar throughput
      • 2.4x performance advantage compared to SSD/NVMe disk offloading
  • 15. Photowave™ Form Factors
      Product suite:
      • Low Profile PCIe Card
      • OCP 3.0 SFF Card
      • Active Optical Cables
      Features:
      • CXL 2.0/PCIe Gen5 x16
      • Jitter reduction, SI cleanup
      • Sideband signals over optics
      • x8, x4 or x2 bifurcation
      • End-to-end latency: Card: under 20 ns + TOF (time of flight); AOC: 1 ns + TOF
  • 16. Endnotes: Hardware configuration
      • Super Micro Server: AMD EPYC 9124 16-Core CPU; Samsung DDR5 4800 MT/s; MEM0 size: 256GB; MEM1 size: 256GB; bandwidth: 307GB/s
      • Nvidia GPU: Gen4 x16; DMEM size: 24GB; bandwidth: 32GB/s
      • Samsung NVMe: Gen4 x4; MEM size: 1.92TB; bandwidth: 8GB/s
      • Micron CXL Memory: Gen5 x8; MEM size: 256GB; bandwidth: 32GB/s
      • Workload: LLM: OPT-66B; batch size = 24; context length = 512; output length = 8; FlexGen Algorithm & Software
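
The KV-cache and activation formulas on slide 10 can be checked with a few lines of Python. This is a minimal sketch, not code from the presentation: the function and parameter names are made up for illustration, and the factor of 2 in the KV-cache formula is read as "one tensor for keys, one for values". Only the OPT-1.3B figures (dimension 2048, 24 layers, FP16) come from the slide.

```python
def kv_cache_bytes(dtype_bytes: int, hidden_dim: int, num_layers: int,
                   batch_size: int, context_len: int) -> int:
    """KV-cache size per the slide-10 formula; the trailing 2 covers keys and values."""
    return dtype_bytes * hidden_dim * num_layers * batch_size * context_len * 2


def activation_bytes(dtype_bytes: int, hidden_dim: int,
                     batch_size: int, context_len: int) -> int:
    """Activation size per the slide-10 formula (one hidden vector per token)."""
    return dtype_bytes * hidden_dim * batch_size * context_len


if __name__ == "__main__":
    # OPT-1.3B worked example from the slide: FP16 (2 bytes), dimension 2048,
    # 24 layers, batch size 1, context length 512.
    kv = kv_cache_bytes(2, 2048, 24, 1, 512)
    act = activation_bytes(2, 2048, 1, 512)
    print(f"KV cache:   {kv:,} bytes = {kv / 2**30:.5f} GiB")
    print(f"Activation: {act:,} bytes = {act / 2**30:.5f} GiB")
```

For the slide's example this gives 100,663,296 bytes of KV cache, matching the worked figure. The table lists 0.095 GB for the same model, slightly more than this byte count expressed in GiB; the slides do not say how the table values were derived, so treat the difference as unexplained.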
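
The link bandwidths in the endnotes (slide 16) are consistent with simple raw-rate arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the PCIe figures ignore 128b/130b encoding and protocol overhead, and the count of 8 DDR5 channels is an assumption chosen because it reproduces the quoted 307GB/s, not something the slides state.

```python
# Raw per-lane signalling rate in GT/s for the PCIe generations named in the endnotes.
PCIE_GT_PER_S = {4: 16.0, 5: 32.0}


def pcie_raw_gb_per_s(gen: int, lanes: int) -> float:
    """Raw one-direction PCIe bandwidth in GB/s (1 bit per transfer per lane,
    encoding and protocol overhead ignored)."""
    return PCIE_GT_PER_S[gen] * lanes / 8.0


def ddr_gb_per_s(mega_transfers_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak DDR bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mega_transfers_per_s * bus_bytes * channels / 1000.0


if __name__ == "__main__":
    print("GPU, Gen4 x16:       ", pcie_raw_gb_per_s(4, 16), "GB/s")
    print("NVMe, Gen4 x4:       ", pcie_raw_gb_per_s(4, 4), "GB/s")
    print("CXL memory, Gen5 x8: ", pcie_raw_gb_per_s(5, 8), "GB/s")
    # 8 channels is an assumption; see the lead-in above.
    print("DDR5-4800, 8 channels:", ddr_gb_per_s(4800, 8), "GB/s")
```

Under these assumptions the Gen5 x8 CXL link and the Gen4 x16 GPU link have the same raw bandwidth, which is why both appear as 32GB/s in the endnotes.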

Editor's notes

  1. Key message: CXL is the industry consensus for disaggregation
  2. MemVerge policy: System memory 60%, CXL Memory 40%
  3. What is CPU%
     Console output of `python main.py`, per run mode (mem, cxl, disk):
     Metric            mem            cxl            disk
     CPU%              29.525         31.431         81.46
     MEM%              27.2           27.2           11.722
     GPU%              99.49          97.05          53.01
     CXLMEM%           0.0016         77.1143        0.1664
     GPUMEM%           45.17          49.0           34.71
     GPUMEM_USED_MB    9213.4375      9213.4375      8979.4375
     PCI_TX_MBps       274.2578125    191.357421875  81.064453125
     PCI_RX_MBps       2007.3828125   1422.421875    1158.056640625
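
The decode results reported on slide 11 can be compared directly. The following is a small sketch using only the four throughput and latency figures from that slide; the dictionary keys and function names are made up for illustration.

```python
# Decode results for OPT-66B as reported on slide 11.
RESULTS = {
    "disk_nvme":      {"tokens_per_s": 1.984, "latency_s": 338.7},
    "cxl_memory":     {"tokens_per_s": 4.868, "latency_s": 138.2},
    "system_memory":  {"tokens_per_s": 6.216, "latency_s": 108.1},
    "memverge_60_40": {"tokens_per_s": 6.237, "latency_s": 107.7},
}


def throughput_ratio(config: str, baseline: str) -> float:
    """How many times more tokens per second `config` decodes than `baseline`."""
    return RESULTS[config]["tokens_per_s"] / RESULTS[baseline]["tokens_per_s"]


def latency_ratio(config: str, baseline: str) -> float:
    """How many times longer `baseline` takes to decode than `config`."""
    return RESULTS[baseline]["latency_s"] / RESULTS[config]["latency_s"]


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name:15s} throughput vs disk: {throughput_ratio(name, 'disk_nvme'):.2f}x, "
              f"vs system memory: {throughput_ratio(name, 'system_memory'):.2f}x")
```

With these numbers, CXL memory offloading comes out at roughly 2.45 times the NVMe throughput, in line with the "~2.4x" annotation on the slide.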