SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
Receive side scaling (RSS) with
eBPF in QEMU and virtio-net
Yan Vugenfirer - CEO, Daynix
Agenda
• What is RSS?


• History: RSS and virtio-net


• What is eBPF?


• Using eBPF for packet steering (RSS) in
virtio-net
What is RSS?
• Receive side scaling - distribution of packets’ processing
among CPUs


• A NIC uses a hashing function to compute a hash value
over a defined area


• Hash value is used to index an indirection table


• The values in the indirection table are used to assign the
received data to a CPU


• With MSI support, a NIC can also interrupt the associated
CPU
CPU1 CPU2 CPU3
CPU0
What is RSS?
NIC
RX	handling RX	handling RX	handling RX	handling
NIC	Driver
HW	
queue
HW	
queue
HW	
queue
HW	
queue
Packet	classification
What is RSS?
Packet	
header
Hash	
function	
(Toeplitz)
QueueCPU	
number
Redirection	table	(up	to	128	entries)
History: RSS and virtio-net
• Let’s use RSS with virtio-net!


• Utilisation CPUs for packet processing


• Cash locality for network applications


• Microsoft WHQL requirement for high speed
devices
History: RSS and virtio-net
• No multi-queue in virtio


• SW implementation in Windows guest driver (similar to RFS in Linux)
CPU1 CPU2 CPU3
CPU0
virtio-net
RX	
handling
RX	
handling
RX	
handling
NetKVM	
RX	virt-queue
Packet	classification
RX	handling
History: RSS and virtio-net
• virtio-net became multi queue device


• Due to Windows requirements - hybrid
model. Interrupt received on specific CPU
core, but could be rescheduled to another


• Works good for TCP


• Might not work for UDP


• Support legacy interrupts for old OSes
History: RSS and virtio-net
CPU1
CPU0
virtio-net
NetKVM	
RX	virt-queue
Packet	classification
RX	handling
RX	virt-queue
Packet	classification
RX	handling
History: RSS and virtio-net
• virtio spec changes


• Set steering mode


• Pass the device redirection tables


• Set hash value in virtio-net-hdr


• No inter-processor interrupts due to re-scheduling


• Vision: HW will do all the heavy work


• Implementations


• SW only POC in QEMU


• eBPF
virtio spec changes - capabilities
• VIRTIO_NET_F_RSS


• VIRTIO_NET_F_MQ must be set
virtio spec changes - device configuration
• virtio_net_config
struct virtio_net_con
fi
g
{

u8 mac[6]
;

le16 status
;

le16 max_virtqueue_pairs
;

le16 mtu
;

le32 speed
;

u8 duplex
;

u8 rss_max_key_size
;

le16 rss_max_indirection_table_length
;

le32 supported_hash_types
;

};
virtio spec changes - setting RSS parameters
• VIRTIO_NET_CTRL_MQ_RSS_CONFIG
struct virtio_net_rss_con
fi
g
{

le32 hash_types
;

le16 indirection_table_mask
;

le16 unclassi
fi
ed_queue
;

le16 indirection_table[indirection_table_length]
;

le16 max_tx_vq
;

u8 hash_key_length
;

u8 hash_key_data[hash_key_length]
;

};
virtio spec changes - virtio-net-hdr
struct virtio_net_hdr
{

u8
fl
ags
;

u8 gso_type
;

le16 hdr_len
;

le16 gso_size
;

le16 csum_start
;

le16 csum_offset
;

le16 num_buffers
;

le32 hash_value; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

le16 hash_report; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

le16 padding_reserved; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

};
What is eBPF?
• Enable running sandboxed code in Linux
kernel


• The code can be loaded at run time


• Used for security, tracing, networking,
observability
How can eBPF help us?
• Calculate the RSS hash and return the
queue index for incoming packets


• Populate the hash value in virtio_net_hdr
(work in progress)
The “magic”
• Loading eBPF program using IOCTL
TUNSETSTEERINGEBPF


• tun_struct has steering_prog field


• If eBPF program for steering is loaded,
tun_select_queue will call it with
bpf_prog_run_clear_cb
Hash population (work in progress)
• Population from eBPF program


• virtio_net_hdr with additional fields


• Work in progress in kernel


• Enlarge virtio_net_hdr in all kernel
modules


• Keep calculated hash in SKB and copy
it to virtio_net_hdr
eBPF program source in QEMU
• tun_rss_steering_prog


• tools/ebpf/rss.bpf.c


• Use clang to compile


• tools/ebpf/Makefile.ebpf
eBPF program skeleton
• During QEMU compilation include file is populated
with the compiled binary


• bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h


• Helpers to initialise maps (mechanism to share data
between eBPF program and kerneluserspace)


• Some changes to support libvirt - mmapping the
shared data structure to user space (3 maps in
current main branch without mmaping, 1 map in
pending patches)
Configuration map
• The configuration map is BPF array map that
contains everything required for RSS:


• Supported hash flows: IPv4, TCPv4, UDPv4,
IPv6, IPv6ex, TCPv6, UDPv6


• Indirections table size (max 128)


• Default queue


• Toeplitz hash key - 40 bytes


• Indirections table - 128 entries
Loading eBPF program
• Two mechanisms


• QEMU using function in skeleton file.
Calling bpf syscall


• eBPF helper program (with libvirt) -
QEMU gets file descriptors from libvirt
with already loaded ebpf program and
mapping of the ebpf map (patches under
review)
Loading eBPF program
• Possible load failures


• Kernel support. Current solution requires 5.8+


• Without helper


• QEMU process capabilities: CAP_BPF,
CAP_NET_ADMIN


• sysctl kernel.unprivileged_bpf_disabled=1


• libbpf not present


• In case of helper usage - mismatch between helper and
QEMU


• Stamp is a hash of skeleton include file
Fallback
• Built it QEMU RSS steering


• Can be triggered also by live migration


• Hash population is enabled in QEMU
command line, because there is still not
hash population from eBPF program
Live migration
• Known issue: migrating to old kernel


• eBPF load failure


• Fallback to in-QEMU RSS steering
QEMU command line
• Multi-queue should be enabled


• -smp with vCPU for each queue-pair


• -device virtio-net-pci,
rss=on,hash=on,ebpf_rss_fds=<fd0,fd1>
QEMU command line
• rss=on


• Try to load eBPF from skeleton or by using
provided file descriptors


• Fallback to “built-in” RSS steering in QEMU if
cannot load eBPF program


• hash=on


• Populate hash in virtio_net_hdr


• ebpf_rss_fds - optional, provide file descriptors for
eBPF program and map
libvirt integration
• QEMU should run with least possible privileges


• eBPF helper


• Stamping the helper during compilation time


• Redirection table mapping


• Additional command line options to provide file
descriptors to QEMU


• Patches under review
Current status
• Initial support was merged to QEMU


• libvirt integration patches in QEMU and
libvirt are under discussion on mailing lists


• Hash population by eBPF program -
pending additional work for next set of
patches
Pending patches
• QEMU libvirt integration: https://lists.nongnu.org/
archive/html/qemu-devel/2021-07/msg03535.html


• libvirt patches: https://listman.redhat.com/archives/
libvir-list/2021-July/msg00836.html


• RSS support in Linux virtio-net driver: https://
lists.linuxfoundation.org/pipermail/virtualization/
2021-August/055940.html


• In kernel hash calculation reporting to guest driver:
https://lkml.org/lkml/2021/1/12/1329
virtio-net and eBPF future
• Packet filtering with vhost


• Security?
yan@daynix.com
Links
• https://www.kernel.org/doc/Documentation/networking/scaling.txt


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/
introduction-to-receive-side-scaling


• https://ebpf.io


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-a-single-hardware-receive-queue


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-hardware-queuing


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-message-signaled-interrupts


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
hashing-functions

Weitere ähnliche Inhalte

Was ist angesagt?

/proc/irq/&lt;irq>/smp_affinity
/proc/irq/&lt;irq>/smp_affinity/proc/irq/&lt;irq>/smp_affinity
/proc/irq/&lt;irq>/smp_affinity
Takuya ASADA
 

Was ist angesagt? (20)

Linux dma engine
Linux dma engineLinux dma engine
Linux dma engine
 
Faster packet processing in Linux: XDP
Faster packet processing in Linux: XDPFaster packet processing in Linux: XDP
Faster packet processing in Linux: XDP
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
 
/proc/irq/&lt;irq>/smp_affinity
/proc/irq/&lt;irq>/smp_affinity/proc/irq/&lt;irq>/smp_affinity
/proc/irq/&lt;irq>/smp_affinity
 
The Forefront of the Development for NVDIMM on Linux Kernel (Linux Plumbers c...
The Forefront of the Development for NVDIMM on Linux Kernel (Linux Plumbers c...The Forefront of the Development for NVDIMM on Linux Kernel (Linux Plumbers c...
The Forefront of the Development for NVDIMM on Linux Kernel (Linux Plumbers c...
 
Tutorial: Using GoBGP as an IXP connecting router
Tutorial: Using GoBGP as an IXP connecting routerTutorial: Using GoBGP as an IXP connecting router
Tutorial: Using GoBGP as an IXP connecting router
 
Hands-on ethernet driver
Hands-on ethernet driverHands-on ethernet driver
Hands-on ethernet driver
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
Linux DMA Engine
Linux DMA EngineLinux DMA Engine
Linux DMA Engine
 
eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
 
Fast Userspace OVS with AF_XDP, OVS CONF 2018
Fast Userspace OVS with AF_XDP, OVS CONF 2018Fast Userspace OVS with AF_XDP, OVS CONF 2018
Fast Userspace OVS with AF_XDP, OVS CONF 2018
 
Embedded linux network device driver development
Embedded linux network device driver developmentEmbedded linux network device driver development
Embedded linux network device driver development
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
Pcie drivers basics
Pcie drivers basicsPcie drivers basics
Pcie drivers basics
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux Kernel
 
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction
 
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 

Ähnlich wie Receive side scaling (RSS) with eBPF in QEMU and virtio-net

CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
IO Visor Project
 

Ähnlich wie Receive side scaling (RSS) with eBPF in QEMU and virtio-net (20)

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket Linxiaofeng
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
Linux Kernel Live Patching
Linux Kernel Live PatchingLinux Kernel Live Patching
Linux Kernel Live Patching
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep Dive
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
15 ia64
15 ia6415 ia64
15 ia64
 
ch2.pptx
ch2.pptxch2.pptx
ch2.pptx
 
Ebpf ovsconf-2016
Ebpf ovsconf-2016Ebpf ovsconf-2016
Ebpf ovsconf-2016
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
Load Balancing
Load BalancingLoad Balancing
Load Balancing
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology
 
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 

Mehr von Yan Vugenfirer

Advanced NDISTest options
Advanced NDISTest optionsAdvanced NDISTest options
Advanced NDISTest options
Yan Vugenfirer
 

Mehr von Yan Vugenfirer (14)

HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
 
Implementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationImplementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migration
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototyping
 
Windows network teaming
Windows network teamingWindows network teaming
Windows network teaming
 
Rebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpRebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUp
 
Rebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyRebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday party
 
Contributing to open source using Git
Contributing to open source using GitContributing to open source using Git
Contributing to open source using Git
 
Introduction to Git
Introduction to GitIntroduction to Git
Introduction to Git
 
Microsoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupMicrosoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setup
 
UsbDk at a Glance 
UsbDk at a Glance UsbDk at a Glance 
UsbDk at a Glance 
 
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
 
Advanced NDISTest options
Advanced NDISTest optionsAdvanced NDISTest options
Advanced NDISTest options
 
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
 
Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012
 

Kürzlich hochgeladen

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 

Kürzlich hochgeladen (20)

tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 

Receive side scaling (RSS) with eBPF in QEMU and virtio-net

  • 1. Receive side scaling (RSS) with eBPF in QEMU and virtio-net Yan Vugenfirer - CEO, Daynix
  • 2. Agenda • What is RSS? • History: RSS and virtio-net • What is eBPF? • Using eBPF for packet steering (RSS) in virtio-net
  • 3. What is RSS? • Receive side scaling - distribution of packets’ processing among CPUs • A NIC uses a hashing function to compute a hash value over a defined area • Hash value is used to index an indirection table • The values in the indirection table are used to assign the received data to a CPU • With MSI support, a NIC can also interrupt the associated CPU
  • 4. CPU1 CPU2 CPU3 CPU0 What is RSS? NIC RX handling RX handling RX handling RX handling NIC Driver HW queue HW queue HW queue HW queue Packet classification
  • 6. History: RSS and virtio-net • Let’s use RSS with virtio-net! • Utilisation CPUs for packet processing • Cash locality for network applications • Microsoft WHQL requirement for high speed devices
  • 7. History: RSS and virtio-net • No multi-queue in virtio • SW implementation in Windows guest driver (similar to RFS in Linux) CPU1 CPU2 CPU3 CPU0 virtio-net RX handling RX handling RX handling NetKVM RX virt-queue Packet classification RX handling
  • 8. History: RSS and virtio-net • virtio-net became multi queue device • Due to Windows requirements - hybrid model. Interrupt received on specific CPU core, but could be rescheduled to another • Works good for TCP • Might not work for UDP • Support legacy interrupts for old OSes
  • 9. History: RSS and virtio-net CPU1 CPU0 virtio-net NetKVM RX virt-queue Packet classification RX handling RX virt-queue Packet classification RX handling
  • 10. History: RSS and virtio-net • virtio spec changes • Set steering mode • Pass the device redirection tables • Set hash value in virtio-net-hdr • No inter-processor interrupts due to re-scheduling • Vision: HW will do all the heavy work • Implementations • SW only POC in QEMU • eBPF
  • 11. virtio spec changes - capabilities • VIRTIO_NET_F_RSS • VIRTIO_NET_F_MQ must be set
  • 12. virtio spec changes - device configuration • virtio_net_config struct virtio_net_con fi g { u8 mac[6] ; le16 status ; le16 max_virtqueue_pairs ; le16 mtu ; le32 speed ; u8 duplex ; u8 rss_max_key_size ; le16 rss_max_indirection_table_length ; le32 supported_hash_types ; };
  • 13. virtio spec changes - setting RSS parameters • VIRTIO_NET_CTRL_MQ_RSS_CONFIG struct virtio_net_rss_con fi g { le32 hash_types ; le16 indirection_table_mask ; le16 unclassi fi ed_queue ; le16 indirection_table[indirection_table_length] ; le16 max_tx_vq ; u8 hash_key_length ; u8 hash_key_data[hash_key_length] ; };
  • 14. virtio spec changes - virtio-net-hdr struct virtio_net_hdr { u8 fl ags ; u8 gso_type ; le16 hdr_len ; le16 gso_size ; le16 csum_start ; le16 csum_offset ; le16 num_buffers ; le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) };
  • 15. What is eBPF? • Enable running sandboxed code in Linux kernel • The code can be loaded at run time • Used for security, tracing, networking, observability
  • 16. How can eBPF help us? • Calculate the RSS hash and return the queue index for incoming packets • Populate the hash value in virtio_net_hdr (work in progress)
  • 17. The “magic” • Loading eBPF program using IOCTL TUNSETSTEERINGEBPF • tun_struct has steering_prog field • If eBPF program for steering is loaded, tun_select_queue will call it with bpf_prog_run_clear_cb
  • 18. Hash population (work in progress) • Population from eBPF program • virtio_net_hdr with additional fields • Work in progress in kernel • Enlarge virtio_net_hdr in all kernel modules • Keep calculated hash in SKB and copy it to virtio_net_hdr
  • 19. eBPF program source in QEMU • tun_rss_steering_prog • tools/ebpf/rss.bpf.c • Use clang to compile • tools/ebpf/Makefile.ebpf
  • 20. eBPF program skeleton • During QEMU compilation include file is populated with the compiled binary • bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h • Helpers to initialise maps (mechanism to share data between eBPF program and kerneluserspace) • Some changes to support libvirt - mmapping the shared data structure to user space (3 maps in current main branch without mmaping, 1 map in pending patches)
  • 21. Configuration map • The configuration map is BPF array map that contains everything required for RSS: • Supported hash flows: IPv4, TCPv4, UDPv4, IPv6, IPv6ex, TCPv6, UDPv6 • Indirections table size (max 128) • Default queue • Toeplitz hash key - 40 bytes • Indirections table - 128 entries
  • 22. Loading eBPF program • Two mechanisms • QEMU using function in skeleton file. Calling bpf syscall • eBPF helper program (with libvirt) - QEMU gets file descriptors from libvirt with already loaded ebpf program and mapping of the ebpf map (patches under review)
  • 23. Loading eBPF program • Possible load failures • Kernel support. Current solution requires 5.8+ • Without helper • QEMU process capabilities: CAP_BPF, CAP_NET_ADMIN • sysctl kernel.unprivileged_bpf_disabled=1 • libbpf not present • In case of helper usage - mismatch between helper and QEMU • Stamp is a hash of skeleton include file
  • 24. Fallback • Built it QEMU RSS steering • Can be triggered also by live migration • Hash population is enabled in QEMU command line, because there is still not hash population from eBPF program
  • 25. Live migration • Known issue: migrating to old kernel • eBPF load failure • Fallback to in-QEMU RSS steering
  • 26. QEMU command line • Multi-queue should be enabled • -smp with vCPU for each queue-pair • -device virtio-net-pci, rss=on,hash=on,ebpf_rss_fds=<fd0,fd1>
  • 27. QEMU command line • rss=on • Try to load eBPF from skeleton or by using provided file descriptors • Fallback to “built-in” RSS steering in QEMU if cannot load eBPF program • hash=on • Populate hash in virtio_net_hdr • ebpf_rss_fds - optional, provide file descriptors for eBPF program and map
  • 28. libvirt integration • QEMU should run with least possible privileges • eBPF helper • Stamping the helper during compilation time • Redirection table mapping • Additional command line options to provide file descriptors to QEMU • Patches under review
  • 29. Current status • Initial support was merged to QEMU • libvirt integration patches in QEMU and libvirt are under discussion on mailing lists • Hash population by eBPF program - pending additional work for next set of patches
  • 30. Pending patches • QEMU libvirt integration: https://lists.nongnu.org/ archive/html/qemu-devel/2021-07/msg03535.html • libvirt patches: https://listman.redhat.com/archives/ libvir-list/2021-July/msg00836.html • RSS support in Linux virtio-net driver: https:// lists.linuxfoundation.org/pipermail/virtualization/ 2021-August/055940.html • In kernel hash calculation reporting to guest driver: https://lkml.org/lkml/2021/1/12/1329
  • 31. virtio-net and eBPF future • Packet filtering with vhost • Security?
  • 33.
  • 34. Links • https://www.kernel.org/doc/Documentation/networking/scaling.txt • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/ introduction-to-receive-side-scaling • https://ebpf.io • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-a-single-hardware-receive-queue • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-hardware-queuing • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-message-signaled-interrupts • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- hashing-functions