In this session I will use a simple HTTP benchmark to compare the performance of the Linux kernel networking stack with userspace networking powered by DPDK (kernel-bypass).
Kernel-bypass technologies are often said to avoid the kernel because it is "slow", but in reality much of the performance advantage they bring comes from enforcing certain constraints.
As it turns out, many of these constraints can be enforced without bypassing the kernel. If the system is tuned just right, one can achieve performance that approaches kernel-bypass speeds, while still benefiting from the kernel's battle-tested compatibility and rich ecosystem of tools.
Linux Kernel vs DPDK: HTTP Performance Showdown
1. Brought to you by
Linux Kernel vs DPDK: HTTP Performance Showdown
Marc Richards
Performance Engineer at Amazon Web Services
2. Marc Richards
Performance Engineer at Amazon Web Services (formerly Talawah Solutions)
● Recently moved from Kingston to Toronto
● DevOps Engineer turned Performance Engineer
● Interested in exploring performance in the cloud
4. What is kernel-bypass?
● Bypass the Linux networking stack: data goes straight from the NIC/driver to the userspace application
● It is up to the application to implement (or not) the features that the kernel normally provides. Ideal when performance is more important than certain features, e.g. ISPs, CDNs, HFT
● It can also be used to build HTTP servers, but the application would need to implement a TCP/IP stack
5. In Defense of the Kernel
● Most kernel vs bypass comparisons are done without much optimization on the kernel side
● The kernel is multi-purpose, so it isn't perfectly optimized for high-speed networking by default
● I wanted to know what the performance gap would look like when a finely tuned kernel goes head to head with kernel-bypass
6. It isn't all about bypass
● Much of the "kernel-bypass" performance is not from bypassing the kernel, but from enforcing certain constraints
● These constraints can be replicated with the kernel as well
  ● (Semi) busy polling
  ● Perfect locality
  ● Simplified TCP/IP subsystem
7. Seastar and DPDK
● DPDK is a kernel-bypass project created by Intel, now run by The Linux Foundation
● Seastar is an open-source C++ framework for building high-performance server applications, sponsored by ScyllaDB
● Seastar supports building applications that use either the Linux kernel or DPDK for networking, and implements its own TCP/IP stack
8. Benchmark Setup
● Cloud: AWS
● Hardware: 4 vCPU c5n.xlarge (server) / 16 vCPU c5n.4xlarge (client)
● Software
  ● Amazon Linux 2022 (kernel 5.15)
  ● Seastar from GitHub w/ DPDK 19.05
  ● Simple JSON benchmark from Techempower
  ● Fake HTTP server called tcp_httpd
9. Blog Post with More Details
https://talawah.io/blog/linux-kernel-vs-dpdk-http-performance-showdown/
11. DPDK on AWS
● A lot of trial and error at first, but the ENA/DPDK docs have gotten much better
● Seastar uses an older version of DPDK that needs a specific fix backported to address a conflict with the ENA driver
● AWS also has some ENA patches for older versions of DPDK
● https://github.com/talawahtech/dpdk/tree/http-performance
12. DPDK on AWS
Running 5s test @ http://172.31.XX.XX:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 204.00us
90.00% 252.00us
99.00% 297.00us
99.99% 403.00us
5954189 requests in 5.00s, 0.86GB read
Requests/sec: 1,190,822.80
14. DPDK Optimization
● On newer EC2 instances the network driver supports an LLQ (Low Latency Queue) mode for improved performance
● You need to enable the write-combining feature of the VFIO kernel module, otherwise performance will suffer
● The VFIO module doesn't support write combining by default, but the ENA team has a patch to add it
15. DPDK Optimization
Running 5s test @ http://172.31.XX.XX:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 152.00us
90.00% 195.00us
99.00% 233.00us
99.99% 352.00us
7575198 requests in 5.00s, 1.09GB read
Requests/sec: 1,515,010.51
18. Kernel networking stack
Running 5s test @ http://172.31.XX.XX:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 696.00us
90.00% 0.85ms
99.00% 0.96ms
99.99% 1.10ms
1789658 requests in 5.00s, 264.55MB read
Requests/sec: 357,927.16
19. OS Level Optimizations
● Disable Speculative Execution Mitigations
● Configure RSS and XPS for perfect locality
● Interrupt Moderation and Busy Polling
● Disable Raw/Packet Sockets
● GRO and Congestion Control
● A few kernel 5.15 specific optimizations
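Busy polling can also be requested per socket rather than system-wide. A minimal sketch, assuming Linux: `SO_BUSY_POLL` (option number 46 in the Linux UAPI headers) is not exported by Python's `socket` module, so the constant is hard-coded here, and the kernel refuses values above `net.core.busy_read` without CAP_NET_ADMIN, which the helper reports rather than raising. This is an illustration, not the configuration used in the talk.

```python
import errno
import socket

# SO_BUSY_POLL is not exported by Python's socket module; 46 is its value
# in the Linux UAPI headers (assumption: Linux only).
SO_BUSY_POLL = 46

def enable_busy_poll(sock: socket.socket, usec: int = 50) -> bool:
    """Ask the kernel to busy-poll this socket's receive queue for up to
    `usec` microseconds before sleeping. Returns False (instead of raising)
    when the kernel refuses: EPERM without CAP_NET_ADMIN, or ENOPROTOOPT
    on kernels/platforms without the option."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, usec)
        return True
    except OSError as e:
        if e.errno in (errno.EPERM, errno.ENOPROTOOPT):
            return False
        raise

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("busy polling enabled:", enable_busy_poll(s))
    s.close()
```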
20. OS Level Optimizations
Running 5s test @ http://172.31.XX.XX:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 347.00us
90.00% 455.00us
99.00% 564.00us
99.99% 758.00us
3630818 requests in 5.00s, 536.71MB read
Requests/sec: 726,153.58
24. Context Switching
sar -w 1

libreactor:
01:13:50 AM proc/s cswch/s
01:13:57 AM 0.00 277.00
01:13:58 AM 0.00 229.00
01:13:59 AM 0.00 290.00

tcp_httpd:
01:03:03 AM proc/s cswch/s
01:03:04 AM 0.00 17132.00
01:03:05 AM 0.00 17060.00
01:03:06 AM 0.00 17048.00
25. Context Switching
Running 5s test @ http://172.31.XX.XX:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 257.00us
90.00% 296.00us
99.00% 337.00us
99.99% 557.00us
4820680 requests in 5.00s, 712.59MB read
Requests/sec: 964,121.54
26. It is better to RECV and Remember to Flush
● The recv syscall is slightly faster than the read syscall
● batch_flushes = false
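The recv-vs-read point is easy to see from userspace: on a connected socket both syscalls return the same bytes, recv just skips a little generic file-layer indirection in the kernel. A quick illustration using a socketpair (not the benchmark code itself):

```python
import os
import socket

def compare_recv_read():
    """Read the same 4-byte message once via recv(2) and once via read(2)
    to show the two syscalls are interchangeable on a connected socket."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"ping")
        via_recv = b.recv(4)               # recv(2) path
        a.sendall(b"ping")
        via_read = os.read(b.fileno(), 4)  # read(2) path
    finally:
        a.close()
        b.close()
    return via_recv, via_read
```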
27. It is better to RECV and Remember to Flush
Running 5s test @ http://172.31.XX.XX:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 246.00us
90.00% 288.00us
99.00% 333.00us
99.99% 436.00us
5038933 requests in 5.00s, 744.85MB read
Requests/sec: 1,007,771.89
29. DPDK Caveats
● Niche technology
● Bypassing the kernel's time-tested networking stack and ecosystem
● Poll-mode processing = higher CPU usage
● It is important to make sure you balance your priorities
30. Conclusion
● I see that 51% gap as an opportunity!
● To what extent can the Linux kernel be further optimized for thread-per-core applications without compromising its general-purpose nature?
● Syscall overhead is an area of interest; io_uring may be the answer
31. Brought to you by
Marc Richards
https://talawah.io/contact
@talawahtech
AWS Benchmarking is hiring!