SlideShare a Scribd company logo
1 of 45
Download to read offline
Production
Snabb
Simple, fast software networking
with Snabb
20 January 2017 – linux.conf.au
Andy Wingo wingo@igalia.com
@andywingo
hey
hacker
User-space networking is for us!
Snabb is a great way to do it!
Make a thing with Snabb!
(hi)story You are an ISP
The distant past: the year 2000
To set up: you lease DSL exchanges,
bandwidth, core routers
Mission accomplished!
(hi)story The distant past: the year 2005
You still pay for DSL, bandwidth,
routers
Also you have some boxes doing
VoIP (more cash)
(hi)story The distant past: the year 2010
You still pay for DSL, bandwidth,
routers, VoIP
OMG TV!!!
Also we are running out of IPv4!!!
Also the subscriber fee is still the
same!!!!!!!
(hi)story Trend: ISPs have to do more (VoIP,
TV, VOD, cloud, carrier NAT)
“Doing more”: more expensive
boxes in the rack ($70k/port?)
Same story with many other users
Isn’t there a better way?
material
conditions
In the meantime, commodity
hardware caught up
Xeon dual-socket, >12 core/
socket
❧
Many 10Gbps PCIe network cards
(NICs)
❧
100-200 Gbps/server
10-15 million packets per second
(MPPS) per core+NIC pair
70 ns/packet
Let’s do it!
alternate
(hi)story
The teleology of open source: “one
day this will all run Linux”
Conventional wisdom: if I walk the
racks of a big ISP, it’s probably all
Linux
linux? The teleology of open source: “one
day this will all run Linux”
Conventional wisdom: if I walk the
racks of a big ISP, it’s probably all
Linux
Q: The hardware is ready for 10
MPPS on a core. Is Linux?
not
linux
The teleology of open source: “one
day this will all run Linux”
Conventional wisdom: if I walk the
racks of a big ISP, it’s probably all
Linux
Q: The hardware is ready for 10
MPPS on a core. Is Linux?
A: Nope
why
not
linux
Heavyweight networking stack
System/user barrier splits your
single network function into two
programs
Associated communication costs
user-
space
networking
Cut Linux-the-kernel out of the
picture; bring up card from user
space
tell Linux to forget about this PCI
device
❧
mmap device’s PCI registers into
address space
❧
poke registers as needed❧
set up a ring buffer for receive/
transmit
❧
profit!❧
(hi)story
time
The distant past: the year 2017
Multiple open source user-space
networking projects having success
Prominent ones: Snabb (2012), DPDK
(2012), VPP/fd.io (2016)
Deutsche Telekom’s TeraStream:
Vendors provide network functions
as software, not physical machines
How do software network functions
work?
aside Snabb aims to be rewritable software
The hard part: searching program-
space for elegant hacks
“Is that all? I could rewrite that in a
weekend.”
nutshell A snabb program consists of a graph
of apps
Apps are connected by directional
links
A snabb program processes packets
in units of breaths
local Intel82599 =
require("apps.intel.intel_app").Intel82599
local PcapFilter =
require("apps.packet_filter.pcap_filter").PcapFilter
local c = config.new()
config.app(c, "nic", Intel82599, {pciaddr="82:00.0"})
config.app(c, "filter", PcapFilter, {filter="tcp port 80"})
config.link(c, "nic.tx -> filter.input")
config.link(c, "filter.output -> nic.rx")
engine.configure(c)
while true do engine.breathe() end
breaths Each breath has two phases:
inhale a batch of packets into the
network
❧
process those packets❧
To inhale, run pull functions on
apps that have them
To process, run push functions on
apps that have them
function Intel82599:pull ()
for i = 1, engine.pull_npackets do
if not self.dev:can_receive() then
break
end
local pkt = self.dev:receive()
link.transmit(self.output.tx, pkt)
end
end
function PcapFilter:push ()
while not link.empty(self.input.rx) do
local p = link.receive(self.input.rx)
if self.accept_fn(p.data, p.length) then
link.transmit(self.output.tx, p)
else
packet.free(p)
end
end
end
packets struct packet {
uint16_t length;
unsigned char data[10*1024];
};
links struct link {
struct packet *packets[1024];
// the next element to be read
int read;
// the next element to be written
int write;
};
// (Some statistics counters elided)
voilà At this point, you can rewrite Snabb
(Please do!)
But you might want to use it as-is...
tao Snabby design principles
Simple > Complex❧
Small > Large❧
Commodity > Proprietary❧
simple Compose network functions from
simple parts
intel10g | reassemble | filter |
fragment | intel10g
Apps independently developed
Linked together at run-time
Communicating over simple
interfaces (packets and links)
small Early code budget: 10000 lines
Build in a minute
Constraints driving creativity
Secret weapon: Lua via LuaJIT
High performance with minimal fuss
small Minimize dependencies
1 minute make budget includes Snabb
and all deps (luajit, pflua, ljsyscall,
dynasm)
Deliverable is single binary
./snabb --help
./snabb top
./snabb lwaftr run ...
small Writing our own drivers, in Lua
User-space networking
The data plane is our domain, not
the kernel’s
❧
Not DPDK’s either!❧
Fits in 10000-line budget❧
commodity What’s special about a Snabb
network function?
Not the platform (assume recent
Xeon)
Not the NIC (just need a driver to
inhale some packets)
Not Snabb itself (it’s Apache 2.0)
commodity Open data sheets
Intel 82599 10Gb
Mellanox ConnectX-4 (10, 25, 40,
100Gb)
Also Linux tap interfaces, virtio host
and guest
commodity Prefer CPU over NIC where possible
Commoditize NICs – no offload
Double down on 64-bit x86 servers
status Going on 5 years old
27 patch authors last year, 1400 non-
merge commits
Deployed in a dozen sites or so
Biggest programs: NFV virtual
switch, lwAFTR IPv6 transition core
router, SWITCH.ch VPN
New in 2016: multi-process, guest
support, 100G, control plane
integration
production Igalia developed “lwAFTR”
(lightweight address family
translation router)
Central router component of
“lightweight 4-over-6” deployment
lw4o6: IPv4-as-a-service over pure
IPv6 network
Think of it like a big carrier-grade
NAT
20Gbps, 4MPPS per core
challenges (1) Make it fast
(2) Make it not lose any packets
(3) Make it integrate
(4) Make it scale up and out
fast LuaJIT does most of the work
App graph plays to LuaJIT’s
strengths: lots of little loops
Loop-invariant code motion boils
away Lua dynamism
❧
Trace compilation punches
through procedural and data
abstractions
❧
Scalar replacement eliminates all
intermediate allocations
❧
fast Speed tips could fill a talk
Prefer FFI data structures (Lua
arrays can be fine too)
Avoid data dependency chains
4MPPS: 250 ns/packet
One memory reference: 80ns
Example: hash table lookups
lossless Max average latency for 100 packets
at 4MPPS: 25 us
Max latency (512-packet receive ring
buffer): 128 us
Avoid allocation
Avoid syscalls
Avoid preemption – reserved CPU
cores, no hyperthreads
Avoid faults – NUMA / TLB /
hugepages
Lots of tuning
integrate Operators have monitoring and
control infrastructure – command
line necessary but not sufficient
Snabb now does enough YANG to
integrate with an external NETCONF
agents
Runtime configuration and state
query, update
Avoid packet loss via multi-process
protocol
scale 2017 is the year of 100G in
production Snabb; multiple
coordinated data-plane processes
Also horizontal scaling via BGP/
ECMP: terabit lw4o6 deployments
Work in progress!
more Pflua: tcpdump / BPF compiler (now
with native codegen!)
NFV: fast virtual switch
Perf tuning: “x-ray diffraction” of
internal CPU structure via PMU
registers and timelines
DynASM: generating machine code
at run-time optimized for particular
data structures
Automated benchmarking via Nix,
Hydra, and RMarkdown!
[Your cool hack here!]
thanks! Make a thing with Snabb!
git clone https://github.com/SnabbCo/snabb
cd snabb
make
wingo@igalia.com
@andywingo
oh no here comes the hidden track!
Storytime! Modern x86: who’s winning?
Clock speed same since years ago
Main memory just as far away
HPC
people
are
winning
“We need to do work on data... but
there’s just so much of it and it’s
really far away.”
Three primary improvements:
CPU can work on more data per
cycle, once data in registers
❧
CPU can load more data per
cycle, once it’s in cache
❧
CPU can make more parallel
fetches to L3 and RAM at once
❧
Networking
folks
can
win
too
Instead of chasing zero-copy, tying
yourself to ever-more-proprietary
features of your NIC, just take the hit
once: DDIO into L3.
Copy if you need to – copies with L3
not expensive.
Software will eat the world!
Networking
folks
can
win
too
Once in L3, you have:
wide loads and stores via AVX2
and soon AVX-512 (64 bytes!)
❧
pretty good instruction-level
parallelism: up to 16 concurrent
L2 misses per core on haswell
❧
wide SIMD: checksum in
software!
❧
software, not firmware❧

More Related Content

What's hot

Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
inside-BigData.com
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
inside-BigData.com
 

What's hot (20)

Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
 
Accelerating apache spark with rdma
Accelerating apache spark with rdmaAccelerating apache spark with rdma
Accelerating apache spark with rdma
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Yeti DNS Project
Yeti DNS ProjectYeti DNS Project
Yeti DNS Project
 
RuG Guest Lecture
RuG Guest LectureRuG Guest Lecture
RuG Guest Lecture
 
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesCassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
 
GEN-Z: An Overview and Use Cases
GEN-Z: An Overview and Use CasesGEN-Z: An Overview and Use Cases
GEN-Z: An Overview and Use Cases
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...
Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...
Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache Flink
 
Ucx an open source framework for hpc network ap is and beyond
Ucx  an open source framework for hpc network ap is and beyondUcx  an open source framework for hpc network ap is and beyond
Ucx an open source framework for hpc network ap is and beyond
 
deep learning in production cff 2017
deep learning in production cff 2017deep learning in production cff 2017
deep learning in production cff 2017
 
Spark optimization
Spark optimizationSpark optimization
Spark optimization
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
cisco 7200
cisco 7200 cisco 7200
cisco 7200
 
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
 
Effectively using Open Source with conda
Effectively using Open Source with condaEffectively using Open Source with conda
Effectively using Open Source with conda
 

Similar to Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2017)

Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Igalia
 

Similar to Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2017) (20)

Snabb, a toolkit for building user-space network functions (ES.NOG 20)
Snabb, a toolkit for building user-space network functions (ES.NOG 20)Snabb, a toolkit for building user-space network functions (ES.NOG 20)
Snabb, a toolkit for building user-space network functions (ES.NOG 20)
 
Practical virtual network functions with Snabb (8th SDN Workshop)
Practical virtual network functions with Snabb (8th SDN Workshop)Practical virtual network functions with Snabb (8th SDN Workshop)
Practical virtual network functions with Snabb (8th SDN Workshop)
 
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Practical virtual network functions with Snabb (SDN Barcelona VI)
Practical virtual network functions with Snabb (SDN Barcelona VI)Practical virtual network functions with Snabb (SDN Barcelona VI)
Practical virtual network functions with Snabb (SDN Barcelona VI)
 
Software Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFVSoftware Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFV
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
High Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing CommunityHigh Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing Community
 
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
 
Modern VoIP in modern infrastructures
Modern VoIP in modern infrastructuresModern VoIP in modern infrastructures
Modern VoIP in modern infrastructures
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
Io t hurdles_i_pv6_slides_doin
Io t hurdles_i_pv6_slides_doinIo t hurdles_i_pv6_slides_doin
Io t hurdles_i_pv6_slides_doin
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStack
 
FD.io - The Universal Dataplane
FD.io - The Universal DataplaneFD.io - The Universal Dataplane
FD.io - The Universal Dataplane
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
5G Cellular D2D RDMA Clusters
5G Cellular D2D RDMA Clusters5G Cellular D2D RDMA Clusters
5G Cellular D2D RDMA Clusters
 
DevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructure
DevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructureDevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructure
DevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructure
 

More from Igalia

Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
Igalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
Igalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Igalia
 

More from Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2017)

  • 1. Production Snabb Simple, fast software networking with Snabb 20 January 2017 – linux.conf.au Andy Wingo wingo@igalia.com @andywingo
  • 2. hey hacker User-space networking is for us! Snabb is a great way to do it! Make a thing with Snabb!
  • 3. (hi)story You are an ISP The distant past: the year 2000 To set up: you lease DSL exchanges, bandwidth, core routers Mission accomplished!
  • 4. (hi)story The distant past: the year 2005 You still pay for DSL, bandwidth, routers Also you have some boxes doing VoIP (more cash)
  • 5. (hi)story The distant past: the year 2010 You still pay for DSL, bandwidth, routers, VoIP OMG TV!!! Also we are running out of IPv4!!! Also the subscriber fee is still the same!!!!!!!
  • 6. (hi)story Trend: ISPs have to do more (VoIP, TV, VOD, cloud, carrier NAT) “Doing more”: more expensive boxes in the rack ($70k/port?) Same story with many other users Isn’t there a better way?
  • 7. material conditions In the meantime, commodity hardware caught up Xeon dual-socket, >12 core/ socket ❧ Many 10Gbps PCIe network cards (NICs) ❧ 100-200 Gbps/server 10-15 million packets per second (MPPS) per core+NIC pair 70 ns/packet Let’s do it!
  • 8. alternate (hi)story The teleology of open source: “one day this will all run Linux” Conventional wisdom: if I walk the racks of a big ISP, it’s probably all Linux
  • 9. linux? The teleology of open source: “one day this will all run Linux” Conventional wisdom: if I walk the racks of a big ISP, it’s probably all Linux Q: The hardware is ready for 10 MPPS on a core. Is Linux?
  • 10. not linux The teleology of open source: “one day this will all run Linux” Conventional wisdom: if I walk the racks of a big ISP, it’s probably all Linux Q: The hardware is ready for 10 MPPS on a core. Is Linux? A: Nope
  • 11. why not linux Heavyweight networking stack System/user barrier splits your single network function into two programs Associated communication costs
  • 12. user- space networking Cut Linux-the-kernel out of the picture; bring up card from user space tell Linux to forget about this PCI device ❧ mmap device’s PCI registers into address space ❧ poke registers as needed❧ set up a ring buffer for receive/ transmit ❧ profit!❧
  • 13. (hi)story time The distant past: the year 2017 Multiple open source user-space networking projects having success Prominent ones: Snabb (2012), DPDK (2012), VPP/fd.io (2016) Deutsche Telekom’s TeraStream: Vendors provide network functions as software, not physical machines How do software network functions work?
  • 14. aside Snabb aims to be rewritable software The hard part: searching program- space for elegant hacks “Is that all? I could rewrite that in a weekend.”
  • 15. nutshell A snabb program consists of a graph of apps Apps are connected by directional links A snabb program processes packets in units of breaths
  • 16. local Intel82599 = require("apps.intel.intel_app").Intel82599 local PcapFilter = require("apps.packet_filter.pcap_filter").PcapFilter local c = config.new() config.app(c, "nic", Intel82599, {pciaddr="82:00.0"}) config.app(c, "filter", PcapFilter, {filter="tcp port 80"}) config.link(c, "nic.tx -> filter.input") config.link(c, "filter.output -> nic.rx") engine.configure(c) while true do engine.breathe() end
  • 17. breaths Each breath has two phases: inhale a batch of packets into the network ❧ process those packets❧ To inhale, run pull functions on apps that have them To process, run push functions on apps that have them
  • 18. function Intel82599:pull () for i = 1, engine.pull_npackets do if not self.dev:can_receive() then break end local pkt = self.dev:receive() link.transmit(self.output.tx, pkt) end end
  • 19. function PcapFilter:push () while not link.empty(self.input.rx) do local p = link.receive(self.input.rx) if self.accept_fn(p.data, p.length) then link.transmit(self.output.tx, p) else packet.free(p) end end end
  • 20. packets struct packet { uint16_t length; unsigned char data[10*1024]; };
  • 21. links struct link { struct packet *packets[1024]; // the next element to be read int read; // the next element to be written int write; }; // (Some statistics counters elided)
  • 22. voilà At this point, you can rewrite Snabb (Please do!) But you might want to use it as-is...
  • 23. tao Snabby design principles Simple > Complex❧ Small > Large❧ Commodity > Proprietary❧
  • 24. simple Compose network functions from simple parts intel10g | reassemble | filter | fragment | intel10g Apps independently developed Linked together at run-time Communicating over simple interfaces (packets and links)
  • 25. small Early code budget: 10000 lines Build in a minute Constraints driving creativity Secret weapon: Lua via LuaJIT High performance with minimal fuss
  • 26. small Minimize dependencies 1 minute make budget includes Snabb and all deps (luajit, pflua, ljsyscall, dynasm) Deliverable is single binary ./snabb --help ./snabb top ./snabb lwaftr run ...
  • 27. small Writing our own drivers, in Lua User-space networking The data plane is our domain, not the kernel’s ❧ Not DPDK’s either!❧ Fits in 10000-line budget❧
  • 28. commodity What’s special about a Snabb network function? Not the platform (assume recent Xeon) Not the NIC (just need a driver to inhale some packets) Not Snabb itself (it’s Apache 2.0)
  • 29. commodity Open data sheets Intel 82599 10Gb Mellanox ConnectX-4 (10, 25, 40, 100Gb) Also Linux tap interfaces, virtio host and guest
  • 30. commodity Prefer CPU over NIC where possible Commoditize NICs – no offload Double down on 64-bit x86 servers
  • 31. status Going on 5 years old 27 patch authors last year, 1400 non- merge commits Deployed in a dozen sites or so Biggest programs: NFV virtual switch, lwAFTR IPv6 transition core router, SWITCH.ch VPN New in 2016: multi-process, guest support, 100G, control plane integration
  • 32. production Igalia developed “lwAFTR” (lightweight address family translation router) Central router component of “lightweight 4-over-6” deployment lw4o6: IPv4-as-a-service over pure IPv6 network Think of it like a big carrier-grade NAT 20Gbps, 4MPPS per core
  • 33. challenges (1) Make it fast (2) Make it not lose any packets (3) Make it integrate (4) Make it scale up and out
  • 34. fast LuaJIT does most of the work App graph plays to LuaJIT’s strengths: lots of little loops Loop-invariant code motion boils away Lua dynamism ❧ Trace compilation punches through procedural and data abstractions ❧ Scalar replacement eliminates all intermediate allocations ❧
  • 35. fast Speed tips could fill a talk Prefer FFI data structures (Lua arrays can be fine too) Avoid data dependency chains 4MPPS: 250 ns/packet One memory reference: 80ns Example: hash table lookups
  • 36. lossless Max average latency for 100 packets at 4MPPS: 25 us Max latency (512-packet receive ring buffer): 128 us Avoid allocation Avoid syscalls Avoid preemption – reserved CPU cores, no hyperthreads Avoid faults – NUMA / TLB / hugepages Lots of tuning
  • 37. integrate Operators have monitoring and control infrastructure – command line necessary but not sufficient Snabb now does enough YANG to integrate with an external NETCONF agents Runtime configuration and state query, update Avoid packet loss via multi-process protocol
  • 38. scale 2017 is the year of 100G in production Snabb; multiple coordinated data-plane processes Also horizontal scaling via BGP/ ECMP: terabit lw4o6 deployments Work in progress!
  • 39. more Pflua: tcpdump / BPF compiler (now with native codegen!) NFV: fast virtual switch Perf tuning: “x-ray diffraction” of internal CPU structure via PMU registers and timelines DynASM: generating machine code at run-time optimized for particular data structures Automated benchmarking via Nix, Hydra, and RMarkdown! [Your cool hack here!]
  • 40. thanks! Make a thing with Snabb! git clone https://github.com/SnabbCo/snabb cd snabb make wingo@igalia.com @andywingo
  • 41. oh no here comes the hidden track!
  • 42. Storytime! Modern x86: who’s winning? Clock speed same since years ago Main memory just as far away
  • 43. HPC people are winning “We need to do work on data... but there’s just so much of it and it’s really far away.” Three primary improvements: CPU can work on more data per cycle, once data in registers ❧ CPU can load more data per cycle, once it’s in cache ❧ CPU can make more parallel fetches to L3 and RAM at once ❧
  • 44. Networking folks can win too Instead of chasing zero-copy, tying yourself to ever-more-proprietary features of your NIC, just take the hit once: DDIO into L3. Copy if you need to – copies with L3 not expensive. Software will eat the world!
  • 45. Networking folks can win too Once in L3, you have: wide loads and stores via AVX2 and soon AVX-512 (64 bytes!) ❧ pretty good instruction-level parallelism: up to 16 concurrent L2 misses per core on haswell ❧ wide SIMD: checksum in software! ❧ software, not firmware❧