SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Downloaden Sie, um offline zu lesen
New sendfile(2)
Gleb Smirnoff
glebius@FreeBSD.org
FreeBSD Storage Summit
Netflix
20 February 2015
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 1 / 23
History of sendfile(2) Before sendfile(2)
Miserable life w/o sendfile(2)
while ((cnt = read(filefd, buf, (u_int)blksize))
write(netfd, buf, cnt) == cnt)
byte_count += cnt;
send_data() в src/libexec/ftpd/ftpd.c,
FreeBSD 1.0, 1993
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 2 / 23
History of sendfile(2) sendfile(2) introduced
sendfile(2) introduced
int
sendfile(int fd, int s, off_t offset, size_t nbytes, .. );
1997: HP-UX 11.00
1998: FreeBSD 3.0 and Linux 2.2
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 3 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Further optimisations:
2004: SF_NODISKIO flag
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Further optimisations:
2004: SF_NODISKIO flag
2006: inner cycle, working on sbspace() bytes
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Further optimisations:
2004: SF_NODISKIO flag
2006: inner cycle, working on sbspace() bytes
2013: sending a shared memory descriptor data
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
What’s not right with sendfile(2) blocking on I/O
Problem #1: blocking on I/O
Algorithm of a modern HTTP-server:
1 Take yet another descriptor from kevent(2)
2 Do write(2)/read(2)/sendfile(2) on it
3 Go to 1
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
What’s not right with sendfile(2) blocking on I/O
Problem #1: blocking on I/O
Algorithm of a modern HTTP-server:
1 Take yet another descriptor from kevent(2)
2 Do write(2)/read(2)/sendfile(2) on it
3 Go to 1
Bottleneck: any syscall time.
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
What’s not right with sendfile(2) blocking on I/O
Attempts to solve problem #1
Separate I/O contexts: processes, threads
Apache
nginx 2
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
What’s not right with sendfile(2) blocking on I/O
Attempts to solve problem #1
Separate I/O contexts: processes, threads
Apache
nginx 2
SF_NODISKIO + aio_read(2)
nginx
Varnish
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
What’s not right with sendfile(2) blocking on I/O
More attempts . . .
aio_mlock(2) instead of aio_read(2)
aio_sendfile(2) ???
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 7 / 23
What’s not right with sendfile(2) control over
Problem #2: control over VM
VOP_READ() leaves pages in VM cache
VOP_READ() [for UFS] does readahead
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
What’s not right with sendfile(2) control over
Problem #2: control over VM
VOP_READ() leaves pages in VM cache
VOP_READ() [for UFS] does readahead
Not easy to prevent it doing that!
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
New sendfile(2) implementation above pager
waht if VOP_GETPAGES()?
VOP_READ() → VOP_GETPAGES()
Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM is now easier task
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
New sendfile(2) implementation above pager
waht if VOP_GETPAGES()?
VOP_READ() → VOP_GETPAGES()
Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM is now easier task
Cons
Losing readahead heuristics
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
New sendfile(2) implementation above pager
waht if VOP_GETPAGES()?
VOP_READ() → VOP_GETPAGES()
Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM is now easier task
Cons
Losing readahead heuristics
But no one used them!
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
New sendfile(2) VOP_GETPAGES_ASYNC()
VOP_GETPAGES_ASYNC()
int
VOP_GETPAGES(struct vnode *vp, vm_page_t *ma,
int count, int reqpage);
1 Initialize buf(9)
2 buf->b_iodone = bdone;
3 bstrategy(buf);
4 bwait(buf); /* sleeps until I/O completes */
5 return;
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
New sendfile(2) VOP_GETPAGES_ASYNC()
VOP_GETPAGES_ASYNC()
int
VOP_GETPAGES_ASYNC(struct vnode *vp,
vm_page_t *ma, int count, int reqpage,
vop_getpages_iodone_t *iodone, void *arg);
1 Initialize buf(9)
2 buf->b_iodone = vnode_pager_async_iodone;
3 bstrategy(buf);
4 return;
vnode_pager_async_iodone calls iodone() .
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
New sendfile(2) non-blocking sendfile(2)
naive non-blocking sendfile(2)
In kern_sendfile():
1 nios++;
2 VOP_GETPAGES_ASYNC(sendfile_iodone);
In sendfile_iodone():
1 nios--;
2 if (nios) return;
3 sosend();
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 11 / 23
New sendfile(2) non-blocking sendfile(2)
the problem of naive implementation
sendfile(filefd, sockfd, ..);
write(sockfd, ..);
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 12 / 23
New sendfile(2) “not ready” data in socket buffers
socket buffer
mbuf mbuf mbuf mbuf mbuf mbuf
struct sockbuf
struct mbuf *sb_mb
struct mbuf *sb_mbtail
u_int sb_cc
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 13 / 23
New sendfile(2) “not ready” data in socket buffers
socket buffer with “not ready” data
mbuf mbuf mbuf mbuf mbuf mbuf
page page
struct sockbuf
struct mbuf *sb_mb
struct mbuf *sb_fnrdy
struct mbuf *sb_mbtail
u_int sb_acc
u_int sb_ccc
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 14 / 23
New sendfile(2) final implementation
non-blocking sendfile(2)
In kern_sendfile():
1 nios++;
2 VOP_GETPAGES_ASYNC(sendfile_iodone);
3 sosend(NOT_READY);
In sendfile_iodone():
1 nios--;
2 if (nios) return;
3 soready();
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 15 / 23
New sendfile(2) comparison with old sendfile(2)
traffic
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 16 / 23
New sendfile(2) comparison with old sendfile(2)
CPU idle
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 17 / 23
New sendfile(2) comparison with old sendfile(2)
profiling sendfile(2) in head
aio_daemon 13.64%
sys_sendfile 7.40%
t4_intr 5.66%
xpt_done 1.04%
pagedaemon 4.16%
scheduler 5.28%
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 18 / 23
New sendfile(2) comparison with old sendfile(2)
profiling new sendfile(2)
sys_sendfile 16.9%
t4_intr 8.17%
xpt_done 9.91%
pagedaemon 6.54%
scheduler 3.58%
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
New sendfile(2) comparison with old sendfile(2)
profiling new sendfile(2)
sys_sendfile 16.9% (vm_page_grab 9.24% !!)
t4_intr 8.17% (tcp_output() 2.07% !!)
xpt_done 9.91% (m_freem() 3.11% !!)
pagedaemon 6.54%
scheduler 3.58%
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
New sendfile(2) comparison with old sendfile(2)
what did change?
New code always sends full socket buffer
Which is good for TCP (as protocol)
Which hurts VM, mbuf allocator,
and unexpectedly TCP stack
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
New sendfile(2) comparison with old sendfile(2)
what did change?
New code always sends full socket buffer
Which is good for TCP (as protocol)
Which hurts VM, mbuf allocator,
and unexpectedly TCP stack
Will fix that!
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
New sendfile(2) comparison with old sendfile(2)
old sendfile(2) @ Netflix
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
New sendfile(2) comparison with old sendfile(2)
new sendfile(2) @ Netflix
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
New sendfile(2) plans and problems
TODO list
Problems:
VM & I/O overcommit
ZFS
SCTP
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
New sendfile(2) plans and problems
TODO list
Problems:
VM & I/O overcommit
ZFS
SCTP
Future plans:
sendfile(2) doing TLS
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
New sendfile(2)
Questions?
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 23 / 23

Weitere ähnliche Inhalte

Was ist angesagt?

Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFJeff Squyres
 
[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVMDouglas Chen
 
Global Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealGlobal Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealTzung-Bi Shih
 
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動するKohei Tokunaga
 
Startup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image DistributionStartup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image DistributionKohei Tokunaga
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudAndrea Righi
 
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz SnapshotterThe overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz SnapshotterKohei Tokunaga
 
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動するKohei Tokunaga
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingAnne Nicolas
 
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...Kohei Tokunaga
 
不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)Douglas Chen
 
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance InconsistenciesTracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance InconsistenciesShane McIntosh
 
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingFaster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingKohei Tokunaga
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaXKernel TLV
 

Was ist angesagt? (20)

Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOF
 
[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM
 
Global Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealGlobal Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the Seal
 
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
 
Move from C to Go
Move from C to GoMove from C to Go
Move from C to Go
 
Startup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image DistributionStartup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image Distribution
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
Streams for the Web
Streams for the WebStreams for the Web
Streams for the Web
 
Go 1.8 Release Party
Go 1.8 Release PartyGo 1.8 Release Party
Go 1.8 Release Party
 
CMake best practices
CMake best practicesCMake best practices
CMake best practices
 
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz SnapshotterThe overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
 
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
 
ocelot
ocelotocelot
ocelot
 
Ddev workshop t3dd18
Ddev workshop t3dd18Ddev workshop t3dd18
Ddev workshop t3dd18
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
 
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
 
不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)
 
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance InconsistenciesTracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
 
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingFaster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
 

Ähnlich wie New sendfile

WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)Igalia
 
How to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux worldHow to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux worldArun Kumar Thondapu
 
The State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdfThe State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdfegypt
 
A tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approachesA tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approachesSandro Gauci
 
DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるKohei Tokunaga
 
Rust & Python : Python WA October meetup
Rust & Python : Python WA October meetupRust & Python : Python WA October meetup
Rust & Python : Python WA October meetupJohn Vandenberg
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetesWilliam Stewart
 
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014Fantix King 王川
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayCosimo Streppone
 
Continuous Integration for Fun and Profit
Continuous Integration for Fun and ProfitContinuous Integration for Fun and Profit
Continuous Integration for Fun and Profitinovex GmbH
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationAnne Nicolas
 
Git workflows automat-it
Git workflows automat-itGit workflows automat-it
Git workflows automat-itAutomat-IT
 
HTTP and 5G (fixed1)
HTTP and 5G (fixed1)HTTP and 5G (fixed1)
HTTP and 5G (fixed1)dynamis
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch TuesdayIvanti
 
2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - FévrierIvanti
 
Patch Tuesday de Febrero
Patch Tuesday de FebreroPatch Tuesday de Febrero
Patch Tuesday de FebreroIvanti
 

Ähnlich wie New sendfile (20)

WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)
 
How to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux worldHow to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux world
 
The State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdfThe State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdf
 
A tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approachesA tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approaches
 
DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐる
 
Pledge in OpenBSD
Pledge in OpenBSDPledge in OpenBSD
Pledge in OpenBSD
 
Rust & Python : Python WA October meetup
Rust & Python : Python WA October meetupRust & Python : Python WA October meetup
Rust & Python : Python WA October meetup
 
Metasploitable
MetasploitableMetasploitable
Metasploitable
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
 
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
Continuous Integration for Fun and Profit
Continuous Integration for Fun and ProfitContinuous Integration for Fun and Profit
Continuous Integration for Fun and Profit
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integration
 
Git workflows automat-it
Git workflows automat-itGit workflows automat-it
Git workflows automat-it
 
HTTP and 5G (fixed1)
HTTP and 5G (fixed1)HTTP and 5G (fixed1)
HTTP and 5G (fixed1)
 
General Pr0ken File System
General Pr0ken File SystemGeneral Pr0ken File System
General Pr0ken File System
 
GTPing, How To
GTPing, How ToGTPing, How To
GTPing, How To
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch Tuesday
 
2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février
 
Patch Tuesday de Febrero
Patch Tuesday de FebreroPatch Tuesday de Febrero
Patch Tuesday de Febrero
 

Kürzlich hochgeladen

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

New sendfile

  • 1. New sendfile(2) Gleb Smirnoff glebius@FreeBSD.org FreeBSD Storage Summit Netflix 20 February 2015 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 1 / 23
  • 2. History of sendfile(2) Before sendfile(2) Miserable life w/o sendfile(2) while ((cnt = read(filefd, buf, (u_int)blksize)) write(netfd, buf, cnt) == cnt) byte_count += cnt; send_data() в src/libexec/ftpd/ftpd.c, FreeBSD 1.0, 1993 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 2 / 23
  • 3. History of sendfile(2) sendfile(2) introduced sendfile(2) introduced int sendfile(int fd, int s, off_t offset, size_t nbytes, .. ); 1997: HP-UX 11.00 1998: FreeBSD 3.0 and Linux 2.2 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 3 / 23
  • 4. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 5. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Further optimisations: 2004: SF_NODISKIO flag Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 6. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Further optimisations: 2004: SF_NODISKIO flag 2006: inner cycle, working on sbspace() bytes Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 7. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Further optimisations: 2004: SF_NODISKIO flag 2006: inner cycle, working on sbspace() bytes 2013: sending a shared memory descriptor data Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 8. What’s not right with sendfile(2) blocking on I/O Problem #1: blocking on I/O Algorithm of a modern HTTP-server: 1 Take yet another descriptor from kevent(2) 2 Do write(2)/read(2)/sendfile(2) on it 3 Go to 1 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
  • 9. What’s not right with sendfile(2) blocking on I/O Problem #1: blocking on I/O Algorithm of a modern HTTP-server: 1 Take yet another descriptor from kevent(2) 2 Do write(2)/read(2)/sendfile(2) on it 3 Go to 1 Bottleneck: any syscall time. Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
  • 10. What’s not right with sendfile(2) blocking on I/O Attempts to solve problem #1 Separate I/O contexts: processes, threads Apache nginx 2 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
  • 11. What’s not right with sendfile(2) blocking on I/O Attempts to solve problem #1 Separate I/O contexts: processes, threads Apache nginx 2 SF_NODISKIO + aio_read(2) nginx Varnish Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
  • 12. What’s not right with sendfile(2) blocking on I/O More attempts . . . aio_mlock(2) instead of aio_read(2) aio_sendfile(2) ??? Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 7 / 23
  • 13. What’s not right with sendfile(2) control over Problem #2: control over VM VOP_READ() leaves pages in VM cache VOP_READ() [for UFS] does readahead Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
  • 14. What’s not right with sendfile(2) control over Problem #2: control over VM VOP_READ() leaves pages in VM cache VOP_READ() [for UFS] does readahead Not easy to prevent it doing that! Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
  • 15. New sendfile(2) implementation above pager waht if VOP_GETPAGES()? VOP_READ() → VOP_GETPAGES() Pros: sendfile() already works on pages implementations for vnode and shmem converge control over VM is now easier task Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
  • 16. New sendfile(2) implementation above pager waht if VOP_GETPAGES()? VOP_READ() → VOP_GETPAGES() Pros: sendfile() already works on pages implementations for vnode and shmem converge control over VM is now easier task Cons Losing readahead heuristics Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
  • 17. New sendfile(2) implementation above pager waht if VOP_GETPAGES()? VOP_READ() → VOP_GETPAGES() Pros: sendfile() already works on pages implementations for vnode and shmem converge control over VM is now easier task Cons Losing readahead heuristics But no one used them! Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
  • 18. New sendfile(2) VOP_GETPAGES_ASYNC() VOP_GETPAGES_ASYNC() int VOP_GETPAGES(struct vnode *vp, vm_page_t *ma, int count, int reqpage); 1 Initialize buf(9) 2 buf->b_iodone = bdone; 3 bstrategy(buf); 4 bwait(buf); /* sleeps until I/O completes */ 5 return; Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
  • 19. New sendfile(2) VOP_GETPAGES_ASYNC() VOP_GETPAGES_ASYNC() int VOP_GETPAGES_ASYNC(struct vnode *vp, vm_page_t *ma, int count, int reqpage, vop_getpages_iodone_t *iodone, void *arg); 1 Initialize buf(9) 2 buf->b_iodone = vnode_pager_async_iodone; 3 bstrategy(buf); 4 return; vnode_pager_async_iodone calls iodone() . Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
  • 20. New sendfile(2) non-blocking sendfile(2) naive non-blocking sendfile(2) In kern_sendfile(): 1 nios++; 2 VOP_GETPAGES_ASYNC(sendfile_iodone); In sendfile_iodone(): 1 nios--; 2 if (nios) return; 3 sosend(); Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 11 / 23
  • 21. New sendfile(2) non-blocking sendfile(2) the problem of naive implementation sendfile(filefd, sockfd, ..); write(sockfd, ..); Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 12 / 23
  • 22. New sendfile(2) “not ready” data in socket buffers socket buffer mbuf mbuf mbuf mbuf mbuf mbuf struct sockbuf struct mbuf *sb_mb struct mbuf *sb_mbtail u_int sb_cc Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 13 / 23
  • 23. New sendfile(2) “not ready” data in socket buffers socket buffer with “not ready” data mbuf mbuf mbuf mbuf mbuf mbuf page page struct sockbuf struct mbuf *sb_mb struct mbuf *sb_fnrdy struct mbuf *sb_mbtail u_int sb_acc u_int sb_ccc Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 14 / 23
  • 24. New sendfile(2) final implementation non-blocking sendfile(2) In kern_sendfile(): 1 nios++; 2 VOP_GETPAGES_ASYNC(sendfile_iodone); 3 sosend(NOT_READY); In sendfile_iodone(): 1 nios--; 2 if (nios) return; 3 soready(); Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 15 / 23
  • 25. New sendfile(2) comparison with old sendfile(2) traffic Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 16 / 23
  • 26. New sendfile(2) comparison with old sendfile(2) CPU idle Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 17 / 23
  • 27. New sendfile(2) comparison with old sendfile(2) profiling sendfile(2) in head aio_daemon 13.64% sys_sendfile 7.40% t4_intr 5.66% xpt_done 1.04% pagedaemon 4.16% scheduler 5.28% Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 18 / 23
  • 28. New sendfile(2) comparison with old sendfile(2) profiling new sendfile(2) sys_sendfile 16.9% t4_intr 8.17% xpt_done 9.91% pagedaemon 6.54% scheduler 3.58% Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
  • 29. New sendfile(2) comparison with old sendfile(2) profiling new sendfile(2) sys_sendfile 16.9% (vm_page_grab 9.24% !!) t4_intr 8.17% (tcp_output() 2.07% !!) xpt_done 9.91% (m_freem() 3.11% !!) pagedaemon 6.54% scheduler 3.58% Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
  • 30. New sendfile(2) comparison with old sendfile(2) what did change? New code always sends full socket buffer Which is good for TCP (as protocol) Which hurts VM, mbuf allocator, and unexpectedly TCP stack Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
  • 31. New sendfile(2) comparison with old sendfile(2) what did change? New code always sends full socket buffer Which is good for TCP (as protocol) Which hurts VM, mbuf allocator, and unexpectedly TCP stack Will fix that! Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
  • 32. New sendfile(2) comparison with old sendfile(2) old sendfile(2) @ Netflix Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
  • 33. New sendfile(2) comparison with old sendfile(2) new sendfile(2) @ Netflix Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
  • 34. New sendfile(2) plans and problems TODO list Problems: VM & I/O overcommit ZFS SCTP Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
  • 35. New sendfile(2) plans and problems TODO list Problems: VM & I/O overcommit ZFS SCTP Future plans: sendfile(2) doing TLS Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
  • 36. New sendfile(2) Questions? Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 23 / 23