SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Userfaultfd:
Current Features, Limitations and
Future Development
This project has received funding from
the European Union's Horizon 2020
research and innovation programme
under grant agreement No 688386
Mike Rapoport
<rppt@linux.vnet.ibm.com>
Overview
● Userfaultfd is a mechanism for user-space paging and memory tracking
○ Anonymous
○ Hugetlbfs
○ Shared memory
● File descriptor with ioctl()’s for control
● [e]poll() and read() for notifications
● Page fault events (since 4.3)
● Events for virtual memory layout changes (since 4.11)
Use cases
● Userfaultfd-Missing
○ VMs and containers post-copy migration
○ Shared memory robustness
○ Host enforcement for VM memory ballooning
● Userfaultfd-Write-Protect (WIP)
○ Efficient replacement of mprotect + SIGBUS
○ Memory snapshotting
○ Garbage collectors
○ Distributed shared memory
Initialization
● Create userfault file descriptor
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
● Perform API handshake
○ Query supported features
○ Enable/disable notifications for non page fault events
ioctl(uffd, UFFDIO_API, &uffdio_api);
● Register memory ranges
userfaultfd_register(uffd, UFFDIO_REGISTER, &uffdio_register);
Simple userfaultfd monitor
pollfd[0].fd = uffd;
while (poll(pollfd, 1, -1) > 0) {
read(uffd, &uffd_msg, sizeof(uffd_msg));
switch (uffd_msg.event) {
case UFFD_EVENT_PAGEFAULT:
page_fault_event(&msg);
break;
case UFFD_EVENT_FORK:
fork_event(&msg);
break;
...
default:
unknown_event();
break;
}
User-space page fault handler
page_fault_event(struct uffdio_msg *msg)
{
struct uffdio_copy uffdio_copy = { 0 };
unsigned long address, len;
void *page_data;
address = msg->arg.pagefault.address;
get_data(address, &page_data, &len);
uffdio_copy.src = (unsigned long) page_data;
uffdio_copy.dst = address;
uffdio_copy.len = len;
ioctl(uffd, UFFDIO_COPY, &uffdio_copy);
}
Page fault processing
● Read not-present memory
● #PF handler calls handle_userfault()
● handle_userfault():
○ Suspend the faulting thread
○ Notify userfaultfd monitor
● monitor thread:
○ Resolve the pagefault
■ UFFDIO_{COPY,ZEROPAGE}
○ Wake up the faulting thread
Non-cooperative userfaultfd
● Monitor runs in different address space
● Notifications about address space changes
○ mremap
○ madvise(DONTNEED, FREE, REMOVE)
○ munmap
● Notification about fork()
madvise(DONTNEED, FREE, REMOVE)
● Application requests to free pages
● Expects to read zeros
● Userfaultfd should not intervene
app
uffd range
madvise(DONTNEED, FREE, REMOVE)
● madvise(REMOVE)
○ userfaultfd_remove()
○ Notify monitor
● Monitor:
○ userfaultfd_unregister()
mremap()
● Application moves virtual range
● Expects data at new addresses
● Userfaultfd monitor should “remap” as well
app
uffd range
uffd range
mremap()
● mremap()
○ userfaultfd_remap()
○ Notify monitor
● Monitor:
○ Update internal memory maps
fork()
● Address space is duplicated
● Child expects to see data
● Userfaultfd monitor has to fill it
app
uffd range
app’
?
fork()
● fork()
○ dup_userfaultfd()
○ Notify monitor
● Monitor:
○ Update internal state
Limitations
● No UFFDIO_ZEROPAGE for shmem/hugetlbfs
● Missing notification for fallocate(PUNCH_HOLE | KEEP_SIZE)
● Synchronous events are required for multi-threaded monitor apps
● Userfaultfd and COW
○ Same page may be shared by several processes. Now we create a copy for each.
○ COW-ed page may be “lazily” restored, need API for that
● Userfaultfd nesting/chaining
Current development
● Add UFFDIO_ZEROPAGE for shmem/hugetlbfs
○ http://www.spinics.net/lists/linux-mm/msg129484.html
● Add synchronous events
○ http://www.spinics.net/lists/linux-mm/msg127093.html
● Add notification for fallocate(PUNCH_HOLE | KEEP_SIZE)
○ Stuck at the moment
Future: userfaultfd and COW
● Enable userfaultfd for file-backed VMAs
○ Special COW-only mode?
○ UFFDIO_EVENT_PAGEFAULT_COW?
● Add API for COW-ing pages
○ UFFDIO_COPY + map to several address spaces
○ Process page fault in kernel and copy from page cache
Future: userfaultfd nesting/chaining
● Notifications about userfaultfd_{register,unregister}
● Ability to “pass” page fault to the next userfaultfd context
● Userfaultfd context prioritization
References
https://www.kernel.org/doc/Documentation/vm/userfaultfd.txt
http://man7.org/linux/man-pages/man2/userfaultfd.2.html
https://schd.ws/hosted_files/lcccna2016/c4/userfaultfd.pdf
http://wiki.qemu.org/Features/PostCopyLiveMigration
https://criu.org/Lazy_migration
https://medium.com/@MartinCracauer/generational-garbage-collection-write-barri
ers-write-protection-and-userfaultfd-2-8b0e796b8f7f

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdf
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Linux Kernel Module - For NLKB
Linux Kernel Module - For NLKBLinux Kernel Module - For NLKB
Linux Kernel Module - For NLKB
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1
 
InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
 
Linux : PSCI
Linux : PSCILinux : PSCI
Linux : PSCI
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
QEMU - Binary Translation
QEMU - Binary Translation QEMU - Binary Translation
QEMU - Binary Translation
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panic
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 

Ähnlich wie Userfaultfd: Current Features, Limitations and Future Development

HKG18-110 - net_mdev: Fast path user space I/O
HKG18-110 - net_mdev: Fast path user space I/OHKG18-110 - net_mdev: Fast path user space I/O
HKG18-110 - net_mdev: Fast path user space I/O
Linaro
 

Ähnlich wie Userfaultfd: Current Features, Limitations and Future Development (20)

Userfaultfd and Post-Copy Migration
Userfaultfd and Post-Copy MigrationUserfaultfd and Post-Copy Migration
Userfaultfd and Post-Copy Migration
 
Userfaultfd and post copy migration
Userfaultfd and post copy migrationUserfaultfd and post copy migration
Userfaultfd and post copy migration
 
Rlite software-architecture (1)
Rlite software-architecture (1)Rlite software-architecture (1)
Rlite software-architecture (1)
 
Cloud firewall logging
Cloud firewall loggingCloud firewall logging
Cloud firewall logging
 
Automating Spatial Data Sharing
Automating Spatial Data SharingAutomating Spatial Data Sharing
Automating Spatial Data Sharing
 
Lisbon Mulesoft Meetup - Logging Aggregation & Visualization
Lisbon Mulesoft Meetup - Logging Aggregation & VisualizationLisbon Mulesoft Meetup - Logging Aggregation & Visualization
Lisbon Mulesoft Meetup - Logging Aggregation & Visualization
 
Bsdtw17: ruslan bukin: free bsd/risc-v and device drivers
Bsdtw17: ruslan bukin: free bsd/risc-v and device driversBsdtw17: ruslan bukin: free bsd/risc-v and device drivers
Bsdtw17: ruslan bukin: free bsd/risc-v and device drivers
 
Sprint 15
Sprint 15Sprint 15
Sprint 15
 
MTCNA Intro to routerOS
MTCNA Intro to routerOSMTCNA Intro to routerOS
MTCNA Intro to routerOS
 
SDN Programming with Go
SDN Programming with GoSDN Programming with Go
SDN Programming with Go
 
Sprint 81
Sprint 81Sprint 81
Sprint 81
 
HKG18-110 - net_mdev: Fast path user space I/O
HKG18-110 - net_mdev: Fast path user space I/OHKG18-110 - net_mdev: Fast path user space I/O
HKG18-110 - net_mdev: Fast path user space I/O
 
2015-09-16 georchestra @ foss4g2015 Seoul
2015-09-16 georchestra @ foss4g2015 Seoul2015-09-16 georchestra @ foss4g2015 Seoul
2015-09-16 georchestra @ foss4g2015 Seoul
 
geOrchestra, a free, modular and secure SDI
geOrchestra, a free, modular and secure SDIgeOrchestra, a free, modular and secure SDI
geOrchestra, a free, modular and secure SDI
 
Adopting OGC Standards in São Paulo Flood Alert System - FOSS4G 2014 - PDX
Adopting OGC Standards in São Paulo Flood Alert System  - FOSS4G 2014 - PDXAdopting OGC Standards in São Paulo Flood Alert System  - FOSS4G 2014 - PDX
Adopting OGC Standards in São Paulo Flood Alert System - FOSS4G 2014 - PDX
 
MTCNA : Intro to RouterOS - Part 1
MTCNA : Intro to RouterOS - Part 1MTCNA : Intro to RouterOS - Part 1
MTCNA : Intro to RouterOS - Part 1
 
Linux Kernel Debugging
Linux Kernel DebuggingLinux Kernel Debugging
Linux Kernel Debugging
 
The pathway to Chromium on Wayland (Web Engines Hackfest 2018)
The pathway to Chromium on Wayland (Web Engines Hackfest 2018)The pathway to Chromium on Wayland (Web Engines Hackfest 2018)
The pathway to Chromium on Wayland (Web Engines Hackfest 2018)
 
nebulaconf
nebulaconfnebulaconf
nebulaconf
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 

Mehr von Kernel TLV

Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
Kernel TLV
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
Kernel TLV
 

Mehr von Kernel TLV (20)

DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
SGX Trusted Execution Environment
SGX Trusted Execution EnvironmentSGX Trusted Execution Environment
SGX Trusted Execution Environment
 
Fun with FUSE
Fun with FUSEFun with FUSE
Fun with FUSE
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and Containers
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545
 
Present Absence of Linux Filesystem Security
Present Absence of Linux Filesystem SecurityPresent Absence of Linux Filesystem Security
Present Absence of Linux Filesystem Security
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to Bottom
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
 
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and Where
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
KernelTLV Speaker Guidelines
KernelTLV Speaker GuidelinesKernelTLV Speaker Guidelines
KernelTLV Speaker Guidelines
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use Cases
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival Guide
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
WiFi and the Beast
WiFi and the BeastWiFi and the Beast
WiFi and the Beast
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
FreeBSD and Drivers
FreeBSD and DriversFreeBSD and Drivers
FreeBSD and Drivers
 
Specializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackSpecializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network Stack
 

Kürzlich hochgeladen

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 

Kürzlich hochgeladen (20)

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 

Userfaultfd: Current Features, Limitations and Future Development