SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
Jiannan Ouyang, John Lange
Department of Computer Science
University of Pittsburgh
VEE ’13
03/17/2013
Preemptable Ticket Spinlocks
Improving Consolidated Performance in the Cloud
Motivation
2
—  VM interference in overcommitted environments
—  OS synchronization overhead
—  Lock holder preemption (LHP)
—  Lock Waiter Preemption
—  significance analysis of lock waiter preemption
—  PreemptableTicket Spinlock
—  implementation inside Linux
—  Evaluation
—  significant speedup over Linux
Contributions
3
Spinlocks
4
—  Basics
—  lock() & unlock()
—  Busy waiting lock
—  generic spinlock: random order, unfair (starvation)
—  ticket spinlock: FIFO order, fair
—  Designed for fast mutual exclusion
—  busy waiting vs. sleep/wakeup
—  spinlocks for short & fast critical sections (~1us)
—  OS assumptions
—  use spinlocks for short critical section only
—  never preempt a thread holding or waiting a kernel spinlock
Preemption in VMs
5
—  Lock Holder Preemption (LHP)
—  virtualization breaks the OS assumption
—  vCPU holding a lock is unscheduled byVMM
—  preemption prolongs critical section (~1m v.s. ~1us)
—  Proposed Solutions
—  Co-scheduling and variants
—  Hardware-assisted scheme (Pause Loop Exiting)
—  Paravirtual spinlocks
Preemption in Ticket Lock
6
0 1 2
head = 0 tail = 2
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
7
0 1 2
head = 0 tail = 2
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
8
0 1 2 3
head = 0 tail = 3
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
9
0 1 2 3 4
head = 0 tail = 4
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
10
1 2 3 4
tail = 4head = 1
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
11
1 2 3 4
tail = 4head = 1
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Lock Holder Preemption!
Preemption in Ticket Lock
12
1 2 3 4
tail = 4head = 1
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
13
1 2 3 4
tail = 4head = 1
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
14
2 3 4
tail = 4head = 2
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
15
3 4
head = 3
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
tail = 4
Preemption in Ticket Lock
16
3 4 5
head = 3 tail = 5
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
Preemption in Ticket Lock
17
3 4 5 6
head = 3 tail = 6
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
LockWaiter Preemption
wait on available resource
Lock Waiter Preemption
18
—  Lock waiter is preempted
—  Later waiters wait on an available lock
—  Possible to adapt to it, if we
—  detect preempted waiter
—  acquire lock out of order
How significant is it??
Waiter Preemption Dominates
19
LHP + LWP LWP ​ 𝐋 𝐖𝐏/ 𝐋𝐇𝐏
+𝐋𝐖𝐏 
hackbench x1 1089 452 41.5%
hackbench x2 44342 39221 88.5%
ebizzy x1 294 166 56.5%
ebizzy x2 1017 980 96.4%
Table 2: LockWaiter Preemption Problem in the Linux Kernel
Lock waiter preemption dominates in
overcommitted environments
Challenges & Approach
20
—  How to identify a preempted waiter?
—  timeout threshold
—  How to violate order constraints?
—  allow timed out waiters get the lock randomly
—  ensure mutual exclusion between them
—  How NOT to break the whole ordering mechanism?
—  timeout threshold proportional to queue position
Queue Position Index
21
N = ticket – queue_head
—  ticket: copy of queue tail value upon enqueue
—  N: number of earlier waiters
n n+1 n+2
head = n tail = n+2
ticket = n+2
N = 2
Proportional Timeout Threshold
22
T = N x t
—  t is a constant parameter
—  large enough to avoid false detection
—  small enough to save waiting time
—  Performance is NOT t value sensitive
—  most locks take ~1us & most spinning time wasted on locks
that wait ~1ms
—  larger t does not harm & smaller t does not gain much
n n+1 n+2
head = n tail = n+2
0 t 2tTimeout
Threshold
Preemptable Ticket Spinlock
23
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
0 1 2 3 4 5
head = 0 tail = 5
0
Timeout
Threshold
t 2t 3t 4t 5t
Preemptable Ticket Spinlock
24
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 2 3 4 5
head = 1 tail = 5
0
Timeout
Threshold
t 2t 3t 4t
Preemptable Ticket Spinlock
25
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 2 3 4 5
head = 1 tail = 5
0
Timeout
Threshold
t 2t 3t 4t
Preemptable Ticket Spinlock
26
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 3 4 5
head = 2 tail = 5
Timeout
Threshold
t 2t 3t
N = ticket – head
Preemptable Ticket Spinlock
27
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 3 4 5
head = 2 tail = 5
Timeout
Threshold
t 2t 3t
Preemptable Ticket Spinlock
28
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 3 5
head = 3 tail = 5
Timeout
Threshold
0 2t
Preemptable Ticket Spinlock
29
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 3 5
head = 3 tail = 5
Timeout
Threshold
0 2t
Preemptable Ticket Spinlock
30
0
1
a scheduled waiter with ticket 0
a preempted waiter with ticket 1
1 3 5
head = 3 tail = 5
Timeout
Threshold
0 2t
Summary
31
—  PreemptableTicket Lock adapts to preemption
—  preserve order in absence of preemption
—  violate order upon preemption
—  PreemptableTicket Lock preserves fairness
—  order violations are restricted
—  priority is always given to timed out waiters
—  timed out waiters bounded by vCPU numbers of aVM
Implementation
32
—  Drop-in replacement
—  lock(), unlock(), is_locked(), trylock(), etc.
—  Correct
—  race condition free: atomic updates
—  Fast
—  performance is sensitive to lock efficiency
—  ~60 lines of C/inline-assembly in Linux 3.5.0
Paravirtual Spinlocks
33
—  Lock holder preemption is unaddressed
—  semantic gap between guest and host
—  paravirtualization: guest/host cooperation
—  signal long waiting lock / put a vCPU to sleep
—  notify to wake up a vCPU / wake up a vCPU
—  paravirtual preemptable ticket spinlock
—  sleep when waiting too long after timed out
—  wake up all sleeping waiters upon lock releasing
Evaluation
34
—  Host
—  8 core 2.6GHz Intel Core i7 CPU, 8 GB RAM, 1Gbit NIC,
Fedora 17 (Linux 3.5.0)
—  Guest
—  8 core, 1G RAM, Fedora 17 (Linux 3.5.0)
—  Benchmarks
—  hackbench, ebizzy, dell dvd store
—  Lock implementations
—  baseline: ticket lock, paravirtual ticket lock (pv-lock)
—  preemptable ticket lock
—  paravirtual (pv) preemptable ticket lock
Hackbench
35
—  Average Speedup
—  preemptable-lock vs. ticket lock: 4.82X
—  pv-preemptable-lock v.s. ticket lock: 7.08X
—  pv-preemptable-lock v.s. pv-lock: 1.03X
Ebizzy
36
Less variance over ticket lock and pv-lock
—  in-VM preemption adaptivity
—  lessVM interference
variance
80.36 vs. 10.94
variance
131.62 vs. 16.09
Dell DVD Store (apache/mysql)
37
—  Average Speedup
—  preemptable-lock vs. ticket lock: 11.68X
—  pv-preemptable-lock v.s. ticket lock: 19.52X
—  pv-preemptable-lock v.s. pv-lock: 1.11X
Evaluation Summary
38
—  PreemptableTicket Spinlocks speedup
—  5.32X over ticket lock
—  Paravirtual PreemptableTicket Spinlocks speedup
—  7.91X over ticket lock
—  1.08X over paravirtual ticket lock
Average speedup across cases for all benchmarks
—  LockWaiter Preemption
—  most significant preemption problem in queue based lock under
overcommitted environment
—  PreemptableTicket Spinlock
—  Implementation with ~60 lines of code in Linux
—  Better performance in overcommitted environment
—  5.32X average speedup up over ticket lock w/oVMM support
—  1.08X average speedup over pv-lock with less variance
Conclusion
39
Thank You
40
— Jiannan Ouyang
—  ouyang@cs.pitt.edu
—  http://www.cs.pitt.edu/~ouyang/
1 2 3 4 5
0 t 2t 3t 4t
PreemptableTicket Spinlock

Weitere ähnliche Inhalte

Was ist angesagt?

Creating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server HardeningCreating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server Hardening
archwisp
 

Was ist angesagt? (13)

Exploiting buffer overflows
Exploiting buffer overflowsExploiting buffer overflows
Exploiting buffer overflows
 
主機自保指南
主機自保指南主機自保指南
主機自保指南
 
Pres
PresPres
Pres
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
 
Proxy arp
Proxy arpProxy arp
Proxy arp
 
Prosess accouting
Prosess accoutingProsess accouting
Prosess accouting
 
DTMF Decoder Shield for Arduino
DTMF Decoder Shield for ArduinoDTMF Decoder Shield for Arduino
DTMF Decoder Shield for Arduino
 
The propeller
The propellerThe propeller
The propeller
 
Austin c-c++-meetup-feb2018-spectre
Austin c-c++-meetup-feb2018-spectreAustin c-c++-meetup-feb2018-spectre
Austin c-c++-meetup-feb2018-spectre
 
Roll your own toy unix clone os
Roll your own toy unix clone osRoll your own toy unix clone os
Roll your own toy unix clone os
 
Ethereum A to Z
Ethereum A to ZEthereum A to Z
Ethereum A to Z
 
Playing CTFs for Fun & Profit
Playing CTFs for Fun & ProfitPlaying CTFs for Fun & Profit
Playing CTFs for Fun & Profit
 
Creating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server HardeningCreating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server Hardening
 

Andere mochten auch

Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels
Jiannan Ouyang, PhD
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Jiannan Ouyang, PhD
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS Interference
Eric Van Hensbergen
 
reference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysisreference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysis
Buland Singh
 
SFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationSFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM Virtualization
Linaro
 

Andere mochten auch (20)

Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
 
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
 
Basic Concept of Pixel and MPEG data structure (english)
Basic Concept of Pixel and MPEG data structure (english)Basic Concept of Pixel and MPEG data structure (english)
Basic Concept of Pixel and MPEG data structure (english)
 
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONSENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
 
Cache profiling on ARM Linux
Cache profiling on ARM LinuxCache profiling on ARM Linux
Cache profiling on ARM Linux
 
Docker by demo
Docker by demoDocker by demo
Docker by demo
 
Denser, cooler, faster, stronger: PHP on ARM microservers
Denser, cooler, faster, stronger: PHP on ARM microserversDenser, cooler, faster, stronger: PHP on ARM microservers
Denser, cooler, faster, stronger: PHP on ARM microservers
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
SDN - OpenFlow + OpenVSwitch + Quantum
SDN - OpenFlow + OpenVSwitch + QuantumSDN - OpenFlow + OpenVSwitch + Quantum
SDN - OpenFlow + OpenVSwitch + Quantum
 
Q2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP SchedulingQ2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP Scheduling
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS Interference
 
DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2
 
reference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysisreference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysis
 
Linux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionLinux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emption
 
Memory Barriers in the Linux Kernel
Memory Barriers in the Linux KernelMemory Barriers in the Linux Kernel
Memory Barriers in the Linux Kernel
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespaces
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver Cluster
 
SFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationSFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM Virtualization
 

Kürzlich hochgeladen

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Kürzlich hochgeladen (20)

The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 

Preemptable ticket spinlocks: improving consolidated performance in the cloud

  • 1. Jiannan Ouyang, John Lange Department of Computer Science University of Pittsburgh VEE ’13 03/17/2013 Preemptable Ticket Spinlocks Improving Consolidated Performance in the Cloud
  • 2. Motivation 2 —  VM interference in overcommitted environments —  OS synchronization overhead —  Lock holder preemption (LHP)
  • 3. —  Lock Waiter Preemption —  significance analysis of lock waiter preemption —  PreemptableTicket Spinlock —  implementation inside Linux —  Evaluation —  significant speedup over Linux Contributions 3
  • 4. Spinlocks 4 —  Basics —  lock() & unlock() —  Busy waiting lock —  generic spinlock: random order, unfair (starvation) —  ticket spinlock: FIFO order, fair —  Designed for fast mutual exclusion —  busy waiting vs. sleep/wakeup —  spinlocks for short & fast critical sections (~1us) —  OS assumptions —  use spinlocks for short critical section only —  never preempt a thread holding or waiting a kernel spinlock
  • 5. Preemption in VMs 5 —  Lock Holder Preemption (LHP) —  virtualization breaks the OS assumption —  vCPU holding a lock is unscheduled byVMM —  preemption prolongs critical section (~1m v.s. ~1us) —  Proposed Solutions —  Co-scheduling and variants —  Hardware-assisted scheme (Pause Loop Exiting) —  Paravirtual spinlocks
  • 6. Preemption in Ticket Lock 6 0 1 2 head = 0 tail = 2 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 7. Preemption in Ticket Lock 7 0 1 2 head = 0 tail = 2 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 8. Preemption in Ticket Lock 8 0 1 2 3 head = 0 tail = 3 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 9. Preemption in Ticket Lock 9 0 1 2 3 4 head = 0 tail = 4 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 10. Preemption in Ticket Lock 10 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 11. Preemption in Ticket Lock 11 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 Lock Holder Preemption!
  • 12. Preemption in Ticket Lock 12 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 13. Preemption in Ticket Lock 13 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 14. Preemption in Ticket Lock 14 2 3 4 tail = 4head = 2 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 15. Preemption in Ticket Lock 15 3 4 head = 3 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 tail = 4
  • 16. Preemption in Ticket Lock 16 3 4 5 head = 3 tail = 5 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  • 17. Preemption in Ticket Lock 17 3 4 5 6 head = 3 tail = 6 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 LockWaiter Preemption wait on available resource
  • 18. Lock Waiter Preemption 18 —  Lock waiter is preempted —  Later waiters wait on an available lock —  Possible to adapt to it, if we —  detect preempted waiter —  acquire lock out of order How significant is it??
  • 19. Waiter Preemption Dominates 19 LHP + LWP LWP ​ 𝐋 𝐖𝐏/ 𝐋𝐇𝐏 +𝐋𝐖𝐏  hackbench x1 1089 452 41.5% hackbench x2 44342 39221 88.5% ebizzy x1 294 166 56.5% ebizzy x2 1017 980 96.4% Table 2: LockWaiter Preemption Problem in the Linux Kernel Lock waiter preemption dominates in overcommitted environments
  • 20. Challenges & Approach 20 —  How to identify a preempted waiter? —  timeout threshold —  How to violate order constraints? —  allow timed out waiters get the lock randomly —  ensure mutual exclusion between them —  How NOT to break the whole ordering mechanism? —  timeout threshold proportional to queue position
  • 21. Queue Position Index 21 N = ticket – queue_head —  ticket: copy of queue tail value upon enqueue —  N: number of earlier waiters n n+1 n+2 head = n tail = n+2 ticket = n+2 N = 2
  • 22. Proportional Timeout Threshold 22 T = N x t —  t is a constant parameter —  large enough to avoid false detection —  small enough to save waiting time —  Performance is NOT t value sensitive —  most locks take ~1us & most spinning time wasted on locks that wait ~1ms —  larger t does not harm & smaller t does not gain much n n+1 n+2 head = n tail = n+2 0 t 2tTimeout Threshold
  • 23. Preemptable Ticket Spinlock 23 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 0 1 2 3 4 5 head = 0 tail = 5 0 Timeout Threshold t 2t 3t 4t 5t
  • 24. Preemptable Ticket Spinlock 24 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 2 3 4 5 head = 1 tail = 5 0 Timeout Threshold t 2t 3t 4t
  • 25. Preemptable Ticket Spinlock 25 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 2 3 4 5 head = 1 tail = 5 0 Timeout Threshold t 2t 3t 4t
  • 26. Preemptable Ticket Spinlock 26 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 4 5 head = 2 tail = 5 Timeout Threshold t 2t 3t N = ticket – head
  • 27. Preemptable Ticket Spinlock 27 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 4 5 head = 2 tail = 5 Timeout Threshold t 2t 3t
  • 28. Preemptable Ticket Spinlock 28 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 5 head = 3 tail = 5 Timeout Threshold 0 2t
  • 29. Preemptable Ticket Spinlock 29 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 5 head = 3 tail = 5 Timeout Threshold 0 2t
  • 30. Preemptable Ticket Spinlock 30 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 5 head = 3 tail = 5 Timeout Threshold 0 2t
  • 31. Summary 31 —  PreemptableTicket Lock adapts to preemption —  preserve order in absence of preemption —  violate order upon preemption —  PreemptableTicket Lock preserves fairness —  order violations are restricted —  priority is always given to timed out waiters —  timed out waiters bounded by vCPU numbers of aVM
  • 32. Implementation 32 —  Drop-in replacement —  lock(), unlock(), is_locked(), trylock(), etc. —  Correct —  race condition free: atomic updates —  Fast —  performance is sensitive to lock efficiency —  ~60 lines of C/inline-assembly in Linux 3.5.0
  • 33. Paravirtual Spinlocks 33 —  Lock holder preemption is unaddressed —  semantic gap between guest and host —  paravirtualization: guest/host cooperation —  signal long waiting lock / put a vCPU to sleep —  notify to wake up a vCPU / wake up a vCPU —  paravirtual preemptable ticket spinlock —  sleep when waiting too long after timed out —  wake up all sleeping waiters upon lock releasing
  • 34. Evaluation 34 —  Host —  8 core 2.6GHz Intel Core i7 CPU, 8 GB RAM, 1Gbit NIC, Fedora 17 (Linux 3.5.0) —  Guest —  8 core, 1G RAM, Fedora 17 (Linux 3.5.0) —  Benchmarks —  hackbench, ebizzy, dell dvd store —  Lock implementations —  baseline: ticket lock, paravirtual ticket lock (pv-lock) —  preemptable ticket lock —  paravirtual (pv) preemptable ticket lock
  • 35. Hackbench 35 —  Average Speedup —  preemptable-lock vs. ticket lock: 4.82X —  pv-preemptable-lock v.s. ticket lock: 7.08X —  pv-preemptable-lock v.s. pv-lock: 1.03X
  • 36. Ebizzy 36 Less variance over ticket lock and pv-lock —  in-VM preemption adaptivity —  lessVM interference variance 80.36 vs. 10.94 variance 131.62 vs. 16.09
  • 37. Dell DVD Store (apache/mysql) 37 —  Average Speedup —  preemptable-lock vs. ticket lock: 11.68X —  pv-preemptable-lock v.s. ticket lock: 19.52X —  pv-preemptable-lock v.s. pv-lock: 1.11X
  • 38. Evaluation Summary 38 —  PreemptableTicket Spinlocks speedup —  5.32X over ticket lock —  Paravirtual PreemptableTicket Spinlocks speedup —  7.91X over ticket lock —  1.08X over paravirtual ticket lock Average speedup across cases for all benchmarks
  • 39. —  LockWaiter Preemption —  most significant preemption problem in queue based lock under overcommitted environment —  PreemptableTicket Spinlock —  Implementation with ~60 lines of code in Linux —  Better performance in overcommitted environment —  5.32X average speedup up over ticket lock w/oVMM support —  1.08X average speedup over pv-lock with less variance Conclusion 39
  • 40. Thank You 40 — Jiannan Ouyang —  ouyang@cs.pitt.edu —  http://www.cs.pitt.edu/~ouyang/ 1 2 3 4 5 0 t 2t 3t 4t PreemptableTicket Spinlock