When the OS
gets in the way
(and what you can do about it)
Mark Price
@epickrram
LMAX Exchange
It’s not the OS’s fault
● Linux is an excellent general-purpose OS
● Many target platforms
● Scheduling is actually fairly complicated
● Low-latency is a special use-case
● We need to provide some hints
Why should I care?
Useful in some scenarios
● Low latency applications
● Response times < 1ms
● Compute-intensive workloads
● Long-running jobs
A real-world scenario: LMAX
System
Latency = T1 - T0
Before tuning:
250us / 10+ms
After tuning:
80us / <1ms
(mean / max)
Jitter
● “slight irregular movement, variation, or
unsteadiness, especially in an electrical
signal or electronic device”
● Variation in response time latency
● Long-tail in response time
Dealing with it
● First take care of the low-hanging fruit
○ e.g. Garbage collection (gc-free / Zing)
○ e.g. Slow I/O
● Once response times are < 10ms the fun
begins
● Make sure your code is running!
Measure first
● Need to validate changes are good
● End-to-end tests
● Using realistic load
● Change one thing and observe
● A refresher...
Modern hardware layout
Multi-tasking
● num(tasks) > num(HyperThreads)
● OS must share out hardware resources
● Clever? Dumb? Fast? Slow?
● Fair...
Linux CFS
● Completely Fair Scheduler
● Maintains a task ‘queue’ per HT
● Runs the task with the lowest (virtual) runtime
● Updates task runtime after execution
● Higher priority implies a larger share of execution time
● Tasks are load-balanced across HTs
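
The "lowest runtime runs next" rule can be illustrated with a toy model. This is only a sketch of the idea on the slide, not the kernel's implementation: the function name, the weights and the fixed slice length are all assumptions made for illustration.

```python
import heapq

def simulate_cfs(weights, slices, slice_ns=1_000_000):
    """Toy model: repeatedly run the task with the lowest virtual runtime.

    weights: dict of task name -> weight (a larger weight means higher priority).
    Returns the order in which tasks were run.
    """
    # Heap of (virtual runtime, name); every task starts at zero.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    order = []
    for _ in range(slices):
        vruntime, name = heapq.heappop(queue)  # lowest runtime runs next
        order.append(name)
        # Update the runtime after execution. A heavier task is charged less
        # virtual time per slice, so it is selected more often.
        vruntime += slice_ns / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return order

if __name__ == "__main__":
    print(simulate_cfs({"hot": 2, "bg": 1}, slices=6))
```

With a weight of 2 against 1, the hypothetical "hot" task receives roughly two thirds of the slices, which is the sense in which higher priority buys more execution time.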
An example application ...
Threads
… running on a language runtime
… running on an operating system
Optimise for locality - PCI/memory
Target deployment
How do I start?
Start with the metal
● BIOS settings for maximum performance
● That’s a whole other talk...
Discover what’s available
● lstopo is a useful tool for looking at hardware
● Provided by the hwloc package
● Displays:
○ HyperThreads
○ Physical cores
○ NUMA nodes
○ PCI locality
lstopo
[Figure: lstopo output, annotated with HyperThread, core, caches and NUMA-local RAM]
Reserve & use specific resource
● Use isolcpus to reserve CPU resources
○ kernel boot parameter
○ isolcpus=0-5,10-13
● Use taskset to pin your application to CPUs:
○ taskset -c 10-13 java …
● Set affinity of hot threads:
○ sched_setaffinity(...)
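
As a rough illustration of the sched_setaffinity(...) step, Python exposes the same Linux system call through os.sched_setaffinity. The helper name below is hypothetical, and a JVM application would need a native binding rather than this call; the sketch only shows the shape of the operation.

```python
import os

def pin_caller(cpus):
    """Restrict the caller to the given CPUs and return the resulting mask."""
    requested = set(cpus)
    allowed = os.sched_getaffinity(0)  # CPUs the caller may currently use
    if not requested <= allowed:
        raise ValueError(
            f"CPUs {sorted(requested - allowed)} are outside the allowed set"
        )
    os.sched_setaffinity(0, requested)  # pid 0 refers to the caller
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Pin to the lowest-numbered CPU that is currently available.
    print(pin_caller({min(os.sched_getaffinity(0))}))
```

Checking the request against the current mask first gives a readable error when the target CPU is not available to the process, for example because a cpuset excludes it.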
Deploy the application
[Diagram: CPU layout labelled sched_setaffinity(), !{isolcpus} and taskset]
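Pile-up
● You have no load-balancer: isolcpus removes the isolated CPUs from the scheduler's load balancing
● Threads given a range of isolated CPUs by taskset can pile up on a single CPU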
A solution: cpusets
● Create hierarchical sets of reserved
resource
● CPU, memory
● Userland tools: cset (SUSE)
Isolate OS processes
● cset set --set=/system --cpu=6-9
○ create a cpuset with cpus 6-9
○ create it at the path /system
● cset proc --move --from-set=/ --to-set=/system
○ move all processes from / to /system
○ -k => move unbound kernel threads
○ --threads => move child threads
○ --force => erm... force
Run the application
● cset set --cpu=0-5,10-13 --set=/app
● cset proc --exec /app taskset -c 10-13 java …
○ start a process in the /app cpuset
○ run the program on cpus 10-13
● sched_setaffinity() to pin the hot threads to cpus 1,3,5
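
Before applying a layout like this, it can help to check that the CPU ranges are consistent. The sketch below is an assumption-laden illustration: both function names are made up, and it only understands the simple "a-b,c" list syntax used on these slides.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0-5,10-13' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

def check_plan(system, app, hot):
    """Return a list of problems found in a /system, /app and hot-thread plan."""
    system_cpus = parse_cpu_list(system)
    app_cpus = parse_cpu_list(app)
    hot_cpus = parse_cpu_list(hot)
    problems = []
    overlap = system_cpus & app_cpus
    if overlap:
        problems.append(f"/system and /app overlap on {sorted(overlap)}")
    stray = hot_cpus - app_cpus
    if stray:
        problems.append(f"hot-thread CPUs outside /app: {sorted(stray)}")
    return problems

if __name__ == "__main__":
    # The values used on the slides: no problems expected.
    print(check_plan(system="6-9", app="0-5,10-13", hot="1,3,5"))
```

An empty list means the OS cpuset and the application cpuset are disjoint and every hot-thread CPU lies inside the application's set.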
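Isolated threads
[Diagram: /system cpuset alongside the /app cpuset; within /app, hot threads are placed with sched_setaffinity() and the rest of the application with taskset]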
No more jitter?
perf_events
● Sampling tracer
● Static/dynamic trace points
● Very low overhead
● A good starting point for digging deeper
● perf list to view available trace points
○ network, file-system, scheduler, etc
What’s happening on the CPU?
● perf record -e "sched:sched_switch" -C 3
○ Sample task switches on CPU 3
● perf report (best for multiple events)
● perf script (best for single events)
Rogue process
java 36049 [003] 3011858.780856: sched:sched_switch: java:36049 [110] R ==> kworker/3:1:13991 [120]
kworker/3:1 13991 [003] 3011858.780861: sched:sched_switch: kworker/3:1:13991 [120] S ==> java:36049 [110]
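
Reading sched_switch lines by eye gets tedious on longer captures. The following sketch pairs each switch away from a runnable thread with the switch back to it. It assumes each event sits on a single line in the layout shown on this slide; the regular expression and the function name are illustrative and may need adjusting for other perf versions.

```python
import re
from decimal import Decimal

# Matches the layout "... [cpu] timestamp: sched:sched_switch: prev:pid [prio] state ==> next:pid [prio]".
SWITCH = re.compile(
    r"\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+sched:sched_switch:\s+"
    r"(?P<prev>\S+):(?P<prev_pid>\d+) \[\d+\] (?P<state>\S+) ==> "
    r"(?P<next>\S+):(?P<next_pid>\d+) \[\d+\]"
)

def find_intrusions(lines, pid):
    """Return (task name, microseconds) for each time `pid` lost the CPU while runnable."""
    intrusions = []
    pending = None  # (intruding task, timestamp at which pid was switched out)
    for line in lines:
        m = SWITCH.search(line)
        if m is None:
            continue
        ts = Decimal(m["ts"])  # Decimal avoids float rounding on large timestamps
        if int(m["prev_pid"]) == pid and m["state"].startswith("R"):
            pending = (m["next"], ts)
        elif int(m["next_pid"]) == pid and pending is not None:
            name, start = pending
            intrusions.append((name, int((ts - start) * 1_000_000)))
            pending = None
    return intrusions

if __name__ == "__main__":
    sample = [
        "java 36049 [003] 3011858.780856: sched:sched_switch: "
        "java:36049 [110] R ==> kworker/3:1:13991 [120]",
        "kworker/3:1 13991 [003] 3011858.780861: sched:sched_switch: "
        "kworker/3:1:13991 [120] S ==> java:36049 [110]",
    ]
    print(find_intrusions(sample, pid=36049))
```

Filtering on state R keeps only the cases where the thread was pushed off the CPU while it still had work to do, which is the interesting case for jitter.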
ftrace
● Function tracer
● Static/dynamic trace points
● Higher overhead
● But captures everything
● Can provide function graphs
● trace-cmd is the usable front-end
So what is that kernel thread doing?
● trace-cmd record -P <pid> -p function_graph
○ Trace functions called by process <pid>
● trace-cmd report
○ Display captured trace data
Some things can’t be deferred
kworker/3:1-13991 [003] 3013287.180771: funcgraph_entry: | process_one_work() {
kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: | cache_reap() {
kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: 0.137 us | mutex_trylock();
kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: 0.289 us | drain_array();
kworker/3:1-13991 [003] 3013287.180773: funcgraph_entry: 0.040 us | _cond_resched();
………
………
kworker/3:1-13991 [003] 3013287.180859: funcgraph_exit: +86.735 us | }
Things to look out for
● cache_reap() - SLAB allocator
● vmstat_update() - kernel stats
● other workqueue events
○ perf record -e "workqueue:*" -C 3
● Interrupts - set affinity in /proc/irq
● Timer ticks - tickless mode
● CPU governor - set to performance
○ /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
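
Interrupt affinity under /proc/irq is expressed as a hexadecimal CPU bitmask in the smp_affinity file. A small helper can build that mask from a list of CPUs; the function name is hypothetical, and the sketch assumes a machine with fewer than 32 CPUs so that a single hex group is enough.

```python
def cpus_to_irq_mask(cpus):
    """Build a hex bitmask in the style used by /proc/irq/<n>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N set means CPU N may service the interrupt
    return format(mask, "x")

if __name__ == "__main__":
    # Keep interrupts on the OS CPUs 6-9 from the earlier cpuset example.
    print(cpus_to_irq_mask(range(6, 10)))
```

Writing the resulting string to an IRQ's smp_affinity file requires root, and some interrupts cannot be moved, so it is worth reading the file back afterwards.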
Some numbers
● Inter-thread latency is a good proxy
● 2 busy-spinning threads passing a message
● Time taken between producer & consumer
● Record times over several seconds
● Compare tuned/untuned
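
Once the times are recorded, they need reducing to a comparable summary. The sketch below produces the same rows as the results that follow, using a nearest-rank percentile; the function name and the choice of percentile method are assumptions, not necessarily what the benchmark itself used.

```python
import math

def summarise(samples_ns, percentiles=(50.0, 90.0, 99.0, 99.9, 99.99)):
    """Reduce recorded latencies to mean, min, selected percentiles and max."""
    ordered = sorted(samples_ns)
    if not ordered:
        raise ValueError("no samples recorded")
    report = {"mean": sum(ordered) / len(ordered), "min": ordered[0]}
    for p in percentiles:
        # Nearest rank: the smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        report[f"{p:.2f}%"] = ordered[rank - 1]
    report["max"] = ordered[-1]
    return report

if __name__ == "__main__":
    for name, value in summarise(range(1, 101)).items():
        print(f"{name:>8} {value}")
```

Reporting the high percentiles and the maximum alongside the mean matters here because jitter shows up in the tail long before it moves the average.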
Results
== Latency (ns) ==
          untuned    tuned
mean          466      216
min           200      128
50.00%        464      208
90.00%        608      288
99.00%        768      336
99.90%        992      544
99.99%       2432     1664
max         69632    69632
[Chart: tuned vs untuned]
[Chart: tuned vs untuned (log scale)]
Results (loaded system)
== Latency (ns) ==
          untuned    tuned
mean          545      332
min           144      216
50.00%        464      336
90.00%        544      352
99.00%        736      448
99.90%       2944      544
99.99%     294913      704
max        884739    36864
[Chart: tuned vs untuned (loaded system)]
Summary
● Select threads that need access to CPU
● Isolate CPUs from the OS
● Pin important threads to isolated CPUs
● Don’t forget interrupts
● There will be more things…
● Always test assumptions!
● Run validation tests to ensure tunings are as
expected
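
A validation test can be as simple as reading back what the kernel reports. The sketch below checks two of the tunings from this talk; the function names are hypothetical, and it assumes the plain isolcpus=<list> form and the standard cpufreq sysfs layout.

```python
import glob

def isolcpus_from_cmdline(cmdline):
    """Return the isolcpus= value from a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return token.split("=", 1)[1]
    return None

def wrong_governors(expected="performance"):
    """Map each scaling_governor file whose value differs from `expected` to that value."""
    wrong = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    for path in glob.glob(pattern):
        with open(path) as f:
            value = f.read().strip()
        if value != expected:
            wrong[path] = value
    return wrong

if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("isolcpus:", isolcpus_from_cmdline(f.read()))
    print("governors not set to performance:", wrong_governors())
```

Running a check like this at deployment time catches the case where a kernel update or a reboot has silently dropped a setting.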
Thank you
● lmax.com/blog/staff-blogs/
● epickrram.blogspot.com
● github.com/epickrram/perf-workshop
● @epickrram

When the OS gets in the way

  • 1. When the OS gets in the way (and what you can do about it) Mark Price @epickrram LMAX Exchange
  • 2. When the OS gets in the way (and what you can do about it) Linux Mark Price @epickrram LMAX Exchange
  • 3. ● Linux is an excellent general-purpose OS ● Many target platforms ● Scheduling is actually fairly complicated ● Low-latency is a special use-case ● We need to provide some hints It’s not the OS’s fault
  • 4. Why should I care?
  • 5. Useful in some scenarios ● Low latency applications ● Response times < 1ms ● Compute-intensive workloads ● Long-running jobs
  • 6. A real-world scenario: LMAX System Latency = T1 - T0 Before tuning: 250us / 10+ms After tuning: 80us / <1ms (mean / max)
  • 7. Jitter ● “slight irregular movement, variation, or unsteadiness, especially in an electrical signal or electronic device” ● Variation in response time latency ● Long-tail in response time
  • 8. Dealing with it ● First take care of the low-hanging fruit ○ e.g. Garbage collection (gc-free / Zing) ○ e.g. Slow I/O ● Once response times are < 10ms the fun begins ● Make sure your code is running!
  • 9. Measure first ● Need to validate changes are good ● End-to-end tests ● Using realistic load ● Change one thing and observe ● A refresher...
  • 11. Multi-tasking ● num(tasks) > num(HyperThreads) ● OS must share out hardware resources ● Clever? Dumb? Fast? Slow? ● Fair...
  • 12. Linux CFS ● Completely Fair Scheduler ● Maintains a task ‘queue’ per HT ● Runs the task with the lowest runtime ● Updates task runtime after execution ● Higher priority implies longer execution time ● Tasks are load-balanced across HTs
  • 13. An example application ... Threads
  • 14. … running on a language runtime
  • 15. … running on an operating system
  • 16. Optimise for locality - PCI/memory
  • 18. How do I start?
  • 19. Start with the metal
        ● BIOS settings for maximum performance
        ● That’s a whole other talk...
  • 20. Discover what’s available
        ● lstopo is a useful tool for looking at hardware
        ● Provided by the hwloc package
        ● Displays:
          ○ HyperThreads
          ○ Physical cores
          ○ NUMA nodes
          ○ PCI locality
  • 23. Reserve & use specific resource
        ● Use isolcpus to reserve cpu resource
          ○ kernel boot parameter
          ○ isolcpus=0-5,10-13
        ● Use taskset to pin your application to cpus:
          ○ taskset -c 10-13 java …
        ● Set affinity of hot threads:
          ○ sched_setaffinity(...)
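
As a rough illustration of the last two points, the sketch below expands a CPU list written in the same syntax as the `isolcpus` value and pins the calling thread with Python's `os.sched_setaffinity`, which wraps the `sched_setaffinity` call named on the slide. The helper names `parse_cpu_list` and `pin_current_thread` are hypothetical, and the talk's own application is a Java process, so treat this as a sketch of the mechanism rather than the method used at LMAX.

```python
import os


def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as "0-5,10-13" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_thread(cpu):
    """Restrict the caller to a single CPU (Linux only).

    Passing 0 as the pid targets the caller; the underlying system call
    applies the mask to the calling thread.
    """
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    isolated = parse_cpu_list("0-5,10-13")
    print(sorted(isolated))
    if hasattr(os, "sched_setaffinity"):
        # Pick a CPU this process is already allowed to use, so the sketch
        # also runs on machines without the isolcpus layout from the slide.
        target = min(os.sched_getaffinity(0))
        print(pin_current_thread(target))
```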
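  • 25. Pile-up
        ● You have no load-balancer: the scheduler does not balance tasks across isolated CPUs, so threads pinned to a range of them can all end up on the same CPU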
  • 26. A solution: cpusets
        ● Create hierarchical sets of reserved resource
        ● CPU, memory
        ● Userland tools: cset (SUSE)
  • 27. Isolate OS processes
        ● cset set --set=/system --cpu=6-9
          ○ create a cpuset with cpus 6-9
          ○ create it at the path /system
        ● cset proc --move --fromset=/ --toset=/system
          ○ move all processes from / to /system
          ○ -k => move unbound kernel threads
          ○ --threads => move child threads
          ○ --force => erm... force
  • 28. Run the application
        ● cset set --cpu=0-5,10-13 --set=/app
        ● cset proc --exec /app taskset -c 10-13 java …
          ○ start a process in the /app cpuset
          ○ run the program on cpus 10-13
        ● sched_setaffinity() to pin the hot threads to cpus 1,3,5
  • 31. perf_events
        ● Sampling tracer
        ● Static/dynamic trace points
        ● Very low overhead
        ● A good starting point for digging deeper
        ● perf list to view available trace points
        ● network, file-system, scheduler, etc
  • 32. What’s happening on the CPU?
        ● perf record -e "sched:sched_switch" -C 3
          ○ Sample task switches on CPU 3
        ● perf report (best for multiple events)
        ● perf script (best for single events)
  • 33. Rogue process
        java 36049 [003] 3011858.780856: sched:sched_switch: java:36049 [110] R ==> kworker/3:1:13991 [120]
        kworker/3:1 13991 [003] 3011858.780861: sched:sched_switch: kworker/3:1:13991 [120] S ==> java:36049 [110]
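
Spotting the rogue process by eye does not scale beyond a few lines, so the check can be scripted. The following is a minimal sketch that assumes `perf script` prints `sched_switch` events in exactly the layout shown on the slide (other perf versions print `prev_comm=...` style fields instead, and task names containing spaces are not handled). The names `SWITCH`, `intruders` and `SAMPLE` are hypothetical. It counts which tasks were switched in while the hot thread was still runnable (state `R`), that is, while it was preempted rather than sleeping.

```python
import re
from collections import Counter

# Matches the sched_switch payload in the layout shown on the slide, e.g.
#   java:36049 [110] R ==> kworker/3:1:13991 [120]
SWITCH = re.compile(
    r"sched:sched_switch:\s+(?P<prev>\S+):\s*(?P<prev_pid>\d+) \[\d+\] (?P<state>\S+)"
    r" ==> (?P<next>\S+):(?P<next_pid>\d+) \[\d+\]"
)

# The two events from the slide.
SAMPLE = """\
java 36049 [003] 3011858.780856: sched:sched_switch: java:36049 [110] R ==> kworker/3:1:13991 [120]
kworker/3:1 13991 [003] 3011858.780861: sched:sched_switch: kworker/3:1:13991 [120] S ==> java:36049 [110]
"""


def intruders(lines, hot_comm):
    """Count tasks that were switched in while `hot_comm` was still runnable."""
    counts = Counter()
    for line in lines:
        m = SWITCH.search(line)
        # State R means the previous task was preempted, not sleeping.
        if m and m["prev"] == hot_comm and m["state"] == "R":
            counts[m["next"]] += 1
    return counts


if __name__ == "__main__":
    for task, count in intruders(SAMPLE.splitlines(), "java").most_common():
        print(f"{task} preempted java {count} time(s)")
```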
  • 34. ftrace
        ● Function tracer
        ● Static/dynamic trace points
        ● Higher overhead
        ● But captures everything
        ● Can provide function graphs
        ● trace-cmd is the usable front-end
  • 35. So what is that kernel thread doing?
        ● trace-cmd record -P <pid> -p function_graph
          ○ Trace functions called by process <pid>
        ● trace-cmd report
          ○ Display captured trace data
  • 36. Some things can’t be deferred
        kworker/3:1-13991 [003] 3013287.180771: funcgraph_entry:              |  process_one_work() {
        kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry:              |    cache_reap() {
        kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry:    0.137 us  |      mutex_trylock();
        kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry:    0.289 us  |      drain_array();
        kworker/3:1-13991 [003] 3013287.180773: funcgraph_entry:    0.040 us  |      _cond_resched();
        ………
        ………
        kworker/3:1-13991 [003] 3013287.180859: funcgraph_exit:  +86.735 us  |    }
        Total: +86.735 us
  • 37. Things to look out for
        ● cache_reap() - SLAB allocator
        ● vmstat_update() - kernel stats
        ● other workqueue events
          ○ perf record -e "workqueue:*" -C 3
        ● Interrupts - set affinity in /proc/irq
        ● Timer ticks - tickless mode
        ● CPU governor - set to performance
          ○ /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
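
The CPU governor item is easy to verify from a script. This is a minimal sketch that reads the `scaling_governor` file named on the slide for every CPU and reports any that are not set to `performance`. The function names and the `sysfs_cpu_root` parameter are hypothetical; the parameter exists only so the check can be pointed at a test directory. On machines without cpufreq support the files are absent and the result is simply empty.

```python
from pathlib import Path


def read_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU exposing a cpufreq governor."""
    governors = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not directories such as cpufreq.
    for path in Path(sysfs_cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def not_performance(governors):
    """List the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    offenders = not_performance(read_governors())
    print("CPUs not set to performance:", offenders or "none")
```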
  • 38. Some numbers
        ● Inter-thread latency is a good proxy
        ● 2 busy-spinning threads passing a message
        ● Time taken between producer & consumer
        ● Record times over several seconds
        ● Compare tuned/untuned
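
Once the times have been recorded, they need reducing to a handful of numbers before tuned and untuned runs can be compared. The sketch below turns a list of latency samples into a mean, minimum, maximum and a set of percentiles. It is an assumption-laden illustration: the nearest-rank percentile method, the function names and the sample data are all hypothetical, and the talk does not say which tool produced its figures.

```python
# Percentiles in hundredths of a percent, so rank arithmetic stays in integers
# and avoids floating-point rounding at values such as 99.9%.
PERCENTILES = (5000, 9000, 9900, 9990, 9999)


def percentile(sorted_samples, hundredths):
    """Nearest-rank percentile of an ascending list of samples."""
    # Ceiling division: smallest rank covering the requested share of samples.
    rank = max(1, -(-hundredths * len(sorted_samples) // 10000))
    return sorted_samples[rank - 1]


def summarise(latencies_ns):
    """Reduce raw latency samples (ns) to the columns of a latency report."""
    data = sorted(latencies_ns)
    report = {"mean": sum(data) / len(data), "min": data[0]}
    for hundredths in PERCENTILES:
        report[f"{hundredths / 100:.2f}%"] = percentile(data, hundredths)
    report["max"] = data[-1]
    return report


if __name__ == "__main__":
    # Hypothetical samples: mostly fast, with a few slow outliers (jitter).
    samples = [200] * 990 + [1_000] * 9 + [70_000]
    for column, value in summarise(samples).items():
        print(f"{column:>7} {value}")
```

The high percentiles and the maximum are the columns to watch: a handful of outliers barely moves the median but dominates the tail.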
  • 39. Results
        == Latency (ns) ==
                    mean     min  50.00%  90.00%  99.00%  99.90%  99.99%     max
        untuned      466     200     464     608     768     992    2432   69632
        tuned        216     128     208     288     336     544    1664   69632
  • 41. tuned vs untuned (log scale)
  • 42. Results (loaded system)
        == Latency (ns) ==
                    mean     min  50.00%  90.00%  99.00%  99.90%  99.99%     max
        untuned      545     144     464     544     736    2944  294913  884739
        tuned        332     216     336     352     448     544     704   36864
  • 43. tuned vs untuned (loaded system)
  • 44. Summary
        ● Select threads that need access to CPU
        ● Isolate CPUs from the OS
        ● Pin important threads to isolated CPUs
        ● Don’t forget interrupts
        ● There will be more things…
        ● Always test assumptions!
        ● Run validation tests to ensure tunings are as expected
  • 45. Thank you
        ● lmax.com/blog/staff-blogs/
        ● epickrram.blogspot.com
        ● github.com/epickrram/perf-workshop
        ● @epickrram