As we move from monolithic applications to microservices, the ability to colocate workloads offers a tremendous opportunity to realize greater development velocity, robustness, and resource utilization. But workload colocation can also introduce performance variability and affect service levels. Google describes the problem as the “tail at scale”—the amplification of negative results observed at the tail of the latency curve when many systems are involved.
With its latest tooling, Intel provides an experiment framework for quantifying the trade-off between low latency and higher density. Niklas Nielsen discusses the challenges and complexities of workload colocation, why solving these challenges matters to your business regardless of its size, and how Intel intends to enable smarter resource allocation with its latest tooling capabilities and Kubernetes.
8. [Figure: side-by-side timeline of the two Google searches, annotated per event: first and second characters typed, first ‘O’, done typing, OSCON appearing in the autocomplete list, “OSCON 2016 is” suggested, pressing Enter, “OSCON 2017 is” found, the rest of the search, and the OSCON context and logo loading. The normal search finishes in about 2 seconds; the delayed one takes more than 5 seconds.]
72. Intel® helps by using priority to reduce the sources of variability
through Intel® RDT
73. Swan is a tool to understand the effects of interference and how to
avoid it
74. Swan is under Apache 2.0 License and available for download today
https://github.com/intelsdi-x/swan
Read more about how to use Intel® RDT
https://github.com/01org/intel-cmt-cat/
75. Thanks to all involved in this project
Maciej Iwanowski, Pawel Palucki, Szymon Konefal, Maciej Patelczyk, Michal Stachowski,
Arek Chylinski and the rest of the Swan team
Andrew Herdich and the Intel RDT teams
Tony Luck, Fenghua Yu and Intel Linux Kernel teams
How is everyone feeling?
Seen some good talks by now?
Just getting started?
A not-so-gentle introduction to Kubernetes performance
The most important thing for me is that you understand, and that I don’t lose you midway
So as we all have different levels of experience, feel free to shout out if something doesn’t make sense
First off, since I am an Intel employee and this is a sponsor talk slot, I have to remind you of our legal notice.
Mentions of our brand and legal protection, in general and for the contents of this talk
That aside, I want to conduct a small experiment
I’m going to show you two Google searches and see if you can tell the difference
Be aware, each one lasts only a few seconds, so I need you to pay close attention
An artificial 100ms delay per connection raised the response time from 2 to 5 seconds.
I’ve tried to break down the response time here.
Spend a few seconds on each graph to slowly explain what the axes mean before diving into interpretation.
It might seem surprising, but 2.4 seconds is the sweet spot for users
Another way to interpret this is that in online retail, customers start to turn away after this amount of time
User patience is steadily decreasing
Users expect instantaneous responses to even the most complicated queries
Consider graphic here
Maybe get some numbers
To give you an example of this interconnectedness, Netflix built a tool called Vizceral which samples network requests
Give options
Need to tie back to initial experiment
Every request is like flipping a coin
Too information dense
Include highlight
Don’t explain the equation; it’s hard to talk through.
Insert reference
Spend a few seconds on each graph to slowly explain what the axes mean before diving into interpretation.
Highlights?
At Google scale, this matters.
This is why it is called the tail at scale
Not only a problem for the largest companies in the world.
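The coin-flip intuition can be made concrete with a short sketch; the 1% slow-backend probability and fan-out of 100 below are illustrative numbers, not measurements from the talk:

```python
# Tail-at-scale amplification: if one user request fans out to n backends,
# and each backend is slow (above its own 99th percentile) with
# probability p, the chance the overall request is slow is 1 - (1 - p)^n.

def p_slow_overall(p_slow_backend: float, fanout: int) -> float:
    """Probability that at least one of `fanout` backends is slow."""
    return 1.0 - (1.0 - p_slow_backend) ** fanout

# With p = 0.01 (1% of backend calls are slow) and a fan-out of 100,
# roughly 63% of user requests hit at least one slow backend.
print(f"{p_slow_overall(0.01, 100):.2f}")
```

Even a rare per-backend hiccup becomes the common case once enough systems are involved, which is the amplification the abstract describes.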
Similar to how these fellas are probably dragging their owner in each direction, each user and system is competing for access to resources in modern data centers.
Global
Network oversubscription
Queueing in leaf and spine switches
Local
Issue slots, L1 and L2, power budgets per core during SMT
L3, memory bandwidth, and power budget per socket
I/O bandwidth
Network links
Kernel caches
Talk about what makes an application perform as desired, and what happens when it isn’t performing as we expect
Spend a few seconds on each graph to slowly explain what the axes mean before diving into interpretation.
Sensitivity profiles have been used in academia to show how sensitive a workload is to co-location.
Used to demonstrate performance isolation in research from Stanford and Google[2]
Greener profiles indicate more resilience to interference
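As a rough sketch of how such a profile is derived (all latency numbers and scenario names below are hypothetical, not results from the referenced research), each cell normalizes the measured tail latency against the latency target, so values at or below 1.0 meet the SLO and map to the greener cells:

```python
# Sensitivity profile sketch: normalize p99 latency against the SLO
# target for each (load level, co-located aggressor) combination.
SLO_LATENCY_US = 500.0  # hypothetical latency target, microseconds

# Hypothetical p99 latencies of a high-priority workload at increasing
# load, alone and next to two kinds of interfering workloads.
measured = {
    "baseline":        [120, 180, 260, 400],
    "l3-aggressor":    [150, 300, 700, 1500],
    "membw-aggressor": [140, 260, 520, 900],
}

profile = {
    scenario: [round(lat / SLO_LATENCY_US, 2) for lat in lats]
    for scenario, lats in measured.items()
}
print(profile["l3-aggressor"])  # [0.3, 0.6, 1.4, 3.0]
```

In this toy profile the baseline stays green at every load, while the cache-hungry aggressor pushes the workload past its SLO at the two highest loads.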
Networks in data centers have become so fast that memory access over the network can outperform disk access
Sites have ‘cache clusters’, built either from spare capacity or, more likely, dedicated to speeding up requests
Normal pattern used by the largest sites
Twitter, Facebook, Wikipedia
We chose memcached as a high priority workload as it is notoriously hard to place anything next to.
Kubernetes co-location
Now, why is that?
Compute the fractions
What the process scheduler does is find the process that is furthest from its fair share and schedule it next.
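The fair-share idea can be sketched as follows; this is a heavily simplified model in the spirit of Linux’s CFS, not the actual kernel algorithm, and the task names are made-up examples:

```python
# Fair-share scheduling sketch: each task tracks how much CPU time it
# has received, and the scheduler always picks the task furthest below
# its fair share, i.e. the one with the least accumulated runtime.
TIMESLICE_MS = 10

def pick_next(tasks):
    """Pick the runnable task with the least accumulated runtime."""
    return min(tasks, key=lambda t: t["runtime_ms"])

tasks = [
    {"name": "memcached", "runtime_ms": 0},
    {"name": "batch-job", "runtime_ms": 0},
]

schedule = []
for _ in range(4):
    task = pick_next(tasks)
    task["runtime_ms"] += TIMESLICE_MS
    schedule.append(task["name"])

print(schedule)  # the two tasks alternate, converging on equal shares
```

Note that fairness in CPU time says nothing about shared caches or memory bandwidth, which is exactly where the interference discussed next comes from.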
What we call interference
Explain caches in a modern server CPU
These experiments were run on a single-socket Xeon D-1541 platform
Linux is the operating system
Highlights
Core isolation alone is not enough
CAT reduces the interference and keeps the SLA up to 80% load
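One way CAT partitions are expressed on Linux is through the resctrl filesystem, whose schemata lines map a cache ID to a capacity bitmask of cache ways. The helper below only formats such a line (the bitmask values are made-up examples; actually applying it requires resctrl mounted on supported hardware):

```python
def cat_schemata(cache_ids_to_masks):
    """Format a resctrl 'schemata' line for the L3 cache, e.g. 'L3:0=f'.

    Each value is a capacity bitmask selecting contiguous cache ways
    on the L3 cache with the given ID.
    """
    parts = ";".join(
        f"{cache_id}={mask:x}"
        for cache_id, mask in sorted(cache_ids_to_masks.items())
    )
    return f"L3:{parts}"

# Example: reserve 8 of 12 ways on cache 0 for a high-priority group.
print(cat_schemata({0: 0xFF0}))  # L3:0=ff0
```

The resulting string would be written to a group’s schemata file under /sys/fs/resctrl, which is how the partitioning behind the result above is configured.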
Explain axis
Some applications are extremely sensitive to these kinds of workloads
Online web search is one
Why does CDP matter?
Maybe more realistic example
Show how contention looks like
TODO Split into 4 slides
How do you know how much to give to each partition?