3. 3
Andrew Morton
I'm curious. For the past few months, people@openvz.org have
discovered (and fixed) an ongoing stream of obscure but serious and
quite long-standing bugs.
How are you discovering these bugs?
Andrew added later:
hm, OK, I was visualizing some mysterious Russian bugfinding
machine or something.
Don't stop ;)
David Miller
This issue has existed since the very creation of the netlink code :-)
4. 4
Many isolated environments on top of a single kernel
Linux Containers (LXC)
● Namespaces
● Resource accounting
OpenVZ Containers
● Better resource accounting
● Checkpointing and live migration
● Extra features: CPU limits, NFS inside CTs, etc.
5. 5
What makes a good test lab?
● Fully automated system with deployment service
● A web interface for test scheduling
● Standard test sets (“combo #3, make it large”)
● A web interface for test results (comparisons, graphs, logs)
● Integration with a bug tracking system
● Net or serial console to collect kernel oopses
● KVM, power switch, other goodies
6. 6
How do we find bugs in the mainstream kernel
Containers help us find more bugs
● Independent life cycles
● Precise resource accounting
Containers allow us to
● Test initialization/finalization of kernel subsystems
● Test error paths
● Catch more leaks than regular testing does
● Catch more race conditions by means of stress testing
7. 7
Start/stop test
● Massive parallel start/stop and suspend/resume
● Random resource parameters
Helps to catch:
● Race conditions
● Bugs in error paths
● Memory leaks
8. 8
What makes a good performance test?
● Effective load:
  ● Atomic (UnixBench)
  ● Complex (LAMP, SPEC-JBB, vConsolidate)
● Sane test environment (no random cron jobs etc.)
● Automation (minimize human interaction)
● Reproducible results, minimize variability
● Understand test results, even good ones
9. 12
Density testing
● High density is an important feature of OpenVZ (vs VMs)
● The test measures response time on a number of CTs, increasing the number of CTs until the response time is bad
● It's not a stress test
● Produces a big resource overcommit
10. 13
Other useful tests
● Week load test replays real httpd logs in real containers
● Feature tests: isolation, CPU scheduler, checkpointing, network virtualization, second level quota, etc.
● Third-party tests: LTP, Connectathon, vSpecJBB, vConsolidate, UnixBench, sysbench, DVD-store, Netperf
12. 15
(1) How a Russian bug finding machine works
● QA found a leak of 78 bytes of kernel memory
● The developer was unable to reproduce the bug
● He found that this was a leak of a 'struct user' object
● He audited the kernel code which references this object
● Found one suspicious place
● Wrote demo code to trigger the bug, and a fix
● ...
● PROFIT!
13. 16
(2) How resource controls prevented a DoS attack
uid / resource   held      maxheld    barrier    limit      failcnt
numothersock     9         360        360        360        1
uid / resource   held      maxheld    barrier    limit      failcnt
kmemsize         1237973   14372344   14372700   14790164   80
numothersock     9         360        360        360        1
A simple kernel attack using socketpair(), a.k.a. CVE-2010-4249
14. 18
(3) How a guy measured netns performance
● It was a nice sunny day...
● 5 different configurations to test
● Unpredictable, random results
● CPU throttling caused by overheating; adding a case fan helped!
15. 20
Conclusion
● Containers are good for kernel testing
● Resource limits (cgroups) are also helpful
● [most] performance tests are a hoax
My name is Andrey Vagin. I have been working on OpenVZ for the last 5 years. I started as a QA engineer, developing and running Linux kernel tests, and then moved to the Linux kernel team as a developer. This talk summarizes my experience and that of my colleagues at Parallels.
I want to tell you how we test the OpenVZ Linux kernel. I will start by explaining what OpenVZ really is. Next, I will share some thoughts about an ideal test lab. Then we'll see which testing techniques are good for kernel testing, and in particular why OpenVZ helps us find more bugs. I'd also like to say a few words about performance testing. Finally, I will present a few anecdotal cases of bugs we found.
We regularly find and fix bugs in different subsystems of the Linux kernel. Often these bugs are obscure, long-standing and hard to catch. Sometimes maintainers wonder how we find those bugs. Right now I want to reveal all of our deep secrets.
But before I start, I want to say a few words about Linux Containers and OpenVZ Containers. A container is an isolated environment. Each container has its own user, network, filesystem and other namespaces that virtualize various kernel subsystems. Plus, there are cgroups for additional resource accounting. All containers run on top of one single kernel; this is what makes them different from virtual machines. Containers do have some restrictions (for example, a Linux machine can only run Linux containers), but the technology is more efficient, because it avoids things such as emulating hardware devices or running multiple kernels. Compared to LXC, OpenVZ Containers have better resource accounting and some extra features such as CPU limits, checkpointing and live migration, NFS and FUSE inside containers, and so on.
Based on our experience, these are the requirements for a good test lab. First, the test system should be fully automated. It should include a deployment service, a results portal, many different server configurations and additional hardware such as KVM, power switches and so on. All these components should be tightly integrated and work smoothly together. They may be controlled via a web interface. The test system should offer an easy way to run tests and to find or compare results.
A lot of people test the Linux kernel, but for us containers play a special role in the process. A container initializes many kernel subsystems on start and destroys them on stop. On a regular system such operations are only done on boot and shutdown. It is hard to perform them many times, and after deinitialization the system normally shuts down anyway. Containers give us a way to run many init/deinit sequences concurrently. This helps to find bugs such as a resource that is never freed. Plus, we have per-container resource accounting, which helps in detecting memory leaks. It also lets us test various rarely executed error paths by setting different resource limits.
Now I want to tell you about one of our most significant tests, the start/stop test. It starts/stops and suspends/resumes many containers simultaneously and sets random resource limits, just for some more fun. Can you imagine that this test finds many bugs? You may not be sure, but it does, and it finds bugs not only in the OpenVZ kernel but in the mainstream kernel too. It is also a stress test, since it generates a heavy load. In addition, it runs the initialization and finalization of kernel subsystems many times. The randomized resource limits also force the kernel to execute error paths. On each iteration the test does some sanity checks. For example, it checks that all resource usage counters are zero after a container is stopped. It catches leaks, race conditions, errors on subsystem finalization and even leaks on error paths caused by race conditions.
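The per-iteration sanity check described above could look roughly like the following Python sketch. It is not the actual test code: the function names and the sample snapshot are hypothetical, and the column layout (resource, held, maxheld, barrier, limit, failcnt) is assumed to match the beancounter table shown in this talk.

```python
# Sketch: verify that a stopped container no longer holds any resources.
# Assumption: each data row is "<resource> held maxheld barrier limit failcnt".

COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return {resource: {column: int}} parsed from a beancounter-style table."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row is a resource name followed by exactly five integers;
        # header lines and anything else are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        counters[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return counters


def leaked_resources(text):
    """Return the sorted names of resources whose 'held' counter is non-zero."""
    counters = parse_beancounters(text)
    return sorted(name for name, c in counters.items() if c["held"] != 0)


# Hypothetical snapshot taken after a container was stopped.
SAMPLE_AFTER_STOP = """\
resource held maxheld barrier limit failcnt
kmemsize 78 14372344 14372700 14790164 0
numothersock 0 360 360 360 1
"""

print(leaked_resources(SAMPLE_AFTER_STOP))
```

A non-empty result means the iteration failed: something charged to the container was never released, even if the amount is tiny.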
Performance testing is the most difficult part of testing. The results of these tests are published, and users look at the numbers when choosing a product. So test results should be comprehensible and reproducible. The main problem in creating a performance test is coming up with a useful workload. All performance tests may be divided into atomic tests and complex tests. Atomic tests exercise simple basic operations such as context switching, creating a file or forking a process. Users want to see the full picture, so they are more interested in complex tests. A complex test simulates some real workload. What makes a good performance test? Ideally the test should be fully automated, to avoid human factors and ensure consistency. A person may forget to do something or may do it differently next time. If you can't automate the test, you should at least describe the process in great detail. You should avoid side effects such as cron jobs, other daemons doing some work from time to time, database index rebuilds, CPU frequency scaling and other such things. You can't be too careful here. We have a special script which validates the test environment. The script is updated every time we find a new source of interference. The test should run several iterations and calculate statistical errors, to make sure the results are reproducible. Often the system needs some time to stabilize; for this purpose you can run a few warm-up iterations and ignore their results. When performing a comparison test, all products should be configured in the same or a similar way. For example, when comparing network performance of virtualized systems, we should try to use the same networking setup (say, bridged networking). Finally, all the test results, both good and bad, should be analyzed and explained. Analysis is usually done only for bad results, and good ones are taken for granted. The thing is, in some cases good results mean there's something wrong with the test itself.
If you can't explain your test results, they are totally useless, except maybe for marketing purposes.
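The advice about warm-up iterations and statistical errors can be sketched in a few lines of Python. This is an illustration only: `summarize`, its parameters and the canned numbers are hypothetical and stand in for a real benchmark run.

```python
import statistics


def summarize(run_once, iterations=10, warmup=3):
    """Run a benchmark callable and return (mean, standard error of the mean).

    The first `warmup` runs are executed but their results are discarded.
    """
    if iterations < 2:
        raise ValueError("need at least two measured iterations")
    for _ in range(warmup):
        run_once()  # warm-up: result deliberately ignored
    samples = [run_once() for _ in range(iterations)]
    mean = statistics.mean(samples)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    std_error = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, std_error


# Hypothetical workload: replays canned numbers instead of a real benchmark.
# The first two values play the role of an unstable warm-up phase.
fake_results = iter([50.0, 80.0, 100.0, 102.0, 98.0, 100.0])
mean, err = summarize(lambda: next(fake_results), iterations=4, warmup=2)
print(f"{mean:.1f} +/- {err:.2f} requests/s")
```

If the error is large compared to the difference between two products, the comparison says nothing yet, and the environment needs another look before the numbers are published.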
Now let me show some results of our performance measurements. We compared Xen, ESXi, KVM and OpenVZ. I chose a LAMP test, because most of our customers are hosting providers. From the following results you can see how well this type of workload runs in a virtualized environment and how many web servers you can run on a single piece of hardware.
On this slide you can see how the number of virtual machines affects performance, measured in the number of requests served per second. With 20 VMs all the products have very similar performance. With 40 VMs the performance difference becomes more obvious. With 60 VMs all products except OpenVZ perform worse than with 40 VMs. This is because the system is too small to handle that many VMs. OpenVZ containers are more lightweight, so you can run a greater number of containers than VMs. In other words, OpenVZ density is higher.
Indeed, high container density is an important OpenVZ feature, so we regularly compare it to other products and try to improve it. For that, we have a special density test. This test simulates a typical web hosting workload. Each container has a web server, a mail server (with SpamAssassin and an anti-virus) and Parallels Plesk Panel. The test simulates the workload by sending requests to each service at a defined frequency. On each iteration we add some more containers and measure the service response time, making sure it stays below a certain limit. The test stops when the response time exceeds that limit. The test result is the number of containers for which the response time is still good. As with every other test, if we see a regression, we try to understand why it happened, and from time to time we find interesting things. For example, last time we found out that the directory entry cache shrinker was doing its work too aggressively, slowing down the whole system.
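The density test loop can be sketched as follows in Python. The names `find_density` and `simulated_response_time`, the step size and the response-time model are all hypothetical; a real run would create containers and measure actual service latency instead.

```python
def find_density(measure_response_time, limit_seconds, step=10, max_containers=1000):
    """Return the largest container count whose response time is within the limit.

    Containers are added `step` at a time; the search stops at the first
    count whose measured response time exceeds `limit_seconds`.
    """
    best = 0
    count = step
    while count <= max_containers:
        if measure_response_time(count) > limit_seconds:
            break
        best = count
        count += step
    return best


def simulated_response_time(containers):
    """Hypothetical stand-in for a real measurement.

    Response time grows slowly at first and sharply once the host is
    overcommitted (here, arbitrarily, beyond 120 containers).
    """
    base = 0.05 + 0.001 * containers
    overload = max(0, containers - 120) * 0.05
    return base + overload


print(find_density(simulated_response_time, limit_seconds=0.5))
```

The single number returned is what gets tracked between kernel builds; a drop in it is the signal to start looking for a regression.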
One more good test is the week load test. It is one of the few tests that create a non-synthetic workload: it replays real users' Apache logs. We have many tests of our own for OpenVZ-specific features and use third-party test suites for other functionality.
Now I want to tell a real-life story of how one of my colleagues fixed a bug in the Linux kernel, prompting Andrew Morton's comment about a Russian bugfinding machine. In the course of OpenVZ kernel testing, our QA (Quality Assurance) team found a leak of 78 bytes of kernel memory. Who cares about 78 bytes, especially on a server with 16 gigabytes of RAM? We do. We checked the beancounters debug information, which showed that one struct user object had leaked. The developer then tried to reproduce the leak, but with no luck. Bugs that cannot be reproduced are hard. The only option left was to audit the kernel source code. That involved finding all the places where a struct user object is referenced and checking the code for correctness. The audit took him 4 hours, and he found one place where the reference to an object might be lost. The bug was present not only in the OpenVZ kernel, but in the mainstream kernel too. In this case, once the problem was found, fixing it was pretty simple. So he wrote a fix and demo code to trigger the bug, tested the fix and sent it to the Linux kernel mailing list. Why is this particular incident so important? It was the OpenVZ resource limiting code that helped to detect the leak in the first place, as the bug is very hard to trigger and the leak is small enough that it might never have been discovered at all. This bug is in fact a security issue. An ordinary user could exploit the bug and eat all the kernel memory, thus bringing the whole system down. Worse scenarios could be possible as well. Incidentally, OpenVZ is protected from this security issue, because the kmemsize beancounter (which helped to find it) limits kernel memory usage per container.
About a year ago a DoS exploit which leads to system unresponsiveness was published. It looks like most kernels are indeed vulnerable. The good news is that OpenVZ is not vulnerable. Why? Because of user beancounters. The exploit creates an unlimited number of sockets, rendering the whole system unusable, so that you need to power-cycle it to bring it back to life. Now, if you run this exploit in an OpenVZ container, you will hit the numothersock beancounter limit pretty soon and the script will exit. I went further, set the numothersock limit to 'unlimited' and re-ran the exploit. The situation is much worse in that case: the system slows down considerably, but I was still able to log in to the physical server using ssh and kill the offending task from the host system using SIGTERM. Now another beancounter, kmemsize, is working to save the system. Of course, if you set all beancounters to unlimited, the exploit will work. So don't do that, unless your CT is completely trusted. Those limits are there for a reason, you know.
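To make the mechanism concrete, here is a toy Python model of a single beancounter. It is not kernel code and it simplifies heavily: real beancounters have both a barrier and a limit, while this sketch uses one limit only. The class, the `run_socket_hog` loop and the attempt count are hypothetical; the limit of 360 is taken from the numothersock row in the table.

```python
class Beancounter:
    """Toy model of one per-container resource counter (not kernel code)."""

    def __init__(self, limit):
        self.limit = limit
        self.held = 0      # units currently charged to the container
        self.maxheld = 0   # high-water mark of `held`
        self.failcnt = 0   # number of refused charges

    def charge(self, amount=1):
        """Account for `amount` units; refuse if the limit would be exceeded."""
        if self.held + amount > self.limit:
            self.failcnt += 1
            return False
        self.held += amount
        self.maxheld = max(self.maxheld, self.held)
        return True

    def uncharge(self, amount=1):
        """Release `amount` units, never going below zero."""
        self.held = max(0, self.held - amount)


def run_socket_hog(counter, attempts):
    """Hypothetical attacker loop: keep 'creating sockets' until refused."""
    created = 0
    for _ in range(attempts):
        if not counter.charge():
            break  # the allocation failed, so the script gives up
        created += 1
    return created


numothersock = Beancounter(limit=360)
created = run_socket_hog(numothersock, attempts=1_000_000)
print(created, numothersock.maxheld, numothersock.failcnt)
```

The attacker is stopped after a bounded number of allocations, and the refused charge shows up in failcnt, which is how an administrator would notice the attempt afterwards.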
One of the OpenVZ team members, Kirill Kolishkin, decided to suspend a container, but forgot to specify one parameter. Vzctl returned an error saying that the parameter was not specified. When Kir ran vzctl with the correct parameters, it returned the error “No such container”. After a short investigation, he found that the config file had disappeared. Kir could not guess the cause right away, but then he understood how to reproduce it and where the problem was in the code. Now look at this code. It allocates a variable on the stack, then validates a parameter and initializes the variable. Nothing looks strange at first, but let's see what happens if the parameter is invalid. Oh, no. The error path uses the uninitialized variable: it removes the file whose name is stored in that variable. By chance, the variable contained the path to the container's config. Bad luck. GCC does not report any warning in this case.
One hot summer day, my colleague was measuring the performance of network namespaces. The results he got looked like a set of random data. These were not his first measurements, and the procedure was well tested. So where was the problem? The day was hot, his brain was not working well, and probably not only his brain. It took him more than an hour to notice a message about CPU throttling due to overheating. The host had no case fan; after one was installed, the results stabilized. What is the conclusion of this story? Make sure that the results are reproducible, and remember about side effects.