With so many moving parts, how can you be sure that a tweak to your database is really beneficial or just a measurement fluke? And how can you be sure that your benchmark is measuring the right thing?
In this webinar you will learn a set of best practices to improve the quality of your benchmarks, including how to:
+ Evaluate changes to database systems, whether you are a code contributor or a user playing with knobs
+ Determine what to measure for the most accurate results
+ Make benchmarks that are strong, solid, and reliable
We will also cover how to judge whether benchmarks with incredible claims are to be trusted.
The Do’s and Don’ts of Benchmarking Databases
1. The Do’s and Don’ts of Benchmarking Databases
Glauber Costa - Principal Architect
WEBINAR
2. About ScyllaDB
+ Next-generation NoSQL database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ Open source and enterprise editions
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzelia, Israel
3. Glauber Costa
Glauber Costa is a Principal Architect at ScyllaDB. He splits his time between the engineering department, working on upcoming Scylla features, and helping customers succeed.
Before ScyllaDB, Glauber worked on virtualization in the Linux kernel for 10 years, with contributions ranging from the Xen hypervisor to all sorts of guest functionality and containers.
11. Why benchmark?
Tesla Roadster vs. Yugo
+ 0-60: 1.9s vs. “Yes”
+ Seating: 4 vs. the whole drunk squad, no matter how many they are
+ Price: $200,000 vs. whatever you have in your pocket
12. DO be aware of client-side bottlenecks
+ “I have applied a certain pressure to the Roadster’s gas pedal. It does 30mph”
+ “I have applied a certain pressure to the Yugo’s gas pedal. It does 32mph”
“Conclusion”: the Yugo is faster than the Roadster (but not by much!)
+ If the same client-side limit caps both systems, the benchmark measures the client, not the database
13. DO use standard tools
+ Writing your own benchmark is cool, but what about the bugs?
+ cassandra-stress
+ YCSB
+ ndbench (Netflix)
14. DO understand what you want to measure
+ Are you benchmarking a disk-bound or a CPU-bound load?
  + Sometimes a workload bottlenecks on both, but that is rare
+ Throughput benchmarks
  + A resource needs to be at (or close to) 100% utilization
+ Latency benchmarks
  + Throughput is held constant and below saturation; otherwise the numbers don’t mean much
+ Sizing/cost benchmarks
  + Throughput (and maybe latency requirements) are held constant: how many nodes, or how much $?
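For the latency case, “throughput is held constant” means driving the load open-loop: requests are issued on a fixed schedule, and latency is measured from the *intended* send time, so one slow request cannot hide the delay it inflicts on the ones queued behind it (the coordinated-omission problem). A minimal sketch in Python; `do_request` and `run_fixed_rate` are hypothetical names, standing in for a real client call:

```python
import time
import statistics

def run_fixed_rate(do_request, rate_per_s, duration_s):
    """Open-loop latency benchmark: issue requests at a constant rate and
    measure each latency from the intended send time, not the actual one."""
    interval = 1.0 / rate_per_s
    start = time.perf_counter()
    latencies = []
    for i in range(int(rate_per_s * duration_s)):
        intended = start + i * interval
        delay = intended - time.perf_counter()
        if delay > 0:
            time.sleep(delay)        # wait for this request's slot
        do_request()                 # stand-in for a real database call
        latencies.append(time.perf_counter() - intended)
    return latencies

# Usage with a no-op request, 100 req/s for half a second:
lat = run_fixed_rate(lambda: None, rate_per_s=100, duration_s=0.5)
p99 = statistics.quantiles(lat, n=100)[98]
```

A closed loop (fire the next request only when the previous one returns) measures throughput at saturation, not latency under a fixed load.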
15. Corollary: understand your system
[Figure: throughput vs. latency curve on Intel Optane, annotated with the maximum useful throughput and the latency response]
18. DO look at what you want to measure
+ Familiarize yourself with the database theory of operation
+ Example: Scylla polling, compactions, caching, etc.
+ After you have results, you should be able to explain them
19. DO look at what you want to measure
[Graph annotated: “Throughput benchmark!”]
20. DO look at steady state
+ Common Big Data database workloads have tens or hundreds of TB
+ At least have more data than memory
+ Workloads tend to run for hours, so your benchmark should as well
21. DO look at steady state
+ What’s my throughput here?
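The question only has a meaningful answer once the run has settled. One way to make that concrete, sketched in Python (the function name and the 25% warm-up window are illustrative, not from any tool):

```python
def steady_state_throughput(samples_ops_per_s, warmup_fraction=0.25):
    """Average throughput over the steady-state tail of a run, discarding
    the warm-up window where the cache is still filling and compactions
    have not kicked in yet."""
    skip = int(len(samples_ops_per_s) * warmup_fraction)
    steady = samples_ops_per_s[skip:]
    return sum(steady) / len(steady)

# A run that starts fast (everything fits in cache) and then settles:
samples = [90_000, 80_000, 60_000, 50_000, 50_000, 50_000, 50_000, 50_000]
avg = steady_state_throughput(samples)   # averages only the last 6 samples
```

Reporting the whole-run average here would be inflated by the cached warm-up window.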
22. DON’T run unrealistic workloads
+ “My write latencies are 2500ms for 500-byte writes!”
+ Great for testing (does the system survive?)
+ Most people would have scaled the cluster by then
+ “I get great performance by always reading from the same key”
+ Sure, but who does that?
23. Some examples:
+ Ingestion
  + Ingest as fast as possible for some hours, no timeouts allowed
+ RTB (real-time bidding)
  + Bulk writes or a constant low write rate, lots of reads, strict latency requirements
+ Time series
  + Heavy, constant writes to ever-growing partitions; reads of the latest rows
+ Metadata store
  + Some writes, random reads with good cacheability
+ Analytics
  + Periodic writes, full table scans
24. DON’T share your nodes with the loaders
+ Pushing and pulling data can be expensive!
+ It steals resources from the database
+ Don’t do it with any database, but Scylla is particularly affected due to pinning and polling.
25. But if you DO share your nodes:
+ Statically partition resources
  + taskset, memory reserves
  + In the case of Scylla, use --cpuset
+ Example:
  + taskset -c 0,5-12 cassandra-stress write duration=15m …
  + scylla --cpuset 1-4 …
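The same static partitioning can be applied from inside a loader script on Linux via the standard library’s `os.sched_setaffinity` (the wrapper name below is illustrative):

```python
import os

def pin_to_cpus(cpus):
    """Pin the current process to a fixed CPU set (Linux only), so the
    loader cannot steal cycles from the cores reserved for the database."""
    os.sched_setaffinity(0, set(cpus))   # pid 0 = the calling process

# Pin ourselves to the first CPU we are currently allowed to run on,
# in the same spirit as `taskset -c 0 <loader>`:
first_cpu = min(os.sched_getaffinity(0))
pin_to_cpus({first_cpu})
```

This only constrains the loader; the database side still needs its own reservation (e.g. Scylla’s --cpuset) so the two sets don’t overlap.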
26. DO be pessimistic
+ Unless you can guarantee that your workload always caches well, benchmark cold scenarios as well
  + Disabling the cache is a good way to enforce that (miss rate: 100%)
  + But sometimes just restarting helps
+ What is the minimum amount of resources you will have in the field? (be realistic)
+ What is the maximum load you expect to see?
27. In Summary
1. Define the problem
2. Find the bottleneck
3. Explain the results
4. Optionally, raise the bar
5. goto #2
29. DO be careful with aggregation
+ Summaries are useful, but they hide a lot of information
+ Both those runs have the same load and about the same throughput/latencies
31. DO be careful with aggregation
+ Client 1: 100 requests; 98 of them took 1ms, 2 took 3ms
+ Client 2: 100 requests; 99 of them took 30ms, 1 took 31ms
+ Common mistake: the p99 is avg(3ms, 30ms) → 16.5ms
+ The real p99 is 30ms
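The arithmetic above is easy to check. A minimal sketch using the nearest-rank percentile (the helper is illustrative; real tools such as HdrHistogram merge histograms rather than raw sample lists):

```python
import math

def p99(latencies_ms):
    """Nearest-rank 99th percentile of a list of latency samples."""
    s = sorted(latencies_ms)
    return s[math.ceil(0.99 * len(s)) - 1]

client1 = [1] * 98 + [3] * 2    # 100 requests from client 1
client2 = [30] * 99 + [31]      # 100 requests from client 2

wrong = (p99(client1) + p99(client2)) / 2   # averaging percentiles: 16.5ms
right = p99(client1 + client2)              # merging the raw samples: 30ms
```

Percentiles only compose by merging the underlying samples (or histograms), never by averaging the per-client summaries.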
32. DON’T assume people will just believe you
+ When reporting, be very descriptive with your setup
+ BAD: “Our cluster has a p99 lower than 1ms”
+ GOOD: “We set up 3 nodes, each with 24 Intel i7-7500U CPUs @ 2.70GHz, 512GB RAM, and Samsung SSD 850 PRO 256GB SSDs, with <client_description> as loaders, and here’s the graph of our p99 over time”
33. DO be as fair as possible in comparisons
+ Most other databases require tuning, as they lack Autonomous Operations
+ Unless in a specific “out of the box” benchmark: tune it! (and say how)
+ HORRIBLE: “we installed Cassandra, ran it, and Scylla is 2000x faster”
+ BAD: “we tuned Cassandra, ran it, and Scylla is 10x faster”
+ GOOD: “we tuned Cassandra, and here is how (a link or appendix is fine). After that, we ran Scylla and it has 10x more throughput on the same hardware”