Distributed systems are usually optimized with particular workloads in mind. At the same time, a system should still behave in a sane way when those workload assumptions do not hold - notably, one user shouldn't be able to ruin performance for the whole system. Buggy components can be a source of overload as well, so it is worth considering overload protection on a per-component basis. For example, ScyllaDB's shared-nothing architecture gives it great scalability, but at the same time makes it prone to the "hot partition" problem: a single partition accessed with disproportionate frequency can ruin performance for other requests handled by the same shards. This talk describes how we implemented per-partition rate limiting, which reduces the performance impact in such cases, and how we reduced the CPU cost of handling failed requests such as timeouts (spoiler: it's about C++ exceptions).
2. Piotr Dulikowski
■ Holds a BA and an MSc in Computer Science from the University of Warsaw
■ Involved in the development of several ScyllaDB features, including CDC and per-partition rate limiting
■ Maintainer of the ScyllaDB Rust Driver
4. ■ A ScyllaDB cluster consists of multiple nodes
■ Each node is divided into shards (CPU core + part of RAM)
■ Shards within a node handle separate data (shared-nothing architecture)
■ Data is split into partitions
■ A partition consists of rows sharing the same partition key
■ Each partition has a subset of nodes called replicas, responsible for storing the partition
■ Requests can be handled by any node/shard, but the coordinator has to contact the replicas
Data Distribution in ScyllaDB
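The replica-selection idea above can be sketched with a simple token ring. This is illustrative only - names and the one-token-per-node assumption are mine, not ScyllaDB's actual placement logic (which uses vnodes and pluggable replication strategies):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative token-ring sketch, not ScyllaDB's implementation:
// each node owns one token; a partition is replicated on the `rf`
// nodes found by walking clockwise from the partition's token.
std::vector<int> replicas_for(uint64_t token,
                              const std::vector<std::pair<uint64_t, int>>& ring,
                              int rf) {
    // `ring` is sorted by token; find the first owner at or past `token`
    auto it = std::lower_bound(ring.begin(), ring.end(),
                               std::make_pair(token, 0));
    std::vector<int> out;
    for (int i = 0; i < rf; i++) {
        if (it == ring.end()) {
            it = ring.begin();  // wrap around the ring
        }
        out.push_back(it->second);
        ++it;
    }
    return out;
}
```

With replication factor 2, every partition lands on two adjacent ring positions - which is why a hot partition also degrades other partitions that share those replicas.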
5. Each partition has a limited amount of computing resources assigned to it, and it's easy to exhaust them if the workload becomes too unbalanced.
Partitions whose replicas intersect with the hot partition's replicas will be affected, too.
The “Hot Partition” Problem
■ Keep in mind what your expected workload looks like
■ Hot partitions may appear due to badly chosen schema
■ ScyllaDB won’t fix those issues for you - schema is your responsibility
Choose Appropriate Schema
7. It makes sense to optimize your schema for the common case. What about the "uncommon case"?
You can always encounter:
■ Malicious/misbehaving users
■ Parts of your system going awry due to bugs
The system does not have to satisfy these requests, but they should not affect the rest of the system too much.
It’s Not Always About Bad Schema
8. ■ Requests will start piling up on overloaded shards
■ When latency exceeds the request timeout, most of the work is wasted
■ We can reject some requests early
■ Accept only as much as we can comfortably handle
■ Rejecting some requests early leaves more resources for handling the remaining ones
How to Retain Goodput?
9. A maximum read/write rate can be set for a table. ScyllaDB will reject some operations in an effort to keep the rate of successful requests under the limit.
Per-Partition Rate Limiting
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
    'max_writes_per_second': 100,
    'max_reads_per_second': 200
};
11. ■ A shard tracks a "hit count" for tuples of (token, table name, operation type)
■ Every second, all counters are halved
■ Assuming a steady rate of X ops/s, a counter will eventually oscillate between X and 2X
https://github.com/scylladb/scylladb/blob/master/db/rate_limiter.hh
Measurements on Replica Side
(token, table, operation type)          counter
2c042489794ad03b, 'table1', 'write'     100
6fc6353cbcd7808,  'table1', 'read'      2
3ea0c947c5fcd34e, 'table2', 'read'      1
12. The coordinator increments the counter relevant to the operation and chooses to reject with some probability.
■ If the operation is accepted, it proceeds as usual and the replicas increment their counters
■ If the operation is rejected, communication with the replicas is skipped
Case: Coordinator is a Replica
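One way to derive such a rejection probability is sketched below. This is an illustrative formula, not ScyllaDB's exact one: since a counter oscillates between X and 2X for a steady X ops/s, anything above twice the limit is treated as excess:

```cpp
#include <cstdint>

// Illustrative sketch (not ScyllaDB's exact formula): when the tracked
// counter exceeds what a steady rate at the limit would produce (between
// X and 2X for X ops/s), reject with a probability chosen so that the
// accepted rate stays near the configured limit.
double rejection_probability(uint32_t counter, uint32_t limit) {
    if (counter <= 2 * limit) {
        return 0.0;  // within the limit: accept everything
    }
    return 1.0 - double(2 * limit) / double(counter);
}
```

For example, at 4x the limit the excess half of the traffic is rejected, bringing the accepted rate back toward the limit.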
13. The coordinator lets the replicas decide whether to accept or reject.
■ The coordinator chooses a random value and sends it to the replicas
■ Replicas compute the probability of rejection based on their counters and decide to reject based on the random value
■ Replicas should have counter values that are close to each other, so they usually reach the same decision
Case: Coordinator is not a Replica
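The shared-random-value trick can be sketched like this (function names and the probability formula are illustrative, not ScyllaDB's): because every replica compares its local probability against the same random value, replicas with similar counters reach the same decision without any extra coordination round-trip:

```cpp
#include <cstdint>

// Illustrative sketch: each replica turns its local counter into a
// rejection probability (here: excess over twice the limit) ...
double local_rejection_probability(uint32_t counter, uint32_t limit) {
    if (counter <= 2 * limit) return 0.0;
    return 1.0 - double(2 * limit) / double(counter);
}

// ... and compares it against the single random value chosen by the
// coordinator. Replicas with close counters decide identically.
bool replica_accepts(uint32_t local_counter, uint32_t limit,
                     double shared_random) {
    return shared_random >= local_rejection_probability(local_counter, limit);
}
```

If the replicas rolled their own dice instead, one could accept while another rejects, leaving the partition's replicas inconsistent for no benefit.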
14. On writes, all live replicas participate: all replica counters are updated, and every replica has a good estimate of the request rate for the partition.
On reads, not all replicas participate - the exact number depends on the replication factor and consistency level. This can lead to read operations being under-counted, but that is still fine for rate limiting.
Reads vs. Writes Accuracy
16. ■ People have mixed feelings about exceptions
■ They are a part of the language, and they are used in the standard library
■ …but they have some undesirable properties, e.g. hard-to-predict performance
■ We are using exceptions in ScyllaDB
■ Leads to more idiomatic code, and our framework supports them well
■ They aren’t a big problem, as long as you aren’t throwing them in large volumes
■ Throwing exceptions can be slow
■ It involves acquiring a global mutex, which does not scale
■ We worked around it, but had to disable caching in the process - throwing is now scalable, but slow
■ https://github.com/scylladb/seastar/blob/master/src/core/exception_hacks.cc
What’s Wrong with C++ Exceptions?
17. Seastar gives us flow-control constructs that do not use throwing underneath.
Exceptions can be stored in std::exception_ptr and passed around without throwing.
The problem is that the exception inside a std::exception_ptr must be rethrown in order to access it.
Exceptions in Seastar
future<> do_thing() {
    return really_do_thing().finally([] {
        std::cout << "Did the thing\n";
    });
}

future<> really_do_thing() {
    if (fail_flag) {
        return make_exception_future<>(
            std::runtime_error("oh no!"));
    } else {
        return make_ready_future<>();
    }
}
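The rethrow-to-inspect limitation mentioned above looks like this in portable C++. The function name is mine; the pattern itself is the standard one, and it pays the full cost of a throw just to find out what is stored:

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// The only portable way to look inside a std::exception_ptr is to
// rethrow it and catch - paying the full (mutex-guarded) cost of a
// throw, which is exactly the cost we want to avoid.
std::string describe(const std::exception_ptr& ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const std::runtime_error& e) {
        return std::string("runtime_error: ") + e.what();
    } catch (const std::exception& e) {
        return std::string("exception: ") + e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```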
18. Use Boost.Outcome's result to return the outcome (contains either a success value or an exception).
Use a custom container that allows inspecting the exception.
Results in portable code, but converting existing code is very tedious.
Approach 1: Avoid Them
future<result<>> do_thing() {
    return really_do_thing().then(
        [] (result<> res) -> result<> {
            if (res) {
                // handle success
            } else {
                // handle failure
            }
            return res;
        }
    );
}
19. Introduce an "exception_ptr inspector" function and replace existing try..catch blocks in a straightforward way.
Make sure that for everything else we use the existing tools.
Non-portable code, but much less work!
Approach 2: Implement Missing Parts
std::exception_ptr ep = get_exception();
if (auto* ex = try_catch<std::logic_error>(ep)) {
    // ...
} else if (auto* ex = try_catch<std::runtime_error>(ep)) {
    // ...
} else {
    // ...
}
Based on the C++ proposal:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1066r1.html
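For comparison, a portable equivalent of such an inspector can be written with a rethrow inside - which keeps the convenient interface but loses the whole performance benefit. This sketch is mine (it takes a callback instead of returning a pointer, to sidestep questions about the caught exception object's lifetime):

```cpp
#include <exception>
#include <stdexcept>

// Portable (but slow) counterpart of a non-portable try_catch: it
// rethrows internally to perform the type test, so it does not avoid
// the cost of throwing. Shown only to illustrate the intended semantics.
template <typename T, typename Fn>
bool try_catch_portable(const std::exception_ptr& ep, Fn&& fn) {
    try {
        std::rethrow_exception(ep);
    } catch (T& ex) {
        fn(ex);        // invoke the handler with the caught exception
        return true;
    } catch (...) {
        return false;  // stored exception is not a T
    }
}
```

A non-portable implementation achieves the same dispatch by inspecting the exception_ptr's type information directly, without unwinding - that is the "missing part" the slide refers to.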