Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
1. Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Yuto Kawamura - LINE Corporation
2. Speaker introduction
Yuto Kawamura
Senior Software Engineer at LINE
Leads the team providing a company-wide Kafka platform
Apache Kafka contributor
Speaker at Kafka Summit SF 2017 [1]

[1] https://kafka-summit.org/sessions/single-data-hub-services-feed-100-billion-messages-per-day/
3. LINE
Messaging service
164 million active users in countries where LINE holds the top market share, such as Japan, Taiwan, Thailand, and Indonesia. [2]
And many other services:
- News
- Bitbox/Bitmax - cryptocurrency trading
- LINE Pay - digital payments

[2] As of June 2018.
4. Kafka platform at LINE
Two main usages:
— A "Data Hub" for distributing data to other services
— e.g., user relationship update events from the messaging service
— A task queue for buffering and processing business logic asynchronously
5. Kafka platform at LINE
A single cluster is shared by many independent services, for:
- the "Data Hub" concept
- efficiency of management/operation
Messaging, Ads, News, Blockchain, and more: all of their data is stored and distributed on a single Kafka cluster.
6. From department-wide to company-wide platform
Originally built just for the messaging service; now every service uses it.
7. Broker installation
CPU: Intel(R) Xeon(R) 2.20GHz x 20 cores (HT) x 2
Memory: 256GiB
- more memory, more caching (page cache)
- even so, at our write rate, newly written data survives in the page cache for only about 20 minutes
Network: 10Gbps
Disk: HDD x 12, RAID 1+0
- saves maintenance costs
Kafka version: 0.10.2.1 ~ 0.11.1.2
8. Requirements for multi-tenancy
The cluster can protect itself against abusive workloads
- An accidental workload does not propagate to other users.
We can track which client is sending which requests
- Find the source of strange requests.
A certain level of isolation among client workloads
- A slow response for one client does not appear to another client.
9. Protect the cluster against abusive workloads - Request Quota
It is more important to manage the number of requests than the incoming/outgoing byte rate.
Kafka is amazingly durable under large data volumes as long as they are well-batched.
=> Producers configured with linger.ms=0 on a large number of servers likely generate a huge number of requests (see the sketch below).
Starting from 0.11.0.0, KIP-124 lets us configure a request rate quota. [3]

[3] https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
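For illustration, a minimal sketch of a well-batched producer (hypothetical settings, not LINE's actual configuration): allowing a small linger lets the client pack many records into each ProduceRequest, keeping the request rate low even at a high message rate.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer

// Hypothetical settings for illustration. With linger.ms=0 the producer
// tends to send many tiny requests; a short linger plus a larger batch
// size lets many records ride in a single ProduceRequest.
val props = new Properties()
props.put("bootstrap.servers", "broker1:9092") // placeholder address
props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
props.put("linger.ms", "10")      // wait up to 10 ms to accumulate a batch
props.put("batch.size", "131072") // allow batches up to 128 KiB
val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
```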
10. Protect the cluster against abusive workloads - Request Quota
The basic idea is to apply a default quota as a minimum protection, preventing a single abusive client from destabilizing the cluster.
*Not for controlling the quantity of resources allotted to each client.
11. Track requests from clients - Metrics
— kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+):request-time
— Percentage of time spent in broker network and I/O threads to process requests from each client group.
— Useful to see how much broker resource is being consumed by each client.
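A minimal sketch of polling this metric over JMX; the broker host/port are placeholders, and only standard javax.management calls are used:

```scala
import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

// Connect to a broker's JMX endpoint (host/port are placeholders).
val url  = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi")
val conn = JMXConnectorFactory.connect(url).getMBeanServerConnection

// Match every (user, client-id) pair registered under the Request type.
val pattern = new ObjectName("kafka.server:type=Request,user=*,client-id=*")
conn.queryNames(pattern, null).forEach { name =>
  // "request-time" is the percentage of network/IO thread time this
  // client group consumed, as described on this slide.
  val pct = conn.getAttribute(name, "request-time")
  println(s"$name => $pct%")
}
```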
12. Track requests from clients - Slowlog
Logs requests that took longer than a certain threshold to process.
- Kafka has built-in "request logging", but it produces far too many lines
- Inspired by HBase's slowlog
Thresholds can be changed dynamically through a JMX console for each request type.
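LINE's slowlog is an internal patch; the following is only a minimal sketch of the idea, with the threshold exposed as a JMX-writable attribute (all names are hypothetical):

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// MBean interface: a JMX console can read/write the threshold at runtime.
trait SlowlogConfigMBean {
  def getThresholdMs: Long
  def setThresholdMs(ms: Long): Unit
}

class SlowlogConfig extends SlowlogConfigMBean {
  @volatile private var thresholdMs: Long = 500L
  override def getThresholdMs: Long = thresholdMs
  override def setThresholdMs(ms: Long): Unit = thresholdMs = ms
}

object Slowlog {
  val config = new SlowlogConfig
  ManagementFactory.getPlatformMBeanServer
    .registerMBean(config, new ObjectName("example:type=SlowlogConfig"))

  // Log only requests that exceeded the (dynamically adjustable) threshold,
  // instead of logging every request like the built-in request logging.
  def record(requestType: String, durationMs: Long, detail: String): Unit =
    if (durationMs >= config.getThresholdMs)
      println(s"SLOWLOG [$requestType] took ${durationMs}ms: $detail")
}
```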
21. Network thread runs an event loop
— Multiplexes and processes its assigned client sockets sequentially.
— It never blocks awaiting IO completion.
=> So it makes sense to set num.network.threads <= CPU_CORES (see the sketch below)
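A minimal sketch of this pattern (illustrative only, not broker code): one thread multiplexes many non-blocking sockets with a selector, so any blocking call inside the loop would stall all of them.

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel, SocketChannel}

// One event-loop thread serving many non-blocking sockets.
val selector = Selector.open()
val server = ServerSocketChannel.open()
server.bind(new InetSocketAddress(9092))
server.configureBlocking(false)
server.register(selector, SelectionKey.OP_ACCEPT)

while (true) {
  selector.select() // blocks only until some socket is ready
  val keys = selector.selectedKeys().iterator()
  while (keys.hasNext) {
    val key = keys.next(); keys.remove()
    if (key.isAcceptable) {
      val client = server.accept()
      client.configureBlocking(false)
      client.register(selector, SelectionKey.OP_READ)
    } else if (key.isReadable) {
      val buf = ByteBuffer.allocate(4096)
      key.channel().asInstanceOf[SocketChannel].read(buf)
      // ... parse the request and hand it off; never block here ...
    }
  }
}
```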
22. When network threads get busy...
It means one of two things:
1. Really busy doing lots of work: many requests/responses to read/write
2. Blocked by some operation (which, in general, should never happen in an event loop)
23. Response handling of normal requests
When the response is in the queue, all data to be transferred is already in memory.
24. Exceptional handling for Fetch responses
When the response is in the queue, the topic data segments are not in userspace memory.
=> They are copied to the client socket directly inside the kernel using the sendfile(2) system call.
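On the JVM this zero-copy path is FileChannel.transferTo, which maps to sendfile(2) on Linux. A minimal sketch (hypothetical helper, not Kafka's actual code):

```scala
import java.nio.channels.{FileChannel, SocketChannel}
import java.nio.file.{Paths, StandardOpenOption}

// Transfer a slice of a log segment straight to the client socket.
// With transferTo the bytes move inside the kernel (sendfile(2));
// they are never copied into JVM heap memory.
def sendLogSlice(segmentPath: String, offset: Long, count: Long,
                 client: SocketChannel): Long = {
  val ch = FileChannel.open(Paths.get(segmentPath), StandardOpenOption.READ)
  try ch.transferTo(offset, count, client)
  finally ch.close()
}
```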
25. What if the target data doesn't exist in the page cache?
Target data in page cache:
=> Just a memory copy. Very fast: ~100us
Target data NOT in page cache:
=> Data must first be loaded from disk into the page cache.
Can be slow: ~50ms (or even slower)
26. Suspecting blocking in sendfile(2)
Inspected the duration of sendfile system calls issued by the broker process using SystemTap (a dynamic tracing tool for probing kernel events; see my previous talk [4]):
$ stap -e '(script counting sendfile(2) duration histogram)'
# value (us)
value |---------------------------------------- count
    0 |                                             0
    1 |                                            71
    2 |@@@                                       6171
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  29472
   32 |@@@                                       3418
 2048 |                                             0
...
 8192 |                                             3

[4] https://www.confluent.io/kafka-summit-sf17/One-Day-One-Data-Hub-100-Billion-Messages-Kafka-at-LINE
27. Problem hypothesis
A Fetch request reading old data causes a blocking sendfile(2) in the event loop, adding latency to every response that must be processed by the same network thread.
28. Problem hypothesis
Super harmful, because it can be triggered either by:
- consumers attempting to fetch old data, or
- replica fetches by follower brokers restoring replicas of old logs.
=> Both are very common scenarios.
And it breaks performance isolation among independent clients.
29. Solution candidates
A: Separate network threads among clients
=> Possible, but requires a lot of changes
=> Not essential, because network threads should be purely computation-intensive
B: Balance connections among network threads
=> Possible, but again a lot of changes
=> Even then, other connections on the same thread would still be affected at first
30. Solution candidates
C: Make sure the data is ready in memory before the response is passed to the network thread
=> The event loop never blocks
31. Choice: Warm up the page cache before the network thread
Move the blocking part to the request handler threads (= a single queue and a pool of threads).
=> A free thread can take an arbitrary task (request) while some threads are blocked; see the sketch below.
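A minimal sketch of that pattern (hypothetical classes, not the broker's actual request handler): N workers share one queue, so a worker blocked on disk I/O does not stop the others.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// N handler threads pull requests from one shared queue. If one thread
// blocks on a page-cache miss, the remaining threads keep draining the
// queue, so unrelated requests still make progress.
case class Request(id: Int, work: () => Unit)

val requestQueue = new LinkedBlockingQueue[Request]()
val numHandlers = 8

val handlers = (1 to numHandlers).map { i =>
  val t = new Thread(() => {
    while (!Thread.currentThread().isInterrupted) {
      val req = requestQueue.poll(300, TimeUnit.MILLISECONDS)
      if (req != null) req.work() // may block on disk I/O; others continue
    }
  }, s"request-handler-$i")
  t.start()
  t
}
```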
32. Choice: Warm up the page cache before the network thread
When a network thread calls sendfile(2) to transfer log data, the data is always in the page cache.
33. Warming up the page cache with minimal overhead
Easiest way: do a synchronous read(2) on the target data
=> Large overhead from copying memory from kernel to userland
Why does Kafka use sendfile(2) for transferring topic data?
=> To avoid expensive large memory copies
How can we warm up the cache while keeping this property?
34. Trick #1: Zero-copy synchronous page load
Call sendfile(2) for the target data with destination /dev/null.
The /dev/null driver does not actually copy the data anywhere.
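On the JVM, one way to issue such a sendfile(2) is FileChannel.transferTo toward a channel opened on /dev/null; a minimal sketch under that assumption (the actual change is the prepareForRead() call shown in slide 38):

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Warmup trick: sendfile(2) the target range to /dev/null. The read side
// faults the file pages into the page cache; the null driver's write side
// discards the data, so no real copy takes place.
val devNull = FileChannel.open(Paths.get("/dev/null"), StandardOpenOption.WRITE)

def warmup(segment: FileChannel, start: Long, size: Long): Unit = {
  var transferred = 0L
  while (transferred < size)
    transferred += segment.transferTo(start + transferred, size - transferred, devNull)
}
```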
35. Why does this have almost no overhead?
The Linux kernel internally uses splice to implement sendfile(2).
The splice implementation for /dev/null returns without iterating over the target data.
// ./drivers/char/mem.c
static const struct file_operations null_fops = {
        ...
        .splice_write = splice_write_null,
};

/* Claims the whole chunk as "written" without touching the data. */
static int pipe_to_null(...)
{
        return sd->len;
}

static ssize_t splice_write_null(...)
{
        return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null);
}
37. Trick #2: Skip the "hot" last log segment
Another concern: additional syscalls x Fetch request count?
- Warming up is necessary only for older data.
- Exclude the last log segment from the warmup target.
38. Trick #2 Skip the "hot" last log segment
# Log.scala#read
@@ -585,6 +586,17 @@ class Log(@volatile var dir: File,
if(fetchInfo == null) {
entry = segments.higherEntry(entry.getKey)
} else {
+ // For last entries we assume that it is hot enough to still have all data in page cache.
+ // Most of fetch requests are fetching from the tail of the log, so this optimization
+ // should save call of sendfile significantly.
+ if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) {
+ try {
+ info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath)
+ fetchInfo.records.asInstanceOf[FileRecords].prepareForRead()
+ } catch {
+ case e: Throwable => warn("failed to prepare cache for read", e)
+ }
+ }
return fetchInfo
}
39. It works
No response time degradation in unrelated requests, even while coincident Fetch requests trigger disk reads.
40. Patch upstream?
Concern: the patch heavily depends on the underlying kernel implementation.
Still:
- The effect is tremendous.
- It fixes a very common performance degradation scenario.
Discussion at KAFKA-7504.
41. Conclusion
— Covered requirements for multi-tenant clusters and their solutions:
— quotas, metrics, slowlog ... and a hacky patch.
— After fixing these issues, our hosting policy works well and efficiently, keeping:
— the concept of a single "Data Hub", and
— operational costs that are not proportional to the number of users/usages.
— Kafka is well designed and implemented to host many independent, differing workloads.