SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Downloaden Sie, um offline zu lesen
Kafka Multi-Tenancy - 160
Billion Daily Messages on One
Shared Cluster at LINE
Yuto Kawamura - LINE Corporation
Speaker introduction
Yuto Kawamura
Senior Software Engineer at LINE
Leading a team for providing a
company-wide Kafka platform
Apache Kafka Contributor
Speaker at Kafka Summit SF
2017 1
1
https://kafka-summit.org/sessions/single-data-hub-
services-feed-100-billion-messages-per-day/
LINE
Messaging service
164 million active users in
countries with top market share
like Japan, Taiwan, Thailand and
Indonesia.2
And many other services:
- News
- Bitbox/Bitmax -
Cryptocurrency trading
- LINE Pay - Digital payment
2
As of June 2018.
Kafka platform at LINE
Two main usages:
— "Data Hub" for distributing data to other services
— e.g: Users relationship update event from
messaging service
— As a task queue for buffering and processing business
logic asynchronously
Kafka platform at LINE
Single cluster is shared by many independent services
for:
- Concept of Data Hub
- Efficiency of management/operation
Messaging, AD, News, Blockchain and etc... all of their
data stored and distributed on single Kafka cluster.
From department-wide to company-wide platform
It was just for messaging service. Now everyone uses it.
Broker installation
CPU: Intel(R) Xeon(R) 2.20GHz x 20 cores (HT) * 2
Memory: 256GiB
- more memory, more caching (page cache)
- newly written data can survive only 20 minutes ...
Network: 10Gbps
Disk: HDD x 12 RAID 1+0
- saves maintenance costs
Kafka version: 0.10.2.1 ~ 0.11.1.2
Requirements doing multitenancy
Cluster can protect itself against abusing workloads
- Accidental workload doesn't propagates to other
users.
We can track on which client is sending requests
- Find source of strange requests.
Certain level of isolation among client workloads
- Slow response for one client doesn't appears to
another client.
Protect cluster against abusing workload - Request
Quota
It is more important to manage number of requests over
incoming/outgoing byte rate.
Kafka is amazingly durable for large data if they are well-batched.
=> Producers which configures linger.ms=0 with large number of
servers probably leads large amount of requests
Starting from 0.11.0.0, by KIP-124 we can configure request rate
quota 3
3
https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
Protect cluster against abusing
workload - Request Quota
Basic idea is to apply default
quota for preventing single
abusing client destabilize the
cluster as a least protection.
*Not for controlling resource
quantity for each client.
Track on requests from clients - Metrics
— kafka.server:type=Request,user=([-.w]+),client-
id=([-.w]+):request-time
— Percentage of time spent in broker network and I/O
threads to process requests from each client
group.
— Useful to see how much of broker resource is being
consumed by each client.
Track on requests from clients -
Slowlog
Log requests which took longer
than certain threshold to
process.
- Kafka has "request logging"
but it leads too many of lines
- Inspired by HBase's
Thresholds can be changed
dynamically through JMX console
for each request type.
Isolation among client workloads
Let's give a look through the actual troubleshooting.
Detection
Detection: 50x ~ 100x slower
response time in 99th %ile
Produce response time.
Normal: ~20ms
Observed: 50ms ~ 200ms
Finding #1: Disk read
coincidence
Coincidence disk read of certain
amount.
Finding #2: Network threads got
busier
Network threads' utilization was
very high.
Metrics:
kafka.network:type=SocketServer,n
ame=NetworkProcessorAvgIdlePercen
t
Request handling in Kafka broker
Two thread layers:
Network Threads: Reads/Writes request/response from/to client sockets.
Request Handler Threads: Processes requests and produces response object.
Request handling - Read Request
Request handling - Process
Request handling - Write Response
Network thread runs event loop
— Multiplex and processes assigned client sockets sequentially.
— It never blocks awaiting IO completion.
=> So it makes sense to set num.network.threads <= CPU_CORES
When Network threads gets busy...
It means either one of:
1. Really busy doing lots of work. Many requests/
responses to read/write
2. Blocked by some operations (which should not
happen in event loop in general)
Response handling of normal
requests
When response is in queue, all
data to be transferred are in
memory.
Exceptional handling for Fetch
response
When response is in queue, topic
data segments are not in
userspace memory.
=> Copy to client socket directly
inside the kernel using
sendfile(2) system call.
What if target data doesn't exists in page cache?
Target data in page cache:
=> Just a memory copy. Very fast: ~ 100us
Target data is NOT in page cache:
=> Needs to load data from disk into page cache first.
Can be slow: ~ 50ms (or even slower)
Suspecting blocking in sendfile(2)
Inspected duration of sendfile system calls issued by broker process using
SystemTap (dynamic tracing tool to probe events in kernel. see my previous talk 4
)
$ stap -e ‘(script counting sendfile(2) duration histogram)’
# value (us)
value |---------------------------------------- count
0 | 0
1 | 71
2 |@@@ 6171
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472
32 |@@@ 3418
2048 | 0
...
8192 | 3
4
https://www.confluent.io/kafka-summit-sf17/One-Day-One-Data-Hub-100-Billion-Messages-
Kafka-at-LINE
Problem hypothesis
Fetch request reading old data causes blocking sendfile(2) in event loop and applying latency for
responses needs to be processed in the same network thread.
Problem hypothesis
Super harmful because of:
It can be triggered either by:
- Consumers attempting to fetch old data
- Replica fetch by follower brokers for restoring replica
of old logs
=> Both are very common scenario
Breaks performance isolation among independent
clients.
Solution candidates
A: Separate network threads among clients
=> Possible, but a lot of changes required
=> Not essential because network threads should be
completely computation intensive
B: Balance connections among network threads
=> Possible, but again a lot of changes
=> Still for first moment other connections will get
affected
Solution candidates
C: Make sure that data are ready on memory before the
response passed to the network thread
=> Event loop never blocks
Choice: Warmup page cache
before the network thread
Move blocking part to request
handler threads (= single queue
and pool of threads)
=> Free thread can take arbitrary
task (request) while some
threads are blocked.
Choice: Warmup page cache
before the network thread
When Network thread calls
sendfile(2) for transferring log
data, it's always in page cache.
Warming up page cache with minimal overhead
Easiest way: Do synchronous read(2) on target data
=> Large overhead by copying memory from kernel to
userland
Why is Kafka using sendfile(2) for transferring topic data?
=> To avoid expensive large memory copy
How can we achieve it keeping this property?
Trick #1 Zero copy synchronous
page load
Call sendfile(2) for target data
with dest /dev/null.
The /dev/null driver does not
actually copy data to anywhere.
Why it has almost no overhead?
Linux kernel internally uses splice to implement sendfile(2).
splice implementation of /dev/null returns w/o iterating target data.
# ./drivers/char/mem.c
static const struct file_operations null_fops = {
...
.splice_write = splice_write_null,
};
static int pipe_to_null(...)
{
return sd->len;
}
static ssize_t splice_write_null(...)
{
return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null);
}
Implementing page load
// FileRecords.java
private static final java.nio.file.Path DEVNULL_PATH =
new File("/dev/null").toPath();
public void prepareForRead() throws IOException {
long size = Math.min(channel.size(), end) - start;
try (FileChannel devnullChannel = FileChannel.open(
DEVNULL_PATH, StandardOpenOption.WRITE)) {
// Calls sendfile(2) internally
channel.transferTo(start, size, devnullChannel);
}
}
Trick #2 Skip the "hot" last log
segment
Another concern: additional
syscalls * Fetch req count?
- Warming up is necessary only
for older data.
- Exclude the last log segment
from the warmup target.
Trick #2 Skip the "hot" last log segment
# Log.scala#read
@@ -585,6 +586,17 @@ class Log(@volatile var dir: File,
if(fetchInfo == null) {
entry = segments.higherEntry(entry.getKey)
} else {
+ // For last entries we assume that it is hot enough to still have all data in page cache.
+ // Most of fetch requests are fetching from the tail of the log, so this optimization
+ // should save call of sendfile significantly.
+ if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) {
+ try {
+ info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath)
+ fetchInfo.records.asInstanceOf[FileRecords].prepareForRead()
+ } catch {
+ case e: Throwable => warn("failed to prepare cache for read", e)
+ }
+ }
return fetchInfo
}
It works
No response time degradation in irrelevant requests while there are coincidence of Fetch request
triggers disk read.
Patch upstream?
Concern: The patch heavily assumes underlying kernel
implementation.
Still:
- Effect is tremendous.
- Fixes very common performance degradation scenario.
Discuss at KAFKA-7504
Conclusion
— Talked requirements for multi tenancy clusters and
solutions
— Quota, Metrics, Slowlog ... and hacky patch.
— After fixing some issues our hosting policy is working well
and efficient, keeping:
— concept of single "Data Hub" and
— operational cost not proportional to the number of
users/usages.
— Kafka is well designed and implemented to contain many,
independent and different workloads.
End of presentation.
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 

Was ist angesagt? (20)

Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka
 
카프카, 산전수전 노하우
카프카, 산전수전 노하우카프카, 산전수전 노하우
카프카, 산전수전 노하우
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkMaxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesApache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice Architectures
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
 
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache KafkaHandle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and Architectures
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at Uber
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Airflow를 이용한 데이터 Workflow 관리
Airflow를 이용한  데이터 Workflow 관리Airflow를 이용한  데이터 Workflow 관리
Airflow를 이용한 데이터 Workflow 관리
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine Learning
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
file sharing semantics by Umar Danjuma Maiwada
file sharing semantics by Umar Danjuma Maiwada file sharing semantics by Umar Danjuma Maiwada
file sharing semantics by Umar Danjuma Maiwada
 

Ähnlich wie Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE

Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 

Ähnlich wie Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE (20)

Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of Seasons
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
 
Scaling HDFS at Xiaomi
Scaling HDFS at XiaomiScaling HDFS at Xiaomi
Scaling HDFS at Xiaomi
 
Scaling HDFS at Xiaomi
Scaling HDFS at XiaomiScaling HDFS at Xiaomi
Scaling HDFS at Xiaomi
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE

  • 1. Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE Yuto Kawamura - LINE Corporation
  • 2. Speaker introduction Yuto Kawamura Senior Software Engineer at LINE Leading a team for providing a company-wide Kafka platform Apache Kafka Contributor Speaker at Kafka Summit SF 2017 1 1 https://kafka-summit.org/sessions/single-data-hub- services-feed-100-billion-messages-per-day/
  • 3. LINE Messaging service 164 million active users in countries with top market share like Japan, Taiwan, Thailand and Indonesia.2 And many other services: - News - Bitbox/Bitmax - Cryptocurrency trading - LINE Pay - Digital payment 2 As of June 2018.
  • 4. Kafka platform at LINE Two main usages: — "Data Hub" for distributing data to other services — e.g: Users relationship update event from messaging service — As a task queue for buffering and processing business logic asynchronously
  • 5. Kafka platform at LINE Single cluster is shared by many independent services for: - Concept of Data Hub - Efficiency of management/operation Messaging, AD, News, Blockchain and etc... all of their data stored and distributed on single Kafka cluster.
  • 6. From department-wide to company-wide platform It was just for messaging service. Now everyone uses it.
  • 7. Broker installation CPU: Intel(R) Xeon(R) 2.20GHz x 20 cores (HT) * 2 Memory: 256GiB - more memory, more caching (page cache) - newly written data can survive only 20 minutes ... Network: 10Gbps Disk: HDD x 12 RAID 1+0 - saves maintenance costs Kafka version: 0.10.2.1 ~ 0.11.1.2
  • 8. Requirements doing multitenancy Cluster can protect itself against abusing workloads - Accidental workload doesn't propagates to other users. We can track on which client is sending requests - Find source of strange requests. Certain level of isolation among client workloads - Slow response for one client doesn't appears to another client.
  • 9. Protect cluster against abusing workload - Request Quota It is more important to manage number of requests over incoming/outgoing byte rate. Kafka is amazingly durable for large data if they are well-batched. => Producers which configures linger.ms=0 with large number of servers probably leads large amount of requests Starting from 0.11.0.0, by KIP-124 we can configure request rate quota 3 3 https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
  • 10. Protect cluster against abusing workload - Request Quota Basic idea is to apply default quota for preventing single abusing client destabilize the cluster as a least protection. *Not for controlling resource quantity for each client.
  • 11. Track on requests from clients - Metrics — kafka.server:type=Request,user=([-.w]+),client- id=([-.w]+):request-time — Percentage of time spent in broker network and I/O threads to process requests from each client group. — Useful to see how much of broker resource is being consumed by each client.
  • 12. Track on requests from clients - Slowlog Log requests which took longer than certain threshold to process. - Kafka has "request logging" but it leads too many of lines - Inspired by HBase's Thresholds can be changed dynamically through JMX console for each request type.
  • 13. Isolation among client workloads Let's give a look through the actual troubleshooting.
  • 14. Detection Detection: 50x ~ 100x slower response time in 99th %ile Produce response time. Normal: ~20ms Observed: 50ms ~ 200ms
  • 15. Finding #1: Disk read coincidence Coincidence disk read of certain amount.
  • 16. Finding #2: Network threads got busier Network threads' utilization was very high. Metrics: kafka.network:type=SocketServer,n ame=NetworkProcessorAvgIdlePercen t
  • 17. Request handling in Kafka broker Two thread layers: Network Threads: Reads/Writes request/response from/to client sockets. Request Handler Threads: Processes requests and produces response object.
  • 18. Request handling - Read Request
  • 20. Request handling - Write Response
  • 21. Network thread runs event loop — Multiplex and processes assigned client sockets sequentially. — It never blocks awaiting IO completion. => So it makes sense to set num.network.threads <= CPU_CORES
  • 22. When Network threads gets busy... It means either one of: 1. Really busy doing lots of work. Many requests/ responses to read/write 2. Blocked by some operations (which should not happen in event loop in general)
  • 23. Response handling of normal requests When response is in queue, all data to be transferred are in memory.
  • 24. Exceptional handling for Fetch response When response is in queue, topic data segments are not in userspace memory. => Copy to client socket directly inside the kernel using sendfile(2) system call.
  • 25. What if target data doesn't exists in page cache? Target data in page cache: => Just a memory copy. Very fast: ~ 100us Target data is NOT in page cache: => Needs to load data from disk into page cache first. Can be slow: ~ 50ms (or even slower)
  • 26. Suspecting blocking in sendfile(2) Inspected duration of sendfile system calls issued by broker process using SystemTap (dynamic tracing tool to probe events in kernel. see my previous talk 4 ) $ stap -e ‘(script counting sendfile(2) duration histogram)’ # value (us) value |---------------------------------------- count 0 | 0 1 | 71 2 |@@@ 6171 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472 32 |@@@ 3418 2048 | 0 ... 8192 | 3 4 https://www.confluent.io/kafka-summit-sf17/One-Day-One-Data-Hub-100-Billion-Messages- Kafka-at-LINE
  • 27. Problem hypothesis Fetch request reading old data causes blocking sendfile(2) in event loop and applying latency for responses needs to be processed in the same network thread.
  • 28. Problem hypothesis Super harmful because of: It can be triggered either by: - Consumers attempting to fetch old data - Replica fetch by follower brokers for restoring replica of old logs => Both are very common scenario Breaks performance isolation among independent clients.
  • 29. Solution candidates A: Separate network threads among clients => Possible, but a lot of changes required => Not essential because network threads should be completely computation intensive B: Balance connections among network threads => Possible, but again a lot of changes => Still for first moment other connections will get affected
  • 30. Solution candidates C: Make sure that data are ready on memory before the response passed to the network thread => Event loop never blocks
  • 31. Choice: Warmup page cache before the network thread Move blocking part to request handler threads (= single queue and pool of threads) => Free thread can take arbitrary task (request) while some threads are blocked.
  • 32. Choice: Warmup page cache before the network thread When Network thread calls sendfile(2) for transferring log data, it's always in page cache.
  • 33. Warming up page cache with minimal overhead Easiest way: Do synchronous read(2) on target data => Large overhead by copying memory from kernel to userland Why is Kafka using sendfile(2) for transferring topic data? => To avoid expensive large memory copy How can we achieve it keeping this property?
  • 34. Trick #1 Zero copy synchronous page load Call sendfile(2) for target data with dest /dev/null. The /dev/null driver does not actually copy data to anywhere.
  • 35. Why it has almost no overhead? Linux kernel internally uses splice to implement sendfile(2). splice implementation of /dev/null returns w/o iterating target data. # ./drivers/char/mem.c static const struct file_operations null_fops = { ... .splice_write = splice_write_null, }; static int pipe_to_null(...) { return sd->len; } static ssize_t splice_write_null(...) { return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null); }
  • 36. Implementing page load // FileRecords.java private static final java.nio.file.Path DEVNULL_PATH = new File("/dev/null").toPath(); public void prepareForRead() throws IOException { long size = Math.min(channel.size(), end) - start; try (FileChannel devnullChannel = FileChannel.open( DEVNULL_PATH, StandardOpenOption.WRITE)) { // Calls sendfile(2) internally channel.transferTo(start, size, devnullChannel); } }
  • 37. Trick #2 Skip the "hot" last log segment Another concern: additional syscalls * Fetch req count? - Warming up is necessary only for older data. - Exclude the last log segment from the warmup target.
  • 38. Trick #2 Skip the "hot" last log segment # Log.scala#read @@ -585,6 +586,17 @@ class Log(@volatile var dir: File, if(fetchInfo == null) { entry = segments.higherEntry(entry.getKey) } else { + // For last entries we assume that it is hot enough to still have all data in page cache. + // Most of fetch requests are fetching from the tail of the log, so this optimization + // should save call of sendfile significantly. + if (!isLastEntry && fetchInfo.records.isInstanceOf[FileRecords]) { + try { + info("Prepare Read for " + fetchInfo.records.asInstanceOf[FileRecords].file().getPath) + fetchInfo.records.asInstanceOf[FileRecords].prepareForRead() + } catch { + case e: Throwable => warn("failed to prepare cache for read", e) + } + } return fetchInfo }
  • 39. It works No response time degradation in irrelevant requests while there are coincidence of Fetch request triggers disk read.
  • 40. Patch upstream? Concern: The patch heavily assumes underlying kernel implementation. Still: - Effect is tremendous. - Fixes very common performance degradation scenario. Discuss at KAFKA-7504
  • 41. Conclusion — Talked requirements for multi tenancy clusters and solutions — Quota, Metrics, Slowlog ... and hacky patch. — After fixing some issues our hosting policy is working well and efficient, keeping: — concept of single "Data Hub" and — operational cost not proportional to the number of users/usages. — Kafka is well designed and implemented to contain many, independent and different workloads.