1
KSQL Performance Tuning
For Fun and Profit
Nick Dearden
2
KSQL Performance Tuning Planning
For Fun and Profit
Nick Dearden
7
Anatomy of a KSQL Query
Tuning goals
Performance Factors
What to Monitor
Rules of Thumb
8
Apps, CPUs, Topologies and Threads, Oh My!
● Every KSQL continuous query results in a Kafka Streams Application
● An Application has a Topology…
● …which may have sub-topologies…
● …which are executed on StreamThreads
9
Read -> Process -> Write
10
Topologies, Tasks, & Partitions
• Topologies are divided into sub-topologies at read-write boundaries
- Read-process-write loop
• Within a sub-topology, the number of tasks created equals the max input partition count
- If multiple input topics, they are being co-processed, e.g. joins
- Internal topics, such as *-rekey ones, are counted too
• Each task is assigned to at most one StreamThread
- A StreamThread results in at least 3 JVM threads being created
- A StreamThread has its own Consumer and Producer instance
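The task arithmetic above can be sketched in a few lines of Python. This is an illustration only, not Kafka Streams code: the topic names, partition counts and thread count are hypothetical, and the sketch simply applies the rule stated on the slide (one task per partition of the widest input topic in each sub-topology).

```python
# Illustrative only: estimates task counts from partition counts.
# Topic names, partition counts and thread count are hypothetical.

def tasks_for_subtopology(input_partitions: dict) -> int:
    """A sub-topology gets as many tasks as its widest input topic has partitions."""
    return max(input_partitions.values())

def total_tasks(subtopologies: list) -> int:
    """Total tasks for a query: the sum over all of its sub-topologies."""
    return sum(tasks_for_subtopology(s) for s in subtopologies)

# A join that first re-keys a stream, giving two sub-topologies.
subtopologies = [
    {"clickstream": 8},                    # read the source, write to the rekey topic
    {"clickstream-rekey": 8, "users": 8},  # co-processed inputs of the join
]

task_count = total_tasks(subtopologies)

# Each task runs on at most one StreamThread, so threads beyond the task
# count would have nothing to do.
num_threads = 20
busy_threads = min(num_threads, task_count)

print(task_count, busy_threads)
```

With these made-up numbers the query has 16 tasks, so only 16 of the 20 threads could ever be busy.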
11
Topologies, Tasks, & Partitions
Divide a topology into read-process-write sub-topologies
Thanks to Andy Bryant for the diagram!
12
Can I just explain it?
● ksql> show queries;
● ksql> explain CSAS_R2_0;
13
ksql> create stream r2 as select stars, user_id, channel from ratings;
ksql> show queries;
Query ID | Kafka Topic | Query String
--------------------------------------------------------------------------------------------------------
CSAS_R2_0 | R2 | CREATE STREAM R2 WITH (KAFKA_TOPIC='R2', PARTITIONS=1, REPLICAS=1) AS SELECT
RATINGS.STARS "STARS",
RATINGS.USER_ID "USER_ID",
RATINGS.CHANNEL "CHANNEL"
FROM RATINGS;
--------------------------------------------------------------------------------------------------------
For detailed information on a Query run: EXPLAIN <Query ID>;
14
ksql> explain CSAS_R2_0;
Execution plan
--------------
> [ SINK ] | Schema: [ROWKEY STRING KEY, STARS INTEGER, USER_ID INTEGER, CHANNEL STRING]
> [ PROJECT ] | Schema: [ROWKEY STRING KEY, STARS INTEGER, USER_ID INTEGER, CHANNEL STRING]
> [ SOURCE ] | Schema: [RATINGS.ROWKEY STRING KEY, RATINGS.ROWTIME BIGINT, RATINGS.ROWKEY STRING, RATINGS.RATING_ID BIGINT, RATINGS.USER_ID INTEGER, RATINGS.STARS INTEGER, RATINGS.ROUTE_ID INTEGER, RATINGS.RATING_TIME BIGINT, RATINGS.CHANNEL STRING, RATINGS.MESSAGE STRING]
15
Multiple Queries -> Multiple Topologies
16
Performance Goals
● Latency?
● Throughput?
● Elasticity?
18
Breaking Rules?
● Consumer / producer configs
● Message format (Avro uses less CPU than JSON)
● Compression
19
Network
1 Gb/s ~= 100-110 MB/s
Message Size (max rate on a 1 Gb/s link)
100-byte messages : 1,000,000 / sec
1 KB messages : 100,000 / sec
10 KB messages : 10,000 / sec
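The figures on this slide follow from simple division. A minimal sketch, assuming about 100 MB/s usable on a 1 Gb/s link (the lower bound quoted above) and decimal units (1 KB = 1,000 bytes), and ignoring protocol overhead:

```python
# Back-of-the-envelope ceiling on message rate for a given link speed.
# Assumes ~100 MB/s usable on a 1 Gb/s link and decimal units
# (1 KB = 1,000 bytes); protocol overhead is ignored.

USABLE_BYTES_PER_SEC = 100_000_000

def max_messages_per_sec(message_bytes: int,
                         link_bytes_per_sec: int = USABLE_BYTES_PER_SEC) -> int:
    """Upper bound on messages per second the link can carry."""
    return link_bytes_per_sec // message_bytes

for size in (100, 1_000, 10_000):
    print(f"{size:>6} bytes: {max_messages_per_sec(size):>9,} msg/s")
```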
21
Stream/Table Duality
22
State Stores (RocksDB)
Tables - consider key-space cardinality and message-size
Joins - join type, join windows
Aggregates - window sizes, group cardinality
23
Fault-Tolerance, powered by Kafka
A key challenge of distributed stream processing is fault-tolerant state.
State is automatically migrated in case of server failure.
Server A (KSQL / Kafka Streams App): “I do stateful stream processing, like tables, joins, aggregations.”
Changelog Topic (Kafka): “streaming backup” of A’s local state; “streaming restore” of A’s local state to B
Server B: “I restore the state and continue processing where server A stopped.”
24
Some Measurements
● KSQL Servers – i3.xlarge
○ 4 vCPUs
○ 30.5 GB memory
○ “up to 10Gbit network” (experimentally measured at ~1.2 Gb/s full-duplex baseline)
○ 200 GB EBS SSD
● JVM Settings
○ Heap size 16 GB (~50% of RAM, to leave space for state-stores)
25
Test Highlights
• Simple project query (“speed-of-light”)
• CREATE STREAM foo AS SELECT * FROM bar;
# Queries | msg/s | MB/s | msg size (bytes) | CPU % | Max Mem (MB)
2 | 193k | 59.14 | 320 | 99.19 | 18,949
10 | 189k | 57.67 | 320 | 99.74 | 20,101
20 | 175k | 53.43 | 320 | 99.68 | 23,377
50 | 168k | 51.37 | 320 | 96.61 | 28,291
• 4 cores can’t saturate a 1Gb network link in this test (but larger messages get close)
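To see how far these runs are from saturating the link, the measured MB/s can be compared with the ~100-110 MB/s a 1 Gb/s link carries (the figure from the Network slide). A small sketch, treating the MB/s column as one-directional traffic, which is an assumption since the slide does not say how it was measured:

```python
# Compares the measured MB/s from the table with the ~100-110 MB/s
# capacity of a 1 Gb/s link. Treating MB/s as one-directional traffic
# is an assumption.

LINK_MB_PER_SEC_LOW = 100.0
LINK_MB_PER_SEC_HIGH = 110.0

# (number of queries, measured MB/s) copied from the table
measurements = [(2, 59.14), (10, 57.67), (20, 53.43), (50, 51.37)]

def link_utilization(mb_per_sec: float) -> tuple:
    """Return (low, high) utilization as fractions of the link capacity range."""
    return mb_per_sec / LINK_MB_PER_SEC_HIGH, mb_per_sec / LINK_MB_PER_SEC_LOW

for queries, mb_per_sec in measurements:
    low, high = link_utilization(mb_per_sec)
    print(f"{queries:>2} queries: {low:.0%}-{high:.0%} of a 1 Gb/s link")
```

Under that assumption every run stays below roughly 60% of the link, which fits the slide's observation that the 4 cores, not the network, are the limit here.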
26
Test Highlights
• Simple project query (“speed-of-light”)
• CREATE STREAM foo AS SELECT * FROM bar;
# Queries | # Servers | msg/s | msg/s/host | MB/s | CPU %
2 | 1 | 193k | 193k | 59 | 99
2 | 3 | 585k | 195k | 179 | 96
2 | 10 | 1,855k | 185k | 567 | 96
Message throughput scales with server count (same query, same data, msg-size=300bytes)
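One way to quantify "scales with server count" is per-host throughput relative to the single-server run. The sketch below uses only the totals from the table; the efficiency measure itself is an illustrative choice, not something defined in the talk.

```python
# Scaling efficiency for the server-count test, using the table's totals.
# Efficiency = per-host throughput relative to the single-server baseline.

# (servers, total msg/s) copied from the table
runs = [(1, 193_000), (3, 585_000), (10, 1_855_000)]

def per_host(total_msgs_per_sec: int, servers: int) -> float:
    """Average throughput contributed by each server."""
    return total_msgs_per_sec / servers

baseline = per_host(runs[0][1], runs[0][0])

for servers, total in runs:
    rate = per_host(total, servers)
    print(f"{servers:>2} servers: {rate / 1000:.1f}k msg/s/host, "
          f"{rate / baseline:.0%} of baseline")
```

Per-host throughput stays within a few percent of the single-server figure at 3 and 10 servers, which is what near-linear scaling looks like.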
27
CREATE STREAM vip_actions AS
SELECT userid, page, action, zipcode
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
28
Test Highlights
• Stream-Table join
Stream-table join runs at ~50% throughput of project query
# Queries | msg/s | MB/s | msg size (bytes) | CPU % | Max Mem (MB)
2 | 88k | 26 | 314 | 99.8 | 18,022
10 | 80k | 24 | 314 | 99.8 | 19,931
29
Further Results
• A non-windowed aggregate on the same data ran at ~47k msgs/sec
• A windowed aggregate ran at ~24k msgs/sec (varies with window params)
• Re-partitioning can cut these results further
30
Miscellaneous Factors
• UDFs / UDAFs
• Scaling horizontally vs vertically
• Planning for elasticity
31
Take-Aways (1)
• Establish c, the baseline throughput of a simple select (project) query on your hardware
• Project and filter queries are cheap and fast
• Joins are slower, aggregates more so
• If select throughput (c) is 100%, then
• Joins run at about 50% of c
• Aggregates run at about 25%
• Windowed aggregates run ~10-15%
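These rules of thumb can be turned into a rough planning estimate. A minimal sketch, assuming c is the 193k msg/s measured for the simple project query earlier in the talk; the function and table names are made up for illustration, and real numbers will vary with hardware and data.

```python
# Rule-of-thumb throughput estimates relative to the baseline c.
# The fractions come from the take-aways above; c is the 193k msg/s
# measured for the simple project query. Names here are illustrative.

RULES = {
    "project/filter": (1.00, 1.00),
    "join": (0.50, 0.50),
    "aggregate": (0.25, 0.25),
    "windowed aggregate": (0.10, 0.15),
}

def estimate(c: float, query_type: str) -> tuple:
    """Return the (low, high) expected msg/s for a query type given baseline c."""
    low, high = RULES[query_type]
    return c * low, c * high

c = 193_000
for query_type in RULES:
    low, high = estimate(c, query_type)
    print(f"{query_type:<20} {low:>9,.0f} - {high:>9,.0f} msg/s")
```

The measured results line up with these estimates: the non-windowed aggregate at ~47k msgs/sec is about 24% of 193k, and the windowed aggregate at ~24k msgs/sec is about 12%.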
32
Take-Aways (2)
• (de)serialization is the most expensive part of any query
• Use Avro message format
• Start with 4 CPU cores for “serious” message volumes
• Use SSD for any state stores (speed > size)
KSQL Performance Tuning for Fun and Profit (Nick Dearden, Confluent) Kafka Summit SF 2019
