KSQL is a streaming SQL engine for Apache Kafka that allows users to perform stream processing by writing SQL-like queries. It enables non-engineers to process streaming data by using a familiar SQL syntax. KSQL queries can perform operations like filtering, aggregations, joins, and windowing on streaming data stored in Kafka topics. It leverages Kafka Streams to provide distributed, scalable, and fault-tolerant stream processing. KSQL can be run in client-server mode, embedded in applications, or deployed as standalone streaming ETL jobs.
8. 8
Windowed Aggregations
CREATE TABLE clicks AS
SELECT user_id, COUNT(url)
FROM clickstream
WINDOW TUMBLING (SIZE 30 SECONDS)
WHERE bytes > 1024
GROUP BY user_id
HAVING COUNT(url) > 20;
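To make the semantics of the windowed aggregation concrete, here is a minimal Python sketch of what a 30-second tumbling-window count with a row filter and a group filter computes. It is an illustration only, not how KSQL or Kafka Streams is implemented. The function name, the event field names (`ts_ms` in particular), and the sample data are assumptions made for this example, and it assumes `url` is never null, so counting rows equals `COUNT(url)`.

```python
from collections import Counter

WINDOW_MS = 30_000  # 30-second tumbling windows, as in the query


def windowed_click_counts(events, min_bytes=1024, min_count=20):
    """Count clicks per (user_id, window_start), mimicking WHERE / GROUP BY / HAVING."""
    counts = Counter()
    for e in events:
        # WHERE bytes > 1024: row-level filter, applied before aggregation
        if e["bytes"] <= min_bytes:
            continue
        # Tumbling windows do not overlap: each timestamp falls in exactly one window
        window_start = e["ts_ms"] - (e["ts_ms"] % WINDOW_MS)
        counts[(e["user_id"], window_start)] += 1
    # HAVING COUNT(url) > 20: filter applied to the aggregated groups
    return {key: n for key, n in counts.items() if n > min_count}


# Hypothetical sample data: 25 large requests from "alice" in the first window,
# 5 in the second, and 30 small requests from "bob" that the WHERE filter drops.
events = (
    [{"user_id": "alice", "url": "/a", "bytes": 2048, "ts_ms": i * 1000} for i in range(25)]
    + [{"user_id": "alice", "url": "/a", "bytes": 2048, "ts_ms": 30_000 + i * 1000} for i in range(5)]
    + [{"user_id": "bob", "url": "/b", "bytes": 100, "ts_ms": i * 500} for i in range(30)]
)

result = windowed_click_counts(events)
print(result)
```

Only the first window for "alice" passes both filters. The key point is the order of evaluation: the `WHERE` condition removes individual rows before they are counted, while `HAVING` removes whole groups after counting.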
15. 15
Joins for Enrichment
CREATE STREAM vip_actions AS
SELECT c.user_id, fullname, url, status
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
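The enrichment join can be pictured as a per-event lookup against a table keyed by `user_id`. The Python sketch below is a simplified illustration under stated assumptions: the table is a static dictionary (a real stream-table join reads the table's state at the time each event arrives), and the function name and sample records are hypothetical.

```python
def enrich_vip_actions(clicks, users_table):
    """Yield click events enriched with user data, keeping Platinum users only."""
    for c in clicks:
        # LEFT JOIN: every click is kept at this stage; the user side may be missing
        u = users_table.get(c["user_id"])
        # WHERE u.level = 'Platinum': a missing user cannot satisfy the condition,
        # so unmatched clicks are dropped here even though the join is a LEFT JOIN
        if u is None or u["level"] != "Platinum":
            continue
        yield {
            "user_id": c["user_id"],
            "fullname": u["fullname"],
            "url": c["url"],
            "status": c["status"],
        }


# Hypothetical sample data
users_table = {
    1: {"fullname": "Ada Lovelace", "level": "Platinum"},
    2: {"fullname": "Alan Turing", "level": "Gold"},
}
clicks = [
    {"user_id": 1, "url": "/home", "status": 200},
    {"user_id": 2, "url": "/cart", "status": 200},
    {"user_id": 3, "url": "/login", "status": 404},  # no matching user
]

vip_actions = list(enrich_vip_actions(clicks, users_table))
print(vip_actions)
```

Because the filter refers to a column from the `users` side, the query behaves like an inner join in practice: clicks without a matching user never reach the output.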
19. 19
How to run KSQL
#1 Client-server
[Diagram: one KSQL CLI, three KSQL Servers (each in its own JVM), and the Kafka Cluster]
20. 20
How to run KSQL
#1 Client-server
• Start any number of server nodes
  bin/ksql-server-start
• Start one or more CLIs and point them to a server
  bin/ksql https://myksqlserver:8090
• All servers share the processing load
  Technically, they are instances of the same Kafka Streams application
  Scale up/down without restart
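The load sharing described above follows from how instances of one Kafka Streams application divide the input partitions among themselves. The Python sketch below illustrates the idea with a plain round-robin split. This is an assumption chosen for simplicity; the actual assignment strategy used by Kafka Streams is more elaborate, and the function and instance names are hypothetical.

```python
def assign_partitions(num_partitions, instances):
    """Spread partition ids over instances round-robin (illustrative only)."""
    assignment = {name: [] for name in instances}
    for p in range(num_partitions):
        assignment[instances[p % len(instances)]].append(p)
    return assignment


# Two servers processing a 6-partition topic
two = assign_partitions(6, ["server-1", "server-2"])
# A third server joins: partitions are redistributed, existing servers keep running
three = assign_partitions(6, ["server-1", "server-2", "server-3"])

print(two)
print(three)
```

Adding a server reduces the number of partitions each one handles, which is why capacity can grow or shrink without restarting the servers that are already running. The number of partitions caps the useful parallelism: with six partitions, a seventh instance would receive no work.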
21. 21
How to run KSQL
#2 As a standalone application
[Diagram: three KSQL Servers (each in its own JVM) and the Kafka Cluster]
22. 22
How to run KSQL
#2 As a standalone application
• Start any number of server nodes
  Pass a file of KSQL statements to execute
  bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
  Version-control your queries and transformations as code
• All running engines share the processing load
  Technically, they are instances of the same Kafka Streams application
  Scale up/down without restart
23. 23
How to run KSQL
#3 Embedded in an application
[Diagram: three JVM app instances, each containing a KSQL Engine alongside the application code, and the Kafka Cluster]
24. 24
How to run KSQL
#3 Embedded in an application
• Embed directly in your Java application
• Generate and execute KSQL queries through the Java API
  Version-control your queries and transformations as code
• All running application instances share the processing load
  Technically, they are instances of the same Kafka Streams application
  Scale up/down without restart