Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryson, Red Pill Analytics) Kafka Summit NYC 2019


KSQL is an easy-to-use and easy-to-understand streaming SQL engine for Apache Kafka built on top of Kafka Streams. The ability to write streaming applications using only SQL makes Apache Kafka available to a whole range of new developers and potential use cases, either as a stand-alone solution or as a single component of a broader Kafka Streams implementation. Inspired by a customer project now in production, this talk walks through the lifecycle of a streaming application developed using KSQL and Kafka Streams. With Apache Gradle as our build framework, we’ll explore the open-source Gradle plugin we built during this project to improve developer efficiency and automate the deployment of KSQL pipelines, user-defined functions, and Kafka Streams microservices.

We’ll demonstrate the deployment process live, and discuss design decisions around incorporating SQL-based processes into an overall streaming application.

Key Takeaways
1. KSQL is a natural choice for expressing data-driven applications, but it may not naturally fit into established DevOps processes and automations.
2. We built an open-source Gradle plugin to handle all aspects of deploying a Kafka-based streaming application: KSQL pipelines, KSQL user-defined functions, and Kafka Streams microservices.
3. KSQL pipelines can be deployed using either a server start script or the KSQL REST API, and our Gradle plugin fully supports both options.
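
To make the REST API option in the third takeaway concrete, here is a minimal Python sketch of how a single statement could be packaged for the KSQL server's `/ksql` endpoint. It is not how the gradle-confluent plugin is implemented. The server address, the helper names `build_ksql_request` and `post_statement`, and the example statement are assumptions made for illustration.

```python
import json
import urllib.request

# Content type used by the KSQL REST API.
KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json"
# Assumption: a KSQL server listening locally on its default port.
KSQL_URL = "http://localhost:8088/ksql"


def build_ksql_request(statement, streams_properties=None):
    """Return (headers, body) for a POST to the /ksql endpoint.

    Hypothetical helper: it only builds the payload and sends nothing.
    """
    text = statement.strip()
    if not text.endswith(";"):
        text += ";"  # statements must be terminated with a semicolon
    payload = {"ksql": text, "streamsProperties": streams_properties or {}}
    headers = {"Content-Type": KSQL_CONTENT_TYPE, "Accept": KSQL_CONTENT_TYPE}
    return headers, json.dumps(payload).encode("utf-8")


def post_statement(statement, url=KSQL_URL):
    """Send one statement to a running KSQL server and return the parsed reply.

    Hypothetical helper: it requires a live server at `url`.
    """
    headers, body = build_ksql_request(statement)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_ksql_request("LIST STREAMS")` only constructs the headers and JSON body, so the payload can be inspected without a server. `post_statement` would be the step a deployment tool repeats for each statement in a pipeline script.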

Published in: Technology


  1. Use Apache Gradle to Build and Automate KSQL and Apache Kafka Streams
  2. twitter: @stewartbryson; medium: @stewartbryson; linkedin: stewartbryson. Owner & CEO, Red Pill Analytics
  3. @redpilla What We Do: Data Warehouse Analytics ANALYTICS Data-engineering & ETL
  4. Agenda: Project Overview; KSQL; KSQL User Defined Functions (demo-only); Kafka Streams (demo-only)
  5. (no slide text)
  6. Project Inspiration
  7. Data Integration: Streaming In
  8. Data Integration: Streaming Out
  9. Packaging of Payloads for OWMC
  10. KSQL Primer
  11. KSQL Registration Queries
      CREATE STREAM clickstream
      ( _time bigint,
        time varchar,
        ip varchar,
        request varchar,
        status int,
        userid int,
        bytes bigint,
        agent varchar )
      with ( kafka_topic = 'clickstream',
             value_format = 'json' );
  12. KSQL Persistent Queries
      CREATE TABLE events_per_min AS
      SELECT userid,
             count(*) AS events
      FROM clickstream window TUMBLING (size 60 second)
      GROUP BY userid;
  13. A pipeline is a group of SQL statements that work together to define an end-to-end process.
  14. KSQL Dependencies
  15. KSQL Pipelines (diagram). Labels: SQL Scripts; Streams and Tables. Objects: clickstream, clickstream_codes, enriched_error_codes, enriched_error_codes_count, customer_clickstream, user_clickstream, web_users, click_user_sessions
  16. KSQL Pipelines (same diagram as slide 15): Dependencies within a pipeline
  17. KSQL Pipelines (same diagram as slide 15): Dependencies across pipelines
  18. KSQL Pipelines (same diagram as slide 15): Manage all KSQL dependencies in one place
  19. Multiple statements per pipeline; Statements are in script files; Script files are in Git; Developers need to iterate
  20. What is Apache Gradle? Gradle is an open-source build automation tool for the Age of Continuous Delivery (CD). Building software is no longer just about compiling, linking and packaging.
  21. What is Apache Gradle?
  22. Gradle Plugins: gradle-confluent for KSQL pipelines; shadow for KSQL user defined functions; application for Kafka Streams
  23. Plugin
  24. Plugin: github.com/RedPillAnalytics/gradle-confluent
  25. Unfortunately… I only have 40 minutes
  26. Unfortunately… I only have 33 minutes
  27. Demo
  28. Demo ...using a Jupyter Notebook
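
The slides on KSQL dependencies (14 to 18) stress that streams and tables depend on one another within and across pipelines, and that these dependencies should be managed in one place. The Python sketch below illustrates one way to do that: a topological sort that yields a safe create order and, reversed, a safe drop order. The object names come from the slides. The specific edges in `DEPENDS_ON` are assumptions for illustration only, and this is not the gradle-confluent plugin's actual mechanism.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each stream or table -> the objects it reads from.
# The names appear on the slides; the edges themselves are assumed.
DEPENDS_ON = {
    "clickstream": set(),
    "clickstream_codes": set(),
    "web_users": set(),
    "enriched_error_codes": {"clickstream", "clickstream_codes"},
    "enriched_error_codes_count": {"enriched_error_codes"},
    "user_clickstream": {"clickstream", "web_users"},
    "customer_clickstream": {"user_clickstream"},
    "click_user_sessions": {"user_clickstream"},
}


def create_order(depends_on):
    """Return objects ordered so that every source is created before its readers."""
    return list(TopologicalSorter(depends_on).static_order())


def drop_order(depends_on):
    """Return the reverse order, so readers are dropped before their sources."""
    return list(reversed(create_order(depends_on)))


if __name__ == "__main__":
    print("create:", create_order(DEPENDS_ON))
    print("drop:  ", drop_order(DEPENDS_ON))
```

Keeping the map in a single structure mirrors the "manage all KSQL dependencies in one place" point. When developers iterate on a script, the same ordering can drive both teardown and redeployment.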
