Over the last five years, Kafka and Flink have matured into technologies that let us embrace the streaming paradigm. You can bet on them to build reliable and efficient applications: both are active projects, backed by companies running them in production, with strong communities contributing code and sharing experience and knowledge. Kafka and Flink are solid choices if you want to build a data platform that your data scientists or developers can use to collect, process, and distribute data. You can put together Kafka Connect, Kafka, Schema Registry, and Flink. First, you will take care of their deployment. Then, for each use case, you will set up each part and, of course, develop the Flink job so it integrates easily with the rest. Sounds like a challenging but exciting project, doesn't it? In this session, you will learn how to build such a data platform, the nitty-gritty details of each part, how to plug them together (in particular, how to plug Flink into the Kafka ecosystem), the common pitfalls to avoid, and what it takes to deploy it all on Kubernetes. Even if you are not familiar with all the technologies, there will be enough introduction for you to follow. Come and learn how we can actually cross the streams!
34. ⚠Warning
➔ JVM Heap & RocksDB Memory & Container Memory → explicit allocation
➔ State Backend → e.g. HDFS
➔ HA Setup → e.g. HDFS
➔ Rootless Container with random UID → Build your own Docker Image
Flink
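The warnings above can be captured in Flink's configuration file. Below is a minimal flink-conf.yaml sketch, assuming an HDFS namenode at hdfs://namenode:8020 and a ZooKeeper quorum at zookeeper:2181 (hostnames and paths are illustrative, not from the slides):

```yaml
# Explicit memory allocation: size the whole TaskManager process so the
# JVM heap, RocksDB memory, and the container memory limit agree.
taskmanager.memory.process.size: 4g
taskmanager.memory.managed.fraction: 0.4   # share of memory handed to RocksDB

# State backend: RocksDB, with checkpoints stored on HDFS.
state.backend: rocksdb
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints

# HA setup: JobManager metadata persisted on HDFS, leader election via ZooKeeper.
high-availability: zookeeper
high-availability.storageDir: hdfs://namenode:8020/flink/ha
high-availability.zookeeper.quorum: zookeeper:2181
```

For the last point (rootless containers with a random UID, as on OpenShift), there is no configuration key: building your own Docker image with suitable file permissions is the way out.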
40. ghosts
  id: INT  name: TEXT
  2        Slimer

movies
  id: INT  name: TEXT       year: INT
  1        Ghostbusters     1984
  2        Ghostbusters II  1989

ghosts_in_movies
  ghost_id: INT  movie_id: INT  id: INT
  2              1              2
  2              2              3

Seed the data into PostgreSQL (seed.sql)
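A possible shape for seed.sql, matching the tables shown above (a sketch only; the actual file from the talk may differ, and only the rows visible on the slide are seeded):

```sql
CREATE TABLE ghosts (
  id   INT PRIMARY KEY,
  name TEXT
);

CREATE TABLE movies (
  id   INT PRIMARY KEY,
  name TEXT,
  year INT
);

CREATE TABLE ghosts_in_movies (
  id       INT PRIMARY KEY,
  ghost_id INT REFERENCES ghosts (id),
  movie_id INT REFERENCES movies (id)
);

INSERT INTO ghosts VALUES (2, 'Slimer');
INSERT INTO movies VALUES
  (1, 'Ghostbusters', 1984),
  (2, 'Ghostbusters II', 1989);
INSERT INTO ghosts_in_movies (ghost_id, movie_id, id) VALUES
  (2, 1, 2),
  (2, 2, 3);
```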
50. ghosts
  id: INT  name: TEXT
  2        Slimer

movies
  id: INT  name: TEXT       year: INT
  1        Ghostbusters     1984
  2        Ghostbusters II  1989

ghosts_in_movies
  ghost_id: INT  movie_id: INT  id: INT
  2              1              2
  2              2              3

Query Result Example
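A query result over these tables would come from joining them. A hedged example of the kind of query that could produce such a result (not taken from the talk):

```sql
SELECT g.name AS ghost, m.name AS movie, m.year
FROM ghosts_in_movies gm
JOIN ghosts g ON g.id = gm.ghost_id
JOIN movies m ON m.id = gm.movie_id
ORDER BY m.year;
-- Slimer | Ghostbusters    | 1984
-- Slimer | Ghostbusters II | 1989
```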