Streaming Design Patterns Using
Alpakka Kafka Connector
Sean Glover, Lightbend
@seg1o
Who am I?
I’m Sean Glover
• Principal Engineer at Lightbend
• Member of the Lightbend Pipelines team
• Organizer of Scala Toronto (scalator)
• Author and contributor to various projects in the Kafka
ecosystem including Kafka, Alpakka Kafka (reactive-kafka),
Strimzi, Kafka Lag Exporter, DC/OS Commons SDK
2
/ seg1o
3
“The Alpakka project is an initiative to implement a library of integration modules to build stream-aware, reactive pipelines for Java and Scala.”
4
Cloud Services Data Stores
JMS
Messaging
5
kafka connector
“This Alpakka Kafka connector lets you connect Apache Kafka to Akka Streams.”
Top Alpakka Modules
6
Downloads in August 2018, by Alpakka module:
Kafka 61177
Cassandra 15946
AWS S3 15075
MQTT 11403
File 10636
Simple Codecs 8285
CSV 7428
AWS SQS 5385
AMQP 4036
7
“Akka Streams is a toolkit that provides low-latency, complex event processing streaming semantics using the Reactive Streams specification, implemented internally with an Akka actor system.”
streams
8
[Diagram: Source → Flow → Sink. User messages flow downstream through stage outlets and inlets; internal back-pressure messages flow upstream.]
Reactive Streams Specification
9
“Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.”
http://www.reactive-streams.org/
Reactive Streams Libraries
10
streams
Spec now part of JDK 9
java.util.concurrent.Flow
migrating to
GraphStage
Akka Actor Concepts
11
Mailbox
1. Constrained actor mailbox
2. One message at a time ("single-threaded illusion")
3. May contain state

Actor
State

// Message handler ("receive block")
def receive = {
  case message: MessageType =>
    // process the message, possibly updating actor state
}
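To ground these concepts, a minimal hedged sketch (a hypothetical Counter actor, not from the deck) of a mailbox-driven actor processing one message at a time over private state:

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical message protocol for this example.
case object Increment
case object Report

class Counter extends Actor {
  // Private state: safe to mutate because the mailbox delivers
  // one message at a time (the "single-threaded illusion").
  private var count = 0

  def receive = {
    case Increment => count += 1
    case Report    => println(s"count = $count")
  }
}

object CounterMain extends App {
  val system = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter], "counter")
  counter ! Increment // enqueued in the actor's mailbox
  counter ! Increment
  counter ! Report    // prints: count = 2
}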
Back Pressure Demo
12
[Diagram: Source → Flow → Sink, consuming from a source Kafka topic and producing to a destination Kafka topic. The Sink signals "I need some messages" and the demand request is sent upstream; the Source loads messages from Kafka for downstream, e.g.:
Key: EN, Value: {"message": "Hi Akka!"}
Key: FR, Value: {"message": "Salut Akka!"}
Key: ES, Value: {"message": "Hola Akka!"}
The demand is satisfied downstream with messages such as:
Key: EN, Value: {"message": "Bye Akka!"}
Key: FR, Value: {"message": "Au revoir Akka!"}
Key: ES, Value: {"message": "Adiós Akka!"}]
Dynamic Push Pull
13
[Diagram: a fast-producer Source pushes to a slow-consumer Flow with a bounded mailbox. The Flow sends a demand request (pull) of at most 5 messages ("I can handle 5 more messages"); the Source pushes a batch of 5 messages downstream. When the Flow's mailbox is full, the Source can't send more messages downstream because it has no more demand to fulfill.]
Why Back Pressure?
• Prevent cascading failure
• Alternative to using a big buffer (i.e. Kafka)
• Back Pressure flow control can use several strategies
• Slow down until there’s demand (classic back pressure, “throttling”)
• Discard elements
• Buffer in memory to some max, then discard elements
• Shutdown
14
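A hedged sketch (not from the deck) of how some of these strategies look in Akka Streams, using throttle and OverflowStrategy:

import akka.stream.OverflowStrategy
import akka.stream.scaladsl.Source
import scala.concurrent.duration._

// Slow down until there's demand: throttle caps the emission rate.
val throttled = Source(1 to 1000).throttle(10, 1.second)

// Buffer in memory to some max, then discard the oldest elements.
val buffered = Source(1 to 1000).buffer(256, OverflowStrategy.dropHead)

// Shut down (fail the stream) when the buffer overflows.
val failing = Source(1 to 1000).buffer(256, OverflowStrategy.fail)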
Why Back Pressure? A case study.
15
https://medium.com/@programmerohit/back-pressure-implementation-aws-sqs-polling-from-a-sharded-akka-cluster-running-on-kubernetes-56ee8c67efb
Akka Streams Factorial Example
import ...

object Main extends App {
  implicit val system = ActorSystem("QuickStart")
  implicit val materializer = ActorMaterializer()

  val source: Source[Int, NotUsed] = Source(1 to 100)
  val factorials = source.scan(BigInt(1))((acc, next) => acc * next)

  val result: Future[IOResult] =
    factorials
      .map(num => ByteString(s"$num\n"))
      .runWith(FileIO.toPath(Paths.get("factorials.txt")))
}
16
https://doc.akka.io/docs/akka/2.5/stream/stream-quickstart.html
Apache Kafka
17
Kafka Documentation:
“Apache Kafka is a distributed streaming system. It's best suited to support fast, high volume, and fault tolerant data streaming platforms.”
18
Akka Streams != Kafka Streams
Akka Streams and Kafka Streams solve different problems
When to use Alpakka Kafka?
1. To build back pressure aware integrations
2. Complex Event Processing
3. A need to model the most complex of graphs
19
Anatomy of an Alpakka Kafka app
Alpakka Kafka Setup
val consumerClientConfig = system.settings.config.getConfig("akka.kafka.consumer")
val consumerSettings =
  ConsumerSettings(consumerClientConfig, new StringDeserializer, new ByteArrayDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("group1")
    .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val producerClientConfig = system.settings.config.getConfig("akka.kafka.producer")
val producerSettings = ProducerSettings(system, new StringSerializer, new ByteArraySerializer)
  .withBootstrapServers("localhost:9092")
21
Alpakka Kafka config & Kafka client config can go here.
Set ad-hoc Kafka client config.
Anatomy of an Alpakka Kafka App
val control = Consumer
  .committableSource(consumerSettings, Subscriptions.topics(topic1))
  .map { msg =>
    ProducerMessage.single(
      new ProducerRecord(topic1, msg.record.key, msg.record.value),
      passThrough = msg.committableOffset)
  }
  .via(Producer.flexiFlow(producerSettings))
  .map(_.passThrough)
  .toMat(Committer.sink(committerSettings))(Keep.both)
  .mapMaterializedValue(DrainingControl.apply)
  .run()

// Add shutdown hook to respond to SIGTERM and gracefully shutdown stream
sys.ShutdownHookThread {
  Await.result(control.shutdown(), 10.seconds)
}
22
A small Consume -> Transform -> Produce Akka Streams app using Alpakka Kafka
23
Anatomy of an Alpakka Kafka App
The Committable Source propagates Kafka offset
information downstream with consumed messages
24
Anatomy of an Alpakka Kafka App
ProducerMessage is used to map the consumed offset to transformed results.

One to One (1:1)
ProducerMessage.single(
  new ProducerRecord(topic1, msg.record.key, msg.record.value),
  passThrough = msg.committableOffset)

One to Many (1:M)
ProducerMessage.multi(
  immutable.Seq(
    new ProducerRecord(topic1, msg.record.key, msg.record.value),
    new ProducerRecord(topic2, msg.record.key, msg.record.value)),
  passThrough = msg.committableOffset)

One to None (1:0)
ProducerMessage.passThrough(msg.committableOffset)
25
Anatomy of an Alpakka Kafka App
Produce messages to the destination topic.
flexiFlow accepts the new ProducerMessage type and will replace the deprecated flow in the future.
26
Anatomy of an Alpakka Kafka App
Batches consumed offset commits.
The passThrough allows us to track which messages have been successfully processed, for At Least Once message delivery guarantees.
27
Anatomy of an Alpakka Kafka App
Gracefully shut down the stream:
1. Stop consuming (polling) new messages from the Source
2. Wait for all messages to be successfully committed (when applicable)
3. Wait for all produced messages to ACK
28
Anatomy of an Alpakka Kafka App
Graceful shutdown when SIGTERM is sent to the app (e.g. by the Docker daemon).
Force shutdown after the grace interval.
Consumer Group Rebalancing
Why use Consumer Groups?
1. Easy, robust, and performant scaling of consumers to reduce consumer lag
30
Consumer Group
Latency and Offset Lag
31
[Diagram: producers write to a topic at 10 MB/s; three consumers in the group read ~3 MB/s each (~9 MB/s total), so total offset lag and latency keep growing.]
Consumer Group
Latency and Offset Lag
32
[Diagram: a fourth consumer is added and the group rebalances; the consumers can now support ~12 MB/s of throughput against the 10 MB/s produce rate, so offset lag and latency decrease until the consumers are caught up.]
Committable Sink
val committerSettings = CommitterSettings(system)
val control: DrainingControl[Done] =
Consumer
.committableSource(consumerSettings, Subscriptions.topics(topic))
.mapAsync(1) { msg =>
business(msg.record.key, msg.record.value)
.map(_ => msg.committableOffset)
}
.toMat(Committer.sink(committerSettings))(Keep.both)
.mapMaterializedValue(DrainingControl.apply)
.run()
33
Anatomy of a Consumer Group
34
[Diagram: Clients A, B, and C in a consumer group, assigned partitions 0,1,2 / 3,4,5 / 6,7,8 across topics T1, T2, T3. The Consumer Group Coordinator tracks the group, and one client acts as Consumer Group Leader. Committed offsets are stored in the consumer group offsets topic, e.g. P0: 100489, P1: 128048, P2: 184082, P3: 596837, P4: 110847, P5: 99472, P6: 148270, P7: 3582785, P8: 182483.]
Important Consumer Group Client Config

Topic Subscription:
Subscriptions.topics("Topic1", "Topic2", "Topic3")

Kafka Consumer Properties:
group.id: ["my-group"]
session.timeout.ms: [30000 ms]
partition.assignment.strategy: [RangeAssignor]
heartbeat.interval.ms: [3000 ms]
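A hedged sketch of the same configuration through Alpakka Kafka's ConsumerSettings (assumes an ActorSystem named system in scope; the deserializers are placeholders):

import akka.kafka.ConsumerSettings
import org.apache.kafka.clients.consumer.{ConsumerConfig, RangeAssignor}
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}

// Kafka client properties from the slide, passed through verbatim.
val groupedConsumerSettings =
  ConsumerSettings(system, new StringDeserializer, new ByteArrayDeserializer)
    .withGroupId("my-group")
    .withProperty(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
    .withProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, classOf[RangeAssignor].getName)
    .withProperty(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000")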
Consumer Group Rebalance (1/7)
35
[Diagram: steady state. Clients A, B, and C hold partitions 0,1,2 / 3,4,5 / 6,7,8; the Consumer Group Coordinator and Consumer Group Leader are in place.]
Consumer Group Rebalance (2/7)
36
[Diagram: a new Client D appears alongside Clients A, B, and C.]
New Client D with the same group.id sends a request to join the group to the Coordinator.
Consumer Group Rebalance (3/7)
37
[Diagram: Client D has joined; existing assignments are unchanged.]
The consumer group coordinator requests the group leader to calculate new Client:Partition assignments.
Consumer Group Rebalance (4/7)
38
[Diagram: Clients A, B, C, and D; existing assignments unchanged.]
The consumer group leader sends the new Client:Partition assignment to the group coordinator.
Consumer Group Rebalance (5/7)
39
[Diagram: Assign Partitions: 0,1 / 2,3 / 4,5 / 6,7,8.]
The consumer group coordinator informs all clients of their new Client:Partition assignments.
Consumer Group Rebalance (6/7)
40
Clients that had partitions revoked are given the chance to commit their latest processed offsets.
[Diagram: Partitions to Commit: 2 / Partitions to Commit: 3,5 / Partitions to Commit: 6,7,8.]
Consumer Group Rebalance (7/7)
41
[Diagram: new assignments: Partitions 0,1 / 2,3 / 4,5 / 6,7,8 across Clients A, B, C, and D.]
Rebalance complete. Clients begin consuming partitions from their last committed offsets.
Consumer Group Rebalancing (asynchronous)
42
class RebalanceListener extends Actor with ActorLogging {
  def receive: Receive = {
    case TopicPartitionsAssigned(sub, assigned) =>
    case TopicPartitionsRevoked(sub, revoked) =>
      processRevokedPartitions(revoked)
  }
}

val subscription = Subscriptions.topics("topic1", "topic2")
  .withRebalanceListener(system.actorOf(Props[RebalanceListener]))

val control = Consumer.committableSource(consumerSettings, subscription)
...
Declare an Akka Actor to handle assigned and revoked partition messages asynchronously.
Useful for performing asynchronous actions during a rebalance, but not for blocking operations you want to happen during the rebalance.
Consumer Group Rebalancing (asynchronous)
43
Add the Actor reference to the topic subscription to use it.
Consumer Group Rebalancing (synchronous)
Synchronous partition assignment handler planned for the next release, by Enno Runne
https://github.com/akka/alpakka-kafka/pull/761
● Synchronous operations are difficult to model in an async library
● Will allow users to block the Consumer Actor thread (the Consumer.poll thread)
● Provides limited consumer operations
○ Seek to offset
○ Synchronous commit
44
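For contrast, a hedged sketch of the synchronous callbacks the Kafka client itself already exposes (the plain ConsumerRebalanceListener, not the Alpakka API from #761): these run on the poll thread, so any work here blocks the rebalance until it completes.

import java.util.{Collection => JCollection}
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener
import org.apache.kafka.common.TopicPartition

val listener = new ConsumerRebalanceListener {
  def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit =
    () // e.g. seek each partition to an externally stored offset
  def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit =
    () // e.g. synchronously commit the latest processed offsets
}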
Transactional “Exactly-Once”
Kafka Transactions
46
“Transactions enable atomic writes to multiple Kafka topics and partitions. All of the messages included in the transaction will be successfully written or none of them will be.”
Message Delivery Semantics
• At most once
• At least once
• “Exactly once”
47
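In Alpakka Kafka terms, at-most-once commits offsets before processing (Consumer.atMostOnceSource), while at-least-once processes before committing (the committableSource + Committer.sink pattern shown elsewhere in this deck). A hedged sketch of the former; consumerSettings and business are placeholders:

import akka.kafka.Subscriptions
import akka.kafka.scaladsl.Consumer

// At-most-once: the offset is committed before the message is processed,
// so a crash can drop a message but never reprocesses one.
val atMostOnce =
  Consumer.atMostOnceSource(consumerSettings, Subscriptions.topics("topic1"))
    .mapAsync(1)(record => business(record.key, record.value))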
Exactly Once Delivery vs Exactly Once Processing
48
“Exactly-once message delivery is impossible between two parties where failures of communication are possible.”
Two Generals / Byzantine Generals problem
Why use Transactions?
1. Zero tolerance for duplicate messages
2. Less boilerplate (deduping, client offset management)
49
Anatomy of Kafka Transactions
50
[Diagram: "Consume, Transform, Produce". A client consumes from its subscribed topic, transforms messages, and produces to a destination topic. The Consumer Group Coordinator maintains the consumer offset log and the Transaction Coordinator maintains the transaction log; partitions interleave user messages (UM) with control messages (CM).]

Important Client Config

Topic Subscription:
Subscriptions.topics("Topic1", "Topic2", "Topic3")
Destination topic partitions get included in the transaction based on the messages that are produced.

Kafka Consumer Properties:
group.id: "my-group"
isolation.level: "read_committed"
plus other relevant consumer group configuration

Kafka Producer Properties:
transactional.id: "my-transaction"
enable.idempotence: "true" (implicit)
max.in.flight.requests.per.connection: "1" (implicit)
Kafka Features That Enable Transactions
1. Idempotent producer
2. Multiple partition atomic writes
3. Consumer read isolation level
51
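For reference, a hedged sketch of the plain Kafka client settings behind the first feature (the raw producer rather than Alpakka; the broker address and ids are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
// Setting a transactional.id enables transactions and implies idempotence.
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transaction")
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true") // implicit, shown for clarity

val producer = new KafkaProducer(props, new StringSerializer, new StringSerializer)
producer.initTransactions() // registers the producer id with the transaction coordinator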
Idempotent Producer (1/5)
52
[Diagram: the client calls KafkaProducer.send(k,v); the request carries sequence num = 0 and producer id = 123 to the broker's leader partition log.]
Idempotent Producer (2/5)
53
[Diagram: the broker appends (k,v) with seq = 0, pid = 123 to the partition log.]
Idempotent Producer (3/5)
54
[Diagram: the broker acknowledgement of the write fails to reach the client.]
Idempotent Producer (4/5)
55
[Diagram: the client retries KafkaProducer.send(k,v) with the same sequence num = 0 and producer id = 123.]
Idempotent Producer (5/5)
56
[Diagram: the broker sees seq = 0 / pid = 123 is already in the log, skips the append, and the acknowledgement succeeds: ack(duplicate).]
Multiple Partition Atomic Writes
57
[Diagram: KafkaProducer.commitTransaction() triggers the second phase of the two-phase commit. Transaction Committed control messages (CM) are written to the consumer offset log (recording the last offset processed for the consumer subscription), to the transaction log, and to each user-defined partition alongside user messages (UM). Multiple partitions are committed atomically: all or nothing.]
Consumer Read Isolation Level
58
[Diagram: user-defined partitions interleave committed control messages (CM) with user messages (UM); a consumer configured with read_committed isolation only sees messages from committed transactions.]

Kafka Consumer Properties:
isolation.level: "read_committed"
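A hedged sketch of that consumer side with Alpakka Kafka's ConsumerSettings (assumes an ActorSystem named system; group id and broker address are placeholders):

import akka.kafka.ConsumerSettings
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}

// Read-committed consumer: skips records from open or aborted transactions.
val txConsumerSettings =
  ConsumerSettings(system, new StringDeserializer, new ByteArrayDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("my-group")
    .withProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed")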
Transactional Pipeline Latency
59
[Diagram: a chain of clients each batching transactions every 100ms yields an end-to-end latency of ~300ms.]
Alpakka Kafka Transactions
60
[Diagram: Transactional Source → Transform → Transactional Sink, consuming from source Kafka partition(s) and producing to destination Kafka partitions. Messages wait for acknowledgement before commit.]
akka.kafka.producer.eos-commit-interval = 100ms
Transactional GraphStage (1/7)
61
[State: back pressure: resume demand; commit loop: waiting; transaction: begin transaction.]
Transactional GraphStage (2/7)
62
[State: messages flowing; commit loop: waiting for the commit interval to elapse; transaction: open.]
Transactional GraphStage (3/7)
63
[State: messages flowing; every 100ms a commit loop "tick" message arrives in the stage's mailbox; transaction: open.]
Transactional GraphStage (4/7)
64
[State: back pressure: suspend demand, messages stopped; transaction: still open.]
Transactional GraphStage (5/7)
65
[State: demand suspended; transaction: send consumed offsets.]
Transactional GraphStage (6/7)
66
[State: demand suspended; transaction: commit transaction.]
Transactional GraphStage (7/7)
67
[State: back pressure: resume demand, messages flowing again; commit loop: waiting; transaction: begin new transaction.]
Alpakka Kafka Transactions
68
val producerSettings = ProducerSettings(system, new StringSerializer, new ByteArraySerializer)
  .withBootstrapServers("localhost:9092")
  .withEosCommitInterval(100.millis)

val control =
  Transactional
    .source(consumerSettings, Subscriptions.topics("source-topic"))
    .via(transform)
    .map { msg =>
      ProducerMessage.single(
        new ProducerRecord[String, Array[Byte]]("sink-topic", msg.record.value),
        msg.partitionOffset)
    }
    .to(Transactional.sink(producerSettings, "transactional-id"))
    .run()
Optionally provide a transaction commit interval (default is 100ms).
Alpakka Kafka Transactions
69
Use Transactional.source to propagate the necessary info (consumer group ID, offsets) to Transactional.sink.
Alpakka Kafka Transactions
70
Call Transactional.sink or Transactional.flow to produce and commit messages.
Complex Event Processing
What is Complex Event Processing (CEP)?
72
“Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances.”
Foundations of Complex Event Processing, Cornell
Calling into an Akka Actor System
73
[Diagram: Source → Ask (?) → Sink between Kafka clusters: the stream calls into an Akka Cluster of actor systems (JVMs) through a cluster router to reach an Actor.]
The "Ask pattern" models non-blocking request and response of Akka messages.
Actor System Integration
class ProblemSolverRouter extends Actor {
  def receive = {
    case problem: Problem =>
      val solution = businessLogic(problem)
      sender() ! solution // reply to the ask
  }
}
...
val control = Consumer
  .committableSource(consumerSettings, Subscriptions.topics("topic1", "topic2"))
  ...
  .mapAsync(parallelism = 5)(problem => (problemSolverRouter ? problem).mapTo[Solution])
  ...
  .run()
74
Transform your stream by processing messages in an
Actor System. All you need is an ActorRef.
Actor System Integration
75
Use the Ask pattern (the ? function) on the provided ActorRef to get an async response.
Actor System Integration
76
Parallelism is used to limit how many messages are in flight, so we don't overwhelm the mailbox of the destination Actor and we maintain stream back-pressure.
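One detail the snippet elides: the ? operator needs akka.pattern.ask in scope and an implicit Timeout. A minimal hedged sketch of that wiring (the 5-second value is an assumption):

import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

// The ask (?) returns a Future that fails with an AskTimeoutException
// if the actor doesn't reply within the timeout.
implicit val askTimeout: Timeout = Timeout(5.seconds)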
Persistent Stateful Stages
Options for implementing Stateful Streams
1. Provided Akka Streams stages: fold, scan,
etc.
2. Custom GraphStage
3. Call into an Akka Actor System
78
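A hedged sketch of option 1 (not from the deck): a running count per key kept as stream state with scan:

import akka.NotUsed
import akka.stream.scaladsl.{Flow, Source}

// The Map is the stage's state; scan emits an updated snapshot per element.
val runningCounts: Flow[String, Map[String, Long], NotUsed] =
  Flow[String].scan(Map.empty[String, Long]) { (counts, key) =>
    counts.updated(key, counts.getOrElse(key, 0L) + 1)
  }

val demo: Source[Map[String, Long], NotUsed] =
  Source(List("EN", "FR", "EN")).via(runningCounts)
// emits: Map(), Map(EN -> 1), Map(EN -> 1, FR -> 1), Map(EN -> 2, FR -> 1)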
Persistent Stateful Stages using Event Sourcing
79
1. Recover state after failure
2. Create an event log
3. Share state
Sound familiar? KTables!
Persistent GraphStage using Event Sourcing
80
[Diagram: Source → Stateful Stage → Sink between Kafka clusters. Requests (commands/queries) go to a request handler; responses (events) trigger state changes through an event handler. Events are written to an event log (an akka.persistence.Journal via Akka Persistence plugins) and read back (replayed) to recover state.]
81
krasserm / akka-stream-eventsourcing
“This project brings to Akka Streams what Akka Persistence brings to Akka Actors: persistence via event sourcing.”
Experimental
New in Alpakka Kafka 1.0
Alpakka Kafka 1.0 Release Notes
Released Feb 28, 2019. Highlights:
● Upgraded the Kafka client to version 2.0.0 #544 by @fr3akX
○ Support new APIs from KIP-299: fix Consumer indefinite blocking behaviour in #614 by @zaharidichev
● New Committer.sink for standardised committing #622 by @rtimush
● Commit with metadata #563 and #579 by @johnclara
● Factored out akka.kafka.testkit for internal and external use: see Testing
● Support for merging commit batches #584 by @rtimush
● Reduced risk of message loss for partitioned sources #589
● Expose Kafka errors to stream #617
● Java APIs for all settings classes #616
● Much more comprehensive tests
83
Conclusion
85
kafka connector
Thank You!
Sean Glover
@seg1o
in/seanaglover
sean.glover@lightbend.com
Free eBook!
https://bit.ly/2J9xmZm
 

Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbend) Kafka Summit NYC 2019

  • 17. Apache Kafka: "Apache Kafka is a distributed streaming system. It's best suited to support fast, high-volume, and fault-tolerant data streaming platforms." (Kafka Documentation)
  • 18. When to use Alpakka Kafka? Akka Streams != Kafka Streams: the two solve different problems.
  • 19. When to use Alpakka Kafka? 1. To build back-pressure-aware integrations 2. Complex Event Processing 3. A need to model the most complex of graphs
  • 20. Anatomy of an Alpakka Kafka app
  • 21. Alpakka Kafka Setup

    val consumerClientConfig = system.settings.config.getConfig("akka.kafka.consumer")
    val consumerSettings =
      ConsumerSettings(consumerClientConfig, new StringDeserializer, new ByteArrayDeserializer)
        .withBootstrapServers("localhost:9092")
        .withGroupId("group1")
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

    val producerClientConfig = system.settings.config.getConfig("akka.kafka.producer")
    val producerSettings =
      ProducerSettings(producerClientConfig, new StringSerializer, new ByteArraySerializer)
        .withBootstrapServers("localhost:9092")

  Alpakka Kafka config and Kafka client config can go in the "akka.kafka.consumer" and "akka.kafka.producer" config sections; ad-hoc Kafka client config is set with the .with* methods.
  • 22. Anatomy of an Alpakka Kafka App: a small Consume -> Transform -> Produce Akka Streams app using Alpakka Kafka.

    val control = Consumer
      .committableSource(consumerSettings, Subscriptions.topics(topic1))
      .map { msg =>
        ProducerMessage.single(
          new ProducerRecord(topic1, msg.record.key, msg.record.value),
          passThrough = msg.committableOffset)
      }
      .via(Producer.flexiFlow(producerSettings))
      .map(_.passThrough)
      .toMat(Committer.sink(committerSettings))(Keep.both)
      .mapMaterializedValue(DrainingControl.apply)
      .run()

    // Add shutdown hook to respond to SIGTERM and gracefully shut down the stream
    sys.ShutdownHookThread {
      Await.result(control.shutdown(), 10.seconds)
    }

  Slides 23 through 28 step through the same code:

  • 23. The committable source propagates Kafka offset information downstream with consumed messages.
  • 24. ProducerMessage is used to map the consumed offset to transformed results. One to one (1:1): ProducerMessage.single(...). One to many (1:M): ProducerMessage.multi(immutable.Seq(new ProducerRecord(topic1, msg.record.key, msg.record.value), new ProducerRecord(topic2, msg.record.key, msg.record.value)), passThrough = msg.committableOffset). One to none (1:0): ProducerMessage.passThrough(msg.committableOffset).
  • 25. Producer.flexiFlow produces messages to the destination topic. It accepts the new ProducerMessage type and will replace the deprecated Producer.flow in the future.
  • 26. Committer.sink batches consumed offset commits. The passthrough lets us track which messages have been successfully processed, giving at-least-once message delivery guarantees.
  • 27. Graceful shutdown: 1. Stop consuming (polling) new messages from the source. 2. Wait for all messages to be successfully committed (when applicable). 3. Wait for all produced messages to be acknowledged.
  • 28. The shutdown hook triggers a graceful shutdown when SIGTERM is sent to the app (e.g. by the Docker daemon) and forces shutdown after the grace interval.
  • 30. Why use Consumer Groups? 1. Easy, robust, and performant scaling of consumers to reduce consumer lag 30
  • 31. Consumer Group Latency and Offset Lag (diagram): producers write to the topic at 10 MB/s while three consumers process ~3 MB/s each (~9 MB/s total), so total offset lag and latency keep growing.
  • 32. Add a new consumer and rebalance: four consumers can now support a throughput of ~12 MB/s, and offset lag and latency decrease until the consumers are caught up.
  • 33. Committable Sink

    val committerSettings = CommitterSettings(system)

    val control: DrainingControl[Done] =
      Consumer
        .committableSource(consumerSettings, Subscriptions.topics(topic))
        .mapAsync(1) { msg =>
          business(msg.record.key, msg.record.value)
            .map(_ => msg.committableOffset)
        }
        .toMat(Committer.sink(committerSettings))(Keep.both)
        .mapMaterializedValue(DrainingControl.apply)
        .run()
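  The business(...) call on slide 33 is left undefined. A minimal, hypothetical stand-in (the name and behaviour are illustrative, not part of the deck) could be:

    import akka.Done
    import scala.concurrent.Future

    // Hypothetical stand-in for the slide's business logic:
    // process one record, complete the Future when the work is done.
    def business(key: String, value: Array[Byte]): Future[Done] =
      Future.successful(Done) // e.g. write to a database or call a service here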
  • 34. Anatomy of a Consumer Group (diagram): Clients A, B, and C form one consumer group over a 9-partition topic (A: partitions 0,1,2; B: 3,4,5; C: 6,7,8). A broker acts as Consumer Group Coordinator and one client acts as Consumer Group Leader. Last committed offsets per partition are kept in the consumer offset log, e.g. P0: 100489, P1: 128048, P2: 184082, P3: 596837, P4: 110847, P5: 99472, P6: 148270, P7: 3582785, P8: 182483. Important consumer group client config: the topic subscription, e.g. Subscriptions.topics("Topic1", "Topic2", "Topic3"), and Kafka consumer properties: group.id ("my-group"), session.timeout.ms (30000 ms), partition.assignment.strategy (RangeAssignor), heartbeat.interval.ms (3000 ms).
  • 35.-41. Consumer Group Rebalance (diagrams, 7 steps):
  1. Clients A, B, and C hold partitions 0,1,2 / 3,4,5 / 6,7,8.
  2. A new Client D with the same group.id sends a request to join the group to the Consumer Group Coordinator.
  3. The Consumer Group Coordinator requests that the Consumer Group Leader calculate new client:partition assignments.
  4. The Consumer Group Leader sends the new client:partition assignment to the Group Coordinator.
  5. The Group Coordinator informs all clients of their new client:partition assignments (A: 0,1; B: 2,3; D: 4,5; C: 6,7,8).
  6. Clients that had partitions revoked are given the chance to commit their latest processed offsets.
  7. Rebalance complete. Clients begin consuming partitions from their last committed offsets.
  • 42.-43. Consumer Group Rebalancing (asynchronous): declare an Akka Actor to handle assigned and revoked partition messages asynchronously, then add the actor reference to the topic subscription to use it. Useful for performing asynchronous actions during a rebalance, but not for blocking operations that must complete during the rebalance.

    class RebalanceListener extends Actor with ActorLogging {
      def receive: Receive = {
        case TopicPartitionsAssigned(sub, assigned) =>
        case TopicPartitionsRevoked(sub, revoked) =>
          processRevokedPartitions(revoked)
      }
    }

    val subscription = Subscriptions.topics("topic1", "topic2")
      .withRebalanceListener(system.actorOf(Props[RebalanceListener]))

    val control = Consumer.committableSource(consumerSettings, subscription)
    ...
  • 44. Consumer Group Rebalancing (synchronous): a synchronous partition assignment handler is planned for the next release, by Enno Runne (https://github.com/akka/alpakka-kafka/pull/761). Synchronous operations are difficult to model in an async library. It will allow users to block the Consumer actor thread (the Consumer.poll thread) and provides limited consumer operations: seek to offset, and synchronous commit.
  • 46. Kafka Transactions: "Transactions enable atomic writes to multiple Kafka topics and partitions. All of the messages included in the transaction will be successfully written or none of them will be."
  • 47. Message Delivery Semantics: at most once, at least once, "exactly once".
  • 48. Exactly Once Delivery vs Exactly Once Processing: "Exactly-once message delivery is impossible between two parties where failures of communication are possible." (Two Generals / Byzantine Generals problem)
  • 49. Why use Transactions? 1. Zero tolerance for duplicate messages 2. Less boilerplate (deduping, client offset management) 49
  • 50. Anatomy of Kafka Transactions ("consume, transform, produce"; diagram): the client consumes from a source topic, transforms, and produces to a destination topic. The cluster runs a Consumer Group Coordinator (backed by the consumer offset log) and a Transaction Coordinator (backed by the transaction log); user messages (UM) in the partitions are interleaved with control messages (CM) that mark transaction boundaries. Destination topic partitions get included in the transaction based on the messages that are produced. Important client config: the topic subscription, e.g. Subscriptions.topics("Topic1", "Topic2", "Topic3"); Kafka consumer properties: group.id ("my-group"), isolation.level ("read_committed"), plus other relevant consumer group configuration; Kafka producer properties: transactional.id ("my-transaction"), enable.idempotence ("true", implicit), max.in.flight.requests.per.connection ("1", implicit).
  • 51. Kafka Features That Enable Transactions 1. Idempotent producer 2. Multiple partition atomic writes 3. Consumer read isolation level 51
  • 52.-56. Idempotent Producer (diagrams, 5 steps):
  1. The client calls KafkaProducer.send(k,v) with sequence num = 0 and producer id = 123.
  2. The broker appends (k,v) to the leader partition log, recording seq = 0, pid = 123.
  3. The broker acknowledgement fails.
  4. The client retries KafkaProducer.send(k,v) with the same sequence num = 0 and producer id = 123.
  5. The broker recognises the (pid, seq) pair it has already appended, skips the duplicate write, and this time the acknowledgement succeeds (ack duplicate).
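  The broker-side de-duplication shown above is switched on by a single producer setting. A minimal sketch with the plain Kafka producer client (broker 0.11+ required; the topic name is illustrative):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // Assigns the producer an id and attaches sequence numbers to batches,
    // so the broker can discard duplicates caused by retries (implies acks=all).
    props.put("enable.idempotence", "true")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord("some-topic", "key", "value"))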
  • 57. Multiple Partition Atomic Writes (diagram): KafkaProducer.commitTransaction() performs the second phase of a two-phase commit. The last offset processed for the consumer subscription is written to the consumer offset log, "transaction committed" is recorded in the internal transaction log, and "transaction committed" control messages are appended to the user topic partitions. Multiple partitions are committed atomically: all or nothing.
  • 58. Consumer Read Isolation Level (diagram): with the Kafka consumer property isolation.level: "read_committed", the client only reads messages from user-defined partitions once their transaction has been committed.
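  Slides 52-58 describe what the three features do; they come together in the "consume, transform, produce" loop of slide 50. A minimal sketch using the plain Kafka clients rather than Alpakka (topic, group, and transactional ids are illustrative):

    import java.time.Duration
    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.TopicPartition

    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", "localhost:9092")
    consumerProps.put("group.id", "my-group")
    consumerProps.put("enable.auto.commit", "false")       // offsets are committed inside the transaction
    consumerProps.put("isolation.level", "read_committed") // feature 3: read isolation
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", "localhost:9092")
    producerProps.put("transactional.id", "my-transaction") // implies enable.idempotence (feature 1)
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val consumer = new KafkaConsumer[String, String](consumerProps)
    val producer = new KafkaProducer[String, String](producerProps)

    consumer.subscribe(java.util.Arrays.asList("source-topic"))
    producer.initTransactions()

    while (true) {
      val records = consumer.poll(Duration.ofMillis(100))
      if (!records.isEmpty) {
        producer.beginTransaction()
        try {
          val offsets = new java.util.HashMap[TopicPartition, OffsetAndMetadata]()
          records.asScala.foreach { r =>
            producer.send(new ProducerRecord("sink-topic", r.key, r.value))
            offsets.put(new TopicPartition(r.topic, r.partition), new OffsetAndMetadata(r.offset + 1))
          }
          // Feature 2: consumed offsets and produced messages commit atomically
          producer.sendOffsetsToTransaction(offsets, "my-group")
          producer.commitTransaction()
        } catch {
          case _: Exception =>
            // A real app would close the producer on fatal errors
            // such as ProducerFencedException.
            producer.abortTransaction()
        }
      }
    }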
  • 59. Transactional Pipeline Latency (diagram): with three clients chained together and transaction batches every 100 ms, end-to-end latency is ~300 ms.
  • 60. Alpakka Kafka Transactions (diagram): a Transactional Source consumes from the source Kafka partition(s), a transform stage processes the messages, and a Transactional Sink produces to the destination Kafka partitions; messages wait for the acknowledgement before the commit. akka.kafka.producer.eos-commit-interval = 100ms.
  • 61.-67. Transactional GraphStage (diagrams, 7 steps):
  1. Begin transaction: demand resumes; the commit loop is waiting.
  2. The transaction is open and messages flow while the commit interval elapses.
  3. The commit loop delivers a "tick" message to the stage's mailbox every 100 ms.
  4. On the tick, demand is suspended and messages stop flowing.
  5. The stage sends the consumed offsets to the transaction.
  6. The stage commits the transaction.
  7. A new transaction begins: demand resumes and messages flow again.
  • 68.-70. Alpakka Kafka Transactions:

    val producerSettings = ProducerSettings(system, new StringSerializer, new ByteArraySerializer)
      .withBootstrapServers("localhost:9092")
      .withEosCommitInterval(100.millis)

    val control = Transactional
      .source(consumerSettings, Subscriptions.topics("source-topic"))
      .via(transform)
      .map { msg =>
        ProducerMessage.single(
          new ProducerRecord[String, Array[Byte]]("sink-topic", msg.record.value),
          msg.partitionOffset)
      }
      .to(Transactional.sink(producerSettings, "transactional-id"))
      .run()

  Optionally provide a transaction commit interval with withEosCommitInterval (the default is 100 ms). Use Transactional.source to propagate the necessary information (consumer group id, offsets) to Transactional.sink, and call Transactional.sink or Transactional.flow to produce and commit messages.
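  The transform stage in this snippet is not defined on the slides. A hypothetical pass-through stand-in, assuming the message type emitted by Transactional.source is ConsumerMessage.TransactionalMessage:

    import akka.NotUsed
    import akka.kafka.ConsumerMessage.TransactionalMessage
    import akka.stream.scaladsl.Flow

    // Hypothetical stand-in: passes each transactional message through unchanged.
    val transform: Flow[TransactionalMessage[String, Array[Byte]],
                        TransactionalMessage[String, Array[Byte]], NotUsed] =
      Flow[TransactionalMessage[String, Array[Byte]]].map { msg =>
        // transform msg.record.value here; keep msg.partitionOffset intact
        msg
      }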
  • 72. What is Complex Event Processing (CEP)? 72 “ “ Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. Foundations of Complex Event Processing, Cornell
  • 73. Calling into an Akka Actor System (diagram): the "ask pattern" models a non-blocking request and response of Akka messages. A stage in the stream asks an actor, e.g. a cluster router in front of actors spread across the actor systems/JVMs of an Akka Cluster, and the reply is emitted downstream.
  • 74.-76. Actor System Integration: transform your stream by processing messages in an actor system; all you need is an ActorRef.

    class ProblemSolverRouter extends Actor {
      def receive = {
        case problem: Problem =>
          val solution = businessLogic(problem)
          sender() ! solution // reply to the ask
      }
    }

    ...

    val control = Consumer
      .committableSource(consumerSettings, Subscriptions.topics("topic1", "topic2"))
      ...
      .mapAsync(parallelism = 5)(problem => (problemSolverRouter ? problem).mapTo[Solution])
      ...
      .run()

  Use the ask pattern (the ? function) on the provided ActorRef to get an async response. The parallelism limits how many messages are in flight at once, so we don't overwhelm the mailbox of the destination actor and we maintain stream back-pressure.
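  One detail the slides omit: the ask (?) operator needs the akka.pattern.ask import and an implicit akka.util.Timeout in scope. A minimal sketch:

    import akka.pattern.ask // provides the ? operator
    import akka.util.Timeout
    import scala.concurrent.duration._

    // The ask fails with an AskTimeoutException if the actor
    // does not reply within this interval.
    implicit val timeout: Timeout = Timeout(5.seconds)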
  • 78. Options for implementing Stateful Streams: 1. Provided Akka Streams stages: fold, scan, etc. (see the sketch below) 2. Custom GraphStage 3. Call into an Akka Actor System
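  A sketch of option 1 under illustrative assumptions: counting consumed records with scan. The state lives only in memory, which is what motivates the event-sourced stages on the next slides.

    import akka.NotUsed
    import akka.stream.scaladsl.Flow
    import org.apache.kafka.clients.consumer.ConsumerRecord

    // Emits a running count of the records that have passed through.
    // scan keeps the accumulator inside the stage; it is lost if the
    // stream restarts, and it cannot be shared outside the stream.
    val countingFlow: Flow[ConsumerRecord[String, String], Long, NotUsed] =
      Flow[ConsumerRecord[String, String]]
        .scan(0L)((count, _) => count + 1)
        .drop(1) // drop the seed value emitted by scan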
  • 79. Persistent Stateful Stages using Event Sourcing: 1. Recover state after failure 2. Create an event log 3. Share state. Sound familiar? KTables!
  • 80. Persistent GraphStage using Event Sourcing (diagram): a stateful stage sits between source and sink. A request (command/query) goes to the stage's request handler; the response (an event) triggers a state change through the event handler; events are written to, and replayed from, an event log backed by an akka.persistence.Journal via Akka Persistence plugins.
  • 81. krasserm / akka-stream-eventsourcing (experimental): "This project brings to Akka Streams what Akka Persistence brings to Akka Actors: persistence via event sourcing."
  • 82. New in Alpakka Kafka 1.0
  • 83. Alpakka Kafka 1.0 Release Notes. Released Feb 28, 2019. Highlights:
    - Upgraded the Kafka client to version 2.0.0 (#544 by @fr3akX), with support for new APIs from KIP-299: fix Consumer indefinite blocking behaviour (#614 by @zaharidichev)
    - New Committer.sink for standardised committing (#622 by @rtimush)
    - Commit with metadata (#563 and #579 by @johnclara)
    - Factored out akka.kafka.testkit for internal and external use: see Testing
    - Support for merging commit batches (#584 by @rtimush)
    - Reduced risk of message loss for partitioned sources (#589)
    - Expose Kafka errors to stream (#617)
    - Java APIs for all settings classes (#616)
    - Much more comprehensive tests