If you have already worked on various Kafka Streams applications, you have probably found yourself rewriting the same pieces of code again and again.
Whether it is to manage processing failures and bad records, to use interactive queries, to organize your code, or to deploy and monitor your Kafka Streams app, building in-house libraries to standardize common patterns across your projects seems unavoidable.
And if you are new to Kafka Streams, you might want to know which patterns to use for your next streaming project.
In this talk, I propose to introduce you to Azkarra, an open-source, lightweight Java framework designed to provide most of that machinery off-the-shelf, by leveraging best-of-breed ideas and proven practices from the Apache Kafka community.
3. Like me, you probably started with the famous Word Count!
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("streams-plaintext-input");
source.flatMapValues(splitAndToLowercase())
      .groupBy((key, value) -> value)
      .count(Materialized.as("counts-store"))
      .toStream()
      .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
Topology topology = builder.build();
9. OK, nobody does that!
(Well, unless you are testing your app in production… cough, cough...)
10. Some requirements before moving into production (our TODO list):
▢ Test the app is working as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
11. Business Value vs Effort
[Figure: a value-versus-effort chart for a Kafka Streams application. The topology (business logic) delivers high business value for low-to-medium effort, while the surrounding concerns deliver low business value for medium-to-high effort: Kafka Streams lifecycle management, Interactive Queries, error-handling logic, monitoring and health checks, security, configuration externalization, RocksDB tuning, offsets and lags, and packaging.]
12. Azkarra Framework in a nutshell
A lightweight Java framework to make a Kafka Streams application production-ready in just a few lines of code.
■ Distributed under the Apache License 2.0.
■ Developed based on experience across a wide range of projects.
■ Uses best practices developed by Kafka users and the open-source community.
Overview:
■ REST API: health check, monitoring, Interactive Queries, etc.
■ Embedded WebUI: topology DAG visualization
■ Built-in features for handling exceptions and tuning RocksDB
■ Support for Server-Sent Events
#azkarrastreams
13. Azkarra Streams: how to use it?
Available on Maven Central.

Azkarra Framework:
<dependency>
    <groupId>io.streamthoughts</groupId>
    <artifactId>azkarra-streams</artifactId>
    <version>0.9.2</version>
</dependency>

Azkarra Commons, which provides reusable classes for Kafka Streams:
<dependency>
    <groupId>io.streamthoughts</groupId>
    <artifactId>azkarra-commons</artifactId>
    <version>0.9.2</version>
</dependency>

Quick start:
mvn archetype:generate \
    -DarchetypeGroupId=io.streamthoughts \
    -DarchetypeArtifactId=azkarra-quickstart-java \
    -DarchetypeVersion=0.9.2 \
    -DgroupId=azkarra.streams \
    -DartifactId=azkarra-getting-started \
    -Dversion=1.0 \
    -Dpackage=azkarra \
    -DinteractiveMode=false
15. Concepts: TopologyProvider
A container for building and configuring a Topology.
class WordCountTopology implements TopologyProvider, Configurable {

    private Conf conf;

    @Override
    public Topology topology() {
        var source = conf.getString("topic.source.name");
        var sink = conf.getString("topic.sink.name");
        var store = conf.getString("store.name");
        var builder = new StreamsBuilder();
        builder
            .<String, String>stream(source)
            .flatMapValues(splitAndToLowercase())
            .groupBy((key, value) -> value)
            .count(Materialized.as(store))
            .toStream()
            .to(sink, Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }

    @Override
    public void configure(final Conf conf) { this.conf = conf; }

    @Override
    public String version() { return "1.0"; }
}
16. Concepts: StreamsExecutionEnvironment
Manages the lifecycle of KafkaStreams instances.
// (1) Define the KafkaStreams configuration
var streamsConfig = Conf.of(
    BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
    DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass(),
    DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass()
);
// (2) Define the Topology configuration
var topologyConfig = Conf.of(
    "topic.source.name", "topic-text-lines",
    "topic.sink.name", "topic-text-word-count",
    "store.name", "Count"
);
// (3) Create and configure a local execution environment
var env = LocalStreamsExecutionEnvironment
    .create(Conf.of("streams", streamsConfig))
    // (4) Register our topology to run
    .registerTopology(
        WordCountTopology::new,
        Executed.as("WordCount").withConfig(topologyConfig)
    );
// (5) Start the environment
env.start();
// (6) Add a shutdown hook
Runtime.getRuntime().addShutdownHook(new Thread(env::stop));
17. Let's start KafkaStreams… Boom! Transient errors:
word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] Received error code INCOMPLETE_SOURCE_TOPIC_METADATA
16:05:12.585 [word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1] ERROR
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer
clientId=word-count-1-0-ae1a9bf9-101d-4796-ad36-2e1130e83573-StreamThread-1-consumer, groupId=word-count-1-0] User provided listener
org.apache.kafka.streams.processor.internals.StreamsRebalanceListener failed on invocation of onPartitionsAssigned for partitions []
org.apache.kafka.streams.errors.MissingSourceTopicException: One or more source topics were missing during rebalance
at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:57)
~[kafka-streams-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:451) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:367) [kafka-clients-2.7.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) [kafka-clients-2.7.0.jar:?]
18. Concepts: StreamsLifecycleInterceptor
public interface StreamsLifecycleInterceptor {

    /**
     * Intercepts the streams instance before being started.
     */
    default void onStart(StreamsLifecycleContext context,
                         StreamsLifecycleChain chain) {
        chain.execute();
    }

    /**
     * Intercepts the streams instance before being stopped.
     */
    default void onStop(StreamsLifecycleContext context,
                        StreamsLifecycleChain chain) {
        chain.execute();
    }

    /**
     * Used for logging information.
     */
    default String name() {
        return getClass().getSimpleName();
    }
}
A pluggable interface that allows intercepting a KafkaStreams instance before it is started or stopped.
Built-in implementations:
■ AutoCreateTopicsInterceptor
■ WaitForSourceTopicsInterceptor
■ KafkaBrokerReadyInterceptor
...and a few more (discussed later) 😉
Most interceptors are configurable.
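For instance, a custom interceptor only needs to implement the callbacks above and delegate to the chain. Here is a minimal sketch (a hypothetical LoggingLifecycleInterceptor, assuming SLF4J is on the classpath), in the spirit of the ConsoleStreamsLifecycleInterceptor registered later:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingLifecycleInterceptor implements StreamsLifecycleInterceptor {

    private static final Logger LOG =
            LoggerFactory.getLogger(LoggingLifecycleInterceptor.class);

    @Override
    public void onStart(StreamsLifecycleContext context, StreamsLifecycleChain chain) {
        LOG.info("KafkaStreams instance is about to start");
        // Always delegate to the chain, otherwise the remaining
        // interceptors (and the instance itself) will never start.
        chain.execute();
    }

    @Override
    public void onStop(StreamsLifecycleContext context, StreamsLifecycleChain chain) {
        LOG.info("KafkaStreams instance is about to stop");
        chain.execute();
    }
}

It can then be added to an environment with addStreamsLifecycleInterceptor(LoggingLifecycleInterceptor::new), just like the AutoCreateTopicsInterceptor on the next slide.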
19. Concepts: AutoCreateTopicsInterceptor
Automatically infers the source and sink topics to be created from Topology.describe().
■ Internally, uses the AdminClient API.
■ Can also delete all topics when the instance is stopped (during development only).

import static io.s.a.r.i.AutoCreateTopicsInterceptorConfig.*;

// (1) Define the KafkaStreams configuration
var streamsConfig = ...
// (2) Define the Topology configuration
var topologyConfig = ...
// (3) Define the Environment configuration
var envConfig = Conf.of(
    "streams", streamsConfig,
    AUTO_CREATE_TOPICS_NUM_PARTITIONS_CONFIG, 2,
    AUTO_CREATE_TOPICS_REPLICATION_FACTOR_CONFIG, 1,
    // WARN - ONLY DURING DEVELOPMENT
    AUTO_DELETE_TOPICS_ENABLE_CONFIG, true
);
// (4) Create and configure the local execution environment
LocalStreamsExecutionEnvironment
    .create(envConfig)
    // (5) Add the StreamsLifecycleInterceptor
    .addStreamsLifecycleInterceptor(
        AutoCreateTopicsInterceptor::new
    );
// ...code omitted for clarity
20. What's left to do? Externalizing configuration (we have 20' left) 😀
▢ Test the app is working as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
21. External configuration: Conf & AzkarraConf
Azkarra provides the Configurable interface, which can be implemented by most Azkarra components:

void configure(final Conf configuration);

■ AzkarraConf: uses the Lightbend Config library.
○ Allows loading configuration settings from HOCON files.

// file: application.conf
azkarra {
    // The configuration settings passed to the KafkaStreams
    // instance are defined under the `streams` scope
    streams {
        bootstrap.servers = "localhost:9092"
        default.key.serde = "org.apache.kafka.common.serialization.Serdes$StringSerde"
        default.value.serde = "org.apache.kafka.common.serialization.Serdes$StringSerde"
    }
    topic.source.name = "topic-text-lines"
    topic.sink.name = "topic-text-word-count"
    store.name = "Count"
    auto.create.topics.num.partitions = 2
    auto.create.topics.replication.factor = 1
    auto.delete.topics.enable = true
}

// file: Main.class
var config = AzkarraConf.create().getSubConf("azkarra");
22. Concepts: AzkarraContext
A container for dependency injection; used to automatically configure streams execution environments.
public static void main(final String[] args) {
    // (1) Load the configuration (application.conf)
    var config = AzkarraConf.create().getSubConf("azkarra");
    // (2) Create the Azkarra context
    var context = DefaultAzkarraContext.create(config);
    // (3) Register a StreamsLifecycleInterceptor as a component
    context.registerComponent(
        ConsoleStreamsLifecycleInterceptor.class
    );
    // (4) Register the Topology to the default environment
    context.addTopology(
        WordCountTopology.class,
        Executed.as("word-count")
    );
    // (5) Start the context
    context
        .setRegisterShutdownHook(true)
        .start();
}
23. Concepts: AzkarraApplication
Used to bootstrap and configure an Azkarra application. Provides an embedded HTTP server and component scanning.
public class WordCount {

    public static void main(final String[] args) {
        // (1) Load the configuration (application.conf)
        var config = AzkarraConf.create();
        // (2) Create the Azkarra context
        var context = DefaultAzkarraContext.create();
        // (3) Register the Topology to the default environment
        context.addTopology(
            WordCountTopology.class,
            Executed.as("word-count")
        );
        // (4) Create the Azkarra application
        new AzkarraApplication()
            .setContext(context)
            .setConfiguration(config)
            // (5) Enable and configure the embedded HTTP server
            .setHttpServerEnable(true)
            .setHttpServerConf(ServerConfig.newBuilder()
                .setListener("localhost")
                .setPort(8080)
                .build()
            )
            // (6) Start Azkarra
            .run(args);
    }
}
24. Concepts: AzkarraApplication (annotation-driven)
Used to bootstrap and configure an Azkarra application, with an embedded HTTP server and component scanning.
@AzkarraStreamsApplication
public class WordCount {

    public static void main(String[] args) {
        AzkarraApplication.run(WordCount.class, args);
    }

    @Component
    public static class WordCountTopology
            implements TopologyProvider, Configurable {

        private Conf conf;

        @Override
        public Topology topology() {
            var builder = new StreamsBuilder();
            // ...code omitted for clarity
            return builder.build();
        }

        @Override
        public void configure(Conf conf) {
            this.conf = conf;
        }

        @Override
        public String version() { return "1.0"; }
    }
}
25. What's left to do? Handling deserialization exceptions (we have 15' left) 🤔
▢ Test the app is working as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
26. Solution #1: built-in mechanisms
default.deserialization.exception.handler (configuration sketch below)
■ CONTINUE: continue with processing
■ FAIL: fail the processing and stop
Two available implementations:
■ LogAndContinueExceptionHandler
■ LogAndFailExceptionHandler
Not really suitable for production: corrupted messages cannot be monitored efficiently.
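For reference, the chosen handler is plugged in through the standard Kafka Streams property; a minimal sketch:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

Properties props = new Properties();
// Log and skip records that cannot be deserialized,
// instead of crashing the whole KafkaStreams instance.
props.put(
    StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
    LogAndContinueExceptionHandler.class
);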
27. Solution #2: Dead Letter Queue topic
DeserializationExceptionHandler: send corrupted messages to a special topic.
[Figure: corrupted records from the source topic are skipped by the topology and routed by the handler to a dead letter topic.]
Solution #3: sentinel value (see the sketch below)
Deserializer<T>: catch any exception thrown during deserialization and return a default value (e.g., null, "N/A", etc.).
[Figure: a SafeDeserializer wraps the delegate deserializer and replaces corrupted records from the source topic with null.]
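Solution #3 amounts to a thin wrapper around the real deserializer. A minimal sketch (a hypothetical SentinelValueDeserializer, not Azkarra's actual class), assuming any delegate Deserializer:

import org.apache.kafka.common.serialization.Deserializer;

public class SentinelValueDeserializer<T> implements Deserializer<T> {

    private final Deserializer<T> delegate;
    private final T sentinel;

    public SentinelValueDeserializer(Deserializer<T> delegate, T sentinel) {
        this.delegate = delegate;
        this.sentinel = sentinel;
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            return delegate.deserialize(topic, data);
        } catch (Exception e) {
            // Corrupted record: return the sentinel value so the
            // topology can detect and skip it downstream instead of failing.
            return sentinel;
        }
    }
}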
28. Using Azkarra
Solution #2: DeadLetterTopicExceptionHandler
■ By default, sends corrupted records to <Topic>-rejected.
■ Doesn't change the schema/format of the corrupted message.
■ Uses Kafka headers to trace the exception cause and origin, e.g.:
○ __errors.exception.stacktrace
○ __errors.exception.message
○ __errors.exception.class.name
○ __errors.timestamp
○ __errors.application.id
○ __errors.record.[topic|partition|offset]
■ Can be configured to send records to a Kafka cluster distinct from the one used by Kafka Streams.

Solution #3: SafeSerdes
SafeSerdes.Long(-1L);
SafeSerdes.UUID(null);
SafeSerdes.serdeFrom(
    new JsonSerializer(),
    new JsonDeserializer(),
    NullNode.getInstance()
);
29. Our TODO list: Monitoring (we have 10' left) 🙃
▢ Test the app is working as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
30. How to monitor Kafka Streams?
The Kafka Streams API provides a few methods for monitoring the state of a running instance:
■ KafkaStreams#state(), KafkaStreams#setStateListener()
⎼ CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR
⎼ Can be used for checking the liveness and readiness of the instance.
■ KafkaStreams#localThreadsMetadata()
⎼ Returns information about local threads/tasks and partition assignments.
■ KafkaStreams#metrics()
Best practices (see the readiness sketch below):
■ Build a REST API to expose the states of Kafka Streams.
■ Export metrics using JMX, Prometheus, etc.
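For example, a readiness flag for a health-check endpoint can be derived from the state listener alone. A minimal sketch using only the public KafkaStreams API, assuming topology and props from the earlier Word Count example (the listener must be registered before start()):

import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.streams.KafkaStreams;

KafkaStreams streams = new KafkaStreams(topology, props);
AtomicBoolean ready = new AtomicBoolean(false);
streams.setStateListener((newState, oldState) ->
    // RUNNING means all stream threads are up and local stores are restored
    ready.set(newState == KafkaStreams.State.RUNNING)
);
streams.start();
// a /health REST endpoint can then simply return ready.get()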
31. Kafka consumer lag and offsets
Maybe the most fundamental indicators to monitor:
■ How far behind the producers are the Kafka Streams consumers?
■ Is the Kafka Streams application ready to process records and to serve interactive queries?
Useful APIs:
■ KafkaStreams#allLocalStorePartitionLags()
■ KafkaStreams#setGlobalStateRestoreListener()
⎼ NOTE: internal KafkaStreams threads do not start consuming messages until stores are recovered.
■ ConsumerInterceptor, configured on the main consumer via main.consumer.interceptor.classes (sketch below):

public interface ConsumerInterceptor<K, V> extends Configurable, AutoCloseable {
    ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);
    void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);
    void close();
}
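A minimal sketch of such an interceptor (a hypothetical OffsetTrackingInterceptor; a real one would export to a metrics backend rather than stdout):

import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetTrackingInterceptor<K, V> implements ConsumerInterceptor<K, V> {

    @Override
    public ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records) {
        return records; // pass records through untouched
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Committed offsets: the natural place to derive per-partition lag
        offsets.forEach((tp, om) ->
            System.out.printf("committed %s -> %d%n", tp, om.offset()));
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}

It is enabled by setting main.consumer.interceptor.classes to the fully-qualified class name, so it applies only to the main consumer (not the restore or global consumers).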
32. Azkarra REST API
Azkarra provides a REST API for managing, monitoring, and querying Kafka Streams instances.
■ Provides support for Interactive Queries.
■ Built-in authentication and authorization mechanisms (Basic Auth, 2-way SSL).
■ Allows registration of new JAX-RS resources using a plugin interface: AzkarraRestExtension.
Endpoints (example calls below):
● Get information about the local streams instance: GET /api/v1/streams
● Get the status of a streams instance: GET /api/v1/streams/(string: id)/status
● Get the configuration of a streams instance: GET /api/v1/streams/(string: id)/config
● Get current metrics for a streams instance: GET /api/v1/streams/(string: applicationId)/metrics
● Get all metrics in Prometheus format (via Micrometer): GET /prometheus
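With the embedded server from the earlier example listening on localhost:8080, the endpoints above can be exercised directly; the instance id used here (word-count-1-0) is an assumption based on the application id seen in the logs earlier:

# list local streams instances (returns their ids)
$ curl -s http://localhost:8080/api/v1/streams

# then query the status of a given instance by id
$ curl -s http://localhost:8080/api/v1/streams/word-count-1-0/status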
33. Putting it all together: exporting Kafka Streams states anywhere
Azkarra can be configured to periodically report the internal states of a KafkaStreams instance.
■ Uses a StreamsLifecycleInterceptor: MonitoringStreamsInterceptor.
■ Accepts a pluggable reporter class.
⎼ Default: KafkaMonitoringReporter.
⎼ Publishes events that adhere to the CloudEvents specification:

{
  "id": "appid:word-count;appsrv:localhost:8080;ts:1620691200000",
  "source": "azkarra/ks/localhost:8080",
  "specversion": "1.0",
  "type": "io.streamthoughts.azkarra.streams.stateupdateevent",
  "time": "2021-05-11T00:00:00.000+0000",
  "datacontenttype": "application/json",
  "ioazkarramonitorintervalms": 10000,
  "ioazkarrastreamsappid": "word-count",
  "ioazkarraversion": "0.9.2",
  "ioazkarrastreamsappserver": "localhost:8080",
  "data": {
    "state": "RUNNING",
    "threads": [
      {
        "name": "word-count-...-93e9a84057ad-StreamThread-1",
        "state": "RUNNING",
        "active_tasks": [],
        "standby_tasks": [],
        "clients": {}
      }
    ],
    "offsets": { "group": "", "consumers": [] },
    "stores": { "partitionRestoreInfos": [], "partitionLagInfos": [] },
    "state_changed_time": 1620691200000
  }
}
34. Our TODO list: Packaging (we still have 5' left) 😬
▢ Test the app is working as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
35. Packaging Kafka Streams with Azkarra
Azkarra-based applications can be packaged like any other Kafka Streams app.
Azkarra Worker → an empty Azkarra application:
■ Topologies and components can be loaded from an external uber-jar.
⎼ Similar to Kafka Connect plugins and connectors.
■ Can be used as the base image for Docker.
⎼ Use Jib to build optimized Docker images for Java.

$ docker run --net host \
    -v $(pwd)/application.conf:/etc/azkarra/azkarra.conf \
    -v $(pwd)/local-topologies:/usr/share/azkarra-components/ \
    streamthoughts/azkarra-streams-worker:latest

Jib + Docker + Azkarra = ❤
36. Deploying Kafka Streams with Azkarra (in Kubernetes)
Using Kubernetes, topologies can be downloaded and mounted using an init container.
[Figure: a pod (Deployment, StatefulSet, ...) in which an InitContainer fetches my-topology-with-dependencies-1.0.jar over HTTP from a repository manager (e.g., Nexus or Artifactory) into a shared volume (/var/lib/components/), from which the azkarra-worker container loads it via azkarra.component.paths.]
37. DONE, in less than 30 min, using Azkarra 🚀
▢ Test the app is working as expected
▢ Externalize configuration
▢ Handle transient errors
▢ Handle deserialization exceptions
▢ Expose the state of the Kafka Streams application
▢ Be able to monitor offsets and lags of consumers and state stores
▢ Interactive Queries (optional)
▢ Package the Kafka Streams application for production
39. Takeaways: conclusion
Kafka Streams is a very good choice for quickly creating streaming applications. But building applications for production can be a lot of work. Azkarra aims to be a fast path to production by providing all the features you need:
■ Built-in mechanisms for handling exceptions
■ Built-in REST API for executing Interactive Queries
■ Consumer offset lag monitoring
■ Topology visualization
■ Dashboard UI
40. Takeaways: roadmap
■ Add support for querying stale stores.
■ Add support for deploying and managing Kafka Streams topologies directly in Kubernetes.
❏ i.e., KubStreamsExecutionEnvironment
■ Enhance the WebUI with visualizations of the key metrics to monitor.
41. Takeaways: links
Official Website: https://www.azkarrastreams.io/
GitHub: https://github.com/streamthoughts/azkarra-streams (for contributing and adding ⭐)
Slack: https://communityinviter.com/apps/azkarra-streams/azkarra-streams-community
Demo: https://github.com/streamthoughts/demo-kafka-streams-scottify
Join us on Slack!