Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ

KAFKA SUMMIT 2021
Cloud-Native Kafka

09.02.2021
Sascha Holtbrügge
Big Data Architect
SVA System Vertrieb Alexander GmbH
sascha.holtbruegge@sva.de
Sascha Bleckmann
Big Data Engineer
SVA System Vertrieb Alexander GmbH
sascha.bleckmann@sva.de

CLOUD-NATIVE KAFKA
Big Data Analytics & IoT @ SVA
• more than 170 consultants
• broad range of backgrounds: engineers, physicists, mathematicians, computer scientists, statisticians, psychologists, …
• all roles required for an end-to-end big data solution
  • Data Scientist: algorithms and statistics
  • Data Engineer: development
  • Architect: IT infrastructure, platform
  • Managed Services: operations
  • DevOps: agility and methodology
• strong ecosystem (Confluent, Elastic, Splunk, …)

What's the meaning of …
"CLOUD NATIVE"?

Cloud native technologies empower
organizations to build and run scalable
applications in modern, dynamic environments
such as public, private, and hybrid clouds.
CNCF Cloud Native Definition v1.0

Cloud Native
• What are the key points of cloud native technologies?
  • Best practices: containers, service meshes, microservices, "immutable infrastructure", and "declarative APIs"
  • Loosely coupled systems
  • Decoupling of infrastructure and platform
  • Hardware and operating systems become transparent to the application and are considered disposable resources
• What would be a feasible approach to achieve this goal?

Building a platform on top of an infrastructure
How can a single container platform be operated on dedicated hardware resources?

Kubernetes!

Kubernetes
• "Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications."
• Open source project within the Cloud Native Computing Foundation
• Builds on the principles of Google's "Borg", the system that powers the Google platform
• Kubernetes enables complex orchestration and deployment scenarios
• Declarative approach: "manifests" describe the desired state of resources
• Kubernetes introduces the "Pod" as the smallest unit: one or more containers that are run together on a "Worker Node"
• The "kubelet" process on each node runs the pods assigned to that node
• Storage, network, … are considered disposable resources, managed by Kubernetes for use by pods
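
As a minimal sketch of the declarative approach described above, the following manifest describes a Pod with two containers that are run together on one worker node. The names `demo-pod`, `web`, and `sidecar` and both images are illustrative assumptions, not taken from the talk.

```yaml
# Illustrative Pod manifest; all names and images are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  labels:
    app: demo
spec:
  containers:
    # Main application container
    - name: web
      image: nginx:1.19
      ports:
        - containerPort: 80
    # Second container in the same Pod; it runs on the same node
    # and shares the Pod's network namespace with "web"
    - name: sidecar
      image: busybox:1.32
      command: ["sh", "-c", "while true; do sleep 3600; done"]
```

Applying such a file (for example with `kubectl apply -f pod.yaml`) only states the desired result; the scheduler picks a node and the kubelet on that node starts the containers.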

Provisioning the Kubernetes platform
We introduced a shared platform by decoupling the infrastructure, using Kubernetes:
• … processes to be run can be described declaratively!
  • The configuration of the application's environment can be prepared by the development team
  • Containers already ship with their respective runtime environment
• … the system is resilient to failure and provides load balancing!
  • Resources are shared fairly between all services running on the platform
  • The failure of a node can be compensated by other nodes in the cluster
• … new problems are appearing on the horizon.
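
The points above can be sketched with a Deployment manifest. This is an illustrative example under assumptions: the name `demo-service`, the image `registry.example.com/demo-service:1.0.0`, and the `LOG_LEVEL` variable are hypothetical.

```yaml
# Illustrative Deployment; names, image, and values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service
spec:
  replicas: 3                 # desired number of Pods, spread over the nodes
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      containers:
        - name: app
          image: registry.example.com/demo-service:1.0.0
          env:
            # Environment configuration prepared by the development team
            - name: LOG_LEVEL
              value: "info"
          resources:
            requests:         # used by the scheduler to place the Pod
              cpu: 250m
              memory: 256Mi
            limits:           # upper bound enforced at run time
              cpu: 500m
              memory: 512Mi
```

If a node fails, the Deployment's controller notices that fewer than three Pods are running and creates replacements on the remaining nodes.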

Data Integration between source and sink
[Diagram: a source sending data to a sink]
• What happens if …
  • … the sink is not reachable?
  • … the source transmits more data in a short period than the sink can handle?
  • … the data has an unexpected format?
  • … the number of participating sources and sinks keeps growing?
  • … the overall load on the system increases?

Increasing system complexity
[Diagram: multiple sources and sinks exchanging data]

Why Apache Kafka?
• Apache Kafka is a scalable, reliable, and distributed streaming platform, optimized for high data rates and throughput:
  • Out-of-the-box integrations, for example with legacy services, databases, and external services
  • Guarantees regarding delivery and ordering of messages
  • High throughput, even under very high message load
  • Loosely coupled sources and sinks
  • Real-time transmission
  • Horizontal scalability
  • Open source project, with commercial features available on top

Kafka is not a Message Queue …
… but a Streaming Platform!

Kafka vs. Message Queue
Apache Kafka
• Producers and consumers are completely decoupled
  • Producers write data as soon as it is available
  • Consumers read and process data at the pace they can handle, from the sources they choose
• Consumers are organized in consumer groups
  • Load is balanced between all members of a consumer group
• Messages can be replayed, even if they have already been processed
• If needed, messages can even be stored indefinitely, for example using the Tiered Storage feature of Confluent Platform 6.0!
• With the right architectural decisions, the system scales virtually without limit
Message Queue
• Processing in the fashion of a "command queue"
• Replaying messages is normally not intended
• Routing, processing, and filtering of messages can happen inside the MQ system itself
  • i.e., business logic has to be managed in the MQ system
• A "subscription" must be registered before messages are delivered
• Messages are pushed to the respective subscribers by the MQ
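
The replay and retention behaviour on the Kafka side is controlled per topic. As a hedged sketch, the following uses the Strimzi `KafkaTopic` resource that the talk introduces on a later slide. The topic name `sensor-events` and the partition count are assumptions, and the `apiVersion` follows the `v1beta1` used elsewhere in the slides (it depends on the Strimzi release). This plain retention setting is independent of the Tiered Storage feature mentioned above.

```yaml
# Illustrative topic definition; name and sizing are assumptions.
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: sensor-events
  labels:
    strimzi.io/cluster: my-cluster   # Kafka cluster this topic belongs to
spec:
  partitions: 6        # upper bound for parallel consumers within one group
  replicas: 3          # copies of each partition across brokers
  config:
    retention.ms: -1     # no time-based deletion: messages stay replayable
    retention.bytes: -1  # no size-based deletion
```

Because messages remain in the log after being read, a consumer group can reset its offsets and process the same data again, which a classic message queue normally does not offer.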

Kafka on Kubernetes
Operating a Kafka cluster, especially on Kubernetes, poses some distinct challenges:
• Kafka's brokers are stateful applications
  • The broker ID assigned to a broker must not change
  • Each broker has its own data persistence layer, and that assignment must not change either
• ZooKeeper is needed to operate a Kafka cluster
  • It is a stateful application as well
  • Used for coordination of brokers, as well as for metadata on topics, ACLs, …
• Further components need to be configured along with the cluster
  • Kafka Connect, ksqlDB, …
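
Plain Kubernetes addresses the two broker requirements above (stable identity, stable storage assignment) with a StatefulSet. The following is a generic sketch, not the manifest any particular operator generates: the name `demo-broker`, the image, and the mount path are assumptions.

```yaml
# Generic StatefulSet sketch; names, image, and paths are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: demo-broker
spec:
  clusterIP: None            # headless service: gives each Pod a stable DNS name
  selector:
    app: demo-broker
  ports:
    - name: client
      port: 9092
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-broker
spec:
  serviceName: demo-broker
  replicas: 3                # Pods are named demo-broker-0, -1, -2
  selector:
    matchLabels:
      app: demo-broker
  template:
    metadata:
      labels:
        app: demo-broker
    spec:
      containers:
        - name: broker
          image: registry.example.com/kafka-broker:placeholder
          ports:
            - name: client
              containerPort: 9092
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka
  volumeClaimTemplates:      # one PersistentVolumeClaim per Pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 16Gi
```

The stable ordinal in the Pod name can serve as the basis for the broker ID, and the claim `data-demo-broker-0` always follows the Pod `demo-broker-0`. Deriving the broker ID at startup, ZooKeeper, and the remaining configuration are not shown here, which is part of why the next slides turn to operators.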

How to manage a system of such complexity
efficiently on the Kubernetes platform?
Operator Pattern!

Operator Pattern
[Diagram: Operator, Custom Resources, and the managed Deployments, ConfigMaps, PVCs, …]
Operators allow automated deployment of applications on Kubernetes:
• Custom APIs are declared in Kubernetes by defining "Custom Resource Definitions" (CRDs)
• Kubernetes provides the technical foundation: the API server handles resource lifecycle management
• An operator is a dedicated process in the cluster that uses the Kubernetes API to watch and manage the state of custom resources (the instances of a CRD)
• Changes to a custom resource trigger a reconciliation
• The operator manages the dependent Kubernetes objects, such as Deployments, ConfigMaps, …
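
To make the first bullets concrete, here is a minimal sketch of a CRD and one custom resource of that type. The API group `example.com`, the kind `DemoCluster`, and the `replicas` field are hypothetical and exist only for illustration.

```yaml
# Hypothetical CRD; group, kind, and schema are assumptions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: democlusters.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: democlusters
    singular: democluster
    kind: DemoCluster
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
---
# A custom resource of the new kind; an operator would watch objects like this
apiVersion: example.com/v1alpha1
kind: DemoCluster
metadata:
  name: my-demo
spec:
  replicas: 3
```

The CRD alone only teaches the API server to store and validate `DemoCluster` objects. The operator process is what reacts to them, for example by creating or updating Deployments and ConfigMaps until the actual state matches `spec`.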

Kafka Operators for Kubernetes
Strimzi Kafka Operator
• Open source project
  • https://github.com/strimzi/strimzi-kafka-operator
• Kafka's components as Custom Resource Definitions
  • Kafka, KafkaConnect, KafkaConnector, …
• Topics can be created and managed as custom resources
Confluent Operator
• Commercial product with enterprise platform
  • Confluent Platform may be used alongside Kafka
• The current operator is based on a stacked Helm chart, but:
  • Confluent Operator 2.0 has reached the "Early Access" phase and is based entirely on Custom Resource Definitions!

Kafka Operators for Kubernetes

"Strimzi" open source operator (https://github.com/strimzi/strimzi-kafka-operator):

    # Kafka cluster as a Strimzi custom resource (abbreviated on the slide)
    apiVersion: kafka.strimzi.io/v1beta1
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        replicas: 3                # number of brokers
        listeners:
          - name: plain
            port: 9092
            type: internal         # reachable only from inside the Kubernetes cluster
            tls: false
        storage:
          type: persistent-claim   # one PersistentVolumeClaim per broker
          size: 16Gi
          deleteClaim: false       # keep the claims when the cluster is removed

Confluent Operator, Early Access Preview (https://github.com/confluentinc/operator-earlyaccess):

    # Kafka cluster as a Confluent Operator custom resource
    apiVersion: platform.confluent.io/v1beta1
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      replicas: 3                  # number of brokers
      image:
        application: confluentinc/cp-server-operator:6.0.0.0
        init: confluentinc/cp-init-container-operator:6.0.0.0
      dataVolumeCapacity: 16Gi
      metricReporter:
        enabled: true

How does a Kafka operator support that?
• The customer wants to operate a multi-tenant environment
  • The Kubernetes platform and the Kafka cluster are shared between all tenants
  • Each namespace is dedicated to a single tenant's environment
• Deployment and operation of externally developed applications
  • Applications are delivered as OCI-compatible containers along with Helm charts
  • The subset of deployed applications differs for every tenant
  • Still, every application communicates via Kafka as the central data hub
• Test environments should be started and stopped on the fly
  • Topics should be created and deleted dynamically along with the respective environments
  • Kafka connectors also need to be created and stopped automatically
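
With Strimzi, a connector for such a short-lived environment can be declared as a `KafkaConnector` resource and shipped together with the application's other manifests. The sketch below rests on assumptions: the connector name, the Kafka Connect cluster name `my-connect-cluster`, and the topic `tenant-a-test-events` are hypothetical, and the `apiVersion` depends on the Strimzi release (`v1alpha1` is assumed here).

```yaml
# Illustrative connector; names, topic, and file path are assumptions.
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: tenant-a-test-source
  labels:
    # Name of the KafkaConnect resource that should run this connector
    strimzi.io/cluster: my-connect-cluster
spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 1
  config:
    file: "/tmp/test-input.txt"
    topic: tenant-a-test-events
```

Creating the resource starts the connector and deleting it stops the connector again, so tearing down a test environment removes it along with the rest. This assumes the Kafka Connect cluster was set up to be managed through `KafkaConnector` resources; topics can be handled the same way with `KafkaTopic` resources.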

GitOps and Continuous Delivery
• Continuous Delivery is meant to accelerate the software delivery process while improving quality
  • Minimizing the duration of a single development cycle
  • Using deployment pipelines and well-defined processes to deliver the software product
• Cloud native technologies support this intent
  • Declarative definition of resources and decoupling of systems significantly decrease maintenance and administration overhead
  • Systems are less tightly coupled to each other, which reduces overall complexity
• ArgoCD is a Kubernetes operator that enables GitOps through Kubernetes manifests
  • Deployment of the desired Kubernetes resources is triggered by further Kubernetes resources
  • Continuous synchronisation with a Git repository
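
The "further Kubernetes resources" mentioned above are ArgoCD `Application` objects. A minimal sketch follows, assuming a hypothetical Git repository URL, repository path, and target namespace `kafka`; none of these values come from the talk.

```yaml
# Illustrative ArgoCD Application; repository, path, and namespaces are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kafka-platform
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/kafka-platform.git
    targetRevision: HEAD
    path: manifests/kafka          # directory containing the Kubernetes manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: kafka
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert manual changes made in the cluster
```

With automated sync enabled, a commit to the repository is enough to roll out a change, and the cluster state is continuously compared against what Git describes.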

Demo
ArgoCD deploying Strimzi Operator

Summary
• Cloud native technologies benefit from decoupled infrastructure
  • Kubernetes is a well-suited platform for this
• Kafka simplifies and consolidates data streams
  • Data streams are connected to a single central data hub
  • Kafka Connect moves data in and out, connecting sources and sinks
• Orchestrating complex applications on Kubernetes should be addressed with the operator pattern
  • Kafka can be operated very well on Kubernetes this way
  • Strimzi Operator and Confluent Operator can manage the whole life cycle of a Kafka instance
