Envoy & Kafka
Adam Kotwasinski
Principal Software Development Engineer
Workday
Agenda
01 What is Kafka?
02 Proxying Kafka
03 Envoy as Kafka proxy
04 Envoy Kafka broker filter
05 Envoy Kafka mesh filter
What is Kafka?
• Streaming platform for sending (producing) and receiving (consuming) records
• Records are stored in topics, which are divided into partitions
‒ a partition is the unit of assignment
‒ a single consumer can have multiple partitions assigned
• High throughput
‒ producer records are stored as-is (no record format translation)
‒ zero-copy implementation
• Re-reading
‒ a record can be consumed multiple times (unlike in typical messaging solutions)
• Durability
‒ partitions are replicated to other brokers in the cluster (replication factor)
‒ topics have a time- or size-based retention configuration
Capabilities
• Some examples (https://kafka.apache.org/uses):
‒ messaging (topics == queues),
‒ website activity tracking (e.g. one topic per activity type; high volume, as many client actions produce records),
‒ metrics,
‒ log aggregation (abstracts away log files and puts all logs in a single place),
‒ external commit log.
Rich ecosystem
• Raw clients: consumer, producer, admin
‒ the official Java Apache Kafka client, librdkafka for C & C++
• Wrappers / frameworks
‒ spring-kafka, alpakka, smallrye
• Kafka Streams API
‒ stream-friendly DSL: map, filter, join, group-by
• Kafka Connect
‒ framework for defining source and sink connectors
‒ allows pulling data from / pushing data into other services
◦ for example: Redis, SQL databases, Hadoop
Kafka cluster
• A Kafka cluster is composed of one or more Kafka brokers that store the partitions.
• A topic is composed of one or more partitions.
Partition
• A partition is effectively an append-only list of records.
• Producers append only at the end of a partition.
• Consumers can consume from any offset.
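The append-only model can be illustrated with a minimal in-memory sketch, assuming records are plain strings. `PartitionLog` and its methods are hypothetical names chosen for illustration, not a Kafka API: appends only go to the end and return the new record's offset, while reads may start at any offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory model of a single partition (illustration only).
class PartitionLog {
public:
    // Producers can only append at the end; the returned value is the record's offset.
    int64_t append(const std::string& record) {
        records_.push_back(record);
        return static_cast<int64_t>(records_.size()) - 1;
    }

    // Consumers may read from any valid offset; returns at most max_records records.
    std::vector<std::string> read(int64_t offset, std::size_t max_records) const {
        assert(offset >= 0);
        std::vector<std::string> result;
        for (auto i = static_cast<std::size_t>(offset);
             i < records_.size() && result.size() < max_records; ++i) {
            result.push_back(records_[i]);
        }
        return result;
    }

    // Offset that the next appended record will receive.
    int64_t end_offset() const { return static_cast<int64_t>(records_.size()); }

private:
    std::vector<std::string> records_;
};
```

Because reading never removes anything, two consumers can read the same offsets independently, which is what makes re-reading possible.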
Record
• A record consists of a key, a value and headers.
• https://kafka.apache.org/documentation/#record
Kafka Producer
• Producers append records to the end of a partition.
• Configurable batching capabilities (batch.size and linger.ms).
• The target partition is chosen depending on the producer's configuration
(org.apache.kafka.clients.producer.internals.DefaultPartitioner):
‒ if a partition is provided explicitly, use that partition;
‒ if a key is present, use hash(key) % partition count;
‒ if neither a partition nor a key is present, use the same partition for a whole batch;
‒ the latter two cases require the producer to know how many partitions the topic has
(this will be important for the Kafka mesh filter).
• Broker acknowledgements (acks):
‒ leader replica only (acks = 1),
‒ all in-sync replicas (acks = all),
‒ no confirmation (acks = 0).
• Transactional and idempotent producer capabilities.
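The three partition-selection rules can be sketched as follows. This is a simplified illustration, not the Java client's code: it uses `std::hash` as a stand-in for the murmur2 hash of the Java DefaultPartitioner (so the resulting partition numbers will differ from a real client), and the batch ("sticky") partition for keyless records is simply passed in as a parameter. `choose_partition` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Simplified partitioner sketch (hypothetical helper, not a Kafka API).
int32_t choose_partition(const std::optional<int32_t>& explicit_partition,
                         const std::optional<std::string>& key,
                         int32_t partition_count,
                         int32_t batch_partition) {
    assert(partition_count > 0);
    // 1. An explicitly provided partition always wins.
    if (explicit_partition) {
        return *explicit_partition;
    }
    // 2. With a key: hash(key) % partition count. std::hash is a stand-in for murmur2.
    if (key) {
        const std::size_t h = std::hash<std::string>{}(*key);
        return static_cast<int32_t>(h % static_cast<std::size_t>(partition_count));
    }
    // 3. No partition and no key: keep using the partition chosen for the current batch
    //    (a real client picks that partition using the partition count as well).
    return batch_partition;
}
```

The key-based rule is why records with the same key end up in the same partition, and why the partition count must be known: changing it changes where keys land.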
Kafka Consumer
• A consumer specifies which topics / partitions it wants to poll records from.
• Partition assignment can be either explicit (assign API) or cluster-managed (subscribe API).
‒ The subscribe API requires a consumer group id.
• Records are received starting from the consumer's current position.
‒ The position can be changed with the seek API (similar to any file-reader API).
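A hypothetical single-partition consumer can illustrate the position and seek semantics. The class name and the in-memory vector standing in for a partition are assumptions for illustration; a real consumer fetches records from brokers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical consumer of a single in-memory "partition" (illustration only).
class PositionedConsumer {
public:
    explicit PositionedConsumer(std::vector<std::string> partition)
        : partition_(std::move(partition)) {}

    // Returns up to max_records records starting at the current position,
    // then advances the position past them.
    std::vector<std::string> poll(std::size_t max_records) {
        std::vector<std::string> batch;
        while (batch.size() < max_records && position_ < partition_.size()) {
            batch.push_back(partition_[position_++]);
        }
        return batch;
    }

    // Like seek() on a file reader: the next poll() starts at `offset`.
    void seek(std::size_t offset) {
        assert(offset <= partition_.size());
        position_ = offset;
    }

    std::size_t position() const { return position_; }

private:
    std::vector<std::string> partition_;
    std::size_t position_ = 0;
};
```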
Consumer groups
• Kafka mechanism for automatically distributing partitions across consumer group members.
• Automatic rebalancing when group members join or die (detected via heartbeats).
• The strategy is configurable with the partition.assignment.strategy property.
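As an illustration of what an assignment strategy computes, here is a simplified range-style assignment for a single topic, similar in spirit to Kafka's range strategy: members are sorted, partitions are split into contiguous ranges, and the first `partition_count % members` members get one extra partition. `range_assign` is a hypothetical helper; the real assignors handle multiple topics and run inside the group protocol.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified range-style assignment of one topic's partitions (hypothetical helper).
std::map<std::string, std::vector<int>> range_assign(std::vector<std::string> members,
                                                     int partition_count) {
    assert(!members.empty() && partition_count >= 0);
    std::sort(members.begin(), members.end());
    const int n = static_cast<int>(members.size());
    const int per_member = partition_count / n;
    const int extra = partition_count % n;

    std::map<std::string, std::vector<int>> assignment;
    int next_partition = 0;
    for (int i = 0; i < n; ++i) {
        auto& owned = assignment[members[i]];  // members may end up with no partitions
        const int count = per_member + (i < extra ? 1 : 0);
        for (int j = 0; j < count; ++j) {
            owned.push_back(next_partition++);
        }
    }
    return assignment;
}
```

Rebalancing after a member joins or dies amounts to recomputing the assignment for the new member list.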
Consumer offsets
• Consumers can store their position either in an external system or in Kafka (internal topic __consumer_offsets).
• A committed offset is effectively a triple of group name, partition and offset.
• Java client:
‒ commitSync, commitAsync, configuration property enable.auto.commit
• Delivery semantics:
‒ at most once: the offset is committed before the record is processed,
‒ at least once: the offset is committed after the record is processed,
‒ exactly once: transaction API (if processing == writing to the same Kafka cluster), or storing the offset in an external system together with the processed data.
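The difference between at-most-once and at-least-once comes down to the order of "commit" and "process". The sketch below simulates a consumer that dies between those two steps; the offset store keyed by (group, partition) mirrors the triple above, while all names and the crash simulation are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Committed offsets keyed by (group name, partition).
using OffsetStore = std::map<std::pair<std::string, int>, int64_t>;

enum class Semantics { AtMostOnce, AtLeastOnce };

// Consumes `partition` from the committed offset. `crash_at` simulates the consumer
// dying between the two steps (commit / process) at that offset; -1 means no crash.
void consume(const std::vector<std::string>& partition, const std::string& group,
             int partition_id, Semantics semantics, int64_t crash_at,
             OffsetStore& store, std::vector<std::string>& processed) {
    const auto key = std::make_pair(group, partition_id);
    const auto found = store.find(key);
    int64_t offset = (found == store.end()) ? 0 : found->second;
    for (; offset < static_cast<int64_t>(partition.size()); ++offset) {
        const std::string& record = partition[static_cast<std::size_t>(offset)];
        if (semantics == Semantics::AtMostOnce) {
            store[key] = offset + 1;         // 1. commit first
            if (offset == crash_at) return;  //    crash: this record is never processed
            processed.push_back(record);     // 2. then process
        } else {
            processed.push_back(record);     // 1. process first
            if (offset == crash_at) return;  //    crash: this record will be redelivered
            store[key] = offset + 1;         // 2. then commit
        }
    }
}
```

After a restart, at-most-once skips the record that was being handled during the crash, while at-least-once processes it a second time.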
Protocol
• https://kafka.apache.org/31/protocol.html#protocol_api_keys
• Smart clients (producers, consumers) negotiate the protocol version
‒ the API-versions response contains a map of understood request types (with their supported version ranges)
• Automatic discovery of cluster members
‒ the metadata response contains cluster topology information:
◦ what topics are present
◦ how many partitions these topics have
◦ which brokers are leaders and replicas for the partitions
◦ brokers' host and port info
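Version negotiation can be sketched as picking, per API key, the highest version inside the overlap of the client's and the broker's ranges. This is an illustrative sketch with hypothetical names; API keys 0 (Produce) and 3 (Metadata) in the usage come from the protocol guide, but the version ranges are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Supported version range for one API key, as listed in an API-versions response.
struct VersionRange {
    int16_t min_version;
    int16_t max_version;
};

using ApiVersionMap = std::map<int16_t, VersionRange>;  // API key -> version range

// Returns the highest version of `api_key` supported by both sides, or
// std::nullopt if either side lacks the API or the ranges do not overlap.
std::optional<int16_t> negotiate_version(const ApiVersionMap& client, const ApiVersionMap& broker,
                                         int16_t api_key) {
    const auto c = client.find(api_key);
    const auto b = broker.find(api_key);
    if (c == client.end() || b == broker.end()) {
        return std::nullopt;
    }
    const int16_t low = std::max(c->second.min_version, b->second.min_version);
    const int16_t high = std::min(c->second.max_version, b->second.max_version);
    if (low > high) {
        return std::nullopt;
    }
    return high;
}
```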
Proxying Kafka
Kafka advertised.listeners
• The host and port of the Kafka broker that the client sends requests to come from the broker's advertised.listeners property.
• As we want our traffic to go through the proxy, the Kafka broker needs to advertise the socket the proxy is listening on.
• This requires configuration on both ends:
‒ the proxy needs to point to the Kafka broker,
‒ the Kafka broker needs to advertise the proxy's address instead of its own.
• This is not Envoy-specific.
Naïve proxying
• Naïve proxying makes the broker-to-broker traffic go through the proxy as well, since brokers also use the advertised addresses to reach each other.
Inter-broker traffic
• Brokers can be configured with multiple listeners, and we can specify which one to use for inter-broker traffic:
‒ inter.broker.listener.name
• This way, only external traffic is routed through the proxy.
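The listeners and advertised.listeners properties hold a comma-separated list of `NAME://host:port` entries, and inter.broker.listener.name picks one of those names for broker-to-broker traffic. The hypothetical parser below shows how a broker with two listeners can advertise the proxy to clients while other brokers bypass it. The listener names and host names in the usage are invented; in a real broker, custom listener names also need an entry in listener.security.protocol.map.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parses "NAME://host:port,NAME2://host:port" into NAME -> "host:port"
// (hypothetical helper, not part of any Kafka client).
std::map<std::string, std::string> parse_listeners(const std::string& value) {
    std::map<std::string, std::string> result;
    std::istringstream stream(value);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        const auto separator = entry.find("://");
        if (separator == std::string::npos) {
            throw std::invalid_argument("malformed listener: " + entry);
        }
        result[entry.substr(0, separator)] = entry.substr(separator + 3);
    }
    return result;
}
```

Clients that bootstrap through the EXTERNAL listener would receive the proxy address in metadata, while the INTERNAL address selected by inter.broker.listener.name keeps replication traffic off the proxy.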
Envoy as Kafka proxy
TCP proxy filter
• Envoy as TCP proxy for Kafka: Envoy is used as a proxy for Kafka without any custom code, only the TCP proxy filter.
• Kafka broker filter: uses the protocol deserializer to collect connection metrics (number of requests, processing time).
• Kafka mesh filter (consumer): allows a consumer to use a single entry point (Envoy) to consume data from multiple upstream Kafka clusters.
• Kafka protocol support: converts std::vector<unsigned char> into request/response objects.
• Kafka mesh filter (producer): receives and processes requests from producers, and sends the received records to multiple upstream Kafka clusters.
Proxying Kafka with Envoy
• To proxy a Kafka cluster with Envoy, we need to provide as many listeners as there are brokers.
• Each listener then uses the TCP proxy filter to point to one upstream Kafka broker (defined as a cluster in the Envoy configuration).
• The filter chain can then be enhanced with other filters.
• In general, a 1-1 mapping between brokers and Envoy listeners needs to be kept.
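The 1-1 mapping can be made concrete with a small planning helper: for every upstream broker it derives one Envoy listener address, the broker that listener's TCP proxy filter forwards to, and the value that broker must advertise. This is a hypothetical sketch assuming consecutive Envoy ports and a PLAINTEXT listener; it does not generate Envoy configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One Envoy listener per Kafka broker.
struct ListenerMapping {
    std::string envoy_listener;       // address Envoy listens on
    std::string upstream_broker;      // broker the TCP proxy filter forwards to
    std::string advertised_listener;  // value the broker must advertise to clients
};

// Hypothetical helper: assigns consecutive Envoy ports starting at base_port.
std::vector<ListenerMapping> plan_listeners(const std::vector<std::string>& brokers,
                                            const std::string& envoy_host, int base_port) {
    std::vector<ListenerMapping> plan;
    for (std::size_t i = 0; i < brokers.size(); ++i) {
        const std::string envoy_address =
            envoy_host + ":" + std::to_string(base_port + static_cast<int>(i));
        plan.push_back({envoy_address, brokers[i], "PLAINTEXT://" + envoy_address});
    }
    return plan;
}
```

If two brokers shared one Envoy listener, a client asking for a partition led by one broker could be forwarded to the other, which is why the mapping has to stay 1-1.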
Example: a single Envoy instance proxying two Kafka brokers
[Figures: Envoy config, broker 1 config, broker 2 config]
Envoy Kafka broker filter
Kafka protocol support in Envoy
Kafka message spec files
• The Kafka message protocol is described in language-agnostic (JSON) specification files.
• These files are used to generate the Java server/client code.
• The same files were used to generate the corresponding C++ code for Envoy (https://github.com/envoyproxy/envoy/pull/4950): Python templates generate headers that are included in the broker/mesh filter code.
• https://github.com/apache/kafka/tree/3.1.0/clients/src/main/resources/common/message
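Envoy's generated code is not reproduced here. As a hand-written illustration of what such code does (turning raw bytes into an object), the sketch below decodes a request header v1: INT16 request_api_key, INT16 request_api_version, INT32 correlation_id and a NULLABLE_STRING client_id (INT16 length, -1 meaning null), all big-endian. It assumes the 4-byte message size prefix has already been stripped; `ByteReader` and `decode_request_header_v1` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Request header v1 (v2 additionally carries tagged fields, not handled here).
struct RequestHeaderV1 {
    int16_t api_key;
    int16_t api_version;
    int32_t correlation_id;
    std::optional<std::string> client_id;  // NULLABLE_STRING: length -1 means null
};

// Minimal big-endian reader over raw bytes.
class ByteReader {
public:
    explicit ByteReader(const std::vector<unsigned char>& data) : data_(data) {}

    int16_t read_int16() { return static_cast<int16_t>(read_unsigned(2)); }
    int32_t read_int32() { return static_cast<int32_t>(read_unsigned(4)); }

    std::string read_string(std::size_t length) {
        require(length);
        const auto begin = data_.begin() + static_cast<std::ptrdiff_t>(pos_);
        std::string s(begin, begin + static_cast<std::ptrdiff_t>(length));
        pos_ += length;
        return s;
    }

private:
    uint32_t read_unsigned(std::size_t byte_count) {
        require(byte_count);
        uint32_t value = 0;
        for (std::size_t i = 0; i < byte_count; ++i) {
            value = (value << 8) | data_[pos_++];
        }
        return value;
    }

    void require(std::size_t n) const {
        if (pos_ + n > data_.size()) {
            throw std::out_of_range("truncated Kafka request header");
        }
    }

    const std::vector<unsigned char>& data_;
    std::size_t pos_ = 0;
};

// Decodes the header fields in protocol order.
RequestHeaderV1 decode_request_header_v1(const std::vector<unsigned char>& bytes) {
    ByteReader reader(bytes);
    RequestHeaderV1 header{};
    header.api_key = reader.read_int16();
    header.api_version = reader.read_int16();
    header.correlation_id = reader.read_int32();
    const int16_t client_id_length = reader.read_int16();
    if (client_id_length >= 0) {
        header.client_id = reader.read_string(static_cast<std::size_t>(client_id_length));
    }
    return header;  // negative length: client_id stays null
}
```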
Request header
• Each Kafka request carries a correlation id set by the client (typically an increasing sequence number), and the broker echoes it in the response.
• https://kafka.apache.org/31/protocol.html#protocol_messages
• This allows us to match a response with its request, as we can keep track of when a request with a particular id was received.
‒ absl::flat_hash_map<int32_t, MonotonicTime> request_arrivals_ (filter.h)
• Request headers (version 1+) also contain a client identifier.
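The request/response matching idea can be sketched with standard types: remember the arrival time per correlation id, and compute the elapsed time when the response with the same id comes back. This is similar in spirit to `request_arrivals_`, but it is not the Envoy code; `std::unordered_map` and `std::chrono::steady_clock` stand in for `absl::flat_hash_map` and Envoy's `MonotonicTime`, and `RequestTimer` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Remembers when each request arrived, keyed by correlation id.
class RequestTimer {
public:
    using Clock = std::chrono::steady_clock;

    void on_request(int32_t correlation_id, Clock::time_point now = Clock::now()) {
        arrivals_[correlation_id] = now;
    }

    // Returns the time elapsed since the matching request, or std::nullopt if the
    // correlation id is unknown.
    std::optional<Clock::duration> on_response(int32_t correlation_id,
                                               Clock::time_point now = Clock::now()) {
        const auto it = arrivals_.find(correlation_id);
        if (it == arrivals_.end()) {
            return std::nullopt;
        }
        const Clock::duration elapsed = now - it->second;
        arrivals_.erase(it);
        return elapsed;
    }

    std::size_t in_flight() const { return arrivals_.size(); }

private:
    std::unordered_map<int32_t, Clock::time_point> arrivals_;
};
```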
Kafka broker filter
Kafka broker filter features
• https://www.envoyproxy.io/docs/envoy/v1.22.0/configuration/listeners/network_filters/kafka_broker_filter
• Intermediary filter, intended to be placed in the filter chain before the TCP proxy filter that sends the traffic to the upstream broker.
• Currently, data is passed through without any changes.
• Captures:
‒ request metrics,
‒ request processing time.
• Entry point for future features (e.g. filtering by client identifier).
• https://adam-kotwasinski.medium.com/deploying-envoy-and-kafka-8aa7513ec0a0
Broker filter demo (video)
Envoy Kafka mesh filter
Kafka mesh filter (producer): current state
Motivation
• Use Envoy as a facade for multiple Kafka clusters.
• Clients are not aware of the Kafka clusters; Envoy performs the necessary traffic routing.
• Received Kafka requests are routed to the correct clusters depending on the filter configuration.
Kafka mesh filter
• Terminal filter in the Envoy filter chain.
• https://www.envoyproxy.io/docs/envoy/v1.22.0/configuration/listeners/network_filters/kafka_mesh_filter
• From the client's perspective, an Envoy instance acts as a Kafka broker in a single-broker cluster.
• Upstream connections are handled by embedded librdkafka producer instances.
• https://adam-kotwasinski.medium.com/kafka-mesh-filter-in-envoy-a70b3aefcdef
Typical flow
1. The filter instance pretends to be a broker in a single-broker cluster.
2. All requested partitions are hosted by the “Envoy broker”.
3. When Produce requests are received, the filter extracts the records.
4. Extracted records are forwarded to embedded librdkafka producers pointing at upstream clusters.
‒ The upstream is chosen depending on the forwarding rules.
5. The filter waits for all delivery responses (failures included) before sending the response back downstream.
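Step 5 is essentially a countdown: the downstream response is sent only after every forwarded record has a delivery result. The hypothetical aggregator below illustrates that rule; in the real filter the results come from librdkafka delivery reports, and the callback that writes the response is Envoy-specific and not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical aggregator for one downstream Produce request.
class PendingProduceRequest {
public:
    using Callback = std::function<void(const std::vector<bool>&)>;

    PendingProduceRequest(std::size_t record_count, Callback send_downstream)
        : delivered_(record_count, false),
          succeeded_(record_count, false),
          remaining_(record_count),
          send_downstream_(std::move(send_downstream)) {}

    // Invoked once per record, e.g. from an upstream producer's delivery callback.
    void on_delivery(std::size_t index, bool success) {
        assert(index < delivered_.size() && !delivered_[index]);
        delivered_[index] = true;
        succeeded_[index] = success;
        if (--remaining_ == 0) {
            send_downstream_(succeeded_);  // all upstreams finished (or failed)
        }
    }

private:
    std::vector<bool> delivered_;
    std::vector<bool> succeeded_;
    std::size_t remaining_;
    Callback send_downstream_;
};
```

Failures count as "done" as well, so a single failing upstream cannot leave the downstream client waiting forever.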
API-Versions & Metadata
• API-Versions response
‒ the filter supports only a limited subset of Kafka requests:
◦ API-Versions, to negotiate the request versions with clients
◦ Metadata, to make clients send all traffic to Envoy
◦ Produce, to receive the records and send them upstream to the real Kafka clusters
• Metadata response
‒ Broker host and port: required configuration properties of a filter instance
◦ same purpose as the broker's advertised.listeners property
‒ Partition counts for each topic
◦ required configuration properties in the upstream cluster definition
◦ this data is also used by the default partitioner if no key is present
◦ future improvement: fetch this configuration from the upstream cluster
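The metadata trick can be sketched as building a response in which the Envoy instance is the only broker and the leader of every partition, using the configured host, port and partition counts. `MetadataView` is a heavily simplified, hypothetical stand-in for the real Metadata response (which also carries replicas, ISR, error codes and more), and broker id 0 is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct PartitionInfo {
    int32_t partition;
    int32_t leader_id;
};

// Simplified, hypothetical view of a Metadata response.
struct MetadataView {
    int32_t broker_id;
    std::string host;
    int32_t port;
    std::map<std::string, std::vector<PartitionInfo>> topics;
};

// Envoy advertises itself as the single broker leading every partition, so clients
// send all Produce traffic to it. Host, port and partition counts come from configuration.
MetadataView build_metadata(const std::string& host, int32_t port,
                            const std::map<std::string, int32_t>& partition_counts,
                            const std::vector<std::string>& requested_topics) {
    constexpr int32_t kEnvoyBrokerId = 0;  // arbitrary id of the single "Envoy broker"
    MetadataView view{kEnvoyBrokerId, host, port, {}};
    for (const auto& topic : requested_topics) {
        const auto it = partition_counts.find(topic);
        if (it == partition_counts.end()) {
            continue;  // unknown topic: simply omitted in this sketch
        }
        auto& partitions = view.topics[topic];
        for (int32_t p = 0; p < it->second; ++p) {
            partitions.push_back(PartitionInfo{p, kEnvoyBrokerId});
        }
    }
    return view;
}
```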
Forwarding policy
• As the filter parses Kafka messages, it can extract the necessary information and pass it to the forwarding logic.
• The current implementation uses only topic names to decide which upstream cluster should be used.
‒ the first match in the configured prefix list wins,
‒ no match results in an exception (which closes the connection),
‒ KafkaProducer& getProducerForTopic(const std::string& topic) (upstream_kafka_facade.h).
• A single request can contain multiple records that map to multiple upstream clusters.
‒ The downstream response is sent after all upstreams have finished (or failed).
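The prefix-based policy (ordered rules, first match wins, no match is an error) can be sketched as below. This mirrors the described behaviour but is not the code behind `getProducerForTopic`; `PrefixRouter` and `UpstreamProducer` are hypothetical, the latter standing in for a librdkafka-backed producer.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an upstream producer (hypothetical).
struct UpstreamProducer {
    std::string cluster_name;
};

// Ordered (prefix, producer) rules: the first matching prefix wins.
class PrefixRouter {
public:
    void add_rule(std::string prefix, UpstreamProducer producer) {
        rules_.emplace_back(std::move(prefix), std::move(producer));
    }

    UpstreamProducer& producer_for_topic(const std::string& topic) {
        for (auto& [prefix, producer] : rules_) {
            if (topic.compare(0, prefix.size(), prefix) == 0) {
                return producer;
            }
        }
        // In the filter, an unmatched topic leads to the connection being closed.
        throw std::runtime_error("no upstream cluster configured for topic: " + topic);
    }

private:
    std::vector<std::pair<std::string, UpstreamProducer>> rules_;
};
```

Because rules are checked in order, a more specific prefix should be configured before a more general one.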
Embedded producer
• We create one Kafka producer instance (RdKafka::Producer) per Envoy worker thread (--concurrency) (source).
• Custom configuration for each upstream (e.g. acks, buffer size).
Mesh filter demo (video)
Kafka mesh filter (consumer): future
Kafka consumer types
Shared consumer
• A single consumer instance would handle multiple Fetch requests.
• Messages would be distributed across multiple downstream connections.
• Similar to Kafka REST proxy and Kafka consumer groups (but without partition assignment).
Dedicated consumer
• A new consumer for every downstream connection.
• Multiple connections could receive the same message.
• Consumer group support might be possible (would need to investigate JoinGroup and similar requests).
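The practical difference between the two proposed consumer types is how records reach downstream connections. The hypothetical sketch below contrasts them; round-robin distribution for the shared consumer is an assumption, since the design does not specify how messages would be spread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Deliveries = std::map<int, std::vector<std::string>>;  // connection id -> records

// Shared consumer: one upstream consumer feeds all downstream connections, and each
// record goes to exactly one of them (round-robin is an assumption of this sketch).
Deliveries distribute_shared(const std::vector<std::string>& records,
                             const std::vector<int>& connections) {
    Deliveries out;
    if (connections.empty()) {
        return out;
    }
    for (std::size_t i = 0; i < records.size(); ++i) {
        out[connections[i % connections.size()]].push_back(records[i]);
    }
    return out;
}

// Dedicated consumer: one upstream consumer per downstream connection, so every
// connection can receive every record (the same record may be delivered many times).
Deliveries distribute_dedicated(const std::vector<std::string>& records,
                                const std::vector<int>& connections) {
    Deliveries out;
    for (int connection : connections) {
        out[connection] = records;
    }
    return out;
}
```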
Q&A
Thank You

How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 

Kürzlich hochgeladen (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 

Envoy and Kafka

  • 2. Adam Kotwasinski, Principal Software Development Engineer, Workday
  • 3. Agenda
    01 What is Kafka?
    02 Proxying Kafka
    03 Envoy as Kafka proxy
    04 Envoy Kafka broker filter
    05 Envoy Kafka mesh filter
  • 5. What is Kafka?
    • A streaming solution for sending and receiving records.
    • Records are stored in topics, which are divided into partitions:
      ‒ a partition is the unit of assignment,
      ‒ a single consumer can have multiple partitions assigned.
    • High throughput:
      ‒ producer records are stored as-is (no record format translation),
      ‒ zero-copy implementation.
    • Re-reading:
      ‒ a record can be consumed multiple times (unlike in typical messaging solutions).
    • Durability:
      ‒ partitions are replicated to other brokers in the cluster (replication factor),
      ‒ topics have a time- or size-based retention configuration.
  • 6. Capabilities
    • Some examples (https://kafka.apache.org/uses):
      ‒ messaging (topics == queues),
      ‒ website activity tracking (e.g. a topic per activity type; high volume due to many client actions),
      ‒ metrics,
      ‒ log aggregation (abstracts away log files and puts all the logs in a single place),
      ‒ external commit log.
  • 7. Rich ecosystem
    • Raw clients: consumer, producer, admin:
      ‒ the official Java Apache Kafka client, librdkafka for C and C++.
    • Wrappers / frameworks:
      ‒ spring-kafka, alpakka, smallrye.
    • Kafka Streams API:
      ‒ stream-friendly DSL: map, filter, join, group-by.
    • Kafka Connect:
      ‒ a framework service for defining source and sink connectors,
      ‒ allows pulling data from and pushing data into other services, for example Redis, SQL databases, Hadoop.
  • 8. Kafka cluster
    • A Kafka cluster is composed of one or more Kafka brokers that store the partitions.
    • A topic is composed of one or more partitions.
  • 9. Partition
    • A partition is effectively an append-only list of records.
    • Producers append only at the end of a partition.
    • Consumers can consume from any offset.
  • 10. Record
    • A record consists of a key, a value and headers.
    • https://kafka.apache.org/documentation/#record
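Taken together, the partition and record slides describe a very small data model. The following C++ sketch models it in memory, assuming nothing beyond what the slides state; `Record` and `PartitionLog` are hypothetical names, and real brokers store partitions as segmented log files rather than in a `std::vector`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified record: optional key, value and headers.
// Kafka headers are an ordered list whose keys may repeat, hence a vector of pairs.
struct Record {
  std::optional<std::string> key;
  std::string value;
  std::vector<std::pair<std::string, std::string>> headers;
};

// Hypothetical in-memory partition: an append-only list in which a record's
// offset is simply its position in the list.
class PartitionLog {
 public:
  // Producers can only append at the end; returns the offset of the new record.
  int64_t append(Record record) {
    records_.push_back(std::move(record));
    return endOffset() - 1;
  }

  // Consumers may start reading at any offset in [0, endOffset()].
  std::vector<Record> read(int64_t offset, std::size_t max_records) const {
    if (offset < 0 || offset > endOffset()) {
      throw std::out_of_range("offset outside of partition");
    }
    std::vector<Record> result;
    for (int64_t i = offset; i < endOffset() && result.size() < max_records; ++i) {
      result.push_back(records_[static_cast<std::size_t>(i)]);
    }
    return result;
  }

  // Offset that the next appended record will receive.
  int64_t endOffset() const { return static_cast<int64_t>(records_.size()); }

 private:
  std::vector<Record> records_;
};
```

Offsets are never reused or rewritten here, which is what lets several consumers re-read the same data independently.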
  • 11. Kafka Producer
    • Producers append records to the end of a partition.
    • Batching is configurable (batch.size and linger.ms).
    • The target partition is chosen depending on the producer's configuration (org.apache.kafka.clients.producer.internals.DefaultPartitioner):
      ‒ if a partition is provided explicitly, use that partition,
      ‒ if a key is present, use hash(key) % partition count,
      ‒ if neither a partition nor a key is present, use the same partition for a whole batch;
      ‒ the latter two cases require the producer to know how many partitions a topic has (this will be important for the Kafka mesh filter).
    • Broker acknowledgements (acks):
      ‒ leader replica only (acks = 1),
      ‒ all in-sync replicas (acks = all),
      ‒ no confirmation (acks = 0).
    • Transaction / idempotence capabilities.
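A minimal C++ sketch of the three partition-selection rules, for illustration only. It assumes `std::hash` as the hash function, whereas the Java client hashes the serialized key with murmur2, so the partitions it picks will differ from a real producer's. The random choice in the keyless case and the names `choosePartition`, `StickyState` and `onBatchCompleted` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <random>
#include <string>

// Hypothetical per-topic state for the "no partition, no key" case:
// one partition is reused until the current batch is completed.
struct StickyState {
  std::optional<int32_t> current;
};

int32_t choosePartition(std::optional<int32_t> explicit_partition,
                        const std::optional<std::string>& key,
                        int32_t partition_count, StickyState& sticky,
                        std::mt19937& rng) {
  assert(partition_count > 0);
  // 1. An explicitly provided partition always wins.
  if (explicit_partition) {
    return *explicit_partition;
  }
  // 2. Key present: hash(key) % partition count (std::hash is an assumption;
  //    the Java client uses murmur2).
  if (key) {
    const std::size_t h = std::hash<std::string>{}(*key);
    return static_cast<int32_t>(h % static_cast<std::size_t>(partition_count));
  }
  // 3. Neither: pick a partition once and stick to it for the whole batch.
  if (!sticky.current) {
    std::uniform_int_distribution<int32_t> dist(0, partition_count - 1);
    sticky.current = dist(rng);
  }
  return *sticky.current;
}

// Called when a batch is sent, so that the next batch may use another partition.
void onBatchCompleted(StickyState& sticky) { sticky.current.reset(); }
```

Both the keyed and the keyless branch need `partition_count`, which is exactly the information the mesh filter later has to provide to its embedded producers.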
  • 12. Kafka Consumer
    • A consumer specifies which topics / partitions it wants to poll records from.
    • Partition assignment can be either explicit (assign API) or cluster-managed (subscribe API):
      ‒ the subscribe API requires a consumer group id.
    • Records are returned starting from the consumer's current position:
      ‒ the position can be changed with the seek API (similar to any file-reader API).
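The consumer position behaves much like a file cursor. This sketch models explicit assignment, polling and seeking over an in-memory cluster. `SimpleConsumer`, `TopicPartition` and `Cluster` are hypothetical and only loosely mirror the Java client's `assign`, `poll`, `seek` and `position` methods; in particular, real consumers start from a committed offset or the `auto.offset.reset` policy, not always from 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types: a partition is identified by (topic, partition number),
// and the "cluster" is just a map from partition to its record values.
using TopicPartition = std::pair<std::string, int32_t>;
using Cluster = std::map<TopicPartition, std::vector<std::string>>;

class SimpleConsumer {
 public:
  explicit SimpleConsumer(const Cluster& cluster) : cluster_(cluster) {}

  // Explicit assignment; simplification: every partition starts at offset 0.
  void assign(const std::vector<TopicPartition>& partitions) {
    positions_.clear();
    for (const auto& tp : partitions) positions_[tp] = 0;
  }

  // Returns up to max_records per partition from the current positions
  // and advances each position past the records returned.
  std::vector<std::string> poll(std::size_t max_records) {
    std::vector<std::string> out;
    for (auto& [tp, position] : positions_) {
      const auto& log = cluster_.at(tp);
      for (std::size_t n = 0;
           n < max_records && position < static_cast<int64_t>(log.size()); ++n) {
        out.push_back(log[static_cast<std::size_t>(position++)]);
      }
    }
    return out;
  }

  // Moves the position of one assigned partition, like seeking in a file.
  void seek(const TopicPartition& tp, int64_t offset) {
    if (offset < 0) throw std::out_of_range("negative offset");
    positions_.at(tp) = offset;
  }

  int64_t position(const TopicPartition& tp) const { return positions_.at(tp); }

 private:
  const Cluster& cluster_;
  std::map<TopicPartition, int64_t> positions_;
};
```

Seeking back to 0 after a poll makes the same records come back again, which is the re-reading capability from the "What is Kafka?" slide.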
  • 13. Consumer groups
    • A Kafka mechanism that automatically distributes partitions across consumer group members.
    • Automatic rebalancing when group members join or die (detected via heartbeats).
    • The strategy is configurable with the partition.assignment.strategy property.
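Conceptually, an assignment strategy is a function from group members and partitions to an assignment, re-run on every rebalance. The sketch below follows the round-robin idea behind Kafka's `RoundRobinAssignor` (one of the assignors that can be set via `partition.assignment.strategy`), but it is a simplified illustration rather than the client's code; `assignRoundRobin` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TopicPartition = std::pair<std::string, int32_t>;

// Hypothetical round-robin assignment: partitions are dealt out to members
// like cards. Sorting both inputs makes the result independent of the order
// in which members joined.
std::map<std::string, std::vector<TopicPartition>> assignRoundRobin(
    std::vector<std::string> members, std::vector<TopicPartition> partitions) {
  std::map<std::string, std::vector<TopicPartition>> assignment;
  if (members.empty()) return assignment;
  std::sort(members.begin(), members.end());
  std::sort(partitions.begin(), partitions.end());
  for (const auto& member : members) assignment[member];  // every member gets an entry
  for (std::size_t i = 0; i < partitions.size(); ++i) {
    assignment[members[i % members.size()]].push_back(partitions[i]);
  }
  return assignment;
}
```

In this model, a member joining or missing its heartbeats simply means calling the function again with the new member list.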
  • 14. Consumer offsets
    • Consumers can store their position either in an external system or in Kafka (the internal topic __consumer_offsets).
    • A committed offset is effectively a triple of group name, partition and offset.
    • Java client:
      ‒ commitSync, commitAsync, configuration property enable.auto.commit.
    • Delivery semantics:
      ‒ at most once: the offset is committed before the record is processed,
      ‒ at least once: the offset is committed after the record is processed,
      ‒ exactly once: the transaction API (if processing means writing to the same Kafka cluster), or storing the offset in an external system together with the processed data.
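At-most-once and at-least-once differ only in whether the offset is committed before or after processing. The simulation below makes a consumer crash while handling one record and then restart from its committed offset. The in-memory `OffsetStore`, keyed by a (group, topic, partition) triple, is a hypothetical stand-in for `__consumer_offsets`, and `consume` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical offset store: (group, topic, partition) -> next offset to consume.
using OffsetKey = std::tuple<std::string, std::string, int32_t>;
using OffsetStore = std::map<OffsetKey, int64_t>;

enum class Semantics { AtMostOnce, AtLeastOnce };

// Consumes `records` from the committed offset, appending each processed
// record to `processed`. At offset `crash_at` the consumer "dies" after the
// first of its two steps (commit or process); pass -1 for no crash.
void consume(const std::vector<std::string>& records, const OffsetKey& key,
             OffsetStore& store, Semantics semantics,
             std::vector<std::string>& processed, int64_t crash_at) {
  for (int64_t offset = store[key]; offset < static_cast<int64_t>(records.size()); ++offset) {
    const std::string& record = records[static_cast<std::size_t>(offset)];
    if (semantics == Semantics::AtMostOnce) {
      store[key] = offset + 1;          // commit first...
      if (offset == crash_at) return;   // ...crash here: the record is lost
      processed.push_back(record);      // ...then process
    } else {
      processed.push_back(record);      // process first...
      if (offset == crash_at) return;   // ...crash here: the commit never happens
      store[key] = offset + 1;          // ...then commit
    }
  }
}
```

For records r0, r1, r2, calling `consume` once with `crash_at = 1` and again with `crash_at = -1` leaves `processed` as {r0, r2} under at-most-once (r1 is lost) and {r0, r1, r1, r2} under at-least-once (r1 is processed twice).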
  • 15. Protocol
    • https://kafka.apache.org/31/protocol.html#protocol_api_keys
    • Smart clients (producers, consumers) negotiate the protocol version:
      ‒ the ApiVersions response contains a map of supported request types and their version ranges.
    • Automatic discovery of cluster members:
      ‒ the Metadata response contains cluster topology information:
        ‒ what topics are present,
        ‒ how many partitions these topics have,
        ‒ which brokers are leaders and replicas for each partition,
        ‒ the brokers' host and port info.
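Version negotiation boils down to intersecting version ranges: for each API key, the client can use the highest version that both sides support. A sketch under that assumption; `VersionRange`, `ApiVersions` and `negotiate` are hypothetical, while the API keys used in a real exchange (for example 0 for Produce, 3 for Metadata, 18 for ApiVersions) are listed in the protocol guide linked above.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical: supported [min, max] versions for one API key.
struct VersionRange {
  int16_t min_version;
  int16_t max_version;
};

// Hypothetical: api_key -> supported range, as one side would report it.
using ApiVersions = std::map<int16_t, VersionRange>;

// Returns the highest version supported by both sides, or nothing if the
// broker does not know the API at all or the ranges do not overlap.
std::optional<int16_t> negotiate(int16_t api_key, const ApiVersions& client,
                                 const ApiVersions& broker) {
  const auto c = client.find(api_key);
  const auto b = broker.find(api_key);
  if (c == client.end() || b == broker.end()) return std::nullopt;
  const int16_t low = std::max(c->second.min_version, b->second.min_version);
  const int16_t high = std::min(c->second.max_version, b->second.max_version);
  if (low > high) return std::nullopt;
  return high;
}
```

This is also why a proxy that parses Kafka traffic must advertise only the versions it can actually decode.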
  • 18. Kafka advertised.listeners
    • The host and port of the Kafka broker that a client sends requests to come from the broker's advertised.listeners property.
    • As we want our traffic to go through the proxy, the Kafka broker needs to advertise the socket the proxy is listening on.
    • This requires configuration on both ends:
      ‒ the proxy needs to point to the Kafka broker,
      ‒ the Kafka broker needs to advertise the proxy's address instead of its own.
    • This is not Envoy-specific.
  • 19. Naïve proxying
    • Naïve proxying makes the broker-to-broker traffic go through the proxy as well.
  • 20. Inter-broker traffic
    • Brokers can be configured with multiple listeners, and we can specify which one to use for inter-broker traffic (inter.broker.listener.name).
    • This way, only external traffic is routed through the proxy.
  • 21. Envoy as Kafka proxy
  • 22. TCP proxy filter
    • Envoy as TCP proxy for Kafka: Envoy is used as a proxy for Kafka without any custom code, using only the TCP proxy filter.
    • Kafka protocol support: change std::vector<unsigned char> into request/response objects.
    • Kafka broker filter: use the protocol deserializer to collect connection metrics (number of requests, processing time).
    • Kafka mesh filter (producer): receive and process requests from producers, and send received records to multiple upstream Kafka clusters.
    • Kafka mesh filter (consumer): allow a consumer to use a single entry point (Envoy) to consume data from multiple upstream Kafka clusters.
  • 23. Proxying Kafka with Envoy
    • To proxy a Kafka cluster with Envoy, we need to provide as many listeners as there are brokers.
    • Each listener then uses the TCP proxy filter to point to one upstream Kafka broker (defined as a cluster in the Envoy configuration).
    • The filter chain can then be extended with other filters.
    • In general, a 1-1 mapping between brokers and Envoy listeners needs to be kept.
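The requirements from the last few slides (a 1-1 mapping between brokers and listeners, with each broker advertising its Envoy listener) can be written as a consistency check over a deployment description. This C++ sketch is hypothetical: `ProxiedBroker` is not an Envoy or Kafka type. Each entry merely represents one Envoy listener with a TCP proxy filter, plus the matching broker's advertised.listeners setting.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Address = std::pair<std::string, uint16_t>;  // host, port

// Hypothetical description of one proxied broker.
struct ProxiedBroker {
  Address broker;          // upstream address the Envoy cluster points to
  Address envoy_listener;  // Envoy listener forwarding to that broker
  Address advertised;      // what the broker publishes in advertised.listeners
};

// True if brokers and listeners are in a 1-1 mapping and every broker
// advertises its own Envoy listener, so clients following Metadata
// responses never bypass the proxy.
bool isValidProxySetup(const std::vector<ProxiedBroker>& setup) {
  std::set<Address> brokers;
  std::set<Address> listeners;
  for (const auto& entry : setup) {
    if (!brokers.insert(entry.broker).second) return false;            // broker proxied twice
    if (!listeners.insert(entry.envoy_listener).second) return false;  // listener shared
    if (entry.advertised != entry.envoy_listener) return false;        // client would bypass Envoy
  }
  return true;
}
```

A shared listener breaks the mapping because the client addresses a specific broker (the partition leader) through it, and the TCP proxy filter alone cannot tell which broker a request was meant for.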
  • 24. Example: a single Envoy instance proxying two Kafka brokers (panels: Envoy config, broker 1 config, broker 2 config)
  • 26. Kafka protocol support in Envoy
  • 27. Kafka message spec files
    • The Kafka message protocol is described in language-agnostic specification files.
    • These files are used to generate the Java server/client code.
    • The same files were used to generate the corresponding C++ code for Envoy (https://github.com/envoyproxy/envoy/pull/4950): Python templates generate headers that are included in the broker/mesh filter code.
    • https://github.com/apache/kafka/tree/3.1.0/clients/src/main/resources/common/message
  • 28. Request header
    • Kafka messages have an increasing correlation id (sequence number).
    • https://kafka.apache.org/31/protocol.html#protocol_messages
    • This allows us to match a response with its request, as we can keep track of when a request with a particular id was received:
      ‒ absl::flat_hash_map<int32_t, MonotonicTime> request_arrivals_ (filter.h).
    • Requests (version 1+) also contain a client identifier.
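The header fields are simple enough to decode by hand. A minimal sketch, assuming the buffer starts right after the 4-byte message size and holds a non-flexible (version 1) request header: api_key INT16, api_version INT16, correlation_id INT32 and client_id NULLABLE_STRING, all big-endian. Newer flexible versions add tagged fields, which this sketch does not handle. `RequestHeader` and `parseRequestHeader` are hypothetical and much simpler than the generated deserializers used by Envoy.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation of a version 1 request header.
struct RequestHeader {
  int16_t api_key;
  int16_t api_version;
  int32_t correlation_id;
  std::optional<std::string> client_id;  // NULLABLE_STRING: length -1 means null
};

namespace {
// Reads an n-byte big-endian unsigned integer (n <= 4) and advances pos.
uint32_t readBigEndian(const std::vector<unsigned char>& buf, std::size_t& pos, std::size_t n) {
  if (pos + n > buf.size()) throw std::runtime_error("truncated request header");
  uint32_t value = 0;
  for (std::size_t i = 0; i < n; ++i) value = (value << 8) | buf[pos++];
  return value;
}
}  // namespace

RequestHeader parseRequestHeader(const std::vector<unsigned char>& buf) {
  std::size_t pos = 0;
  RequestHeader h{};
  h.api_key = static_cast<int16_t>(readBigEndian(buf, pos, 2));
  h.api_version = static_cast<int16_t>(readBigEndian(buf, pos, 2));
  h.correlation_id = static_cast<int32_t>(readBigEndian(buf, pos, 4));
  // Two's-complement reinterpretation: 0xFFFF becomes -1 (null client_id).
  const auto length = static_cast<int16_t>(readBigEndian(buf, pos, 2));
  if (length >= 0) {
    const auto len = static_cast<std::size_t>(length);
    if (pos + len > buf.size()) throw std::runtime_error("truncated client_id");
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
    h.client_id = std::string(first, first + static_cast<std::ptrdiff_t>(len));
  }
  return h;
}
```

The correlation id extracted here is the value a filter would use as the key when recording request arrival times.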
  • 29. Kafka broker filter
  • 30. Kafka broker filter features
    • https://www.envoyproxy.io/docs/envoy/v1.22.0/configuration/listeners/network_filters/kafka_broker_filter
    • An intermediary filter intended to be placed in the filter chain before the TCP proxy filter that sends the traffic to the upstream broker.
    • As of now, data is passed through without any changes.
    • Captures:
      ‒ request metrics,
      ‒ request processing time.
    • Entry point for future features (e.g. filtering by client identifier).
    • https://adam-kotwasinski.medium.com/deploying-envoy-and-kafka-8aa7513ec0a0
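The processing-time metric follows from the correlation id: remember when each request arrived and, when the response with the same id comes back, compute the elapsed time. A sketch that uses `std::unordered_map` and `std::chrono::steady_clock` in place of the filter's `absl::flat_hash_map<int32_t, MonotonicTime> request_arrivals_`; `RequestTracker` and its per-API-key counter are hypothetical and not Envoy's stats API.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <optional>
#include <unordered_map>

// Hypothetical tracker combining a per-API-key request counter with
// correlation-id based request/response matching.
class RequestTracker {
 public:
  using Clock = std::chrono::steady_clock;

  // Called once a request header has been parsed.
  void onRequest(int16_t api_key, int32_t correlation_id, Clock::time_point now) {
    ++request_counts_[api_key];
    request_arrivals_[correlation_id] = now;
  }

  // Called when the matching response is seen; returns the processing time,
  // or nothing if the correlation id is unknown.
  std::optional<Clock::duration> onResponse(int32_t correlation_id, Clock::time_point now) {
    const auto it = request_arrivals_.find(correlation_id);
    if (it == request_arrivals_.end()) return std::nullopt;
    const Clock::duration elapsed = now - it->second;
    request_arrivals_.erase(it);  // only in-flight requests stay in the map
    return elapsed;
  }

  uint64_t requestCount(int16_t api_key) const {
    const auto it = request_counts_.find(api_key);
    return it == request_counts_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<int32_t, Clock::time_point> request_arrivals_;
  std::map<int16_t, uint64_t> request_counts_;
};
```

One caveat a real implementation has to handle: Produce requests sent with acks = 0 get no response, so their entries would never be removed from this map.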
  • 33. Kafka mesh filter (producer): current state
  • 34. Motivation
    • Use Envoy as a facade for multiple Kafka clusters.
    • Clients are not aware of the Kafka clusters; Envoy performs the necessary traffic routing.
    • Received Kafka requests are routed to the correct clusters depending on the filter configuration.
  • 35. Kafka mesh filter
    • A terminal filter in the Envoy filter chain.
    • https://www.envoyproxy.io/docs/envoy/v1.22.0/configuration/listeners/network_filters/kafka_mesh_filter
    • From the client's perspective, an Envoy instance acts as a Kafka broker in a one-broker cluster.
    • Upstream connections are made by embedded librdkafka producer instances.
    • https://adam-kotwasinski.medium.com/kafka-mesh-filter-in-envoy-a70b3aefcdef
  • 37. Typical flow
    1. The filter instance pretends to be a broker in a single-broker cluster.
    2. All requested partitions are hosted by the "Envoy broker".
    3. When Produce requests are received, the filter extracts the records.
    4. Extracted records are forwarded to embedded librdkafka producers pointing at upstream clusters.
       ‒ The upstream is chosen depending on the forwarding rules.
    5. The filter waits for all delivery responses (failures too) before the response can be sent back downstream.
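Step 5 amounts to a countdown: the downstream Produce response is sent only once every extracted record has a delivery report, whether success or failure. A single-threaded sketch with hypothetical names (`PendingProduceRequest`, `DeliveryResult`); in the filter itself these reports come from the embedded librdkafka producers.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical delivery report for one record.
struct DeliveryResult {
  std::string topic;
  bool ok;  // false if the upstream producer reported a failure
};

// Hypothetical per-request aggregator: invokes `respond` exactly once,
// after `expected` delivery reports have arrived.
class PendingProduceRequest {
 public:
  using Callback = std::function<void(const std::vector<DeliveryResult>&)>;

  PendingProduceRequest(std::size_t expected, Callback respond)
      : expected_(expected), respond_(std::move(respond)) {
    if (expected_ == 0) respond_(results_);  // nothing to wait for
  }

  // Called once per record by whichever upstream handled it.
  void onDelivery(DeliveryResult result) {
    if (results_.size() == expected_) {
      throw std::logic_error("unexpected extra delivery report");
    }
    results_.push_back(std::move(result));
    if (results_.size() == expected_) respond_(results_);  // all finished or failed
  }

 private:
  std::size_t expected_;
  Callback respond_;
  std::vector<DeliveryResult> results_;
};
```

Because failures count towards completion, one unreachable upstream cannot leave the downstream client waiting forever; it gets an answer that includes the failure instead.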
  • 38. API-Versions & Metadata
    • API-Versions response:
      ‒ the filter supports only a limited subset of Kafka requests:
        ‒ API-Versions, to negotiate request versions with clients,
        ‒ Metadata, to make clients send all traffic to Envoy,
        ‒ Produce, to receive records and send them upstream to the real Kafka clusters.
    • Metadata response:
      ‒ the broker's host and port: required configuration properties of a filter instance,
        ‒ same purpose as the broker's advertised.listeners property;
      ‒ the number of partitions in each topic:
        ‒ a required configuration property in the upstream cluster definition,
        ‒ this data is also used by the default partitioner if a key is not present,
        ‒ future improvement: fetch this configuration from the upstream cluster.
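The Metadata part can be pictured as building a response in which Envoy is the only broker and the leader of every partition. The types below are hypothetical simplifications (a real Metadata response also carries replicas, ISR, error codes and more); the host, port and per-topic partition counts stand in for the filter configuration properties described above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified Metadata response types.
struct BrokerInfo {
  int32_t node_id;
  std::string host;
  int32_t port;
};

struct PartitionInfo {
  int32_t partition;
  int32_t leader_id;
};

struct MetadataResponse {
  std::vector<BrokerInfo> brokers;
  std::map<std::string, std::vector<PartitionInfo>> topics;
};

// Envoy advertises itself as the single broker and claims leadership of every
// configured partition, so clients send all Produce requests to it.
MetadataResponse buildMetadata(const std::string& advertised_host, int32_t advertised_port,
                               const std::map<std::string, int32_t>& partition_counts,
                               const std::vector<std::string>& requested_topics) {
  constexpr int32_t kEnvoyNodeId = 0;  // hypothetical fixed broker id
  MetadataResponse response;
  response.brokers.push_back({kEnvoyNodeId, advertised_host, advertised_port});
  for (const auto& topic : requested_topics) {
    const auto it = partition_counts.find(topic);
    if (it == partition_counts.end()) continue;  // unknown topic: omitted in this sketch
    auto& partitions = response.topics[topic];
    for (int32_t p = 0; p < it->second; ++p) partitions.push_back({p, kEnvoyNodeId});
  }
  return response;
}
```

The partition counts matter because client-side partitioners compute `hash(key) % partition count` from exactly this response.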
  • 39. Forwarding policy
    • As we can parse the Kafka messages, we can extract the necessary information and pass it to the forwarding logic.
    • The current implementation uses only topic names to decide which upstream cluster should be used:
      ‒ the first match in the configured prefix list wins,
      ‒ no match: an exception is thrown (which closes the connection),
      ‒ KafkaProducer& getProducerForTopic(const std::string& topic) (upstream_kafka_facade.h).
    • A single request can contain multiple records that map to multiple upstream clusters:
      ‒ the downstream response is sent after all upstreams have finished (or failed).
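A sketch of the first-match prefix rule. The `getProducerForTopic` signature above comes from Envoy's `upstream_kafka_facade.h`; this stand-alone version only mirrors the idea and returns an upstream cluster name instead of a `KafkaProducer&`. `ForwardingRule` and `UpstreamRouter` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical forwarding rule: topics starting with topic_prefix go to cluster_name.
struct ForwardingRule {
  std::string topic_prefix;
  std::string cluster_name;
};

class UpstreamRouter {
 public:
  explicit UpstreamRouter(std::vector<ForwardingRule> rules) : rules_(std::move(rules)) {}

  // First match in configuration order wins; no match throws, mirroring the
  // filter's behaviour of failing the request and closing the connection.
  const std::string& clusterForTopic(const std::string& topic) const {
    for (const auto& rule : rules_) {
      if (topic.compare(0, rule.topic_prefix.size(), rule.topic_prefix) == 0) {
        return rule.cluster_name;
      }
    }
    throw std::runtime_error("no forwarding rule for topic: " + topic);
  }

 private:
  std::vector<ForwardingRule> rules_;
};
```

Because the first match wins, rule order matters: with rules `orders` to cluster-a and `ord` to cluster-b, the topic `orders-eu` goes to cluster-a, but it would go to cluster-b if the two rules were swapped.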
  • 40. Embedded producer
    • One Kafka producer instance (RdKafka::Producer) is created per Envoy worker thread (--concurrency) (source).
    • Each upstream can have custom producer configuration (e.g. acks, buffer size).
  • 42. Kafka mesh filter (consumer): future
  • 43. Kafka consumer types
    • Shared consumer:
      ‒ a single consumer instance would handle multiple FetchRequests,
      ‒ messages would be distributed across multiple connections from downstream,
      ‒ similar to Kafka REST proxy and Kafka consumer groups (but without partition assignment).
    • Dedicated consumer:
      ‒ a new consumer for every downstream connection,
      ‒ multiple connections could receive the same message,
      ‒ consumer group support might be possible (JoinGroup and similar requests would need to be investigated).
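The two proposed consumer types differ in which connection receives a given record. The sketch below shows only that delivery difference, with no networking; the round-robin policy for the shared consumer is an assumption for illustration, and `Deliveries`, `distributeShared` and `distributeDedicated` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: each inner vector holds the records delivered to one downstream connection.
using Deliveries = std::vector<std::vector<std::string>>;

// Shared consumer: one upstream consumer, each record goes to exactly one
// connection (round-robin is an assumed policy).
Deliveries distributeShared(const std::vector<std::string>& records, std::size_t connections) {
  Deliveries out(connections);
  if (connections == 0) return out;
  for (std::size_t i = 0; i < records.size(); ++i) {
    out[i % connections].push_back(records[i]);
  }
  return out;
}

// Dedicated consumers: one upstream consumer per connection, so every
// connection reads the full stream and may receive the same records.
Deliveries distributeDedicated(const std::vector<std::string>& records, std::size_t connections) {
  return Deliveries(connections, records);
}
```

The shared variant behaves like a consumer group from the client's point of view (each record handled once), while the dedicated variant behaves like independent consumers, each with its own position.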
  • 44. Q&A