Kafka basics
What’s Kafka?
• It’s an open-source message broker written in Scala and Java.
• It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
• Its design is heavily influenced by transaction logs.
Kafka is also…
• A distributed, partitioned, replicated commit log service.
• A stream-processing platform.
• A messaging system that supports both the queue and the publish/subscribe paradigms.
Kafka concepts
• Kafka maintains feeds of messages in categories called topics.
• Processes that publish messages to Kafka are called producers.
• Processes that subscribe to topics and process the feed of published messages are called consumers.
• Kafka runs as a cluster of one or more servers, each of which is called a broker.
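Data Retention
• Kafka retains all published messages, whether or not they have been consumed, for a configurable period of time.
• Because Kafka’s performance is effectively constant with respect to data size, retaining lots of data is not a problem.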
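Time-based retention can be pictured with a small toy model. This is a hypothetical in-memory sketch, not Kafka code: real brokers delete whole log segments rather than individual messages. For reference, the broker setting log.retention.hours defaults to 168 hours (7 days) and a topic can override it with retention.ms; the class and method names below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """Hypothetical toy log: drops messages older than retention_seconds."""
    retention_seconds: float
    entries: list = field(default_factory=list)  # (timestamp, message) pairs

    def append(self, message, now):
        self.entries.append((now, message))

    def enforce_retention(self, now):
        # Deletion depends only on age, not on whether anyone consumed the message.
        cutoff = now - self.retention_seconds
        self.entries = [(ts, m) for ts, m in self.entries if ts >= cutoff]


DAY = 24 * 3600
log = RetainedLog(retention_seconds=7 * DAY)  # 7 days, like the 168-hour default
log.append("a", now=0)
log.append("b", now=6 * DAY)
log.enforce_retention(now=8 * DAY)
print([m for _, m in log.entries])  # ['b']
```

Message "a" is removed at day 8 purely because it is older than the window; no consumer state is involved.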
Producers and Consumers
Producers send messages over the network to the Kafka cluster, which in turn serves them up to consumers.
The Topic
A topic is a category or feed name to which messages are published. For each topic, the Kafka cluster maintains a partitioned log.
The Partition
• Each partition is an ordered, immutable sequence of messages that is continually appended to.
• Each message in a partition is assigned a sequential number called the offset.
• The offset uniquely identifies each message within the partition.
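The append-only, offset-numbered structure of a partition can be sketched as a tiny class. This is a hypothetical toy, not Kafka’s storage format; it only shows that messages are never modified, new ones go at the end, and each gets the next sequential offset.

```python
class Partition:
    """Hypothetical toy partition: append-only, each message gets the next offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        offset = len(self._messages)  # offsets start at 0 and increase by 1
        self._messages.append(message)
        return offset

    def read(self, offset, max_messages=10):
        # Return (offset, message) pairs starting at the requested offset.
        end = offset + max_messages
        return list(enumerate(self._messages[offset:end], start=offset))


p = Partition()
for msg in ["created", "updated", "deleted"]:
    p.append(msg)
print(p.read(1))  # [(1, 'updated'), (2, 'deleted')]
```

One simplification: in Kafka, offsets keep increasing even after old messages are removed by retention, so an offset is not a list index the way it is here.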
More on partitions
• Partitions allow a topic’s log to scale beyond a size that would fit on a single server.
• A topic may have many partitions.
• Partitions also act as the unit of parallelism: different partitions can be written and read by different processes at the same time.
Partitions… again…
• Partitions are distributed over the servers in the Kafka cluster.
• Each partition is replicated across a configurable number of servers for fault tolerance.
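Spreading replicas over brokers can be illustrated with a simplified round-robin placement. This is an assumption-laden sketch, not Kafka’s exact assignment algorithm (which also randomizes the starting broker and shifts follower positions); treating the first replica in each list as the leader is likewise a convention of this toy.

```python
def assign_replicas(num_partitions, broker_ids, replication_factor):
    """Hypothetical round-robin replica placement (not Kafka's exact algorithm)."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(broker_ids)
    assignment = {}
    for p in range(num_partitions):
        # Replicas of one partition land on distinct brokers, and the starting
        # broker rotates so that leadership (first entry) is spread out.
        assignment[p] = [broker_ids[(p + r) % n] for r in range(replication_factor)]
    return assignment


print(assign_replicas(3, [101, 102, 103], 2))
# {0: [101, 102], 1: [102, 103], 2: [103, 101]}
```

In this output every broker is the first replica (leader) for one partition and a follower for another, which is the balance the following slides describe.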
Guess what… Yep, partitions…
• Each partition has one server which acts as the “leader”.
• Each partition has zero or more servers which act as “followers”.
• If the leader fails, one of the followers will become the new leader.
…
• The leader handles all requests for the partition, while the followers replicate the leader.
• Each server (also called a node or broker) acts as a leader for some of its partitions and a follower for others.
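Leader failover can be sketched as a function over a partition’s state. The state shape below (a leader plus a list of in-sync replicas, “isr”) is a hypothetical simplification: in Kafka the cluster controller picks the new leader from the in-sync replicas, and this toy just promotes the first surviving one.

```python
def handle_broker_failure(partition_state, failed_broker):
    """Toy failover: if the leader dies, promote the first surviving in-sync follower.

    partition_state is a hypothetical dict: {"leader": broker_id, "isr": [broker_ids]}.
    """
    isr = [b for b in partition_state["isr"] if b != failed_broker]
    leader = partition_state["leader"]
    if leader == failed_broker:
        # No in-sync replica left means the partition has no leader (offline).
        leader = isr[0] if isr else None
    return {"leader": leader, "isr": isr}


state = {"leader": 101, "isr": [101, 102, 103]}
state = handle_broker_failure(state, 101)
print(state)  # {'leader': 102, 'isr': [102, 103]}
```

If a follower fails instead, the leader stays in place and only the in-sync list shrinks.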
Producers
• Producers publish data to the topics of their choice.
• The producer is responsible for choosing which partition within the topic each message is assigned to. This can be done round-robin to balance load, or according to a key in the message.
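Key-based partition choice can be sketched as “hash the key, take it modulo the partition count”. This is a hypothetical toy: Kafka’s default partitioner hashes key bytes with murmur2, while crc32 is used here only as a stand-in, so the partition numbers will not match a real cluster. Rotating keyless messages round-robin is also a simplification.

```python
import itertools
import zlib


class KeyHashPartitioner:
    """Toy partitioner: same key -> same partition; keyless messages rotate."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key):
        if key is None:
            return next(self._round_robin)  # spread keyless messages evenly
        # crc32 stands in for Kafka's murmur2 hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


p = KeyHashPartitioner(num_partitions=4)
print(p.partition("user-42") == p.partition("user-42"))  # True
print([p.partition(None) for _ in range(5)])  # [0, 1, 2, 3, 0]
```

Sending all messages with the same key to the same partition is what lets the per-partition ordering guarantee apply per key.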
Consumers
• Kafka offers a single consumer abstraction called the consumer group.
• Consumers label themselves with a consumer group name.
• Each message published to a topic is delivered to one consumer instance within each subscribing consumer group.
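The “one consumer per group” rule can be sketched by assigning a topic’s partitions to each group’s consumers. Kafka ships several assignment strategies; this is a simplified round-robin under hypothetical group and consumer names. Consumers in one group split the partitions (queue-like), while each separate group still sees every message (publish/subscribe-like).

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment of a topic's partitions to one group's consumers."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def owners_of(partition, partitions, groups):
    """For each group, return the single consumer that receives messages from `partition`."""
    owners = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(partitions, consumers)
        owners[group] = next(c for c, owned in assignment.items() if partition in owned)
    return owners


partitions = [0, 1, 2, 3]
groups = {
    "billing": ["b1", "b2"],  # two consumers split the partitions (queue-like)
    "audit": ["a1"],          # a separate group gets every message too (pub/sub-like)
}
print(assign_partitions(partitions, groups["billing"]))  # {'b1': [0, 2], 'b2': [1, 3]}
print(owners_of(2, partitions, groups))                  # {'billing': 'b1', 'audit': 'a1'}
```

Because a partition goes to only one consumer per group, a group with more consumers than partitions leaves some consumers idle; this is why partitions are the unit of parallelism.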
Guarantees
• Messages sent by a producer to a particular topic partition will be appended in the order they are sent.
• A consumer instance sees messages in the order they are stored in the log.
• For a topic with replication factor N, Kafka will tolerate up to N-1 server failures without losing any messages committed to the log.
ZooKeeper
• ZooKeeper-based Kafka deployments use ZooKeeper to store metadata about the Kafka cluster, as well as consumer client details. Recent Kafka releases can instead run in KRaft mode, without ZooKeeper.
Avro
• Avro is a commonly preferred serialization format for Kafka messages.
• It’s platform- and language-independent.
• It allows schemas to evolve over time.
• Schemas are defined in JSON.
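Schemas-as-JSON and schema evolution can be illustrated together. The two schemas use Avro’s real record syntax (the record name and fields are hypothetical). The resolve function is a heavily simplified stand-in for Avro’s schema-resolution rules: it works on already-decoded Python dicts and only handles the case where a field missing from old data falls back to the reader schema’s default. In practice a library such as the official avro package or fastavro does this on binary data.

```python
import json

# Version 1 of a hypothetical record schema, written as plain JSON.
USER_V1 = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"}
  ]
}
""")

# Version 2 adds a field *with a default*, so readers using v2
# can still handle data written with v1.
USER_V2 = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve(record, reader_schema):
    """Toy resolution: fill fields missing from old data with the reader's defaults."""
    result = {}
    for f in reader_schema["fields"]:
        if f["name"] in record:
            result[f["name"]] = record[f["name"]]
        elif "default" in f:
            result[f["name"]] = f["default"]
        else:
            raise ValueError(f"no value or default for field {f['name']!r}")
    return result


old_record = {"name": "Ada"}         # as if decoded with USER_V1
print(resolve(old_record, USER_V2))  # {'name': 'Ada', 'email': None}
```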
Schema Registry
• It’s a REST service.
• It allows an Avro schema to be registered to one or more topics.
• It stores multiple versions of a schema.
• It validates schema compatibility.
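The registry’s version-and-validate behaviour can be modelled in memory. This is a hypothetical stand-in, not a client for any real REST API (the widely used Confluent Schema Registry, for example, accepts new versions via POST /subjects/{subject}/versions and defaults to backward compatibility). The compatibility rule here is simplified: every field a new schema adds must have a default. The subject name "users-value" is made up for the example.

```python
class ToySchemaRegistry:
    """In-memory stand-in for a schema registry: subjects -> list of versions."""

    def __init__(self):
        self._subjects = {}

    @staticmethod
    def _is_backward_compatible(new, old):
        # Simplified rule: fields added by the new (reader) schema need defaults,
        # so data written with the old schema can still be read.
        old_names = {f["name"] for f in old["fields"]}
        return all(f["name"] in old_names or "default" in f for f in new["fields"])

    def register(self, subject, schema):
        versions = self._subjects.setdefault(subject, [])
        if versions and not self._is_backward_compatible(schema, versions[-1]):
            raise ValueError(f"schema is not backward compatible for {subject!r}")
        versions.append(schema)
        return len(versions)  # 1-based version number

    def latest(self, subject):
        return self._subjects[subject][-1]


v1 = {"type": "record", "name": "User", "fields": [{"name": "name", "type": "string"}]}
v2 = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": None},
]}
bad = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},  # new field without a default
]}

registry = ToySchemaRegistry()
print(registry.register("users-value", v1))  # 1
print(registry.register("users-value", v2))  # 2
try:
    registry.register("users-value", bad)
except ValueError as err:
    print(err)  # schema is not backward compatible for 'users-value'
```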
Kafka Streams
• A client library for building applications and microservices where the input and output data are stored in Kafka clusters.
Kafka Streams
• A stream is the most important abstraction provided by Kafka Streams. It represents an unbounded, continuously updating data set.
• A stream processing application is any program that makes use of the Kafka Streams library.
• A stream processor is a node in the processor topology; it represents a single processing step that transforms data.
• There are two special processors in the topology:
  • Source processor: a special type of stream processor that does not have any upstream processors. It consumes records from one or more Kafka topics and forwards them downstream.
  • Sink processor: a special type of stream processor that does not have downstream processors. It sends the records it receives to a specified Kafka topic.
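A source → processor → sink topology can be sketched with three tiny classes. This is a hypothetical Python model, not the Kafka Streams API: plain lists stand in for the input and output topics. Kafka Streams itself is a Java library, where the same shape is typically written with StreamsBuilder as stream(...), then mapValues(...), then to(...).

```python
class SourceProcessor:
    """Reads records from a list standing in for an input topic; has no upstream."""

    def __init__(self, records, downstream):
        self.records = records
        self.downstream = downstream

    def run(self):
        for key, value in self.records:
            self.downstream.process(key, value)


class MapProcessor:
    """Intermediate processor: transforms each value and forwards the record."""

    def __init__(self, fn, downstream):
        self.fn = fn
        self.downstream = downstream

    def process(self, key, value):
        self.downstream.process(key, self.fn(value))


class SinkProcessor:
    """Collects records into a list standing in for an output topic; has no downstream."""

    def __init__(self):
        self.output = []

    def process(self, key, value):
        self.output.append((key, value))


# Wire the topology from the sink backwards: source -> map -> sink.
sink = SinkProcessor()
upper = MapProcessor(str.upper, sink)
source = SourceProcessor([("k1", "hello"), ("k2", "kafka")], upper)
source.run()
print(sink.output)  # [('k1', 'HELLO'), ('k2', 'KAFKA')]
```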