Apache Kafka
A high throughput distributed messaging system
What is Kafka?
 Kafka is a distributed publish-subscribe messaging system rethought as a distributed
commit log.
 It’s designed to be
 Fast
 Scalable
 Durable
 When used in the right way and for the right use case, Kafka has unique attributes
that make it a highly attractive option for data integration.
Publish subscribe messaging system
 Kafka maintains feeds of messages in categories called topics
 Producers are processes that publish messages to one or more topics
 Consumers are processes that subscribe to topics and process the feed of
published messages
[Diagram: a publisher sends messages to a topic, which delivers them to three subscribers]
Kafka cluster
 Since Kafka is distributed in nature, Kafka is run as a cluster.
 A cluster typically comprises multiple servers, each of which is called a broker.
 Communication between the clients and the servers takes place over TCP.
[Diagram: Kafka cluster with three producers, three consumers, Broker 1 and Broker 2 (each hosting Topic 1 and Topic 2), and ZooKeeper]
Deep dive into high level abstractions
Topic
 To balance load, a topic is divided into
multiple partitions and replicated
across brokers.
 Each partition is an ordered, immutable
sequence of messages that is
continually appended to, i.e. a commit log.
 The messages in the partitions are each
assigned a sequential id number called
the offset that uniquely identifies each
message within the partition.
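The offset idea above can be sketched with a toy in-memory log. This is only an illustration of an append-only sequence with sequential offsets, not Kafka's storage code; the class name PartitionLog and its methods are hypothetical.

```python
class PartitionLog:
    """Toy in-memory model of one partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset it was assigned."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages messages starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
first = log.append("page_view:/home")
second = log.append("page_view:/cart")
print(first, second)  # 0 1
print(log.read(1))    # ['page_view:/cart']
```

Existing entries are never modified, so an offset keeps identifying the same message for as long as that message is retained.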
Distribution and partitions
 Partitions allow a topic’s log to scale beyond a size that will fit on a single server (i.e. a
broker) and act as the unit of parallelism.
 The partitions of a topic are distributed over the brokers in the Kafka cluster where each
broker handles data and requests for a share of the partitions.
 For fault tolerance, each partition is replicated across a configurable number of brokers.
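One way to picture partitions being spread over brokers with a configurable replication factor is the sketch below. The round-robin placement rule and the function name assign_replicas are assumptions made for illustration; Kafka's actual placement logic is more involved.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(4, ["broker-1", "broker-2", "broker-3"], 2)
for partition, replicas in layout.items():
    print(partition, replicas)
# 0 ['broker-1', 'broker-2']
# 1 ['broker-2', 'broker-3']
# 2 ['broker-3', 'broker-1']
# 3 ['broker-1', 'broker-2']
```

Every partition ends up on two different brokers, so losing one broker still leaves a copy of each partition.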
Distribution and fault tolerance
 Each partition has one server which acts as the "leader" and zero or more servers
which act as "followers".
 The leader handles all read and write requests for the partition while the followers
passively replicate the leader.
 If the leader fails, one of the followers will automatically become the new leader.
 Each server acts as a leader for some of its partitions and a follower for others so load
is well balanced within the cluster.
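The failover behaviour described above can be sketched as follows. This is a deliberately simplified model that promotes the first surviving replica; Kafka's real election keeps more bookkeeping (for example, which followers are caught up with the leader). The function name elect_leader is hypothetical.

```python
def elect_leader(replicas, live_brokers):
    """Return the first replica that is still alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


replicas = ["broker-1", "broker-2", "broker-3"]
live = {"broker-1", "broker-2", "broker-3"}

leader_before = elect_leader(replicas, live)
live.discard("broker-1")  # the current leader fails
leader_after = elect_leader(replicas, live)

print(leader_before, "->", leader_after)  # broker-1 -> broker-2
```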
Retention
 The Kafka cluster retains all published messages, whether or not they have been
consumed, for a configurable period of time, after which they are discarded to
free up space.
 The only metadata retained on a per-consumer basis is the position of the consumer in the
log, called the offset, which is controlled by the consumer.
 Normally a consumer will advance its offset linearly as it reads messages, but it can
consume messages in any order it likes.
 Kafka consumers can come and go without much impact on the cluster or on other
consumers.
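A minimal sketch of time-based retention with a consumer-controlled offset, assuming a toy log that stores a timestamp per message. The class RetainedLog and the explicit `now` arguments are hypothetical; real brokers clean up on disk rather than per message in memory.

```python
class RetainedLog:
    """Toy log that keeps messages for a fixed period, consumed or not."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []      # (offset, timestamp, message)
        self._next_offset = 0

    def append(self, message, now):
        self._entries.append((self._next_offset, now, message))
        self._next_offset += 1

    def expire(self, now):
        """Discard every message older than the retention period."""
        cutoff = now - self.retention_seconds
        self._entries = [e for e in self._entries if e[1] >= cutoff]

    def read_from(self, offset):
        """Return (offset, message) pairs at or after the given offset."""
        return [(o, m) for (o, _, m) in self._entries if o >= offset]


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)
log.append("c", now=90)

before = log.read_from(1)  # the consumer chooses where to start reading
log.expire(now=100)        # messages older than 60 s are dropped
after = log.read_from(0)   # rewinding only finds what is still retained

print(before)  # [(1, 'b'), (2, 'c')]
print(after)   # [(2, 'c')]
```

The consumer's position is just a number it owns, which is why consumers can come and go, or re-read old data, without the log needing to track them.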
Producers
 Producers publish data to the topics by assigning messages to a partition within the
topic either in a round-robin fashion or according to some semantic partition function
(say based on some key in the message).
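The two partitioning strategies mentioned above can be sketched like this. The CRC32 hash is used here only because it is deterministic and in the Python standard library; it is an assumption for illustration, and Kafka's own clients use a different hash function. The function names are hypothetical.

```python
import itertools
import zlib


def make_round_robin(num_partitions):
    """Return a chooser that cycles through partitions 0..num_partitions-1."""
    counter = itertools.count()

    def choose(_message):
        return next(counter) % num_partitions

    return choose


def key_partition(key, num_partitions):
    """Map a key to a partition so that equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


round_robin = make_round_robin(3)
rr_choices = [round_robin(m) for m in ["m1", "m2", "m3", "m4"]]
print(rr_choices)  # [0, 1, 2, 0]

# The same key always maps to the same partition.
print(key_partition("user-42", 3) == key_partition("user-42", 3))  # True
```

Keyed partitioning is what makes per-key ordering possible, since all messages for one key go through a single partition.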
Consumers
 Kafka offers a single consumer abstraction called consumer group that generalises
both the queuing and the publish-subscribe models.
 Consumers label themselves with a consumer group name.
 Each message published to a topic is delivered to one consumer instance within each
subscribing consumer group.
 If all the consumer instances have the same consumer group, then this works just like
a traditional queue balancing load over the consumers.
 If all the consumer instances have different consumer groups, then this works like
publish-subscribe and all messages are broadcast to all consumers.
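The queue versus publish-subscribe behaviour can be sketched with a toy delivery function. Picking an instance per message in round-robin order is a simplification made for this example (Kafka hands out whole partitions to instances); the function name deliver and the group names are hypothetical.

```python
from collections import defaultdict


def deliver(messages, groups):
    """Deliver each message to exactly one instance in every group.

    groups maps a consumer group name to its list of instance names.
    """
    received = defaultdict(list)
    for index, message in enumerate(messages):
        for instances in groups.values():
            target = instances[index % len(instances)]
            received[target].append(message)
    return dict(received)


messages = ["m0", "m1", "m2", "m3"]

# One group with two instances: load is balanced, like a queue.
queue_like = deliver(messages, {"billing": ["c1", "c2"]})
print(queue_like)  # {'c1': ['m0', 'm2'], 'c2': ['m1', 'm3']}

# Two groups with one instance each: every message is broadcast.
broadcast = deliver(messages, {"g1": ["c1"], "g2": ["c2"]})
print(broadcast)   # {'c1': ['m0', 'm1', 'm2', 'm3'], 'c2': ['m0', 'm1', 'm2', 'm3']}
```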
Consumer groups
 Topics have a small number of consumer groups, one for each logical subscriber.
 Each group is composed of many consumer instances for scalability and fault tolerance.
Ordering guarantees
 Kafka assigns the partitions of a topic to the consumers in a consumer group so that each partition is
consumed by exactly one consumer in the group.
 Limitation: a consumer group cannot usefully have more consumer instances than the topic has partitions; any extra instances receive no partitions and sit idle.
 Kafka provides a total order over messages within a partition, not between different partitions in
a topic.
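A small sketch of partition-to-consumer assignment within one group, showing why instances beyond the partition count get nothing to do. The round-robin rule and the name assign_partitions are assumptions for illustration, not Kafka's actual assignment strategy.

```python
def assign_partitions(partitions, consumers):
    """Give every partition exactly one owner within the consumer group."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


balanced = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
print(balanced)   # {'c1': [0, 2], 'c2': [1, 3]}

oversized = assign_partitions([0, 1], ["c1", "c2", "c3"])
print(oversized)  # {'c1': [0], 'c2': [1], 'c3': []}
```

Because each partition has a single owner in the group, messages within that partition are processed in order by that one consumer.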
Comparison
Kafka | JMS message broker; RabbitMQ
A fire hose of events arriving at a rate of approximately 100k+/sec | Messages arriving at a rate of 20k+/sec
‘At least once’ processing, as data is read with an offset within a partition | ‘Exactly once’ processing by consumers
Producer-centric. Doesn't have message acknowledgements, as consumers track the messages they have consumed. | Broker-centric. Uses the broker itself to maintain the state of what's consumed (via message acknowledgements).
Supports both online and batch consumers, which may be online or offline. It also supports producer message batching: it's designed for holding and distributing large volumes of messages at very low latency. | Consumers are mostly online, and any messages "in wait" (persistent or not) are held opaquely.
Comparison
Kafka | JMS message broker; RabbitMQ
Provides rudimentary routing. It uses topics for exchanges. | Provides rich routing capabilities with the Advanced Message Queuing Protocol’s (AMQP) exchange, binding and queuing model.
Makes the distributed cluster explicit, by forcing the producer to know it is partitioning a topic's messages across several nodes. | Makes the distributed cluster transparent, as if it were a virtual broker.
Preserves ordered delivery within a partition. | Almost always unordered delivery. The AMQP model says "one producer channel, one exchange, one queue, one consumer channel" is required for in-order delivery.
Throttling is unnecessary
 The whole job of Kafka is to provide a "shock absorber" between the flood of
events and those who want to consume them in their own way.
Performance benchmark
 500,000 messages published per second
 22,000 messages consumed per second
 on a 2-node cluster
 with 6-disk RAID 10.
 See research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Key benefits
 Horizontally scalable
 It’s a distributed system that can be elastically and transparently expanded with no downtime
 High throughput
 High throughput is provided for both publishing and subscribing, due to disk structures
that provide constant performance even with many terabytes of stored messages
 Reliable delivery
 Persists messages on disk, and provides intra-cluster replication
 Supports a large number of subscribers and automatically balances consumers in case of
failure.
Use cases
 Common use cases include
1. Stream processing, Event sourcing or a replacement for a more traditional message
broker
2. Website activity tracking - original use case for Kafka
3. Metrics collection and monitoring - centralized feeds of operational data
4. Log aggregation
Getting practical
Download and extract Kafka
 Download the archive from
kafka.apache.org/downloads.html and
extract it
Kafka uses ZooKeeper for cluster coordination
 Kafka uses ZooKeeper, which enables
highly reliable distributed coordination, so
one first needs to start a ZooKeeper server.
 Kafka bundles a single-node ZooKeeper
instance. A single-node ZooKeeper cluster
does NOT run a leader and a follower.
 Typically exchanged metadata includes
 Kafka broker addresses
 Offsets of consumed messages
Common challenges faced by distributed
systems
 Outages
 Co-ordination of tasks
 Reduction of operational complexity
 Consistency and ordering guarantees
ZooKeeper to the rescue
Apache Zookeeper: Definition
 Centralised service for
 Maintaining configuration information
 Naming
 Distributed synchronisation and
 providing group services.
Apache Zookeeper: Features
 Distributed consistent data store which favours consistency over everything else.
 High availability - Tolerates a minority of ensemble members being unavailable and
continues to function correctly.
 In an ensemble of n members, where n is an odd number, the loss of (n-1)/2 members can be
tolerated.
 High performance - All the data is stored in memory and benchmarked at 50k ops/sec but
the numbers really depend on your servers and network
 Tuned for read-heavy, write-light workloads. Reads are served from the node to which a client is
connected.
 Provides strictly ordered access for data.
 Writes are atomic and are applied in the order they're sent to ZooKeeper.
 Writes are acknowledged and changes are also seen in the order they occurred.
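The availability rule in the features above, that an ensemble keeps working while a majority of members is up, comes down to simple arithmetic. The helper names below are hypothetical.

```python
def tolerated_failures(ensemble_size):
    """Number of members that can be lost while a majority remains."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one member")
    return (ensemble_size - 1) // 2


def has_quorum(ensemble_size, live_members):
    """True when strictly more than half of the ensemble is alive."""
    return live_members > ensemble_size // 2


for n in (3, 5, 7):
    print(n, tolerated_failures(n))
# 3 1
# 5 2
# 7 3

print(has_quorum(5, 3))  # True
print(has_quorum(5, 2))  # False
```

An ensemble of 4 tolerates only 1 failure, the same as an ensemble of 3, which is why odd sizes are the usual choice.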
Apache Zookeeper: Operation basics
 A cluster is an ensemble with one leader and several followers.
 Read requests are serviced by each server from its local replica, but write requests are forwarded to the
leader. When the leader receives a write request, it calculates what the state of the system will be when the
write is applied and transforms this into a transaction that captures the new state.
 When ZooKeeper starts, it goes through a leader election whereby one node of the cluster is
elected to act as the leader. At any given point in time, only one node acts as the leader.
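The idea of turning a write request into a transaction that captures the resulting state can be sketched as below. The "add delta to a counter" request, the Leader class and the transaction layout are all hypothetical, chosen only to show why replaying such a transaction is harmless; they are not ZooKeeper's API or wire format.

```python
def apply_txn(state, txn):
    """Apply a transaction: it records the resulting value, not the request."""
    state[txn["key"]] = txn["value"]


class Leader:
    """Toy leader that converts write requests into state-capturing transactions."""

    def __init__(self):
        self.state = {}
        self.next_txn_id = 1

    def handle_write(self, key, delta):
        # Compute the state as it will be once the write is applied.
        txn = {
            "id": self.next_txn_id,
            "key": key,
            "value": self.state.get(key, 0) + delta,
        }
        self.next_txn_id += 1
        apply_txn(self.state, txn)
        return txn


leader = Leader()
follower_state = {}
for delta in (5, 3):
    txn = leader.handle_write("counter", delta)
    apply_txn(follower_state, txn)
    apply_txn(follower_state, txn)  # applying the same transaction twice changes nothing

print(leader.state)    # {'counter': 8}
print(follower_state)  # {'counter': 8}
```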
Apache Zookeeper: Operation basics
 Clients create a stateful session (i.e. with heartbeats through an open socket) when they
connect to a node of an ensemble. The number of open sockets available on a ZooKeeper node
limits the number of clients that can connect to it.
 When a cluster member dies, clients notice a disconnect event and reconnect
to another member of the ensemble.
 A session (i.e. the state of a client connected to a node of an ensemble) stays alive when the node
the client is connected to goes down, because session events go through the leader and are replicated to the
other nodes of the cluster.
 When the leader goes down, the remaining members of the cluster elect a new leader using
an atomic broadcast consensus algorithm. The cluster is unavailable only while it elects a
new leader.
1. Start ZooKeeper
2. Set up a cluster with 3 brokers
2. Adjust broker configuration files
3. Start Kafka servers 1, 2 and 3
3. Servers 1, 2 and 3 are started and running
4. Create a Kafka topic, list topics and describe one
5. Start a producer and publish some messages
6. Start a consumer and process messages
Log directories for each of the broker instances
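When three brokers run on one machine, each needs its own id, port and log directory. The sketch below generates such per-broker overrides. The property names broker.id, port and log.dir follow older Kafka quickstart guides and should be checked against the Kafka version in use (newer releases configure listeners instead of port); the base port, the /tmp paths and the helper names are assumptions.

```python
def broker_overrides(broker_id, base_port=9092, log_root="/tmp"):
    """Return the settings that must differ between brokers on one host."""
    return {
        "broker.id": str(broker_id),
        "port": str(base_port + broker_id),
        "log.dir": f"{log_root}/kafka-logs-{broker_id}",
    }


def render_properties(props):
    """Render a dict as lines in Java .properties style."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


configs = {broker_id: broker_overrides(broker_id) for broker_id in (1, 2, 3)}
print(render_properties(configs[1]))
# broker.id=1
# port=9093
# log.dir=/tmp/kafka-logs-1
```

Giving each broker a separate log directory is what produces one directory per broker instance once the servers are running.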

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 

Apache Kafka

  • 1. Apache Kafka: A high-throughput distributed messaging system
  • 2. What is Kafka? Kafka is a distributed publish-subscribe messaging system rethought as a distributed commit log. It is designed to be fast, scalable, and durable. When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration.
  • 3. Publish-subscribe messaging system: Kafka maintains feeds of messages in categories called topics. Producers are processes that publish messages to one or more topics. Consumers are processes that subscribe to topics and process the feed of published messages. [Diagram: publisher, messages, topic, three subscribers]
  • 4. Kafka cluster: Since Kafka is distributed in nature, it is run as a cluster. A cluster typically comprises multiple servers, each of which is called a broker. Communication between the clients and the servers takes place over TCP. [Diagram: Kafka cluster with three producers, three consumers, Broker 1 and Broker 2 (each hosting Topic 1 and Topic 2), and ZooKeeper]
  • 5. Deep dive into high-level abstractions
  • 6. Topic: To balance load, a topic is divided into multiple partitions and replicated across brokers. Each partition is an ordered, immutable sequence of messages that is continually appended to, i.e. a commit log. Each message in a partition is assigned a sequential id number called the offset, which uniquely identifies the message within that partition.
  • 7. Distribution and partitions: Partitions allow a topic's log to scale beyond a size that will fit on a single server (i.e. a broker) and act as the unit of parallelism. The partitions of a topic are distributed over the brokers in the Kafka cluster, where each broker handles data and requests for a share of the partitions. For fault tolerance, each partition is replicated across a configurable number of brokers.
  • 8. Distribution and fault tolerance: Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers automatically becomes the new leader. Each server acts as a leader for some of its partitions and a follower for others, so load is well balanced within the cluster.
  • 9. Retention: The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time, after which they are discarded to free up space. The only metadata retained on a per-consumer basis is the position of the consumer in the log, called the offset, which is controlled by the consumer. Normally a consumer advances its offset linearly as it reads messages, but it can consume messages in any order it likes. Kafka consumers can come and go without much impact on the cluster or on other consumers.
  • 10. Producers: Producers publish data to topics by assigning each message to a partition within the topic, either in a round-robin fashion or according to some semantic partition function (say, based on some key in the message).
  • 11. Consumers: Kafka offers a single consumer abstraction, the consumer group, that generalises both queuing and publish-subscribe. Consumers label themselves with a consumer group name. Each message published to a topic is delivered to one consumer instance within each subscribing consumer group. If all the consumer instances have the same consumer group, this works just like a traditional queue balancing load over the consumers. If all the consumer instances have different consumer groups, this works like publish-subscribe and all messages are broadcast to all consumers.
  • 12. Consumer groups: Topics have a small number of consumer groups, one for each logical subscriber. Each group is composed of many consumer instances for scalability and fault tolerance.
  • 13. Ordering guarantees: Kafka assigns the partitions of a topic to the consumers in a consumer group so that each partition is consumed by exactly one consumer in the group. Limitation: a consumer group cannot usefully have more consumer instances than partitions, since any extra instances receive no partition. Kafka provides a total order over messages within a partition, not between different partitions in a topic.
  • 14. Comparison: Kafka vs. JMS message broker / RabbitMQ
      • Throughput. Kafka: a fire hose of events arriving at a rate of approximately 100k+/sec. JMS/RabbitMQ: messages arriving at a rate of 20k+/sec.
      • Delivery. Kafka: "at least once" processing, as data is read with an offset within a partition. JMS/RabbitMQ: processed exactly once by consumers.
      • State tracking. Kafka: producer-centric; has no message acknowledgements, as consumers track the messages they have consumed. JMS/RabbitMQ: broker-centric; uses the broker itself to maintain the state of what has been consumed (via message acknowledgements).
      • Consumers. Kafka: supports both online and batch consumers, which may be online or offline. It also supports producer message batching and is designed for holding and distributing large volumes of messages at very low latency. JMS/RabbitMQ: consumers are mostly online, and any messages "in wait" (persistent or not) are held opaquely.
  • 15. Comparison: Kafka vs. JMS message broker / RabbitMQ (continued)
      • Routing. Kafka: provides rudimentary routing; it uses topics for exchanges. JMS/RabbitMQ: provides rich routing capabilities with the Advanced Message Queuing Protocol's (AMQP) exchange, binding and queuing model.
      • Cluster visibility. Kafka: makes the distributed cluster explicit, by forcing the producer to know it is partitioning a topic's messages across several nodes. JMS/RabbitMQ: makes the distributed cluster transparent, as if it were a virtual broker.
      • Ordering. Kafka: preserves ordered delivery within a partition. JMS/RabbitMQ: almost always unordered delivery; the AMQP model says "one producer channel, one exchange, one queue, one consumer channel" is required for in-order delivery.
  • 16. Throttling is unnecessary: The whole job of Kafka is to provide a "shock absorber" between the flood of events and those who want to consume them in their own way.
  • 17. Performance benchmark: 500,000 messages published per second and 22,000 messages consumed per second, on a 2-node cluster with 6-disk RAID 10. See research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  • 18. Key benefits: Horizontally scalable: it is a distributed system that can be elastically and transparently expanded with no downtime. High throughput: provided for both publishing and subscribing, thanks to disk structures that give constant performance even with many terabytes of stored messages. Reliable delivery: persists messages on disk and provides intra-cluster replication. Supports a large number of subscribers and automatically rebalances consumers in case of failure.
  • 19. Use cases: Common use cases include (1) stream processing, event sourcing, or a replacement for a more traditional message broker; (2) website activity tracking, the original use case for Kafka; (3) metrics collection and monitoring, i.e. centralized feeds of operational data; (4) log aggregation.
  • 21. Download and extract Kafka: Download the archive from kafka.apache.org/downloads.html and extract it.
  • 22. Kafka uses ZooKeeper for cluster coordination: ZooKeeper enables highly reliable distributed coordination, so a ZooKeeper server must be started first. Kafka bundles a single-node ZooKeeper instance. A single-node ZooKeeper cluster does NOT run a leader and a follower. Metadata typically exchanged includes Kafka broker addresses and consumed message offsets.
  • 23. Common challenges faced by distributed systems: Outages; co-ordination of tasks; reduction of operational complexity; consistency and ordering guarantees. ZooKeeper to the rescue.
  • 24. Apache ZooKeeper: Definition: A centralised service for maintaining configuration information, naming, distributed synchronisation, and providing group services.
  • 25. Apache ZooKeeper: Features: A distributed, consistent data store which favours consistency over everything else. High availability: tolerates a minority of ensemble members being unavailable and continues to function correctly. In an ensemble of n members, where n is an odd number, the loss of (n-1)/2 members can be tolerated. High performance: all the data is stored in memory and has been benchmarked at 50k ops/sec, but the numbers really depend on your servers and network. Tuned for read-heavy, write-light workloads. Reads are served from the node to which a client is connected. Provides strictly ordered access to data. Writes are atomic and applied in the order they are sent to ZooKeeper. Writes are acknowledged, and changes are seen in the order they occurred.
  • 26. Apache ZooKeeper: Operation basics: A cluster is an ensemble with a leader and several followers. Read requests are serviced by each server from its local replica, but write requests are forwarded to the leader. When the leader receives a write request, it calculates what the state of the system will be when the write is applied and transforms this into a transaction that captures the new state. When ZooKeeper starts, it goes through an election algorithm whereby one node of the cluster is elected to act as the leader. At any given point in time, only one node acts as the leader.
  • 27. Apache ZooKeeper: Operation basics: Clients create a stateful session (i.e. with heartbeats through an open socket) when they connect to a node of an ensemble. The number of open sockets available on a ZooKeeper node limits the number of clients that can connect to it. When a cluster member dies, its clients notice a disconnect event and reconnect to another member of the quorum. A session (i.e. the state of a client connected to a node of the ensemble) stays alive when the node the client is connected to goes down, because session events go through the leader and are replicated to the other nodes in the cluster. When the leader goes down, the remaining members of the cluster elect a new leader using an atomic broadcast consensus algorithm. The cluster is unavailable only while it elects a new leader.
  • 29. 2. Set up a cluster with 3 brokers
  • 34. 3. Servers 1, 2 and 3 are started and running
  • 35. 4. Create a Kafka topic, list topics and describe one 5. Start a producer and publish some messages 6. Start a consumer and process messages
  • 36. Log directories for each of the broker instances
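
The create, publish and consume steps in the slides can be mimicked without a running cluster. What follows is only a minimal in-memory sketch in Python of the ideas on the Topic, Producers and Ordering guarantees slides: keyed or round-robin partitioning, per-partition offsets, and one consumer per partition within a group. The names `Topic`, `publish` and `assign_partitions` are hypothetical and are not Kafka APIs, and the CRC32 hash is an assumption chosen for illustration, not the hash Kafka itself uses.

```python
import zlib
from itertools import count


class Topic:
    """Toy in-memory topic: a list of append-only partition logs (not a Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = count()  # used only for messages without a key

    def publish(self, value, key=None):
        """Append a message and return its (partition, offset)."""
        if key is None:
            # No key: spread messages over the partitions in turn.
            partition = next(self._round_robin) % len(self.partitions)
        else:
            # Stable hash (illustrative choice) so one key always maps to one partition.
            partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1  # offset = position within the partition


def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer of a single group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

Publishing two messages with the same key returns the same partition with consecutive offsets, which is what gives per-key ordering. Calling `assign_partitions(3, ["c1", "c2", "c3", "c4"])` leaves `c4` with an empty list, illustrating the limitation noted on the Ordering guarantees slide.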

Editor's notes

  1. Each individual partition must fit on the servers that host it. However, a topic may have many partitions so it can handle an arbitrary amount of data.
  2. For example, if the log retention is set to two days, a message is available for consumption for two days after it is published, after which it is discarded to free up space.
  3. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing.
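
The two-day retention example in note 2 can be sketched in a few lines of Python. This is a toy model under stated assumptions: `RetainedLog` is a hypothetical class, timestamps are passed in explicitly as seconds, and messages are expired one at a time, which is a simplification and not a description of how a broker manages its files on disk.

```python
from collections import deque

TWO_DAYS = 2 * 24 * 60 * 60  # retention period from the note, in seconds


class RetainedLog:
    """Toy single-partition log with time-based retention (illustrative only)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0    # offset of the oldest message still retained
        self.entries = deque()  # (publish_time, value) pairs, oldest first

    def append(self, value, now):
        """Store a message and return its offset."""
        self.entries.append((now, value))
        return self.base_offset + len(self.entries) - 1

    def expire(self, now):
        """Discard messages older than the retention period, consumed or not."""
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.popleft()
            self.base_offset += 1  # offsets of surviving messages do not change

    def read(self, offset):
        """Return the message at `offset`, or None if discarded or not yet written."""
        index = offset - self.base_offset
        if 0 <= index < len(self.entries):
            return self.entries[index][1]
        return None
```

Because the log only tracks a base offset, a consumer that stored offset 1 before expiry still reads the same message afterwards, while a read of an expired offset returns `None`.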