Introduction to Apache Kafka
by vishnu rao
Traditional Messaging
● Java Message Service (JMS)
○ Apache ActiveMQ, IBM WebSphere MQ, HornetQ, FioranoMQ
● Advanced Message Queuing Protocol (AMQP)
○ RabbitMQ
● Message Queuing Telemetry Transport (MQTT)
○ HiveMQ
A very famous question
https://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ
“Performance-wise, both are excellent performers, but have major architectural differences.”
— from the Quora discussion
What’s the Diff ?
Traditional messaging (push):
● Server pushes/delivers each message to the subscriber
● Server does a lot of work in memory
○ Stores each message and its state (delivered, etc.)
○ Maintains the order of messages
● Hence mostly an ‘online’ processing model
● Server can do complex routing logic

Kafka-style messaging (pull):
● Subscriber pulls/picks up messages from the server
● Not much in-memory work for the server; it just stores the message
○ It doesn’t care whether a message was picked up or not
○ Ordering logic is dictated by the client and the storage format
● Hence mostly an ‘offline’ processing model
● Client maintains the routing logic; the server is blind to it
● The subscriber also stores its own state, i.e. which messages it has picked up
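The pull side of this comparison fits in a few lines: the broker only appends messages to a log and serves them by position, while each subscriber keeps its own read position. A minimal sketch (illustrative names, not the real Kafka client API):

```python
# Minimal sketch of the pull model: the server only appends; the
# subscriber tracks which messages it has picked up. Illustrative
# only -- not the real Kafka API.

class DumbBroker:
    """Stores messages in arrival order; keeps no delivery state."""
    def __init__(self):
        self.log = []

    def append(self, msg):
        self.log.append(msg)

    def fetch(self, offset):
        # The broker does not know or care who has read what.
        return self.log[offset] if offset < len(self.log) else None

class PullSubscriber:
    """The subscriber, not the server, remembers its position."""
    def __init__(self, broker):
        self.broker = broker
        self.position = 0  # delivery state lives client-side

    def poll(self):
        msg = self.broker.fetch(self.position)
        if msg is not None:
            self.position += 1
        return msg

broker = DumbBroker()
broker.append("a")
broker.append("b")
sub = PullSubscriber(broker)
print(sub.poll(), sub.poll(), sub.poll())  # a b None
```

Note that a second subscriber created against the same broker starts from position 0 and re-reads everything: the server holds no per-consumer state at all.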
So Apache Kafka...
Apache Kafka
Notions:
● Publisher
● Message
● Topic
○ Topic Partition
● Broker
● Subscriber/Consumer
● Message Offset
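These notions nest naturally: a topic is a set of numbered partitions, each partition is an ordered log, and a message offset is just a position in that log. A toy mapping onto plain data structures (illustrative, not Kafka internals):

```python
# Toy mapping of the Kafka vocabulary onto plain data structures.
# A topic holds numbered partitions; each partition is an append-only
# list; a message's offset is its index within one partition.

topic = {
    0: ["m0", "m1", "m2"],   # partition 0
    1: ["m3"],               # partition 1
}

def next_offset(topic, partition):
    """Where the broker would append the next message."""
    return len(topic[partition])

def read(topic, partition, offset):
    """What a consumer asks the broker for: (topic, partition, offset)."""
    return topic[partition][offset]

assert next_offset(topic, 0) == 3
assert read(topic, 1, 0) == "m3"
```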
Summary
● Publisher chooses a topic to publish onto.
○ It also decides the routing logic, i.e. chooses which partition to publish onto (using a partitioning key)
● Broker receives the message & appends it to the end of the topic partition.
● Subscriber asks the broker for the message at a specific offset in a topic partition.
● It is up to the subscriber to remember which message offset it has processed.
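The partitioning step in this flow is typically a hash of the key modulo the partition count, so all messages with the same key land in the same partition and stay ordered there. A sketch of the publish path (the hash function and names are illustrative, not Kafka's actual murmur2-based partitioner):

```python
# Sketch of the publish path: the publisher picks the partition from a
# key, the broker appends, and each write gets an offset. Illustrative
# only -- not Kafka's real default partitioner.
import zlib

NUM_PARTITIONS = 4
partitions = {p: [] for p in range(NUM_PARTITIONS)}

def choose_partition(key):
    # Deterministic: the same key always maps to the same partition,
    # which is what gives per-key ordering.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def publish(key, value):
    p = choose_partition(key)
    partitions[p].append(value)        # broker: append to end of log
    return p, len(partitions[p]) - 1   # (partition, offset) of the write

p1, o1 = publish("user-42", "login")
p2, o2 = publish("user-42", "logout")
assert p1 == p2          # same key -> same partition
assert o2 == o1 + 1      # appended in order, so per-key order holds
```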
A lovely use case - REPLAY
● Since the subscriber requests a message at an offset in a topic partition, the
subscriber is free to REPLAY the processing at any point in time.
● Handy when outages occur.
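Because the read position is just a number the subscriber owns, replay is nothing more than resetting that number and pulling again, e.g. after a bad deploy when you want to reprocess an affected range (illustrative sketch):

```python
# Replay sketch: processing is driven by an offset the consumer owns,
# so rewinding the offset re-runs the processing over old messages.

log = ["evt-0", "evt-1", "evt-2", "evt-3"]  # one topic partition

def process_from(offset, log):
    """Process everything from `offset` to the end of the partition."""
    handled = []
    while offset < len(log):
        handled.append(log[offset].upper())  # stand-in for real work
        offset += 1
    return handled, offset

first_pass, pos = process_from(0, log)
# ...an outage or a bad deploy is discovered; rewind and replay:
replayed, pos = process_from(2, log)

assert first_pass == ["EVT-0", "EVT-1", "EVT-2", "EVT-3"]
assert replayed == ["EVT-2", "EVT-3"]  # offsets 2..3 processed again
```

This works only as long as the messages are still retained on the broker, which is why the retention policy discussed later matters for replay.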
Hence
Things to Ponder about
● How do I achieve high read/write throughput?
○ Have more partitions per topic; the partition count determines read/write throughput
● Can multiple publishers publish concurrently to the same topic partition?
○ Yes
● Should multiple consumers read from the same topic partition?
○ Ideally one consumer per partition (at most one per consumer group)
● What about replication of data?
○ While creating a topic, you can set a replication factor, which applies to each topic partition.
● What about the data retention time policy?
○ Set it while creating the topic. You can edit it later on.
● Think about the producer partitioning key …
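Several of these knobs are set at topic-creation time. With the stock command-line tools that ship with Kafka, the commands look roughly like this (topic name, counts, and endpoint are placeholder values; older Kafka versions take --zookeeper instead of --bootstrap-server):

```shell
# Create a topic with 6 partitions (throughput), replication factor 3
# (durability), and a 7-day retention policy. Values are illustrative.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic my-topic \
  --partitions 6 \
  --replication-factor 3 \
  --config retention.ms=604800000

# Retention can be edited later without recreating the topic:
kafka-configs.sh --alter \
  --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --add-config retention.ms=86400000
```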
In dataspark
Others ...
● Amazon Kinesis is similar to Kafka ….
● There is also Redis Pub/Sub (different guarantees, not similar to Kafka)
What I did not cover :)
● Kafka’s replication mechanism
○ ISR = in-sync replica set
● Tools like Kafka MirrorMaker
● ZooKeeper interaction (yes, Kafka depends on ZooKeeper)
What’s new in Kafka ?
● Kafka Streams API
● KSQL (Kafka SQL)
● See the release notes … :)
producer.send("Any Questions? Thanks")