Apache Kafka
Agenda
1. What is Kafka?
2. Why Kafka?
3. Kafka Use Cases
4. Who Uses Kafka?
5. Why Is Kafka So Fast?
6. Kafka Core Concept (Theory)
7. Kafka CLI 101
What is Kafka?
At the beginning …
“... a publish/subscribe messaging system ...”
What is Kafka?
... today …
“... a stream data platform ...”
What is Kafka?
... but at the core …
“... a distributed, horizontally-scalable, fault-tolerant ...”
What is Kafka?
● Developed at LinkedIn back in 2010, open-sourced in 2011
● Designed to be fast, scalable, durable and available
● Used to decouple data streams from systems
● Distributed by nature (cluster)
● Resilient architecture
● Fault tolerant
● High throughput / low latency
● Able to handle a huge number of consumers
Why Kafka?
● Great performance (low latency, < 10 ms)
● Horizontally scalable (more nodes can be added to the cluster)
● Fault-tolerant storage
○ Replicates topic log partitions to multiple servers
● Stable, reliable, durable
● Robust replication (no data loss)
Application data flow
Without using Kafka
Application data flow
Kafka as a central hub
Kafka use cases
● Messaging
○ As a “traditional” messaging system
● Website activity tracking
○ Events like page views and searches
● Metrics collection and monitoring
○ Alerting and reporting on operational metrics
● Log aggregation
○ Collecting logs from multiple services
● Stream processing
○ Reading, processing and writing streams for real-time analysis
Who uses Kafka?
● LinkedIn uses Kafka to monitor activity data and operational metrics
● Uber uses Kafka to gather user, taxi and trip data in real time, to compute and forecast surge pricing in real time
● Netflix uses Kafka to apply recommendations in real time while you’re watching a TV show
Why is Kafka so fast?
● Zero copy - calls the OS kernel directly to move data fast, instead of copying it through the application
● Batches data in chunks
○ End to end, from producer to file system to consumer, which minimises cross-machine latency
○ Enables more efficient data compression and reduces I/O latency (see the example below)
● Sequential disk writes - avoids random disk access
○ Writes to an immutable commit log: no slow disk seeking, no random I/O operations
○ The disk is accessed in a sequential manner
● Horizontal scale - uses hundreds to thousands of partitions for a single topic
○ Spread out across thousands of servers
○ Handles massive load
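Batching and compression are ordinary producer configs, so they are easy to try from the console producer. A minimal sketch, assuming a local broker on localhost:9092 and the first_topic name used in the CLI notes at the end of this deck (the flag values are illustrative, not recommendations):

# batch records for up to 10 ms, up to 64 KB per batch, and compress each batch with snappy
kafka-console-producer --broker-list localhost:9092 --topic first_topic \
  --producer-property compression.type=snappy \
  --producer-property linger.ms=10 \
  --producer-property batch.size=65536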
Kafka Core Concept - Kafka Ecosystem
Kafka Core Concept - Topic and Partitions
Topic :-
● Similar to a table in a database (but with no reference ID)
● Each topic is identified by its name (unique key)
Partitions :-
● A topic is split into partitions, and each partition is ordered
● Each message in a partition is assigned a sequential ID called an offset
○ Offsets start from zero and increase to 1, 2, 3, ... and so on
● Ordering is only guaranteed within a partition of a topic
● Once data is written to a partition, it cannot be changed (immutability)
● Data is retained for a configurable period of time (default is 7 days; see the topic commands below)
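These properties are all visible from the topic tooling. A hedged sketch built from the commands in the speaker notes at the end of this deck, assuming a Zookeeper-era Kafka whose kafka-topics tool connects to 127.0.0.1:2181 (newer releases take --bootstrap-server instead):

# create a topic with 3 partitions (one replica per partition)
kafka-topics --zookeeper 127.0.0.1:2181 --create --topic first_topic --replication-factor 1 --partitions 3

# show the partition count, leaders, replicas and ISRs for the topic
kafka-topics --zookeeper 127.0.0.1:2181 --describe --topic first_topic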
Kafka Core Concept - Topic and Partitions
For example, 1 topic with 3 partitions (writes are appended at the “new” end of each partition):
Topic A
Partition 0: 0 1 2 3 4 5
Partition 1: 0 1 2 3
Partition 2: 0 1 2 3 4 5 6 7
(old → new)
Kafka Core Concept - Kafka Brokers
Kafka Brokers :-
● A broker is a Kafka server that contains partitions of topics
● Each broker has an ID (a number)
● A Kafka cluster is composed of multiple brokers (servers)
● A topic consists of partitions that can be spread across multiple nodes in the cluster
● Connecting to one broker bootstraps the client to the entire cluster (bootstrap server), as the example below shows
● Start with at least 3 brokers; a cluster can have 10, 100 or 1000 brokers if needed
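Because every broker holds the cluster metadata, a client only needs one reachable broker to discover the rest. A small sketch, assuming Kafka 2.2+ tooling and a broker on localhost:9092 (both assumptions, not from the slides):

# pointing at any single broker is enough to enumerate every topic in the cluster
kafka-topics --bootstrap-server localhost:9092 --list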
Kafka Core Concept - Kafka Brokers
For example, 3 brokers, 2 topics:
Broker 1: Topic 1 / Partition 0, Topic 2 / Partition 0
Broker 2: Topic 1 / Partition 1, Topic 2 / Partition 1
Broker 3: Topic 1 / Partition 2
Kafka Core Concept - Kafka Cluster
Together, those brokers form a Kafka cluster:
Broker 1: Topic 1 / Partition 0, Topic 2 / Partition 0
Broker 2: Topic 1 / Partition 1, Topic 2 / Partition 1
Broker 3: Topic 1 / Partition 2
Kafka Core Concept - Kafka Replication
Kafka replication factor, failover, ISR
● Kafka replicates topic partitions
○ Across multiple nodes in the cluster, for failover
● For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without losing data
○ For example, with 3 brokers and a replication factor of 2, Kafka tolerates 2 - 1 = 1 failure
■ This means that as long as 2 of the broker servers are alive, your data will not be lost
■ The replication factor determines how many brokers a partition will be replicated on (see the command below)
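The replication factor is set per topic at creation time. A hedged sketch matching the 2-partition, replication-factor-2 example on the next slide (same Zookeeper-era CLI assumption as before; topic_a is an illustrative name):

# each of the 2 partitions will be kept on 2 brokers
kafka-topics --zookeeper 127.0.0.1:2181 --create --topic topic_a --replication-factor 2 --partitions 2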
Kafka Core Concept - Kafka Replication
For example, Topic A with 2 partitions and a replication factor of 2:
Broker 1: Topic A / Partition 0 (replicated to Broker 2)
Broker 2: Topic A / Partition 1 (replicated to Broker 3), Topic A / Partition 0
Broker 3: Topic A / Partition 1
With this layout, the cluster holds two copies of every partition.
Kafka Core Concept - Leader for Partition
Leader for Partition
● Each partition in a topic has 1 leader and 0 or more replicas
● At any time, only one broker can be the leader for a given partition
● Only the leader can receive and serve data for a partition
● The other brokers synchronize the data (followers)
○ The group of in-sync replicas for a partition is called the ISR (in-sync replicas)
● Therefore each partition has one leader and multiple ISRs
● Kafka replication is for failover
○ If one broker goes down, another broker (with an ISR) can serve the data
Kafka Core Concept - Leader in Partition
Topic A *
Partition 0
(Leader)
Broker 1
Topic A *
Partition 1
(Leader)
Broker 2
Topic A *
Partition 1
(ISR)
Broker 3
Topic A
Partition 0
(ISR)
replication
replication
Kafka Core Concept - Failover and ISR
(Diagram: Topic 1 replicated across three brokers, with one leader)
Kafka Core Concept - Failover and ISR
(Diagram: the broker leading Topic 1’s partition goes down)
Kafka Core Concept - Failover and ISRs
(Diagram: a surviving ISR broker is promoted to leader, so Topic 1 stays available)
Kafka Core Concept - Producers
Producers
● Producers write data to topics (which are made of partitions)
● The load is balanced across many brokers
(Diagram: a producer sends data, and its writes are spread across Broker 1 (Topic A, Partition 0), Broker 2 (Topic A, Partition 1) and Broker 3 (Topic A, Partition 1); each partition appends the records at the next offset)
Kafka Core Concept - Producers
Durable Writes
● Durability can be configured with the producer configuration:
○ acks=0 : the producer never waits for an ack (possible data loss)
○ acks=1 : the producer gets an ack after the leader has received the data (limited data loss)
○ acks=all : the producer gets an ack after all ISRs have received the data (no data loss)
● Producers can trade off between throughput and durability of writes, as sketched below
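Since acks is an ordinary producer property, the trade-off can be tried straight from the console producer. A minimal sketch, assuming the local broker and topic names used elsewhere in this deck:

# strongest durability: wait until every in-sync replica has the record
kafka-console-producer --broker-list localhost:9092 --topic first_topic --producer-property acks=all

# highest throughput: fire and forget, never wait for an ack
kafka-console-producer --broker-list localhost:9092 --topic first_topic --producer-property acks=0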
Kafka Core Concept - Consumer
Consumer
● Consumers read data from a topic
● Data is read in order within each partition
● Messages stay on Kafka … they are not removed after they are consumed
(Diagram: three partitions, each read in order: a, e, i, k / c, g / b, d, f, h, j, l, m, n)
Kafka Core Concept - Consumer Groups
Consumer Groups
● Consumers can be organised into consumer groups
● If you have more consumers than partitions, some consumers will be inactive
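Group membership is just a flag on the consumer. A hedged sketch (the my_app group name comes from the speaker notes); run the same command in two terminals and Kafka will split the topic's partitions between the two members:

# start this in two terminals: same group, so the partitions are divided between them
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --group my_app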
Kafka Core Concept - Consumer Offsets
Consumer Offsets
● Kafka stores the offsets at which a consumer group has been reading (checkpointing / bookmarking)
● The committed offsets are stored in a Kafka topic named “__consumer_offsets”
● When a consumer in a group has processed data received from Kafka, it should commit the offsets (through “__consumer_offsets”)
● If a consumer dies, it will be able to read back from where it left off (the describe command below shows the stored positions)
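The stored positions can be inspected per group. A sketch using the consumer-groups tool (the speaker notes spell it kafka-consumer-group; the tool that ships with Kafka is kafka-consumer-groups):

# shows, for each partition: the group's current offset, the log-end offset and the resulting lag
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my_app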
Kafka Core Concept - Message Delivery Semantics
Delivery semantics for consumers
● Consumers choose when to commit offsets
● There are 3 delivery semantics:
○ At most once:
■ Read message, commit offset, process message
■ Messages may be lost but are never redelivered
○ At least once (usually preferred):
■ Read message, process message, commit offset
■ Messages are never lost but may be redelivered
○ Exactly once:
■ Each message is delivered once and only once
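Because offsets, not the messages themselves, decide what a group reads next, replays are just offset edits. A hedged sketch that rewinds a (stopped) group to the beginning, based on the reset command in the speaker notes (--all-topics is added here because the tool needs a topic scope):

# rewind the my_app group to the earliest offsets on all topics it consumes
kafka-consumer-groups --bootstrap-server localhost:9092 --group my_app \
  --reset-offsets --to-earliest --all-topics --execute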
Kafka Core Concept - Zookeeper
Zookeeper
● Manages brokers (keeps a list of them)
● Zookeeper helps with leadership election for Kafka broker / topic-partition pairs
● Zookeeper manages service discovery for the Kafka brokers that form the cluster
● Zookeeper sends notifications to Kafka in case of changes:
○ a new broker joins,
○ a broker dies,
○ a topic is removed,
○ a topic is added, etc.
● Kafka cannot work without Zookeeper, which is why a local setup starts both (see below)
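A minimal local startup order, assuming the scripts and default property files that ship in a Kafka distribution's bin/ and config/ directories (paths vary by install; some packages suffix the scripts with .sh):

# 1. start Zookeeper first - Kafka cannot work without it
zookeeper-server-start config/zookeeper.properties

# 2. then start a Kafka broker, which registers itself in Zookeeper
kafka-server-start config/server.properties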
Kafka CLI - 101
● Kafka Topics CLI
● Kafka Console Producer CLI
● Kafka Console Consumer CLI
● Kafka Consumer Groups CLI
● Resetting Offsets
● CLI options that are good to know
● Let’s play with Kafka in action - a warm-up round trip is sketched below
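As a warm-up before the full command list in the speaker notes, a hedged end-to-end round trip on a local broker (topic name as used throughout this deck):

# terminal 1: each line you type is produced as a message
kafka-console-producer --broker-list localhost:9092 --topic first_topic

# terminal 2: read everything from the start of the topic
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --from-beginning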
Reference
● https://www.slideshare.net/paolopat/meet-apache-kafka-data-streaming-in-your-hands
● https://www.slideshare.net/JeanPaulAzar1/kafka-tutorial-introduction-to-apache-kafka-part-1
● http://searene.me/2017/07/09/Why-is-Kafka-so-fast/
● https://bravenewgeek.com/building-a-distributed-log-from-scratch-part-2-data-replication/
● https://medium.com/linedevth/apache-kafka-%E0%B8%89%E0%B8%9A%E0%B8%B1%E0%B8%9A%E0%B8%9C%E0%B8%B9%E0%B9%89%E0%B9%80%E0%B8%A3%E0%B8%B4%E0%B9%88%E0%B8%A1%E0%B8%95%E0%B9%89%E0%B8%99-1-hello-apache-kafka-242788d4f3c6
Editor’s notes
  1. Hi, good morning everyone. Today I will present Apache Kafka.
  2. We will start with the first topic, and we will focus on the last two topics.
  3. LinkedIn had 300 million users generating events every day, which at times caused data-loss problems, and out of that Kafka was born! It was designed to handle very large amounts of data, guarantee that data reaches its recipients, and be a distributed system: data storage is distributed across clusters; the architecture is resilient (e.g. we can add/remove consumers at any time and Kafka will rebalance the load); it is fault tolerant (data is durable); it can scale horizontally by adding machines (nodes) to the cluster; and it is fast (latency under 10 ms).
  4. Fault tolerant: parts of the data are backed up onto several different servers in the cluster. Robust, because records written to a Kafka server are persisted to disk and replicated to other servers.
  5. Kafka comes into the picture: Kafka decouples/separates data from systems, and Kafka is really good at moving your data because Kafka is really fast.
  6. Messaging / message queue: Kafka is used for messaging more than other tools, probably because of its replication, built-in partitioning and fault tolerance compared with traditional messaging systems such as RabbitMQ. Kafka is often used instead of a message queue such as RabbitMQ because of its high throughput, reliability, replication and fault tolerance. Stream processing: because Kafka is a real-time publish/subscribe message system, people usually use Kafka for real-time processing and monitoring systems.
  7. All these companies use Kafka so that they can make real-time recommendations and real-time decisions, which give them real-time insight into their users.
  8. Kafka compresses your data to fit your bandwidth. For example, if your network bandwidth is 10 MB but your data is 100 MB, it is more efficient to send 10 MB ten times instead of 100 MB once, reducing I/O latency. Sequential disk access is faster than random disk access. As you can see, it’s not that different, but sequential memory access is still faster than sequential disk access, so why not choose memory? Because Kafka runs on top of the JVM, which gives us two disadvantages. First, the memory overhead of objects is very high, often doubling the size of the stored data (or even more) - a lot of memory is used because in-memory data is about twice the size of the stored data. Second, garbage collection happens every now and then, so creating objects in memory becomes very expensive as in-heap data increases, because collecting unused data (garbage) takes more time and GC runs frequently. Since Kafka runs on the JVM, writing data directly into memory would mean high memory overhead and frequent GC, so Kafka uses mmap to avoid the issue: mmap maps file contents from disk into memory.
  9. Producers get data from a source system and send it to Kafka; consumers consume data from Kafka and send it to a target system. Zookeeper manages which replica should be read from and where a consumer should continue reading; Zookeeper is used to manage the Kafka servers. Kafka client -> broker_1: connection established + metadata request. broker_1 -> Kafka client: returns a list of all brokers. Kafka client -> broker_3: the Kafka client can then connect to whichever server it needs.
  10. We have to understand topics and partitions before we dive deep into Kafka; topics and partitions are how Kafka handles messages. Topics are broken up into ordered commit logs called partitions. Imagine Kafka holding a huge amount of data - how do we know which message to fetch? The answer is simple: we need a unique key, and that key is the topic (you can think of the topic name as a unique key in a database). Partitions: as we know, the Kafka server/broker stores the data, but sometimes the data is far too big for a single computer to handle, so a distributed system is needed: the idea is to split the data into several partitions and spread them across the distributed system. Data that has been written to a topic cannot be modified (immutability). Data is deleted after the configured time (default 604800000 ms, i.e. 7 days) whether it has been consumed or not, to clear space in storage. Offset: a record in a partition has an offset associated with it; think of it like this: a partition is like an array, and offsets are like indexes.
  11. Ordering is guaranteed only within a partition (not across partitions). Offsets start from zero and increase to 1, 2, 3, ... and so on. Look at partition 0: if the latest offset is 4, the next one should be offset 5. The offset just specifies a position within the partition. It starts with no value; when the first record is added, its offset is counted as 0, and as records keep being added the count keeps increasing in order. Records are ordered first-to-last within that particular partition, which means offset 0 of partition 0 may come before or after offset 0 of partition 1.
  12. Bootstrap server: each broker knows about all brokers, all topics and all partitions (metadata).
  13. Topic 1 has 3 partitions; Topic 2 has 2 partitions. Data is broken up into partitions and distributed to different brokers/machines. When you create a topic, Kafka automatically assigns the topic and distributes it across all your brokers.
  14. Multiple brokers forming a single group are called a Kafka cluster.
  15. Kafka is a distributed system. Replication means that if one broker goes down, things will still keep working. Each topic has its partitions copied (replicated) from the primary (the leader) to some number of other servers/brokers; this is called in-sync replication, because as data arrives at the primary it is sent on to the replicas immediately. There should be more than 1 copy - 2 or 3 is common - and a partition has exactly 1 leader. This property is what gives Apache Kafka its fault tolerance: if one broker dies, Kafka can keep reading and writing data on a replica by promoting that replica to leader. If one broker dies we still have 2 left to do the work, but if more than 1 dies it is over, so choosing the replication factor matters: the more replicas, the safer. With a replication factor of n, the cluster tolerates n-1 dead servers; the trade-off is storage space, so estimate it from the risk you can accept.
  16. You have two copies of each piece of data.
  17. If broker 2 goes down, topic A will not be lost. Replicas allow us to ensure that data will not be lost.
  18. Only the leader can receive and serve data for a partition; in other words, the leader handles the reads and writes on the partition that consumers/producers use to exchange messages. The other brokers synchronize the data: a follower is one of the other replicas; followers don’t serve client requests, they replicate messages from the leader into the “in-sync replica” set (ISR). Who decides the leader and the followers? Answer: Zookeeper. Kafka replication is for failover; MirrorMaker is used for disaster recovery - it replicates a Kafka cluster to another data center or AWS region (it is called mirroring since replication happens within a cluster).
  19. What decides the leader and the ISR? Zookeeper.
  20. As I mentioned before, a single topic can have 1 leader while the others are followers. Broker 1 is the leader; Broker 2 and Broker 3 are followers.
  21. What happens if Broker 1 goes down?
  22. Then Broker 2 will become the leader, because Broker 2 was an ISR before. And if Broker 1 comes back, it will try to become the leader again after replicating the data.
  23. Messages are appended to a topic-partition in the order they are sent. Basically, if the producer sends data without a key, the data is sent round-robin to broker 1, broker 2, broker 3 - it cycles through the brokers holding that topic’s partitions. If the producer sends data with a key, Kafka hashes the key and uses the value to decide which broker is selected: once a key has gone somewhere it will go there every time, i.e. a key always goes to the same partition, the one it went to on the first produce.
  24. The producer can choose to receive acknowledgement of data writes. Acknowledgement is a synonym for confirmation, and there are 3 kinds: acks=0 just sends the data; acks=1 gets an ack for the write from the leader only (default); acks=all gets an ack after all ISRs (leader & replicas) have received the data.
  25. Consumers read messages in the order they are stored in the topic-partition. The main job of a consumer is to read data from partitions; it only needs to connect to some broker and specify the topic, and it will read regardless of which broker the partition lives on - just like on the producer side. Order: to stress the read order again, a consumer reads data in order within the same partition, but different partitions are read in parallel. So if the data must be consumed in order, designing the ordering of message keys up front is very important.
  26. If our system receives many, many messages from producers and at the same time we don’t have enough consumers, we can simply add more consumers so that consumption gets faster; they consume in parallel. One rule for consumers in a group: there must be no more consumers than there are partitions in the topic the group is interested in; any extra consumer will sit idle. From the same example, notice that no two consumers in a group read the same partition, which means that however we add or remove consumers, they will never read duplicate data. Another great property: we can add and remove consumers at any time, and Kafka will rebalance by itself which consumer reads which partition - this is the resilience property. You may wonder what happens with several groups reading the same topic (for different purposes, e.g. a new service): where does a new group start reading? Each group has its own offset counter. If group 1 is already running and we add group 2, group 2 starts reading from the beginning (configurable to start from the earliest or the current data) - the counters are completely separate. Two consumers cannot consume messages from the same partition at the same time, but a consumer can consume from multiple partitions at the same time.
  27. How does Kafka know which offset a consumer should read next? It is the counter of the next offset each consumer in the consumer group should use. Committed offsets are stored in the topic named “__consumer_offsets” (in versions < 0.9, the offsets were kept in Zookeeper). “I died; now I’m back alive. So now I can start at this offset and continue reading from there.”
  28. At most once: offsets are committed as soon as the message is received; if the processing goes wrong, the message is lost (it won’t be read again). At least once: offsets are committed after the message is processed; you read the data, do something with it, and then commit the offset - if processing goes wrong or your consumer goes down, the message will be read again.
  29. Zookeeper manages the brokers: it knows which broker is where and whether it is dead or alive. It records which topics exist and how many partitions each topic has. It elects the leader/replicas of each partition. It sends a signal to Kafka on every change that happens, e.g. a new topic arrives, a broker dies or a broker is added. It records how much data each producer/consumer may write or read (quotas). It stores authorization: which users are allowed to create topics. It records how many consumers each consumer group has and up to which offset they have read (though, as note 27 says, offsets now live in “__consumer_offsets”; Zookeeper does not store consumer offsets). Zookeeper has its own quorum, usually an odd number of nodes (3, 5, 7, ...), because writes need consensus: more than half of the running Zookeepers must confirm a write before the Zookeeper leader considers it really written. Zookeeper has a leader (handles writes) and the rest of the servers are followers (handle reads). Consumers & producers don’t write to Zookeeper, they write to Kafka; Kafka just keeps all its metadata in Zookeeper.
  30. Kafka requires Java version 8 (not 9, not 10). Visualization tool: Kafka Tool.
  Create topic: kafka-topics --zookeeper 127.0.0.1:2181 --topic first_topic --create --replication-factor 1 --partitions 3
  List topics: kafka-topics --zookeeper 127.0.0.1:2181 --list
  Describe topic: kafka-topics --zookeeper 127.0.0.1:2181 --topic first_topic --describe
  Delete topic: kafka-topics --zookeeper 127.0.0.1:2181 --topic first_topic --delete
  Console producer CLI - produce: kafka-console-producer --broker-list localhost:9092 --topic first_topic
  Console consumer CLI - consume: kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic
  Console consumer CLI - consume from the beginning: kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --from-beginning
  Consumer group: kafka-console-consumer --bootstrap-server localhost:9092 --topic topic_1 --group my_app
  Consumer group CLI - list: kafka-consumer-groups --bootstrap-server localhost:9092 --list
  Consumer group CLI - describe: kafka-consumer-groups --bootstrap-server localhost:9092 --group my_app --describe
  Consumer group CLI - reset offsets: kafka-consumer-groups --bootstrap-server localhost:9092 --group my_app --reset-offsets --to-earliest --all-topics --execute
  CLI options - produce with a key: kafka-console-producer --broker-list localhost:9092 --topic first_topic --property parse.key=true --property key.separator=,
  CLI options - consume with a key: kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --property print.key=true --property key.separator=,