Apache Kafka
The Big Data Messaging Tool
Index
● Need of Apache Kafka
● Kafka At LinkedIn
● What is Apache Kafka and its features
● Components of Apache Kafka
● Architecture and Flow in Apache Kafka
● Uses of Apache Kafka
● Kafka in Real World
● Comparison with other messaging systems
● Demo
Need Of Apache Kafka
In Big Data, enormous volumes of data are involved, which raises two main
challenges. The first is how to collect a large volume of data, and the second
is how to analyze the collected data. A messaging system helps to overcome both
challenges.
Kafka at LinkedIn
"If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at
LinkedIn." - Todd Palino
The Kafka ecosystem at LinkedIn receives over 800 billion messages per day,
which amounts to over 175 terabytes of data. Over 650 terabytes of messages
are then consumed daily, which is why the ability of Kafka to handle multiple
producers and multiple consumers for each topic is important. At the busiest
times of day, LinkedIn receives over 13 million messages per second, or 2.75
gigabytes of data per second. To handle all these messages, LinkedIn runs over
1100 Kafka brokers organized into more than 60 clusters.
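The figures above can be cross-checked with some quick arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and treats the quoted "over N" values as exact, so the results are rough estimates only.

```python
# Back-of-the-envelope check of the LinkedIn figures quoted on this slide.
# Assumption: decimal units and the quoted lower bounds taken as exact values.
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 800e9          # 800 billion messages written per day
bytes_written_per_day = 175e12    # 175 TB written per day
bytes_read_per_day = 650e12       # 650 TB consumed per day
peak_messages_per_sec = 13e6      # 13 million messages per second at peak
peak_bytes_per_sec = 2.75e9       # 2.75 GB per second at peak

avg_message_bytes = bytes_written_per_day / messages_per_day
peak_message_bytes = peak_bytes_per_sec / peak_messages_per_sec
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
read_write_ratio = bytes_read_per_day / bytes_written_per_day

print(f"average message size (daily figures): {avg_message_bytes:.0f} bytes")
print(f"average message size (peak figures):  {peak_message_bytes:.0f} bytes")
print(f"average write rate: {avg_messages_per_sec / 1e6:.1f} million messages/sec")
print(f"each byte written is read about {read_write_ratio:.1f} times")
```

Both ways of estimating the message size land a little above 200 bytes, and the read volume is several times the write volume, which is consistent with each topic having multiple consumers.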
What is Apache Kafka
Apache Kafka is a distributed publish-subscribe messaging system and a robust
queue that can handle a high volume of data and lets you pass messages from one
endpoint to another. Kafka is suitable for both offline and online message
consumption. Kafka messages are persisted on disk and replicated within the
cluster to prevent data loss.
Kafka supports low-latency message delivery and remains fault tolerant in the
presence of machine failures. It is also very fast: benchmarks have reported
around 2 million writes per second.
Kafka persists all data to disk, which in practice means that writes go first
to the operating system's page cache (RAM). This makes it very efficient to
transfer data from the page cache to a network socket.
Kafka is designed for high availability and, with replication configured
appropriately, for avoiding data loss.
Kafka is built on top of the ZooKeeper synchronization service. It integrates
well with Apache Storm and Spark for real-time streaming data analysis.
Features of Kafka
The following are a few benefits of Kafka −
● Reliability − Kafka is distributed, partitioned, replicated and fault tolerant.
● Scalability − The Kafka messaging system scales easily without downtime.
● Durability − Kafka uses a distributed commit log, which means messages are
persisted on disk as quickly as possible; hence it is durable.
● Performance − Kafka has high throughput for both publishing and
subscribing to messages. It maintains stable performance even when many TB of
messages are stored.
Components Of Kafka
● Topics: The categories in which Kafka maintains its feeds of messages.
● Producers: The processes that publish messages to a topic.
● Consumers: The processes that subscribe to topics in order to fetch the
published messages.
● Brokers: The servers that make up a Kafka cluster; a cluster consists of one
or more brokers.
● TCP protocol: Clients and servers communicate over TCP.
● ZooKeeper: A distributed configuration and synchronization service.
Uses of ZooKeeper
● Kafka brokers coordinate with one another with the help of ZooKeeper.
● ZooKeeper serves as the coordination interface between the Kafka brokers
and consumers.
● Kafka stores basic metadata in ZooKeeper, such as information about
topics, brokers, consumer offsets (the positions of queue readers) and so on.
● When a leader fails, leader election among the Kafka brokers is also carried
out through ZooKeeper.
Kafka Cluster
With Kafka we can create several types of cluster:
● A single-node, single-broker cluster
● A single-node, multiple-broker cluster
● A multiple-node, multiple-broker cluster
A single node—multiple broker clusters
Partition and topics
● A topic may have many partitions, which enables it to handle an arbitrary amount of data. Each
partition is an ordered, immutable sequence of messages that is continually appended to a
commit log. Each message in a partition is assigned a sequential id number called the offset,
which uniquely identifies the message within that partition.
● The partitions of the log are distributed over the servers in the Kafka cluster, with each server
handling data and requests for a share of the partitions. Each partition is replicated across a
configurable number of servers.
● For each partition, one server acts as the leader and the other replicas act as followers. The
leader handles reads and writes for the partition while the followers replicate it.
● In a nutshell, Kafka partitions the incoming messages for a topic and assigns these partitions to
the available Kafka brokers.
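The ideas on this slide (an append-only log per partition, offsets as positions in that log, and replicas spread over brokers) can be sketched in a few lines of Python. This is a toy model only: the class and function names are hypothetical, and the replica placement rule is a simplification chosen for illustration, not Kafka's actual assignment algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    """Append-only log; a message's offset is its index in the log."""
    messages: List[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to this message

    def read_from(self, offset: int) -> List[str]:
        return self.messages[offset:]


def assign_replicas(num_partitions: int, brokers: List[str],
                    replication_factor: int) -> Dict[int, List[str]]:
    """Toy placement (an assumption, not Kafka's real algorithm):
    partition p is led by broker p mod N and followed by the next brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    placement = {}
    for p in range(num_partitions):
        placement[p] = [brokers[(p + i) % len(brokers)]
                        for i in range(replication_factor)]
    return placement


partition = Partition()
first = partition.append("page-view")   # first message in the partition
second = partition.append("search")     # next message gets the next offset

placement = assign_replicas(3, ["broker-0", "broker-1", "broker-2"], 2)
for p, replicas in placement.items():
    print(f"partition {p}: leader={replicas[0]}, followers={replicas[1:]}")
```

In this sketch the first broker listed for a partition plays the leader role and the rest are followers, so every broker leads some partitions and follows others.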
Architectural Flow for Pub-Sub Messaging
Kafka offers a single consumer abstraction that generalizes both queuing and publish-subscribe.
The following is the stepwise workflow of Pub-Sub messaging −
● Producers send messages to a topic at regular intervals.
● The Kafka broker stores all messages in the partitions configured for that particular topic.
Messages are shared equally between the partitions: if the producer sends two messages and
there are two partitions, Kafka stores the first message in the first partition and the second
message in the second partition.
● A consumer subscribes to a specific topic.
● Once the consumer subscribes to a topic, Kafka provides the current offset of the topic to the
consumer and also saves the offset in the ZooKeeper ensemble.
● The consumer polls Kafka at a regular interval (for example, every 100 ms) for new messages.
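The "equally shared between partitions" step can be illustrated with a small partitioner sketch. It is a hypothetical toy, not the Kafka client's code: messages without a key are spread round-robin, and messages with a key are hashed so that the same key always lands in the same partition. CRC32 is used here only as a deterministic stand-in for the hash function of a real client.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Chooses a partition for each message (illustrative only)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition_for(self, key: Optional[str] = None) -> int:
        if key is None:
            # No key: rotate through the partitions in turn.
            return next(self._counter) % self.num_partitions
        # Keyed message: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = ToyPartitioner(num_partitions=2)
unkeyed = [partitioner.partition_for() for _ in range(4)]
keyed = [partitioner.partition_for("user-42") for _ in range(3)]
print("unkeyed messages go to partitions:", unkeyed)
print("messages keyed 'user-42' go to partitions:", keyed)
```

With two partitions and two unkeyed messages, the first goes to partition 0 and the second to partition 1, matching the example on the slide.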
Architectural Flow
● Once Kafka receives messages from producers, it delivers them to the consumers in response to
their requests.
● The consumer receives the message and processes it.
● Once the messages are processed, the consumer sends an acknowledgement to the Kafka broker.
● Once Kafka receives the acknowledgement, it changes the offset to the new value and updates it in
ZooKeeper. Since offsets are maintained in ZooKeeper, the consumer can read the next
message correctly even during server outages.
● The above flow repeats until the consumer stops requesting messages.
● The consumer has the option to rewind or skip to the desired offset of a topic at any time and read
all the subsequent messages.
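The poll, acknowledge and rewind steps above can be sketched as a tiny in-memory model. Everything here is hypothetical (the `ToyBroker` class and its method names are not Kafka APIs), and a plain dictionary stands in for the offset store that the slides place in ZooKeeper.

```python
from typing import Dict, List, Tuple


class ToyBroker:
    """Single-partition broker that tracks one committed offset per consumer."""

    def __init__(self, messages: List[str]) -> None:
        self.log = list(messages)
        self.next_offset: Dict[str, int] = {}  # stand-in for the offset store

    def poll(self, consumer: str, max_messages: int = 1) -> List[Tuple[int, str]]:
        start = self.next_offset.get(consumer, 0)
        batch = self.log[start:start + max_messages]
        return list(enumerate(batch, start=start))  # (offset, message) pairs

    def commit(self, consumer: str, offset: int) -> None:
        # Acknowledge everything up to and including `offset`.
        self.next_offset[consumer] = offset + 1

    def seek(self, consumer: str, offset: int) -> None:
        # Rewind or skip: the next poll starts at `offset`.
        self.next_offset[consumer] = offset


broker = ToyBroker(["m0", "m1", "m2"])

first_batch = broker.poll("c1")
broker.commit("c1", first_batch[-1][0])   # acknowledge m0

second_batch = broker.poll("c1")          # m1 is delivered
redelivered = broker.poll("c1")           # no commit yet, so m1 comes again

broker.seek("c1", 0)                      # rewind to the start of the log
replayed = broker.poll("c1", max_messages=3)
print(first_batch, second_batch, redelivered, replayed)
```

Because the position only advances on commit, a consumer that fails before acknowledging a message sees it again on the next poll, and seeking to an earlier offset replays everything from that point.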
Using Apache Kafka
● Install ZooKeeper
sudo apt-get install zookeeperd
● Download and extract Kafka into a directory (~/kafka in the commands below)
● Start the Kafka server
sh ~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties
● Create a producer for a topic
sh ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic
● Create a consumer for the same topic
sh ~/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic TutorialTopic --from-beginning
(Newer Kafka releases no longer accept --zookeeper for the console consumer; use
--bootstrap-server localhost:9092 there instead.)
Who else uses Kafka
Twitter - Twitter uses Storm-Kafka as part of its stream processing
infrastructure.
Netflix - Netflix uses Kafka for real-time monitoring and event processing.
Mozilla - Kafka will soon replace part of Mozilla's current production system
for collecting performance and usage data from end users' browsers for projects
such as Telemetry and Test Pilot.
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Demo of Kafka
● This demo is a Grails web application that uses Kafka to send messages
from a producer to a consumer.
● Each message includes the time spent on the page, the time the page was visited,
the time the page was left, the currently authenticated user and the URI of the page.
● The messages are received on the consumer end and the records are persisted in a database.
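A message carrying the fields listed above might be serialized along the following lines. This is a sketch in Python, not the demo's code (the demo is a Grails application): the `PageVisit` class, its field names and the JSON encoding are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PageVisit:
    """Hypothetical payload for one page-visit message."""
    uri: str
    user: str
    visited_at: str        # ISO 8601 timestamp when the page was opened
    left_at: str           # ISO 8601 timestamp when the page was left
    seconds_on_page: float


def encode(event: PageVisit) -> bytes:
    """Producer side: turn the event into bytes for the message body."""
    return json.dumps(asdict(event)).encode("utf-8")


def decode(payload: bytes) -> PageVisit:
    """Consumer side: rebuild the event before persisting it."""
    return PageVisit(**json.loads(payload.decode("utf-8")))


event = PageVisit(
    uri="/dashboard",
    user="alice",
    visited_at="2016-09-07T18:02:05Z",
    left_at="2016-09-07T18:03:35Z",
    seconds_on_page=90.0,
)
payload = encode(event)
restored = decode(payload)
print(restored)
```

Encoding the event as self-describing JSON keeps the producer and consumer loosely coupled, at the cost of larger messages than a binary format would produce.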
Common Use Cases of Apache Kafka
● Website activity tracking: The web application sends events such as page
views and searches to Kafka, where they become available for real-time
processing, dashboards and offline analytics in Hadoop.
● Operational metrics: Alerting and reporting on operational metrics.
● Log aggregation: Kafka can be used across an organization to collect logs
from multiple services and make them available in a standard format to
multiple consumers, including Hadoop and Apache Solr.
● Stream processing: A framework such as Spark Streaming reads data from a
topic, processes it and writes the processed data to a new topic, where it becomes
available for users and applications. Kafka's strong durability is also very useful
in the context of stream processing.
Comparison with Other Messaging Systems
In maturity and features RabbitMQ outshines Kafka, but when it comes to
durability, high throughput and fault tolerance, Apache Kafka comes out ahead.
https://www.infoq.com/articles/apache-kafka
References
https://www.safaribooksonline.com/library/view/learning-apache-kafka/9
https://www.infoq.com/articles/apache-kafka
http://www.tutorialspoint.com/apache_kafka
https://engineering.linkedin.com/kafka/running-kafka-scale
https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
Thanks
Project Demo url : https://github.com/ackhare/GrailsKafkaPageCounter
Presented By - Chetan Khare

Apache Kafka

  • 1. Apache Kafka: The Big Data Messaging Tool
  • 2. Index
      ● Need of Apache Kafka
      ● Kafka at LinkedIn
      ● What is Apache Kafka and its features
      ● Components of Apache Kafka
      ● Architecture and flow in Apache Kafka
      ● Uses of Apache Kafka
      ● Kafka in the real world
      ● Comparison with other messaging systems
      ● Demo
  • 3. Need of Apache Kafka: Big Data work involves an enormous volume of data, which raises two main challenges. The first is how to collect a large volume of data, and the second is how to analyze the collected data. To overcome those challenges, you need a messaging system.
  • 4. Kafka at LinkedIn: "If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn." (Todd Palino) The Kafka ecosystem at LinkedIn receives over 800 billion messages per day, which amounts to over 175 terabytes of data. Over 650 terabytes of messages are then consumed daily, which is why Kafka's ability to handle multiple producers and multiple consumers for each topic is important. At the busiest times of day, LinkedIn receives over 13 million messages per second, or 2.75 gigabytes of data per second. To handle all these messages, LinkedIn runs over 1100 Kafka brokers organized into more than 60 clusters.
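The LinkedIn figures above imply a few derived quantities that are easy to check. The sketch below only does arithmetic on the numbers quoted on the slide; it assumes decimal units (1 TB = 10^12 bytes), which the slide does not state.

```python
# Figures quoted on the slide (decimal units are an assumption).
MESSAGES_PER_DAY = 800e9        # messages produced per day
BYTES_PRODUCED_PER_DAY = 175e12 # 175 TB produced per day
BYTES_CONSUMED_PER_DAY = 650e12 # 650 TB consumed per day
PEAK_MESSAGES_PER_SEC = 13e6
PEAK_BYTES_PER_SEC = 2.75e9
SECONDS_PER_DAY = 24 * 60 * 60

# Average message size over a day, and at peak load.
avg_message_bytes = BYTES_PRODUCED_PER_DAY / MESSAGES_PER_DAY
peak_message_bytes = PEAK_BYTES_PER_SEC / PEAK_MESSAGES_PER_SEC

# Average message rate over a day, for comparison with the 13 million/s peak.
avg_messages_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY

# How many times each produced byte is read, on average.
fan_out = BYTES_CONSUMED_PER_DAY / BYTES_PRODUCED_PER_DAY

print(f"average message size: {avg_message_bytes:.2f} bytes")
print(f"peak message size:    {peak_message_bytes:.1f} bytes")
print(f"average rate:         {avg_messages_per_sec / 1e6:.2f} million messages/s")
print(f"read fan-out:         {fan_out:.2f}x")
```

Under these assumptions a message is a little over 200 bytes, and each message is read between three and four times on average, which is consistent with the point about multiple consumers per topic.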
  • 5. What is Apache Kafka: Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and lets you pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka supports low-latency message delivery and tolerates machine failures. Kafka is very fast: benchmarks have reported around 2 million writes per second.
  • 6. Kafka persists all data to disk, which in practice means that all writes go first to the page cache of the OS (RAM) and are flushed to disk from there. This makes it very efficient to transfer data from the page cache to a network socket. Kafka is very fast and is designed to avoid downtime and data loss. In the versions covered here, Kafka is built on top of the ZooKeeper synchronization service. It integrates very well with Apache Storm and Spark for real-time streaming data analysis.
  • 7. Features of Kafka: the following are a few benefits of Kafka.
      ● Reliability: Kafka is distributed, partitioned, replicated and fault tolerant.
      ● Scalability: the Kafka messaging system scales easily without downtime.
      ● Durability: Kafka uses a distributed commit log, which means messages are persisted on disk as fast as possible.
      ● Performance: Kafka has high throughput for both publishing and subscribing. It maintains stable performance even when many terabytes of messages are stored.
  • 8. Components of Kafka
      ● Topics: the categories in which Kafka maintains its feeds of messages.
      ● Producers: the processes that publish messages to a topic.
      ● Consumers: the processes that subscribe to topics in order to fetch the published messages.
      ● Broker: a server in a Kafka cluster; a cluster consists of one or more brokers.
      ● TCP: the protocol that clients and servers use to communicate.
      ● ZooKeeper: a distributed configuration and synchronization service.
  • 9. Uses of ZooKeeper
      ● Each Kafka broker coordinates with the other brokers through ZooKeeper.
      ● ZooKeeper serves as the coordination interface between the Kafka brokers and consumers.
      ● Kafka stores basic metadata in ZooKeeper, such as information about topics, brokers, consumer offsets (queue readers) and so on.
      ● When a leader fails, leader election among the Kafka brokers is also done through ZooKeeper.
  • 10. Kafka Cluster: with Kafka we can create several types of cluster.
      ● A single-node, single-broker cluster
      ● A single-node, multiple-broker cluster
      ● A multiple-node, multiple-broker cluster
  • 11. A single-node, multiple-broker cluster
  • 12. Partitions and topics
      ● A topic may have many partitions, which lets it handle an arbitrary amount of data. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log. Each message in a partition is assigned a sequential id number called the offset, which uniquely identifies the message within that partition.
      ● The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers.
      ● For each partition, one server acts as the leader and the others holding a replica act as followers, which drives the replication of messages in that partition.
      ● In a nutshell, Kafka partitions the incoming messages for a topic and assigns these partitions to the available Kafka brokers.
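The partition and offset model on this slide can be sketched in a few lines of plain Python. This is an in-memory illustration, not Kafka client code: the `Partition` and `Topic` classes and the CRC32-based `partition_for` function are hypothetical names invented for the example, and real Kafka clients use their own partitioning hash.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a message's offset is its position in the log."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to this message

    def read(self, offset: int) -> list:
        # Return every message from the given offset onward.
        return self.messages[offset:]


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Hypothetical partitioner: CRC32 of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def publish(self, key: str, message: str) -> tuple:
        p = self.partition_for(key)
        return p, self.partitions[p].append(message)


topic = Topic("page-views", num_partitions=3)
p1, o1 = topic.publish("user-42", "visited /home")
p2, o2 = topic.publish("user-42", "visited /cart")
print(p1 == p2, o1, o2)  # same key, same partition, consecutive offsets
```

Because both messages share a key they land in the same partition, so their relative order is preserved; ordering holds within a partition, not across the whole topic.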
  • 13. Architectural Flow for Pub-Sub Messaging: Kafka offers a single consumer abstraction (the consumer group) that generalizes both queuing and publish-subscribe. The step-by-step workflow of pub-sub messaging is as follows.
      ● Producers send messages to a topic at regular intervals.
      ● The Kafka broker stores all messages in the partitions configured for that particular topic. Messages are shared evenly between the partitions: if the producer sends two messages and there are two partitions, Kafka stores the first message in the first partition and the second message in the second partition.
      ● A consumer subscribes to a specific topic.
      ● Once the consumer subscribes to a topic, Kafka provides the current offset of the topic to the consumer and also saves the offset in the ZooKeeper ensemble.
      ● The consumer polls Kafka at a regular interval (for example, every 100 ms) for new messages.
  • 14. Architectural Flow
      ● Once Kafka receives messages from producers, it returns them to the consumers in response to their requests.
      ● The consumer receives the messages and processes them.
      ● Once the messages are processed, the consumer sends an acknowledgement to the Kafka broker.
      ● Once Kafka receives the acknowledgement, it advances the offset to the new value and updates it in ZooKeeper. Since offsets are maintained in ZooKeeper, the consumer can read the next message correctly even during server outages.
      ● This flow repeats until the consumer stops requesting messages.
      ● The consumer can rewind or skip to any desired offset of a topic at any time and read all subsequent messages.
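The workflow on the two slides above can be walked through with a small in-memory simulation. This is a sketch, not the Kafka API: the `Broker` class and its `send`, `poll`, `commit` and `seek` methods are hypothetical, the `committed` dictionary stands in for the offset store (ZooKeeper in the slides), and round-robin placement is assumed to match the two-message example.

```python
from collections import defaultdict
from itertools import cycle


class Broker:
    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = cycle(range(num_partitions))
        # Stand-in for the offset store: (group, partition) -> next offset.
        self.committed = defaultdict(int)

    def send(self, message: str) -> int:
        # Round-robin placement: each message goes to the next partition.
        p = next(self._next)
        self.partitions[p].append(message)
        return p

    def poll(self, group: str, partition: int, max_messages: int = 10):
        # Return messages starting at the group's committed offset.
        start = self.committed[(group, partition)]
        return start, self.partitions[partition][start:start + max_messages]

    def commit(self, group: str, partition: int, next_offset: int) -> None:
        # Acknowledgement: record where the next poll should start.
        self.committed[(group, partition)] = next_offset

    def seek(self, group: str, partition: int, offset: int) -> None:
        # Rewind or skip to an arbitrary offset.
        self.committed[(group, partition)] = offset


broker = Broker(num_partitions=2)
for text in ["m0", "m1", "m2", "m3"]:
    broker.send(text)

# Poll partition 0, process the batch, then acknowledge it.
start, batch = broker.poll("web", 0)
processed = [m.upper() for m in batch]
broker.commit("web", 0, start + len(batch))

_, after_commit = broker.poll("web", 0)  # nothing new after the commit
broker.seek("web", 0, 0)                 # rewind to the beginning
_, after_rewind = broker.poll("web", 0)  # the same messages are read again
print(processed, after_commit, after_rewind)
```

Keeping the offset outside the consumer is what lets a restarted consumer resume from the last acknowledged position, and setting it by hand is all that rewinding requires.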
  • 15. Using Apache Kafka
      ● Install ZooKeeper: sudo apt-get install zookeeperd
      ● Download and extract Kafka into a directory (~/kafka in these commands).
      ● Start the Kafka server: sh ~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties
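  • 16.
      ● Create a producer for a topic: sh ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic
      ● Create a consumer for the same topic: sh ~/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic TutorialTopic --from-beginning
      (The --zookeeper option applies to older Kafka releases; newer releases of the console consumer take --bootstrap-server localhost:9092 instead.)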
  • 17. Who else uses Kafka
      ● Twitter: uses Storm-Kafka as part of its stream processing infrastructure.
      ● Netflix: uses Kafka for real-time monitoring and event processing.
      ● Mozilla: Kafka will soon replace part of Mozilla's current production system for collecting performance and usage data from the end user's browser for projects like Telemetry, Test Pilot, etc.
      https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
  • 18. Demo of Kafka
      ● This demo is a Grails web application that uses Kafka to send messages from a producer to a consumer.
      ● Each message includes the time spent on the page, the time the page was visited, the time the page was left, the current authenticated user and the URI of the page.
      ● The messages are received on the consumer end and the records are persisted in a database.
  • 19. Common use cases of Apache Kafka
      ● Website activity tracking: the web application sends events such as page views and searches to Kafka, where they become available for real-time processing, dashboards and offline analytics in Hadoop.
      ● Operational metrics: alerting and reporting on operational metrics.
      ● Log aggregation: Kafka can be used across an organization to collect logs from multiple services and make them available in a standard format to multiple consumers, including Hadoop and Apache Solr.
  • 20. Common use cases of Apache Kafka
      ● Stream processing: a framework such as Spark Streaming reads data from a topic, processes it and writes the processed data to a new topic, where it becomes available to users and applications. Kafka's strong durability is also very useful in the context of stream processing.
  • 21. Comparison with other messaging systems: in maturity and features RabbitMQ outshines Kafka, but when it comes to durability, high throughput and fault tolerance, Apache Kafka is the winner. https://www.infoq.com/articles/apache-kafka
  • 23. Thanks. Project demo URL: https://github.com/ackhare/GrailsKafkaPageCounter Presented by Chetan Khare