©2017 LinkedIn Corporation. All Rights Reserved.
An Introduction to Apache Kafka and
Kafka Ecosystem at LinkedIn
Dong Lin
Data Infra Streaming @ LinkedIn
Open Data Science Conference
Agenda
▪ Kafka basics (50 min)
▪ Kafka ecosystem at LinkedIn (40 min)
▪ Hands-on (30 min)
Kafka basics
▪ What is Kafka?
– Motivation and design philosophy
▪ Who uses Kafka?
– Adoption in the open source community and use-cases at LinkedIn
▪ What is the fundamental design of Kafka?
– Partition and replication model
▪ How to configure Kafka for your use-case?
– Tradeoff among performance, persistence, availability and message order
▪ What is the development roadmap of Kafka?
– Recent and upcoming features
Publish/Subscribe Messaging
• Multiple producers
• Multiple consumers
• Scalable and durable
• Created by LinkedIn
• Open sourced under Apache
Direct transmission
[Figure: Web server sends PageViewEvent directly to Hadoop]
Many problems
• Multiple producers
• Multiple consumers
• Destination is slow
• Destination temporarily unavailable
• Destination permanent failure
• Bug in downstream application
• At least once delivery
Use a publish-subscribe messaging system
[Figure: Web server → Pub/sub system → Hadoop, shown against the same list of problems]
Use Kafka
[Figure: Web server → Kafka → Spark streaming, with the same problems annotated by the labels Performance, Persistent, Availability, Functionality and Delivery semantics]
Problem: closely-coupled pipelines
▪ O(N^2) pipelines – limited organizational scalability
▪ Messages are duplicated in proportion to the number of clients
Solution: publish-subscribe messaging system
▪ O(N) pipelines
▪ Space efficient
▪ Producers are decoupled from consumers
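The O(N^2) versus O(N) claim can be checked with simple arithmetic. The sketch below assumes every producer must reach every consumer; the function names are illustrative and not part of Kafka.

```python
def direct_pipelines(producers: int, consumers: int) -> int:
    """Point-to-point: every producer feeds every consumer directly."""
    return producers * consumers


def pubsub_pipelines(producers: int, consumers: int) -> int:
    """Pub/sub: each producer and each consumer connects once to the shared system."""
    return producers + consumers


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(f"N={n}: direct={direct_pipelines(n, n)}, pub/sub={pubsub_pipelines(n, n)}")
```

With 10 producers and 10 consumers, this gives 100 direct pipelines against 20 connections to the pub/sub system.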
Kafka as Unix Pipes
$ cat *.txt | tr A-Z a-z | grep hello
$ tail -F *.txt | tr A-Z a-z | grep hello
[Figure: producer → kafka → Hadoop → kafka → Hadoop, and Samza → kafka → Samza, chained like Unix pipes]
Reference: http://www.confluent.io/blog
Fan In
Fan Out
Add Branch
Switch Branch
Delete Branch
Parallel Consumption
Kafka basics
▪ Who uses Kafka?
– Adoption in the open source community and use-cases at LinkedIn
Companies that use Kafka
LinkedIn Yahoo Twitter Airbnb
Pinterest Square Coursera Uber
Goldman Sachs Box Paypal Cisco
Dropbox Spotify Wikipedia Microsoft
Netflix CloudFlare Hotels.com …
Reference: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Apache projects integrated with Kafka
• Stream processing
• Apache Storm
• Apache Samza
• Apache Spark Streaming
• Search and Query
• Apache Hive
• Presto
• Apache Hadoop
…
Kafka volume at LinkedIn
• 2 trillion messages produced per day
• 5 Gbps inbound (single cluster, unique data)
• 18 Gbps outbound (average 3X consumption, before mirroring)
• 2.5M partitions (largest cluster has 250k partitions, up to 10k partitions per broker)
Kafka use-cases at LinkedIn
• Tracking
– Member-related activity
• Metrics
– Application metrics, service calls
• Queuing
– Internal application data, messaging
– Largest users are Samza and Search
• Logging
– Dedicated cluster for application logs going to ELK
– High volume, low retention
Kafka basics
▪ What is the fundamental design of Kafka?
– Partition and replication model
Design goal
▪ Performance
– High throughput
– Low latency
– Scalable
▪ Persistence and availability
– Data should be available in the event of (permanent) server failure
▪ Functionality
– Rewind back in time
▪ Strong delivery semantics
– At-least-once delivery / exactly-once delivery
– In-order message delivery within partition
Characteristics
• High throughput (~300 MBps per machine)
– Immutable append-only data structure for fast disk access
– Efficient data transfer via zero copy
– Most messages are read directly from the page cache
– Partitioning model for scalability
– Batching and compression
• Low latency (~2 ms)
– Make data universally available in near real-time
• Strong guarantees about messages
– Messages strictly ordered within partition
– All data persistent on disk with replication
– Exactly once delivery
Is disk slow?
Traditional data copy
▪ 4 copies
▪ 2 context switches
Efficient zero copy
▪ 3 copies
▪ 0 context switches
▪ Only 2 copies if consumers are mostly caught up (their reads are served from the page cache)
Kafka as log
Producer -> Topic -> Consumer
Topic divided into partitions
• Partitions are distributed and replicated across brokers
• Parallel produce/consume
• Messages with the same key go to the same partition
Partition consists of messages with offsets
[Figure: a partition log with messages ordered from old to new]
• Append only
• Strict order
• Messages are assigned incremental offsets
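The three properties above can be illustrated with a toy in-memory log. This is a minimal sketch, not Kafka's storage format; the class and method names are hypothetical.

```python
from typing import List, Tuple


class PartitionLog:
    """A toy partition: messages are only ever appended, never modified."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Append a message and return the offset assigned to it."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, start_offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs from start_offset on."""
        if start_offset < 0:
            raise ValueError("offsets start at 0")
        end = min(start_offset + max_messages, len(self._messages))
        return [(o, self._messages[o]) for o in range(start_offset, end)]
```

Because old messages are never rewritten, a reader can rewind simply by calling `read` again with an earlier offset.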
Broker in Kafka
▪ Disk/network/CPU load is distributed across brokers in units of partitions
Producer in Kafka
▪ Messages with the same key go to the same partition
▪ Messages without a key go to a random partition
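The two routing rules above can be sketched as a small function. This is an illustration only: `choose_partition` is a hypothetical name, and CRC32 stands in for the hash function (the Java client's default partitioner uses murmur2).

```python
import random
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Pick a partition for a message, following the two rules on the slide."""
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    if key is None:
        # No key: any partition will do.
        return random.randrange(num_partitions)
    # Same key, same hash, same partition (while the partition count is unchanged).
    return zlib.crc32(key) % num_partitions
```

Note that the mapping depends on the partition count, so adding partitions to a topic changes where a given key lands.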
Consumer in Kafka
▪ A consumer can belong to a consumer group (CG)
▪ Consumers in the same CG
– Parallel processing of messages
– Share the consumer offset
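A rough sketch of both points: partitions are divided among the members of a group, and the committed offset belongs to the group rather than to one consumer. The round-robin strategy and all names here are assumptions for illustration, not Kafka's actual assignor.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the members of one consumer group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


class GroupOffsets:
    """Committed offsets are kept per group and partition, not per consumer."""

    def __init__(self) -> None:
        self._committed: Dict[int, int] = {}

    def commit(self, partition: int, next_offset: int) -> None:
        self._committed[partition] = next_offset

    def position(self, partition: int) -> int:
        # A member that takes over a partition resumes from the group's offset.
        return self._committed.get(partition, 0)
```

Each partition is owned by exactly one member at a time, which is what allows parallel processing without two members of the group reading the same message.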
When a broker fails…
Partition replication in Kafka
▪ Brokers can fail
– Controlled: e.g., upgrades/config changes
– Uncontrolled: disk failure, power outage, out-of-memory etc.
▪ Need high availability
– Typical failover < 10 ms
▪ Need data persistence
Partition replica assignment
▪ Replicas are laid out evenly across brokers
▪ The first assigned replica is the preferred leader
▪ Writes and reads go to the leader, which sends messages to the followers
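One simple way to lay replicas out evenly is to rotate through the broker list. The sketch below is a simplified assumption, not Kafka's actual assignment algorithm (which, among other things, can take racks into account); the function names are hypothetical.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, brokers: List[int],
                    replication_factor: int) -> Dict[int, List[int]]:
    """Map each partition to an ordered list of brokers holding its replicas."""
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the broker count")
    return {
        partition: [brokers[(partition + r) % len(brokers)]
                    for r in range(replication_factor)]
        for partition in range(num_partitions)
    }


def preferred_leader(replicas: List[int]) -> int:
    """The first assigned replica is the preferred leader."""
    return replicas[0]
```

With three partitions, brokers 0 to 2 and two replicas each, every broker ends up with two replicas and leads exactly one partition.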
Replication (at a high-level)
Kafka basics
▪ How to configure Kafka for your use-case?
– Tradeoff among performance, persistence, availability and message order
No one-size-fits-all configuration
Tradeoff between performance and persistence
• Should broker send ack to producer right after step 1?
• Higher persistence and lower throughput with acks = -1 in producer config
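The effect of the acks setting can be shown with a toy model of one leader and one follower. This sketch assumes that "step 1" in the missing figure is the leader's write to its own log; the function and its parameters are hypothetical and only model acks=1 and acks=-1.

```python
from typing import List


def send(message: str, acks: int, leader_log: List[str], follower_log: List[str],
         leader_fails_before_replication: bool = False) -> bool:
    """Return True if the producer receives an ack for the message."""
    if acks not in (1, -1):
        raise ValueError("this sketch models only acks=1 and acks=-1")
    leader_log.append(message)  # assumed step 1: the leader appends to its own log
    if acks == 1:
        # Ack right after the leader write; replication happens later, if at all.
        if not leader_fails_before_replication:
            follower_log.append(message)
        return True
    # acks=-1: ack only after the in-sync follower has the message.
    if leader_fails_before_replication:
        return False  # no ack, so the producer knows it has to retry
    follower_log.append(message)
    return True
```

With acks=1, a leader failure at the wrong moment leaves the producer holding an ack for a message the follower never received. With acks=-1, the producer pays for the extra round trip but never gets an ack for a message that exists only on the failed leader.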
Tradeoff between performance and message order
• Should the producer send a new message before the ack of the last message?
• In-order delivery and lower throughput with max.in.flight.requests.per.connection = 1 in producer config
[Figure: the producer sends message 0 and message 1 to the Kafka broker; message 0 fails and is retried, so it arrives after message 1]
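The reordering in the figure can be reproduced with a small simulation. It is a simplified model under stated assumptions: messages are sent in windows of `max_in_flight`, a failed message is retried at the front of the next window, and all names are hypothetical.

```python
from typing import Iterable, List, Set


def deliver(messages: Iterable[int], max_in_flight: int,
            fails_first_attempt: Set[int]) -> List[int]:
    """Return the order in which the broker appends the messages."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    log: List[int] = []
    pending = list(messages)
    already_failed: Set[int] = set()
    while pending:
        window = pending[:max_in_flight]
        retries = []
        for message in window:
            if message in fails_first_attempt and message not in already_failed:
                already_failed.add(message)
                retries.append(message)  # will be resent in the next window
            else:
                log.append(message)
        pending = retries + pending[len(window):]
    return log
```

For messages 0, 1, 2 where message 0 fails once, one request in flight keeps the order 0, 1, 2, while two requests in flight let message 1 land before the retry of message 0.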
Tradeoff between persistence and availability
• Should we allow messages to be produced if all in-sync replicas are offline?
• Higher availability and weaker persistence with unclean.leader.election.enable = true in broker config
[Figure: three replica logs holding offsets 0 to 8, 0 to 6 and 0 to 5, labeled Leader, Follower 1 and Follower 2; failed replicas are marked X]
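The tradeoff can be made concrete with a sketch of the election decision. The scenario is hypothetical: it assumes that only the old leader was in sync when it failed, and it uses the log lengths from the figure (9, 7 and 6 messages). The function is an illustration, not the broker's election code.

```python
from typing import Dict, Optional, Set, Tuple


def elect_leader(log_end: Dict[str, int], old_leader: str, in_sync: Set[str],
                 alive: Set[str], unclean: bool) -> Tuple[Optional[str], int]:
    """Return (new leader or None, messages the new leader never received)."""
    candidates = [r for r in log_end if r in alive and r in in_sync]
    if not candidates and unclean:
        # Fall back to any live replica, even one that has fallen behind.
        candidates = [r for r in log_end if r in alive]
    if not candidates:
        return None, 0  # partition stays offline until an in-sync replica returns
    new_leader = candidates[0]
    lost = max(0, log_end[old_leader] - log_end[new_leader])
    return new_leader, lost


if __name__ == "__main__":
    logs = {"leader": 9, "follower1": 7, "follower2": 6}
    survivors = {"follower1", "follower2"}
    print(elect_leader(logs, "leader", {"leader"}, survivors, unclean=False))
    print(elect_leader(logs, "leader", {"leader"}, survivors, unclean=True))
```

Without unclean election the partition is unavailable; with it, Follower 1 takes over and the messages at offsets 7 and 8 are gone.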
Tradeoff between availability and cost
• Do we need more replicas for the topic?
• Higher availability and higher cost with RF=3 in comparison to RF=2
[Figure: a producer writing to three brokers (RF=3) next to a producer writing to two brokers (RF=2)]
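Both sides of this tradeoff are simple arithmetic. The sketch assumes all replicas are in sync and sit on different brokers; the function names are illustrative.

```python
def tolerated_broker_failures(replication_factor: int) -> int:
    """Brokers that can be lost while at least one replica of the data survives."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def stored_bytes(unique_bytes: int, replication_factor: int) -> int:
    """Disk space used across the cluster for a given amount of unique data."""
    return unique_bytes * replication_factor
```

Going from RF=2 to RF=3 raises the tolerated failures from one broker to two, and raises the stored volume by half.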
Kafka basics
▪ What is the development roadmap of Kafka?
– Recent and upcoming features
Kafka provides great performance, availability and data persistence
Are there other features that will be valuable to users?
Improved support for multi-tenancy
▪ SASL/Kerberos and SSL support (KIP-12)
▪ Quota (KIP-13)
▪ Namespace in Kafka topics (KIP-37)
▪ ZooKeeper authentication (KIP-38)
▪ End-to-end encryption
Reduced hardware and operational cost
▪ Dynamic configuration (KIP-21)
▪ Rack-aware replica assignment (KIP-36)
▪ Self-healing (KIP-46)
▪ On-demand data deletion (KIP-107)
▪ JBOD support (KIP-112 and KIP-113)
Additional functionality for broader use-cases
▪ Kafka Connect for data import/export (KIP-26)
▪ Stream processor (KIP-28)
▪ Timestamp in message (KIP-32)
▪ Exactly-once delivery and transactional messaging (KIP-98)
Learn more about Kafka
▪ Stream processing meetup
▪ Kafka summit
▪ Kafka improvement proposals
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
▪ LinkedIn engineering blog https://engineering.linkedin.com/blog
Agenda
▪ Kafka basics (50 min)
▪ Kafka ecosystem at LinkedIn (40 min)
– Projects to monitor and manage Kafka servers
– Projects to monitor and debug Kafka clients
– Projects to make Kafka easier to use
– Projects that are built on Kafka
▪ Hands-on (30 min)
Projects to monitor and manage Kafka servers
▪ cruise-control for automatically balancing partitions across brokers
▪ kafka-monitor for monitoring Kafka service availability etc.
▪ kafka-audit for monitoring data loss
▪ InGraph for monitoring all JMX metrics from Kafka as time-series graphs
Problems before having Cruise Control
▪ SRE needs to wake up at night to move partitions in case of hardware failure
▪ SRE needs to manually move partitions to balance load across brokers
▪ Reduced availability due to need to wait for manual recovery
▪ The partition movement may impact production traffic
Open sourced on GitHub in August 2017
Cruise Control Architecture
▪ Self-heal from broker failure
▪ Balance load across brokers without manual intervention
▪ Controlled impact on production traffic when moving partitions
Example Cruise Control goals
▪ Partitions should be distributed across brokers in a rack-aware manner
▪ Broker resource utilization should be below the user-specified threshold
▪ Try to evenly distribute resource utilization across brokers
Projects to monitor and manage Kafka servers
▪ kafka-monitor for monitoring Kafka service availability etc.
Problems before having Kafka Monitor
▪ Some issues are discovered only after a bug report from a Kafka user
▪ Cannot quantify the availability and the latency of a Kafka cluster
▪ Cannot quantify the availability and the latency of a Kafka mirrored pipeline
Kafka Monitor Architecture
▪ Alert on service unavailability
▪ Quantify service availability
▪ Measure end-to-end latency
▪ Detect violation of Kafka semantics
Our availability SLA is 99.99%
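To put the 99.99% figure in perspective, it can be converted into a downtime budget. The sketch is plain arithmetic with illustrative function names; the second function assumes availability is measured as the share of successful probes, which is one possible definition rather than a description of Kafka Monitor's internals.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 365) -> float:
    """Minutes of unavailability permitted by an availability SLA over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be a percentage between 0 and 100")
    return (1 - sla_percent / 100) * period_days * 24 * 60


def measured_availability(successful_probes: int, total_probes: int) -> float:
    """Availability in percent, as the share of probes that succeeded."""
    if total_probes <= 0:
        raise ValueError("need at least one probe")
    return 100 * successful_probes / total_probes
```

A 99.99% SLA leaves roughly 52.6 minutes of downtime per year, or about 4.3 minutes per 30-day month.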
Other Kafka Monitor features
▪ Automatically distribute partitions of the monitor topic evenly across brokers
▪ Extensible module to export JMX metrics to various stores (e.g. Graphite)
▪ Pluggable interface to test Kafka service with your own client implementation
Open sourced on GitHub in May 2016
Projects to monitor and manage Kafka servers
▪ kafka-audit for monitoring data loss
Problems before having Kafka Audit
▪ Hard to help users identify why their messages are not received
▪ Hard to detect and debug message loss in Kafka pipelines
Kafka Audit Architecture
▪ Detect message loss
▪ Debug message loss
▪ Audit Kafka resource usage
Example Kafka Audit UI
When, where, and how many messages are delivered to Kafka
Projects to monitor and manage Kafka servers
▪ InGraph for monitoring all JMX metrics from Kafka as time-series graphs
InGraph Architecture
[Figure: brokers and clients send metric messages to a metric topic in the Kafka cluster; InGraph, with its UI, reads the metric messages]
Example InGraph UI
Projects to monitor and debug Kafka clients
▪ Burrow for monitoring offset lag of consumer groups
▪ kafka-audit for monitoring Kafka resource usage per client
Burrow Architecture
▪ Detect lagging consumers
▪ Detect stalled consumers
▪ Detect stopped consumers
▪ Detect offset rewind
▪ Open sourced on GitHub
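The four detections listed above can be sketched as rules over a window of offset commits. This is a loose illustration with assumed thresholds and status names, not Burrow's actual evaluation logic; each sample is a hypothetical `(timestamp_seconds, committed_offset, lag)` tuple, oldest first.

```python
from typing import List, Tuple

Sample = Tuple[float, int, int]  # (timestamp in seconds, committed offset, lag)


def classify(samples: List[Sample], now: float, stop_after_s: float = 300) -> str:
    """Classify one consumer's partition from a window of commit samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    offsets = [offset for _, offset, _ in samples]
    lags = [lag for _, _, lag in samples]
    if any(later < earlier for earlier, later in zip(offsets, offsets[1:])):
        return "REWIND"   # a committed offset moved backwards
    if now - samples[-1][0] > stop_after_s:
        return "STOPPED"  # no commit for too long
    if lags[-1] > 0 and offsets[0] == offsets[-1]:
        return "STALLED"  # still committing, but making no progress
    if lags[-1] > 0 and all(b > a for a, b in zip(lags, lags[1:])):
        return "LAGGING"  # progressing, but falling further behind
    return "OK"
```

Judging a trend over a window, rather than comparing a single lag value against a fixed threshold, avoids alerting on a consumer that is briefly behind but catching up.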
Projects to monitor and debug Kafka clients
▪ kafka-audit for monitoring Kafka resource usage per client
Attribute the hardware cost in $$ to users of Kafka and reduce unnecessary usage of Kafka
Projects to make Kafka easier to use
▪ kafka-rest to allow non-Java clients to produce to and consume from a Kafka cluster
▪ schema-registry for conversion between binary data and IndexedRecord
▪ li-apache-kafka-clients to support large messages etc.
▪ Nuage for users to create and manage properties (e.g. retention time) of their topics by themselves
Kafka Rest Architecture
▪ Support non-Java clients
▪ No need to maintain client libraries in multiple languages
Projects to make Kafka easier to use
▪ schema-registry for conversion between binary data and IndexedRecord
Schema Registry Architecture
▪ Enable efficient binary encoding of schema in the Kafka message
▪ Track schema evolution for forward and backward compatibility
[Figure: a user application passes an IndexedRecord to LiProducer, which registers the schema with the Schema Registry and sends binary data to the Kafka cluster; LiConsumer receives the binary data, fetches the schema, and returns an IndexedRecord to its user application. Both clients keep a schema cache.]
Projects to make Kafka easier to use
▪ li-apache-kafka-clients to support large messages etc.
Large message support in li-apache-kafka-clients
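One common way to carry a message larger than the broker accepts is to split it into segments on the producer side and reassemble it on the consumer side. The sketch below illustrates that general idea only; the segment layout and function names are hypothetical and are not the li-apache-kafka-clients implementation.

```python
import math
from typing import List, Tuple

Segment = Tuple[int, int, bytes]  # (sequence number, total segments, chunk)


def split_message(payload: bytes, max_segment_bytes: int) -> List[Segment]:
    """Cut a payload into numbered segments of at most max_segment_bytes each."""
    if max_segment_bytes < 1:
        raise ValueError("segment size must be positive")
    total = max(1, math.ceil(len(payload) / max_segment_bytes))
    return [
        (seq, total, payload[seq * max_segment_bytes:(seq + 1) * max_segment_bytes])
        for seq in range(total)
    ]


def reassemble(segments: List[Segment]) -> bytes:
    """Rebuild the payload once every segment has arrived, in any order."""
    if not segments:
        raise ValueError("no segments to reassemble")
    ordered = sorted(segments, key=lambda segment: segment[0])
    total = ordered[0][1]
    if [segment[0] for segment in ordered] != list(range(total)):
        raise ValueError("missing or duplicate segment")
    return b"".join(chunk for _, _, chunk in ordered)
```

Sending all segments of one message with the same key keeps them in the same partition, so the consumer sees them in order.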
Projects to make Kafka easier to use
▪ Nuage for users to create and manage properties (e.g. retention time) of their topics by themselves
Put things together
Help yourself with these open source projects
▪ Cruise Control (https://github.com/linkedin/cruise-control)
▪ Kafka Monitor (https://github.com/linkedin/kafka-monitor)
▪ Burrow (https://github.com/linkedin/burrow)
▪ li-apache-kafka-clients (https://github.com/linkedin/li-apache-kafka-clients)
▪ Future projects open sourced by LinkedIn streaming team can be found at
https://github.com/linkedin/streaming
All projects are actively maintained and used in LinkedIn's production environment
100% free of charge!
Projects at LinkedIn that are built on Kafka
▪ Stream processing – Apache Samza
▪ Change data capture – Brooklin
▪ Strongly consistent key-value store – Espresso
▪ Efficient key-value store for derived data – Venice
Hands-on
▪ Visit goo.gl/D7GFfB
▪ Single cluster
– Download and compile Apache Kafka
– Set up a cluster of one broker
– Create and describe topic
– Produce and consume using Apache Kafka tools
– Monitor availability of your cluster using Kafka Monitor
▪ Mirrored pipeline
– Set up another cluster of one broker
– Set up MirrorMaker (MM) to mirror traffic from the source cluster to the destination cluster
– Produce to the source cluster and consume from the destination cluster
– Monitor availability of your pipeline using Kafka Monitor

Weitere ähnliche Inhalte

Was ist angesagt?

Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBrian Ritchie
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
 
PHP and the Cloud: The view from the bazaar
PHP and the Cloud: The view from the bazaarPHP and the Cloud: The view from the bazaar
PHP and the Cloud: The view from the bazaarvitoc
 
Building Kafka-powered Activity Stream
Building Kafka-powered Activity StreamBuilding Kafka-powered Activity Stream
Building Kafka-powered Activity StreamOleksiy Holubyev
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scalejimriecken
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streamsjimriecken
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaGuozhang Wang
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIALa Cuisine du Web
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridPaolo Castagna
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaAkash Vacher
 

Was ist angesagt? (20)

Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
PHP and the Cloud: The view from the bazaar
PHP and the Cloud: The view from the bazaarPHP and the Cloud: The view from the bazaar
PHP and the Cloud: The view from the bazaar
 
Building Kafka-powered Activity Stream
Building Kafka-powered Activity StreamBuilding Kafka-powered Activity Stream
Building Kafka-powered Activity Stream
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Fabric8 mq
Fabric8 mqFabric8 mq
Fabric8 mq
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Micro service architecture
Micro service architecture  Micro service architecture
Micro service architecture
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 

Ähnlich wie An introduction to Apache Kafka and Kafka ecosystem at LinkedIn

Kafka Summit SF 2017 - Kafka and the Polyglot Programmer
Kafka Summit SF 2017 - Kafka and the Polyglot ProgrammerKafka Summit SF 2017 - Kafka and the Polyglot Programmer
Kafka Summit SF 2017 - Kafka and the Polyglot Programmerconfluent
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through KafkaDileep Kalidindi
 
Building Agile and Resilient Schema Transformations using Apache Kafka and ESB's
Building Agile and Resilient Schema Transformations using Apache Kafka and ESB'sBuilding Agile and Resilient Schema Transformations using Apache Kafka and ESB's
Building Agile and Resilient Schema Transformations using Apache Kafka and ESB'sRicardo Ferreira
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudAndrew Schofield
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
An introduction to Apache Kafka and Kafka ecosystem at LinkedIn

  • 1. An Introduction to Apache Kafka and Kafka Ecosystem at LinkedIn. Dong Lin, Data Infra Streaming @ LinkedIn. Open Data Science Conference. ©2017 LinkedIn Corporation. All Rights Reserved.
  • 2. Agenda
      ▪ Kafka basics (50 min)
      ▪ Kafka ecosystem at LinkedIn (40 min)
      ▪ Hands-on (30 min)
  • 3. Kafka basics
      ▪ What is Kafka?
          – Motivation and design philosophy
      ▪ Who uses Kafka?
          – Adoption in the open source community and use-cases at LinkedIn
      ▪ What is the fundamental design of Kafka?
          – Partition and replication model
      ▪ How to configure Kafka for your use-case?
          – Tradeoff among performance, persistence, availability and message order
      ▪ What is the development roadmap of Kafka?
          – Recent and upcoming features
  • 4. Publish/Subscribe Messaging
      • Multiple producers
      • Multiple consumers
      • Scalable and durable
      • Created by LinkedIn
      • Open sourced under Apache
  • 5. Direct transmission (diagram labels: Web server, PageViewEvent, Hadoop)
  • 6. Many problems (diagram labels: Web server, PageViewEvent, Hadoop)
      ▪ Multiple producers
      ▪ Multiple consumers
      ▪ Destination is slow
      ▪ Destination temporarily unavailable
      ▪ Destination permanent failure
      ▪ Bug in downstream application
      ▪ At least once delivery
  • 7. Use a publish-subscribe messaging system (diagram labels: Web server, Pub/sub system, Hadoop)
      ▪ Problem labels: multiple producers; multiple consumers; destination is slow; destination temporarily unavailable; destination permanent failure; bug in downstream application; at least once delivery
  • 8. Use Kafka (diagram labels: Web server, Spark streaming)
      ▪ Problem labels: multiple producers; multiple consumers; destination is slow; destination temporarily unavailable; destination permanent failure; bug in downstream application; at least once delivery
      ▪ Category labels: Functionality, Persistent, Delivery semantics, Performance, Availability
  • 9. Problem: closely-coupled pipelines
      ▪ O(N^2) pipelines
          – Limited organizational scalability
      ▪ Messages are duplicated in proportion to the number of clients
  • 10. Solution: publish-subscribe messaging system
      ▪ O(N) pipelines
      ▪ Space efficient
      ▪ Producers are decoupled from consumers
  • 11. Kafka as Unix Pipes
      $ cat *.txt | tr A-Z a-z | grep hello
      $ tail -F *.txt | tr A-Z a-z | grep hello
      Diagram labels: producer, kafka, Hadoop, Samza
      Reference: http://www.confluent.io/blog
  • 12. Fan In
  • 13. Fan Out
  • 14. Add Branch
  • 15. Switch Branch
  • 16. Delete Branch
  • 17. Parallel Consumption
  • 18. Kafka basics (agenda repeated from slide 3)
  • 19. Companies that use Kafka
      LinkedIn, Yahoo, Twitter, Airbnb, Pinterest, Square, Coursera, Uber, Goldman Sachs, Box, Paypal, Cisco, Dropbox, Spotify, Wikipedia, Microsoft, Netflix, CloudFlare, Hotels.com, …
      Reference: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
  • 20. Apache projects integrated with Kafka
      • Stream processing
          • Apache Storm
          • Apache Samza
          • Apache Spark Streaming
      • Search and Query
          • Apache Hive
          • Presto
          • Apache Hadoop
          …
  • 21. Kafka volume at LinkedIn
      • 2 trillion messages: produced, per day
      • 5 Gbps inbound: single cluster, unique data
      • 18 Gbps outbound: average 3X consumption, before mirroring
      • 2.5M partitions: largest cluster has 250k partitions, up to 10k partitions per broker
  • 22. Kafka use-cases at LinkedIn
      Use-case headings: Activity Tracking, Metrics, Queuing, Logging
      • Member-related
      • Application metrics, service calls
      • Internal application data, messaging
      • Largest users are Samza and Search
      • Dedicated cluster for application logs going to ELK
      • High volume, low retention
  • 23. Kafka basics (agenda repeated from slide 3)
  • 24. Design goal
      ▪ Performance
          – High throughput
          – Low latency
          – Scalable
      ▪ Persistence and availability
          – Data should be available in the event of (permanent) server failure
      ▪ Functionality
          – Rewind back in time
      ▪ Strong delivery semantics
          – At-least-once delivery / exactly-once delivery
          – In-order message delivery within partition
  • 25. Characteristics
      • High throughput (~300 MBps per machine)
          – Immutable append-only data structure for fast disk access
          – Efficient data transfer via zero copy
          – Most messages are read directly from the page cache
          – Partitioning model for scalability
          – Batching and compression
      • Low latency (~2 ms)
          – Make data universally available in near real-time
      • Strong guarantees about messages
          – Messages strictly ordered within partition
          – All data persistent on disk with replication
          – Exactly once delivery
  • 26. Is disk slow?
  • 27. Traditional data copy
      ▪ 4 copies
      ▪ 2 context switches
  • 28. Efficient zero copy
      ▪ 3 copies
      ▪ 0 context switches
      ▪ Only 2 copies if consumers are mostly caught up
  • 29. Kafka as log
  • 30. Producer -> Topic -> Consumer
  • 31. Topic divided into partitions
      • Partitions are distributed and replicated across brokers
      • Parallel produce/consume
      • Messages with the same key go to the same partition
  • 32. Partition consists of messages with offsets (diagram: messages laid out from old to new)
      • Append only
      • Strict order
      • Messages are assigned incremental offsets
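The append-only partition on slide 32 can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not Kafka's storage code: the `PartitionLog` class and its method names are hypothetical, and a real partition lives in segment files on disk.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message at the end and return the offset assigned to it."""
        offset = len(self._messages)  # offsets grow by one per appended message
        self._messages.append(message)
        return offset

    def read(self, start_offset, max_messages=10):
        """Return (offset, message) pairs from start_offset onward, oldest first."""
        end = min(start_offset + max_messages, len(self._messages))
        return [(o, self._messages[o]) for o in range(start_offset, end)]


log = PartitionLog()
offsets = [log.append(m) for m in ("a", "b", "c")]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [(1, 'b'), (2, 'c')]
```

Because existing entries are never modified, a reader can go back to any earlier offset and read the same messages again, which is the "rewind back in time" property listed under the design goals.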
  • 33. Broker in Kafka
      ▪ Disk/network/CPU load distributed across brokers in unit of partitions
  • 34. Producer in Kafka
      ▪ Messages with same key go to the same partition
      ▪ Messages without a key go to a random partition
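The two producer rules on slide 34 can be sketched as a partition-selection function. This is a minimal sketch, assuming CRC32 as a stand-in hash; real Kafka clients use their own hash function, and `choose_partition` is a hypothetical name, not a client API.

```python
import random
import zlib


def choose_partition(key, num_partitions, rng=random):
    """Pick a partition index for a message.

    key: bytes, or None for a message without a key.
    """
    if key is None:
        # No key: any partition will do, so pick one at random.
        return rng.randrange(num_partitions)
    # Same key -> same hash -> same partition (while num_partitions is unchanged).
    return zlib.crc32(key) % num_partitions


first = choose_partition(b"member-42", 8)
second = choose_partition(b"member-42", 8)
print(first == second)  # True
```

Since ordering is only guaranteed within a partition, routing by key is what keeps all messages for one key in order relative to each other.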
  • 35. Consumer in Kafka
      ▪ Consumers can belong to a consumer group (CG)
      ▪ Consumers in the same CG
          – Parallel processing of messages
          – Share the consumer offset
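A rough sketch of how a consumer group can split partitions among its members and share offsets. The round-robin policy and the names `assign_partitions` and `GroupOffsets` are assumptions for illustration; the actual assignment strategy in a Kafka client is configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch: each partition is owned by exactly one consumer in the group."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


class GroupOffsets:
    """Committed offsets are stored per (group, partition), not per consumer."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, offset):
        self._committed[(group, partition)] = offset

    def position(self, group, partition):
        # A consumer taking over a partition resumes from the group's offset.
        return self._committed.get((group, partition), 0)


print(assign_partitions(range(6), ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}

store = GroupOffsets()
store.commit("cg", 0, 17)    # committed by whichever consumer owned partition 0
print(store.position("cg", 0))  # 17
```

Keying the offset by group rather than by consumer is what lets another member of the same group continue where a failed member stopped.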
  • 36. When a broker fails…
  • 37. Partition replication in Kafka
      ▪ Brokers can fail
          – Controlled: e.g., upgrades/config changes
          – Uncontrolled: disk failure, power outage, out-of-memory etc.
      ▪ Need high availability
          – Typical failover < 10 ms
      ▪ Need data persistence
  • 38. Partition replica assignment
      ▪ Replicas are laid out evenly across brokers
      ▪ First assigned replica is preferred as leader
      ▪ Writes/reads go to leader, which sends message to followers
  • 39. Replication (at a high-level)
  • 40. Replication (at a high-level)
  • 41. Replication (at a high-level)
  • 42. Replication (at a high-level)
  • 43. Kafka basics (agenda repeated from slide 3)
  • 44. No one-size-fits-all configuration
  • 45. Tradeoff between performance and persistence
      • Should broker send ack to producer right after step 1?
      • Higher persistence and lower throughput with acks = -1 in producer config
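The question on slide 45 can be made concrete with a toy model. It assumes that "step 1" in the slide's diagram is the leader's local write, which the transcript does not confirm, and the function names are hypothetical. In Kafka, `acks = -1` means the ack waits for the in-sync replicas; here every follower is treated as in sync.

```python
def produce(message, leader_log, follower_logs, acks):
    """Return (leader_log, follower_logs) as they stand when the producer gets its ack."""
    leader_log = leader_log + [message]  # the leader appends locally first
    if acks == -1:
        # acks = -1: the ack waits until the followers have copied the message.
        follower_logs = [log + [message] for log in follower_logs]
    return leader_log, follower_logs


def log_after_leader_failure(follower_logs):
    """The leader is lost right after the ack; the first follower takes over."""
    return follower_logs[0]


for acks in (1, -1):
    _, followers = produce("m0", [], [[], []], acks)
    print(acks, log_after_leader_failure(followers))
# 1 []
# -1 ['m0']
```

With `acks = 1` the producer was told the write succeeded, yet the message is gone after failover; with `acks = -1` it survives, at the price of waiting for replication on every send.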
  • 46. Tradeoff between performance and message order
      • Should producer send new message before ack of the last message?
      • In-order delivery and lower throughput with max.in.flight.requests.per.connection = 1 in producer config
      Diagram labels: Producer, Kafka Broker; message 0 (failed), message 1, retry of message 0
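The reordering on slide 46 can be reproduced with a small simulation. This is a simplified sketch, not the producer's real send loop: `deliver` is a hypothetical function that sends messages in windows of `max_in_flight` and retries a failed message only after the rest of its window has been written.

```python
def deliver(messages, max_in_flight, fails_once):
    """Return the order in which messages land in the broker log.

    messages: payloads in send order.
    max_in_flight: how many unacknowledged sends the producer allows.
    fails_once: set of payloads whose first send attempt fails and is retried.
    """
    log = []
    pending = list(messages)
    failed_already = set()
    while pending:
        window, pending = pending[:max_in_flight], pending[max_in_flight:]
        retries = []
        for message in window:
            if message in fails_once and message not in failed_already:
                failed_already.add(message)
                retries.append(message)  # first attempt lost, retry later
            else:
                log.append(message)
        pending = retries + pending  # retries go out before any new message
    return log


print(deliver(["m0", "m1"], max_in_flight=2, fails_once={"m0"}))  # ['m1', 'm0']
print(deliver(["m0", "m1"], max_in_flight=1, fails_once={"m0"}))  # ['m0', 'm1']
```

With two requests in flight, message 1 is written while message 0 is still being retried, so the log order is reversed. With one request in flight, nothing new is sent until message 0 succeeds.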
  • 47. Tradeoff between persistence and availability
      • Should we allow messages to be produced if all in-sync replicas are offline?
      • Higher availability and weaker persistence with unclean.leader.election.enable = true in broker config
      Diagram labels: Leader, Follower 1, Follower 2, Read; three logs of different lengths (offsets 0 to 8, 0 to 6 and 0 to 5)
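A sketch of the choice that `unclean.leader.election.enable` controls. The log-end values are illustrative numbers loosely modelled on the slide's diagram, and picking the most caught-up live replica is a simplification of this sketch, not a statement about how the broker chooses.

```python
def elect_leader(log_end, in_sync, alive, unclean_election):
    """Pick a new leader, or None if the partition has to stay offline.

    log_end: dict of replica name -> number of messages in its log.
    in_sync: set of replicas that were fully caught up.
    alive: set of replicas that are currently online.
    """
    candidates = sorted(r for r in alive if r in in_sync)
    if not candidates and unclean_election:
        candidates = sorted(alive)  # fall back to out-of-sync replicas
    if not candidates:
        return None
    return max(candidates, key=lambda r: log_end[r])


log_end = {"leader": 9, "follower1": 7, "follower2": 6}
in_sync = {"leader"}                # both followers were lagging
alive = {"follower1", "follower2"}  # the old leader is offline

print(elect_leader(log_end, in_sync, alive, unclean_election=False))  # None
new_leader = elect_leader(log_end, in_sync, alive, unclean_election=True)
print(new_leader, log_end["leader"] - log_end[new_leader])  # follower1 2
```

With the setting off, the partition is unavailable until an in-sync replica returns, but no acknowledged message is lost. With it on, the partition keeps serving traffic and the two messages only the old leader had are gone.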
  • 48. Tradeoff between availability and cost
      • Do we need more replicas for the topic?
      • Higher availability and higher cost with RF=3 in comparison to RF=2
      Diagram labels: RF=3 (producer, three brokers); RF=2 (producer, two brokers)
  • 49. Kafka basics (agenda repeated from slide 3)
  • 50. Kafka provides great performance, availability and data persistence. Are there other features that will be valuable to users?
  • 51. Improved support for multi-tenancy
      ▪ SASL/Kerberos and SSL support (KIP-12)
      ▪ Quota (KIP-13)
      ▪ Namespace in Kafka topics (KIP-37)
      ▪ ZooKeeper authentication (KIP-38)
      ▪ End-to-end encryption
  • 52. Reduced hardware and operational cost
      ▪ Dynamic configuration (KIP-21)
      ▪ Rack aware replica assignment (KIP-36)
      ▪ Self healing (KIP-46)
      ▪ On demand data deletion (KIP-107)
      ▪ JBOD support (KIP-112 and KIP-113)
  • 53. Additional functionality for broader use-cases
      ▪ Kafka Connect for data import/export (KIP-26)
      ▪ Streaming processor (KIP-28)
      ▪ Timestamp in message (KIP-32)
      ▪ Exactly-once delivery and transactional messaging (KIP-98)
  • 54. Learn more about Kafka
      ▪ Stream processing meetup
      ▪ Kafka summit
      ▪ Kafka improvement proposals: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
      ▪ LinkedIn engineering blog: https://engineering.linkedin.com/blog
  • 56. Agenda
      ▪ Kafka basics (50 min)
      ▪ Kafka ecosystem at LinkedIn (40 min)
          – Projects to monitor and manage Kafka servers
          – Projects to monitor and debug Kafka clients
          – Projects to make Kafka easier to use
          – Projects that are built on Kafka
      ▪ Hands-on (30 min)
  • 57. Projects to monitor and manage Kafka servers
      ▪ cruise-control for automatically balancing partitions across brokers
      ▪ kafka-monitor for monitoring Kafka service availability etc.
      ▪ kafka-audit for monitoring data loss
      ▪ InGraph for monitoring all JMX metrics from Kafka as time-series graphs
  • 58. Problems before having Cruise Control
      ▪ SRE needs to wake up at night to move partitions in case of hardware failure
      ▪ SRE needs to manually move partitions to balance load across brokers
      ▪ Reduced availability due to the need to wait for manual recovery
      ▪ The partition movement may impact production traffic
      Open sourced on GitHub in August 2017
  • 59. Cruise Control Architecture
      ▪ Self-heal from broker failure
      ▪ Balance load across brokers without manual intervention
      ▪ Controlled impact on PROD traffic when moving partitions
  • 60. Example Cruise Control goals
      ▪ Partitions should be distributed across brokers in a rack-aware manner
      ▪ Broker resource utilization should be below the user-specified threshold
      ▪ Try to evenly distribute resource utilization across brokers
  • 61. Projects to monitor and manage Kafka servers (list repeated from slide 57)
  • 62. Problems before having Kafka Monitor
      ▪ Some issues are discovered only after a bug report from a Kafka user
      ▪ Cannot quantify the availability and the latency of a Kafka cluster
      ▪ Cannot quantify the availability and the latency of a Kafka mirrored pipeline
  • 63. Kafka Monitor Architecture
      ▪ Alert on service unavailability
      ▪ Quantify service availability
      ▪ Measure end-to-end latency
      ▪ Detect violation of Kafka semantics
      Our availability SLA is 99.99%
  • 64. Other Kafka Monitor features
      ▪ Automatically distribute partitions of the monitor topic evenly across brokers
      ▪ Extensible module to export JMX metrics to various stores (e.g. Graphite)
      ▪ Pluggable interface to test Kafka service with your own client implementation
      Open sourced on GitHub in May 2016
  • 65. Projects to monitor and manage Kafka servers (list repeated from slide 57)
  • 66. Problems before having Kafka Audit
      ▪ Hard to help users identify why their message is not received
      ▪ Hard to detect and debug message loss in Kafka pipelines
  • 67. Kafka Audit Architecture
      ▪ Detect message loss
      ▪ Debug message loss
      ▪ Audit Kafka resource usage
  • 68. Example Kafka Audit UI
      When, where and how many messages are delivered to Kafka
  • 69. Projects to monitor and manage Kafka servers (list repeated from slide 57)
  • 70. InGraph Architecture
      Diagram labels: Broker, Broker, Client; metric messages; Metric topic in Kafka Cluster; InGraph with UI
  • 71. Example InGraph UI
  • 72. Projects to monitor and debug Kafka clients
      ▪ Burrow for monitoring offset lag of consumer groups
      ▪ kafka-audit for monitoring Kafka resource usage per client
  • 73. Burrow Architecture
      ▪ Detect lagging consumers
      ▪ Detect stalled consumers
      ▪ Detect stopped consumers
      ▪ Detect offset rewind
      ▪ Open sourced on GitHub
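The consumer conditions listed on slide 73 can be illustrated with a rough classifier over a window of offset samples. These rules are assumptions made for this sketch and are not Burrow's actual evaluation logic; `classify` is a hypothetical name. Detecting a stopped consumer needs commit timestamps, so it is left out here.

```python
def classify(samples):
    """Classify one partition of a consumer group.

    samples: list of (committed_offset, log_end_offset) tuples, oldest first.
    """
    committed = [c for c, _ in samples]
    lags = [end - c for c, end in samples]  # lag = messages not yet consumed
    if any(later < earlier for earlier, later in zip(committed, committed[1:])):
        return "rewind"   # the committed offset moved backwards
    if any(lag == 0 for lag in lags):
        return "ok"       # the consumer caught up at some point in the window
    if committed[0] == committed[-1]:
        return "stalled"  # there is lag, but the offset is not advancing
    if all(b >= a for a, b in zip(lags, lags[1:])) and lags[-1] > lags[0]:
        return "lagging"  # the consumer is advancing but falling further behind
    return "ok"


print(classify([(10, 20), (15, 30), (20, 45)]))  # lagging
print(classify([(10, 20), (10, 25), (10, 30)]))  # stalled
print(classify([(10, 20), (5, 20)]))             # rewind
```

Judging a window of samples rather than a single lag value avoids picking a fixed lag threshold for every topic, since only the trend within the window matters.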
  • 74. ©2017 LinkedIn Corporation. All Rights Reserved. Projects to monitor and debug Kafka clients ▪ Burrow for monitoring offset lag of consumer groups ▪ kafka-audit for monitoring Kafka resource usage per client 74 Attribute the hardware cost in $$ to users of Kafka and reduce unnecessary usage of Kafka
  • 75. Projects to make Kafka easier to use ▪ kafka-rest to allow non-Java clients to produce to and consume from a Kafka cluster ▪ schema-registry for conversion between binary data and IndexedRecord ▪ li-apache-kafka-clients to support large messages, etc. ▪ Nuage for users to create their topics and manage topic properties (e.g. retention time) by themselves
  • 76. Kafka Rest Architecture ▪ Support non-Java clients ▪ No need to maintain client libraries in multiple languages
  • 78. Schema Registry Architecture ▪ Enable efficient binary encoding of schema in the Kafka message ▪ Track schema evolution for forward and backward compatibility [Diagram: user application, IndexedRecord, LiProducer with schema cache, binary data, Kafka cluster, binary data, LiConsumer with schema cache, IndexedRecord, user application; Schema Registry with "Register schema" and "Fetch schema" arrows]
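The idea of encoding the schema compactly in each message and caching it on the client can be sketched as follows. This is an illustration under assumptions, not LinkedIn's implementation: the in-memory `SchemaRegistry` class stands in for the real service, the 16-byte MD5 digest as schema ID is a hypothetical choice, and JSON stands in for the binary record encoding.

```python
import hashlib
import json

ID_SIZE = 16  # length of an MD5 digest in bytes (hypothetical schema ID format)


class SchemaRegistry:
    """Hypothetical in-memory stand-in for a schema registry service."""

    def __init__(self):
        self._schemas = {}
        self.fetch_count = 0  # counts simulated remote fetches

    def register(self, schema_text):
        """Store the schema and return a fixed-size ID derived from its text."""
        schema_id = hashlib.md5(schema_text.encode("utf-8")).digest()
        self._schemas[schema_id] = schema_text
        return schema_id

    def fetch(self, schema_id):
        """Look up a schema by ID, as a consumer would over the network."""
        self.fetch_count += 1
        return self._schemas[schema_id]


def encode(registry, schema_text, record):
    """Producer side: prefix the payload with the schema ID, not the schema."""
    schema_id = registry.register(schema_text)
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return schema_id + payload


class CachingDecoder:
    """Consumer side: fetch each schema once and reuse it from a local cache."""

    def __init__(self, registry):
        self._registry = registry
        self._cache = {}

    def decode(self, message):
        schema_id, payload = message[:ID_SIZE], message[ID_SIZE:]
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.fetch(schema_id)
        return self._cache[schema_id], json.loads(payload.decode("utf-8"))
```

Each message carries only a fixed-size ID, so the per-message overhead does not grow with the size of the schema, and the cache means the consumer contacts the registry once per schema rather than once per message.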
  • 80. Large message support in li-apache-kafka-clients
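The slide gives only a title, so the following is a generic sketch of one common way to support large messages: the producer splits a payload into segments that each fit under a size limit, and the consumer reassembles them. The segment fields (`message_id`, `seq`, `total`, `data`) and the class and function names are hypothetical and do not describe the li-apache-kafka-clients wire format.

```python
import uuid


def split_message(payload, max_segment_bytes):
    """Split payload (bytes) into segments of at most max_segment_bytes each."""
    if max_segment_bytes <= 0:
        raise ValueError("max_segment_bytes must be positive")
    message_id = uuid.uuid4().hex  # ties the segments of one message together
    chunks = [
        payload[i:i + max_segment_bytes]
        for i in range(0, len(payload), max_segment_bytes)
    ] or [b""]  # an empty payload still yields one (empty) segment
    total = len(chunks)
    return [
        {"message_id": message_id, "seq": seq, "total": total, "data": chunk}
        for seq, chunk in enumerate(chunks)
    ]


class Assembler:
    """Buffers segments and returns the full payload once all have arrived."""

    def __init__(self):
        self._pending = {}  # message_id -> {seq: data}

    def add(self, segment):
        parts = self._pending.setdefault(segment["message_id"], {})
        parts[segment["seq"]] = segment["data"]
        if len(parts) < segment["total"]:
            return None  # still waiting for more segments
        del self._pending[segment["message_id"]]
        return b"".join(parts[i] for i in range(segment["total"]))
```

Keying the buffer by message ID lets segments of different large messages interleave, and indexing by sequence number makes reassembly independent of arrival order. A real client would also need to bound the buffer and decide how to handle incomplete messages, which this sketch leaves out.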
  • 82. Put things together
  • 83. Help yourself with these open source projects ▪ Cruise Control (https://github.com/linkedin/cruise-control) ▪ Kafka Monitor (https://github.com/linkedin/kafka-monitor) ▪ Burrow (https://github.com/linkedin/burrow) ▪ li-apache-kafka-clients (https://github.com/linkedin/li-apache-kafka-clients) ▪ Future projects open sourced by the LinkedIn streaming team can be found at https://github.com/linkedin/streaming ▪ All projects are actively maintained, used in LinkedIn's production environment, and 100% free of charge!
  • 84. Projects at LinkedIn that are built on Kafka ▪ Stream processing – Apache Samza ▪ Change data capture – Brooklin ▪ Strongly consistent key-value store – Espresso ▪ Efficient key-value store for derived data – Venice
  • 86. Agenda ▪ Kafka basics (50 min) ▪ Kafka ecosystem at LinkedIn (40 min) ▪ Hands-on (30 min)
  • 87. Hands-on ▪ Visit goo.gl/D7GFfB ▪ Single cluster – Download and compile Apache Kafka – Set up a cluster of one broker – Create and describe a topic – Produce and consume using Apache Kafka tools – Monitor availability of your cluster using Kafka Monitor ▪ Mirrored pipeline – Set up another cluster of one broker – Set up MM (MirrorMaker) to mirror traffic from the source cluster to the destination cluster – Produce to the source cluster and consume from the destination cluster – Monitor availability of your pipeline using Kafka Monitor