Trade-offs in Distributed Systems Design:
Is Kafka The Best?
Ben Stopford, Michael G. Noll
Office of the CTO, Confluent Inc
Kafka Summit Austin 2020 @ August 24-25, 2020
Trade-offs in Infrastructure Design: ‘Better’ is always subjective
Images: ‘Impressing your friends’ vs. ‘Taking the kids to school’
Impact of trade-offs is tangible
Benchmark comparison of
• Kafka (Log)
• RabbitMQ (Classical Messaging)
• Pulsar (BookKeeper derivative)
This chart shows the results for:
• Maximum steady-state throughput using the Open Messaging Benchmark on identical 3-node clusters.
• Equal produce/consume workload.
• Full details available at https://www.confluent.io/blog/kafka-fastest-messaging-system/
Chart values (maximum throughput): 605 MB/s, 305 MB/s, 38 MB/s
Genesis of Messaging Systems
Timeline (2000, 2010, 2020): the Early Messaging Era, followed by the Event Streaming Era.
Early Messaging, JMS & later AMQP (1990s onwards)
● Design: Message / Channel / Single machine.
● ActiveMQ was the first open-source implementation; HornetQ was added later.
● RabbitMQ was built for AMQP.
● Many others (NATS, Aeron, ZeroMQ…)
BookKeeper (2011)
● Goal: a write-ahead log for the Hadoop HDFS NameNode (ultimately not used).
● Released in 2011 as part of ZooKeeper.
Kafka (2012)
● 1st event streaming system (distributed in all layers)
● Designed primarily for ‘events’, using the log abstraction
● Departs from the messaging world by including scalable storage and processing
AWS Kinesis (2013)
● Kafka-like design
● Limited relative performance
● Novel shared-service design
Azure Event Hubs (2014)
● Kafka-like design
● Implements AMQP (difficult, as AMQP is a pre-streaming protocol)
BookKeeper derivatives (2016+)
● DistributedLog (2016)
● Pulsar (2018)
● Pravega (2018)
● All are “caching tiers”++ built over BookKeeper.
Messaging Model Basics
● Point-to-Point Channel, 1 consumer (ordered)
● Point-to-Point, many competing consumers (unordered)
○ Unordered + most recent delivery, message-level ack
● Publish-Subscribe, many individual consumers (ordered, same dataset to each)
● Event Streaming, many partitioned consumers (ordered, partitioned consumption)
○ Ordered + partitioned delivery for parallelism (consumer group)
Suits Classical Messaging: single machine, message-oriented
Suits Event Streaming: distributed, data-oriented (events)
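A minimal Python sketch of the event-streaming model above (ordered, partitioned consumption within a consumer group). It is an illustration under stated assumptions, not Kafka's implementation: the helper names are hypothetical, keys are hashed with `zlib.crc32` (Kafka's default partitioner uses murmur2), and partitions are handed out round-robin (Kafka offers several pluggable assignors).

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    # Same key -> same partition; this is what preserves per-key ordering.
    # Simplification: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign_partitions(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    # Round-robin: each partition is owned by exactly one group member, so
    # members share the load without competing for the same partition.
    assignment: dict[str, list[int]] = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)

def consume(events: list[tuple[str, str]], consumers: list[str],
            num_partitions: int) -> dict[str, list[tuple[str, str]]]:
    # Append each (key, value) event to its partition's log...
    partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    # ...then each consumer reads its own partitions in offset order.
    return {
        consumer: [event for p in owned for event in partitions[p]]
        for consumer, owned in assign_partitions(consumers, num_partitions).items()
    }

events = [("user-1", "login"), ("user-2", "login"),
          ("user-1", "click"), ("user-1", "logout")]
result = consume(events, ["c1", "c2"], num_partitions=4)
for consumer, received in result.items():
    print(consumer, received)
```

Whichever consumer owns the partition for `user-1` receives login, click, logout in that order. Ordering across keys that land in different partitions is not guaranteed, which is the price paid for parallel consumption.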
Trade-offs in Distributed System Design
Contiguous Streams vs. Fragmented Streams
Trade-off: Little vs. Lots of Metadata // Navigational Simplicity vs. Even Storage Distribution
Log-based Approach (Kafka): partition data is contiguous, on 1 node
Pros:
● Fast reads and writes (quick navigation, data locality)
● Little metadata (what is where?): p1[r0,r1,r2]
● Makes it easier to remove ZK, where metadata is stored
Cons:
● Storage unevenly distributed, if using key-based partitioning
● Partition must fit on one machine (without tiered storage)
BookKeeper derivatives (DistributedLog, Pulsar, etc.): partition data is fragmented, spread across N nodes
Pros:
● Storage distributed more evenly
● Partition can span multiple machines
● Also useful to let new machines accept writes immediately
Cons:
● Network indirection
● Lots of metadata (what is where?) everywhere to keep consistent, cache locally, etc.:
p1[r0[0,10], r1[11,22], r2[23,45], r1[46,47], r3[48,50], r0[51,54], r2[55,58], …]
● Slow recovery of lost data
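The metadata contrast can be sketched in Python. This is a deliberately simplified model, assuming one node per fragment (BookKeeper actually replicates data across several bookies) and reusing the offset ranges from the slide's example; all names are hypothetical. With a contiguous layout, the partition's replica list is enough to serve any offset; with a fragmented layout, every read needs a lookup in a per-fragment table that keeps growing with the partition.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Contiguous layout: one small entry per partition (its replicas).
# Every offset of p1 is on each of these nodes, so the offset needs no lookup.
CONTIGUOUS = {"p1": ["r0", "r1", "r2"]}

@dataclass(frozen=True)
class Fragment:
    start: int  # first offset in the fragment (inclusive)
    end: int    # last offset in the fragment (inclusive)
    node: str   # node that stores this fragment (simplified: one node)

# Fragmented layout: p1[r0[0,10], r1[11,22], ...] from the slide.
FRAGMENTED = {"p1": [
    Fragment(0, 10, "r0"), Fragment(11, 22, "r1"), Fragment(23, 45, "r2"),
    Fragment(46, 47, "r1"), Fragment(48, 50, "r3"), Fragment(51, 54, "r0"),
    Fragment(55, 58, "r2"),
]}

def nodes_for_offset_contiguous(partition: str, offset: int) -> list[str]:
    # Any replica of the partition can serve any offset.
    return CONTIGUOUS[partition]

def node_for_offset_fragmented(partition: str, offset: int) -> str:
    fragments = FRAGMENTED[partition]
    starts = [f.start for f in fragments]
    # Binary search for the last fragment starting at or before `offset`.
    i = bisect_right(starts, offset) - 1
    if i < 0 or offset > fragments[i].end:
        raise KeyError(f"offset {offset} not stored in {partition}")
    return fragments[i].node

print(nodes_for_offset_contiguous("p1", 47))  # ['r0', 'r1', 'r2']
print(node_for_offset_fragmented("p1", 47))   # r1
```

The fragment table must be kept consistent and cached wherever reads are routed, which is the metadata cost listed above; in exchange, a new fragment can be placed on any node, including a freshly added one.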
Log-based storage vs. Index-based storage
Sequential Access vs. Random Access
Trade-off: Log-based storage vs. Index-based storage
Log-based Approach (Kafka): contiguous storage per partition
Diagram: separate files for P1 and P2; “Fetch messages for partition 2.” is served from the P2 file.
Pros:
● Uses contiguous operations that allow fast reads and writes
Cons:
● Number of partitions limited by file handles
Classical Approach (RabbitMQ, ActiveMQ, BookKeeper derivatives): interleaved entries for many partitions in one file
Diagram: one file holding all partitions, plus an index (KahaDB, LevelDB, RocksDB, etc.); “Fetch messages for partition 2.” goes through the index.
Pros:
● Good write performance
● Number of partitions not limited by file handles
Cons:
● Slower read performance
● Indexing overhead
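A toy Python model of the two layouts, for illustration only: in-memory lists stand in for files, the helper names are hypothetical, and counting runs of adjacent positions is a crude proxy for sequential versus random I/O. (In real Kafka, a partition is a directory of segment files rather than a single file.)

```python
from collections import defaultdict

def build_log_storage(records):
    # Log-based: one append-only "file" per partition.
    files = defaultdict(list)
    for partition, value in records:
        files[partition].append(value)
    return files

def build_interleaved_storage(records):
    # Index-based: every partition appends to one shared "file";
    # an index maps each partition to the positions of its entries.
    shared, index = [], defaultdict(list)
    for partition, value in records:
        index[partition].append(len(shared))
        shared.append(value)
    return shared, index

def contiguous_runs(positions):
    # Each run of adjacent positions can be read sequentially;
    # each break in the run implies an extra seek.
    runs, prev = 0, None
    for pos in positions:
        if prev is None or pos != prev + 1:
            runs += 1
        prev = pos
    return runs

records = [("p1", "a"), ("p2", "b"), ("p1", "c"), ("p2", "d"), ("p2", "e")]

files = build_log_storage(records)
print(files["p2"], contiguous_runs(range(len(files["p2"]))))  # ['b', 'd', 'e'] 1

shared, index = build_interleaved_storage(records)
print([shared[i] for i in index["p2"]], contiguous_runs(index["p2"]))  # ['b', 'd', 'e'] 2
```

Writes are an append in both layouts, which matches the "good write performance" pro; the difference shows up on reads, where the interleaved layout must consult the index and skip over other partitions' entries. Conversely, the per-partition layout needs an open file per partition, which is where the file-handle limit comes from.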
Single Tier vs. Separate Tiers
Single Tier vs. Multiple Tiers
Trade-off: Efficiency of a Single Tier vs. Independence of Separate Tiers
● Single tiers are great: they make our systems simpler and more efficient, and easier to build and to use.
● Adding a tier is no free lunch. The upsides should outweigh the downsides.
Diagram: Kafka Core. ☺ Simple is beautiful
● Such simplicity is great. That’s why many of us look forward to Kafka without ZooKeeper!
● But Kafka’s relation to ZooKeeper is not really about tiering, so we cover it in the next section.
Diagram: Kafka Core + ZooKeeper. ☹ But not really a ‘tier’!
● Tiering can make sense, e.g., as you enhance your system with other systems.
● For example, when the tiers need to scale independently.
Diagram: ksqlDB (CPU bound) as a separate tier over Kafka Core (IO/network bound). ☺
● Pulsar is ‘caching’ over BookKeeper (read performance, read elasticity).
● This is much like adding memcached as a caching layer over PostgreSQL.
Diagram: Pulsar (caching) over BookKeeper (storage), next to memcached (caching) over PostgreSQL (storage).
Would you add memcached over Kafka?
● Adding a caching tier to Kafka?
● Probably not: layers aren’t free (cost, $$$), and Kafka is already faster!
Diagram: four Kafka brokers set against (=) two Pulsar brokers plus two BookKeeper (BK) bookies.
Better: add Tiered Storage (KIP-405)
Diagram: Kafka Core (hot data) over Tiered Storage (cold data).
● Tiered Storage: already elastic (e.g., AWS S3), unlimited storage, cheaper storage.
● Kafka Core: scale-in/out requires movement of active segments only.
● The biggest challenge for elasticity is moving large quantities of cold data.
● Tiered storage eliminates the expensive, data-intensive move operations needed for scale-in/out.
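A back-of-the-envelope Python sketch of why tiering cold data helps elasticity. The model is hypothetical: a partition replica is a list of segment sizes (oldest first), and with tiered storage only the newest `hot_segments` stay on the broker's local disk while the rest live in object storage. Moving the replica to another broker then copies only the local part. The segment sizes below are illustrative, not measurements.

```python
def mb_to_move(segment_sizes_mb: list[int], hot_segments: int, tiered: bool) -> int:
    """MB copied when a partition replica is moved to another broker.

    segment_sizes_mb: segment sizes in MB, ordered oldest -> newest.
    hot_segments: number of newest segments kept on local disk when tiered.
    tiered: whether older (cold) segments live in remote object storage.
    """
    if not tiered:
        # Everything is on local disk, so everything must be copied.
        return sum(segment_sizes_mb)
    if hot_segments <= 0:
        return 0
    # Cold segments stay in object storage; only local segments move.
    return sum(segment_sizes_mb[-hot_segments:])

segments = [1024] * 100  # illustrative: 100 segments of 1024 MB each
print(mb_to_move(segments, hot_segments=2, tiered=False))  # 102400
print(mb_to_move(segments, hot_segments=2, tiered=True))   # 2048
```

With these illustrative numbers the move shrinks by a factor of 50; the real saving depends on how much of each topic is kept locally versus remotely.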
● Kafka is already faster for hot data (cf. benchmark).
● Tiered storage adds elasticity by tiering cold data.
● In a cloud-native architecture ⇒ the BookKeeper layer becomes redundant.
Diagram: Kafka Core (hot data) over Tiered Storage (cold data), next to Pulsar (hot data) over BookKeeper (cold data) over Tiered Storage (cold data), with BookKeeper marked “Redundant?”
● Confluent Cloud provides a great example of Kafka’s elasticity
● Scales from 0 to 100 MB/s and back down near-instantaneously
● Unlimited data storage
Users never have to ‘resize’ a cluster, because there are no brokers or servers to manage.
‘It Just Works’ vs. Flexibility of Many Parts
Integrated vs. Portfolio Solution
Trade-off: ‘It Just Works’ vs. Flexibility of a Multi-part Setup
Integrated (image credit: Apple):
● Just works
● Expensive to build
Portfolio (image credit: Confluent #gamers channel):
● Faster time-to-market
● Integration issues
● A portfolio makes sense when there are separate concerns and you want to deploy them independently.
Diagram: one Kafka Core ☺ shared by separate Kafka Connect deployments for the Finance team, InfoSec team, Ops team, and Your team.
● A portfolio of ‘Kafka + ZooKeeper’ gave the Kafka project a fast time-to-market in 2012.
● By 2020, however, Kafka and the needs of its users have changed.
Diagram: Kafka Core + ZooKeeper. ☹ ZK is always required by Kafka until KIP-500.
● KIP-500 replaces ZK with integrated Kafka functionality for ‘It Just Works’.
● This removes, e.g., scalability limitations such as the maximum number of partitions in a Kafka cluster.
Diagram: Kafka Core on its own. ☺ After KIP-500, Kafka is self-sufficient (no ZK needed).
“Integrated” is always better (for the user), but it’s more expensive to build.
Summary
“It’s not right or wrong. It’s trade-offs!”
● Kafka
○ The log-based approach provides best-in-class performance with low-overhead reads and writes.
○ Confluent Cloud is the most complete cloud-native Kafka offering.
● RabbitMQ, ActiveMQ
○ Designed for short-lived messaging, where data is quickly removed after it is consumed.
● BookKeeper derivatives: DistributedLog, Pulsar, Pravega
○ Have elements of log-based storage, but inherit some limitations of traditional messaging (e.g., disk and segment fragmentation).
● AWS Kinesis, Azure Event Hubs:
○ Limited performance compared to Kafka (anecdotal).
○ Novel cloud-native designs, but little is known about their internal implementations.
● Lots we didn’t mention: scale up vs. scale out, transactional messaging, …
○ See the associated blog for all the details.
“For event streaming, Kafka remains the best.
The most mature. The largest ecosystem.”
Thank you!
@benstopford
@miguno

From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Kürzlich hochgeladen

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Kürzlich hochgeladen (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Trade-offs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and Michael Noll, Confluent) Kafka Summit 2020

  • 1. Trade-offs in Distributed Systems Design: Is Kafka The Best?
    Ben Stopford, Michael G. Noll, Office of the CTO, Confluent Inc.
    Kafka Summit Austin 2020, August 24-25, 2020
  • 2. Trade-offs in Infrastructure Design: ‘Better’ is always subjective
    Two images, captioned ‘Impressing your friends’ and ‘Taking the kids to school’.
  • 3. Impact of trade-offs is tangible
    Benchmark comparison of:
    ● Kafka (log)
    ● RabbitMQ (classical messaging)
    ● Pulsar (BookKeeper derivative)
    The chart shows maximum steady-state throughput, measured with the OpenMessaging Benchmark on identical 3-node clusters under an equal produce/consume workload: Kafka 605 MB/s, Pulsar 305 MB/s, RabbitMQ 38 MB/s.
    Full details: https://www.confluent.io/blog/kafka-fastest-messaging-system/
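Expressing the gap as ratios makes "tangible" concrete. This is a minimal sketch. It assumes the chart's bars map to Kafka 605 MB/s, Pulsar 305 MB/s and RabbitMQ 38 MB/s, as reported in the linked blog post. The dictionary holds only the chart data and is not benchmark code.

```python
# Peak throughput from the slide's chart, in MB/s. The system-to-value
# mapping is assumed from the linked Confluent blog post.
throughput_mb_s = {"Kafka": 605, "Pulsar": 305, "RabbitMQ": 38}


def speedup_over_others(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """How many times higher `baseline`'s throughput is than each system's."""
    base = results[baseline]
    return {name: round(base / value, 1) for name, value in results.items()}


print(speedup_over_others("Kafka", throughput_mb_s))
# prints {'Kafka': 1.0, 'Pulsar': 2.0, 'RabbitMQ': 15.9}
```

On this workload, Kafka's peak is therefore roughly 2x Pulsar's and 16x RabbitMQ's. These ratios apply only to this benchmark setup.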
  • 4. Genesis of Messaging Systems
    A timeline from 2000 to 2020, split into an Early Messaging Era and an Event Streaming Era:
    ● Early messaging, JMS and later AMQP (1990s onwards)
      ○ Design: message / channel / single machine.
      ○ ActiveMQ was the first open-source implementation; HornetQ came later.
      ○ RabbitMQ was built for AMQP.
      ○ Many others (NATS, Aeron, ZeroMQ, …).
    ● BookKeeper (2011)
      ○ Goal: a write-ahead log for the Hadoop HDFS NameNode (ultimately not used there).
      ○ Released in 2011 as part of ZooKeeper.
    ● Kafka (2012)
      ○ The first event streaming system (distributed in all layers).
      ○ Designed primarily for ‘events’, using the log abstraction.
      ○ Departs from the messaging world by including scalable storage and processing.
    ● AWS Kinesis (2013)
      ○ Kafka-like design.
      ○ Limited relative performance.
      ○ Novel shared-service design.
    ● Azure Event Hubs (2014)
      ○ Kafka-like design.
      ○ Implements AMQP (difficult, as AMQP is a pre-streaming protocol).
    ● BookKeeper derivatives (2016+)
      ○ DistributedLog (2016), Pulsar (2018), Pravega (2018).
      ○ All are “caching tiers”++ built over BookKeeper.
  • 5. Messaging Model Basics
    ● Point-to-point channel, 1 consumer (ordered).
    ● Point-to-point, many competing consumers (unordered): most-recent delivery, message-level acks.
    ● Publish-subscribe, many individual consumers (ordered, same dataset to each).
    ● Event streaming, many partitioned consumers (ordered, partitioned consumption): partitioned delivery for parallelism via a consumer group.
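The event streaming model (ordered, partitioned consumption through a consumer group) fits in a few lines. This is a simplified model, not Kafka's client API. The CRC32 hash, the round-robin assignor and the names `partition_for`, `assign_round_robin` and `consume_as_group` are illustrative assumptions. Ordering holds per key because every event with a given key lands in one partition, and each partition is read by exactly one consumer in the group.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Route a key to a partition. CRC32 is an illustrative choice; Kafka's
    default partitioner hashes keys with murmur2, so real numbers differ."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer in the group."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def consume_as_group(events, num_partitions: int, consumers: list[str]) -> dict[str, list]:
    """Return what each consumer in the group reads, in order."""
    logs = defaultdict(list)  # partition -> events in production order
    for key, value in events:
        logs[partition_for(key, num_partitions)].append((key, value))
    assignment = assign_round_robin(list(range(num_partitions)), consumers)
    # Each consumer reads its partitions front to back, so events sharing a
    # key (and therefore a partition) reach one consumer, in order.
    return {c: [e for p in parts for e in logs[p]] for c, parts in assignment.items()}


events = [("alice", 1), ("bob", 1), ("alice", 2), ("carol", 1), ("bob", 2)]
print(consume_as_group(events, num_partitions=4, consumers=["c1", "c2"]))
```

If the group has more consumers than partitions, the extra consumers receive empty assignments. This is why the partition count caps a group's parallelism.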
  • 6. Messaging Model Basics
    ● Suits classical messaging: single machine, message-oriented.
    ● Suits event streaming: distributed, data-oriented (events).
  • 9. Contiguous Streams vs. Fragmented Streams
    Trade-off: little vs. lots of metadata; navigational simplicity vs. even storage distribution.
    Log-based approach (Kafka): partition data is contiguous, on one node.
      Pros:
      ● Fast reads and writes (quick navigation, data locality).
      ● Little metadata about what is where: p1[r0,r1,r2].
      ● Makes it easier to remove ZooKeeper, where the metadata is stored.
      Cons:
      ● Storage is unevenly distributed if key-based partitioning is used.
      ● A partition must fit on one machine (without tiered storage).
    BookKeeper derivatives (DistributedLog, Pulsar, etc.): partition data is fragmented, spread across N nodes.
      Pros:
      ● Storage is distributed more evenly.
      ● A partition can span multiple machines.
      ● New machines can accept writes immediately.
      Cons:
      ● Network indirection.
      ● Lots of metadata about what is where, everywhere, which must be kept consistent, cached locally, etc.: p1[r0[0,10], r1[11,22], r2[23,45], r1[46,47], r3[48,50], r0[51,54], r2[55,58], …]
      ● Slow recovery of lost data.
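The slide's own example makes the metadata contrast concrete. This is a minimal sketch. It assumes the slide's simplified model, in which each fragment lives on a single node; real BookKeeper replicates each entry across an ensemble of bookies. `Fragment`, `locate_contiguous` and `locate_fragmented` are hypothetical names.

```python
import bisect
from dataclasses import dataclass

# Contiguous (log-based): every replica of p1 holds the whole partition, so
# the only metadata needed is the replica list, p1[r0,r1,r2].
CONTIGUOUS = {"p1": ["r0", "r1", "r2"]}


def locate_contiguous(partition: str, offset: int) -> list[str]:
    """Any offset of the partition can be read from any of its replicas."""
    return CONTIGUOUS[partition]


@dataclass(frozen=True)
class Fragment:
    start: int  # first offset in the fragment (inclusive)
    end: int    # last offset in the fragment (inclusive)
    node: str


# Fragmented (BookKeeper-style, simplified as on the slide): each offset
# range can live on a different node, so every range must be tracked.
FRAGMENTS = [
    Fragment(0, 10, "r0"), Fragment(11, 22, "r1"), Fragment(23, 45, "r2"),
    Fragment(46, 47, "r1"), Fragment(48, 50, "r3"), Fragment(51, 54, "r0"),
    Fragment(55, 58, "r2"),
]
_STARTS = [f.start for f in FRAGMENTS]  # sorted, so binary search works


def locate_fragmented(offset: int) -> str:
    """Binary-search the fragment list to find which node holds `offset`."""
    i = bisect.bisect_right(_STARTS, offset) - 1
    if i < 0 or offset > FRAGMENTS[i].end:
        raise KeyError(f"offset {offset} not stored")
    return FRAGMENTS[i].node
```

The contiguous lookup needs one entry per partition however much data is written. The fragment list, in contrast, grows every time a new range is placed on a node, and that list is the metadata that has to be kept consistent and cached.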
  • 11. Sequential Access vs. Random Access
    Trade-off: log-based storage vs. index-based storage.
    Log-based approach (Kafka): contiguous storage per partition (P1, P2, …). Fetching messages for partition 2 reads P2's log.
      Pros:
      ● Uses contiguous operations that allow fast reads and writes.
      Cons:
      ● The number of partitions is limited by file handles.
    Classical approach (RabbitMQ, ActiveMQ, BookKeeper derivatives): entries for many partitions interleaved in one file, plus an index (KahaDB, LevelDB, RocksDB, etc.). Fetching messages for partition 2 goes through the index.
      Pros:
      ● Good write performance.
      ● The number of partitions is not limited by file handles.
      Cons:
      ● Slower read performance.
      ● Indexing overhead.
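An in-memory sketch of the two layouts shows where sequential and random access come from. It assumes Python lists stand in for files. The class and method names are hypothetical, and neither class reflects Kafka's or RabbitMQ's actual storage code.

```python
from collections import defaultdict


class PerPartitionLog:
    """Log-based: one append-only sequence per partition. In a file-backed
    log, each partition needs its own open file(s)."""

    def __init__(self) -> None:
        self.logs: dict[str, list[str]] = defaultdict(list)

    def append(self, partition: str, msg: str) -> None:
        self.logs[partition].append(msg)

    def read(self, partition: str, offset: int = 0) -> list[str]:
        # One contiguous slice: sequential I/O on disk.
        return self.logs[partition][offset:]

    def open_files(self) -> int:
        return len(self.logs)  # grows with the number of partitions


class SharedIndexedStore:
    """Index-based: entries for all partitions interleaved in one sequence,
    plus a per-partition index of positions (a stand-in for an embedded
    store such as KahaDB, LevelDB or RocksDB)."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self.index: dict[str, list[int]] = defaultdict(list)

    def append(self, partition: str, msg: str) -> None:
        self.index[partition].append(len(self.entries))  # extra index write
        self.entries.append(msg)  # the data write itself is a cheap append

    def positions(self, partition: str, offset: int = 0) -> list[int]:
        return self.index[partition][offset:]

    def read(self, partition: str, offset: int = 0) -> list[str]:
        # Positions are scattered across the shared file: random I/O.
        return [self.entries[i] for i in self.positions(partition, offset)]

    def open_files(self) -> int:
        return 1  # one shared file, regardless of partition count
```

Suppose messages for P1 and P2 are written alternately. Reading P2 from the shared store then touches positions 1, 3 and 5, skipping over P1's entries, while the per-partition log returns one contiguous slice. The log layout pays for this with one open file per partition, which is the file-handle limit noted on the slide.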
  • 13. Single Tier vs. Multiple Tiers
    Trade-off: efficiency of a single tier vs. independence of separate tiers.
    Diagram: Kafka Core as a single tier. ☺ Simple is beautiful.
    ● Single tiers are great: they make our systems simpler, more efficient, and easier to build and use.
    ● Adding a tier is no free lunch. The upsides should outweigh the downsides.
  • 14. Single Tier vs. Multiple Tiers
    Diagram: Kafka Core plus ZooKeeper. ☹ But not really a ‘tier’!
    ● Such simplicity is great. That’s why many of us look forward to Kafka without ZooKeeper!
    ● But Kafka’s relation to ZooKeeper is not really about tiering, so we cover it in the next section.
  • 15. Single Tier vs. Multiple Tiers
    Diagram: ksqlDB (CPU bound) over Kafka Core (IO/network bound). ☺
    ● Tiering can make sense, e.g. when you enhance your system with other systems.
    ● For example, when the tiers should be scaled independently.
  • 16. Single Tier vs. Multiple Tiers
    Diagram: Pulsar (caching) over BookKeeper (storage), next to memcached (caching) over PostgreSQL (storage).
    ● Pulsar is a ‘caching’ tier over BookKeeper (for read performance and read elasticity).
    ● This is much like how memcached can add caching to PostgreSQL.
    Would you add memcached over Kafka?
  • 17. Single Tier vs. Multiple Tiers
    Diagram: four Kafka brokers set equal (=) to two Pulsar brokers plus two BookKeeper bookies.
    ● Adding a caching tier to Kafka?
    ● Probably not: layers aren’t free, so they add cost ($$$), and Kafka is already faster!
  • 18. Better: add Tiered Storage (KIP-405)
    Diagram: Kafka Core (hot data) over Tiered Storage (cold data).
    ● Tiered storage is already elastic (e.g. AWS S3).
    ● Unlimited storage.
    ● Cheaper storage.
    ● Scale-in/out requires movement of active segments only.
    ● The biggest challenge for elasticity is moving large quantities of cold data.
    ● Tiered storage eliminates the expensive, data-intensive move operations needed for scale-in/out.
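The elasticity argument comes down to how much data a broker must copy when a partition moves. This is a simplified sketch. It assumes all closed segments are offloaded immediately, whereas real KIP-405 tiered storage keeps a configurable local retention window. Segment sizes and names such as `mb_to_move_on_reassign` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int
    size_mb: int
    active: bool = False  # only the newest segment still receives writes


@dataclass
class Partition:
    local: list[Segment]                                 # on broker disk (hot)
    remote: list[Segment] = field(default_factory=list)  # object store (cold)

    def tier_closed_segments(self) -> None:
        """Offload every closed segment to remote storage."""
        self.remote.extend(s for s in self.local if not s.active)
        self.local = [s for s in self.local if s.active]

    def mb_to_move_on_reassign(self) -> int:
        """Data a new broker must copy to take over this partition: only
        local segments, since remote ones stay in shared storage."""
        return sum(s.size_mb for s in self.local)


p = Partition([Segment(0, 1024), Segment(1000, 1024), Segment(2000, 1024),
               Segment(3000, 200, active=True)])
print(p.mb_to_move_on_reassign())  # prints 3272
p.tier_closed_segments()
print(p.mb_to_move_on_reassign())  # prints 200
```

After tiering, only the active segment has to move, which matches the slide's "movement of active segments only".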
  • 19. Single Tier vs. Multiple Tiers
    Diagram: Kafka Core (hot data) over Tiered Storage (cold data), next to Pulsar (hot data) over BookKeeper (cold data) over Tiered Storage (cold data). Redundant?
    ● Kafka is already faster for hot data (cf. the benchmark).
    ● Tiered storage adds elasticity by tiering cold data.
    ● In a cloud-native architecture, the BookKeeper layer therefore becomes redundant.
  • 20. Single Tier vs. Multiple Tiers
    ● Confluent Cloud provides a great example of Kafka’s elasticity.
    ● It scales from 0 to 100 MB/s and back down near-instantaneously.
    ● Unlimited data storage.
    The user never has to ‘resize’ a cluster, because there are no brokers or servers to manage.
  • 22. Integrated vs. Portfolio Solution
    Trade-off: ‘It Just Works’ vs. flexibility of a multi-part setup.
    Images: integrated (credit: Apple) vs. portfolio (credit: Confluent #gamers channel).
    ● Integrated: just works, but expensive to build.
    ● Portfolio: faster time-to-market, but integration issues.
  • 23. Integrated vs. Portfolio Solution
    Diagram: Kafka Core with several Kafka Connect deployments, one each for the Finance team, InfoSec team, Ops team and your team. ☺
    ● A portfolio makes sense when there are separate concerns and you want to deploy them independently.
  • 24. Integrated vs. Portfolio Solution
    Diagram: Kafka Core plus ZooKeeper. ☹ ZooKeeper is always required by Kafka until KIP-500.
    ● The ‘Kafka + ZooKeeper’ portfolio gave the Kafka project fast time-to-market in 2012.
    ● By 2020, however, Kafka and the needs of its users had changed.
  • 25. After KIP-500, Kafka is self-sufficient (no ZooKeeper needed)
    Diagram: Kafka Core alone. ☺
    ● KIP-500 replaces ZooKeeper with integrated Kafka functionality, so that Kafka ‘just works’.
    ● It also removes scalability limitations, such as the maximum number of partitions in a Kafka cluster.
  • 26. Integrated vs. Portfolio Solution
    “Integrated” is always better for the user, but it’s more expensive to build.
  • 28. Summary: “It’s not right or wrong. It’s trade-offs!”
    ● Kafka
      ○ The log-based approach provides top-of-class performance with low-overhead reads and writes.
      ○ Confluent Cloud is the most complete cloud-native Kafka offering.
    ● RabbitMQ, ActiveMQ
      ○ Designed for short-lived messaging, where data is removed soon after it is consumed.
    ● BookKeeper derivatives: DistributedLog, Pulsar, Pravega
      ○ Have elements of log-based storage, but inherit some limitations of traditional messaging (e.g., disk and segment fragmentation).
    ● AWS Kinesis, Azure Event Hubs
      ○ Limited performance compared to Kafka (anecdotal).
      ○ Novel cloud-native designs, but little is known about their internal implementations.
    ● Lots we didn’t mention: scale up vs. scale out, transactional messaging, …
      ○ See the associated blog post for all the details.
    “For event streaming, Kafka remains the best. The most mature. The largest ecosystem.”