Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies that enable applications and business operations to scale massively and rapidly. Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store and retrieve data at very low latency, while Kubernetes is a container orchestration technology that automates application deployment and the scaling of application clusters. In this presentation, we reveal how we architected a massive-scale deployment of a streaming data pipeline with Kafka and Cassandra to support an example anomaly detection application running on a Kubernetes cluster and generating and processing a massive number of events. Anomaly detection is a method for detecting unusual events in an event stream. It is widely used in a range of applications such as financial fraud detection, security and threat detection, website user analytics, sensors and IoT, and system health monitoring. When such applications operate at massive scale, generating millions or billions of events, they impose significant computational, performance and scalability challenges on anomaly detection algorithms and data layer technologies. We demonstrate the scalability, performance and cost effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from our experiments scaling the anomaly detection application to 19 Billion anomaly checks per day.
ApacheCon 2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Anomaly Detection on 19 Billion events a day
1. Kafka, Cassandra and Kubernetes
at Scale –
Real-time Anomaly Detection
on 19 Billion events a day
Paul Brebner
instaclustr.com Technology Evangelist
Cassandra Track, ApacheCon 2019, Thursday September 12th 2019, Las Vegas, USA
https://www.apachecon.com/acna19/s/#/scheduledEvent/1187
2. Overview
1. Wow! (headlines)
2. Why? (did we do it)
3. What? (does it do)
4. How? (does it work)
5. Well? (how well did it work)
6. So What?
11. • 500x better than
previously
published results
for similar system
• 2018, Kafka,
Cassandra, Spark
• Bigger numbers?
[Chart, per second: previous published result 440 anomaly checks/s vs this system's 220,000 checks/s: x500]
Headline
Numbers
Per Second
12. • Peak 2.3 Million
Kafka writes/s
• x10 rest of
pipeline
• Kafka as a buffer,
absorbs load
spike
[Chart, millions per second: anomaly checks/s 0.2M vs peak Kafka writes/s 2.3M]
Headline
Numbers
Millions
Per Second
13. Headline
Numbers
Daily
• Planetary scale
(population 7.7B)
• 19 Billion checks/day (1 Billion = 1,000 Million)
• 2.5 events per
person per day
• Had to stop
somewhere, but
no upper limit
[Chart, Daily Big Numbers (Billions/day): World Population 7.7B vs Anomaly Checks 19B]
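The daily and per-second headline numbers are consistent; a quick sanity check (my arithmetic, not from the slides):

```python
# 220,000 anomaly checks/s sustained for 24 hours is ~19 Billion checks/day,
# about 2.5 checks per person per day for a world population of 7.7 Billion.
checks_per_second = 220_000
seconds_per_day = 24 * 60 * 60                      # 86,400
checks_per_day = checks_per_second * seconds_per_day
print(checks_per_day)                                # 19,008,000,000
print(round(checks_per_day / 7_700_000_000, 1))      # ~2.5 per person per day
```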
19. Apache
Kafka and
Cassandra
• Technology -
Kafka+Cassandra
use case
• Platform -
Instaclustr’s
Managed
Platform
• Features -
Provisioning,
monitoring,
scaling, and more
20. Kafka as a
Buffer
• Cost effective for
short load spikes
• E.g. Influx of
unexpected
festival goers
• Prevent
overloading of
rest of pipeline
• All events
(eventually)
processed
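A minimal stdlib sketch of the buffer idea (a toy queue standing in for a Kafka topic; none of this is the real Kafka API): the producer bursts at 10x the pipeline's drain rate, the topic absorbs the spike, and every event is eventually processed.

```python
from collections import deque

topic = deque()        # toy stand-in for a Kafka topic
processed = []

# Load spike: producer writes 10 events per tick for 5 ticks.
for tick in range(5):
    for i in range(10):
        topic.append((tick, i))

backlog = len(topic)   # 50 events buffered, none rejected

# The slower pipeline then drains the backlog; all events are
# (eventually) processed.
while topic:
    processed.append(topic.popleft())
```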
24. “Man on Moon”
headlines
• 400,000 people
got them there
• JoAnn Morgan,
Saturn 5
monitoring
engineer
• Only woman in
the control room
for Apollo 11
36. 4 How does
it work?
• Anomaly
Detection
• Architecture
• Technologies
37. Is this our
machine?
• The Audio-Telly-o-
Tally-o Count
• Streams
processing
machine for
counting sleepers
• We’ve advanced
from this 1960’s
technology
38. How does it
work?
• CUSUM
(Cumulative Sum
Control Chart)
• Statistical
analysis of
historical data
39. Logical
steps
(1) Events arrive in a
stream
(2) Get the next event from
the stream
(3) Write the event to the
database
(4) Query the historic data
from the database
(5) If there are sufficient
observations, run the
anomaly detector
(6) Was a potential
anomaly detected? Take
appropriate action.
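A minimal CUSUM-style detector sketch covering the write/query/detect steps (Python for brevity; the real Anomalia Machina pipeline is Java, and this simplified single-variable version is illustrative, not the project's actual code):

```python
import statistics

def cusum_anomaly(history, value, threshold=3.0):
    """Is `value` anomalous relative to the historical window?
    Tracks cumulative sums of standardized deviations (CUSUM-style)
    and flags when either sum drifts past `threshold`."""
    if len(history) < 2:
        return False                           # insufficient observations
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid divide-by-zero
    hi = lo = 0.0
    for x in list(history) + [value]:
        hi = max(0.0, hi + (x - mean) / stdev)
        lo = min(0.0, lo + (x - mean) / stdev)
    return hi > threshold or abs(lo) > threshold

window = [10.0, 10.5, 9.5] * 16                # last ~50 observations
print(cusum_anomaly(window, 10.2))             # False: normal value
print(cusum_anomaly(window, 25.0))             # True: spike flagged
```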
40. Pipeline
Design
• Design, showing
interaction with
Kafka and
Cassandra
Clusters
• Load generator,
detector pipeline
• 2 thread pools
• To constrain the number of Kafka consumers (and thereby Kafka partitions)
Limits number of Kafka Consumers
2 thread pools to decouple Kafka Consumers from rest of pipeline
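A stdlib sketch of the two-pool design (Python stand-ins; the real pipeline is Java, and all names here are illustrative): a deliberately small pool 1 bounds the number of Kafka-consumer-like threads, while a larger pool 2 runs the Cassandra write/read and detector work.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

events = queue.Queue()
for i in range(100):                 # pretend these arrived via Kafka
    events.put(i)

def detect(event):
    # Pool 2 work: write to Cassandra, read history, run the detector
    # (all stubbed out in this sketch).
    return ("checked", event)

def consumer_loop(n, detector_pool, results):
    # Pool 1 work: poll events and hand each one to the detector pool,
    # decoupling consumption from the rest of the pipeline.
    futures = [detector_pool.submit(detect, events.get()) for _ in range(n)]
    results.extend(f.result() for f in futures)

results = []
with ThreadPoolExecutor(max_workers=16) as detector_pool:      # pool 2
    with ThreadPoolExecutor(max_workers=2) as consumer_pool:   # pool 1 (bounded)
        done = [consumer_pool.submit(consumer_loop, 50, detector_pool, results)
                for _ in range(2)]
        for f in done:
            f.result()
```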
42. Cassandra
• Open Source
• NoSQL Database
• Masterless ring
architecture &
partitioned data
for
• Linear scalability
• High availability
• Fast writes
• Powerful queries
with indexes
43. Instaclustr
Managed
Apache
Cassandra
Benefits
■ Optimised for low latency/high throughput
■ Automated Provisioning, Monitoring, Management
■ SOC2 certified
■ Multiple cloud providers
■ 24/7 Technical support
■ Automated Health Checks
■ Dynamic scaling
■ Zero downtime migrations
■ New! Certified Apache Cassandra
● Key highlights of the Certification Report include:
ᐨ Performance testing (latency and throughput) comparing the
current version to previous versions
ᐨ 24-hour soak testing (including repairs and replaces)
ᐨ Testing against popular drivers
44. What is Kafka?
Message flow
Distributed streams
processing
1 Distributed Producers…
2 Send Messages
3 To Distributed Consumers
4 Via Kafka Cluster
45. Kafka
Key Benefits
■ Fast – high throughput and low latency
■ Scalable – horizontally scalable, just add nodes and
partitions
■ Reliable – distributed and fault tolerant
■ Zero data loss
■ Open Source
■ Heterogeneous data sources and sinks
■ Available as an Instaclustr Managed service
47. Kubernetes
• An automation
system for the
management,
scaling and
deployment of
containerized
applications
• Master/worker
Nodes architecture
• Pods are units of
concurrency
48. Kubernetes
Benefits
• Open Source
• Cloud provider and programming language agnostic
• Develop and test code locally, then deploy at scale
• Helps with resource management – deploy application
to Kubernetes and it manages scaling up/down and
keeping application alive
• More powerful frameworks built on Kubernetes APIs
are becoming available
49. Observability 1
Prometheus
Monitoring
• Ran using
Kubernetes
Prometheus
Operator
• Grafana for
graphing
• Used to debug,
tune, and observe
business metrics
(TPS, RT) from
100 Pods
53. OpenTracing
Standard API for
distributed tracing
■ Specification, not implementation
■ Need
● Application instrumentation
● OpenTracing tracer
Traced Applications API Tracer implementations
Open Source, Datadog
56. 5 How well
did it work?
Scaling Out
From 3 to ???
Cassandra nodes
57. How well did
it work?
Scaling Out
From 3 to ???
Cassandra nodes
Due to 1:1 read/write
ratio, decreased
compression chunk
size to 1KB
“La Jamais Contente”, first car to reach 100 km/h in 1899 (electric, 68hp)
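The chunk-size change is a table-level compression option; a hedged CQL sketch (keyspace and table names here are made up, not from the project):

```sql
-- Shrink the compression chunk size to 1 KB for a 1:1 read/write workload
-- (Cassandra 3.x option names; table name is illustrative).
ALTER TABLE anomalia.events
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 1};
```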
59. • Kubernetes easy
to scale application,
just increase Pods
• First attempt, tuned
for 3 node Cassandra
cluster then scaled
out to 24 nodes
• Whoops (blue line)
Cassandra
scalability
60. Cassandra
scalability -
better
• Then tuned knobs
(thread pools, Pods
and Cassandra
connections) to
maximize throughput
for each configuration
(orange line)
• Also tuned Kafka…
Minimize Cassandra Connections but maximize detector thread pool (pool 2) concurrency
61. Kafka
Scaling
Kubernetes Pods x
Kafka Consumer
threads
More Kafka Consumers
More Kafka Partitions
Lower Throughput!
[Chart: Partitions (0-700) vs Throughput (Writes/s, up to 2.5M), 6 node x 4 cores/node Kafka Cluster]
62. Kafka
Scaling -
better
Solutions?
Bigger Kafka cluster
Kafka tuning?
num.replica.fetchers = 1
by default, may help to
increase
[Chart: Partitions (0-700) vs Throughput (Writes/s), comparing 6 nodes x 4 cores/node with 9 nodes x 8 cores/node; increased throughput at 200 partitions]
63. Final system
resources
Cluster Details (all
running in AWS, US
East North Virginia)
■ Instaclustr managed Kafka – EBS: high
throughput 1500, 9 x r4.2xlarge-1500 (1,500 GB
Disk, 61 GB RAM, 8 cores), Apache Kafka
2.1.0, Replication Factor=3
■ Instaclustr managed Cassandra – Extra Large,
48 x i3.2xlarge (1769 GB SSD, 61 GB RAM, 8
cores), Apache Cassandra 3.11.3, Replication
Factor=3
■ AWS EKS Kubernetes Worker Nodes – 2 x
c5.18xlarge (72 cores, 144 GB RAM, 25 Gbps
network), Kubernetes Version 1.10, Platform
Version eks.3
64. Scaling Out
From 3 to ??
Cassandra nodes
“Pininfarina Battista” the fastest car in the world (2019)
0-100 kph in 2 seconds, top speed 350 kph (electric, 1,900hp).
65. Scaling Out
• From 3 to 48
Cassandra Nodes
• 1.9 to 19 Billion
checks/day
• No upper limit
66. Resources
• Throughput (checks per second) vs cores for each subsystem:
• Cassandra > Workers > Kafka
• Maximum throughput at 574 cores
[Chart: Throughput vs CPU cores (total, Cassandra, Kubernetes and Kafka cores); 574 cores @ 220,000 TPS]
68. Maximum
cores used
Cassandra 384 +
Workers 118 +
Kafka 72 =
574 Cores Total
[Chart, Cores Used (574 total): Cassandra 384, Workers 118, Kafka 72]
69. Cost –
Affordability
at scale
• Operational $
(AWS instances)
only
• Total $1,000/day
• Can be scaled
with incremental
cost change
[Chart: 3 Cassandra nodes ≈ $100/day; 48 Cassandra nodes ≈ $1,000/day]
70. Kafka as a
Buffer
• Kafka acts as a
buffer, can
process 10x the
Cassandra
capacity
• 2.3M/s vs
220,000/s
• Cheaper than
increasing
Cassandra
capacity x10
73. Takeaways
Technical
■ Kubernetes (+AWS EKS) enabled automation
(deployment, scaling, monitoring) of the application
● Some effort to understand and setup
● But once working it makes application deployment fast, scalable,
repeatable and low cost
■ Prometheus and OpenTracing+Jaeger critical for
debugging, tuning and reporting application
performance and scalability
● Tricky to monitor applications in Kubernetes, but using the
Kubernetes Operators automates the monitoring configuration
■ To achieve near linear scalability and maximize
throughput need to optimize pipeline, by tuning
thread pools and number of Kubernetes Pods to:
● Minimize: Cassandra Connections
● Minimize: Kafka Consumers and Kafka Partitions
● Maximize: Detector thread pool concurrency
74. Takeaways
Business
■ Kafka+Cassandra enable Fast Streaming+Storage
at Scale
■ Instaclustr Managed Kafka+Cassandra service
● Makes it easy to automate cluster provisioning
(creation/deletion/scaling), and monitoring
● Highly available SLAs
● Proactive cluster monitoring, alerting and maintenance
■ Affordability at Scale
● Low cost Open Source and Commodity Cloud infrastructure
● only pay for what you use, application and Kafka+Cassandra
clusters scale linearly with load so cost only increases
incrementally
■ Application can be easily resized (scaled up and
down) for any workload, no upper limit
■ Lots more use cases using Kafka+Cassandra
76. Newsflash!
Geospatial Anomaly
Detection
Compared
performance of
multiple Spatial
representations and
Cassandra
implementations
■ Extensions to detect anomalies over time and space
● E.g. is an event unusual relative to nearest 50 neighbours?
■ How to find neighbours using
● Distance between Latitude/longitude points
● Bounding Box
● Geohashes
● 3D (including 3D Geohashes)
■ Using different Cassandra implementations
● Clustering columns
● Secondary indexes
● Denormalized multiple tables
● Cassandra Lucene Index Plugin
77. Further
information
■ The complete Anomalia Machina Blog Series (10 Parts):
● Massive scale Kafka and Cassandra deployment for real-time anomaly detection: 19 Billion events per day https://www.instaclustr.com/massive-scale-kafka-cassandra-real-time-anomaly-detection/
■ Latest 4-part Geospatial Anomaly Detection blogs:
● https://www.instaclustr.com/geospatial-anomaly-detection-with-kafka-cassandra/
■ The Open Source Anomalia Machina Code
● https://github.com/instaclustr/AnomaliaMachina
■ All of Paul’s Blogs
● https://www.instaclustr.com/paul-brebner/
1969 noteworthy year, lots of 50th anniversary events recently celebrated
I don’t think Elvis is returning home again
But moon and woodstock
50,000 expected, 1 Million descended on the site, 500,000 reached it
Is this big? More realistic is per second
1 Billion = 1000 Million = 10^9 events/day
Actually 220,000 events per second
2.3M/s Kafka write/s
Per Day, Yes Big.
Planetary scale! More than double world population (7.7 Billion)
Could process 2.5 events per person per day
Bigger than most (any?) single company’s daily financial transactions
Better (500x throughput and much faster) than published results for similar problem (from 2018, using Kafka, Cassandra and Spark, 200 events/s, RT >> 1s)
Bigger numbers only limited by imagination
We could have kept going, but had to stop somewhere
US FINRA (Financial Industry Regulatory Authority) processes up to 78 Billion events a day (also using public cloud)
Computer Systems generate massive amounts of metrics
E.g. Netflix uses Kafka to process > 1 Trillion (10^12) events/day (2018)
And the system will scale arbitrarily high to match business requirements
Project Goals - multiple
Fast (RT), Big (Scalable, no upper limit), Cost effective (Open Source, Automatic cluster creation/delete, scaling)
Kafka + Cassandra demo use case
Kafka as a buffer use case (cost effective for coping with short load spikes)
Demonstrate Instaclustr managed service for Kafka and Cassandra (provisioning, management, monitoring)
Try complementary tech for application management and scale (K8, Prometheus, OpenTracing, Jaeger)
Anomaly detection needs to be fast, under 1s
The headlines 50 years ago may have been about men on the moon, but the success of the program depended on many women
Anomaly detection needs to be fast, under 1s, streams processing
Anomaly detection needs to be scalable, increasing key requires more storage, size and processing capacity. Need scalable database
Anomaly detection needs to be scalable, for high throughputs, linearly scalable for more processing capacity, ability to handle load spikes (buffer use case), and no upper limit
And affordable, i.e. linear, elastic, scale up and down on demand, have just sufficient resources based on actual load (not too many or too few)
For experiments, want to spin resources up and down (provision, scale, delete)
Anomaly detection is used in a wide variety of domains including:
Infrastructure monitoring
A simple type of anomaly detection is called Break or Changepoint analysis.
This takes a stream of events and analyses them to see if the most recent events are “different” to previous ones.
We picked a simple version to start with (CUSUM).
It only uses data for a single variable at a time, which could be something like an account number, or an IP address.
This is the prototype application design
The Anomaly detection pipeline is written in Java and runs in a single multi-threaded process.
It consists of a Kafka consumer which gets each new event and passes it to
A Cassandra client, which writes the event to Cassandra, gets the previous 50 rows for the ID, runs the detector and decides if there’s an anomaly or not.
Thread pools? Kafka Consumer pool useful to constrain the number of Kafka Consumers, and thereby constrain the number of Kafka partitions which are expensive!
What is Kafka? Kafka is a distributed streams processing system, it allows distributed producers to send messages to distributed consumers via a Kafka cluster.
The next graph shows the Kafka producer ramping up (from 1 to 9 Kubernetes Pods), with 2 minutes load time, peaking at 2.3M events/s (this time in Grafana). Note that because each metric was being retrieved from multiple Pods I had to view them as stacked graphs to get the total metric value for all the Pods.
This graph shows the anomaly check rate reaching 220,000 events/s and continuing (until all the events are processed). Prometheus is gathering this metric from 100 Kubernetes Pods.
After also instrumenting the application with OpenTracing, here’s the Jaeger dependencies view (there are other views which show single traces in detail) which shows the topology of the system, including tracing across process boundaries (producers to consumers):
The Anomalia Machina pipeline is relatively simple, so I wondered how well OpenTracing would work for discovering and visualising more complex Kafka topologies. For example, would it be possible to visualise the topology of data flow across many Kafka topics? I wrote a simple Markov chain simulator which allows you to choose the number of source topics, intermediate topics, and sink topics, and a graph density, and then produces random traces. The code is in this gist.
Here’s the dependency graph for a run of this code. In practice you would also want to add information about the Kafka producers and consumers (either as extra nodes, or by labelling the edges). There’s also a cool Force directed graph view which allows you to select a node and highlight the dependent nodes.
Pre-tuning: “La Jamais Contente”, first automobile to reach 100 km/h in 1899 (electric, 68hp)
Knobs for scaling
Scaling from 3 to ? Cassandra nodes:
Initial method was just to increase number of Worker Pods with no tuning of application parameters.
This resulted in blue line eeek. Ended up tuning each configuration (number of Worker Pods + Cassandra Nodes), including thread pool sizes and C* connections.
Had to optimise the anomaly detection pipeline to minimize: Cassandra connections, and Kafka partitions
By tuning the number of pipeline worker Pods in Kubernetes and the application thread pools
Initially sub-linear scalability (blue line), eventually close to perfect scalability (orange line)
Post-tuning: Fast-forward 120 years… “Pininfarina Battista” the fastest car in the world, 0-100 kph in 2 seconds, top speed 350 kph (electric, 1,900hp).
The complete machine for the biggest result (48 Cassandra nodes) has 574 cores in total.
Cassandra (384) > Workers (118) > Kafka (72)
This graph shows that it only costs around $1,000 a day for the basic infrastructure using on-demand AWS instances.
This graph also shows that the system can easily be scaled up or down to match different business requirements, and the infrastructure costs will scale proportionally. For example, the smallest system we ran still checked 1.5 Billion events per day, for a cost of only $100/day for the AWS infrastructure.
https://medium.com/vizzuality-blog/the-amazon-is-on-fire-is-it-worse-than-normal-5fa430a7880e
https://www.news.com.au/technology/environment/amazon-fires-dwarfed-by-the-blazes-burning-across-africa/news-story/4ff4d1a4b2cbbc55f79f367bf5f2bc9d
Questions
C* Read/write, how did I tune reads?
Decreasing the compression chunk size to 1KB (the smallest possible value) resulted in higher CPU usage and an increase in throughput to 9,000 TPS. The Apache Cassandra documentation explains the benefits of compression as follows:
“Compression’s primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save in storage requirements, it often increases read and write throughput, as the CPU overhead of compressing data is faster than the time it would take to read or write the larger volume of uncompressed data from disk.”
Kafka monitoring and tuning?
Cost with scale looks good: from $100/day to $1,000/day going from 3 to 48 Cassandra nodes
Clarify flow to emphasize that data is read from Kafka with consumer
Could we automate the tuning? I.e. feedback loop between monitoring and k8, how to set threads?
Add Prometheus monitoring architecture/story to both talks?
Did we think about getting rid of C*? Yes, but here's why not: Kafka is streams, not random access via IDs, so we'd need to read and filter everything, or have 1 topic per ID (but there are millions of IDs), or use Kafka Streams with C* as a state store.