Kafka, Cassandra and Kubernetes
at Scale –
Real-time Anomaly Detection
on 19 Billion events a day
Paul Brebner
instaclustr.com Technology Evangelist
Cassandra Track, ApacheCon 2019, Thursday September 12th 2019, Las Vegas, USA
https://www.apachecon.com/acna19/s/#/scheduledEvent/1187
Overview
1. Wow! (headlines)
2. Why? (did we do it)
3. What? (does it do)
4. How? (does it work)
5. Well? (how well did it work)
6. So What?
1 Wow!?!
Headlines
50,000 expected
1 Million descended on Woodstock
500,000 reached the venue
The Instaclustr Times
19 Billion
Anomaly Checks A Day!
Instaclustr reveals
Massively Scalable!
Fast! Affordable!
Anomaly Detector Machine
Using Open Source
Apache Cassandra,
Apache Kafka,
Kubernetes & AWS
Headline
Numbers
Per Second
• 220,000 Anomaly
checks Per
Second
[Chart: Anomaly checks/s, 220,000]
• 500x better than
previously
published results
for similar system
• 2018, Kafka,
Cassandra, Spark
• Bigger numbers?
[Chart: Per Second, previous published result (440 checks/s) vs Anomaly Checks/s (220,000), x500]
Headline
Numbers
Per Second
• Peak 2.3 Million
Kafka writes/s
• x10 rest of
pipeline
• Kafka as a buffer,
absorbs load
spike
[Chart: Millions Per Second, Anomaly checks/s (0.2 M) vs Peak Kafka writes/s (2.3 M)]
Headline
Numbers
Millions
Per Second
Headline
Numbers
Daily
• Planetary scale
(population 7.7B)
• 19 Billion (1 Billion =
1,000 Million)
checks/day
• 2.5 events per
person per day
• Had to stop
somewhere, but
no upper limit
[Chart: Daily Big Numbers (Billions/day), World Population vs Anomaly Checks]
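The headline figures can be cross-checked with a few lines of arithmetic. This sketch uses only numbers quoted on the slides (220,000 checks/s, 440 checks/s for the earlier published result, 2.3 million peak Kafka writes/s, a population of 7.7 billion); the variable names are illustrative.

```python
# Sanity check of the headline numbers quoted on the slides.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

checks_per_second = 220_000               # sustained anomaly checks/s
previous_result_per_second = 440          # earlier published result (2018)
peak_kafka_writes_per_second = 2_300_000  # peak Kafka writes/s
world_population = 7.7e9

checks_per_day = checks_per_second * SECONDS_PER_DAY
speedup = checks_per_second / previous_result_per_second
kafka_vs_pipeline = peak_kafka_writes_per_second / checks_per_second
events_per_person = checks_per_day / world_population

print(f"checks/day:        {checks_per_day / 1e9:.1f} billion")
print(f"vs previous:       x{speedup:.0f}")
print(f"Kafka vs pipeline: x{kafka_vs_pipeline:.1f}")
print(f"events/person/day: {events_per_person:.1f}")
```

At 220,000 checks/s the daily total is just over 19 billion, which is where the "19 Billion" headline and the "2.5 events per person per day" figure come from.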
2 Why?
Project Goals
Project
Goals
Multiple (like Aussie
Rules Football -
AFL)
Project
Goals
• Fast Data
• Real Time
Streams
processing
• < 1s RT
Project
Goals
• Big Data
• Throughput and
Size scale
• no upper limit
• big benchmark
numbers
Cost
Effective
• Incrementally
scalable
• Only pay for what
you use
• High benefit/cost
ratio
• 1/2 car “Malcolm”
movie, 1986
Apache
Kafka and
Cassandra
• Technology -
Kafka+Cassandra
use case
• Platform -
Instaclustr’s
Managed
Platform
• Features -
Provisioning,
monitoring,
scaling, and more
Kafka as a
Buffer
• Cost effective for
short load spikes
• E.g. Influx of
unexpected
festival goers
• Prevent
overloading of
rest of pipeline
• All events
(eventually)
processed
Application
Automation
and
Observability
• Complementary
technologies:
• Kubernetes
(automation)
• Prometheus
(monitoring)
• OpenTracing+Jaeger
(tracing)
3 What?
Does it do?
What does it
do?
Anomaly
Detection
Use Case
Spot unusual events
“Man on Moon”
headlines
• 400,000 people
got them there
• JoAnn Morgan,
Saturn 5
monitoring
engineer
• Only woman in
the control room
for Apollo 11
Anomaly
Detection
Goals
Spot the difference
At speed and scale
Spot the
difference at
speed
• 1 second
maximum
• Streams
processing not
batch
Spot the
difference at
scale
• Keys and
Concurrency
• Multiple keys
• Need Big Data
database
• For Storage and
Processing
capacity
Scalability
• Massive load
(data velocity)
• Increasing load
• No upper bound
• Load spikes
[Chart: Load vs Time]
Affordability
• Linear resource
scalability
• Elastic, on-
demand
• Incremental
resources and
cost with
changing load
[Chart: Resources and Cost vs Time, stepping through $x1, $x2, $x3, $x5, $x3, $x4]
Anomaly
Detection
Use Cases
Many and varied
• Infrastructure monitoring
• Application monitoring
• IoT
• Finance fraud detection
• Clickstream analytics
• Drone deliveries
4 How does
it work?
• Anomaly
Detection
• Architecture
• Technologies
Is this our
machine?
• The Audio-Telly-o-
Tally-o Count
• Streams
processing
machine for
counting sleepers
• We’ve advanced
from this 1960’s
technology
How does it
work?
• CUSUM
(Cumulative Sum
Control Chart)
• Statistical
analysis of
historical data
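The slides name CUSUM but give no parameters, so the following is only a minimal sketch of a textbook two-sided tabular CUSUM, not the detector used in the talk. The function name, the slack `k` and the threshold `h` (both in units of standard deviations) are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def cusum_is_anomaly(history, latest, k=0.5, h=5.0):
    """Two-sided tabular CUSUM over `history` followed by `latest`.

    The baseline mean and standard deviation are estimated from `history`.
    k (slack) and h (decision threshold) are in standard deviations;
    0.5 and 5.0 are common textbook values, assumed here.
    Returns True if either cumulative sum exceeds h after `latest`.
    """
    if len(history) < 2:
        return False                      # not enough data for a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu               # flat history: any change stands out
    upper = lower = 0.0
    for x in list(history) + [latest]:
        z = (x - mu) / sigma
        upper = max(0.0, upper + z - k)   # accumulates upward shifts
        lower = max(0.0, lower - z - k)   # accumulates downward shifts
    return upper > h or lower > h

history = [10, 11, 9, 10, 10, 11, 9, 10]
print(cusum_is_anomaly(history, 10))   # value in line with the history
print(cusum_is_anomaly(history, 20))   # large upward jump
```

Because the sums accumulate, a run of small shifts in one direction can also cross the threshold, which a simple per-value threshold would miss.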
Logical
steps
(1) Events arrive in a stream
(2) Get the next event from the stream
(3) Write the event to the database (4)
(5) Query the historic data from the database (4)
(6) If there are sufficient observations, run the anomaly detector
(7) Was a potential anomaly detected? Take appropriate action.
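The logical steps above can be sketched as a single loop. This is an in-memory illustration only: a Python list stands in for the Kafka stream, a dictionary stands in for the Cassandra table, and `ratio_detector` is a placeholder check (not CUSUM). The names, the window size and the minimum number of observations are all assumptions.

```python
from collections import defaultdict

def ratio_detector(history, latest, factor=3.0):
    """Placeholder check, not CUSUM: flag values above factor x the historic mean."""
    if not history:
        return False
    return latest > factor * (sum(history) / len(history))

def run_pipeline(events, detector=ratio_detector, min_observations=5, window=50):
    database = defaultdict(list)                 # (4) stand-in for the database
    anomalies = []
    for key, value in events:                    # (1)-(2) next event from the stream
        database[key].append(value)              # (3) write the event
        history = database[key][-window:]        # (5) query recent history for this key
        if len(history) >= min_observations:     # (6) sufficient observations?
            if detector(history[:-1], value):    #     run the detector on the new value
                anomalies.append((key, value))   # (7) take action (here: record it)
    return anomalies

events = [("sensor-a", 10)] * 6 + [("sensor-a", 100)]
print(run_pipeline(events))
```

Each event costs one write and one read, which matches the 1:1 read/write ratio mentioned later in the deck.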
Pipeline
Design
• Design, showing
interaction with
Kafka and
Cassandra
Clusters
• Load generator,
detector pipeline
• 2 thread pools
• To constrain the number of Kafka consumers
(and therefore Kafka partitions)
• To decouple Kafka consumers from the
rest of the pipeline
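A minimal sketch of the two-pool idea, using Python's standard `concurrent.futures`. It is not the Anomalia Machina code: lists stand in for Kafka partitions, `detect` stands in for the Cassandra write, read and anomaly check, and the pool sizes are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def detect(event):
    """Stand-in for the Cassandra write, read and anomaly check."""
    return event * 2

def consume_partition(partition_events, detector_pool):
    """Runs on a consumer thread: hand each event to the detector pool."""
    return [detector_pool.submit(detect, event) for event in partition_events]

def run(partitions, detector_threads=8):
    # Pool 1: one consumer thread per partition, kept small.
    # Pool 2: a larger pool for the slower database and detection work.
    with ThreadPoolExecutor(max_workers=len(partitions)) as consumer_pool, \
         ThreadPoolExecutor(max_workers=detector_threads) as detector_pool:
        handoffs = [consumer_pool.submit(consume_partition, p, detector_pool)
                    for p in partitions]
        futures = [f for h in handoffs for f in h.result()]
        return [f.result() for f in futures]

print(run([[1, 2], [3]], detector_threads=4))
```

The consumer threads only hand work off, so their number can stay low while the second pool is sized independently for database and detection concurrency.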
Cloud
Deployment
Context
• Kafka and
Cassandra
clusters managed
by Instaclustr
• Application in
AWS
Cassandra
• Open Source
• NoSQL Database
• Masterless ring
architecture &
partitioned data
for
• Linear scalability
• High availability
• Fast writes
• Powerful queries
with indexes
Instaclustr
Managed
Apache
Cassandra
Benefits
■ Optimised for low latency/high throughput
■ Automated Provisioning, Monitoring, Management
■ SOC2 certified
■ Multiple cloud providers
■ 24/7 Technical support
■ Automated Health Checks
■ Dynamic scaling
■ Zero downtime migrations
■ New! Certified Apache Cassandra
● Key highlights of the Certification Report include:
- Performance testing (latency and throughput) comparing the
current version to previous versions
- 24-hour soak testing (including repairs and replaces)
- Testing against popular drivers
What is Kafka?
Message flow
Distributed streams
processing
1 Distributed Producers…
2 Send Messages
3 To Distributed Consumers
4 Via Kafka Cluster
Kafka
Key Benefits
■ Fast – high throughput and low latency
■ Scalable – horizontally scalable, just add nodes and
partitions
■ Reliable – distributed and fault tolerant
■ Zero data loss
■ Open Source
■ Heterogeneous data sources and sinks
■ Available as an Instaclustr Managed service
Application
Automation
with
Kubernetes
• AWS EKS
• Kafka load
generator and
Anomaly
Detection
Pipeline deployed
on worker nodes
Kubernetes
• An automation
system for the
management,
scaling and
deployment of
containerized
applications
• Master/worker
Nodes architecture
• Pods are units of
concurrency
Kubernetes
Benefits
• Open Source
• Cloud provider and programming language agnostic
• Develop and test code locally, then deploy at scale
• Helps with resource management – deploy application
to Kubernetes and it manages scaling up/down and
keeping application alive
• More powerful frameworks built on Kubernetes APIs
are becoming available
Observability 1
Prometheus
Monitoring
• Ran using
Kubernetes
Prometheus
Operator
• Grafana for
graphing
• Used to debug,
tune, and observe
business metrics
(TPS, RT) from
100 Pods
Prometheus
Architecture
• Monitoring of
applications and
servers
• Instrumentation
• Pull-based
• Architecture &
Components…
Prometheus
Operator
In production on
Kubernetes
Use Prometheus
Operator to manage
application
complexity and
dynamics
Observability 2
Tracing with
OpenTracing
and Jaeger
• Single traces
• Topology of
system
• Even though this
example has
simple topology,
valuable for
debugging
OpenTracing
Standard API for
distributed tracing
■ Specification, not implementation
■ Need
● Application instrumentation
● OpenTracing tracer
Traced Applications API Tracer implementations
Open Source, Datadog
Jaeger
Tracer
Open Source Tracer
Uber/CNCF
Tracing
across
Kafka topics
More complex
example:
discovering event
flows across
multiple topics
E.g. Kafka ESB
5 How well
did it work?
Scaling Out
From 3 to ???
Cassandra nodes
Due to 1:1 read/write
ratio, decreased
compression chunk
size to 1KB
“La Jamais Contente”, first car to reach 100 km/h in 1899 (electric, 68hp)
Scaling
Knobs
• Load generator
(red)
• Cluster sizes and
worker pods
(orange)
• Thread pools,
partitions and
connections
(yellow)
[Diagram labels: Load Rate; Cluster Size; Kafka Consumers = Kafka Partitions; Concurrency for Cassandra writes/reads and detection; Cassandra Connections]
Kubernetes Pods
• Kubernetes makes it easy
to scale the application,
just increase Pods
• First attempt, tuned
for 3 node Cassandra
cluster then scaled
out to 24 nodes
• Whoops (blue line)
Cassandra
scalability
Cassandra
scalability -
better
• Then tuned knobs
(thread pools, Pods
and Cassandra
connections) to
maximize throughput
for each configuration
(orange line)
• Also tuned Kafka…
Minimize Cassandra Connections but maximize detector thread pool (pool 2) concurrency
Kafka
Scaling
Kubernetes Pods x
Kafka Consumer
threads
→ More Kafka Consumers
→ More Kafka Partitions
→ Lower Throughput!
[Chart: Partitions vs Throughput (Writes/s), 6 node x 4 cores/node Kafka Cluster. X axis: Partitions (0 to 700); Y axis: Writes/s (0 to 2,500,000)]
Kafka
Scaling -
better
Solutions?
Bigger Kafka cluster
Kafka tuning?
num.replica.fetchers = 1
by default, may help to
increase
[Chart: Partitions vs Throughput (Writes/s), comparing Throughput (6 nodes, 4 cores/node) with Throughput (9 nodes, 8 cores/node). X axis: Partitions (0 to 700); Y axis: Writes/s (0 to 5,000,000). Annotation: Increased Throughput at 200 partitions]
Final system
resources
Cluster Details (all
running in AWS, US
East North Virginia)
■ Instaclustr managed Kafka – EBS: high
throughput 1500, 9 x r4.2xlarge-1500 (1,500 GB
Disk, 61 GB RAM, 8 cores), Apache Kafka
2.1.0, Replication Factor=3
■ Instaclustr managed Cassandra – Extra Large,
48 x i3.2xlarge (1769 GB SSD, 61 GB RAM, 8
cores), Apache Cassandra 3.11.3, Replication
Factor=3
■ AWS EKS Kubernetes Worker Nodes – 2 x
c5.18xlarge (72 cores, 144 GB RAM, 25 Gbps
network), Kubernetes Version 1.10, Platform
Version eks.3
Scaling Out
From 3 to ??
Cassandra nodes
“Pininfarina Battista” the fastest car in the world (2019)
0-100 kph in 2 seconds, top speed 350 kph (electric, 1,900hp).
Scaling Out
• From 3 to 48
Cassandra Nodes
• 1.9 to 19 Billion
checks/day
• No upper limit
Resources
• Throughput
(checks per
second) vs cores
for each
subsystem:
• Cassandra >
Workers > Kafka
• Maximum 574 cores
[Chart: Throughput vs CPU Cores (Total, Cassandra, Kubernetes and Kafka cores); 574 Cores @ 220,000 TPS]
Cores used -
balance
Cassandra (67%) >
Kubernetes (21%) >
Kafka (12%)
[Pie chart: Cores per Sub-system (%), Cassandra 67%, Kubernetes Workers 21%, Kafka 12%]
Maximum
cores used
Cassandra 384 +
Workers 118 +
Kafka 72 =
574 Cores Total
[Bar chart: Cores Used, 574 Total (Cassandra 384, Workers 118, Kafka 72)]
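The core counts follow from the cluster details given earlier (48 Cassandra nodes and 9 Kafka nodes with 8 cores each, plus 118 worker cores in use). A short sketch of the arithmetic, with illustrative variable names:

```python
# Core counts per subsystem, from the cluster details in the slides.
cassandra_cores = 48 * 8      # 48 x i3.2xlarge, 8 cores each
kafka_cores = 9 * 8           # 9 x r4.2xlarge, 8 cores each
worker_cores_used = 118       # as reported; 2 x c5.18xlarge provide 144

cores = {
    "Cassandra": cassandra_cores,
    "Kubernetes workers": worker_cores_used,
    "Kafka": kafka_cores,
}
total_cores = sum(cores.values())
shares = {name: 100 * n / total_cores for name, n in cores.items()}
checks_per_core = 220_000 / total_cores

print(f"total cores: {total_cores}")
for name, share in shares.items():
    print(f"{name}: {cores[name]} cores ({share:.1f}%)")
print(f"checks/s per core: {checks_per_core:.0f}")
```

To one decimal place the shares are 66.9%, 20.6% and 12.5%, which the slide shows as 67%, 21% and 12%.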
Cost –
Affordability
at scale
• Operational $
(AWS instances)
only
• Total $1,000/day
• Can be scaled
with incremental
cost change
48 Cassandra nodes: $1,000/day
3 Cassandra nodes: $100/day
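One way to read the two cost points is as a cost per billion checks. This sketch pairs the daily costs with the daily throughput quoted elsewhere in the deck (1.9 billion checks/day at 3 Cassandra nodes, 19 billion at 48); the dictionary layout and function name are illustrative.

```python
# Daily cost and throughput for the smallest and largest configurations.
configs = {
    "3 Cassandra nodes": {"usd_per_day": 100, "checks_per_day": 1.9e9},
    "48 Cassandra nodes": {"usd_per_day": 1_000, "checks_per_day": 19e9},
}

def usd_per_billion_checks(config):
    """Operational cost (AWS instances only) per billion anomaly checks."""
    return config["usd_per_day"] / (config["checks_per_day"] / 1e9)

for name, config in configs.items():
    print(f"{name}: ${usd_per_billion_checks(config):.2f} per billion checks")
```

Both configurations come out at the same unit cost, which is the sense in which cost grows incrementally with load.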
Kafka as a
Buffer
• Kafka acts as a
buffer, can
process 10x the
Cassandra
capacity
• 2.3M/s vs
220,000/s
• Cheaper than
increasing
Cassandra
capacity x10
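The buffering effect can be sketched as a simple backlog calculation. The two rates come from the slides; the spike duration and the post-spike arrival rate are hypothetical values chosen only to make the example concrete.

```python
def backlog_after_spike(spike_rate, capacity, spike_seconds):
    """Events left waiting in Kafka when arrivals outpace the pipeline."""
    return max(0, (spike_rate - capacity) * spike_seconds)

def drain_seconds(backlog, capacity, normal_rate):
    """Time to clear the backlog once arrivals fall back to normal_rate."""
    if normal_rate >= capacity:
        raise ValueError("backlog only drains if capacity exceeds the normal rate")
    return backlog / (capacity - normal_rate)

spike_rate = 2_300_000    # peak Kafka writes/s (from the slides)
capacity = 220_000        # anomaly checks/s (from the slides)
spike_seconds = 60        # hypothetical spike duration
normal_rate = 110_000     # hypothetical post-spike arrival rate

backlog = backlog_after_spike(spike_rate, capacity, spike_seconds)
minutes = drain_seconds(backlog, capacity, normal_rate) / 60
print(f"backlog: {backlog:,} events, drained in about {minutes:.0f} minutes")
```

Under these assumptions a one-minute spike is absorbed and worked off in under twenty minutes, with no events dropped and no extra Cassandra capacity.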
6 So What?
Some
Takeaways
Takeaways
Technical
■ Kubernetes (+AWS EKS) enabled automation
(deployment, scaling, monitoring) of the application
● Some effort to understand and set up
● But once working it makes application deployment fast, scalable,
repeatable and low cost
■ Prometheus and OpenTracing+Jaeger critical for
debugging, tuning and reporting application
performance and scalability
● Tricky to monitor applications in Kubernetes, but using the
Kubernetes Operators automates the monitoring configuration
■ To achieve near linear scalability and maximize
throughput need to optimize pipeline, by tuning
thread pools and number of Kubernetes Pods to:
● Minimize: Cassandra Connections
● Minimize: Kafka Consumers and Kafka Partitions
● Maximize: Detector thread pool concurrency
Takeaways
Business
■ Kafka+Cassandra enable Fast Streaming+Storage
at Scale
■ Instaclustr Managed Kafka+Cassandra service
● Makes it easy to automate cluster provisioning
(creation/deletion/scaling), and monitoring
● Highly available SLAs
● Proactive cluster monitoring, alerting and maintenance
■ Affordability at Scale
● Low cost Open Source and Commodity Cloud infrastructure
● Only pay for what you use: application and Kafka+Cassandra
clusters scale linearly with load, so cost only increases
incrementally
■ Application can be easily resized (scaled up and
down) for any workload, no upper limit
■ Lots more use cases using Kafka+Cassandra
Newsflash!
Geospatial Anomaly
Detection
Compared
performance of
multiple Spatial
representations and
Cassandra
implementations
■ Extensions to detect anomalies over time and space
● E.g. is an event unusual relative to nearest 50 neighbours?
■ How to find neighbours using
● Distance between Latitude/longitude points
● Bounding Box
● Geohashes
● 3D (including 3D Geohashes)
■ Using different Cassandra implementations
● Clustering columns
● Secondary indexes
● Denormalized multiple tables
● Cassandra Lucene Index Plugin
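As an illustration of the first option (distance between latitude/longitude points), here is a brute-force nearest-neighbour sketch using the haversine formula. It runs in memory and says nothing about how the Cassandra implementations listed above perform; the function names and the Earth radius constant are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_neighbours(point, candidates, k=50):
    """Return the k candidates closest to point, nearest first (brute force)."""
    lat, lon = point
    return sorted(candidates, key=lambda c: haversine_km(lat, lon, c[0], c[1]))[:k]

events = [(10.0, 10.0), (1.0, 1.0), (5.0, 5.0)]
print(nearest_neighbours((0.0, 0.0), events, k=2))
```

Sorting every candidate is only workable for small sets; bounding boxes and geohashes exist to narrow the candidates before distances are computed.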
Further
information
■ The complete Anomalia Machina Blog Series (10 Parts):
● Massive scale Kafka and Cassandra deployment for real-time anomaly
detection: 19 Billion events per day
https://www.instaclustr.com/massive-scale-kafka-cassandra-real-time-anomaly-detection/
■ Latest 4-part Geospatial Anomaly Detection blogs:
● https://www.instaclustr.com/geospatial-anomaly-detection-with-kafka-cassandra/
■ The Open Source Anomalia Machina Code
● https://github.com/instaclustr/AnomaliaMachina
■ All of Paul’s Blogs
● https://www.instaclustr.com/paul-brebner/
Some Anomalies
are easy to detect
“Woodstock was the blip, the tie-dyed anomaly”
Some Anomalies
can be detected
given sufficient time
But other potential
anomalies are harder to
detect
Amazon? Congo? Siberia?
Which fires are worse than normal?
Detect complex
spatio-temporal
anomalies
reliably at scale
with Kafka, Cassandra
& Kubernetes on the
Instaclustr Managed Platform for Open Source
www.instaclustr.com/platform/
The End

Weitere ähnliche Inhalte

Was ist angesagt?

Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructuremattlieber
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beamconfluent
 
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...confluent
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellowsconfluent
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streamsconfluent
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafkaMole Wong
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxiniMonal Daxini
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...confluent
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...Paul Brebner
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormRan Silberman
 
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNApache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNblueboxtraveler
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyAllen (Xiaozhong) Wang
 
Ingesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah WhitacreIngesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah Whitacreconfluent
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulHostedbyConfluent
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniMonal Daxini
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Monal Daxini
 
Jitney, Kafka at Airbnb
Jitney, Kafka at AirbnbJitney, Kafka at Airbnb
Jitney, Kafka at Airbnbalexismidon
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 

Was ist angesagt? (20)

Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
 
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNApache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka Journey
 
Ingesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah WhitacreIngesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah Whitacre
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 
Jitney, Kafka at Airbnb
Jitney, Kafka at AirbnbJitney, Kafka at Airbnb
Jitney, Kafka at Airbnb
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 

Ähnlich wie ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Anomaly Detection on 19 Billion events a day

Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Paul Brebner
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Peter Bakas
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵Amazon Web Services Korea
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s DilemmaAmazon Web Services
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Coburn Watson
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real WorldJeremy Hanna
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HAtcp cloud
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High AvailabilityJakub Pavlik
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
Re invent announcements_2016_hcls_use_cases_mchampion
Re invent announcements_2016_hcls_use_cases_mchampionRe invent announcements_2016_hcls_use_cases_mchampion
Re invent announcements_2016_hcls_use_cases_mchampionMia D Champion
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelinesSumant Tambe
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Kai Wähner
 

Ähnlich wie ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Anomaly Detection on 19 Billion events a day (20)

Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Ceilosca
CeiloscaCeilosca
Ceilosca
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High Availability
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
Re invent announcements_2016_hcls_use_cases_mchampion
Re invent announcements_2016_hcls_use_cases_mchampionRe invent announcements_2016_hcls_use_cases_mchampion
Re invent announcements_2016_hcls_use_cases_mchampion
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 

ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Anomaly Detection on 19 Billion events a day

  • 1. Kafka, Cassandra and Kubernetes at Scale – Real-time Anomaly Detection on 19 Billion events a day Paul Brebner instaclustr.com Technology Evangelist Cassandra Track, ApacheCon 2019, Thursday September 12th 2019, Las Vegas, USA https://www.apachecon.com/acna19/s/#/scheduledEvent/1187
  • 2. Overview 1. Wow! (headlines) 2. Why? (did we do it) 3. What? (does it do) 4. How? (does it work) 5. Well? (how well did it work) 6. So What?
  • 7. 50,000 expected 1 Million descended on Woodstock 500,000 reached the venue
  • 8. The Instaclustr Times 19 Billion Anomaly Checks A Day! Instaclustr reveals Massively Scalable! Fast! Affordable! Anomaly Detector Machine Using Open Source Apache Cassandra, Apache Kafka, Kubernetes & AWS
  • 10. Headline Numbers Per Second • 220,000 anomaly checks per second [Chart: anomaly checks/s]
  • 11. Headline Numbers Per Second • 500x better than previously published results for a similar system (440 vs 220,000 anomaly checks/s) • 2018, Kafka, Cassandra, Spark • Bigger numbers? [Chart: previous published result vs anomaly checks/s, x500]
  • 12. Headline Numbers Millions Per Second • Peak 2.3 Million Kafka writes/s • x10 rest of pipeline • Kafka as a buffer, absorbs load spike [Chart: anomaly checks/s (0.2 M) vs peak Kafka writes/s (2.3 M)]
  • 13. Headline Numbers Daily • Planetary scale (population 7.7B) • 19 Billion (1 Billion = 1,000 Million) checks/day • 2.5 events per person per day • Had to stop somewhere, but no upper limit [Chart: Daily Big Numbers (Billions/day), world population vs anomaly checks]
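The daily and per-person figures on slides 10 to 13 follow from the per-second rate. A quick check of the arithmetic, assuming the 220,000 checks/s rate is sustained for a full 24 hours (the variable names are illustrative only):

```python
# Sanity check of the headline numbers quoted on the slides.
CHECKS_PER_SECOND = 220_000        # slide 10
PREVIOUS_RESULT_PER_SECOND = 440   # slide 11
WORLD_POPULATION = 7.7e9           # slide 13
SECONDS_PER_DAY = 24 * 60 * 60

checks_per_day = CHECKS_PER_SECOND * SECONDS_PER_DAY
checks_per_person = checks_per_day / WORLD_POPULATION
speedup = CHECKS_PER_SECOND / PREVIOUS_RESULT_PER_SECOND

print(f"{checks_per_day / 1e9:.1f} billion checks/day")
print(f"{checks_per_person:.2f} checks per person per day")
print(f"{speedup:.0f}x the previously published rate")
```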
  • 16. Project Goals • Fast Data • Real Time Streams processing • < 1s RT
  • 17. Project Goals • Big Data • Throughput and Size scale • no upper limit • big benchmark numbers
  • 18. Cost Effective • Incrementally scalable • Only pay for what you use • High benefit/cost ratio • 1/2 car “Malcolm” movie, 1986
  • 19. Apache Kafka and Cassandra • Technology - Kafka+Cassandra use case • Platform - Instaclustr’s Managed Platform • Features - Provisioning, monitoring, scaling, and more
  • 20. Kafka as a Buffer • Cost effective for short load spikes • E.g. Influx of unexpected festival goers • Prevent overloading of rest of pipeline • All events (eventually) processed
  • 23. What does it do? Anomaly Detection Use Case Spot unusual events
  • 24. “Man on Moon” headlines • 400,000 people got them there • JoAnn Morgan, Saturn 5 monitoring engineer • Only woman in the control room for Apollo 11
  • 26. Spot the difference at speed • 1 second maximum • Streams processing not batch
  • 27. Spot the difference at scale • Keys and Concurrency • Multiple keys • Need Big Data database • For Storage and Processing capacity
  • 28. Scalability • Massive load (data velocity) • Increasing load • No upper bound • Load spikes [Chart: load vs time]
  • 29. Affordability • Linear resource scalability • Elastic, on-demand • Incremental resources and cost with changing load [Chart: resources and cost vs time, stepping through $x1, $x2, $x3, $x5, $x3, $x4]
  • 30. Anomaly Detection Use Cases Many and varied Infrastructure monitoring
  • 31. Anomaly Detection Use Cases Many and varied Application Monitoring
  • 33. Anomaly Detection Use Cases Many and varied Finance fraud detection
  • 34. Anomaly Detection Use Cases Many and varied Clickstream analytics
  • 35. Anomaly Detection Use Cases Many and varied Drone deliveries
  • 36. 4 How does it work? • Anomaly Detection • Architecture • Technologies
  • 37. Is this our machine? • The Audio-Telly-o-Tally-o Count • Streams processing machine for counting sleepers • We’ve advanced from this 1960’s technology
  • 38. How does it work? • CUSUM (Cumulative Sum Control Chart) • Statistical analysis of historical data
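The slide names CUSUM but gives no parameters, so the following is only a minimal sketch of a standard two-sided tabular CUSUM, not the detector used in the talk. The function name, the slack `k`, the threshold `h`, and the use of a separate baseline window to estimate mean and standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def cusum_alarms(values, baseline, k=0.5, h=5.0):
    """Return the indexes in `values` where a two-sided CUSUM raises an alarm.

    `baseline` is historical data assumed to be anomaly-free; it supplies the
    reference mean and standard deviation. `k` (slack) and `h` (threshold) are
    expressed in standard deviations and are illustrative defaults.
    """
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0   # avoid dividing by zero on flat data
    s_hi = s_lo = 0.0
    alarms = []
    for i, x in enumerate(values):
        z = (x - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)   # accumulates upward drift
        s_lo = max(0.0, s_lo - z - k)   # accumulates downward drift
        if s_hi > h or s_lo > h:
            alarms.append(i)
            s_hi = s_lo = 0.0           # restart after signalling
    return alarms


baseline = [10, 11, 9, 10, 10, 11, 9, 10]
print(cusum_alarms([10, 10, 11, 9, 10], baseline))  # ordinary noise
print(cusum_alarms([10, 14, 14, 14], baseline))     # sustained upward shift
```

Because the sums accumulate, a small but persistent shift eventually crosses the threshold even when no single observation looks extreme, which is why the detector needs a window of historical data per key.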
  • 39. Logical steps (1) Events arrive in a stream (2) Get the next event from the stream (3) Write the event to the database (4) Query the historic data from the database (5) If there are sufficient observations, run the anomaly detector (6) Was a potential anomaly detected? Take appropriate action.
  • 40. Pipeline Design • Design, showing interaction with Kafka and Cassandra Clusters • Load generator, detector pipeline • 2 thread pools, to decouple the Kafka consumers from the rest of the pipeline • This limits the number of Kafka consumers (and therefore Kafka partitions)
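The logical steps and the two-thread-pool design can be sketched in a few lines. This is a single-process illustration under loud assumptions: an in-memory `queue.Queue` stands in for the Kafka topic, a dictionary stands in for the Cassandra table, and `run_pipeline`, `detect`, and the pool sizes are hypothetical names and values, not taken from the talk.

```python
import queue
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(events, detect, consumer_threads=2, detector_threads=8, window=50):
    """Process (key, value) events with a small consumer pool and a larger detector pool."""
    topic = queue.Queue()            # stand-in for a Kafka topic
    for event in events:
        topic.put(event)
    store = defaultdict(list)        # stand-in for a Cassandra table keyed by id
    lock = threading.Lock()
    anomalies = []

    def process(key, value):
        # Pool 2: write the event, read back the recent history, run the detector.
        with lock:
            store[key].append(value)
            history = store[key][-window:]
        if len(history) >= window and detect(history):
            with lock:
                anomalies.append((key, value))

    with ThreadPoolExecutor(detector_threads) as detectors:

        def consume():
            # Pool 1: only polls the topic and hands work off, so few consumers are needed.
            while True:
                try:
                    key, value = topic.get_nowait()
                except queue.Empty:
                    return
                detectors.submit(process, key, value)

        with ThreadPoolExecutor(consumer_threads) as consumers:
            for _ in range(consumer_threads):
                consumers.submit(consume)
    return store, anomalies


events = [("a", i) for i in range(5)] + [("b", 500)]
store, anomalies = run_pipeline(events, detect=lambda h: h[-1] > 100, window=1)
print(anomalies)
```

Keeping the consumer pool small and pushing the database round trips into the second pool is the point of the design: the slow work scales with detector threads rather than with Kafka consumers.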
  • 41. Cloud Deployment Context • Kafka and Cassandra clusters managed by Instaclustr • Application in AWS
  • 42. Cassandra • Open Source • NoSQL Database • Masterless ring architecture & partitioned data for • Linear scalability • High availability • Fast writes • Powerful queries with indexes
  • 43. Instaclustr Managed Apache Cassandra Benefits ■ Optimised for low latency/high throughput ■ Automated Provisioning, Monitoring, Management ■ SOC2 certified ■ Multiple cloud providers ■ 24/7 Technical support ■ Automated Health Checks ■ Dynamic scaling ■ Zero downtime migrations ■ New! Certified Apache Cassandra ● Key highlights of the Certification Report include: - Performance testing (latency and throughput) comparing the current version to previous versions - 24-hour soak testing (including repairs and replaces) - Testing against popular drivers
  • 44. What is Kafka? Message flow Distributed streams processing 1 Distributed Producers… 2 Send Messages 3 To Distributed Consumers 4 Via Kafka Cluster
  • 45. Kafka Key Benefits ■ Fast – high throughput and low latency ■ Scalable – horizontally scalable, just add nodes and partitions ■ Reliable – distributed and fault tolerant ■ Zero data loss ■ Open Source ■ Heterogeneous data sources and sinks ■ Available as an Instaclustr Managed service
  • 46. Application Automation with Kubernetes • AWS EKS • Kafka load generator and Anomaly Detection Pipeline deployed on worker nodes
  • 47. Kubernetes • An automation system for the management, scaling and deployment of containerized applications • Master/worker Nodes architecture • Pods are units of concurrency
  • 48. Kubernetes Benefits • Open Source • Cloud provider and programming language agnostic • Develop and test code locally, then deploy at scale • Helps with resource management – deploy application to Kubernetes and it manages scaling up/down and keeping application alive • More powerful frameworks built on Kubernetes APIs are becoming available
  • 49. Observability 1 Prometheus Monitoring • Ran using Kubernetes Prometheus Operator • Grafana for graphing • Used to debug, tune, and observe business metrics (TPS, RT) from 100 Pods
  • 50. Prometheus Architecture • Monitoring of applications and servers • Instrumentation • Pull-based • Architecture & Components…
  • 51. Prometheus Operator In production on Kubernetes Use Prometheus Operator to manage application complexity and dynamics
  • 52. Observability 2 Tracing with OpenTracing and Jaeger • Single traces • Topology of system • Even though this example has simple topology, valuable for debugging
  • 53. OpenTracing Standard API for distributed tracing ■ Specification, not implementation ■ Need ● Application instrumentation ● OpenTracing tracer [Diagram: traced applications, API, tracer implementations (open source, Datadog)]
  • 55. Tracing across Kafka topics More complex example: discovering event flows across multiple topics E.g. Kafka ESB
  • 56. 5 How well did it work? Scaling Out From 3 to ??? Cassandra nodes
  • 57. How well did it work? Scaling Out From 3 to ??? Cassandra nodes Due to 1:1 read/write ratio, decreased compression chunk size to 1KB “La Jamais Contente”, first car to reach 100 km/h in 1899 (electric, 68hp)
  • 58. Scaling Knobs • Load generator (red) • Cluster sizes and worker pods (orange) • Thread pools, partitions and connections (yellow) [Diagram labels: Load Rate, Cluster Size, Kafka Consumers = Kafka Partitions, Concurrency for Cassandra writes/reads and detection, Cluster Size, Cassandra Connections, Kubernetes Pods]
  • 59. Cassandra scalability • Kubernetes → easy to scale application, just increase Pods • First attempt, tuned for 3 node Cassandra cluster then scaled out to 24 nodes • Whoops (blue line)
  • 60. Cassandra scalability - better • Then tuned knobs (thread pools, Pods and Cassandra connections) to maximize throughput for each configuration (orange line) • Also tuned Kafka… Minimize Cassandra Connections but maximize detector thread pool (pool 2) concurrency
  • 61. Kafka Scaling • Kubernetes Pods x Kafka Consumer threads → More Kafka Consumers → More Kafka Partitions → Lower Throughput! [Chart: Partitions vs Throughput (Writes/s), 6 node x 4 cores/node Kafka cluster]
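The chain on this slide is simple multiplication. Within one Kafka consumer group a partition is read by at most one consumer, so every extra consumer thread needs its own partition to be useful. A small sketch, where the 100 Pods figure comes from slide 49 and the threads-per-Pod values and function names are hypothetical:

```python
def kafka_consumers(pods: int, consumer_threads_per_pod: int) -> int:
    """Total consumers in the group when every Pod runs the same consumer pool."""
    return pods * consumer_threads_per_pod


def min_partitions(consumers: int) -> int:
    """A consumer without a partition sits idle, so partitions must be >= consumers."""
    return consumers


for threads in (1, 2, 4):
    consumers = kafka_consumers(pods=100, consumer_threads_per_pod=threads)
    print(f"{threads} consumer thread(s)/Pod -> {consumers} consumers, "
          f">= {min_partitions(consumers)} partitions")
```

This is why the talk keeps the consumer pool per Pod small: scaling Pods already multiplies the partition count.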
  • 62. Kafka Scaling - better • Solutions? Bigger Kafka cluster • Kafka tuning? num.replica.fetchers = 1 by default, may help to increase • Increased throughput at 200 partitions [Chart: Partitions vs Throughput (Writes/s), 6 nodes x 4 cores/node vs 9 nodes x 8 cores/node]
  • 63. Final system resources Cluster Details (all running in AWS, US East North Virginia) ■ Instaclustr managed Kafka – EBS: high throughput 1500, 9 x r4.2xlarge-1500 (1,500 GB Disk, 61 GB RAM, 8 cores), Apache Kafka 2.1.0, Replication Factor=3 ■ Instaclustr managed Cassandra – Extra Large, 48 x i3.2xlarge (1769 GB SSD, 61 GB RAM, 8 cores), Apache Cassandra 3.11.3, Replication Factor=3 ■ AWS EKS Kubernetes Worker Nodes – 2 x c5.18xlarge (72 cores, 144 GB RAM, 25 Gbps network), Kubernetes Version 1.10, Platform Version eks.3
  • 64. Scaling Out From 3 to ?? Cassandra nodes “Pininfarina Battista” the fastest car in the world (2019) 0-100 kph in 2 seconds, top speed 350 kph (electric, 1,900hp).
  • 65. Scaling Out • From 3 to 48 Cassandra Nodes • 1.9 to 19 Billion checks/day • No upper limit
  • 66. Resources • Throughput (checks per second) vs cores for each subsystem: • Cassandra > Workers > Kafka • Maximum 574 cores @ 220,000 TPS [Chart: throughput vs CPU cores (total, Cassandra, Kubernetes, Kafka)]
  • 67. Cores used - balance • Cassandra (67%) > Kubernetes Workers (21%) > Kafka (12%) [Chart: Cores per Sub-system (%)]
  • 68. Maximum cores used • Cassandra 384 + Workers 118 + Kafka 72 = 574 Cores Total [Chart: Cores Used]
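The core counts can be cross-checked against the cluster sizes on slide 63 (48 Cassandra nodes and 9 Kafka nodes, 8 cores each). A short sketch; the worker figure of 118 is the number of cores in use quoted on the slide, not the 2 x 72 cores provisioned, and the per-core throughput is a derived figure rather than one stated in the talk:

```python
# Cores per subsystem at the largest configuration.
cores = {
    "Cassandra": 48 * 8,        # 48 x i3.2xlarge, 8 cores each
    "Kubernetes workers": 118,  # cores in use, as quoted on the slide
    "Kafka": 9 * 8,             # 9 x r4.2xlarge, 8 cores each
}
total_cores = sum(cores.values())
shares = {name: 100 * n / total_cores for name, n in cores.items()}
checks_per_core = 220_000 / total_cores

for name, share in shares.items():
    print(f"{name}: {cores[name]} cores ({share:.1f}%)")
print(f"total: {total_cores} cores, about {checks_per_core:.0f} checks/s per core")
```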
  • 69. Cost – Affordability at scale • Operational $ (AWS instances) only • Total $1,000/day for 48 Cassandra nodes ($100/day for 3 Cassandra nodes) • Can be scaled with incremental cost change
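Combining these daily costs with the throughput figures from slide 65 (1.9 billion checks/day on 3 Cassandra nodes, 19 billion on 48) gives a unit cost. This is a derived illustration, assuming the rounded slide figures are representative:

```python
import math

# (daily cost in USD, billions of checks per day) for the smallest and largest clusters
small = (100, 1.9)     # 3 Cassandra nodes
large = (1_000, 19.0)  # 48 Cassandra nodes

cost_per_billion_small = small[0] / small[1]
cost_per_billion_large = large[0] / large[1]

print(f"3 nodes:  ${cost_per_billion_small:.2f} per billion checks")
print(f"48 nodes: ${cost_per_billion_large:.2f} per billion checks")
print("unit cost unchanged:", math.isclose(cost_per_billion_small, cost_per_billion_large))
```

On these figures the cost per check stays flat across a 10x increase in load, which is what the slides mean by cost increasing only incrementally.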
  • 70. Kafka as a Buffer • Kafka acts as a buffer, can process 10x the Cassandra capacity • 2.3M/s vs 220,000/s • Cheaper than increasing Cassandra capacity x10
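A back-of-envelope model shows what "Kafka as a buffer" means in numbers. The two rates come from the slides. The constant two-minute spike is an assumption made for illustration (the real load ramped up), and the function names are hypothetical.

```python
# Back-of-envelope model of Kafka absorbing a load spike.
# Rates are from the slides; a constant 2-minute spike is an assumption.
PRODUCE_RATE = 2_300_000   # peak Kafka writes/s
CONSUME_RATE = 220_000     # anomaly checks/s (rest of the pipeline)
SPIKE_SECONDS = 120        # assumed spike duration


def backlog_after_spike(produce_rate, consume_rate, spike_seconds):
    """Events still waiting in Kafka when the spike ends."""
    return max(0, (produce_rate - consume_rate) * spike_seconds)


def drain_seconds(backlog, consume_rate):
    """Time for the consumers to clear the backlog once producers stop."""
    return backlog / consume_rate


backlog = backlog_after_spike(PRODUCE_RATE, CONSUME_RATE, SPIKE_SECONDS)
drain = drain_seconds(backlog, CONSUME_RATE)
print(f"Backlog after spike: {backlog:,} events")
print(f"Time to drain: {drain / 60:.1f} minutes")
```

Under these assumptions the pipeline catches up some minutes after the spike ends, without the Cassandra cluster having to be sized for the peak write rate.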
  • 73. Takeaways Technical ■ Kubernetes (+AWS EKS) enabled automation (deployment, scaling, monitoring) of the application ● Some effort to understand and set up ● But once working it makes application deployment fast, scalable, repeatable and low cost ■ Prometheus and OpenTracing+Jaeger critical for debugging, tuning and reporting application performance and scalability ● Tricky to monitor applications in Kubernetes, but using the Kubernetes Operators automates the monitoring configuration ■ To achieve near linear scalability and maximize throughput you need to optimize the pipeline, by tuning thread pools and the number of Kubernetes Pods to: ● Minimize: Cassandra Connections ● Minimize: Kafka Consumers, and therefore Kafka Partitions ● Maximize: Detector thread pool concurrency
  • 74. Takeaways Business ■ Kafka+Cassandra enable Fast Streaming+Storage at Scale ■ Instaclustr Managed Kafka+Cassandra service ● Makes it easy to automate cluster provisioning (creation/deletion/scaling), and monitoring ● Highly available SLAs ● Proactive cluster monitoring, alerting and maintenance ■ Affordability at Scale ● Low cost Open Source and Commodity Cloud infrastructure ● Only pay for what you use: the application and Kafka+Cassandra clusters scale linearly with load, so cost only increases incrementally ■ Application can be easily resized (scaled up and down) for any workload, no upper limit ■ Lots more use cases using Kafka+Cassandra
  • 76. Newsflash! Geospatial Anomaly Detection Compared performance of multiple Spatial representations and Cassandra implementations ■ Extensions to detect anomalies over time and space ● E.g. is an event unusual relative to nearest 50 neighbours? ■ How to find neighbours using ● Distance between Latitude/longitude points ● Bounding Box ● Geohashes ● 3D (including 3D Geohashes) ■ Using different Cassandra implementations ● Clustering columns ● Secondary indexes ● Denormalized multiple tables ● Cassandra Lucene Index Plugin
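Two of the neighbour-finding options listed on this slide (distance between latitude/longitude points, and a bounding box) can be sketched in a few lines. This is an illustrative in-memory version, not the Cassandra implementation from the blog series. The function names and the default box size are assumptions, and the box ignores wrap-around at the poles and the ±180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, centre_lat, centre_lon, half_side_deg):
    """Cheap pre-filter: is the point inside a square box (in degrees)?"""
    return (abs(lat - centre_lat) <= half_side_deg
            and abs(lon - centre_lon) <= half_side_deg)


def nearest_neighbours(point, candidates, k=50, half_side_deg=1.0):
    """Filter by bounding box, then rank the survivors by exact distance."""
    lat, lon = point
    boxed = [c for c in candidates
             if in_bounding_box(c[0], c[1], lat, lon, half_side_deg)]
    return sorted(boxed, key=lambda c: haversine_km(lat, lon, c[0], c[1]))[:k]


events = [(0.5, 0.5), (0.1, 0.1), (5.0, 5.0)]
print(nearest_neighbours((0.0, 0.0), events, k=2))
```

The bounding box plays the role of the coarse query a database can answer cheaply; the exact distance is only computed for the points that survive it.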
  • 77. Further information ■ The complete Anomalia Machina Blog Series (10 Parts): ● Massive scale Kafka and Cassandra deployment for real-time anomaly detection: 19 Billion events per day https://www.instaclustr.com/massive-scale-kafka-cassandra-real-time-anomaly-detection/ ■ Latest 4-part Geospatial Anomaly Detection blogs: ● https://www.instaclustr.com/geospatial-anomaly-detection-with-kafka-cassandra/ ■ The Open Source Anomalia Machina Code ● https://github.com/instaclustr/AnomaliaMachina ■ All of Paul’s Blogs ● https://www.instaclustr.com/paul-brebner/
  • 79. “Woodstock was the blip, the tie-dyed anomaly” Some Anomalies can be detected given sufficient time
  • 80. But other potential anomalies are harder to detect Amazon? Congo? Siberia? Which fires are worse than normal?
  • 81. Detect complex spatio-temporal anomalies reliably at scale with Kafka, Cassandra & Kubernetes on the Instaclustr Managed Platform for Open Source www.instaclustr.com/platform/ The End

Editor's notes

  1. Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies enabling applications and business operations to scale massively and rapidly. While Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store and retrieve data at very low latency, Kubernetes is a container orchestration technology that helps with automated application deployment and scaling of application clusters. In this presentation, we will reveal how we architected a massive scale deployment of a streaming data pipeline with Kafka and Cassandra to cater to an example anomaly detection application running on a Kubernetes cluster and generating and processing a massive number of events. Anomaly detection is a method used to detect unusual events in an event stream. It is widely used in a range of applications such as financial fraud detection, security, threat detection, website user analytics, sensors, IoT, system health monitoring, etc. When such applications operate at massive scale, generating millions or billions of events, they impose significant computational, performance and scalability challenges on anomaly detection algorithms and data layer technologies. We will demonstrate the scalability, performance and cost effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from our experiments allowing the anomaly detection application to scale to 19 Billion anomaly checks per day.
  2. 1969 was a noteworthy year, with lots of 50th anniversary events recently celebrated. I don’t think Elvis is returning home again. But the Moon and Woodstock.
  3. 50,000 expected, 1 Million descended on the site, 500,000 reached it
  5. Is this big? More realistic is per second: 1 Billion = 1000 Million = 10^9 events/day. Actually 220,000 events per second, and 2.3M/s Kafka writes. Per day, yes, big. Planetary scale! More than double the world population (7.7 Billion); could process 2.5 events per person per day. Bigger than most (any?) single company’s daily financial transactions. Better (500x throughput and much faster) than published results for a similar problem (from 2018, using Kafka, Cassandra and Spark, 200 events/s, RT >> 1s). Bigger numbers are only limited by imagination; we could have kept going, but had to stop somewhere. US FINRA (Financial Industry Regulatory Authority) processes up to 78 Billion events a day (also using public cloud). Computer systems generate massive amounts of metrics, e.g. Netflix uses Kafka to process > 1 Trillion (10^12) events/day (2018). And the system will scale arbitrarily high to match business requirements.
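The per-day and per-person headline figures follow directly from the per-second rate. A short check, using only the 220,000 checks/s and 7.7 Billion population figures quoted in the talk:

```python
# Convert the sustained per-second rate into the daily headline numbers.
SECONDS_PER_DAY = 24 * 60 * 60

checks_per_second = 220_000   # sustained anomaly checks/s (from the talk)
world_population = 7.7e9      # 2019 figure quoted in the talk

checks_per_day = checks_per_second * SECONDS_PER_DAY
checks_per_person = checks_per_day / world_population

print(f"{checks_per_day / 1e9:.1f} billion checks/day")
print(f"{checks_per_person:.1f} checks per person per day")
```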
  8. Project Goals (multiple): Fast (RT), Big (scalable, no upper limit), Cost effective (Open Source, automatic cluster creation/deletion, scaling). Kafka + Cassandra demo use case. Kafka as a buffer use case (cost effective for coping with short load spikes). Demonstrate the Instaclustr managed service for Kafka and Cassandra (provisioning, management, monitoring). Try complementary tech for application management and scale (K8s, Prometheus, OpenTracing, Jaeger).
  14. Anomaly detection needs to be fast, under 1s
  15. The headlines 50 years ago may have been about men on the moon, but the success of the program depended on many women
  17. Anomaly detection needs to be fast, under 1s, streams processing
  18. Anomaly detection needs to be scalable: an increasing number of keys requires more storage and processing capacity. Need a scalable database.
  19. Anomaly detection needs to be scalable: high throughput, linearly scalable for more processing capacity, able to handle load spikes (buffer use case), and with no upper limit. And affordable, i.e. elastic: scale up and down on demand, with the correct resources based on actual load (not too many or too few).
  20. And affordable, i.e. linear, elastic, scale up and down on demand, have just sufficient resources based on actual load (not too many or too few) For experiments, want to spin resources up and down (provision, scale, delete)
  21. Anomaly detection is used in a wide variety of domains including: Infrastructure monitoring
  27. A simple type of anomaly detection is called Break or Changepoint analysis.  This takes a stream of events and analyses them to see if the most recent events are “different” to previous ones. We picked a simple version to start with (CUSUM). It only uses data for a single variable at a time, which could be something like an account number, or an IP address.
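To make the idea concrete, here is a minimal two-sided CUSUM check over a window of previous values for a single key. It is a sketch of the general technique, not the detector used in Anomalia Machina; the function name and the `slack` and `threshold` defaults (in standard deviations) are assumptions.

```python
from statistics import mean, pstdev


def cusum_is_anomaly(history, new_value, slack=0.5, threshold=5.0):
    """Return True if new_value, appended to history, trips a two-sided CUSUM.

    history: previous observations for one key (for example the last 50).
    slack, threshold: assumed tuning values, in standard deviations.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return new_value != mu  # constant history: any change is a break
    high = low = 0.0
    for x in list(history) + [new_value]:
        z = (x - mu) / sigma
        # Accumulate drift above and below the baseline, minus the slack.
        high = max(0.0, high + z - slack)
        low = max(0.0, low - z - slack)
    return high > threshold or low > threshold


history = [10, 12] * 25  # 50 previous values for one account or IP address
print(cusum_is_anomaly(history, 11))   # value close to the baseline
print(cusum_is_anomaly(history, 30))   # value far above the baseline
```

Because the sums accumulate, a run of moderately unusual values can trip the threshold as well as a single extreme one, which is what makes this a changepoint test rather than a simple outlier test.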
  28. This is the prototype application design. The anomaly detection pipeline is written in Java and runs in a single multi-threaded process. It consists of a Kafka consumer, which gets each new event and passes it to a Cassandra client, which writes the event to Cassandra, gets the previous 50 rows for the ID, runs the detector and decides if there’s an anomaly or not. Thread pools? A Kafka Consumer pool is useful to constrain the number of Kafka Consumers, and thereby constrain the number of Kafka partitions, which are expensive!
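The flow described above (consume, write, read the previous 50 rows, detect) can be sketched with in-memory stand-ins. The real pipeline is Java and uses Kafka and Cassandra clients; here a `Queue` stands in for the topic, `InMemoryStore` for the table, and `simple_detector` is a placeholder, all of them hypothetical names. The usage example runs a single detector thread so that events for one ID are processed in order.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue
from threading import Lock

WINDOW = 50  # previous rows read per check (from the talk)


class InMemoryStore:
    """Stand-in for the Cassandra table: rows per ID, newest last."""

    def __init__(self):
        self._rows = defaultdict(list)
        self._lock = Lock()

    def write(self, key, value):
        with self._lock:
            self._rows[key].append(value)

    def read_latest(self, key, limit):
        with self._lock:
            return list(self._rows[key][-limit:])


def simple_detector(values):
    """Placeholder detector: flags the newest value if far from the rest."""
    *previous, latest = values
    if len(previous) < 2:
        return False
    mu = sum(previous) / len(previous)
    spread = (max(previous) - min(previous)) or 1.0
    return abs(latest - mu) > 3 * spread


def process_event(store, event):
    """Write the event, read it back with the previous WINDOW rows, detect."""
    key, value = event
    store.write(key, value)
    rows = store.read_latest(key, WINDOW + 1)
    return key, value, simple_detector(rows)


def run_pipeline(topic, store, detector_threads=4):
    """Drain the queue (stand-in for a Kafka consumer) into a thread pool."""
    futures = []
    with ThreadPoolExecutor(max_workers=detector_threads) as pool:
        while True:
            try:
                event = topic.get_nowait()
            except Empty:
                break
            futures.append(pool.submit(process_event, store, event))
    return [f.result() for f in futures]


topic = Queue()
for i in range(60):
    topic.put(("acct-1", 10 + i % 2))
topic.put(("acct-1", 500))
results = run_pipeline(topic, InMemoryStore(), detector_threads=1)
print(results[-1])
```

Separating the consumer loop from the detector pool mirrors the point about thread pools: the number of consumers (and so partitions) can stay small while detector concurrency is raised independently.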
  29. What is Kafka? Kafka is a distributed streams processing system, it allows distributed producers to send messages to distributed consumers via a Kafka cluster.
  30. The next graph shows the Kafka producer ramping up (from 1 to 9 Kubernetes Pods), with 2 minutes load time, peaking at 2.3M events/s (this time in Grafana). Note that because each metric was being retrieved from multiple Pods I had to view them as stacked graphs to get the total metric value for all the Pods. This graph shows the anomaly check rate reaching 220,000 events/s and continuing (until all the events are processed). Prometheus is gathering this metric from 100 Kubernetes Pods.
  31. After also instrumenting the application with OpenTracing, here’s the Jaeger dependencies view (there are other views which show single traces in detail) which shows the topology of the system, including tracing across process boundaries (producers to consumers):
  32. The Anomalia Machina pipeline is relatively simple, so I wondered how well OpenTracing would work for discovering and visualising more complex Kafka topologies. For example, would it be possible to visualise the topology of data flow across many Kafka topics? I wrote a simple Markov chain simulator which allows you to choose the number of source topics, intermediate topics, and sink topics, and a graph density, and then produces random traces. The code is in this gist. Here’s the dependency graph for a run of this code. In practice you would also want to add information about the Kafka producers and consumers (either as extra nodes, or by labelling the edges). There’s also a cool Force directed graph view which allows you to select a node and highlight the dependent nodes.
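The gist itself is not reproduced in this transcript. As a rough illustration of the idea only, the sketch below builds a random topology of source, intermediate and sink topics with a given edge density, then walks it with uniform transition probabilities to produce traces. All names are hypothetical, and it assumes at least one source and one sink.

```python
import random


def pick_targets(candidates, density, rng):
    """Keep each candidate edge with probability `density` (at least one)."""
    chosen = [c for c in candidates if rng.random() < density]
    return chosen or [rng.choice(candidates)]


def build_topology(n_sources, n_intermediate, n_sinks, density, rng):
    """Random acyclic topic graph: sources -> intermediates -> sinks."""
    sources = [f"source-{i}" for i in range(n_sources)]
    middles = [f"intermediate-{i}" for i in range(n_intermediate)]
    sinks = [f"sink-{i}" for i in range(n_sinks)]
    edges = {}
    for topic in sources:
        edges[topic] = pick_targets(middles + sinks, density, rng)
    for i, topic in enumerate(middles):
        # Only link forwards, so every walk is guaranteed to reach a sink.
        edges[topic] = pick_targets(middles[i + 1:] + sinks, density, rng)
    return sources, sinks, edges


def random_trace(sources, sinks, edges, rng):
    """One Markov-chain walk from a random source topic to a sink topic."""
    topic = rng.choice(sources)
    trace = [topic]
    while topic not in sinks:
        topic = rng.choice(edges[topic])
        trace.append(topic)
    return trace


rng = random.Random(42)
sources, sinks, edges = build_topology(3, 4, 2, density=0.3, rng=rng)
for _ in range(3):
    print(" -> ".join(random_trace(sources, sinks, edges, rng)))
```

In a traced system each hop in such a walk would become a span, and the dependency view would recover the `edges` graph from the collected traces.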
  33. Pre-tuning: “La Jamais Contente”, first automobile to reach 100 km/h in 1899 (electric, 68hp)
  35. Knobs for scaling
  36. Scaling from 3 to ? Cassandra nodes: The initial method was just to increase the number of Worker Pods with no tuning of application parameters. This resulted in the blue line (eeek). Ended up tuning each configuration (number of Worker Pods + Cassandra nodes), including thread pool sizes and C* connections. Had to optimise the anomaly detection pipeline to minimize Cassandra connections and Kafka partitions, by tuning the number of pipeline worker Pods in Kubernetes and the application thread pools. Initially sub-linear scalability (blue line), eventually close to perfect scalability (orange line).
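The trade-off behind this tuning can be written down as simple arithmetic. In a Kafka consumer group each partition is read by at most one consumer, so keeping every consumer busy needs at least as many partitions as consumers. The sketch below compares two configurations; the class name and all the numbers are hypothetical and are not the values used in the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    """One tuning point: Pod count and per-Pod thread pool sizes."""
    pods: int
    consumer_threads_per_pod: int
    detector_threads_per_pod: int

    @property
    def kafka_consumers(self):
        return self.pods * self.consumer_threads_per_pod

    @property
    def min_partitions(self):
        # Each partition is read by at most one consumer in a group,
        # so every active consumer needs its own partition.
        return self.kafka_consumers

    @property
    def detector_concurrency(self):
        return self.pods * self.detector_threads_per_pod


# Hypothetical numbers, for illustration only.
untuned = WorkerConfig(pods=100, consumer_threads_per_pod=10, detector_threads_per_pod=10)
tuned = WorkerConfig(pods=100, consumer_threads_per_pod=1, detector_threads_per_pod=20)

for name, cfg in (("untuned", untuned), ("tuned", tuned)):
    print(f"{name}: {cfg.kafka_consumers} consumers, "
          f">= {cfg.min_partitions} partitions, "
          f"{cfg.detector_concurrency} concurrent detectors")
```

The tuned configuration needs a tenth of the partitions while running twice as many detectors in parallel, which is the direction the notes describe: fewer consumers and partitions, more detector concurrency.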
  38. https://azure.microsoft.com/fr-fr/blog/processing-trillions-of-events-per-day-with-apache-kafka-on-azure/
  40. Post-tuning: Fast-forward 120 years… “Pininfarina Battista” the fastest car in the world, 0-100 kph in 2 seconds, top speed 350 kph (electric, 1,900hp).
  41. The complete machine for the biggest result (48 Cassandra nodes) has 574 cores in total.  Cassandra (384) > Workers (118) > Kafka (72)
  44. This graph shows that it only costs around $1,000 a day for the basic infrastructure using on-demand AWS instances.   This graph also shows that the system can easily be scaled up or down to match different business requirements, and the infrastructure costs will scale proportionally. For example, the smallest system we ran still checked 1.5 Billion events per day, for a cost of only $100/day for the AWS infrastructure.
  45. https://medium.com/vizzuality-blog/the-amazon-is-on-fire-is-it-worse-than-normal-5fa430a7880e https://www.news.com.au/technology/environment/amazon-fires-dwarfed-by-the-blazes-burning-across-africa/news-story/4ff4d1a4b2cbbc55f79f367bf5f2bc9d
  46. https://medium.com/vizzuality-blog/the-amazon-is-on-fire-is-it-worse-than-normal-5fa430a7880e https://www.news.com.au/technology/environment/amazon-fires-dwarfed-by-the-blazes-burning-across-africa/news-story/4ff4d1a4b2cbbc55f79f367bf5f2bc9d Questions: C* read/write, how did I tune reads? Decreasing the compression chunk size to 1KB (the smallest possible value) resulted in higher CPU usage and an increase in throughput to 9,000 TPS. The Apache Cassandra documentation explains the benefits of compression as follows: “Compression’s primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save in storage requirements, it often increases read and write throughput, as the CPU overhead of compressing data is faster than the time it would take to read or write the larger volume of uncompressed data from disk.” Kafka monitoring and tuning? Cost with scale: looks good, from $100 to $1,000 going from 3 to 48 Cassandra nodes. Clarify the flow to emphasize that data is read from Kafka with a consumer. Could we automate the tuning? I.e. a feedback loop between monitoring and K8s; how to set threads? Add the Prometheus monitoring architecture/story to both talks? Did we think about getting rid of C*? Yes, here’s why not: streams give no random access via IDs, so we would need to read and filter, or have 1 topic per ID (but millions), or use streams and C* as a state store?