Osman Sarood

Infrastructure and Distributed Systems Lead, Mist Systems

Chunky Gupta 

Distributed Systems Engineer, Mist Systems
Cost Effectively and Reliably Aggregating
Billions of Messages Per Day Using Apache
Kafka®
Mist Architecture
1 TB+
10 Billion+ Msgs
10’s TB+
500+ partitions
Mist Architecture
Live Aggregators: Real-time Aggregation System
80% DC on Spot

70% cheaper (reserved)
Acknowledgement
Amarinder Singh Bindra
Ebrahim Safavi
Jitendra Harlalka
Outline
• How do we aggregate?
• Live Aggregators architecture
• Autoscaling
• Multi-level Aggregations
Realtime Processing/Aggregation
What Does Live Aggregators Do for You?
What Does Live Aggregators Do for You? (contd ..)
Total Time Series: 2
# Aggregation Operations: 8
Terminologies
• View : A set of tuples containing aggregated data for a defined time interval, based on user-defined groupings
• Grouping Columns : Columns to use as aggregation keys
• Aggregation Info : The type of aggregation, what it is applied to, etc.
• Time Series : A series of data points for one set of grouping column values, in time order
20+ Aggregation Types
• Sum
• Count
• Percentiles
• Median
• Average
• Distinct Count
• SpatialCount
• ??
Live Aggregators Architecture
LA Data Store
Live Aggregators Executor
Process 1: Kafka Reader
Process 2: Shared Memory Manager
Process 3: View Runner 1
Process 4: View Runner 2

Incoming messages
Msg#  Client  Bytes_tx  Org
1     Sam     100       Mist
2     John    60        Mist
3     Ayaana  20        Home

View 1 State (after Msg# 1)
Time Interval  Org   num_clients  total_bytes_tx
00:00-00:10    Mist  1            100

View 2 State (after Msg# 1)
Time Interval  Org   max_bytes_tx
00:00-00:10    Mist  100

View 1 State (after Msg# 2)
Time Interval  Org   num_clients  total_bytes_tx
00:00-00:10    Mist  2            160

View 2 State (after Msg# 2)
Time Interval  Org   max_bytes_tx
00:00-00:10    Mist  100

Checkpoint
Fetch Checkpoint
S3
EC2 Spot Instances
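
As a rough sketch of how the View 1 and View 2 states on this slide could be derived from the example messages, the snippet below groups by (time interval, org) and keeps a distinct client count, a sum, and a max. It is not the Live Aggregators implementation: the `aggregate` function, the dictionary layout, and the assumption that all messages fall into the 00:00-00:10 interval are illustrative only.

```python
from collections import defaultdict

# Hypothetical message shape mirroring the fields shown on the slide.
MESSAGES = [
    {"client": "Sam", "bytes_tx": 100, "org": "Mist"},
    {"client": "John", "bytes_tx": 60, "org": "Mist"},
    {"client": "Ayaana", "bytes_tx": 20, "org": "Home"},
]

INTERVAL = "00:00-00:10"  # assumption: every example message falls in this interval


def aggregate(messages, interval=INTERVAL):
    """Return (view1, view2), each keyed by (interval, org)."""
    clients = defaultdict(set)      # distinct clients per grouping key
    total_bytes = defaultdict(int)  # sum of bytes_tx per grouping key
    max_bytes = {}                  # max of bytes_tx per grouping key
    for msg in messages:
        key = (interval, msg["org"])
        clients[key].add(msg["client"])
        total_bytes[key] += msg["bytes_tx"]
        max_bytes[key] = max(max_bytes.get(key, 0), msg["bytes_tx"])
    view1 = {
        key: {"num_clients": len(clients[key]), "total_bytes_tx": total_bytes[key]}
        for key in clients
    }
    view2 = {key: {"max_bytes_tx": value} for key, value in max_bytes.items()}
    return view1, view2


# State after the first two messages (both from org "Mist").
view1, view2 = aggregate(MESSAGES[:2])
print(view1[(INTERVAL, "Mist")])  # {'num_clients': 2, 'total_bytes_tx': 160}
print(view2[(INTERVAL, "Mist")])  # {'max_bytes_tx': 100}
```

Both views are computed in a single pass here for brevity; the slide instead shows one View Runner process per view.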
Autoscaling : Live Aggregators Scheduler
LA Scheduler
View Queue: View1, View2, View3
ZookeeperManager
Task Manager

Component  State
View 1     Waiting
View 2     Waiting
View 3     Waiting

Component  State
View 1     Picked
View 2     Picked
View 3     Picked

Component  State
View 1     Running
View 2     Running
View 3     Running

LA Task 1
View 1: Partition 1
View 2: Partition 1
View 3: Partition 1

LA Task 1
View 1: Partition 1
View 2: Partition 1
View 3: Partition 2
Live Aggregators Scale
• Message consumption rate from Kafka : 25 Billion+ reads per day
~620k Messages Per Sec
~480k Messages Per Sec
Live Aggregators Scale (contd ..)
• Number of Time Series : 300 Million+ at peak times
• Aggregation Operations : 2 Million+ at peak times
Live Aggregators Scale (contd ..)
• Memory Footprint : 2.5 TB+ at peak times
• Writes to Cassandra : 4 Billion+ writes per day
Reliable?
Cost Effective?
Scalable?
Reliability 24*7
Spot Fleet
Controlled Chaos
(Stop and resume)
Uncontrolled Chaos
Spot Market Volatility
800 Spot instances terminated in a single day! (more than our production DC)
Live Aggregators Controller
Lag = Timestamp of Most Recent Produced Msg - Timestamp of Last Msg LA processed
Msg #  Offset  Timestamp   Lag (sec)
1      10      4:59:00 pm  60
2      11      4:59:30 pm  30
3      12      4:59:55 pm  5
4      13      5:00:00 pm  0
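
The lag formula above can be checked against the table with a few lines of Python. This is only a sketch: the calendar date is arbitrary, and `lag_seconds` is a hypothetical helper, not part of the Live Aggregators Controller.

```python
from datetime import datetime


def lag_seconds(most_recent_produced, last_processed):
    """Lag = timestamp of most recent produced msg - timestamp of last msg processed."""
    return int((most_recent_produced - last_processed).total_seconds())


# Arbitrary date; only the time of day comes from the slide.
def at(hour, minute, second):
    return datetime(2019, 1, 1, hour, minute, second)


most_recent = at(17, 0, 0)  # 5:00:00 pm, the newest message in the table
processed = [at(16, 59, 0), at(16, 59, 30), at(16, 59, 55), at(17, 0, 0)]

lags = [lag_seconds(most_recent, ts) for ts in processed]
print(lags)  # [60, 30, 5, 0]
```

Measuring lag in time rather than in offsets keeps the number meaningful across partitions with very different message rates.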
Fast Recovery After Failure
Dynamic Load (Trend vs Seasonality)
Daily Seasonality
Trend
Right Sizing
Best Fit
Live Aggregators Executor
Autoscaling : Live Aggregators Executor
Component                 Cores
Kafka Reader              0.2
Shared memory (per view)  0.1
View 1                    0.8
View 2                    0.6
View 3                    0.9

Executor running View 1 and View 2: 0.8 + 0.6 + 0.2 (Kafka Reader) + 0.2 (Shared memory) = 1.8 cores
Autoscaling : Live Aggregators Scheduler
LA Scheduler
View Queue: View1 (0.8 cores), View2 (0.6 cores), View3 (0.9 cores)
ZookeeperManager
Task Manager
Offer: 2 cores
Core Available: 2.0, 1.8, 0.9, 0.2

LA Task 1
KR     0.2 cores
SMM    0.2 cores (0.1 per view)
View1  0.8 cores
View2  0.6 cores
Cores Reserved: 1.8
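
The "Core Available" sequence on this slide (2.0, 1.8, 0.9, 0.2) is consistent with a simple greedy placement: reserve the Kafka Reader first, then take views from the queue while each view plus its shared-memory share still fits in the offer. The sketch below reproduces that arithmetic. The `pack_views` function and the queue-order policy are assumptions for illustration, not the scheduler's actual algorithm; values are in millicores to avoid floating-point rounding.

```python
KAFKA_READER = 200            # 0.2 cores, expressed in millicores
SHARED_MEMORY_PER_VIEW = 100  # 0.1 cores per view


def pack_views(offer, queue):
    """Greedily place queued views into a single resource offer.

    Returns (placed view names, trace of available millicores after each step).
    Assumes the offer is at least large enough for the Kafka Reader.
    """
    available = offer - KAFKA_READER
    trace = [offer, available]
    placed = []
    for name, cores in queue:
        needed = cores + SHARED_MEMORY_PER_VIEW
        if needed <= available:
            available -= needed
            placed.append(name)
            trace.append(available)
    return placed, trace


placed, trace = pack_views(2000, [("View1", 800), ("View2", 600), ("View3", 900)])
reserved = 2000 - trace[-1]
print(placed)    # ['View1', 'View2']
print(trace)     # [2000, 1800, 900, 200]
print(reserved)  # 1800
```

View3 needs 0.9 + 0.1 = 1.0 cores, which does not fit in the remaining 0.2, so it stays in the queue for another offer.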
Lying Factor
Lying Factor = #Cores reserved - #Cores used
[Chart: Lying Factor over Time]

Component                 Evening Load (Cores)  High Load (Cores)
Kafka Reader              0.2                   0.3
Shared memory (per view)  0.1*2                 0.15*2
View 1                    0.8                   0.9
View 2                    0.6                   0.7
Total Cores for LA Task   1.8                   2.2
Reserved Cores            1.8                   1.8
Lying Factor              0                     -0.4
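
The two columns of the Lying Factor example can be reproduced directly from the formula. The helper names below are hypothetical, and values are in millicores so that the subtraction stays exact.

```python
def cores_used(kafka_reader, shared_memory_per_view, views):
    """Total millicores used by one LA task: reader + per-view shared memory + views."""
    return kafka_reader + shared_memory_per_view * len(views) + sum(views)


def lying_factor(reserved, used):
    """Lying Factor = cores reserved - cores used."""
    return reserved - used


RESERVED = 1800  # 1.8 cores reserved for the task

evening_used = cores_used(200, 100, [800, 600])  # evening load
high_used = cores_used(300, 150, [900, 700])     # high load

print(evening_used / 1000, lying_factor(RESERVED, evening_used) / 1000)  # 1.8 0.0
print(high_used / 1000, lying_factor(RESERVED, high_used) / 1000)        # 2.2 -0.4
```

A negative value means the task is using more than it reserved; a positive value means it reserved more than it needs.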
Autoscaler: No Scaling
• Lower Threshold = -0.05 Cores
• Upper Threshold = 0.20 Cores
Autoscaler: Scale Up
• Lower Threshold = -0.05 Cores
• Upper Threshold = 0.20 Cores
Noisy Neighbor!!
Autoscaler: Scale Down
• Lower Threshold = -0.05 Cores
• Upper Threshold = 0.20 Cores
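
One plausible reading of the three autoscaler slides, given that Lying Factor = cores reserved - cores used, is a threshold band: scale up when the lying factor drops below the lower threshold (the task uses more than it reserved), scale down when it rises above the upper threshold (the task reserved more than it uses), and do nothing in between. The slides do not spell out the rule, so the direction of each comparison and the boundary handling in this sketch are assumptions.

```python
LOWER_THRESHOLD = -0.05  # cores
UPPER_THRESHOLD = 0.20   # cores


def autoscaler_decision(lying_factor):
    """Map a lying factor (in cores) to a scaling action. Assumed rule, see text."""
    if lying_factor < LOWER_THRESHOLD:
        return "scale up"      # using more than reserved
    if lying_factor > UPPER_THRESHOLD:
        return "scale down"    # reserved more than used
    return "no scaling"


print(autoscaler_decision(0.0))   # no scaling
print(autoscaler_decision(-0.4))  # scale up
print(autoscaler_decision(0.5))   # scale down
```

The asymmetric band (a tight lower bound, a looser upper bound) would make the autoscaler react quickly to under-reservation while tolerating some slack before shrinking.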
Autoscaling Effectiveness
• Resources Used vs. Reserved (Seasonality)
1000 cores
Multi Level Aggregation (Heatmap Example)
Device
Mist Office
• Each device's location is sent to Kafka every second
• Client Density Heatmap
• Sharded by Client ID across multiple partitions
Multi Level Aggregation
LA Task 1    Topic:1 partition: 0
LA Task 2    Topic:1 partition: 1
LA Task 3    Topic:1 partition: 2
LA Task 4    Topic:2 partition: 2
Consume: Topic 1
Produce: Topic 2
Consume: Topic 2

Heatmap grids (5 rows x 4 columns each):

0 0 1 0
0 4 6 0
1 0 1 0
0 0 0 0
0 0 0 0

0 0 1 0
0 1 1 1
0 0 3 0
0 0 0 0
0 0 0 0

0 0 0 0
0 0 0 2
0 0 1 0
0 0 0 0
1 4 0 0

0 0 2 0
0 5 7 3
1 0 5 0
0 0 0 0
2 4 0 0
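
Because clients are sharded by Client ID, each first-level task sees only part of the clients in any heatmap cell, so a second-level task has to combine the partial grids. A minimal sketch of that combine step is an element-wise sum. The `merge_grids` helper and the small 2x2 grids are hypothetical, and the Kafka consume/produce plumbing between Topic 1 and Topic 2 is omitted.

```python
def merge_grids(grids):
    """Element-wise sum of equally sized 2-D count grids."""
    rows, cols = len(grids[0]), len(grids[0][0])
    merged = [[0] * cols for _ in range(rows)]
    for grid in grids:
        for r in range(rows):
            for c in range(cols):
                merged[r][c] += grid[r][c]
    return merged


# Hypothetical partial client counts from three first-level tasks.
partials = [
    [[0, 1], [2, 0]],
    [[1, 0], [0, 0]],
    [[0, 3], [0, 1]],
]

print(merge_grids(partials))  # [[1, 4], [2, 1]]
```

Summation works here because counts are additive across shards; an aggregation such as a median would need a different second-level merge.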
Multi Level Aggregation: Client Density for a School
We will be adding the architecture diagram for this to explain
Future Work
1. Joining multiple streams
2. Instance-specific resource allocation
3. Improving shared memory usage using Go
4. Dynamic rescheduling of views to improve Kafka load
Rate today’s session
Thank You!

Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using Kafka (Chunky Gupta and Osman Sarood, Mist Systems) Kafka Summit NYC 2019
