One of the key metrics to monitor when working with Apache Kafka, whether as a data pipeline or a streaming platform, is consumer group lag.
Lag is the delta between the last produced offset and the last committed offset of a partition. In other words, lag indicates how far behind your application is in processing up-to-date information.
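To make the definition concrete, here is a minimal sketch of per-partition lag as plain integer arithmetic. The function name and the sample offsets are illustrative, not part of any Kafka API:

```python
# Minimal sketch of the lag definition: for each partition, lag is the
# difference between the log-end offset (last produced) and the consumer
# group's committed offset. Offsets here are plain integers, not real
# Kafka API objects.
def partition_lag(last_produced_offset: int, last_committed_offset: int) -> int:
    return last_produced_offset - last_committed_offset

# A consumer committed at offset 970 on a partition whose log ends at
# offset 1000 is 30 messages behind.
print(partition_lag(1000, 970))  # 30
```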
For a long time we used our own service to track, collect, and visualize these metrics, but this didn't scale well.
It required many manual operations, redeployments, and other tedious tasks. Most importantly, the biggest gap for us was that its output was expressed in absolute numbers (e.g. "your lag is 30K"), which tells you basically nothing as a human being.
We understood that we had to find a more suitable solution, one that would give us better visibility and allow us to measure lag in a time-based format that we all understand.
In this talk, I'm going to go over the core concepts of Kafka offsets and lag, and explain why lag matters and is an important KPI to measure. I'll also cover the research we did to find the right tool, what the options on the market were at the time, and why we eventually chose LinkedIn's Burrow. Finally, I'll take a closer look at Burrow: its building blocks, how we build and deploy it, how it improved our monitoring, and the most important improvement of all - how we transformed its output from raw numbers into time-based metrics.
10. __consumer_offsets
Offsets can be stored either in ZooKeeper or in a special topic called __consumer_offsets.
https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
11. ZooKeeper is not built for a high-write load such as offset storage.
12. __consumer_offsets: a consistent, fault-tolerant, and partitioned way of storing offsets.
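The __consumer_offsets topic is compacted: each commit is keyed by (group, topic, partition), and compaction keeps only the latest value per key. A toy model of that behavior, using a plain dict (illustrative only, not real Kafka internals):

```python
# Toy model of the compacted __consumer_offsets topic: each commit is a
# message keyed by (group, topic, partition), and log compaction keeps
# only the latest value per key -- behaving much like a dict.
commit_log = {}

def commit_offset(group: str, topic: str, partition: int, offset: int) -> None:
    # A newer commit for the same key replaces the older one.
    commit_log[(group, topic, partition)] = offset

commit_offset("billing", "orders", 0, 120)
commit_offset("billing", "orders", 0, 134)  # compaction keeps only this one

print(commit_log[("billing", "orders", 0)])  # 134
```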
34. What we wanted to achieve
> Automatic - no need to change a config file; filter consumer groups based on a regex
> Scalable - small footprint, easy to scale
> Simple, easy to use - supports both ZooKeeper and the __consumer_offsets topic
35. What we wanted to achieve
The "raw" metrics we looked for are:
> Per partition
> Per consumer group
> Per topic
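These three granularities roll up naturally: per-partition lag is the raw measurement, and per-topic and per-consumer-group figures are sums over it. A small sketch with hypothetical lag samples (the group/topic names and numbers are made up for illustration):

```python
# Hypothetical per-partition lag samples:
# (consumer_group, topic, partition) -> lag in messages.
lags = {
    ("billing", "orders", 0): 10,
    ("billing", "orders", 1): 25,
    ("billing", "payments", 0): 5,
}

def topic_lag(group: str, topic: str) -> int:
    """Roll per-partition lag up into a per-topic figure for one group."""
    return sum(v for (g, t, _), v in lags.items() if g == group and t == topic)

def group_lag(group: str) -> int:
    """Sum lag over every partition the group consumes, across topics."""
    return sum(v for (g, _, _), v in lags.items() if g == group)

print(topic_lag("billing", "orders"))  # 35
print(group_lag("billing"))            # 40
```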
37. What are the options?
LinkedIn - Burrow
> A LinkedIn project
> More than 2.5K stars
> Active community
> Production ready
Lightbend - Kafka Lag Exporter
> Smart
> Time-based
> Still in beta
Zalando - Remora
> Inspired by Burrow
> CloudWatch & DataDog integration
> Wraps the Kafka CLI
56. Time Lag - How did we do it?
Time_Lag = Diff(Last_Consumed, Last_Produced) / Producer_Rate
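The formula above can be sketched in a few lines: divide the offset gap by the producer's write rate to turn a message count into seconds. This assumes a roughly steady producer rate; the function and parameter names are illustrative:

```python
def time_lag_seconds(last_produced: int, last_consumed: int,
                     producer_rate_msgs_per_sec: float) -> float:
    """Estimate lag in seconds: the offset gap divided by how fast the
    producer writes. Assumes a roughly steady producer rate."""
    if producer_rate_msgs_per_sec <= 0:
        return 0.0  # idle producer: treat the partition as caught up
    return (last_produced - last_consumed) / producer_rate_msgs_per_sec

# 30,000 messages behind at 1,000 msgs/sec => about 30 seconds of lag --
# far more meaningful to a human than "your lag is 30K".
print(time_lag_seconds(30_000, 0, 1_000.0))  # 30.0
```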
57. Time Lag - How did we do it?
Timeline - producer offset samples:
> 12:00AM - Msg_offset: 134
> 12:10AM - Msg_offset: 144
> 12:20AM - Msg_offset: 154
[Diagram: the consumer's position and the producer's head plotted on this timeline; the distance between them is the lag.]
59. What's next?
> Smart alerts - dynamic alerts based on lag and retention
> Decoupling - as we grow, Burrow will be deployed per cluster
> Migration - migrating a crucial part of the infrastructure is hard