Handling Failure With Grace in Kafka Streams With Walker Carlson | Current 2022

Kafka Streams has recently expanded its options for handling thread death. Historically, when a stream task hit a fatal exception, the thread running it would shut down, triggering a rebalance. Another thread would then pick up the task and hit the same error. This cascading thread failure could take a while, and performance suffered throughout because of constant rebalances and fewer threads than intended. Eventually no threads would be left alive, and processing would halt entirely. Only then would the client's state change, alerting users to the issue. In this talk, we will cover the changes to the threading model that made more dynamic error handling possible. We will also introduce the Streams uncaught exception handler, which makes it possible to react immediately in cases that previously caused cascading thread death. Further improvements modified the state machine to clarify the meaning of the ERROR state. Together, KIPs 671, 696, and 663 allow much more flexibility in exceptional cases. After this talk, the audience will be able to use the new handler to react to exceptional cases in Kafka Streams, and will understand the updates to the threading model and the changes to the state machine.

Handling Failure with Grace
Streams uncaught exception handler
Walker Carlson
wcarlson@confluent.io

KIPs 663, 671 and 696
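• KIP-663 - Makes the threading model more dynamic: stream threads can be added and removed while the client runs
• KIP-671 - Introduces a new, Streams-specific uncaught exception handler to prevent the cascading thread death seen in the past
• KIP-696 - Changes the state machine to clarify the meaning of the ERROR state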
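
To make KIP-663's dynamic threading concrete: in Kafka Streams 2.8 and later, KafkaStreams#addStreamThread() and KafkaStreams#removeStreamThread() change the thread count of a running client and return the affected thread's name as an Optional<String>. The sketch below is a hypothetical, JDK-only model of that bookkeeping (ThreadRoster is not a Kafka class) so it runs without a broker; its naming scheme and its choice of which thread to remove are simplifying assumptions, not the library's exact rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical JDK-only model of KIP-663's dynamic thread bookkeeping.
// Real Kafka Streams exposes addStreamThread()/removeStreamThread() on KafkaStreams;
// this class only mimics their Optional<String> return shape.
final class ThreadRoster {
    private final String clientId;
    private final List<String> live = new ArrayList<>();
    private int nextIndex = 1; // simplification: this model never reuses an index

    ThreadRoster(String clientId, int initialThreads) {
        this.clientId = clientId;
        for (int i = 0; i < initialThreads; i++) {
            addStreamThread();
        }
    }

    // Starts a new (modeled) thread and returns its name.
    Optional<String> addStreamThread() {
        String name = clientId + "-StreamThread-" + nextIndex++;
        live.add(name);
        return Optional.of(name);
    }

    // Stops the most recently added thread; empty if none are left (this model's choice).
    Optional<String> removeStreamThread() {
        if (live.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(live.remove(live.size() - 1));
    }

    int liveThreadCount() {
        return live.size();
    }
}
```

Against a real client, the equivalent would be calling streams.addStreamThread() while the client is running and checking whether the returned Optional is present.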

ERROR state
Same as NOT_RUNNING, but reached through an exception instead of a close call.
[State diagram: CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, PENDING_ERROR, NOT_RUNNING, ERROR]
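
The ERROR-state change can be pictured as a transition table. The sketch below is a simplified, JDK-only model whose state names mirror the KafkaStreams.State enum; the exact set of allowed transitions is an assumption modeled on recent releases and should be checked against your version's Javadoc. It encodes the slide's point: ERROR mirrors NOT_RUNNING as a terminal state, but is reached through PENDING_ERROR after a fatal exception rather than through PENDING_SHUTDOWN after close().

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified, JDK-only model of the Kafka Streams client state machine after KIP-696.
// State names mirror KafkaStreams.State; the transition table is an assumption.
final class StreamsStateModel {
    enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR }

    private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);

    static {
        NEXT.put(State.CREATED, EnumSet.of(State.REBALANCING, State.PENDING_SHUTDOWN));
        NEXT.put(State.REBALANCING, EnumSet.of(State.RUNNING, State.PENDING_SHUTDOWN, State.PENDING_ERROR));
        NEXT.put(State.RUNNING, EnumSet.of(State.REBALANCING, State.PENDING_SHUTDOWN, State.PENDING_ERROR));
        NEXT.put(State.PENDING_SHUTDOWN, EnumSet.of(State.NOT_RUNNING)); // close() path
        NEXT.put(State.PENDING_ERROR, EnumSet.of(State.ERROR));          // fatal-exception path
        NEXT.put(State.NOT_RUNNING, EnumSet.noneOf(State.class));        // terminal
        NEXT.put(State.ERROR, EnumSet.noneOf(State.class));              // terminal
    }

    static boolean isValidTransition(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    static boolean isTerminal(State s) {
        return NEXT.get(s).isEmpty();
    }
}
```

In a real application these transitions are observed with KafkaStreams#setStateListener, whose callback receives the new state and the old state.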

Old Handler
• Uses Java's Thread.UncaughtExceptionHandler behind the scenes
• Kills the thread
• Can lead to cascading thread death if you are not watching for it
• Is deprecated and will be removed soon
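
The old handler's limitation comes straight from the JDK: a java.lang.Thread.UncaughtExceptionHandler is invoked only as the thread is already terminating, so it can log or alert but cannot keep the thread alive. The JDK-only sketch below (OldHandlerDemo is a hypothetical name; no Kafka involved) demonstrates that. In Kafka Streams, this kind of handler was registered through the now-deprecated KafkaStreams#setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler) overload.

```java
import java.util.concurrent.atomic.AtomicReference;

// JDK-only demo (no Kafka): a plain Thread.UncaughtExceptionHandler is told
// about the failure, but the thread that threw still terminates.
final class OldHandlerDemo {
    // Last exception reported to the handler.
    static final AtomicReference<Throwable> lastSeen = new AtomicReference<>();

    static Thread runFailingWorker() {
        Thread worker = new Thread(() -> {
            // Stand-in for a fatal error inside a stream task.
            throw new IllegalStateException("poison record");
        }, "StreamThread-1");
        // The handler can observe the error, but has no way to keep the worker running.
        worker.setUncaughtExceptionHandler((thread, error) -> lastSeen.set(error));
        worker.start();
        try {
            worker.join(); // returns only after the worker, including its handler, has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker;
    }
}
```

In a Streams client, each thread that died this way left the group, and the resulting rebalance handed the failing task to a surviving thread, which is how the cascade described in the abstract played out.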

New Handler
• User code returns an enum to choose the behavior after its notification logic is done
• Builds on the new threading model to support replacing a failed thread
• Gives more flexibility
• Does not do anything new with global threads
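
The real interface, StreamsUncaughtExceptionHandler, has a single handle(Throwable) method returning a StreamThreadExceptionResponse (REPLACE_THREAD, SHUTDOWN_CLIENT, or SHUTDOWN_APPLICATION). The JDK-only sketch below mirrors that shape so it runs without Kafka; the classification rules (I/O errors as transient, IllegalArgumentException as affecting every client) and the alerts list are hypothetical choices for illustration. It inspects the root cause on the assumption that the library may wrap the original error before passing it in.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// JDK-only mirror of the shape of StreamsUncaughtExceptionHandler:
// notification logic runs first, then an enum value chooses what happens next.
final class NewHandlerSketch {
    enum StreamThreadExceptionResponse { REPLACE_THREAD, SHUTDOWN_CLIENT, SHUTDOWN_APPLICATION }

    // Stand-in for a real alerting hook (hypothetical); thread-safe because
    // several stream threads could fail at the same time.
    static final List<String> alerts = new CopyOnWriteArrayList<>();

    static StreamThreadExceptionResponse handle(Throwable exception) {
        // 1. Notification logic.
        alerts.add(exception.toString());
        // 2. Choose the behavior from the root cause (classification is hypothetical).
        Throwable root = rootCause(exception);
        if (root instanceof IOException) {
            return StreamThreadExceptionResponse.REPLACE_THREAD;       // assumed transient
        }
        if (root instanceof IllegalArgumentException) {
            return StreamThreadExceptionResponse.SHUTDOWN_APPLICATION; // assumed to hit every client
        }
        return StreamThreadExceptionResponse.SHUTDOWN_CLIENT;          // closest to the old behavior
    }

    // Follows the cause chain to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```

With the real library, an equivalent lambda would be passed to KafkaStreams#setUncaughtExceptionHandler(StreamsUncaughtExceptionHandler) before start() is called.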

Shutdown Client
• Closest to the old behavior
• Shuts the whole client down into the ERROR state at once, instead of one thread at a time
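
Because SHUTDOWN_CLIENT moves the whole client to ERROR in one step, an application can wait for that single transition instead of watching threads disappear. The sketch below is hypothetical and JDK-only: ShutdownClientWatcher and its exit codes are illustrative, and its onChange(newState, oldState) method only mirrors the parameter order of KafkaStreams.StateListener so it can run without Kafka.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical watcher: releases a latch once the (modeled) client reaches a terminal state.
final class ShutdownClientWatcher {
    enum State { RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR }

    private final CountDownLatch terminal = new CountDownLatch(1);
    private volatile State finalState;

    // Same (newState, oldState) parameter order as KafkaStreams.StateListener#onChange.
    void onChange(State newState, State oldState) {
        if (newState == State.ERROR || newState == State.NOT_RUNNING) {
            finalState = newState;
            terminal.countDown();
        }
    }

    // Hypothetical exit-code convention: 0 = closed normally, 1 = ERROR, 2 = not finished yet.
    int awaitExitCode(long timeoutMillis) {
        try {
            if (!terminal.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return 2;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 2;
        }
        return finalState == State.ERROR ? 1 : 0;
    }
}
```

With the real library, the listener registered through streams.setStateListener(...) would forward to the same logic, and the main thread would pass the resulting code to System.exit.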

Shutdown Application
• Uses the rebalance protocol to try to shut down all clients of the application
• May miss a client that is not currently in the group
• The fastest (and only built-in) way to stop all clients
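
The caveat on this slide follows from how the request travels: it rides on the group's rebalance, so it can only reach clients that are members of the group at that moment. The toy model below (hypothetical names, JDK only, no real protocol messages) captures just that reachability rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model (not Kafka code) of SHUTDOWN_APPLICATION reachability:
// only clients that are current group members see the rebalance carrying the request.
final class ShutdownApplicationModel {
    // Returns, for every known client, whether it stops.
    static Map<String, Boolean> propagate(Set<String> allClients, Set<String> currentGroupMembers) {
        Map<String, Boolean> stopped = new LinkedHashMap<>();
        for (String client : allClients) {
            // A client outside the group never receives the shutdown signal.
            stopped.put(client, currentGroupMembers.contains(client));
        }
        return stopped;
    }
}
```

A client that was disconnected during that rebalance keeps running, so operators may still need an external check that every instance actually stopped.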

Replace Thread
• A new thread is started with a new ID
• Triggers a rebalance
• Works with static membership
• Not always a good idea
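
One way to act on "not always a good idea" is to cap how often a thread may be replaced: a transient error may clear after a replacement, but a deterministic one (such as a record that always fails) would just fail the new thread too and trigger rebalance after rebalance. The sketch below is a hypothetical policy, not part of Kafka Streams; the budget and window are assumptions to tune, and in a real handler its result would map onto REPLACE_THREAD or SHUTDOWN_CLIENT.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical policy: replace a failed thread only while failures stay under a
// budget within a sliding time window; otherwise escalate to shutting down the client.
final class ReplaceBudgetPolicy {
    enum Response { REPLACE_THREAD, SHUTDOWN_CLIENT }

    private final int maxReplacements;
    private final long windowMillis;
    private final Deque<Long> recentFailures = new ArrayDeque<>();

    ReplaceBudgetPolicy(int maxReplacements, long windowMillis) {
        this.maxReplacements = maxReplacements;
        this.windowMillis = windowMillis;
    }

    // synchronized because several stream threads may fail concurrently;
    // nowMillis is passed in rather than read from a clock so the policy is easy to test.
    synchronized Response onFailure(long nowMillis) {
        // Drop failures that have fallen out of the window.
        while (!recentFailures.isEmpty() && nowMillis - recentFailures.peekFirst() >= windowMillis) {
            recentFailures.pollFirst();
        }
        recentFailures.addLast(nowMillis);
        return recentFailures.size() <= maxReplacements ? Response.REPLACE_THREAD : Response.SHUTDOWN_CLIENT;
    }
}
```

For example, with a budget of 2 per second, a third failure within the same second escalates to SHUTDOWN_CLIENT, while a failure after a quiet period is again allowed a replacement.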
