Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability

Vanessa Vuibert
Staff Production Engineer
@V3_XD
• 862 Kafka brokers
• 14 Kafka clusters
• 14M messages per sec
• 9 GCP regions
Scale
• Maintenance
• Incidents
• Regionalize traffic
Traffic management use cases
Kubernetes (K8s) out of the box
🔓open source
Kafka broker
K8s out of the box
dig +short service.namespace.svc.cluster.local
IP0
IP1
IP2
K8s out of the box
bootstrap.servers=
service.namespace.svc.cluster.local:9092
K8s out of the box
dig +short pod2.service.namespace.svc.cluster.local
IP2
K8s out of the box
advertised.listeners=
pod2.service.namespace.svc.cluster.local:9092
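The two settings above follow the standard Kubernetes naming scheme for a Service and the pods behind it. A minimal sketch of how the two addresses could be derived, assuming port 9092 as on the slides; the helper names are hypothetical and not part of any library. Note that a real broker config prefixes each advertised listener with a listener name (for example `PLAINTEXT://`), which the slides omit.

```python
def bootstrap_servers(service: str, namespace: str, port: int = 9092) -> str:
    """Address a client uses for its first connection: the Service DNS name."""
    return f"{service}.{namespace}.svc.cluster.local:{port}"


def advertised_listener(pod: str, service: str, namespace: str, port: int = 9092) -> str:
    """Address a broker hands back in metadata: that pod's own DNS name."""
    return f"{pod}.{service}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    print(bootstrap_servers("service", "namespace"))
    print(advertised_listener("pod2", "service", "namespace"))
```

The client only needs the Service name up front; after bootstrapping, it talks to each broker through the per-pod name that broker advertises.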
• Readiness
• Startup
• Liveness
K8s StatefulSet: probes
dig +short service.namespace.svc.cluster.local
IP0
IP2
K8s readiness probe
dig +short service.namespace.svc.cluster.local
IP0
IP2
IP3
K8s readiness probe
not ready
publishNotReadyAddresses: true
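As a rough illustration of where `publishNotReadyAddresses` lives, the sketch below builds a Service manifest as a Python dict. It assumes the Service is headless (`clusterIP: None`), since the `dig` output on the slides lists individual pod IPs. The `app` selector label and the port name are hypothetical.

```python
import json


def headless_service(name: str, namespace: str, port: int = 9092) -> dict:
    """Build a headless Service manifest that also publishes not-ready pods in DNS."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",               # headless: DNS returns pod IPs directly
            "publishNotReadyAddresses": True,  # keep pods in DNS even if the readiness probe fails
            "selector": {"app": name},         # hypothetical label, adjust to the StatefulSet's labels
            "ports": [{"name": "kafka", "port": port}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(headless_service("service", "namespace"), indent=2))
```

With this flag set, a broker that is still starting up remains resolvable through the Service name, so its per-pod record exists before the readiness probe passes.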
Regional pairs
External traffic: load balancers
External traffic: load balancers
bootstrap.servers
External traffic: load balancers
advertised.listeners
• Issues scaling
• Manual broker DNS records
• Limited traffic control
Built automation with K8s controllers.
Stateful buddy: load balancers
🔒closed source
Name buddy: DNS records
🔒closed source
Kafka access buddy: endpoints
🔒closed source
Kafka Access Buddy: consumer
Kafka Access Buddy: producer failover
east
- Elasticsearch on call
“Let me failover real quick.”
Faster failovers with a DNS traffic manager.
DNS traffic manager
🔒closed source
DNS traffic manager: normal
dig +short us-east1.somedomain.com
US-East1-IP
DNS traffic manager: failover
dig +short us-east1.somedomain.com
US-Central1-IP
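The traffic manager itself is closed source, so the following is only a toy in-memory model of the behaviour shown in the two `dig` outputs: the regional name stays the same and only the address behind it changes. The class, its methods, and the `us-central1.somedomain.com` record are assumptions made for illustration.

```python
class TrafficManager:
    """Toy model: each regional DNS name normally answers with its own region's address."""

    def __init__(self, records: dict):
        self._home = dict(records)    # name -> address in normal operation
        self._active = dict(records)  # name -> address currently being served

    def resolve(self, name: str) -> str:
        return self._active[name]

    def failover(self, name: str, to: str) -> None:
        """Serve the healthy region's address under the failed region's name."""
        self._active[name] = self._home[to]

    def restore(self, name: str) -> None:
        self._active[name] = self._home[name]


if __name__ == "__main__":
    tm = TrafficManager({
        "us-east1.somedomain.com": "US-East1-IP",
        "us-central1.somedomain.com": "US-Central1-IP",  # assumed record, not on the slides
    })
    print(tm.resolve("us-east1.somedomain.com"))
    tm.failover("us-east1.somedomain.com", to="us-central1.somedomain.com")
    print(tm.resolve("us-east1.somedomain.com"))
```

Because clients keep using the same name, no client configuration has to change during a failover; they only need to re-resolve and reconnect.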
- A Kafka client
“DNS trickery.”
Used to take 40 minutes
Now only takes 1 minute
Failover time savings
Incident during flash sale
Failover during flash sale
US Central1 -> US East1
Reduced toil with client wrappers.
• Failover reconnection
• Everything needed for connection
• Ruby, Go and Python
Client wrappers
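The slides do not show how the wrappers handle failover reconnection. One possible approach, sketched here under that caveat, is to re-resolve the bootstrap name and rebuild the underlying client whenever the answer changes. `ReconnectingClient`, `resolve`, and `connect` are hypothetical names; the resolver and client factory are injected so the sketch does not depend on any particular Kafka library.

```python
from typing import Callable, Generic, TypeVar

C = TypeVar("C")


class ReconnectingClient(Generic[C]):
    """Rebuilds the wrapped client when the bootstrap name resolves to new addresses."""

    def __init__(
        self,
        bootstrap: str,
        resolve: Callable[[str], frozenset],
        connect: Callable[[str], C],
    ):
        self._bootstrap = bootstrap
        self._resolve = resolve
        self._connect = connect
        self._addresses = resolve(bootstrap)
        self._client = connect(bootstrap)
        self.reconnects = 0

    def client(self) -> C:
        """Return a client connected to whatever the name currently points at."""
        current = self._resolve(self._bootstrap)
        if current != self._addresses:
            close = getattr(self._client, "close", None)
            if callable(close):
                close()  # release the old connection if the client supports it
            self._client = self._connect(self._bootstrap)
            self._addresses = current
            self.reconnects += 1
        return self._client
```

In a real wrapper the check would more likely run on a timer or on connection errors rather than on every call; calling it inline keeps the sketch short.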
K8s Deployment template: bootstrap.servers
K8s Deployment template: client ID
K8s Deployment template
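If the Deployment template injects `bootstrap.servers` and the client ID into each pod, a wrapper can pick them up without any per-application configuration. The sketch below assumes they arrive as environment variables; the variable names `KAFKA_BOOTSTRAP_SERVERS` and `KAFKA_CLIENT_ID` are hypothetical, as the slides do not name them.

```python
import os
from typing import Mapping, Optional


def kafka_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the connection settings from values injected by the Deployment template."""
    env = os.environ if env is None else env
    return {
        "bootstrap.servers": env["KAFKA_BOOTSTRAP_SERVERS"],  # hypothetical variable name
        "client.id": env["KAFKA_CLIENT_ID"],                  # hypothetical variable name
    }


if __name__ == "__main__":
    example = {
        "KAFKA_BOOTSTRAP_SERVERS": "us-east1.somedomain.com:9092",
        "KAFKA_CLIENT_ID": "example-app",
    }
    print(kafka_config_from_env(example))
```

A missing variable raises `KeyError` at startup, which surfaces a misconfigured template early instead of at the first produce or fetch.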
Improved availability with local consumers.
• More availability
• Reduced latency
• Reduced storage costs
• Reduced network costs
Local consumers
Aggregate consumer
Local consumers
Local consumers: DNS records
• Aggregate: 500 ms
• Regional: 20 ms
Latency (99th percentile)
Connect directly through private IPs.
• More secure
• Reduced network costs
• Fetch from closest replica: KIP-392
Public to private traffic
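KIP-392 (fetch from the closest replica) is enabled through a pair of standard Kafka settings: `client.rack` on the consumer, and `broker.rack` plus `replica.selector.class` on the brokers. A minimal sketch of both sides as plain config dicts; the helper names and the example zone and group values are assumptions, not taken from the talk.

```python
RACK_AWARE_SELECTOR = "org.apache.kafka.common.replica.RackAwareReplicaSelector"


def consumer_config(bootstrap: str, group_id: str, zone: str) -> dict:
    """Consumer side: declare which rack (zone) the client runs in."""
    return {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        "client.rack": zone,
    }


def broker_overrides(zone: str) -> dict:
    """Broker side: declare the broker's rack and enable rack-aware replica selection."""
    return {
        "broker.rack": zone,
        "replica.selector.class": RACK_AWARE_SELECTOR,
    }


if __name__ == "__main__":
    print(consumer_config("us-east1.somedomain.com:9092", "example-group", "us-east1-b"))
    print(broker_overrides("us-east1-b"))
```

When the consumer's `client.rack` matches the `broker.rack` of an in-sync follower, the consumer can fetch from that follower instead of a leader in another zone.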
Traffic manager: pod IPs
• Bill reduction: -6%
• Network represents 29% of the bill
Network cost reduction
• GKE 1.24 -> 1.25 incident
• Apply firewall rules
• LB more secure for public traffic
Failover: pod IPs
One-stop shop with Multi-Cluster Services (MCS).
MCS endpoints
🔒closed source
Traffic sources
Regional pairs: uneven distribution
Regionalize traffic: Kafka access buddy
east
Regionalize traffic: MCS
• Minutes to regionalize traffic: 40, 1 after migration
• Minutes to deploy: 18, 13 after migration
MCS time savings
Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability
• Resiliency: DNS traffic management
• Toil: client wrappers
• Availability: local consumption
Thanks!
@V3_XD