Kafka at the core of an AIOps pipeline
Presented by:
Sunanda Kommula (Distinguished Engineer, Selector.ai)
Alankar Sharma (Sr Principal Architect, Comcast)
Hybrid Cloud AIOps
WWW.SELECTOR.AI
Agenda
• Comcast hybrid cloud
• Key role of Kafka
• Selector AI
• Challenges and Solutions
• System architecture
• Observability of Data Ingestion
• CI/CD with Kafka
• AIOps
Use Case
• Hybrid cloud: infrastructure for Internet, voice, video, and storage
• Goal: application connectivity
• Wide variety of applications
• Large, operationally complex cloud environment
• Application perspective vs. cloud status
• End-to-end connectivity
[Diagram: hybrid cloud with compute, AI/ML, and load balancer components]
The Hybrid Cloud
• Cloud environment is complex, diverse, and evolving
• Application connectivity is key
• Detect grey failures
• Isolate widespread issues
• Why AIOps?
  • Examine millions of data points
  • Connect the dots
  • Faster root-cause identification, resolution, and remediation
[Diagram: hybrid cloud topology linking the Internet, AWS Region 1, AWS Region 2, and Data Centers 1 to 3 over a backbone, with Azure as a future addition]
Kafka empowers AIOps
• Critical hub of large networks
• Connects varied data sources
• Reliably delivers high-velocity, high-volume, and highly varied data
• Enables communication between microservices
• Bridges organizations
[Diagram: Kafka clusters of three brokers each sit between producers (AWS CloudWatch Logs, synthetics, network monitoring and telemetry sources, device metrics), engines, and downstream ML, storage, query, visualization, ChatOps, and portal services]
Selector AI: Turn-key AIOps for Instant Actionable Insights
GET RESULTS WITHIN HOURS OF DATA ONBOARDING
Declarative ETL
Hybrid, Edge & Public Cloud Data
Zero-Config Analytics
Automated Closed-Loop Anomaly Remediation
Zero-Touch User Onboarding with NLP
Slack as a Collaborative Notebook
[Diagram: network, APM, and synthetics data; data hypervisor; unique I-Rank engine]
Challenges – The 6 V's of Big Data
• Volume: single topic, ~350 Mbps, noisy data; subscribe, filter
• Variety: metrics, logs, SNMP; Avro, JSON
• Velocity: ~230 Kpps; deserialize; time-sensitive, ordered, batch
• Veracity: trust, access model
• Variability: changing data, changing model
• Value: statistical, events, correlate
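Read together, the volume and velocity figures imply a small average message. A back-of-the-envelope check, assuming ~350 Mbps means megabits per second and that both figures describe the same topic over the same interval:

```latex
\bar{s} \approx \frac{350 \times 10^{6}\ \text{bit/s}}{8\ \text{bit/B} \times 230 \times 10^{3}\ \text{msg/s}}
        = \frac{43.75 \times 10^{6}\ \text{B/s}}{230 \times 10^{3}\ \text{msg/s}}
        \approx 190\ \text{B/msg}
```

At roughly 190 bytes per message, per-message work such as deserialization, rather than raw bandwidth, is likely the dominant cost, which fits the choice to filter before decoding.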
Solutions – Velocity & Volume
• Message filter at ingest (I/O filter): head-of-line drop
[Charts: broker message rate (~230Kpps) vs. rate after the I/O filter (~5Kpps); byte-pattern I/O filter, 97% noise reduction]
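A byte-pattern I/O filter can be sketched in Go, the language the deck names for the engine. This is a minimal sketch, not Selector's implementation: the ioFilter type, the allow/block semantics, and the space-separated key=value payloads are all assumptions. For scale, the slide's rates imply that (230 - 5) / 230 ≈ 0.978 of messages are dropped before decoding, in line with the ~97% noise reduction shown.

```go
package main

import (
	"bytes"
	"fmt"
)

// ioFilter is a hypothetical byte-pattern filter applied to raw Kafka message
// values before any deserialization. A message is kept only if it contains
// none of the blocked patterns and at least one of the allowed patterns.
type ioFilter struct {
	allowed [][]byte
	blocked [][]byte
}

// keep reports whether a raw payload should be handed to a decoder.
// Dropped messages are never deserialized, which is where the savings come from.
func (f ioFilter) keep(raw []byte) bool {
	for _, p := range f.blocked {
		if bytes.Contains(raw, p) {
			return false
		}
	}
	for _, p := range f.allowed {
		if bytes.Contains(raw, p) {
			return true
		}
	}
	return false
}

func main() {
	f := ioFilter{
		allowed: [][]byte{
			[]byte("indicatorName=ifHCInOctets"),
			[]byte("indicatorName=bgpPeerState"),
		},
		blocked: [][]byte{[]byte("deviceName=lab-")},
	}
	// Hypothetical raw payloads; the real wire format may differ.
	msgs := []string{
		"deviceName=core-1 indicatorName=ifHCInOctets value=42",
		"deviceName=core-1 indicatorName=ifOutErrors value=0",
		"deviceName=lab-7 indicatorName=ifHCInOctets value=5",
		"deviceName=edge-2 indicatorName=bgpPeerState value=6",
	}
	kept := 0
	for _, m := range msgs {
		if f.keep([]byte(m)) {
			kept++
		}
	}
	fmt.Printf("kept %d of %d messages\n", kept, len(msgs)) // prints: kept 2 of 4 messages
}
```

Substring matching on raw bytes is cheap but can produce false positives (a pattern may occur inside an unrelated field), so a later stage that inspects the decoded record is still useful.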
Solutions – Velocity & Volume
[Dashboard panels: busiest consumers (~1.8Kpps each); internal Go channel sizes per decoder]
• Leveraged Go (Golang) for performance, concurrency, and channels
• Live dashboards for cluster KPI monitoring and performance tuning
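The "internal Go channel sizes per decoder" panel suggests one buffered channel per decoder goroutine, with the queue depth exported as a KPI. A minimal sketch under that assumption: decoderPool, submit, backlog, and shutdown are hypothetical names, and real decoding is elided. In Go, len on a buffered channel returns the number of queued elements, which is what a backlog gauge would sample.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// decoderPool is a hypothetical pool of decoder goroutines, each fed by its
// own buffered channel.
type decoderPool struct {
	inputs []chan []byte
	counts []int // messages handled per decoder; read only after shutdown
	wg     sync.WaitGroup
}

func newDecoderPool(n, capacity int) *decoderPool {
	p := &decoderPool{inputs: make([]chan []byte, n), counts: make([]int, n)}
	for i := range p.inputs {
		p.inputs[i] = make(chan []byte, capacity)
		p.wg.Add(1)
		go p.run(i)
	}
	return p
}

// run drains one decoder's channel. Each goroutine writes only its own
// counts[i], so no mutex is needed.
func (p *decoderPool) run(i int) {
	defer p.wg.Done()
	for range p.inputs[i] {
		p.counts[i]++ // real decoding would happen here
	}
}

// submit routes a message to a decoder by hashing its key. A key always lands
// on the same channel, so per-key order is preserved when a single goroutine
// calls submit. The send blocks when that buffer is full (backpressure).
func (p *decoderPool) submit(key string, msg []byte) {
	h := fnv.New32a()
	h.Write([]byte(key))
	p.inputs[h.Sum32()%uint32(len(p.inputs))] <- msg
}

// backlog returns the current queue depth of every decoder channel.
func (p *decoderPool) backlog() []int {
	out := make([]int, len(p.inputs))
	for i, ch := range p.inputs {
		out[i] = len(ch)
	}
	return out
}

// shutdown closes all inputs and waits for the decoders to drain them.
func (p *decoderPool) shutdown() {
	for _, ch := range p.inputs {
		close(ch)
	}
	p.wg.Wait()
}

func main() {
	p := newDecoderPool(4, 1024)
	for i := 0; i < 1000; i++ {
		p.submit(fmt.Sprintf("device-%d", i%50), []byte("payload"))
	}
	// Values depend on scheduling; an exporter would sample this periodically.
	fmt.Println("queued per decoder:", p.backlog())
	p.shutdown()
	total := 0
	for _, c := range p.counts {
		total += c
	}
	fmt.Println("decoded:", total)
}
```

A steadily growing backlog on one channel would point at a slow decoder or a hot key, which is the kind of signal live tuning dashboards are meant to surface.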
Solutions – Velocity & Volume
• Scale-out models
  • Sharded data ingestion: engine pods 1 to 3 share one consumer group, so the telemetry metrics are sharded across the pods
  • Full data ingestion: engine pods 1 to 3 are independent consumers, each reading the full stream and applying its own I/O filters
[Diagrams: in both models, three engine pods driven by an engine configuration consume telemetry metrics from a three-broker Kafka cluster]
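The two scale-out models come down to Kafka consumer-group semantics: within one group, each partition is consumed by at most one member at a time, while every separate group receives every message. The sketch below models that with a plain round-robin split; it is a simplification for illustration, not Kafka's own assignor, and the pod names are hypothetical.

```go
package main

import "fmt"

// assignShared splits partitions round-robin across pods, as in a shared
// consumer group: each partition goes to exactly one pod (data sharding).
func assignShared(partitions int, pods []string) map[string][]int {
	out := make(map[string][]int, len(pods))
	for p := 0; p < partitions; p++ {
		pod := pods[p%len(pods)]
		out[pod] = append(out[pod], p)
	}
	return out
}

// assignIndependent models one consumer group per pod: every pod reads every
// partition and relies on its own I/O filters to discard what it does not need.
func assignIndependent(partitions int, pods []string) map[string][]int {
	out := make(map[string][]int, len(pods))
	for _, pod := range pods {
		for p := 0; p < partitions; p++ {
			out[pod] = append(out[pod], p)
		}
	}
	return out
}

func main() {
	pods := []string{"engine-pod-1", "engine-pod-2", "engine-pod-3"}
	fmt.Println("shared group:", assignShared(6, pods))
	fmt.Println("independent groups:", assignIndependent(6, pods))
}
```

One consequence: in the sharded model, pods beyond the partition count receive nothing, whereas the full-ingestion model multiplies broker read traffic by the number of pods.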
Solutions – Variety & Variability
[Diagram: dynamic schema; blocked patterns and allowed patterns; schema-driven filters]
subscriptions:
  - label: sevone-spdb                # kafka topic
    pathgroup:
      - group: port                   # port group
        paths:
          - 'indicatorName=ifHCInOctets'
          - 'indicatorName=ifHCInUcastPkts'
          - 'indicatorName=ifHCInMulticastPkts'
      - group: disk                   # disk group
        paths:
          - 'indicatorName=s1_sizebytes'
          - 'indicatorName=s1_usedbytes'
      - group: system                 # system group
        paths:
          - 'indicatorName=sysUpTime'
          - 'indicatorName=hrProcessorLoad'
      - group: bgp                    # bgp group
        paths:
          - 'indicatorName=bgpInTotalMessages'
          - 'indicatorName=bgpOutTotalMessages'
          - 'indicatorName=bgpPeerInUpdates'
          - 'indicatorName=bgpPeerOutUpdates'
          - 'indicatorName=bgpPeerState'
Dynamic subscriptions
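Dynamic subscriptions can be illustrated by matching records against a config shaped like the YAML above. A minimal sketch, assuming records arrive as whitespace-separated key=value tokens and that a record belongs to the first group with an exactly matching path; the Go types and matchGroup are hypothetical, and the config is built inline rather than parsed to keep the example dependency-free (a YAML library such as gopkg.in/yaml.v3 could unmarshal into the same struct tags).

```go
package main

import (
	"fmt"
	"strings"
)

// Subscription mirrors one entry of the subscriptions list in the YAML config.
type Subscription struct {
	Label     string      `yaml:"label"` // Kafka topic
	PathGroup []PathGroup `yaml:"pathgroup"`
}

// PathGroup is a named set of path patterns such as "port" or "bgp".
type PathGroup struct {
	Group string   `yaml:"group"`
	Paths []string `yaml:"paths"`
}

// matchGroup returns the first path group of the subscription for topic that
// has a path equal to one of the record's key=value tokens, or "" when nothing
// matches (the record would then be dropped).
func matchGroup(subs []Subscription, topic, record string) string {
	tokens := make(map[string]bool)
	for _, t := range strings.Fields(record) {
		tokens[t] = true
	}
	for _, s := range subs {
		if s.Label != topic {
			continue
		}
		for _, g := range s.PathGroup {
			for _, p := range g.Paths {
				if tokens[p] {
					return g.Group
				}
			}
		}
	}
	return ""
}

func main() {
	subs := []Subscription{{
		Label: "sevone-spdb",
		PathGroup: []PathGroup{
			{Group: "port", Paths: []string{"indicatorName=ifHCInOctets", "indicatorName=ifHCInUcastPkts"}},
			{Group: "bgp", Paths: []string{"indicatorName=bgpPeerState"}},
		},
	}}
	fmt.Println(matchGroup(subs, "sevone-spdb", "deviceName=core-1 indicatorName=bgpPeerState value=6")) // prints bgp
}
```

Because subscriptions are data, a new group or path could be picked up by reloading the config (for example, swapping the slice behind a sync/atomic.Value) without touching the matcher.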
Solutions – Value & Veracity
• Declarative ETL
  • Extract, enrich, normalize
  • GraphQL-based record selection
  • Deployment-specific, zero code changes
• Model events: syslogs and SNMP from devices
• Correlate metrics, events, and synthetics
• Access considerations
  • Distributed control plane
  • Disaggregated data pipeline
  • SaaS, DMZ, on-prem PODs
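The slide lists GraphQL-based record selection; the sketch below does not use GraphQL. It illustrates only the "declarative" part: extraction, normalization, and enrichment driven by a rule list that could live in per-deployment config. Rule, transform, the field names, and the device-to-site table are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Rule is a hypothetical declarative ETL rule: copy field Source to Target,
// optionally lower-casing string values. Rules are data, not code.
type Rule struct {
	Source    string
	Target    string
	Lowercase bool
}

// transform extracts and normalizes the fields named by rules, drops all
// other fields, then enriches the result with a site looked up by device.
func transform(raw []byte, rules []Rule, siteByDevice map[string]string) (map[string]any, error) {
	var in map[string]any
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, err
	}
	out := make(map[string]any, len(rules)+1)
	for _, r := range rules {
		v, ok := in[r.Source]
		if !ok {
			continue // extract: missing fields are skipped
		}
		if s, isStr := v.(string); isStr && r.Lowercase {
			v = strings.ToLower(s) // normalize
		}
		out[r.Target] = v
	}
	if dev, ok := out["device"].(string); ok {
		if site, found := siteByDevice[dev]; found {
			out["site"] = site // enrich
		}
	}
	return out, nil
}

func main() {
	rules := []Rule{
		{Source: "deviceName", Target: "device", Lowercase: true},
		{Source: "indicatorName", Target: "metric"},
		{Source: "value", Target: "value"},
	}
	sites := map[string]string{"core-1": "DC-1"}
	raw := []byte(`{"deviceName":"CORE-1","indicatorName":"ifHCInOctets","value":42,"debug":true}`)
	rec, err := transform(raw, rules, sites)
	if err != nil {
		panic(err)
	}
	fmt.Println(rec)
}
```

Changing what a deployment extracts then means editing the rule list, not the transform code.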
WWW.SELECTOR.AI • CONFIDENTIAL AND PROPRIETARY
System Architecture
Disaggregated data pipeline
[Diagram: pipeline spanning the customer environment, a DMZ, and Selector SaaS; producers (AWS CloudWatch, network monitoring, Elasticsearch) publish to Kafka clusters of three brokers each, consumed by engines that feed ingest, ML, storage, ChatOps, query, visualization, and the portal]
Engine Micro Architecture
[Diagram: engine stages Ingest → I/O Filter → Decode → Match → Label → Format → Export, plus Extract and the Selector pipeline]
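The engine's stage sequence can be sketched as a chain of functions, each of which may drop a record. The stage names come from the slide; the record type, wire format, and stage signatures are hypothetical, and Extract and the Selector pipeline are omitted because their placement is not clear from the diagram.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// record carries one message through the hypothetical engine stages.
type record struct {
	raw    []byte
	fields map[string]string
	labels map[string]string
	out    string
}

// stage processes a record in place; returning false drops the record.
type stage func(*record) bool

// ioFilter drops records whose raw bytes lack pattern, before decoding.
func ioFilter(pattern string) stage {
	p := []byte(pattern)
	return func(r *record) bool { return bytes.Contains(r.raw, p) }
}

// decode parses whitespace-separated key=value pairs (an assumed wire format).
func decode(r *record) bool {
	r.fields = make(map[string]string)
	for _, kv := range strings.Fields(string(r.raw)) {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return false // malformed record
		}
		r.fields[k] = v
	}
	return true
}

// match re-checks the decoded metric name, catching byte-filter false positives.
func match(metric string) stage {
	return func(r *record) bool { return r.fields["indicatorName"] == metric }
}

// label attaches a static label, e.g. the subscription path group.
func label(key, value string) stage {
	return func(r *record) bool { r.labels[key] = value; return true }
}

// format renders the record as a single line for export.
func format(r *record) bool {
	r.out = fmt.Sprintf("%s device=%s group=%s value=%s",
		r.fields["indicatorName"], r.fields["deviceName"], r.labels["group"], r.fields["value"])
	return true
}

// export appends formatted output to sink (standing in for a real exporter).
func export(sink *[]string) stage {
	return func(r *record) bool { *sink = append(*sink, r.out); return true }
}

// run pushes one raw message through the stages, stopping at the first drop.
func run(raw []byte, stages []stage) bool {
	r := &record{raw: raw, labels: map[string]string{}}
	for _, s := range stages {
		if !s(r) {
			return false
		}
	}
	return true
}

func main() {
	var sink []string
	pipeline := []stage{
		ioFilter("indicatorName=ifHCInOctets"),
		decode,
		match("ifHCInOctets"),
		label("group", "port"),
		format,
		export(&sink),
	}
	for _, m := range []string{
		"deviceName=core-1 indicatorName=ifHCInOctets value=42",
		"deviceName=core-1 indicatorName=ifOutErrors value=0",
	} {
		run([]byte(m), pipeline)
	}
	fmt.Println(sink)
}
```

Putting the cheap byte filter first means the costlier decode and match stages only see the small fraction of traffic that survives it.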
Observability
• Highly visible Kafka ingest pipeline
• Aggregated and granular KPIs
  • Broker clusters, topics, consumers, partitions, decoders, match rates
• Metric ingest KPIs
  • Cumulative, port, BGP, system, memory
• Reflects subscription configuration models
[Dashboard panels: match rates per partition; metric ingest rate (~0.5Kpps); port metrics ingest rate; BGP metrics ingest rate]
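The per-partition match rate is presumably matched messages divided by messages read from each partition. A sketch of that calculation with hypothetical names; comparing partitions this way could expose skew, such as one partition whose rate departs from its peers because producers key unevenly.

```go
package main

import (
	"fmt"
	"sort"
)

// partitionStats holds hypothetical per-partition counters. In a live
// consumer these would be updated concurrently (e.g. with sync/atomic).
type partitionStats struct {
	received uint64 // messages read from the partition
	matched  uint64 // messages that matched a subscription path
}

// matchRates returns matched/received per partition, skipping idle ones.
func matchRates(stats map[int32]partitionStats) map[int32]float64 {
	out := make(map[int32]float64, len(stats))
	for p, s := range stats {
		if s.received == 0 {
			continue
		}
		out[p] = float64(s.matched) / float64(s.received)
	}
	return out
}

func main() {
	stats := map[int32]partitionStats{
		0: {received: 1000, matched: 25},
		1: {received: 800, matched: 20},
		2: {received: 0, matched: 0},
	}
	rates := matchRates(stats)
	parts := make([]int, 0, len(rates))
	for p := range rates {
		parts = append(parts, int(p))
	}
	sort.Ints(parts) // map iteration order is random; sort for stable output
	for _, p := range parts {
		fmt.Printf("partition %d: %.1f%% matched\n", p, 100*rates[int32(p)])
	}
}
```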
CI/CD Pipeline
• Why – Replicate every deployment in a test environment
• What – On-demand launch of a Kafka cluster, fake data, and fake signals
• How – Jenkins pipelines
• Deployment-specific testbeds
• Kafka cluster & producers
• Integration, performance & scale tests
• Fake test data for Kafka producers
• Generate network metrics, syslogs, SNMP traps
• Policy-driven metrics
• In- and out-of-bounds traffic rates, CPU, memory, protocol metrics
• Errors, alarms, synthetics and signals
• Deterministic violations
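Policy-driven fake metrics with deterministic violations could look like the following. A minimal sketch, assuming a policy is a min/max bound plus an "inject a violation every N samples" setting; policy, sample, generate, and the metric names are hypothetical. Seeding the PRNG makes in-bounds noise reproducible, and placing violations at fixed positions lets a test assert exactly which alerts should fire.

```go
package main

import (
	"fmt"
	"math/rand"
)

// policy is a hypothetical bound for one fake metric.
type policy struct {
	metric   string
	min, max float64 // in-bounds range; assumes min < max
	every    int     // inject an out-of-bounds value every N samples (0 = never)
}

// sample is one fake metric point for a test Kafka producer.
type sample struct {
	device    string
	metric    string
	value     float64
	violation bool
}

// generate returns n samples per policy: seeded noise in [min, max), with
// deterministic violations at every Nth position.
func generate(seed int64, device string, pols []policy, n int) []sample {
	rng := rand.New(rand.NewSource(seed))
	var out []sample
	for _, p := range pols {
		for i := 1; i <= n; i++ {
			s := sample{device: device, metric: p.metric}
			if p.every > 0 && i%p.every == 0 {
				s.value = p.max + (p.max - p.min) // deterministic violation above max
				s.violation = true
			} else {
				s.value = p.min + rng.Float64()*(p.max-p.min) // in [min, max)
			}
			out = append(out, s)
		}
	}
	return out
}

func main() {
	pols := []policy{
		{metric: "cpu_percent", min: 0, max: 90, every: 10},
		{metric: "memory_percent", min: 0, max: 80, every: 25},
	}
	samples := generate(42, "leaf-1", pols, 50)
	violations := 0
	for _, s := range samples {
		if s.violation {
			violations++
		}
	}
	fmt.Printf("%d samples, %d deterministic violations\n", len(samples), violations) // prints: 100 samples, 7 deterministic violations
}
```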
AIOps – Application Perspective
[Dashboard panels: application hop counts; application latency; application packet loss]
AIOps – Network Fabric Status
[Dashboard panels: fabric status; device logs; BGP peering status; device port status; site traffic in and out for DC-1, DC-2, and DC-3]
AIOps – Correlations
• Connecting the dots
• Detect grey failures
• Examine impacts
[Charts: port availability; BGP status]
Summary
• Operational complexity of a large, diverse, evolving cloud environment
• Correlation of multiple data sources: the full picture
• Kafka empowers AIOps by collecting and normalizing data at scale
• Selector AIOps
  • Provides simple & powerful insights
  • Enhances decision-making
  • Reduces downtime
  • Improves performance
Questions & Answers
Thank you for your time!
