Large networks consist of a diverse range of equipment spread across private, public, and hybrid clouds and partner networks. A hierarchical network has layers of infrastructure serving access, distribution, or core roles, each managed by a different organization that specializes in choosing the right network hardware, software, and features for that layer. The data generated by each component varies in type and form, including logs, events, metrics, and alarms.
The diversity of data generated by a large network is beyond human scale. Apache Kafka® is a critical hub in large networks: it lets AIOps contextualize large volumes of operational data to improve analysis, insights, and decision making. Kafka solved the big problem of collecting, processing, storing, and normalizing data at scale, allowing us to focus on building the AIOps pipeline.
Our platform connects the dots across relevant operations data and gives operations teams simple, powerful access to insights from within increasingly popular collaboration environments like Slack and Microsoft Teams. The pipeline must also integrate with automation solutions.
This session will cover how large volumes of streaming messages can be received by parallel Kafka consumers, and turned into action by network operations teams, dramatically reducing downtime and improving performance.
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Alankar Sharma, Comcast
1. Kafka at the core of an AIOps pipeline
Presented by:
Sunanda Kommula (Distinguished Engineer, Selector.ai)
Alankar Sharma (Sr Principal Architect, Comcast)
Hybrid Cloud AIOps
2. WWW.SELECTOR.AI
Agenda
• Comcast hybrid cloud
• Key role of Kafka
• Selector AI
• Challenges and Solutions
• System architecture
• Observability of Data Ingestion
• CI/CD with Kafka
• AIOps
3. Use Case
• Hybrid cloud: infrastructure for Internet, voice, video, storage
• Goal: application connectivity
• Wide variety of applications
• Large cloud environment, operationally complex
• Application perspective vs cloud status
• End to end connectivity
[Diagram labels: Hybrid Cloud, Compute, AI/ML, Load Balancer]
4. The Hybrid Cloud
• Cloud environment is complex, diverse and evolving
• Application connectivity is key
• Detect grey failures
• Isolate wide-spread issues
• Why AIOps?
• Examine millions of data-points
• Connect the dots
• Faster root-cause, resolution and remediation
[Diagram labels: Internet, Backbone, AWS Region 1, AWS Region 2, Data Center 1, Data Center 2, Data Center 3, Future Azure]
5. Kafka empowers AIOps
• Critical hub of large networks
• Connects varied data sources
• Reliably delivers high velocity, volume & variety of data
• Enables communication between microservices
• Bridges organizations
[Diagram labels: AWS CloudWatch Logs, Synthetics, Network Monitoring Engine, Telemetry Engine, Device Metrics, Producer (x4), Kafka clusters with three brokers each, ChatOps, Portal, ML, Storage, Query, Visualize]
6. Selector AI: Turn-key AIOps for Instant Actionable Insights
GET RESULTS WITHIN HOURS OF DATA ONBOARDING
• Declarative ETL
• Hybrid, Edge & Public Cloud Data
• Zero Config Analytics
• Automate Closed Loop Anomaly Remediation
• Zero touch user onboarding with NLP
• Slack as a Collaborative Notebook
[Diagram labels: NETWORK, APM, SYNTHETICS, DATA HYPERVISOR, UNIQUE I-RANK ENGINE]
7. Challenges – The 6 V's of Big Data
Volume
• Single topic
• ~350 Mbps
• Noisy data
• Subscribe
• Filter
Variety
• Metrics
• Logs
• SNMP
• Avro
• JSON
Velocity
• ~230 Kpps
• Deserialize
• Time-sensitive
• Ordered
• Batch
Veracity
• Trust
• Access model
Variability
• Changing data
• Changing model
Value
• Statistical
• Events
• Correlate
8. Solutions – Velocity & Volume
• Message filter at ingest (I/O filter): head-of-line drop
• Byte pattern I/O filter: 97% noise reduction
[Charts: broker message rate (~230 Kpps); post I/O filter rate (5 Kpps)]
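The byte-pattern filter on this slide can be illustrated with a minimal Python sketch. It assumes the filter keeps a raw message only if the undecoded payload contains one of a set of allowed byte patterns, so that dropped messages never pay the deserialization cost. The names `ALLOWED_PATTERNS`, `io_filter`, and `reduction_percent` are hypothetical, as is the message format in the usage note; none of them come from the talk.

```python
from typing import Iterable, Iterator, Sequence

# Hypothetical allow-list of byte patterns; real patterns are deployment specific.
ALLOWED_PATTERNS: Sequence[bytes] = (
    b"indicatorName=ifHCInOctets",
    b"indicatorName=bgpPeerState",
)


def io_filter(
    messages: Iterable[bytes],
    patterns: Sequence[bytes] = ALLOWED_PATTERNS,
) -> Iterator[bytes]:
    """Yield only raw messages that contain at least one allowed pattern.

    The substring check runs on the undecoded bytes, so everything that is
    dropped here is dropped before any deserialization work is done.
    """
    for raw in messages:
        if any(pattern in raw for pattern in patterns):
            yield raw


def reduction_percent(rate_in: float, rate_out: float) -> float:
    """Percentage of the incoming message rate removed by the filter."""
    return 100.0 * (1.0 - rate_out / rate_in)
```

With the rates quoted on the slide, `reduction_percent(230_000, 5_000)` comes to a little under 98%, in line with the 97% noise reduction figure.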
9. Solutions – Velocity & Volume
• Leveraged Golang: performance, concurrency, channels
• Live dashboards for cluster KPI monitoring and performance tuning
[Charts: busiest consumers (1.8 Kpps each); internal Go channel sizes per decoder]
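The talk's engine uses Go channels between consumers and decoders and charts their sizes. As a rough Python analog (not the authors' implementation), the sketch below uses a bounded `queue.Queue` in place of a buffered channel: a full queue blocks the producer, which gives backpressure, and the sampled queue depth is the kind of KPI a live dashboard could plot. The function name `run_pipeline` and the capacity value are assumptions for illustration.

```python
import queue
import threading
from typing import Iterable, List, Tuple

SENTINEL = None  # marks end of stream for the decoder thread


def run_pipeline(messages: Iterable[bytes], capacity: int = 4) -> Tuple[List[str], int]:
    """Push messages through one bounded queue; return (decoded, max_depth)."""
    buf: queue.Queue = queue.Queue(maxsize=capacity)  # put() blocks when full
    decoded: List[str] = []
    max_depth = 0

    def decoder() -> None:
        # Drain the queue in FIFO order until the sentinel arrives.
        while True:
            raw = buf.get()
            if raw is SENTINEL:
                break
            decoded.append(raw.decode("utf-8"))

    worker = threading.Thread(target=decoder)
    worker.start()
    for raw in messages:
        buf.put(raw)
        # Sampled depth: approximate, but never above the queue capacity.
        max_depth = max(max_depth, buf.qsize())
    buf.put(SENTINEL)
    worker.join()
    return decoded, max_depth
```

A depth that sits near the capacity suggests the decoder is the bottleneck; a depth near zero suggests the consumer is.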
10. Solutions – Velocity & Volume
• Scale-out models
Sharded Data Ingestion: shared consumer group (data sharding)
Full Data Ingestion: independent consumers with I/O filters
[Diagrams, one per model: Kafka cluster (three brokers) and Engine Pods 1, 2 and 3; labels: Telemetry, Metrics, Engine Configuration]
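The difference between the two scale-out models follows from how Kafka consumer groups work: within one group each partition is read by exactly one member, while separate groups each receive the full stream. The sketch below simulates that with plain Python and no Kafka client. The round-robin assignment is a simplification of Kafka's real assignors, and the pod names, group ids, and partition count are hypothetical.

```python
from typing import Dict, List


def assign_partitions(
    partitions: List[int],
    consumers_by_group: Dict[str, List[str]],
) -> Dict[str, List[int]]:
    """Return the partitions each consumer reads.

    Inside a group, partitions are dealt out round-robin so each one has a
    single owner. Every group independently covers all partitions.
    """
    assignment: Dict[str, List[int]] = {}
    for members in consumers_by_group.values():
        for member in members:
            assignment[member] = []
        for i, partition in enumerate(partitions):
            assignment[members[i % len(members)]].append(partition)
    return assignment


PARTITIONS = list(range(6))
PODS = ["engine-pod-1", "engine-pod-2", "engine-pod-3"]

# Sharded ingestion: all pods share one group id, so the data is split.
sharded = assign_partitions(PARTITIONS, {"engine": PODS})

# Full ingestion: each pod uses its own group id, so each sees everything
# and relies on its own I/O filter to keep only what it needs.
full = assign_partitions(PARTITIONS, {"engine-" + pod: [pod] for pod in PODS})
```

In the sharded case each pod ends up with two of the six partitions; in the full case every pod is assigned all six.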
11. Solutions – Variety & Variability
• Dynamic schema
• Schema driven filters: blocked patterns, allowed patterns
• Dynamic subscriptions
subscriptions:
  - label: sevone-spdb # kafka topic
    pathgroup:
      - group: port # port group
        paths:
          - 'indicatorName=ifHCInOctets'
          - 'indicatorName=ifHCInUcastPkts'
          - 'indicatorName=ifHCInMulticastPkts'
      - group: disk # disk group
        paths:
          - 'indicatorName=s1_sizebytes'
          - 'indicatorName=s1_usedbytes'
      - group: system # system group
        paths:
          - 'indicatorName=sysUpTime'
          - 'indicatorName=hrProcessorLoad'
      - group: bgp # bgp group
        paths:
          - 'indicatorName=bgpInTotalMessages'
          - 'indicatorName=bgpOutTotalMessages'
          - 'indicatorName=bgpPeerInUpdates'
          - 'indicatorName=bgpPeerOutUpdates'
          - 'indicatorName=bgpPeerState'
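One way to read the subscription config above is as an allow-list that also routes each record to a group. The sketch below is a guess at those semantics, not the product's actual behavior: it assumes a path such as 'indicatorName=bgpPeerState' matches a decoded record whose `indicatorName` field equals `bgpPeerState`. The config is written as a Python dict holding a subset of the slide's entries, and `build_index` and `classify` are hypothetical helper names.

```python
from typing import Any, Dict, Optional, Tuple

# Subset of the slide's subscription config, expressed as a plain dict.
SUBSCRIPTIONS: Dict[str, Any] = {
    "subscriptions": [
        {
            "label": "sevone-spdb",  # kafka topic
            "pathgroup": [
                {"group": "port", "paths": ["indicatorName=ifHCInOctets"]},
                {"group": "system", "paths": ["indicatorName=sysUpTime"]},
                {"group": "bgp", "paths": ["indicatorName=bgpPeerState"]},
            ],
        }
    ]
}

Index = Dict[Tuple[str, str, str], str]


def build_index(config: Dict[str, Any]) -> Index:
    """Map (topic, field, value) to the group that subscribed to it."""
    index: Index = {}
    for sub in config["subscriptions"]:
        for path_group in sub["pathgroup"]:
            for path in path_group["paths"]:
                field, _, value = path.partition("=")
                index[(sub["label"], field, value)] = path_group["group"]
    return index


def classify(index: Index, topic: str, record: Dict[str, Any]) -> Optional[str]:
    """Return the matching group, or None if the record is not subscribed."""
    for (label, field, value), group in index.items():
        if label == topic and record.get(field) == value:
            return group
    return None
```

Because the index is rebuilt from data, subscriptions can change at runtime by reloading the config rather than redeploying code.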
12. Solutions – Value & Veracity
• Declarative ETL
• Extract, enrich, normalize
• GraphQL based record selection
• Deployment specific, zero code changes
• Model events – syslogs and SNMP from devices
• Correlate metrics, events and synthetics
• Access considerations
• Distributed control plane
• Disaggregated data pipeline
• SaaS, DMZ, on-prem PODs
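The declarative ETL idea (extract, enrich, normalize, with deployment-specific behavior and zero code changes) can be sketched as a pipeline driven entirely by a data structure. Everything in this example is hypothetical: the spec layout, the field names, the inventory table, and the scale factor are invented for illustration. A simple field-equality selector stands in for the GraphQL-based record selection mentioned on the slide.

```python
from typing import Any, Dict, Optional

# Hypothetical pipeline spec; editing it changes behavior without code changes.
PIPELINE: Dict[str, Any] = {
    "select": {"type": "metric"},                   # keep records whose fields match
    "extract": {"device": "host", "value": "val"},  # output field <- input field
    "enrich": {"key": "device", "table": {"rtr1": {"site": "DC-1"}}},
    "normalize": {"field": "value", "scale": 8},    # e.g. bytes -> bits
}


def apply_pipeline(record: Dict[str, Any], spec: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Run select, extract, enrich, normalize on one record.

    Returns None when the record does not match the selector.
    """
    if any(record.get(k) != v for k, v in spec["select"].items()):
        return None
    # Extract: copy and rename only the fields named in the spec.
    out = {dst: record[src] for dst, src in spec["extract"].items()}
    # Enrich: join against a lookup table keyed on one extracted field.
    enrich = spec["enrich"]
    out.update(enrich["table"].get(out[enrich["key"]], {}))
    # Normalize: rescale one numeric field to a common unit.
    norm = spec["normalize"]
    out[norm["field"]] = out[norm["field"]] * norm["scale"]
    return out
```

For example, a record `{"type": "metric", "host": "rtr1", "val": 10}` passes the selector, picks up the site from the lookup table, and has its value scaled by 8, while a record with `"type": "log"` is skipped.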
18. AIOps – Network Fabric Status
[Dashboard panels: Fabric Status, Device Logs, BGP Peering Status, Device Port Status, Site Traffic In, Site Traffic Out; sites: DC-1, DC-2, DC-3]
20. Summary
• Operational complexity of a large, diverse, evolving cloud environment
• Correlation of multiple data sources – The full picture
• Kafka empowers AIOps by collecting and normalizing data at scale
• Selector AIOps
• Provides simple & powerful insights
• Enhances decision making
• Reduces downtime
• Improves performance