Kapacitor Manager

©2018 RingCentral, Inc. Proprietary & Confidential.
Yuri Ardulov
Alexey Smirnov
Lyubov Fomicheva
Valery Tishkov
Vyacheslav Shvetsov

RingCentral Product
▪ Collaborative Communications
▪ Contact Center
▪ Video & meetings
▪ Cloud PBX
▪ Team messaging
▪ Open Platform
▪ Analytics
▪ Global
▪ User Experience

Our Journey
▪ 2008: RingCentral Office
▪ 2014: Office for Enterprise & Video Meetings
▪ 2015: Contact Center, Team Messaging & Open Platform
▪ 2016: Collaborative Communications & Global Office
▪ 2017: Analytics & Quality of Service
▪ 2018: Collaborative Meetings, Collaborative Contact Center & Pulse

RingCentral IP Telecommunication Company
▪ 500,000 business customers
▪ 10 data centers across the globe (US, Europe, APAC)
▪ 20K+ servers
▪ 30K simultaneous phone calls
▪ 100K faxes per day
▪ 20M calls per day

Tools Landscape
CMDB

From Zabbix to Influx - Starting Points
▪ North America:
• 2 Data Centers
• # of hosts: 10K+
• # of metrics: 2.5M+
• # of triggers: 700K+
▪ Europe and Other:
• Multiple Data Centers
• # of hosts: 5K+
• # of metrics: 700K+
• # of triggers: 250K+

Do-It-Yourself (DIY) Framework

DIY Framework
▪ Alerts as code
▪ Send any application metrics
▪ Dashboards as code
▪ Horizontally scalable
▪ Sandbox for graduation
▪ Structure independent
▪ Fully automated service
▪ Integration with deployment systems

Goals of the project:
▪ Structure independent
▪ Collect metrics with high granularity
▪ Fully automated service
▪ Integration with deployment systems
▪ Engineering as self-service:
• Alerting as code
• Dashboards as code
• CD support
▪ HA implementation
▪ Metrics collection through HTTP GET

Design of proposed solution

Problems with existing Kapacitor (v1.3)
▪ Low efficiency (tasks per instance)
▪ Not responsive under high load (the API stops functioning)
▪ Streaming tasks (1000+) cause high CPU load
▪ Batch tasks: if queries are not grouped, InfluxDB appears to use the whole CPU (unconfirmed)
▪ Low internal concurrency: the alert node stalls when writing to the internal topic (alerts stop being produced)

Test Cases
1. Alert node -> .tcp() -> Logstash TCP listener -> Kafka
2. Alert node -> .post() -> Logstash HTTP listener -> Kafka
3. Alert node -> influxDBOut() -> Logstash HTTP listener acting as an InfluxDB cluster -> Kafka
4. Eval node -> influxDBOut() -> InfluxDB cluster
▪ Kapacitor instance:
• CPU cores: 24
• RAM: 256 GB
▪ Load: 1000 tasks, with events generated by the following flow:
1. Write value=1 to the Kapacitor /write API.
2. Wait 0.5 seconds.
3. Write value=0 to the Kapacitor /write API.
4. Wait 0.5 seconds.
5. Repeat steps 1-4 100 times.
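
The event flow above could be driven by a small load generator along these lines. This is a minimal sketch, not the script used in the test: the measurement name `test`, the `task` tag, the batching of all tasks into one write per step, and the example URL in the comment are assumptions.

```python
import time
import urllib.request
from typing import Callable, Iterator


def line(measurement: str, task_no: int, value: int) -> str:
    """Build one line-protocol point; the `task` tag is a hypothetical way
    to address each of the generated tasks."""
    return f"{measurement},task={task_no} value={value}i"


def flip_flop(cycles: int) -> Iterator[int]:
    """Yield 1, 0, 1, 0, ... for the requested number of cycles."""
    for _ in range(cycles):
        yield 1
        yield 0


def http_sender(url: str) -> Callable[[str], None]:
    """Return a function that POSTs a line-protocol body to `url`."""
    def send(body: str) -> None:
        req = urllib.request.Request(url, data=body.encode(), method="POST")
        urllib.request.urlopen(req, timeout=5).close()
    return send


def run(send: Callable[[str], None], tasks: int = 1000, cycles: int = 100,
        pause: float = 0.5, sleep: Callable[[float], None] = time.sleep) -> int:
    """Write value=1, wait, write value=0, wait; repeat `cycles` times.
    Returns the number of points sent."""
    sent = 0
    for value in flip_flop(cycles):
        # Assumption: one batched write per step covers all tasks.
        send("\n".join(line("test", n, value) for n in range(tasks)))
        sent += tasks
        sleep(pause)
    return sent


# Example (assumed endpoint and database names):
# run(http_sender("http://localhost:9092/kapacitor/v1/write?db=test&rp=autogen"))
```

Passing `send` and `sleep` as parameters keeps the flow testable without a running Kapacitor instance.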

Test Cases
▪ Example: Alert node -> .tcp() -> Logstash TCP listener -> Kafka
▪ Task:
    stream
        |from()
        |alert()
            .id('alert-id-{{number}}')
            .message('alert message {{number}}')
            .info(lambda: "value" == 1)
            .details('''{"details": "some details", "fqdn": "test.example.com"}''')
            .tcp('172.17.0.1:25000')
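
One way the 1000 test tasks might be produced from this example is to substitute `{{number}}` per task and wrap each script in a task definition. The sketch below assumes that `{{number}}` is a placeholder filled in by the generator; the task id scheme and the database and retention policy names are hypothetical.

```python
TEMPLATE = """stream
    |from()
    |alert()
        .id('alert-id-{{number}}')
        .message('alert message {{number}}')
        .info(lambda: "value" == 1)
        .details('''{"details": "some details", "fqdn": "test.example.com"}''')
        .tcp('172.17.0.1:25000')
"""


def render(number: int) -> str:
    """Fill the {{number}} placeholder (assumed to be generator-side templating)."""
    return TEMPLATE.replace("{{number}}", str(number))


def task_definitions(count: int, db: str = "test", rp: str = "autogen") -> list:
    """Build one JSON-serializable task definition per generated script."""
    return [
        {
            "id": f"task-{n}",  # hypothetical id scheme
            "type": "stream",
            "dbrps": [{"db": db, "rp": rp}],
            "script": render(n),
            "status": "enabled",
        }
        for n in range(count)
    ]
```

Each dictionary could then be serialized with `json.dumps` and posted to the Kapacitor tasks endpoint.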

Results: Case 1
Result:
• Delay between event generation and receipt by Logstash: up to 7 minutes.
• The likely root cause is the .tcp() handler, which opens a new TCP connection for every event.
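
To illustrate why a connection per event is costly, the sketch below contrasts the two sending patterns against a throwaway local listener. It is an illustration only and does not reproduce Kapacitor's handler code; all names in it are hypothetical.

```python
import socket
import socketserver
import threading
import time


class CountingHandler(socketserver.StreamRequestHandler):
    """Counts accepted connections and newline-terminated events."""

    def handle(self):
        with self.server.lock:
            self.server.connections += 1
        for _line in self.rfile:
            with self.server.lock:
                self.server.events += 1


def start_listener():
    """Start a local TCP listener on a free port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), CountingHandler)
    server.daemon_threads = True
    server.lock = threading.Lock()
    server.connections = 0
    server.events = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_per_event(addr, events):
    """Open and close a TCP connection for every single event."""
    for event in events:
        with socket.create_connection(addr) as sock:
            sock.sendall(event.encode() + b"\n")


def send_persistent(addr, events):
    """Reuse one TCP connection for all events."""
    with socket.create_connection(addr) as sock:
        for event in events:
            sock.sendall(event.encode() + b"\n")


def wait_for_events(server, expected, timeout=5.0):
    """Poll until the listener has counted `expected` events or time runs out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with server.lock:
            if server.events >= expected:
                return True
        time.sleep(0.01)
    return False
```

With N events, the first pattern pays for N connection setups and teardowns, while the second pays for one.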

Results: Case 2
Result:
• Delay between event generation and receipt by Logstash: up to 30 seconds.
• Much better, but the delay grows as the number of tasks increases.

Results: Case 3
Result:
• Delay between event generation and receipt by Logstash: ~3 seconds.

Results: Case 4
5000 tasks generated, using the same test flow.
Result:
• Events received by the InfluxDB cluster in near real time.

Project Goals
▪ Scalable solution: 50K+ alerts per location
▪ Increase efficiency -> maximize the ratio of tasks per instance
▪ Manageable solution -> dynamically change task allocation and balancing
▪ No single point of failure
▪ Housekeeping capabilities
▪ Task sandboxing as a part of the service
▪ RESTful
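
As an illustration of the allocation and balancing goals, the sketch below places tasks on the least-loaded Kapacitor instance and redistributes a failed instance's tasks. The slides do not describe the algorithm Kapacitor Manager uses, so this least-loaded strategy and all names in it are assumptions.

```python
import heapq


def _spread(task_ids, placement):
    """Assign each task to the instance that currently holds the fewest tasks."""
    heap = [(len(tasks), name) for name, tasks in placement.items()]
    heapq.heapify(heap)
    for task_id in task_ids:
        load, name = heapq.heappop(heap)
        placement[name].append(task_id)
        heapq.heappush(heap, (load + 1, name))
    return placement


def allocate(task_ids, instances):
    """Initial placement of tasks across the given instance names."""
    return _spread(task_ids, {name: [] for name in instances})


def rebalance_after_failure(placement, failed):
    """Move the failed instance's tasks onto the surviving instances."""
    survivors = {name: list(tasks) for name, tasks in placement.items()
                 if name != failed}
    return _spread(placement.get(failed, []), survivors)
```

For example, `allocate(range(10), ["k1", "k2", "k3"])` spreads ten tasks as evenly as possible, and `rebalance_after_failure(placement, "k1")` reassigns the tasks of `k1` to the other two instances.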

Kapacitor Manager (KM): Functional Description

CEP problems
▪ Single-core processing
▪ Low performance
▪ Very poor scalability

Open and Closed Events

Kapacitor And Event State Change Detection

K8S: Deployment Media for KM and Kapacitors Nodes

Future Plans
▪ Sandboxing and routing
▪ Smart rebalancing
▪ Open source
▪ Align with upcoming TICK stack releases
▪ UI

THANK YOU
