WideOpenWest (WOW) is one of the top broadband providers in the US, with over 3,000 employees. The company aims to connect homes and businesses to the world with fast and reliable internet, TV, and phone services. WOW uses SNMP and Telegraf to collect network data from cable modems and metrics from VMs and containers, and uses Kafka to stream all time-stamped data to InfluxDB. Kapacitor sends alerts to Slack, ServiceNow, and email. Discover how WOW uses a time series platform to collect, monitor, and alert on its entire service delivery network.
Join this webinar as Peter Jones and Dylan Shorter dive into:
WOW's approach to reducing infrastructure downtime and improving service uptime
Observability and alerting best practices
How they use the InfluxDB platform to monitor 600K+ devices
2. proprietary and confidential
Brief Introductions
Peter Jones - Senior Manager, Software & Product Integration Engineering,
WOW! Internet, TV, Phone
● ~22 years in IT/telecom/software development
● 20 years with WOW! in various roles
Dylan Shorter - Engineer III, Software & Product Integration Engineering,
WOW! Internet, TV, Phone
● ~18 years in IT/telecom/software development
● Almost 3 years with WOW!
What is WOW?
WideOpenWest (dba WOW! Internet, TV, Phone) offers Internet, video, and voice services in
a number of markets in Michigan, Florida, Georgia, Alabama, South Carolina, and
Tennessee.
• Founded in 1996 in Denver, Colorado
• 2001, acquisition of Americast properties in Chicago, Cleveland, Columbus, Detroit
• 2006, acquisition of Sigecom, LLC in Evansville, Indiana
• 2012, acquisition of Knology, which operated in 13 markets in the Southeast & Midwest
• 2017, IPO
• 2021, sale of IL, IN, OH, & MD properties. Announced additional fiber build-outs in
Seminole & Orange Counties, FL & Greenville County, SC.
What is DOCSIS?
Data Over Cable Service Interface Specification (DOCSIS) was originally developed by CableLabs.
DOCSIS version  Production date  Maximum downstream capacity  Maximum upstream capacity  Features
1.0             1997             40 Mbit/s                    10 Mbit/s                  Initial release
1.1             2001             40 Mbit/s                    10 Mbit/s                  Added VoIP capabilities and QoS mechanisms
2.0             2002             40 Mbit/s                    30 Mbit/s                  Enhanced upstream data rates
3.0             2006             1 Gbit/s                     200 Mbit/s                 Significantly increased downstream and upstream data rates, introduced support for IPv6, introduced channel bonding
3.1             2013             10 Gbit/s                    1–2 Gbit/s                 Significantly increased downstream and upstream data rates, restructured channel specifications
4.0             2017             10 Gbit/s                    6 Gbit/s                   Significantly increased upstream rates from DOCSIS 3.1
Concerning Monitoring - Nodes
• Circa 2015, with much of the integration of the Knology acquisition completed, we asked ourselves: How can
we monitor individual customer cable modems within the network as well as determine the health of a node
as a whole?
• Various markets had different monitoring platforms
• Purchasing hardware to support monitoring of individual nodes was cost-prohibitive
• Rudimentary processes were already in place for gathering telemetry data from individual modems
• Solution: Add resources to the existing telemetry polling processes and add logic for alerting on
potential outage conditions, creating a homegrown solution for node monitoring
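The alerting logic layered on top of modem polling might look something like the minimal sketch below. It is an illustration only, not WOW's actual implementation: the record layout, the `node_health` function name, the 30% offline threshold, and the minimum modem count are all assumptions.

```python
from collections import defaultdict

# Hypothetical threshold: flag a node when this fraction of its modems is offline.
OFFLINE_RATIO_THRESHOLD = 0.30
# Hypothetical minimum modem count, so a tiny node does not alarm on one offline modem.
MIN_MODEMS_PER_NODE = 5


def node_health(poll_results):
    """Summarize per-node health from one polling cycle.

    poll_results is an iterable of (node_id, modem_mac, is_online) tuples.
    Returns {node_id: {"total": int, "offline": int, "alert": bool}}.
    """
    counts = defaultdict(lambda: {"total": 0, "offline": 0})
    for node_id, _mac, is_online in poll_results:
        counts[node_id]["total"] += 1
        if not is_online:
            counts[node_id]["offline"] += 1

    summary = {}
    for node_id, c in counts.items():
        ratio = c["offline"] / c["total"]
        alert = c["total"] >= MIN_MODEMS_PER_NODE and ratio >= OFFLINE_RATIO_THRESHOLD
        summary[node_id] = {"total": c["total"], "offline": c["offline"], "alert": alert}
    return summary


# Made-up sample data: node-a has 4 of 10 modems offline, node-b has none offline.
sample = (
    [("node-a", f"aa:{i:02x}", i >= 4) for i in range(10)]
    + [("node-b", f"bb:{i:02x}", True) for i in range(10)]
)
print(node_health(sample))
```

Aggregating individual modem status by node is what lets a per-modem polling process double as node monitoring without dedicated hardware on each node.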
Node Monitoring Solved… Sort of…
Our cable modem telemetry polling process used the same time series database for 5 years.
When it worked, it was great. However, the database often had to be restarted to get the
read and write databases back in sync, and about once a year the database would have a
weekend-killing catastrophic outage.
Enter InfluxDB
In 2020, we compared a couple of potential replacements for our previous time series
database. This load testing was performed with the Time Series Benchmark Suite.
Database     Read Speed         Write Speed
TimescaleDB  26.88 queries/sec  5189.18 rows/sec
InfluxDB     22.80 queries/sec  111245.38 rows/sec
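To put the benchmark figures above in relative terms, the short calculation below computes the InfluxDB-to-TimescaleDB ratio for each column. The numbers are copied from the slide. The dictionary layout and variable names are only illustrative.

```python
# Figures copied from the benchmark results above.
results = {
    "TimescaleDB": {"reads_per_sec": 26.88, "writes_per_sec": 5189.18},
    "InfluxDB": {"reads_per_sec": 22.80, "writes_per_sec": 111245.38},
}

write_ratio = results["InfluxDB"]["writes_per_sec"] / results["TimescaleDB"]["writes_per_sec"]
read_ratio = results["InfluxDB"]["reads_per_sec"] / results["TimescaleDB"]["reads_per_sec"]

print(f"InfluxDB write throughput: {write_ratio:.1f}x TimescaleDB")
print(f"InfluxDB read throughput:  {read_ratio:.2f}x TimescaleDB")
```

In this test InfluxDB wrote roughly 21 times as many rows per second, while its read rate was about 15% lower, which matters for a write-heavy telemetry workload.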
Implementation
• Started with InfluxDB 1.8 OSS as a POC
• Eventually moved to 2.0 upon release
• Decided to purchase InfluxDB Enterprise for an all-in-one solution
• We currently have a 4-data-node cluster in production and a 2-data-node
cluster in test, both running on OpenStack
• Cluster setup and installation has been automated using
Ansible
Setting up InfluxDB Enterprise was extremely easy, support has
been great, and we are very happy with the product.
The Solution In Action
● Primary purpose is monitoring, alerting, and general
telemetry.
● Data collection:
○ Telegraf
○ Filebeat
○ Custom scripts and vendor APIs
○ SNMP
● Collected data is sent to Kafka and then forwarded into InfluxDB
InfluxDB has given us the flexibility to work around restrictions on vendor-managed
systems and enabled us to collect and monitor data from all
kinds of sources.
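As a rough illustration of what a custom collection script might emit before its output is published to Kafka, the sketch below formats one reading as InfluxDB line protocol. The slides do not say which message format WOW uses on Kafka, so line protocol is an assumption here, and the measurement, tag, and field names are hypothetical. The Kafka publish step is left out because it needs a third-party client library.

```python
import time


def _escape(value, chars):
    """Backslash-escape each character in `chars` inside `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value using line protocol typing rules."""
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=v,... field=v,... timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are recommended for write performance
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(fields[key]) for key in sorted(fields)
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical modem reading; every name and value here is made up.
line = to_line_protocol(
    "modem_signal",
    {"node": "node-a", "mac": "aa:bb:cc:00:11:22"},
    {"snr_db": 38.5, "online": True, "resets": 2},
    timestamp_ns=1700000000000000000,
)
print(line)
```

Each such line could then be sent as the value of a Kafka message and written to InfluxDB by whatever consumer sits between the two systems.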
The Solution In Action - Modem Data
One of our biggest current data sets is modem data.
We collect status and signal information from over 650k modems on 5-minute
polling cycles.
This data is used for:
● Analytics
● Alarming
● Troubleshooting
● Reporting and Visualization (we opted to use Grafana)
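For a sense of scale, the back-of-the-envelope calculation below turns the modem count and polling interval into average ingest rates. The 650,000 figure is treated as a lower bound ("over 650k"). The number of values collected per poll is not given in the slides, so `FIELDS_PER_POLL` is a purely hypothetical placeholder.

```python
MODEMS = 650_000            # "over 650k modems", treated as a lower bound
POLL_INTERVAL_S = 5 * 60    # 5-minute polling cycle
FIELDS_PER_POLL = 10        # hypothetical: status/signal values collected per modem

polls_per_second = MODEMS / POLL_INTERVAL_S
cycles_per_day = 24 * 60 * 60 // POLL_INTERVAL_S
polls_per_day = MODEMS * cycles_per_day
values_per_day = polls_per_day * FIELDS_PER_POLL

print(f"{polls_per_second:,.0f} modem polls/sec on average")
print(f"{cycles_per_day} polling cycles/day")
print(f"{polls_per_day:,} modem polls/day")
print(f"{values_per_day:,} field values/day (assuming {FIELDS_PER_POLL} per poll)")
```

Even before counting individual fields, that is about 2,167 modem polls per second on average and 187.2 million polls per day.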
The Solution In Action - Monitoring Streaming Video Feeds
We are using InfluxDB to help monitor services provided by WOW!, including the status of streaming video channels.
Challenges
• Steep learning curve (not easy to hand-off to an operations team)
⁃ Needing to learn two new and very different query languages, TICKscript &
Flux
• Using InfluxDB 2.0 as a POC and then having to partly relearn 1.x after moving to
InfluxDB Enterprise
• Sometimes difficult to convince vendors to integrate with it
• Testing / debugging is non-trivial (especially Kapacitor)
• ServiceNow integration didn’t work for us out-of-the-box
Strengths
• Ease of setup/installation
• Performance
• Support
• Allows for infrastructure as code
• Flexibility and power
• Telegraf (my new favorite hammer)
• Push-model data collection (as opposed to a pull model, as in Prometheus)
Next Steps
• Full CI/CD implementation and automated code promotion
⁃ Dashboards
⁃ Kapacitor scripts
• Improve automated testing
• Continue to transition away from our other existing monitoring solutions
• Add more infrastructure monitoring
In the end, we have been very happy overall with InfluxDB.