
Anomaly Detection using ML in Elisa Viihde CDN


Jere Nieminen
Service Architect – Elisa
Jere is an experienced architect specializing in video streaming technologies. He is currently working on making video streaming as smooth as possible for Elisa Viihde customers.

Published in: Technology


  1. Anomaly Detection using ML in Elisa Viihde CDN
     Jere Nieminen, 13.12.2018 (page footer date: 19.12.2018)

     Elisa and Elisa Viihde
       • Elisa
           • Telecommunications, ICT and digital service company operating mainly in Finland and Estonia
           • Over 2.8 million customers who have over 6.2 million subscriptions
       • Elisa Viihde
           • Finland’s most popular entertainment service
           • Several original series and exclusive distribution rights for certain movies and series
           • Linear TV channels, Network PVR, Catchup, TVOD/SVOD/EST
           • More than 300 000 household subscribers
  2. Elisa Viihde CDN
       • Features focused on
           • Streaming Video
           • Cache/Network Optimization
       • Team with 6 members focused on
           • SW Development and integrations
           • Daily operations
           • QoS and QoE
       [Figure: High Level Architecture; diagram not reproduced in the transcript]

     Background - Elastic Stack 101
       • Elasticsearch
           • JSON data store with a RESTful API
       • Beats & Logstash
           • Ingest data into Elasticsearch
       • Kibana
           • Search and visualize data in Elasticsearch
       • Machine Learning (X-Pack)
           • Anomaly Detection
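The slides mention ML job configurations, but the transcript does not reproduce them. As a rough illustration of what such a job can look like, the sketch below builds the JSON body for an Elasticsearch anomaly detection job in Python. The field names `cache_node` and `asset_id`, the description, and the 15-minute bucket span are assumptions made for this example and are not taken from the talk. The body would be sent with `PUT _ml/anomaly_detectors/<job_id>` (under `_xpack/ml/...` in the 6.x releases current at the time of the talk).

```python
import json


def build_anomaly_job(bucket_span="15m"):
    """Return a request body for an Elasticsearch ML anomaly detection job.

    All field names below are hypothetical; adapt them to the indexed data.
    """
    return {
        "description": "Unusually high event counts per cache node (illustrative)",
        "analysis_config": {
            # Width of the time buckets the model analyses.
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # Flag buckets with an unusually high number of events,
                    # modelled separately for each cache node.
                    "function": "high_count",
                    "partition_field_name": "cache_node",  # hypothetical field
                    "detector_description": "high event count per cache node",
                }
            ],
            # Fields reported as likely contributors to an anomaly.
            "influencers": ["cache_node", "asset_id"],  # hypothetical fields
        },
        "data_description": {"time_field": "@timestamp"},
    }


job = build_anomaly_job()
print(json.dumps(job, indent=2))
```

The job only describes what to model; a datafeed (or data posted to the job) supplies the documents, for example error responses filtered from the access logs.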
  3. Terminology
       • Anomaly
           • A deviation from normal behaviour
       • Machine Learning
           • Making predictions or decisions without being explicitly programmed to perform the task
       • Unsupervised Anomaly Detection
           • Searching an unlabeled data set for the instances that fit the rest of the data least, assuming that most of the data is normal
       • In our use case, we let the machine learn from the data and detect anomalies, but do not allow it to carry out any "smart" tasks based on them

     History of Detecting Streaming Issues
       [Timeline figure, 2016 to 2018 by quarter, milestones 01 to 06. Labels: Early days; Logging Trials; Elastic with Access Logs; Streaming Session; Notifications; Anomaly Detection Trials; Anomaly Detection in Action]
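To make the idea of unsupervised anomaly detection concrete, here is a minimal sketch that flags the values fitting the rest of an unlabeled series least, assuming most of the data is normal. It uses a modified z-score based on the median and the median absolute deviation (MAD). This is a simple statistical stand-in chosen for illustration, not the model used by Elastic's Machine Learning, and the sample error rates are invented.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    The median and MAD act as a robust estimate of "normal" behaviour,
    so a few extreme points do not distort the baseline.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, nothing can be scored
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Invented hourly error rates in percent; one bucket is clearly off.
error_rates = [0.8, 1.1, 0.9, 1.0, 1.2, 0.9, 7.5, 1.0, 1.1, 0.95]
anomalies = find_anomalies(error_rates)
print(anomalies)  # [6]
```

No labels are needed: the only assumption is that the bulk of the series represents normal operation.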
  4. Visual Dashboard - Incorrect caching configuration
       [Dashboard figure; annotations: Increasing daily error rate, Fix deployed, Reaction time]

     ML Detection Example - Broken Content
       • Fragmented MP4 asset
           • 1920x1080@7Mbps
           • 1280x720@4.5Mbps
           • 1024x576@2Mbps
           • 640x360@800kbps
           • 480x270@300kbps
       [Figure labels: Timeline, Timecode drift, ML Job Config]
  5. ML Detection Example - Network Issue
       [Figure labels: ML Job Config]

     ML Detection Example - RR Performance
       [Figure labels: ML Job Config, Production v1.0-52, Canary v1.0-53]
  6. Ask the Right Questions / Survivorship Bias
       Based on the server-side metrics, can we answer the following questions?
       • Is the CDN performing well?
       • Are the clients getting the best quality of experience?
       Image credit to Daniel G. Siegel https://www.dgsiegel.net/talks/the-bullet-hole-misconception

     ML Example - Anomalies from Client QoE data
       [Figure; not reproduced in the transcript]
  7. ML Example - How to get fooled
       [Figure labels: Anomaly, New normal]

     Key Takeaways
       • Focus on the Data
       • Logs
           • Usually made for humans to read
           • Log also the successful events
           • Do all the tricks like split, parse etc. before storing
       • Logging vs. Monitoring
           • Needless battle
       • Manual thresholds are still not outdated
       • Creating ML jobs is easy, but…
           • Understanding the events is sometimes really hard
           • Process to investigate all the anomalies
           • Enhance the data set
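The takeaway about logs (split and parse before storing, and log successful events too) can be sketched as follows. The log layout, field names and sample line are hypothetical, since the slides do not show the actual Elisa Viihde CDN log format. The point is only that each line becomes a structured document with typed fields before it is stored.

```python
import json
import re

# Hypothetical access-log layout:
#   <client_ip> [<timestamp>] "<method> <path>" <status> <bytes> <duration_ms>
LINE_RE = re.compile(
    r'(?P<client_ip>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) (?P<duration_ms>\d+)'
)


def parse_line(line):
    """Split one log line into typed fields, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # Store numbers as numbers so they can be aggregated later.
    for key in ("status", "bytes_sent", "duration_ms"):
        doc[key] = int(doc[key])
    # Successful requests are kept as well; they provide the baseline
    # against which failures are judged.
    doc["success"] = doc["status"] < 400
    return doc


sample = '192.0.2.10 [13/Dec/2018:10:15:00 +0200] "GET /vod/asset123/video_5.m4s" 200 524288 42'
doc = parse_line(sample)
print(json.dumps(doc, indent=2))
```

In an Elastic Stack setup this kind of parsing would more commonly be done in Logstash or an ingest pipeline; the Python version just shows the resulting document shape.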