SlideShare a Scribd company logo
1 of 18
Download to read offline
Anomaly Detection with Deep Learning
November 2017
1
Founded 2014
Funding $6.3M
OSS 3,700 Github Forks
300,000+ downloads/mo.
Team 35 employees; 25 engineers; 7 PhDs
SKYMIND OVERVIEW
OUR BOOK GIVEAWAY!
(pub. Aug. 2017)
SKIL スカイマインドスキル
Our Production Deep Learning Solution
ディープラーニングソリューション
confidential 4
• There are 5 main approaches to doing anomaly detection
• Probabilistic-based
• Distance-based
• Domain-based
• Reconstruction-based
• Information Theoretic-based
• All of these methods have some sort of drawback that prevents them from being
applicable to any type of data. They are either:
• Have built-in assumptions of the data (like Gaussian Mixture Models)
• Require specific domain knowledge
• Only detects certain patterns of anomalies
• Not suitable for data with high temporal dependencies
• Not suitable for multivariate data or computationally infeasible at our scale
• Multiple approaches are necessary to have a comprehensive detection pipeline
Anomaly Detection Approaches
アノマリー検出アプローチ
5
Anomaly Detection Example
アノマリー検出例
6
Cluster Based Methods1
クラスターベース方式1
Cluster based methods work
by creating a dictionary of non-
anomalous data and finding the
one that best matches the actual
data at inference time.
If the pattern was never seen
before the reconstruction will
be very different than the actual
data.
クラスタベース方式が機能する非アノマリー
データ辞書を作成し、推論時における実際
データに最も一致するものを見つける。
再構築前は従来絶対に見られなかったパ
ターンであれば、実際とは大きく異なるデータ
であるといえる。
confidential 7
Cluster Based Methods2
クラスターベース方式2
Data
データ
Best Reconstruction
最適な再生
Difference
違い
8
Join Raw data Transform
Feed groups into Autoencoder and save
reconstruction error of center
エンコーダへ送り込んで中心的再構築エラーを保
存
Input Data Reconstruction
Example workflow for anomaly detection
アノマリー検出のための作業例
9
Example of VAE detecting anomalies
VAE  (Variational Autoencoders 変分オートエンコーダー)検出アノマ
リー例
Low reconstruction error
低い再生エラー
High reconstruction error
低い再生エラー
confidential 10
Data Processing Step 3: Training
データプロセシング ステップ3:トレーニング
The autoencoder will be trained to recreate the input data as closely as possible
(one row at a time)
11
Data Processing Step 4: Ranking
データプロセシング ステップ4:ランキング
The trained autoencoder will then be run on the new data and we will store the error
(sum of mean squared difference per column per row) in a Ranking engine.
Unusual patterns will have a high reconstruction error
Output (Reconstruction)Input Data
Output (Reconstruction)
Reconstruction
-
Input
= 2 Error
Table
12
Problem: No Free lunch
課題:ノーフリーランチ定理
Wolpert’s no free lunch theorem states that there is no single machine learning algorithm that can perform
well on every task. Deep Learning is itself a set of very different techniques that are good or bad at various
problems.
The system will have to use various algorithms to detect different types of anomalies and possible different
root causes.
Class imbalance will be a problem at the beginning of the system’s lifetime. Labeled data will be overshadowed
by the unlabeled data and the Anomaly detectors will not be able to improve for some time. One possible
solution is to use Pseudo labeling to label all data from a trained classifier. These labels will be very noisy at
first so they might have to be deployed in stages.
confidential 13
• Systems need to be able to handle terabytes of data at minimum
• The system is required to train a large number of neural networks within a short
time-frame. GPU servers are a cost effective way to achieve the computation
resources necessary.
• Due to space considerations GPU servers are storage inefficient and the system
will employ the Hadoop File System and Spark on commodity servers to meet the
storage requirements.
• The system must scale to larger problems.
System Requirements
システム要求事項
confidential 14
Use kmeans and tsne highlighting clusters to label data points
• Uses the representation from the autoencoder to automatically group
data
• TSNE visualization allows highlighting and automatic labeling
• Use KNN and VPTrees to sample the hidden activations learned from the neural
net to interactively label
Automatic Labeling
confidential 15
• Autoencoders can be trained to identify causes of certain kinds of behavior
• “Spikes” in reconstruction error on time series can be used in detecting problems
in infrastructure as well as in network monitoring (dropped connections, unusually
high latency
• Use KNN and VPTrees to sample the hidden activations learned from the neural
net to interactively label
Root cause Analysis
confidential 16
• The system will not use a redundant environments since disaster recovery is not
a requirement.
• Inside an environment the system will have redundant hardware and can tolerate
the loss of one GPU node and one App node without service degradation.
• This system does not employ a remote backup strategy because all data is
ephemeral and can be recreated from the data on S3.
Design Considerations for Production
confidential 17
THANK YOU
ありがとうございました。
18

More Related Content

What's hot

Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
MLconf
 

What's hot (20)

Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvm
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round Table
 
First steps with Keras 2: A tutorial with Examples
First steps with Keras 2: A tutorial with ExamplesFirst steps with Keras 2: A tutorial with Examples
First steps with Keras 2: A tutorial with Examples
 
Dl4j in the wild
Dl4j in the wildDl4j in the wild
Dl4j in the wild
 
Keras: Deep Learning Library for Python
Keras: Deep Learning Library for PythonKeras: Deep Learning Library for Python
Keras: Deep Learning Library for Python
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
SKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetupSKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetup
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel Kobran
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
 
Java/Scala Lab 2016. Сергей Моренец: Способы повышения эффективности в Java 8.
Java/Scala Lab 2016. Сергей Моренец: Способы повышения эффективности в Java 8.Java/Scala Lab 2016. Сергей Моренец: Способы повышения эффективности в Java 8.
Java/Scala Lab 2016. Сергей Моренец: Способы повышения эффективности в Java 8.
 
Azure cognitive service
Azure cognitive serviceAzure cognitive service
Azure cognitive service
 
Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
Introduction To TensorFlow
Introduction To TensorFlowIntroduction To TensorFlow
Introduction To TensorFlow
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 

Similar to Anomaly Detection and Automatic Labeling with Deep Learning

Anomaly Detection Using the CLA
Anomaly Detection Using the CLAAnomaly Detection Using the CLA
Anomaly Detection Using the CLA
Numenta
 
From ensembles to computer networks
From ensembles to computer networksFrom ensembles to computer networks
From ensembles to computer networks
CSIRO
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
Yahoo Developer Network
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
Sanghamitra Deb
 

Similar to Anomaly Detection and Automatic Labeling with Deep Learning (20)

Inerview Quesion on Data Mining and Machine Learning
Inerview Quesion on Data Mining and Machine LearningInerview Quesion on Data Mining and Machine Learning
Inerview Quesion on Data Mining and Machine Learning
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Anomaly Detection Using the CLA
Anomaly Detection Using the CLAAnomaly Detection Using the CLA
Anomaly Detection Using the CLA
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learn
 
From ensembles to computer networks
From ensembles to computer networksFrom ensembles to computer networks
From ensembles to computer networks
 
Machine learning and Autonomous System
Machine learning and Autonomous SystemMachine learning and Autonomous System
Machine learning and Autonomous System
 
Outsourced database
Outsourced databaseOutsourced database
Outsourced database
 
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Foutse_Khomh.pptx
Foutse_Khomh.pptxFoutse_Khomh.pptx
Foutse_Khomh.pptx
 
Data Cleaning Techniques
Data Cleaning TechniquesData Cleaning Techniques
Data Cleaning Techniques
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
Machine Learning Application Development
Machine Learning Application DevelopmentMachine Learning Application Development
Machine Learning Application Development
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
Webinar: Replication and Replica Sets
Webinar: Replication and Replica SetsWebinar: Replication and Replica Sets
Webinar: Replication and Replica Sets
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 

More from Adam Gibson

More from Adam Gibson (18)

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentation
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptx
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextML
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
 

Recently uploaded

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Recently uploaded (20)

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 

Anomaly Detection and Automatic Labeling with Deep Learning

  • 1. Anomaly Detection with Deep Learning November 2017 1
  • 2. Founded 2014 Funding $6.3M OSS 3,700 Github Forks 300,000+ downloads/mo. Team 35 employees; 25 engineers; 7 PhDs SKYMIND OVERVIEW
  • 4. SKIL スカイマインドスキル Our Production Deep Learning Solution ディープラーニングソリューション confidential 4
  • 5. • There are 5 main approaches to doing anomaly detection • Probabilistic-based • Distance-based • Domain-based • Reconstruction-based • Information Theoretic-based • All of these methods have some sort of drawback that prevents them from being applicable to any type of data. They are either: • Have built-in assumptions of the data (like Gaussian Mixture Models) • Require specific domain knowledge • Only detects certain patterns of anomalies • Not suitable for data with high temporal dependencies • Not suitable for multivariate data or computationally infeasible at our scale • Multiple approaches are necessary to have a comprehensive detection pipeline Anomaly Detection Approaches アノマリー検出アプローチ 5
  • 7. Cluster Based Methods1 クラスターベース方式1 Cluster based methods work by creating a dictionary of non- anomalous data and finding the one that best matches the actual data at inference time. If the pattern was never seen before the reconstruction will be very different than the actual data. クラスタベース方式が機能する非アノマリー データ辞書を作成し、推論時における実際 データに最も一致するものを見つける。 再構築前は従来絶対に見られなかったパ ターンであれば、実際とは大きく異なるデータ であるといえる。 confidential 7
  • 8. Cluster Based Methods2 クラスターベース方式2 Data データ Best Reconstruction 最適な再生 Difference 違い 8
  • 9. Join Raw data Transform Feed groups into Autoencoder and save reconstruction error of center エンコーダへ送り込んで中心的再構築エラーを保 存 Input Data Reconstruction Example workflow for anomaly detection アノマリー検出のための作業例 9
  • 10. Example of VAE detecting anomalies VAE  (Variational Autoencoders 変分オートエンコーダー)検出アノマ リー例 Low reconstruction error 低い再生エラー High reconstruction error 低い再生エラー confidential 10
  • 11. Data Processing Step 3: Training データプロセシング ステップ3:トレーニング The autoencoder will be trained to recreate the input data as closely as possible (one row at a time) 11
  • 12. Data Processing Step 4: Ranking データプロセシング ステップ4:ランキング The trained autoencoder will then be run on the new data and we will store the error (sum of mean squared difference per column per row) in a Ranking engine. Unusual patterns will have a high reconstruction error Output (Reconstruction)Input Data Output (Reconstruction) Reconstruction - Input = 2 Error Table 12
  • 13. Problem: No Free lunch 課題:ノーフリーランチ定理 Wolpert’s no free lunch theorem states that there is no single machine learning algorithm that can perform well on every task. Deep Learning is itself a set of very different techniques that are good or bad at various problems. The system will have to use various algorithms to detect different types of anomalies and possible different root causes. Class imbalance will be a problem at the beginning of the system’s lifetime. Labeled data will be overshadowed by the unlabeled data and the Anomaly detectors will not be able to improve for some time. One possible solution is to use Pseudo labeling to label all data from a trained classifier. These labels will be very noisy at first so they might have to be deployed in stages. confidential 13
  • 14. • Systems need to be able to handle terabytes of data at minimum • The system is required to train a large number of neural networks within a short time-frame. GPU servers are a cost effective way to achieve the computation resources necessary. • Due to space considerations GPU servers are storage inefficient and the system will employ the Hadoop File System and Spark on commodity servers to meet the storage requirements. • The system must scale to larger problems. System Requirements システム要求事項 confidential 14
  • 15. Use kmeans and tsne highlighting clusters to label data points • Uses the representation from the autoencoder to automatically group data • TSNE visualization allows highlighting and automatic labeling • Use KNN and VPTrees to sample the hidden activations learned from the neural net to interactively label Automatic Labeling confidential 15
  • 16. • Autoencoders can be trained to identify causes of certain kinds of behavior • “Spikes” in reconstruction error on time series can be used in detecting problems in infrastructure as well as in network monitoring (dropped connections, unusually high latency • Use KNN and VPTrees to sample the hidden activations learned from the neural net to interactively label Root cause Analysis confidential 16
  • 17. • The system will not use a redundant environments since disaster recovery is not a requirement. • Inside an environment the system will have redundant hardware and can tolerate the loss of one GPU node and one App node without service degradation. • This system does not employ a remote backup strategy because all data is ephemeral and can be recreated from the data on S3. Design Considerations for Production confidential 17