Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
Machine Learning School in The Netherlands 2022.
14. BigML, Inc #DutchMLSchool 11
Identify examples that are different from the rest of the dataset
• Dataset cleaning: Remove instances that are not representative of your dataset
• Exploring Data: Identifying outlier situations in your dataset
• Classification: working with very unbalanced data or uncertainty
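As a minimal sketch of what "different from the rest" can mean, the snippet below scores a single numeric feature by its distance from the median, measured in MAD (median absolute deviation) units. The robust z-score, the sample readings, and the cutoff of 5.0 are illustrative assumptions, not the method used in the talk.

```python
import statistics

def robust_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # All values are (almost) identical: nothing stands out.
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]

# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
scores = robust_scores(readings)

# The cutoff of 5.0 is an arbitrary choice for this example.
outliers = [v for v, s in zip(readings, scores) if s > 5.0]
print(outliers)
```

The same scores can serve all three uses listed above: dropping the outliers cleans the dataset, inspecting them explores it, and thresholding them gives a rough classifier for very unbalanced data.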
Anomaly Detectors for Classification
• Fraud Detection: Detecting money laundering and fraud in bank transactions
• Intrusion Detection: Detecting unexpected events in network traffic
• Quality Control: Detecting failures in manufacturing processes
Design
Domain
Training
Operation
• You don’t need Anomaly Detection
• Divide and Conquer
• Data Cleaning for free
• Automatic Experts
• Missing the forest for the trees
• Features, Features, Features
• Set up a Feedback Loop
• Customize Thresholds
• You can’t evaluate it!
• Adapt to new times
• Size matters
Lesson 1: You don’t need Anomaly Detection
• You are interested in the unusual case
• You have very few examples of the interesting class
• You can’t have fully labeled datasets
• The class shows unexpected behaviors
When possible, other methods give better control over performance
Lesson 2: Divide and Conquer
• Identify the different tasks in your problem domain
• Build a model trained with data from that specific task
• Even with different features
• Each model will be easier to track and reason about
Prefer multiple simpler models over a single complex one
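One way to picture this is a dictionary holding one small detector per task. The sketch below assumes a single numeric feature per record and reuses a median/MAD score as a stand-in detector; the task names and values are hypothetical.

```python
import statistics
from collections import defaultdict

class TaskDetector:
    """Tiny stand-in detector: distance from the task's median in MAD units."""

    def __init__(self, values):
        self.median = statistics.median(values)
        deviations = [abs(v - self.median) for v in values]
        # Fall back to a tiny value so that scoring never divides by zero.
        self.mad = statistics.median(deviations) or 1e-9

    def score(self, value):
        return abs(value - self.median) / self.mad

def train_per_task(rows):
    """Group (task, value) rows by task and train one detector for each."""
    grouped = defaultdict(list)
    for task, value in rows:
        grouped[task].append(value)
    return {task: TaskDetector(values) for task, values in grouped.items()}

# Hypothetical history for two tasks with very different normal ranges.
history = [
    ("spot_weld", 5.0), ("spot_weld", 5.2), ("spot_weld", 4.9),
    ("seam_weld", 50.0), ("seam_weld", 51.0), ("seam_weld", 49.5),
]
detectors = train_per_task(history)

# 50.0 is ordinary for seam_weld but far from anything seen for spot_weld.
seam_score = detectors["seam_weld"].score(50.0)
spot_score = detectors["spot_weld"].score(50.0)
print(seam_score, spot_score)
```

A single model trained on both tasks would have to treat 5.0 and 50.0 as equally normal; the per-task models keep each notion of "normal" narrow and easier to reason about.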
Lesson 3: Data cleaning for free
• Find issues in your data pipelines
• Have a fast feedback loop for reporting issues
• Have an off-switch when those issues are detected
Anomaly Detectors will find data issues
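The off-switch can be sketched as a small circuit breaker: when the share of anomalies in a recent window is implausibly high, the likelier explanation is a broken pipeline, so alerts are muted until someone resets them. The class name, window size, and rate limit below are assumptions made for illustration.

```python
from collections import deque

class AlertSwitch:
    """Mute alerts when the recent anomaly rate suggests a data issue."""

    def __init__(self, window=100, max_rate=0.2):
        self.recent = deque(maxlen=window)
        self.max_rate = max_rate
        self.enabled = True

    def observe(self, is_anomaly):
        """Record one prediction and return True if an alert should be sent."""
        self.recent.append(bool(is_anomaly))
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) > self.max_rate:
            self.enabled = False  # stays off until reset() is called
        return self.enabled and bool(is_anomaly)

    def reset(self):
        """Re-enable alerts once the data issue has been fixed."""
        self.recent.clear()
        self.enabled = True

switch = AlertSwitch(window=10, max_rate=0.3)
quiet = [switch.observe(False) for _ in range(10)]  # normal traffic
burst = [switch.observe(True) for _ in range(4)]    # sudden run of anomalies
print(burst, switch.enabled)
```

In this sketch the switch only trips once the window is full, so a handful of early anomalies cannot disable alerting on their own.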
Lesson 4: Automatic Experts
Use expert rules to check predictions
• A combination of anomalies and rules will yield the best results
• You can automate your rules with a model
• Train a model to detect False Positives
• This requires even more data
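A minimal sketch of combining anomalies with expert rules: an alert is raised only when the score is high and no rule explains the anomaly away. The rule names, record fields, and the 0.6 threshold are hypothetical.

```python
def check_prediction(score, record, expert_rules, threshold=0.6):
    """Return (alert, reasons).

    alert is True only when the record is anomalous and no expert rule
    explains it; reasons lists the rules that matched.
    """
    if score < threshold:
        return False, []
    reasons = [name for name, rule in expert_rules.items() if rule(record)]
    return (not reasons), reasons

# Hypothetical rules written down by domain experts.
expert_rules = {
    "machine_in_maintenance": lambda r: r.get("maintenance", False),
    "sensor_offline": lambda r: r.get("current") is None,
}

print(check_prediction(0.9, {"current": 12.0}, expert_rules))
print(check_prediction(0.9, {"current": 12.0, "maintenance": True}, expert_rules))
```

Once enough reviewed alerts have been collected, the hand-written rules could be replaced by a model trained to detect false positives, which is the step that requires more data.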
Lesson 5: Don’t miss the forest for the trees
Anomaly patterns contain very interesting information!
• Looking at how anomalies occur over time can reveal very useful information
• Random failures → Random anomalies
• Significant events → Groups of anomalies
• Macro Alerts
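A macro alert can be sketched as a sliding window over anomaly timestamps: isolated anomalies pass silently, while a cluster inside the window raises a single higher-level alert. The window length and minimum count are assumptions chosen for the example.

```python
from collections import deque

def macro_alerts(anomaly_times, window=60.0, min_count=5):
    """Return the times at which at least `min_count` anomalies
    fall within the last `window` seconds."""
    recent = deque()
    alerts = []
    for t in sorted(anomaly_times):
        recent.append(t)
        # Drop anomalies that are older than the window.
        while t - recent[0] > window:
            recent.popleft()
        if len(recent) >= min_count:
            alerts.append(t)
    return alerts

# Hypothetical timestamps in seconds: scattered failures plus one burst.
scattered = [0, 500, 1200, 2500, 4000]
burst = [3000, 3005, 3010, 3020, 3030]
print(macro_alerts(scattered))
print(macro_alerts(scattered + burst))
```

The scattered anomalies never trigger the macro alert, while the burst does, which matches the distinction between random failures and significant events.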
Lesson 6: Features, Features, Features
Use the features to tune model behavior
• Feature engineering and selection are one of the two main ways to tune model behavior
• Keep the number of features to a minimum
• Keep it explainable
Lesson 7: Set up a Feedback Loop
Set up a feedback loop for tuning model behavior
• Keep a database of predictions and outcomes
• Usually requires human inspection
• Monitor performance of the models for tuning
• With BigML*, you can update your models with new data
* currently only available in private deployments
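The database of predictions and outcomes can be as simple as one table. The sketch below uses an in-memory SQLite database; the table layout, column names, and model name are assumptions for illustration and are not part of BigML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE predictions (
           id      INTEGER PRIMARY KEY,
           model   TEXT NOT NULL,
           score   REAL NOT NULL,
           flagged INTEGER NOT NULL,
           outcome TEXT          -- NULL until a human has inspected it
       )"""
)

def log_prediction(model, score, threshold):
    """Store one prediction and return its id."""
    cur = conn.execute(
        "INSERT INTO predictions (model, score, flagged) VALUES (?, ?, ?)",
        (model, score, int(score >= threshold)),
    )
    return cur.lastrowid

def record_outcome(pred_id, outcome):
    """Attach the result of a human inspection to a stored prediction."""
    conn.execute("UPDATE predictions SET outcome = ? WHERE id = ?", (outcome, pred_id))

def confirmed_rate(model):
    """Share of inspected, flagged predictions that were real failures."""
    row = conn.execute(
        "SELECT AVG(outcome = 'failure') FROM predictions "
        "WHERE model = ? AND flagged = 1 AND outcome IS NOT NULL",
        (model,),
    ).fetchone()
    return row[0]

first = log_prediction("welder_7", 0.91, threshold=0.6)
second = log_prediction("welder_7", 0.75, threshold=0.6)
log_prediction("welder_7", 0.10, threshold=0.6)
record_outcome(first, "failure")
record_outcome(second, "ok")
print(confirmed_rate("welder_7"))
```

Tracking the confirmed rate per model over time gives a simple signal for deciding which models need tuning or retraining.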
Lesson 8: Customize Thresholds
Customize the thresholds to your requirements and data
• Anomaly can be a fuzzy and subjective concept
• Use multiple thresholds
• Low, Medium and High
• Use dynamic thresholds
• Data driven
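Multiple, data-driven thresholds can be sketched by taking percentiles of recently observed scores instead of fixing cutoffs by hand. The percentile choices (90, 97, 99) and the function names are assumptions for this example.

```python
def percentile(ordered_scores, pct):
    """Value at the given integer percentile of an already sorted list."""
    idx = min(len(ordered_scores) - 1, len(ordered_scores) * pct // 100)
    return ordered_scores[idx]

def dynamic_thresholds(recent_scores, low_pct=90, medium_pct=97, high_pct=99):
    """Derive Low/Medium/High cutoffs from the recent score distribution."""
    ordered = sorted(recent_scores)
    return {
        "low": percentile(ordered, low_pct),
        "medium": percentile(ordered, medium_pct),
        "high": percentile(ordered, high_pct),
    }

def severity(score, thresholds):
    """Return the highest level whose cutoff the score reaches, or None."""
    for level in ("high", "medium", "low"):
        if score >= thresholds[level]:
            return level
    return None

# Hypothetical recent anomaly scores: 0.00, 0.01, ..., 0.99.
recent_scores = [i / 100 for i in range(100)]
thresholds = dynamic_thresholds(recent_scores)
print(thresholds, severity(0.98, thresholds))
```

Recomputing the thresholds on a rolling window lets them follow the data, so each severity level keeps flagging roughly the same share of records.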
Lesson 9: You can’t evaluate it!
Evaluating these models is complicated
• Evaluation will not be that simple
• Lack of information
• Macro events affect the individual performance
• Precision and Recall don’t translate so well to these kinds of problems
• Find some useful and realistic evaluation metrics
• Indirect metrics, business metrics (e.g., recall rates on cars)
• Manual exploration of random samples of data
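Manual exploration of random samples can be sketched as drawing a fixed number of records from both the flagged and the non-flagged side, so reviewers see what the model raises and what it lets through. The record layout, sample size, and threshold are hypothetical.

```python
import random

def review_sample(records, threshold, per_group=20, seed=0):
    """Draw random flagged and non-flagged records for manual review."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    flagged = [r for r in records if r["score"] >= threshold]
    passed = [r for r in records if r["score"] < threshold]
    return {
        "flagged": rng.sample(flagged, min(per_group, len(flagged))),
        "passed": rng.sample(passed, min(per_group, len(passed))),
    }

def confirmed_share(labels):
    """Share of reviewed records that a human marked as real failures."""
    return sum(labels) / len(labels) if labels else None

# Hypothetical scored records.
records = [{"id": i, "score": i / 1000} for i in range(1000)]
sample = review_sample(records, threshold=0.95, per_group=20)
print(len(sample["flagged"]), len(sample["passed"]))
```

The share confirmed among the flagged sample gives a rough precision estimate, and the share found among the passed sample hints at what is being missed. Both are estimates from small samples, not exact metrics.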
Lesson 10: Adapt to new times
Anomaly Detectors are very sensitive to changes in the working conditions
• Model quality will deteriorate over time faster than with other models
• Monitor the model performance
• Retrain often
• Find change indicators and disable proactively
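One simple change indicator is how far the recent mean of an input feature has moved from its training mean, measured in training standard deviations. This is only a sketch of the idea; the feature values and the limit of 3.0 are assumptions.

```python
import statistics

def drift_indicator(training_values, recent_values):
    """Shift of the recent mean from the training mean,
    in units of the training standard deviation."""
    mu = statistics.fmean(training_values)
    sigma = statistics.stdev(training_values)
    shift = abs(statistics.fmean(recent_values) - mu)
    if sigma == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / sigma

def needs_retraining(training_values, recent_values, limit=3.0):
    """Flag the model for retraining (or disabling) when drift is large."""
    return drift_indicator(training_values, recent_values) > limit

# Hypothetical feature values seen at training time and in production.
training = [10.0, 10.2, 9.8, 10.1, 9.9]
stable = [10.0, 10.1, 9.9]
shifted = [12.0, 12.1, 11.9]
print(needs_retraining(training, stable), needs_retraining(training, shifted))
```

A check like this can run before predictions are served, so a model can be disabled proactively instead of after its alerts have already degraded.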
Lesson 11: Size matters
Storage and caching for Anomaly Detectors are key for fast predictions
• You will be deploying a lot of these models
• Anomaly Detectors can be heavy
• Efficient storage, transport and loading will be key
• Near real time scenarios and distributed prediction
• Use efficient representations of the models
• Minomaly
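When many heavy models are deployed, a small least-recently-used cache avoids reloading a model for every prediction. The sketch below is a generic cache, not BigML's storage layer or Minomaly; the `loader` callable and the capacity are hypothetical.

```python
from collections import OrderedDict

class ModelCache:
    """Keep the most recently used models in memory, up to `capacity`."""

    def __init__(self, loader, capacity=2):
        self.loader = loader          # callable: model_id -> loaded model
        self.capacity = capacity
        self.loads = 0                # number of (slow) loads performed
        self._models = OrderedDict()

    def get(self, model_id):
        if model_id in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(model_id)
            return self._models[model_id]
        model = self.loader(model_id)
        self.loads += 1
        self._models[model_id] = model
        if len(self._models) > self.capacity:
            # Evict the least recently used model.
            self._models.popitem(last=False)
        return model

# Hypothetical loader standing in for reading a model from storage.
cache = ModelCache(loader=lambda model_id: {"id": model_id}, capacity=2)
for model_id in ["a", "b", "a", "c", "b"]:
    cache.get(model_id)
print(cache.loads)
```

The smaller each model's representation, the more models fit in the same cache, which is where efficient model formats pay off.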
BODY SHOP PAINT SHOP ASSEMBLY SHOP
Cost of fixing a welding failure: $
Cost of fixing a welding failure: $$$
From Data to Real Time Alerts
Using ML to improve Quality Control
Detect as many failures as possible, while keeping a manageable number of car extractions
A case for Anomaly Detectors
• Large amounts of data processed in real time
• 300,000 welds / day
• Extremely few examples of failures
• 1 in 15,000 welds fails (about 0.0067%)
• Failures can have unexpected shapes
• Hundreds of machines doing slightly different tasks
• Unreliable labels
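The figures above imply about 20 failed welds per day. The short calculation below makes that explicit and shows why false positives dominate at this base rate. The recall and false positive rate passed in are hypothetical values chosen only to illustrate the arithmetic; they are not results from the talk.

```python
def daily_numbers(welds_per_day, welds_per_failure, recall, false_positive_rate):
    """Expected failures, alerts, and alert precision per day."""
    failures = welds_per_day / welds_per_failure
    normal = welds_per_day - failures
    true_alerts = failures * recall
    false_alerts = normal * false_positive_rate
    alerts = true_alerts + false_alerts
    precision = true_alerts / alerts
    return failures, alerts, precision

# 300,000 welds/day and 1 failure in 15,000 come from the slides;
# recall=0.8 and false_positive_rate=0.001 are assumed for illustration.
failures, alerts, precision = daily_numbers(
    welds_per_day=300_000,
    welds_per_failure=15_000,
    recall=0.8,
    false_positive_rate=0.001,
)
print(failures, round(alerts), round(precision, 3))
```

Under these assumed rates, a detector that wrongly flags only 0.1% of good welds would still raise roughly 316 alerts a day, of which about 5% are real failures. That is the tension between detecting as many failures as possible and keeping car extractions manageable.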