July 4 - 6, 2022
2nd Edition
BigML, Inc #DutchMLSchool
Anomaly Detection at Scale
Lessons Learned deploying thousands of Anomaly Detectors
Alvaro Clemente
Machine Learning Engineer, BigML
Agenda
1. Anomaly Detection Primer
2. Lessons
3. Conclusion
Anomaly Detection in a Nutshell
Identify the anomalies
[Figure sequence: numbered items, first 1 to 9 and then 1 to 15; the final slides answer "YES *"]
Identify examples that are different from the rest of the dataset
• Dataset cleaning: Remove instances that are not representative of your dataset
• Exploring data: Identify outlier situations in your dataset
• Classification: Work with very unbalanced data or uncertainty
Anomaly Detectors for Classification
• Fraud Detection: Detecting money laundering and fraud in bank transactions
• Intrusion Detection: Detecting unexpected events in network traffic
• Quality Control: Detecting failures in manufacturing processes
Lessons
Design
• You don’t need Anomaly Detection
• Divide and Conquer
Domain
• Data Cleaning for free
• Automatic Experts
• Missing the forest for the trees
Training
• Features, Features, Features
• Set Up a Feedback Loop
• Customize Thresholds
Operation
• You can’t evaluate it!
• Adapt to new times
• Size matters
Lesson 1: You don’t need Anomaly Detection
Design
• You are interested in the unusual case
• You have very few examples of the interesting class
• You can’t have fully labeled datasets
• The class shows unexpected behaviors
When possible, other methods will give better control over the performance
Lesson 2: Divide and Conquer
Design
BigML, Inc #DutchMLSchool 18
Lesson 2: Divide and Conquer
• Identify the different tasks in your problem domain


• Build a model trained with data from that specific
task


• Even with different features
• Each model will be easier to track and reason about
Prefer multiple simpler models over a single complex one
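As a rough illustration of the divide-and-conquer idea, the sketch below fits one very simple detector per task instead of a single global one. The mean/standard-deviation scorer is only a stand-in for a real anomaly detector, and the task names and readings are invented for the example.

```python
from collections import defaultdict
from statistics import mean, stdev

def train_per_task(rows):
    """Fit one tiny detector (mean and standard deviation) per task."""
    by_task = defaultdict(list)
    for task, value in rows:
        by_task[task].append(value)
    return {task: (mean(vals), stdev(vals)) for task, vals in by_task.items()}

def score(models, task, value):
    """Return how many standard deviations the value is from its task's mean."""
    mu, sigma = models[task]
    return abs(value - mu) / sigma if sigma > 0 else 0.0

# Hypothetical readings: (task name, measured value).
rows = [("spot_weld", 10.0), ("spot_weld", 10.2), ("spot_weld", 9.8),
        ("seam_weld", 50.0), ("seam_weld", 51.0), ("seam_weld", 49.0)]
models = train_per_task(rows)

# The same reading is normal for one task and highly unusual for the other.
print(score(models, "seam_weld", 50.0))
print(score(models, "spot_weld", 50.0))
```

A single model trained on both tasks together would see two clusters and could treat a value of 50.0 as ordinary everywhere; keeping the models separate makes each one easier to reason about.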
Lesson 3: Data cleaning for free
Domain
• Find issues in your data pipelines
• Have a fast feedback loop for reporting issues
• Have an off-switch when those issues are detected
Anomaly Detectors will find data issues
Lesson 4: Automatic Experts
Domain
Use expert rules to check predictions
• A combination of anomalies and rules will yield the best results
• You can automate your rules with a model
• Train a model to detect False Positives
• This requires even more data
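One way to picture the combination of anomaly scores and expert rules is a small filter in which a rule can veto an alert. This is a minimal sketch under assumptions: the threshold value, the record fields, and the `during_maintenance` rule are all hypothetical.

```python
def is_alert(score, record, threshold=0.6, rules=()):
    """Flag a record only if its score is high and no expert rule vetoes it."""
    if score < threshold:
        return False
    return not any(rule(record) for rule in rules)

# Hypothetical expert rule: readings taken during maintenance are known false positives.
def during_maintenance(record):
    return record.get("maintenance", False)

rules = (during_maintenance,)
print(is_alert(0.9, {"maintenance": True}, rules=rules))   # vetoed by the rule
print(is_alert(0.9, {"maintenance": False}, rules=rules))  # raised as an alert
```

Once enough reviewed alerts are available, a hand-written rule like this could be replaced by a model trained to recognize false positives, as the lesson suggests.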
Lesson 5: Don’t miss the forest for the trees
Domain
Anomaly patterns contain very interesting information!
• Looking at how anomalies occur over time can reveal very useful information
• Random failures → Random anomalies
• Significant events → Groups of anomalies
• Macro Alerts
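The difference between isolated anomalies and groups of anomalies can be sketched with a sliding time window: a macro alert fires only when several anomalies land close together. The window length and minimum count below are illustrative assumptions, not values from the talk.

```python
from collections import deque

def macro_alerts(timestamps, window=60.0, min_count=3):
    """Return the timestamps at which at least `min_count` anomalies
    fall inside the trailing `window` (same time unit as the timestamps)."""
    recent = deque()
    alerts = []
    for t in sorted(timestamps):
        recent.append(t)
        # Drop anomalies that are older than the trailing window.
        while recent[0] < t - window:
            recent.popleft()
        if len(recent) >= min_count:
            alerts.append(t)
    return alerts

# Three scattered anomalies, then a burst of three within 20 time units.
print(macro_alerts([0, 500, 1000, 1010, 1020, 2000]))
```

Scattered anomalies never fill the window, so only the burst produces a macro alert.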
Lesson 6: Features, Features, Features
Training
Use the features to tune model behavior
• Feature engineering and selection are one of the two main ways to tune the model behavior
• Keep the number of features to a minimum
• Keep it explainable
Lesson 7: Set Up a Feedback Loop
Training
Set up a feedback loop for tuning model behavior
• Keep a database of predictions and outcomes
• Usually requires human inspection
• Monitor performance of the models for tuning
• With BigML*, you can update your models with new data
* currently only available in private deployments
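A database of predictions and outcomes can start very small. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout, the outcome labels, and the helper names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    score REAL NOT NULL,
    outcome TEXT              -- NULL until a human reviews it
)""")

def log_prediction(model, score):
    """Store a prediction and return its row id."""
    cur = conn.execute(
        "INSERT INTO predictions (model, score) VALUES (?, ?)", (model, score))
    return cur.lastrowid

def record_outcome(pred_id, outcome):
    """Attach the reviewer's verdict to a stored prediction."""
    conn.execute(
        "UPDATE predictions SET outcome = ? WHERE id = ?", (outcome, pred_id))

def confirmed_rate(model):
    """Share of reviewed predictions for `model` confirmed as real anomalies."""
    row = conn.execute(
        "SELECT AVG(outcome = 'confirmed') FROM predictions "
        "WHERE model = ? AND outcome IS NOT NULL", (model,)).fetchone()
    return row[0]

first = log_prediction("welder-17", 0.91)
second = log_prediction("welder-17", 0.84)
log_prediction("welder-17", 0.77)          # not reviewed yet
record_outcome(first, "confirmed")
record_outcome(second, "false_positive")
print(confirmed_rate("welder-17"))
```

Unreviewed rows are excluded from the rate, which keeps the metric honest while human inspection lags behind the predictions.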
Lesson 8: Customize Thresholds
Training
Customize the thresholds to your requirements and data
• Anomaly can be a fuzzy and subjective concept
• Use multiple thresholds: low, medium, and high
• Use dynamic, data-driven thresholds
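Multiple, data-driven thresholds can be derived from the observed score distribution rather than fixed by hand. The sketch below takes the 90th, 95th, and 99th percentiles as the low, medium, and high cut-offs; those percentiles are an assumption chosen for the example and would need tuning to real requirements.

```python
from statistics import quantiles

def fit_thresholds(scores):
    """Derive low/medium/high cut-offs from the score distribution."""
    cuts = quantiles(scores, n=100, method="inclusive")  # 99 percentile cut points
    return {"low": cuts[89], "medium": cuts[94], "high": cuts[98]}  # 90th, 95th, 99th

def severity(score, thresholds):
    """Map a score to the highest severity level whose cut-off it reaches."""
    for level in ("high", "medium", "low"):
        if score >= thresholds[level]:
            return level
    return "normal"

# Hypothetical history of anomaly scores, evenly spread between 0 and 1.
history = [i / 1000 for i in range(1001)]
thresholds = fit_thresholds(history)
print(thresholds)
print(severity(0.97, thresholds))
```

Refitting the thresholds on a recent window of scores makes them dynamic: the cut-offs move as the data changes.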
Lesson 9: You can’t evaluate it!
Operation
Evaluating these models is complicated
• Evaluation will not be that simple
• Lack of information
• Macro events affect the individual performance
• Precision and Recall don’t translate so well to these kinds of problems
• Find some useful and realistic evaluation metrics
• Indirect metrics, business metrics (e.g., recall rates on cars)
• Manual exploration of random samples of data
Lesson 10: Adapt to new times
Operation
Anomaly Detectors are very sensitive to changes in the working conditions
• Model quality will deteriorate over time faster than with other models
• Monitor the model performance
• Retrain often
• Find change indicators and disable proactively
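One possible change indicator is the anomaly rate itself: if a detector suddenly flags far more cases than it did at training time, the working conditions have probably shifted. The sketch below is a minimal version of that check; the ratio and minimum sample size are assumed values, not recommendations from the talk.

```python
def should_disable(baseline_rate, recent_flags, max_ratio=3.0, min_samples=100):
    """Return True when the recent anomaly rate is far above the baseline rate."""
    if len(recent_flags) < min_samples:
        return False  # not enough evidence yet
    recent_rate = sum(recent_flags) / len(recent_flags)
    return recent_rate > max_ratio * baseline_rate

# Hypothetical recent window: 10 flagged predictions out of 200 (5%),
# against a 1% anomaly rate observed when the model was trained.
recent = [True] * 10 + [False] * 190
print(should_disable(0.01, recent))
```

A detector that trips this check can be switched off and queued for retraining instead of flooding operators with alerts.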
Lesson 11: Size matters
Operation
Storage and caching for Anomaly Detectors is key for fast predictions
• You will be deploying a lot of these models
• Anomaly Detectors can be heavy
• Efficient storage, transport and loading will be key
• Near real time scenarios and distributed prediction
• Use efficient representations of the models
• Minomaly
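Caching loaded models is one simple way to keep predictions fast when many detectors are deployed. The sketch below uses `functools.lru_cache` so that each model is deserialized only once; the in-memory `MODEL_STORE`, the model id, and the JSON layout are hypothetical and say nothing about the Minomaly format.

```python
import json
from functools import lru_cache

# Hypothetical store: model id -> compact JSON. In practice this might be
# a file system or an object store.
MODEL_STORE = {"welder-17": '{"mean": 10.0, "std": 0.2}'}
LOADS = []  # records every real load, to show the cache working

@lru_cache(maxsize=256)
def load_model(model_id):
    """Deserialize a model once and keep it in a bounded in-memory cache."""
    LOADS.append(model_id)
    return json.loads(MODEL_STORE[model_id])

load_model("welder-17")
load_model("welder-17")  # served from the cache, no second load
print(LOADS)
```

The `maxsize` bound matters when models are heavy: the cache evicts the least recently used models instead of growing without limit.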
Conclusion
Solve problems that are impractical with traditional methods
Discover unexpected facts about your systems
Q & A
MLToolbox Context
From Data to Real Time Alerts
Using ML to improve Quality Control
[Figure: production line with BODY SHOP, PAINT SHOP, and ASSEMBLY SHOP; labels "Cost of fixing a welding failure: $" and "Cost of fixing a welding failure: $$$"]
Detect as many failures as possible, while keeping a manageable number of car extractions
A case for Anomaly Detectors
• Large amounts of data processed in real time: 300,000 welds / day
• Extremely few examples of failures: 1 in 15,000 welds fails (0.0067%!)
• Failures can have unexpected shapes
• Hundreds of machines doing slightly different tasks
• Unreliable labels
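The figures on this slide can be checked with a few lines of arithmetic. The sketch below reproduces the failure percentage and the expected number of failures per day; the 0.1% false positive rate at the end is a purely hypothetical value, added only to show why the number of car extractions has to be kept manageable.

```python
from fractions import Fraction

welds_per_day = 300_000
failure_rate = Fraction(1, 15_000)

expected_failures_per_day = welds_per_day * failure_rate   # 20 failures a day
failure_percent = float(failure_rate * 100)                # about 0.0067

# Hypothetical: a detector that flags 0.1% of good welds would raise roughly
# 300 false alarms a day against about 20 real failures.
assumed_false_positive_rate = Fraction(1, 1000)
false_alarms_per_day = (
    (welds_per_day - expected_failures_per_day) * assumed_false_positive_rate)

print(expected_failures_per_day, round(failure_percent, 4), float(false_alarms_per_day))
```

Even a seemingly small false positive rate produces far more alarms than real failures at this volume, which is why thresholds and feedback matter so much in this setting.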
