Monitoring Models in Production
Keeping track of complex models in a complex world
Jannes Klaas
About me
• International Business @ RSM
• Financial Economics @ Oxford Saïd
• Course developer machine learning @ Turing Society
• Author “Machine Learning for Finance” out in July
• ML consultant non-profits / impact investors
• Prev. Urban Planning @ IHS Rotterdam & Destroyer of my Startup
The life and times of an ML practitioner
“We send you the data, you send us back a model, then we take it from there” – Consulting Clients
“Define an approach, evaluate on common benchmark and publish” – Academia
Repeat after me
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
Machine learning 101
Estimate some function y = f(x) using (x, y) pairs
Estimated function hopefully represents the true relationship between x and y
Model is a function of the data
Problems you encounter in production
• The world changes, your training data might no longer depict the real world
• Your model inputs might change
• There might be unintended bugs and side effects in complex models
• Models influence the world they try to model
• Model decay: Your model usually becomes worse over time
Are models a liability after shipping?
No, the real world is the perfect training environment
Datasets are only an approximation of the real world
Active learning on real world examples can greatly reduce your data needs
Online learning
• Update model continuously as new data streams in
• Good if you have continuous stream of ground truth as well
• Needs more monitoring to ensure model does not go off track
• Can be expensive for big models
• Might need separate training / inference hardware
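A minimal sketch of the continuous-update idea above, assuming a binary classifier trained one example at a time with stochastic gradient descent. The `OnlineLogisticRegression` class, the learning rate, and the toy stream are illustrative assumptions, not part of the talk.

```python
import math


class OnlineLogisticRegression:
    """Binary logistic regression updated one example at a time (hypothetical helper)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Take one SGD step on a single (x, y) pair; return the log loss before the step."""
        p = self.predict_proba(x)
        error = p - y  # gradient of the log loss with respect to z
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        eps = 1e-12
        return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


model = OnlineLogisticRegression(n_features=1)
stream = [([1.0], 1), ([-1.0], 0)] * 200  # toy stand-in for live (x, y) pairs
losses = [model.update(x, y) for x, y in stream]

# Track a running loss: if it starts rising, the model may be going off track.
recent_loss = sum(losses[-20:]) / 20
```

The running loss is the kind of extra monitoring signal the slide asks for; it only exists when ground truth arrives with the stream.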
Active learning
Make predictions
Request labels for low confidence examples
Train on those ‘special cases’
Production is an opportunity for learning
Monitoring is part of training
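The "request labels for low confidence examples" step could be sketched as follows. The function name and the 0.6 threshold are assumptions chosen for illustration; "confidence" here is simply the top class probability.

```python
def select_for_labeling(predictions, threshold=0.6):
    """Return indices of examples whose top class probability is below `threshold`.

    `predictions` is a list of per-class probability lists, one per example.
    """
    return [i for i, probs in enumerate(predictions) if max(probs) < threshold]


predictions = [
    [0.95, 0.03, 0.02],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.10, 0.85, 0.05],  # confident
    [0.50, 0.45, 0.05],  # uncertain
]
to_label = select_for_labeling(predictions)
```

The selected examples would then go to a labeling tool and be added to the next training run.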
Model monitoring vs Ops monitoring
• Model monitoring models model behavior
• Inherently stochastic
• Can be driven by user behavior
• Almost certainly looking for unknown unknowns
• Few established guidelines on what to monitor
Monitoring inputs
More similar to ops monitoring as there can be obvious failures
• E.g. images arriving at model very small, very dark, high contrast, etc.
Monitor classic stats, compare to training data
• Means
• Standard deviations
• Correlations
• KL divergence between training & live data
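As a rough illustration of comparing classic statistics against training data, the sketch below measures how far the live mean of one input feature has moved, in units of the training standard deviation. The feature (mean image brightness), the sample values, and the threshold of 3 are all assumptions.

```python
import statistics


def summarize(values):
    """Mean and sample standard deviation of one numeric input feature."""
    return statistics.mean(values), statistics.stdev(values)


def mean_shift_in_stds(train_values, live_values):
    """Distance between live and training means, in training standard deviations."""
    train_mean, train_std = summarize(train_values)
    live_mean, _ = summarize(live_values)
    return abs(live_mean - train_mean) / train_std


# Hypothetical feature: mean pixel brightness per image (0-255).
train_brightness = [120, 130, 125, 135, 128, 122, 131, 127]
live_brightness = [60, 72, 65, 58, 70, 66, 61, 68]  # much darker uploads

shift = mean_shift_in_stds(train_brightness, live_brightness)
drifted = shift > 3.0  # the threshold is an arbitrary assumption
```

The same pattern extends to standard deviations and correlations: compute the statistic on a training reference once, then compare each live window against it.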
Output monitoring
Harder, people might just upload more plane images one day
Monitoring prediction distribution surprisingly helpful
Monitor confidence (highest probability – lowest probability)
Compare against other model predictions
Compare against ground truth
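A small sketch of the two output signals named above: the distribution of predicted classes and the confidence spread (highest minus lowest class probability). The helper names and the sample labels are illustrative assumptions.

```python
from collections import Counter


def prediction_distribution(predicted_labels):
    """Share of predictions per class label."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: count / total for label, count in counts.items()}


def confidence(probs):
    """Confidence as described on the slide: highest minus lowest class probability."""
    return max(probs) - min(probs)


live_labels = ["plane", "plane", "car", "plane", "boat"]
dist = prediction_distribution(live_labels)
spread = confidence([0.7, 0.2, 0.1])
```

A shift in `dist` does not by itself mean the model is wrong (users may really be uploading more planes), which is why the slide also suggests comparing against other models and against ground truth.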
Ground truth
• In absence of a ground truth signal, ground truth needs to be established manually
• Can be done by data scientists themselves with good UI design
• Yields extra insights: ‘Our model does worse when Instagram filters are applied’ / ‘Many users take sideways pictures’
• Prioritize low confidence predictions for active learning
• Sample randomly for monitoring
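Random sampling matters for monitoring because an accuracy estimate computed only on low confidence predictions would be biased. A minimal sketch, assuming transactions are identified by integer IDs and using a normal-approximation interval; the function names, sample size, and review result are hypothetical.

```python
import math
import random


def sample_for_review(transaction_ids, sample_size, seed=0):
    """Uniform random sample of transactions to label by hand."""
    rng = random.Random(seed)
    return rng.sample(transaction_ids, sample_size)


def accuracy_with_interval(n_correct, n_reviewed, z=1.96):
    """Accuracy estimate with a normal-approximation 95% interval."""
    acc = n_correct / n_reviewed
    half_width = z * math.sqrt(acc * (1 - acc) / n_reviewed)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


review_batch = sample_for_review(list(range(10_000)), sample_size=200)

# Suppose reviewers find 170 of the 200 sampled predictions correct.
acc, low, high = accuracy_with_interval(170, 200)
```

The interval gives a sense of how many manual labels are needed before a drop in live accuracy can be distinguished from sampling noise.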
Implementation Example: Prodigy
Alerting / Monitoring is a UI/UX problem
• The terms might be very hard to explain or interpret
• Who here would comfortably write down the formula for KL divergence and explain what it means?
• Key metrics are different depending on use case
• Non-data-scientists might have to make decisions based on alerts
Alerting Example
[Figure: bar chart “Training versus live distribution of dog breeds”; y-axis ticks 0 to 25; breeds: Husky, Chihuahua, Mastiff, Pug, Labrador, Poodle, Retriever, Terrier; series: Train, Live]
Alerting Example
• Detected $D_{KL}(\mathrm{Train} \,\|\, \mathrm{Live}) = 1.56394694$, which is out of bounds
• Detected model output distribution significantly different from training data
• Detected an unexpected number of pictures classified as Pugs
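The three alert wordings describe the same event at different levels of readability. One way to produce the most readable version is sketched below: compute the KL divergence, and if it is out of bounds, name the class whose share changed most. The function names, class shares, and threshold are assumptions, and the code assumes every class has a non-zero live share.

```python
import math


def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits for two discrete distributions over the same classes.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def build_alert(labels, train_dist, live_dist, threshold):
    """Return a plain-language alert, or None if the divergence is within bounds."""
    divergence = kl_divergence_bits(train_dist, live_dist)
    if divergence <= threshold:
        return None
    diffs = [live - train for train, live in zip(train_dist, live_dist)]
    worst = max(range(len(labels)), key=lambda i: abs(diffs[i]))
    direction = "more" if diffs[worst] > 0 else "fewer"
    points = abs(diffs[worst]) * 100
    return (
        f"Model outputs differ from training data "
        f"(KL divergence {divergence:.2f} bits). "
        f"Largest change: {points:.0f} percentage points {direction} "
        f"pictures classified as {labels[worst]}."
    )


labels = ["Husky", "Pug", "Poodle"]
train_dist = [0.4, 0.2, 0.4]  # hypothetical class shares
live_dist = [0.2, 0.6, 0.2]
message = build_alert(labels, train_dist, live_dist, threshold=0.1)
```

Keeping the raw divergence in the message serves data scientists, while the "largest change" sentence is what a non-specialist can act on.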
Model accountability
• Who made the decision
• Model versioning, all versions need to be retained
• On which grounds was the decision made
• All input data needs to be retained and must be linked to transaction ID
• Why was the decision made
• Use tools like LIME to interpret models
• Still a long way to interpretable deep models, but we are getting there
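The "who" and "on which grounds" requirements could be met by storing one record per prediction. The sketch below is an assumption about what such a record might contain; the field names and the version string are hypothetical, and the raw input itself would be retained separately under the same transaction ID.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    transaction_id: str  # links the retained input data to the decision
    model_version: str   # which model version made the decision
    input_sha256: str    # fingerprint of the retained input
    prediction: str
    probability: float


def record_decision(model_version, input_bytes, prediction, probability):
    """Build an audit record for a single prediction."""
    return DecisionRecord(
        transaction_id=str(uuid.uuid4()),
        model_version=model_version,
        input_sha256=hashlib.sha256(input_bytes).hexdigest(),
        prediction=prediction,
        probability=probability,
    )


record = record_decision("dog-breeds-v3", b"raw image bytes", "Pug", 0.91)
log_line = json.dumps(asdict(record))  # e.g. append to a transaction log
```

The "why" question is not covered here; it needs an interpretation tool such as LIME run against the stored input and the retained model version.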
nth order effects
[Diagram: effects ordered from easy to monitor / small impact to hard to monitor / large impact: Model metrics (Accuracy), User behavior (e.g. CTR), Business metrics (Revenue), Societal impact]
Large impact effects…
• … are hard to monitor
• … are not what data scientists are trained for
• … only show with large scale deployment
• … are time delayed
• … are influenced by exogenous factors, too
Monitoring high order effects
Users are desperate to improve your model, let them!
User input is a meta metric showing how well your model selection does
Implementation example
Hosting the monitoring system as a separate microservice
Using Flask to serve the model
Flask service calls the monitor
Alternatively, the client can call the monitor
A simple monitoring system with Flask
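[Diagram: components User, Keras + Flask (model service), SciKit + Flask (monitoring service), Data Scientist, Transaction DB. Flows: Image and Classification between User and model service; Image + Classification from model service to monitoring service; Alerts to Data Scientist; monitoring service stores each transaction in the Transaction DB, which provides benchmark data.]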
Bare Bones Flask Serving
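import flask

app = flask.Flask(__name__)

# model, prepare_image and decode_predictions are assumed to be defined elsewhere;
# the route name is illustrative.
@app.route("/predict", methods=["POST"])
def predict():
    data = {"success": False}
    image = flask.request.files["image"].read()
    image = prepare_image(image, target=(224, 224))
    preds = model.predict(image)
    # Assumes (label, prob) pairs; Keras' ImageNet helper yields (id, label, prob) triples
    results = decode_predictions(preds)
    data["predictions"] = []
    for (label, prob) in results[0]:
        r = {"label": label, "probability": float(prob)}
        data["predictions"].append(r)
    data["success"] = True
    return flask.jsonify(data)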
Statistical monitoring with SciKit
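import numpy as np
import scipy.stats

# pk, qk: class-frequency arrays being compared (e.g. training vs. live).
# With two arguments, entropy returns the KL divergence D_KL(pk || qk), here in bits.
ent = scipy.stats.entropy(pk, qk, base=2)
if ent > threshold:
    abs_diff = np.abs(pk - qk)
    # lookup maps a class index to its label
    worst_offender = lookup[np.argmax(abs_diff)]
    max_deviation = np.max(abs_diff)
    alert(model_id, ent, worst_offender, max_deviation)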
Data science teams should own the whole process
Define approach → Feature engineering → Train model → Deploy → Monitor
Unsolved challenges
• Model versioning
• Dataset versioning
• Continuous Integration for data scientists
• Communication and understanding of model metrics in the org
• Managing higher order effects
Recommended reading
• Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
• Breck et al. (2016) What’s your ML Test Score? A rubric for ML production systems https://ai.google/research/pubs/pub45742
• How Zendesk Serves TensorFlow Models in Production https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b
• Machine Learning for Finance ;) https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-finance

Monitoring Models in Production

  • 1. Monitoring Models in Production Keeping track of complex models in a complex world Jannes Klaas
  • 2. About me International Business @ RSM Financial Economics @ Oxford Saïd Course developer machine learning @ Turing Society Author “Machine Learning for Finance” out in July ML consultant non- profits / impact investors Prev. Urban Planning @ IHS Rotterdam & Destroyer of my Startup
  • 3. The life and times of an ML practitioner “We send you the data, you send us back a model, then we take it from there” – Consulting Clients “Define an approach, evaluate on common benchmark and publish” – Academia
  • 4. Repeat after me It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship
  • 5. Machine learning 101 Estimate some function y = f(x) using (x,y) pairs Estimated function hopefully represents the true relationship between x and y Model is function of data
  • 6.
  • 7. Problems you encounter in production • The world changes, your training data might no longer depict the real world • Your model inputs might change • There might be unintended bugs and side effects in complex models • Models influence the world the try to model • Model decay: Your model usually becomes worse over time
  • 8. Are models a liability after shipping? No, the real world is the perfect training environment Datasets are only an approximation of the real world Active learning on real world examples can greatly reduce your data needs
  • 9. Online learning • Update model continuously as new data streams in • Good if you have continuous stream of ground truth as well • Needs more monitoring to ensure model does not go off track • Can be expensive for big models • Might need separate training / inference hardware
  • 10. Active learning • Make predictions • Request labels for low confidence examples • Train on those ‘special cases’ • Production is an opportunity for learning • Monitoring is part of training
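The selection step of the active-learning loop on slide 10 might look roughly like this. The function name `select_for_labeling`, the 0.6 threshold, and the example probabilities are hypothetical; here "low confidence" is taken to mean that the top class probability falls below the threshold, which is one common choice among several.

```python
def select_for_labeling(predictions, threshold=0.6):
    """Return the ids of examples whose top class probability is below threshold."""
    return [
        example_id
        for example_id, probs in predictions.items()
        if max(probs) < threshold
    ]


# Hypothetical class probabilities produced by a deployed classifier.
predictions = {
    "img_001": [0.95, 0.03, 0.02],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.55, 0.30, 0.15],
}

to_label = select_for_labeling(predictions)
```

The selected examples would then be sent for labelling and added to the next training run.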
  • 11. Model monitoring vs Ops monitoring • Model monitoring models model behavior • Inherently stochastic • Can be driven by user behavior • Almost certainly looking for unknown unknowns • Few established guidelines on what to monitor
  • 12. Monitoring inputs • Monitor classic stats, compare to training data: • Means • Standard deviations • Correlations • KL divergence between training and live data • E.g. images arriving at model very small, very dark, high contrast, etc. • More similar to ops monitoring as there can be obvious failures
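A minimal sketch of the "classic stats" comparison from slide 12, assuming a single scalar input feature such as mean image brightness. The helper names, the brightness values, and the rule "flag when the live mean is more than three training standard deviations from the training mean" are illustrative assumptions, not the author's method.

```python
import statistics


def summarize(values):
    """Mean and population standard deviation of one input feature."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def mean_has_drifted(train_values, live_values, max_z=3.0):
    """Flag drift when the live mean lies more than max_z training stds away."""
    train = summarize(train_values)
    live = summarize(live_values)
    if train["std"] == 0:
        return live["mean"] != train["mean"]
    z = abs(live["mean"] - train["mean"]) / train["std"]
    return z > max_z


# Made-up mean brightness per image, scaled to [0, 1].
train_brightness = [0.48, 0.52, 0.50, 0.47, 0.53]
live_brightness = [0.10, 0.12, 0.08, 0.11]  # very dark images arriving

dark_images_detected = mean_has_drifted(train_brightness, live_brightness)
```

The same pattern extends to standard deviations and correlations by comparing those summaries instead of the mean.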
  • 13. Output monitoring • Harder, people might just upload more plane images one day • Monitoring prediction distribution surprisingly helpful • Monitor confidence (highest probability minus lowest probability) • Compare against other model predictions • Compare against ground truth
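Two of the output signals from slide 13 are easy to compute. The sketch below uses the confidence definition given on the slide (highest minus lowest class probability) and builds the distribution of predicted labels; the function names and the example labels are assumptions made for illustration.

```python
from collections import Counter


def confidence(probs):
    """Confidence as defined on the slide: highest minus lowest probability."""
    return max(probs) - min(probs)


def prediction_distribution(labels):
    """Fraction of predictions that fall into each predicted class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical predicted labels from one monitoring window.
live_labels = ["plane", "plane", "dog", "plane"]
live_distribution = prediction_distribution(live_labels)
example_confidence = confidence([0.7, 0.2, 0.1])
```

The resulting distribution can then be compared against the training distribution or against another model's predictions.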
  • 14. Ground truth • In absence of a ground truth signal, ground truth needs to be established manually • Can be done by data scientists themselves with good UI design • Yields extra insights ‘Our model does worse when Instagram filters are applied’ / ‘Many users take sideways pictures’ • Prioritize low confidence predictions for active learning • Sample randomly for monitoring
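The last two bullets of slide 14 describe two different labelling queues. A rough sketch, assuming each stored transaction carries a `confidence` value: the lowest-confidence items go to active learning, while a uniform random sample is used for monitoring so that the accuracy estimate is not biased toward hard cases. The record layout, queue sizes, and function name are hypothetical.

```python
import random


def build_label_queues(transactions, n_active=2, n_monitor=2, seed=0):
    """Split manual labelling work into an active-learning and a monitoring queue."""
    # Lowest-confidence predictions first: most useful for retraining.
    by_confidence = sorted(transactions, key=lambda t: t["confidence"])
    active_queue = by_confidence[:n_active]
    # Uniform random sample: gives an unbiased view of live performance.
    rng = random.Random(seed)
    monitor_queue = rng.sample(transactions, k=min(n_monitor, len(transactions)))
    return active_queue, monitor_queue


# Made-up transactions with model confidence attached.
transactions = [
    {"id": "t1", "confidence": 0.95},
    {"id": "t2", "confidence": 0.20},
    {"id": "t3", "confidence": 0.60},
    {"id": "t4", "confidence": 0.35},
    {"id": "t5", "confidence": 0.80},
]

active_queue, monitor_queue = build_label_queues(transactions)
```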
  • 16. Alerting / Monitoring is a UI/UX problem • The terms might be very hard to explain or interpret • Who here would comfortably write down the formula for KL Divergence and explain what it means? • Key metrics are different depending on use case • Non-data-scientists might have to make decisions based on alerts
  • 17. Alerting Example • [Bar chart: Training versus live distribution of dog breeds. Breeds: Husky, Chihuahua, Mastiff, Pug, Labrador, Poodle, Retriever, Terrier. Series: Train, Live. Value axis: 0 to 25.]
  • 18. Alerting Example • Detected D_KL(Train || Live) = 1.56394694 which is out of bounds • Detected model output distribution significantly different from training data • Detected an unexpected amount of pictures classified as Pugs
  • 19. Model accountability • Who made the decision • Model versioning, all versions need to be retained • On which grounds was the decision made • All input data needs to be retained and must be linked to transaction ID • Why was the decision made • Use tools like LIME to interpret models • Still a long way to interpretable deep models, but we are getting there
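The record-keeping side of slide 19 could be sketched with a small transaction log. This is an illustration only: the table layout, the function names, and the use of SQLite are assumptions, and the talk does not say which store was used. Each row links a transaction ID to the model version, the raw input, and the prediction, so that "who decided, on which grounds" can be answered later.

```python
import hashlib
import json
import sqlite3


def open_transaction_db(path=":memory:"):
    """Open (or create) a transaction log; in-memory by default for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transactions (
               transaction_id TEXT PRIMARY KEY,
               model_version  TEXT NOT NULL,
               input_sha256   TEXT NOT NULL,
               input_blob     BLOB NOT NULL,
               prediction     TEXT NOT NULL
           )"""
    )
    return conn


def log_transaction(conn, transaction_id, model_version, input_bytes, prediction):
    """Retain the input and model version behind one decision."""
    conn.execute(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
        (
            transaction_id,
            model_version,
            hashlib.sha256(input_bytes).hexdigest(),
            input_bytes,
            json.dumps(prediction),
        ),
    )
    conn.commit()


def fetch_transaction(conn, transaction_id):
    """Look up which model version decided what, and on which input."""
    row = conn.execute(
        "SELECT model_version, input_blob, prediction "
        "FROM transactions WHERE transaction_id = ?",
        (transaction_id,),
    ).fetchone()
    if row is None:
        return None
    model_version, input_blob, prediction = row
    return {
        "model_version": model_version,
        "input": bytes(input_blob),
        "prediction": json.loads(prediction),
    }


# Hypothetical transaction for illustration.
conn = open_transaction_db()
log_transaction(conn, "tx-0001", "dog-breeds-v3", b"raw image bytes",
                {"label": "pug", "probability": 0.91})
record = fetch_transaction(conn, "tx-0001")
```

The "why" question (interpretation with tools like LIME) would be layered on top of the retained input and model version.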
  • 20. nth order effects • Model metrics (Accuracy) → User behavior (e.g. CTR) → Business Metrics (Revenue) → Societal impact • From easy to monitor and small impact to hard to monitor and large impact
  • 21. Large impact effects… • … are hard to monitor • … are not what data scientists are trained for • … only show with large scale deployment • … are time delayed • … are influenced by exogenous factors, too
  • 22. Monitoring high order effects • Users are desperate to improve your model, let them! • User input is a meta metric showing how well your model selection does
  • 23. Implementation example • Hosting monitoring sys as separate microservice • Using flask to serve model • Flask service calls monitor • Alt. client can call monitor
  • 24. A simple monitoring system with Flask • [Diagram. Components: User, Keras + Flask, SciKit + Flask, Transaction DB, Data Scientist. Flow labels: Image, Classification, Image + Classification, Store transaction, Provide benchmark data, Alerts.]
  • 25. Bare Bones Flask Serving
        # Body of the Flask route that handles an uploaded image
        data = {"success": False}
        image = flask.request.files["image"].read()
        image = prepare_image(image, target=(224, 224))
        preds = model.predict(image)
        results = decode_predictions(preds)
        data["predictions"] = []
        # Assumes Keras' decode_predictions, which yields
        # (class_id, label, probability) tuples per image
        for (_, label, prob) in results[0]:
            r = {"label": label, "probability": float(prob)}
            data["predictions"].append(r)
        data["success"] = True
        return flask.jsonify(data)
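The response-building part of slide 25 can be pulled out of the route so it runs without Flask or Keras, which makes it easy to test. The sketch assumes that `decode_predictions` is Keras' ImageNet helper, which returns one list of (class_id, label, probability) tuples per image; `build_response` and the example values are hypothetical.

```python
import json


def build_response(results):
    """Turn decoded predictions for one image into a JSON-serializable dict."""
    data = {"success": False, "predictions": []}
    # results[0] holds the predictions for the first (only) image in the batch.
    for _class_id, label, prob in results[0]:
        data["predictions"].append({"label": label, "probability": float(prob)})
    data["success"] = True
    return data


# Placeholder decoded output for a single image (ids are not real class ids).
example_results = [[("id_pug", "pug", 0.91), ("id_chihuahua", "Chihuahua", 0.06)]]
response_body = json.dumps(build_response(example_results))
```

Inside the route, the returned dict would be passed to `flask.jsonify` as on the slide.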
  • 26. Statistical monitoring with SciKit
        # KL divergence (in bits) between two class distributions pk and qk
        ent = scipy.stats.entropy(pk, qk, base=2)
        if ent > threshold:
            # Class whose share differs most between the two distributions
            abs_diff = np.abs(pk - qk)
            worst_offender = lookup[np.argmax(abs_diff)]
            max_deviation = np.max(abs_diff)
            alert(model_id, ent, worst_offender, max_deviation)
  • 27. Data science teams should own the whole process • Define approach → Feature Engineering → Train model → Deploy → Monitor
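For readers without SciPy at hand, the check on slide 26 can be reproduced with the standard library. This is a sketch under assumptions: `pk` is taken to be the training distribution and `qk` the live one (the slide does not say which is which), both are normalized to sum to 1 as `scipy.stats.entropy` does, and unlike the slide the absolute differences are computed on the normalized values. The breed counts, labels, and threshold are made up.

```python
import math


def kl_divergence_bits(pk, qk):
    """KL divergence D(pk || qk) in bits, after normalizing both inputs."""
    p_total, q_total = sum(pk), sum(qk)
    divergence = 0.0
    for p, q in zip(pk, qk):
        p, q = p / p_total, q / q_total
        if p == 0:
            continue  # terms with p = 0 contribute nothing
        if q == 0:
            return math.inf
        divergence += p * math.log2(p / q)
    return divergence


def check_distribution(pk, qk, labels, threshold):
    """Return alert details if the divergence exceeds threshold, else None."""
    ent = kl_divergence_bits(pk, qk)
    if ent <= threshold:
        return None
    p_total, q_total = sum(pk), sum(qk)
    abs_diff = [abs(p / p_total - q / q_total) for p, q in zip(pk, qk)]
    worst = max(range(len(abs_diff)), key=abs_diff.__getitem__)
    return {
        "divergence_bits": ent,
        "worst_offender": labels[worst],
        "max_deviation": abs_diff[worst],
    }


# Made-up prediction counts per breed.
labels = ["Husky", "Pug", "Poodle", "Terrier"]
train_counts = [25, 25, 25, 25]
live_counts = [10, 70, 10, 10]

alert_details = check_distribution(train_counts, live_counts, labels, threshold=0.1)
```

Returning the worst offending class alongside the raw divergence is what allows an alert to say "unexpected amount of pictures classified as Pugs" instead of only reporting a number.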
  • 28. Unsolved challenges • Model versioning • Dataset versioning • Continuous Integration for data scientists • Communication and understanding of model metrics in the Org • Managing higher order effects
  • 29. Recommended reading • Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf • Breck et al. (2016) What’s your ML Test Score? A rubric for ML production systems https://ai.google/research/pubs/pub45742 • How Zendesk Serves TensorFlow Models in Production https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b • Machine Learning for Finance ;) https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-finance