Over time, our AI predictions degrade. Full stop.
Whether it's concept drift, where the relationship between our data and what we're trying to predict has changed, or data drift, where our production data no longer resembles the historical training data, distinguishing meaningful ML drift from spurious or acceptable drift is tedious. Not to mention the difficulty of uncovering which ML features are the source of poorer accuracy.
This session looked at the key types of machine learning drift and how to catch them before they become a problem.
2. The fundamental assumption in any machine learning model is that the data and logic used actually mimic the real world.
4. Machine Learning Model Drift
The fundamental assumption in any machine learning model is that the data and logic used actually mimic the real world. Over time, our ML models make worse predictions. Also called “decay”.
6. Building Trust into AI
(Logos: Best Paper; key partnerships; Technology Pioneer 2020; CB Insights Most Promising AI Companies 2021; Enterprise AI Governance and Ethical Response 2019)
7. “We had a model drift over the weekend that cost $500,000.”
— Chief Data Scientist
“When something goes wrong, it takes our data scientist 2 weeks to troubleshoot the problem.”
— Data Science Director
8. Don’t Get Too Caught Up In Terminology
ML Drift, Model Drift, Model Decay, Prediction Drift = your predictions are getting worse.
Experiences, types, causes, and indicators of drift are sometimes used together, overlap, and don’t have direct mappings to each other.
Multiple types of drift can happen at the same time.
9. How We Experience ML Drift
Not really drift, but it can appear to be.
Image: KDnuggets, “The ravages of concept drift”
10. Key Types of Drift
Concept Drift
● Reality/behavioral change
● Relationships change, not the input
● P(Y|X) changes (the probability of output Y given input X)
Data Drift*
● Data changes
● Fundamental relationships do not change
● Feature Drift: input data shifts, P(X) changes
● Label Drift: output data shifts, P(Y) changes
Virtual Drift
● Data changes but the decision boundary still works
(Diagram: training data with a decision boundary. Image: Don’t let your model’s quality drift away by Michał Oleszak)
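To make the P(X)-versus-P(Y|X) distinction concrete, here is a minimal sketch on synthetic, hypothetical data: a model frozen on the old decision boundary survives pure feature drift, but loses accuracy when the concept (the boundary itself) moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference window: income feature X; label Y = creditworthy if income > 50.
x_ref = rng.normal(50, 10, 5000)
y_ref = (x_ref > 50).astype(int)

# Feature drift: P(X) shifts (incomes rise) but the rule P(Y|X) is unchanged.
x_feat = rng.normal(60, 10, 5000)
y_feat = (x_feat > 50).astype(int)

# Concept drift: P(X) is unchanged, but the true boundary moves to 55.
x_con = rng.normal(50, 10, 5000)
y_con = (x_con > 55).astype(int)

# A "model" frozen on the old boundary still fits the feature-drifted data,
# but misclassifies everything between the old and new boundaries.
pred = lambda x: (x > 50).astype(int)
print("accuracy under feature drift:", (pred(x_feat) == y_feat).mean())
print("accuracy under concept drift:", (pred(x_con) == y_con).mean())
```

With this toy setup, feature drift alone leaves accuracy intact, while concept drift silently costs roughly the probability mass between the two boundaries.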
14. Drift Examples for a Loan Application Model
● Concept Drift: an income level that was earlier considered creditworthy is now considered riskier.
● Label Drift: a larger proportion of creditworthy applications starts showing up.
● Feature Drift: incomes of most applicants increase or decrease, or you suddenly get more applications from one region.
*Note that if label and data drift happen together and cancel each other out, there is no concept drift. Otherwise, concept drift will be caused by one of the two, since they are linked by the Bayes equation.
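The Bayes link in that note can be checked numerically. In this toy sketch (hypothetical probabilities), shifting the feature marginal P(X) also shifts the label marginal P(Y), while the conditional P(Y|X) stays fixed: feature and label drift together, with no concept drift.

```python
import numpy as np

# Two income buckets (low/high) and two labels (deny/approve).
# Fix the conditional P(Y|X): rows are income buckets, columns are labels.
p_y_given_x = np.array([[0.8, 0.2],   # low income: mostly denied
                        [0.3, 0.7]])  # high income: mostly approved

for p_x in (np.array([0.6, 0.4]),     # training-time income mix
            np.array([0.2, 0.8])):    # production: richer applicants
    joint = p_x[:, None] * p_y_given_x          # P(X, Y) = P(X) P(Y|X)
    p_y = joint.sum(axis=0)                     # induced label marginal P(Y)
    cond = joint / joint.sum(axis=1, keepdims=True)  # recovered P(Y|X)
    print("P(Y) =", p_y, "| P(Y|X) unchanged:", np.allclose(cond, p_y_given_x))
```

The label distribution moves from 60/40 deny/approve to 40/60, yet the recovered conditional is identical both times: the model's learned relationship still holds.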
17. Triggers of ML Model Drift
Real Data Distribution Change (may require a new model)
● Label or feature distribution changes, e.g. a product launch in a new market
● The concept can change, e.g. a competitor launching a new service
Data Integrity Issues
● Correct data enters at the source, but data engineering is faulty, e.g. debt-to-income values and age values are swapped in the input
● Incorrect data enters at the source, e.g. due to a front-end issue, a website form accepts leaving a field blank
19. But Really... What’s Really Important?
(Meme: “Not sure if data changed or reality changed”)
22. Performance Monitoring & Supervised Learning
Works well if you have ground truth/labels!
Monitor performance metrics:
● Statistical measures
● Accuracy, precision, FPR, AUC, etc.
Supervised learning methods (ref: “A Survey of Concept Drift Adaptation”):
● Sequential analysis (SPRT, CUSUM & Page-Hinkley): tune alarms on false positives
● Statistical process control (SPC): rate of change
● Monitoring 2 distributions (ADWIN): more precise, more overhead
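As a simpler stand-in for the CUSUM/Page-Hinkley family, the sketch below applies an SPC-style rate-of-change check (hypothetical window size and threshold): alarm when the error rate in the most recent window exceeds the reference window's rate by k standard errors.

```python
import random

def drift_alarm(errors, window=100, k=4.0):
    """SPC-style check: alarm when the error rate in the most recent
    window exceeds the reference window's rate by k standard errors."""
    ref = errors[:window]
    p = sum(ref) / window                    # reference error rate
    se = (p * (1 - p) / window) ** 0.5       # std. error of a window mean
    for start in range(window, len(errors) - window + 1, window):
        rate = sum(errors[start:start + window]) / window
        if rate > p + k * se:
            return start + window            # index where the alarm fires
    return None

random.seed(1)
# Per-prediction error indicator: ~10% errors, then a jump to ~40% (drift).
errors = [random.random() < 0.1 for _ in range(300)] + \
         [random.random() < 0.4 for _ in range(300)]
print("alarm at index:", drift_alarm(errors))
```

This only works when labels arrive in time to compute an error rate; proper CUSUM/PH or ADWIN implementations handle gradual drift and false-positive tuning far better than this fixed-window sketch.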
23. Data Drift Monitoring & Unsupervised Learning
Monitor statistical distribution metrics:
● Population Stability Index (PSI): compares the distribution of the current scoring variable to its distribution in the training data
● Kullback–Leibler (KL) divergence: measures the difference of one probability distribution from a reference probability distribution
● Jensen–Shannon (JS) divergence: based on KL, measuring the similarity between two probability distributions, but notably symmetric and finite
● Kolmogorov–Smirnov test (KS test): quantifies the distance between the empirical distribution of the sample and the cumulative distribution of the reference (non-parametric)
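Of these, PSI is straightforward to hand-roll. A minimal numpy-only sketch on hypothetical income data, using the common rules of thumb (< 0.1 stable, > 0.25 major shift):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index: bin by quantiles of the reference
    sample, then compare bin frequencies between the two samples."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = rng.normal(50, 10, 10_000)      # incomes seen at training time
stable = rng.normal(50, 10, 10_000)     # production, no drift
shifted = rng.normal(58, 10, 10_000)    # production, incomes rose

print(f"PSI, no drift: {psi(train, stable):.3f}")
print(f"PSI, drifted:  {psi(train, shifted):.3f}")
```

Quantile binning on the reference sample (rather than equal-width bins) keeps every reference bin populated, which makes the index less sensitive to binning choices.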
24. Data Drift Monitoring & Unsupervised Learning
Un-/semi-supervised learning (see “An overview of unsupervised drift detection methods”):
● Can be more accurate
● Online methods look at each instance (batch methods are more efficient)
● Most are globally oriented
● May miss gradual drift / have sensitivity issues
● Not intuitive for explaining
25. Data Integrity & Outlier Monitoring
Data errors slowly degrade performance and can look like real drift:
● Missing values
● Range and type mismatches
● Schema mismatches
● Changes in the business (newly cataloged products, revoked pricing, etc.)
● Broken data pipelines due to bugs or API updates
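A minimal sketch of such integrity checks (hypothetical schema and field names), catching missing, mistyped, and out-of-range values before a record reaches the model:

```python
def integrity_checks(row, schema):
    """Flag missing values, type mismatches, and out-of-range values.
    schema maps field name -> (expected type, min, max)."""
    issues = []
    for field, (ftype, lo, hi) in schema.items():
        value = row.get(field)
        if value is None:
            issues.append(f"{field}: missing")
        elif not isinstance(value, ftype):
            issues.append(f"{field}: expected {ftype.__name__}")
        elif not (lo <= value <= hi):
            issues.append(f"{field}: {value} outside [{lo}, {hi}]")
    return issues

# Hypothetical loan-application schema.
schema = {"age": (int, 18, 120), "debt_to_income": (float, 0.0, 5.0)}

# Swapped fields (the data-engineering bug above) surface as type/range errors.
print(integrity_checks({"age": 0.42, "debt_to_income": 35.0}, schema))
print(integrity_checks({"age": 35, "debt_to_income": None}, schema))
```

Note how the swapped debt-to-income/age bug, invisible to schema-only validation, is caught by the range checks.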
26. Getting to the Root Cause
● Attribute to drift in features
● Account for feature importance
● Analyze affected traffic
(Image: root-causing drift in Fiddler)
27. Fix It!
● Retrain: new data and/or relabel old data
● Model management: archive/schedule models, ensemble balancing
● Adapt/augment: model behavior, weighting, business logic, data collection
29. A Few Resources
● Fiddler: ML model performance monitoring platform
● XAI Summit
● AI Infrastructure Alliance: nonprofit, independent information
● “An overview of unsupervised drift detection methods”
● “A Survey of Concept Drift Adaptation”
30. Let’s build trust into AI
amy.hodler@fiddler.ai
@amyhodler
www.fiddler.ai
31. Fiddler MPM Stack: Deep & Versatile
(Runs on-prem or cloud, with app and custom-app layers)
MONITOR: performance; model drift & bias; data integrity & outliers
ANALYZE: local & global explanations; bias detection; auto-slicing for performance
CONTROL: model inventory; change & policy control; model reports
BUILT-IN EXPLAINABILITY
PLUGGABLE MODEL & DATA INGESTION: ingest any data source; plug into any model framework; connect via the Fiddler API