1. How good is your prediction?
Quantifying uncertainty in Machine Learning predictions
PyData London 2019 (12th–14th July)
Maria Navarro
2. Outline
Motivating example
Introduction to conformal predictions
Conformal predictions in classification
Conformal predictions in regression
Application
Summary and conclusions
References
4. Motivating example
How good is your prediction?
PROBLEM: To find out whether a car is a total loss or not
To do it we have:
1. A set of historical observations (x_1, y_1), …, (x_n, y_n), where:
• x_i describes the accident: age of the driver, model of the car, etc.
• y_i is a label which identifies whether the car is repairable or not
2. A machine learning algorithm (h(x) = y)
5. Motivating example
How good is your prediction, REALLY?
A new accident, x_{n+1}, occurs. We run our model, and we obtain the following results:
1. The car is classified as total loss
2. The probability of total loss according to our model is 0.85
3. The model is roughly 91% accurate on the training, test and validation sets, so we expect the same behaviour on production data
4. The model has an AUC of 0.88 in training, so again that is what we expect in production data
What do these measurements mean?
Do we have any guarantee about accident x_{n+1}?
Are we confident about the prediction?
7. Introduction to conformal predictions
Why Conformal Predictions (CP) ?
1. There are several ad hoc ways to obtain some confidence around your predictions (resampling methods, assuming normality, etc.)
2. Conformal prediction assumes very little about the outcome you are trying to predict; it only assumes exchangeability.
3. It can be used with any machine learning algorithm.
4. It provides error bounds at a confidence level that we can select.
5. Probabilities are well-calibrated.
6. It is easy to implement.
7. The framework has been proven:
V. Vovk, A. Gammerman, G. Shafer,
Algorithmic Learning in a Random World, Springer, 2005.
8. Introduction to conformal predictions
General idea
• Let Q be a probability distribution.
• Let f(z) ∈ ℝ be some function.
• We draw 5 samples from the distribution Q and apply f(z):
  ▸ f(z_i) = α_i, with i = 1, …, 5
  ▸ For simplicity, we assume α_1 ≤ α_2 ≤ α_3 ≤ α_4 ≤ α_5
• We estimate the cumulative distribution function (CDF) for the scores:
[Figure: empirical step CDF of α_1, …, α_5, rising from 0 to 1 in steps of 0.2]
• We draw a new sample z ~ Q. We assume exchangeability and compute f(z) = α.
• We can estimate its probability: P(α ≤ α_4) = 0.6 and P(α ≤ α_2) = 0.2
9. Introduction to conformal predictions
Relation to our problem
• Let z_i = (x_i, y_i), with i = 1, …, n, be a sample of the probability distribution Q = (X, Y), where:
  ▸ x_i are our observables and y_i the target we want to predict
• We define f(z_i) = |y_i − h(x_i)|, where:
  ▸ h(x_i) is a regression model trained on z_i with i = 6, …, n
• We apply f(z) to the 5 remaining samples:
  ▸ f(z_i) = α_i, with i = 1, …, 5
  ▸ We can compute the exact values: 0.10 ≤ 0.13 ≤ 0.28 ≤ 0.30 ≤ 0.38
• We estimate the cumulative distribution function (CDF) for the scores:
[Figure: empirical step CDF of the scores 0.10, 0.13, 0.28, 0.30, 0.38]
• We draw a new sample z ~ Q. We assume exchangeability and compute f(z) = |y − h(x)| = |y − 2|.
• We can estimate its probability:
  ▸ P(|y − 2| ≤ 0.30) = 0.6 and P(|y − 2| ≤ 0.28) = 0.4
  ▸ Equivalently, P(y ∈ 2 ± 0.30) = 0.6
  ▸ y ∈ (1.7, 2.3) with probability 0.6
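The numbers on this slide can be reproduced in a few lines of Python. The scores and the prediction h(x) = 2 are the slide's illustrative values; the CDF estimate here counts the fraction of calibration scores strictly below the threshold, which matches the probabilities quoted on the slide.

```python
import numpy as np

# Calibration nonconformity scores |y_i - h(x_i)| from the slide.
alphas = np.array([0.10, 0.13, 0.28, 0.30, 0.38])
h_x = 2.0  # the slide's model prediction for the new sample

def cdf_estimate(scores, a):
    # Fraction of calibration scores strictly below a: the step CDF
    # read off the slide (0.6 at 0.30, 0.4 at 0.28).
    return float(np.mean(scores < a))

p1 = cdf_estimate(alphas, 0.30)      # P(|y - 2| <= 0.30) ~ 0.6
p2 = cdf_estimate(alphas, 0.28)      # P(|y - 2| <= 0.28) ~ 0.4
interval = (h_x - 0.30, h_x + 0.30)  # y in (1.7, 2.3) with probability ~0.6
print(p1, p2, interval)
```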
10. Introduction to conformal predictions
Inputs for conformal predictions
• A set of training examples z_i = (x_i, y_i), with i = 1, …, n
  ▸ They must be drawn from an exchangeable distribution (the order of the observations is irrelevant).
• A non-conformity function f(z) ∈ ℝ
  ▸ It measures the "weirdness" of an example (x_i, y_i)
  ▸ It should give low scores to similar examples (x_i, y_i) and high scores to different ones (x_i, ¬y_i)
  ▸ A common choice is some function of the underlying model, but it can be anything: the probability estimate for the correct class, the distance to neighbours of the same class, the probabilities from the trees, the absolute error of a regression model, etc.
• A significance level ε ∈ (0, 1), giving a 1 − ε confidence level
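The common choices in the last bullet can be written as one-line scoring functions. A minimal sketch; the function names are hypothetical, not from the talk:

```python
def class_nonconformity(proba, y):
    # One minus the model's probability estimate for the correct class:
    # a confident, correct model yields a low "weirdness" score.
    return 1.0 - proba[y]

def regression_nonconformity(y, y_hat):
    # Absolute error of a regression model.
    return abs(y - y_hat)

print(class_nonconformity([0.10, 0.85, 0.05], 1))  # confident -> low score
print(regression_nonconformity(3.0, 2.7))
```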
11. Introduction to conformal predictions
How do conformal predictions work?
• Divide the training set into two disjoint sets: Z_t with |Z_t| = m, and Z_c with |Z_c| = q, where m + q = n
• Build the underlying model, h, using Z_t
• Apply f(z_i) = α_i to the elements of the set not used for training h, and estimate the probability distribution of the scores: α_1, …, α_q ~ A
• When a new example x comes in, with h(x) = y:
  ▸ We will reject y if f(x, y) = α_y does not conform to A
• We compute the non-conformity degree, called the p-value, as follows:

  p_y = |{z_i ∈ Z_c : α_i ≥ α_y}| / (q + 1)

• Finally, the prediction region:

  Γ^ε = {y ∈ Y : p_y > ε}

Is y a very non-conforming example?
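The p-value and prediction-region formulas above translate directly to code. A minimal sketch, assuming the calibration scores have already been computed:

```python
import numpy as np

def p_value(cal_scores, alpha_y):
    # p_y = |{z_i in Z_c : alpha_i >= alpha_y}| / (q + 1), as defined above.
    cal_scores = np.asarray(cal_scores)
    return float(np.sum(cal_scores >= alpha_y)) / (len(cal_scores) + 1)

def prediction_region(label_scores, cal_scores, eps):
    # Gamma^eps = {y in Y : p_y > eps}: keep every candidate label whose
    # nonconformity score still looks plausible at significance level eps.
    return [y for y, a in label_scores.items() if p_value(cal_scores, a) > eps]

cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]        # q = 9
print(prediction_region({"A": 0.25, "B": 0.95}, cal, 0.2))  # only "A" survives
```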
12. Introduction to conformal predictions
Conformal prediction output
The prediction region Γ^ε contains the prediction y with probability 1 − ε
▸ In classification:
  ✓ α_y is known, but we need to compute p_y
  ✓ The result is a set of labels:
    Γ^ε = {Class 1, Class 3, Class 5} s.t. P(y ∈ Γ^ε) = 1 − ε
  ◦ If Γ^ε = ∅, the prediction is always erroneous
  ◦ If Γ^ε = {C} (only one class), the prediction is always true (if it is the correct class)
  ◦ If Γ^ε = {Class 1, Class 3, …, Class 5} (several classes), the prediction is always correct
▸ In regression the region is an interval:
  ✓ p_y is known, but we need to compute α_y
  ✓ The result is an interval:
    Γ^ε = (a, b), where a, b ∈ ℝ, s.t. P(y ∈ Γ^ε) = 1 − ε
14. Conformal predictions in classification
Algorithm to compute conformal prediction regions in classification problems
Let Z = (X, Y) be the historical data set for our classification problem, where:
  ▸ X is the information about the problem and Y = {C_1, …, C_k} the set of labels.
  ▸ Z is exchangeable.
To obtain the prediction region:
1. Divide Z into two disjoint sets:
  ✓ Z_t, the proper training set, with |Z_t| = m
  ✓ Z_c, the calibration set, with |Z_c| = q
2. Fit a classifier, h(X) = Y, using Z_t
3. Define a non-conformity function f(z) to measure the weirdness of your samples
4. Apply f(z) to each element in Z_c to obtain the calibration scores: α_1, …, α_q
5. Set a significance level ε ∈ (0, 1)
15. Conformal predictions in classification
Algorithm to compute conformal predictions in classification problems
6. For a new sample (x, y), compute the scoring value for each label in Y:
   ∀ C_i ∈ Y: f(x, y = C_i) = α_{C_i}
7. For each label in Y, compute the p-value as follows:
   ∀ C_i ∈ Y: p_{C_i} = |{z_j ∈ Z_c : α_j ≥ α_{C_i}}| / (q + 1)
8. Finally, build the prediction region as follows:
   Γ^ε = {C_i ∈ Y : p_{C_i} > ε}, so that
   for the new prediction h(x) = y, P(y ∈ Γ^ε) = 1 − ε
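Steps 1–8 can be sketched end to end in plain numpy. The synthetic data, the nearest-centroid model h, the softmax pseudo-probabilities, and the nonconformity choice f(z) = 1 − P(true class) are all illustrative assumptions, not part of the talk; only the split / calibrate / p-value structure follows the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: class 0 around (0, 0), class 1 around (3, 3).
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(3, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Step 1: split into proper training set Z_t and calibration set Z_c.
idx = rng.permutation(len(y))
train, cal = idx[:80], idx[80:]

# Step 2: fit the underlying model h -- here a nearest-centroid classifier.
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

def class_probs(x):
    # Pseudo-probabilities from a softmax over negative centroid distances.
    d = -np.linalg.norm(centroids - x, axis=1)
    e = np.exp(d - d.max())
    return e / e.sum()

# Steps 3-4: nonconformity f(z) = 1 - P(true class); calibration scores.
cal_alphas = np.array([1 - class_probs(X[i])[y[i]] for i in cal])
q = len(cal_alphas)

# Steps 5-8: per-label p-values p_C = |{alpha_j >= alpha_C}| / (q + 1),
# keeping every label with p_C > eps.
def prediction_region(x, eps):
    region = []
    for c in (0, 1):
        alpha_c = 1 - class_probs(x)[c]
        if np.sum(cal_alphas >= alpha_c) / (q + 1) > eps:
            region.append(c)
    return region

print(prediction_region(np.array([0.1, -0.2]), 0.05))
```

A point deep inside one cluster typically yields a single-label region, while a point between the clusters keeps both labels.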
17. Conformal predictions in regression
Algorithm to compute conformal prediction regions in regression problems
Let Z = (X, Y) be the historical data set for our regression problem, where:
  ▸ X is the information about the problem and Y a continuous target.
  ▸ Z is exchangeable.
To obtain the prediction region:
1. Divide Z into two disjoint sets:
  ✓ Z_t, the proper training set, with |Z_t| = m
  ✓ Z_c, the calibration set, with |Z_c| = q
2. Fit a regression model, h(X) = Y, using Z_t
3. Define a non-conformity function f(z) to measure the weirdness of your samples
4. Apply f(z) to each element in Z_c to obtain the calibration scores: α_1, …, α_q
5. Set a significance level ε ∈ (0, 1)
18. Conformal predictions in regression
Algorithm to compute conformal predictions in regression problems
6. Sort the calibration scores α_1, …, α_q in descending order
7. Compute the index s = ε(q + 1)
  ▸ This is the index of the (1 − ε)-percentile of the non-conformity scores, α_s
8. Finally, the prediction region for a new sample x_j:
   Γ^ε = h(x_j) ± α_s, with P(y_j ∈ Γ^ε) = 1 − ε
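Steps 6–8 fit in one short function. A minimal sketch, assuming the calibration residuals |y_i − h(x_i)| are already computed; the toy numbers below are hypothetical:

```python
import numpy as np

def conformal_interval(cal_residuals, y_hat, eps):
    # Step 6: sort calibration scores in descending order.
    alphas = np.sort(np.asarray(cal_residuals))[::-1]
    # Step 7: index s = eps * (q + 1) of the (1 - eps)-percentile score.
    s = int(eps * (len(alphas) + 1))
    width = alphas[s - 1]  # alpha_s, 1-indexed as on the slide
    # Step 8: fixed-width region h(x) +/- alpha_s.
    return y_hat - width, y_hat + width

# Toy usage: q = 9 residuals, eps = 0.2 -> s = 2, width = 2nd-largest residual.
residuals = [0.10, 0.13, 0.28, 0.30, 0.38, 0.05, 0.22, 0.17, 0.41]
print(conformal_interval(residuals, 2.0, 0.2))  # width 0.38 -> (1.62, 2.38)
```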
20. Application
Classification with conformal predictors
โข The dataset is imbalanced (Total Loss is the minority class)
โข The model is XGBoost
โข Model performance:
• A new accident happens and the model says it is a Total Loss, but how confident are we?
• Due to business restrictions, we have to minimize the number of false positives for TL
PROBLEM: To find out whether a car is a total loss or not
21. Application
Classification with conformal predictors
• We take the test set, Z_test = (x_i, y_i) with i = 1, …, q
• We define a non-conformity function:

  f(z) = (probability_class_i + recalibrated_probability_class_i) / 2

where:
  ▸ probability_class_i is the probability, according to the model, that y = class_i
  ▸ recalibrated_probability_class_i is the recalibrated probability that y = class_i
22. Application
Classification with conformal predictors
• Let us assume q = 9 and apply f(z) to each z_i ∈ Z_test
• We order the scores and use them to compute the p-value per label for the new accident:
  TL score = 0.85 → p-value TL = 8/(9+1) = 0.8 > ε = 0.05
  Non-TL score = 0.15 → p-value non-TL = 2/(9+1) = 0.2 > ε = 0.05
  Γ^ε = {TL, non-TL} s.t. P(y ∈ Γ^ε) = 0.95
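With the counts quoted on the slide (8 of 9+1 calibration scores at least as strange as the TL score, 2 for non-TL), the region calculation is just:

```python
# Slide values: q = 9 calibration scores, significance level eps = 0.05.
q, eps = 9, 0.05
p_values = {"TL": 8 / (q + 1), "non-TL": 2 / (q + 1)}  # 0.8 and 0.2
region = [label for label, p in p_values.items() if p > eps]
print(region)  # both labels survive, so the region is {TL, non-TL}
```

Both p-values exceed ε = 0.05, so at the 95% confidence level neither label can be excluded: the single "Total Loss" point prediction is not actually confident.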
25. Application
Regression with conformal predictors
• The dataset was not correctly labelled; there were some inconsistencies.
โข The model is XGBoost.
โข Model performance:
โข The model output was the input to another model
PROBLEM: to compute/find out the price of a car
28. Application
Regression with conformal predictors
• We take the test set, Z_test = (x_i, y_i) with i = 1, …, q
• We define a non-conformity function:

  f(z) = |y − h(x)|

where:
  ▸ y is the true value, and h(x) the model prediction
• Let us assume q = 9 and apply f(z) to each z_i ∈ Z_test
• We sort the scores in descending order
• We set ε = 0.2; then the index of the score is s = 0.2 × (9 + 1) = 2, giving α_{s=2}
• The fixed-width conformal interval would be: h(x) ± 189.52
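The index and interval on this slide follow the same recipe. The width 189.52 is the slide's value; the point prediction h(x) below is a hypothetical price, since the slide does not give one:

```python
eps, q = 0.2, 9
s = int(eps * (q + 1))  # index of the calibration score to use -> 2
alpha_s = 189.52        # second-largest calibration residual (slide value)
h_x = 5000.0            # hypothetical model prediction for the car's price
print(s, (h_x - alpha_s, h_x + alpha_s))
```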
31. Summary and conclusions
Take away
• Good model performance does not mean trustworthy predictions.
• Conformal prediction is a useful tool with different applications.
• It is easy to understand and to implement.
• Defining a non-conformity function is not always easy.
• Confidence around predictions brings additional value.
34. References
Some interesting readings
1. V. Vovk, A. Gammerman, G. Shafer, Algorithmic Learning in a Random World, Springer, 2005.
2. H. Linusson, An Introduction to Conformal Prediction, 2017.
3. V. Vovk, Cross-conformal predictors, Annals of Mathematics and Artificial Intelligence, 1-20, 2013.
4. U. Johansson, H. Boström, T. Löfström, H. Linusson, Regression conformal prediction with random forests, Machine Learning, 97, 155-176, 2014.
5. V. Balasubramanian, S.-S. Ho, V. Vovk (eds.), Conformal Prediction for Reliable Machine Learning, Morgan Kaufmann, 2014.
35. How good is your prediction? Quantifying uncertainty in Machine Learning predictions
Questions