MODEL PERFORMANCE
Presented by Megan Verbakel
Quick refresher
Today we will focus on the simple case of a binary classifier.
A binary classifier is a predictive model where the target can take the value of
0 or 1 (e.g. predicting whether a customer will reject (0) or accept (1) an offer).
0 and 1 are called classes, where 1 is the positive class (outcome of interest).
We start by taking a historical data set where each row represents one
instance (e.g. a customer) and each column is a feature (e.g. income).
In addition, we need a target column (e.g. an outcome for each customer).
Next, we apply a machine learning algorithm to learn patterns from the
features to predict the probability of each class for each instance (row).
The target values are known for the historical data, so we can use them to
understand how the model will perform when applied to new data where
we don't yet know the outcomes.
CONTENT OUTLINE
THEORY
Bias-variance trade-off
Over/under fitting
Finding the performance sweet spot
Data preparation
Performance metrics
Performance plots
PRACTICAL
Walk-through in Python
Bias Variance Trade-off
Prediction errors can be split into error due to bias and error due to variance
Error due to bias is how far off the predictions are from the true values 
Error due to variance is the variability of model predictions for a given point
As we decrease model bias by increasing complexity, variance increases,
creating a trade-off as we try to minimise both
By thinking of a model with perfect predictions as the bull's-eye, we can
visualise the four scenarios of bias and variance as four targets:
Low Variance + Low Bias
Low Variance + High Bias
High Variance + High Bias
High Variance + Low Bias
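To make the decomposition concrete, here is a minimal numpy sketch (a toy setup with an assumed sine-curve target, not taken from the slides) that estimates the bias and variance of predictions at a single point by refitting polynomials of increasing complexity on repeated noisy samples:

import numpy as np

rng = np.random.default_rng(0)
x0 = 0.5                                  # fixed point to predict at
true_value = np.sin(2 * np.pi * x0)       # assumed true function value

def fit_and_predict(degree):
    # Draw a fresh noisy training sample and fit a polynomial of the given degree.
    x = rng.uniform(0, 1, 30)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=30)
    return np.polyval(np.polyfit(x, y, degree), x0)

for degree in (1, 3, 9):
    preds = np.array([fit_and_predict(degree) for _ in range(200)])
    bias_sq = (preds.mean() - true_value) ** 2   # error due to bias
    variance = preds.var()                       # error due to variance
    print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")

As the degree (complexity) grows, the bias term shrinks while the variance term grows, which is exactly the trade-off described above.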
Over and Under Fitting
Over-fitting occurs when you learn too much detail from the
training data. The model doesn't generalise well, so error increases
when you apply it to new data. 
E.g. if you have one red ice cream in your training data with low
sales, you may incorrectly predict all red ice creams will have low
sales.
Under-fitting is when you don't learn enough detail, so error is high
in both your training and test sets.
Over-fitting increases as you increase complexity (e.g. adding more
features, increasing the depth of trees), resulting in low bias but high
variance.
As you decrease complexity, bias increases but variance decreases.
Our job is to find the optimal level of complexity that minimises
error, and balances bias and variance.
Finding the sweet spot
To find the sweet spot between under- and over-fitting, test different
levels of model complexity and minimise the total error (see the sketch at
the end of this section).
There will always be a trade-off, so you must decide how much of an
increase in variance you will accept for a decrease in bias.
Take into consideration how similar the new data will be to the
training data. 
If very similar, you can create a more complex model without worrying
too much about how it will generalise to slightly different data.
If there is more variation, reduce complexity to improve the stability
of the performance on new data sets.
You must also take into account the importance of 'explainability'. If you
need to be able to explain the model to business stakeholders, a
simpler model may be preferred.
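As a rough illustration of this search, the sketch below (using a synthetic dataset and a decision tree purely as stand-ins) sweeps tree depth and compares train accuracy against cross-validated test accuracy:

from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; replace with your own features and target.
X, y = make_classification(n_samples=2000, random_state=0)
depths = list(range(1, 16))
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)
for d, tr, te in zip(depths, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"max_depth={d}: train={tr:.3f}, test={te:.3f}")
# The sweet spot is the depth where test accuracy peaks: beyond it the
# train/test gap widens (over-fitting); before it, both scores are low (under-fitting).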
Data: Train/Test Split
To test the performance of a model, split the data into a train set
and a test set. Common splits are 80/20, and 70/30.
Train the model on the training data, then apply it to the test data to
check whether the model works on new (unseen) data (i.e. does it
generalise).
When comparing models, select the model that minimises the
prediction error in the test data.
However, we also want to minimise the performance gap between
the train and test sets (a big gap indicates over-fitting; low
performance in both indicates under-fitting).
Stratify on the target to ensure the proportion of values in each class
is the same in both the train and test set. This is to maintain the
representation of the original data.
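A minimal sketch of such a split with scikit-learn (assuming a feature matrix X and target y already exist):

from sklearn.model_selection import train_test_split

# 80/20 split, stratified on the target so both sets keep the
# class proportions of the original data.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)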
Data: Cross-validation
Cross-validation helps test for over-fitting by checking how the model holds
up when trained and tested on different subsets of the data.
The most common method is k-fold cross-validation, where k is the number
of subsets to create (typically between 5 and 10). k-1 subsets are used to train
the model, which is then tested on the held-out subset.
At the end, check the mean and standard deviation of the error. If comparing
models, select the model with the lowest mean error and lowest standard
deviation (i.e. minimise bias and variation).
Again, make sure you stratify by your target to ensure the proportion in each
class remains consistent.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)
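A minimal sketch of stratified k-fold cross-validation in scikit-learn (again assuming X and y exist; the model choice is illustrative):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# 5 stratified folds: each fold keeps the class proportions of the target.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(f"mean accuracy = {scores.mean():.3f}, std = {scores.std():.3f}")
# When comparing models, prefer the highest mean with the lowest std.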
Performance Plots
Confusion Matrix - A cross tabulation of predicted labels and true labels, used to calculate
recall, precision, and accuracy. Objective in this example: we want to predict which
customers will accept the offer so we can minimise the cost of calling potential customers.
Recall (positive)* = 937 / (937+121) = 0.89 
We correctly predicted 89% of 'accepts'
*Also called True Positive Rate and Sensitivity
Precision (positive) = 937 / (937+212) = 0.82
Of the cases we said would accept, 82% did
Recall (negative)* = 846 / (846+212) = 0.80
* Also called True Negative Rate and Specificity
Precision (negative) = 846 / (846+121) = 0.87
Accuracy = (846+937) / (846+212+121+937) = 0.84
We correctly predicted the label for 84% of cases
Caution: Accuracy is a poor metric if you have class imbalance. If 90% of cases reject, we could be 90% accurate by just
predicting everything will reject. This doesn't help us achieve our objective of understanding which customers will accept. We
therefore have to look at other metrics such as recall and precision for the positive class to understand the prediction error.
Matrix layout (rows = actual, columns = predicted):
True Negative | False Positive
False Negative | True Positive
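These quantities can be reproduced in code; a sketch assuming true labels y_test and predictions predicted_class as in the practical section at the end:

from sklearn.metrics import confusion_matrix

# For binary labels the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, predicted_class).ravel()
recall_pos = tp / (tp + fn)       # e.g. 937 / (937 + 121) = 0.89
precision_pos = tp / (tp + fp)    # e.g. 937 / (937 + 212) = 0.82
accuracy = (tp + tn) / (tn + fp + fn + tp)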
Performance Plots
ROC (Receiver Operating Characteristic) - Originally for radio signals, shows the trade-off between
the true positive rate (positive class recall) and the false positive rate (1 - negative class recall) at
different probability thresholds. We want to maximise the TPR to capture as much of the positive
class as possible, while minimising the FPR, which represents our error or wasted effort.
Area under the curve (AUC) - Measures the area underneath the ROC curve.
0.5 (straight diagonal line) = random ranking (TPR and FPR are equal at every threshold)
1 (curve reaching the top-left corner) = perfect predictions
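A sketch of computing and plotting the ROC curve with scikit-learn (assuming y_test and probabilities from the practical section; matplotlib is used for the plot):

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# probabilities[:, 1] is the predicted probability of the positive class.
fpr, tpr, thresholds = roc_curve(y_test, probabilities[:, 1])
auc = roc_auc_score(y_test, probabilities[:, 1])   # 0.5 = random, 1.0 = perfect
plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")           # the random diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()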
Performance Plots
Lift & Gain - compares the model to random selection when the data is ordered by the
positive class probability (high to low). For each 10% of the population, the lift chart (left)
shows the proportion in the positive class, and the gain chart (right) shows the cumulative
proportion of the positive class captured.
As hoped, the majority of cases assigned a high probability for the positive class were in
the positive class. If we call only the top 10%, over 90% will accept the offer.
With the model we can capture > 80% of customers who will accept the offer while calling
only 50% of the total group, compared to 50% if we called randomly.
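A sketch of the underlying calculation (assuming y_test and probabilities as before; numpy only):

import numpy as np

# Order cases by positive-class probability, high to low.
order = np.argsort(probabilities[:, 1])[::-1]
y_sorted = np.asarray(y_test)[order]
deciles = np.array_split(y_sorted, 10)                 # ten roughly equal groups
base_rate = y_sorted.mean()
lift = [d.mean() / base_rate for d in deciles]         # positive rate vs. random, per decile
gain = np.cumsum([d.sum() for d in deciles]) / y_sorted.sum()  # cumulative share captured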
Performance Metrics
Accuracy:
Where y_hat_i is the predicted value of the ith sample, and y_i is the true value, the
proportion of correct predictions over n samples can be expressed as:
accuracy = (1/n) * sum_{i=1..n} 1(y_hat_i = y_i)
Precision & Recall:
Where tp is the number of true positive predictions (correct positives), fp is the number
of false positive predictions (negatives predicted as positive), and fn is the number of
false negative predictions (positives predicted as negative):
precision = tp / (tp + fp)
recall = tp / (tp + fn)
F1 Score:
Harmonic mean of precision and recall:
F1 = 2 * (precision * recall) / (precision + recall)
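For example, with the precision (0.82) and recall (0.89) from the earlier confusion matrix:
F1 = 2 * (0.82 * 0.89) / (0.82 + 0.89) ≈ 0.85.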
Performance Metrics
ROC_AUC: The area under the ROC curve is the probability that a random positive instance
is ranked higher in probability than a random negative instance. It is calculated
numerically using the formula for the area of a trapezoid between consecutive ROC points:
AUC = sum over i of (FPR_{i+1} - FPR_i) * (TPR_{i+1} + TPR_i) / 2
Gini coefficient:
The Gini coefficient is the ratio of A, the area between the diagonal line
(perfect equality) and the Lorenz curve (cumulative positive class
proportion), to the total area above the diagonal, A + B:
Gini = A / (A + B)
Since A is equal to ROC_AUC - 0.5 and A + B is 0.5, the Gini coefficient
can be derived from ROC AUC:
Gini = (AUC - 0.5) * 2
Log Loss (binary):
Where y is the true label and p = Pr(y = 1) is the predicted probability, the log loss per
sample is the negative log-likelihood of the classifier given the true label:
log loss = -(y * log(p) + (1 - y) * log(1 - p))
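For intuition: with a true label y = 1, a confident correct prediction of p = 0.9 costs
-ln(0.9) ≈ 0.11, while a confident wrong prediction of p = 0.1 costs -ln(0.1) ≈ 2.30, so the
penalty grows sharply the more certain the model was of the wrong answer.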
https://en.wikipedia.org/wiki/Gini_coefficient
Metrics Summary
Accuracy (0-1) - maximise - of all predictions, the proportion correctly predicted
Recall (0-1) - maximise - of the instances actually in a class, the proportion correctly
predicted as that class (i.e. how many you pick up)
Precision (0-1) - maximise - of the instances predicted to be a class, the proportion
that were correct (i.e. 1-precision is the error or incorrect predictions)
F1 score (0-1) - maximise - harmonic mean of precision and recall (for binary
classifiers, computed for the positive class)
ROC_AUC (0-1) - maximise - area under the ROC curve
Gini (0-1) - maximise - a measure of inequality, where a high value indicates a
disproportionate amount of the positive class is represented in the cases with a high
probability (good!)
Log loss (0-∞) - minimise - log loss increases as the predicted probability diverges
from the actual label (penalises the model based on how sure it was)
Python Practical
To calculate the performance metrics and create the plots discussed, all you
need is the probabilities for each class, the predicted class (obtained by applying
a threshold to the probabilities), and the actual outcomes.
If you are using an sklearn algorithm, these can be easily obtained after you
have fitted the model with the predict and predict_proba methods:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(x_train, y_train)  # fit on the training split before predicting
predicted_class = clf.predict(x_test)
probabilities = clf.predict_proba(x_test)
A range of performance metrics are available in the sklearn.metrics module: 
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
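As a minimal sketch (assuming y_test holds the true outcomes for the test set), the metrics covered above could be computed like this:

from sklearn import metrics

accuracy = metrics.accuracy_score(y_test, predicted_class)
precision = metrics.precision_score(y_test, predicted_class)
recall = metrics.recall_score(y_test, predicted_class)
f1 = metrics.f1_score(y_test, predicted_class)
auc = metrics.roc_auc_score(y_test, probabilities[:, 1])
gini = 2 * auc - 1                                  # derived from AUC as shown earlier
logloss = metrics.log_loss(y_test, probabilities)   # accepts the full probability matrix
conf = metrics.confusion_matrix(y_test, predicted_class)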