DATA ANALYTICS
Evaluation Metrics for Supervised Learning
Models of Machine Learning
Md. Main Uddin Rony
Software Developer, Infolytx, Inc.
Machine Learning Evaluation Metrics
ML Evaluation Metrics Are…..
● tied to Machine Learning Tasks
● methods that measure an algorithm’s performance and behavior
● helpful for choosing the model that best meets the target performance
● helpful for tuning a model’s parameters so that it performs as well as
possible
Evaluation Metrics Types...
● Various types of ML Algorithms (classification, regression, ranking,
clustering)
● Different types of evaluation metrics for different types of algorithm
● Some metrics can be useful for more than one type of algorithm
(Precision - Recall)
● This deck covers evaluation metrics for supervised learning models only
(Classification, Regression, Ranking)
Classification Metrics
Classification Model Does...
Predict class labels given input data
In binary classification, there are two possible output classes (0 or 1, True
or False, Positive or Negative, Yes or No, etc.)
Spam detection of email is a good example of Binary classification.
Some Popular Classification Metrics...
Accuracy
Confusion Matrix
Log-Loss
AUC
Accuracy
● Ratio between the number of correct predictions and total number of
predictions
● Example: Suppose we have 100 examples in the positive class and 200
examples in the negative class. Our model declares 80 out of 100
positives as positive correctly and 195 out of 200 negatives as negative
correctly.
● So, accuracy is = (80 + 195)/(100 + 200) = 91.7%
Confusion Matrix
● Shows a more detailed breakdown of correct and incorrect classifications for each
class.
● For the previous example, the confusion matrix looks like:

                        Predicted as positive   Predicted as negative
Labeled as positive              80                      20
Labeled as negative               5                     195

● What is the accuracy of the positive class? And of the negative class?
● Clearly, the positive class has lower accuracy (80/100 = 80%) than the
negative class (195/200 = 97.5%)
● That information is lost if we calculate overall accuracy only.
Per-Class Accuracy
● Average per-class accuracy of the previous example:
(80% + 97.5%)/2 = 88.75%, which differs from the overall accuracy of 91.7%
Why important?
- It can tell a different story when the classes have different numbers of
examples
- The class with more examples dominates the overall accuracy statistic and
so produces a distorted picture
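As a rough illustration, the two numbers above can be recomputed from the four confusion-matrix counts. This is a minimal sketch in plain Python; the function names `accuracy` and `per_class_accuracy` are chosen here for illustration and do not come from any library.

```python
# Confusion-matrix counts from the example (rows are the true labels).
tp, fn = 80, 20    # labeled positive: predicted positive / predicted negative
fp, tn = 5, 195    # labeled negative: predicted positive / predicted negative

def accuracy(tp, fn, fp, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + fp + tn)

def per_class_accuracy(tp, fn, fp, tn):
    """Unweighted mean of the accuracy measured inside each class."""
    positive_acc = tp / (tp + fn)
    negative_acc = tn / (tn + fp)
    return (positive_acc + negative_acc) / 2

overall = accuracy(tp, fn, fp, tn)
averaged = per_class_accuracy(tp, fn, fp, tn)
print(f"accuracy           = {overall:.4f}")
print(f"per-class accuracy = {averaged:.4f}")
```

The gap between the two values comes entirely from the class imbalance: the larger negative class pulls the overall figure toward its own accuracy.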
Log-Loss
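Especially useful when the raw output of the classifier is a numeric probability
instead of a class label 0 or 1
Mathematically, log-loss for a binary classifier:
$$\text{log-loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right]$$
where y_i ∈ {0, 1} is the true label of data point i and p_i is the predicted
probability that it belongs to class 1
Minimum is 0 when prediction and true label match up
Compare a data point that the classifier assigns to class 1 with
probability 0.51 against one it assigns with probability 1
Minimizing this value pushes the predicted probabilities toward the true
labels, which generally improves the accuracy of the classifier as well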
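The comparison between probability 0.51 and probability 1 can be worked through numerically. The sketch below assumes the true label is 1 in both cases and uses the natural logarithm; the clipping constant `eps` is an assumption added here so that log(0) is never evaluated, and the function name `log_loss` is illustrative only.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log-loss; p_pred[i] is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Assumed true label: 1 for both data points.
barely_right = log_loss([1], [0.51])   # correct, but with low confidence
certain_right = log_loss([1], [1.0])   # correct, with full confidence
print(barely_right, certain_right)
```

Both predictions would count as correct under accuracy, yet the hesitant one is penalized by about -ln(0.51) while the confident one costs essentially nothing.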
AUC (Area Under Curve)
● The curve is the receiver operating
characteristic curve, or ROC curve for
short
● Provides nuanced details about the
behavior of the classifier
● Bad ROC curve covers very little area
● Good ROC curve has a lot of space
under it
● But, how?
AUC (contd..)
● So, what’s the advantage of using a ROC curve over a simpler metric?
The ROC curve visualizes all possible classification thresholds, whereas
other metrics only represent your error rate for a single threshold
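To make the threshold sweep concrete, here is a small sketch that builds ROC points by trying every distinct score as a threshold and then measures the area with the trapezoidal rule. The labels and scores are made-up toy data, and `roc_points` and `auc` are hypothetical helper names, not the API of a particular library.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= t is called positive.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy data (assumed): 1 = positive, 0 = negative, with classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
roc_auc = auc(curve)
print(curve)
print(roc_auc)
```

Each point on the curve corresponds to one threshold, which is why a single accuracy or error-rate figure is only one point of this picture.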
Ranking Metrics
Ranking ...
Is related to binary classification
Internet search is a good example of a system that acts as a ranker.
For a query, it returns a ranked list of web pages relevant to that query
So, ranking can be seen as a binary classification of each result as
“relevant to the query” or “irrelevant to the query”
It also orders the results so that the most relevant result is on top
So, what can the underlying implementation do to account for both?
Can we predict what ranking metrics will evaluate, and how?
Some Ranking Metrics..
Precision - Recall
Precision - Recall Curve and F1 Score
NDCG
Precision - Recall
Considering the scenario of web search result, Precision answers this
question:
“Out of the items that the ranker/classifier predicted to be relevant, how many are
truly relevant?”
Whereas, Recall answers this:
“Out of all the items that are truly relevant, how many are found by the
ranker/classifier?”
Precision - Recall (Contd..)
Calculation Example of Precision - Recall

                    Predicted as Negative   Predicted as Positive
Actual Negative          9760 (TN)               140 (FP)
Actual Positive            40 (FN)                60 (TP)

Total Negative = 9760 + 140 = 9900
Total Positive = 40 + 60 = 100
Total Negative prediction = 9760 + 40 = 9800
Total Positive prediction = 140 + 60 = 200
Precision = TP / (TP+FP)
= 60 / (60 + 140) = 30%
Recall = TP / (TP+FN)
= 60 / (60+40) = 60%
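The same arithmetic, written as a short sketch in plain Python. The variable and function names are illustrative choices; only the four counts come from the example.

```python
# Counts from the confusion matrix in the example.
tn, fp = 9760, 140   # actual negative: predicted negative / predicted positive
fn, tp = 40, 60      # actual positive: predicted negative / predicted positive

def precision(tp, fp):
    """Of everything predicted positive, the share that is truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything truly positive, the share that was found."""
    return tp / (tp + fn)

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```

Note that the 9760 true negatives appear in neither formula, which is why precision and recall stay informative even when the negative class is very large.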
Precision - Recall Curve
When the number of answers returned by
the ranker changes, the precision and
recall scores change too
By plotting precision versus recall over a
range of k values, where k is the
number of results returned, we get the
precision - recall curve
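A minimal sketch of how the points of such a curve could be computed, one per cutoff k. The ranked list of 0/1 relevance flags is invented toy data, and `precision_recall_at_k` is a hypothetical helper name.

```python
def precision_recall_at_k(relevance, k):
    """relevance: 0/1 flags in ranked order; returns (precision@k, recall@k)."""
    hits = sum(relevance[:k])          # relevant items among the top k
    total_relevant = sum(relevance)    # relevant items in the whole list
    return hits / k, hits / total_relevant

# Toy ranking (assumed): 1 = relevant result, 0 = irrelevant result.
ranked = [1, 0, 1, 1, 0, 0, 1, 0]
pr_curve = [precision_recall_at_k(ranked, k) for k in range(1, len(ranked) + 1)]
for k, (p, r) in enumerate(pr_curve, start=1):
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Recall can only stay flat or rise as k grows, while precision moves up and down depending on whether the newly included result is relevant.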
Computing Precision-Recall Point
Interpolating a Recall/Precision Curve
Trade-off between Recall and Precision
F-Measure
A single measure of performance that takes into account both recall and
precision
Harmonic mean of recall (R) and precision (P):
$$F = \frac{2PR}{P + R}$$
Compared to the arithmetic mean, both need to be high for the harmonic mean
to be high
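A quick numeric sketch of why the harmonic mean is the stricter of the two. The first pair of values reuses the 30% precision and 60% recall from the earlier calculation example; the second, lopsided pair is an invented illustration, and the function names are assumptions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def arithmetic_mean(precision, recall):
    return (precision + recall) / 2

balanced = f1_score(0.3, 0.6)            # values from the earlier example
lopsided = f1_score(1.0, 0.01)           # perfect precision, almost no recall
lopsided_avg = arithmetic_mean(1.0, 0.01)
print(balanced, lopsided, lopsided_avg)
```

With perfect precision but 1% recall, the arithmetic mean still reports roughly one half, whereas the harmonic mean stays close to the weaker of the two values.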
NDCG
● Precision and recall treat all retrieved items equally.
● But do a relevant item in position 1 and a relevant item in position 5
carry the same significance?
● Think about a web search result
● NDCG tries to take this scenario into account.
What?
● NDCG stands for Normalized Discounted Cumulative Gain
● First just focus on DCG (Discounted Cumulative Gain)
Discounted Cumulative Gain
● Popular measure for evaluating web search and related tasks.
● Discounts items that are further down the search result list
● Two assumptions:
- Highly relevant documents are more useful than marginally relevant
document
- the lower the ranked position of a relevant document, the less useful it is
for the user, since it is less likely to be examined
Discounted Cumulative Gain
● Uses graded relevance as a measure of the usefulness, or gain, from
examining a document
● Gain is accumulated starting at the top of the ranking and may be
reduced, or discounted, at lower ranks
● Typical discount is 1/log (rank)
- With base 2, the discount at rank 4 is ½, and at rank 8 it is 1/3
Discounted Cumulative Gain
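● DCG is the total gain accumulated at a particular rank p:
$$DCG_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$
where rel_i is the graded relevance of the document at rank i
● Alternative formulation:
$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(1 + i)}$$
- used by some web search companies
- emphasis on retrieving highly relevant documents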
* Equation used from Addison Wesley’s
DCG Example
● 10 ranked documents judged on 0-3 relevance scale:
3, 2, 3, 0, 0, 1, 2, 2, 3, 0
● discounted gain:
3, 2/1, 3/1.59, 0, 0, 1/2.59, 2/2.81, 2/3, 3/3.17, 0
= 3, 2, 1.89, 0, 0, 0.39, 0.71, 0.67, 0.95, 0
● DCG:
3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61
* Example used from Addison Wesley’s
presentation
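The running totals in this example can be reproduced with a few lines of Python. This is a sketch that assumes the first formulation of DCG (undiscounted gain at rank 1, discount 1/log2(i) from rank 2 onward), which is what the discounted gains listed in the example imply; `dcg_at_each_rank` is a hypothetical helper name.

```python
import math

def dcg_at_each_rank(relevances):
    """Running DCG: gain rel_1 at rank 1, then rel_i / log2(i) for i >= 2."""
    running, out = 0.0, []
    for i, rel in enumerate(relevances, start=1):
        running += rel if i == 1 else rel / math.log2(i)
        out.append(running)
    return out

# Relevance judgments (0-3 scale) for the 10 ranked documents in the example.
ranking = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
dcg = dcg_at_each_rank(ranking)
print([round(v, 2) for v in dcg])
```

Rank 1 is handled separately because log2(1) is 0, so the discount is undefined there.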
Normalized DCG
● Normalized version of discounted cumulative gain
● Often normalized by comparing the DCG at each rank with the DCG value
for the perfect ranking
● Normalized score always lies between 0.0 and 1.0
NDCG Example
● Let’s look back at the list of ranked documents judged on the relevance scale:
3, 2, 3, 0, 0, 1, 2, 2, 3, 0
● Perfect ranking:
3, 3, 3, 2, 2, 2, 1, 0, 0, 0
● Perfect discounted gain:
3, 3/1, 3/1.59, 2/2, 2/2.32, 2/ 2.59, 1/2.81, 0 , 0, 0
= 3, 3, 1.89, 1, 0.86, 0.77, 0.36, 0, 0, 0
NDCG Example
● Ideal DCG values:
3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 10.88, 10.88, 10.88
● NDCG values (divide actual by ideal):
3/3, 5/6, 6.89/7.89, 6.89/8.89, 6.89/9.75, 7.28/10.52,
7.99/10.88, 8.66/10.88, 9.61/10.88, 9.61/10.88
= 1, 0.83, 0.87, 0.78, 0.71, 0.69, 0.73, 0.8, 0.88, 0.88
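A sketch of the normalization step: the actual DCG at each rank is divided by the DCG of the same judgments sorted into the perfect order. It assumes the DCG formulation with discount 1/log2(i) from rank 2 onward; both helper names are illustrative, and the guard for a zero ideal DCG is an added assumption.

```python
import math

def dcg_at_each_rank(relevances):
    """Running DCG: gain rel_1 at rank 1, then rel_i / log2(i) for i >= 2."""
    running, out = 0.0, []
    for i, rel in enumerate(relevances, start=1):
        running += rel if i == 1 else rel / math.log2(i)
        out.append(running)
    return out

def ndcg_at_each_rank(relevances):
    """Actual DCG divided by the DCG of the perfect (descending) ranking."""
    actual = dcg_at_each_rank(relevances)
    ideal = dcg_at_each_rank(sorted(relevances, reverse=True))
    # Guard (assumption): report 0.0 if the ideal DCG is zero at some rank.
    return [a / b if b > 0 else 0.0 for a, b in zip(actual, ideal)]

ndcg = ndcg_at_each_rank([3, 2, 3, 0, 0, 1, 2, 2, 3, 0])
print([round(v, 2) for v in ndcg])
```

Because the ideal ranking maximizes DCG at every rank, each ratio stays between 0.0 and 1.0.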
Regression Metrics
What Do Regression Tasks Do?
Model learns to predict numeric scores.
For example, we try to predict the price of a stock on future days given past
price history and other useful information
Some Regression Metrics..
RMSE (Root Mean Square Error)
Quantiles of Errors
RMSE
The most commonly used metric for regression tasks
Also known as RMSD ( root-mean-square deviation)
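This is defined as the square root of the average squared distance between
the actual score and the predicted score:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where y_i is the actual score, \hat{y}_i is the predicted score, and n is
the number of data points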
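A minimal sketch of the definition in plain Python. The actual and predicted values are invented toy numbers, and `rmse` is an illustrative function name.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared difference between actual and predicted."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data (assumed): two exact predictions, one off by 1, one off by 2.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
error = rmse(actual, predicted)
print(error)
```

Squaring before averaging means the prediction that is off by 2 contributes four times as much as the one that is off by 1.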
Quantiles of Errors
RMSE is an average, so it is sensitive to large outliers.
If the regressor performs really badly on a single data point, the average
error can be large, so RMSE is not robust
Quantiles (or percentiles) of the errors are much more robust
because they are not affected by a few large outliers
It’s important to look at the median absolute percentage error (MAPE):
$$MAPE = \operatorname{median}_i\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right)$$
It gives us a relative measure of the typical error.
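The robustness claim can be checked on a toy example in which one prediction is replaced by a wild outlier. The numbers below are invented for illustration, the function names are assumptions, and the sketch presumes that no actual value is zero (otherwise the percentage error is undefined).

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def median_ape(y_true, y_pred):
    """Median absolute percentage error; assumes no actual value is zero."""
    return statistics.median(abs((y - p) / y) for y, p in zip(y_true, y_pred))

actual = [100, 200, 300, 400, 500]
clean = [110, 190, 310, 390, 550]     # every prediction is reasonably close
outlier = [110, 190, 310, 390, 1500]  # same, but the last one is far off

rmse_clean, rmse_outlier = rmse(actual, clean), rmse(actual, outlier)
mape_clean, mape_outlier = median_ape(actual, clean), median_ape(actual, outlier)
print(rmse_clean, rmse_outlier)
print(mape_clean, mape_outlier)
```

In this toy setup the single bad prediction inflates RMSE by more than an order of magnitude, while the median of the percentage errors does not move at all.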
Acknowledgement
Evaluating Machine Learning Models by Alice Zheng
Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE)
who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech,
Hong Kong)
Tutorial of Data School on ROC Curves and AUC by Kevin Markham
Questions???
Thank You

%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 

Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine Learning

  • 1. DATA ANALYTICS: Evaluation Metrics for Supervised Learning Models of Machine Learning. Md. Main Uddin Rony, Software Developer, Infolytx, Inc.
  • 3. ML Evaluation Metrics Are... ● tied to machine learning tasks ● methods that measure an algorithm’s performance and behavior ● helpful for choosing the best model to meet the target performance ● helpful for tuning a model’s parameters so that it performs as well as possible
  • 4. Evaluation Metrics Types... ● There are various types of ML algorithms (classification, regression, ranking, clustering) ● Different types of algorithm call for different types of evaluation metrics ● Some metrics are useful for more than one type of algorithm (e.g. precision and recall) ● This deck covers evaluation metrics for supervised learning models only (classification, regression, ranking)
  • 6. What a Classification Model Does... It predicts class labels given input data. In binary classification there are two possible output classes (0 or 1, True or False, Positive or Negative, Yes or No, etc.). Spam detection for email is a good example of binary classification.
  • 7. Some Popular Classification Metrics... ● Accuracy ● Confusion Matrix ● Log-Loss ● AUC
  • 8. Accuracy ● Ratio between the number of correct predictions and the total number of predictions ● Example: Suppose we have 100 examples in the positive class and 200 examples in the negative class. Our model correctly labels 80 of the 100 positives as positive and 195 of the 200 negatives as negative. ● So, accuracy = (80 + 195)/(100 + 200) = 91.7%
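The accuracy arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch, not the author's code: the `accuracy` helper and the way the label lists are laid out are assumptions made for illustration; only the counts (80 of 100, 195 of 200) come from the slide.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Rebuild the slide's example: 100 positives (80 found), 200 negatives (195 found).
y_true = [1] * 100 + [0] * 200
y_pred = [1] * 80 + [0] * 20 + [0] * 195 + [1] * 5

example_accuracy = accuracy(y_true, y_pred)
print(f"{example_accuracy:.1%}")  # 91.7%
```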
  • 9. Confusion Matrix ● Shows a more detailed breakdown of correct and incorrect classifications for each class. ● For the previous example, the confusion matrix looks like this:

                              Predicted as positive   Predicted as negative
        Labeled as positive   80                      20
        Labeled as negative   5                       195

    ● What is the accuracy of the positive class? And of the negative class? ● Clearly, the positive class has lower accuracy (80/100 = 80%) than the negative class (195/200 = 97.5%) ● That information is lost if we calculate overall accuracy only.
  • 10. Per-Class Accuracy ● Average per-class accuracy for the previous example: (80% + 97.5%)/2 = 88.75%, which differs from the overall accuracy of 91.7%. Why is this important? - It can show a different picture when classes have different numbers of examples - The class with more examples dominates the overall accuracy statistic and so produces a distorted picture
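The confusion matrix and per-class accuracy from the two slides above can be sketched as follows. The function names are hypothetical helpers chosen for this example, and the label lists simply re-create the slide's counts.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately over the examples of each true class."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Same example as the slides: 100 positives, 200 negatives.
y_true = [1] * 100 + [0] * 200
y_pred = [1] * 80 + [0] * 20 + [0] * 195 + [1] * 5

counts = confusion_counts(y_true, y_pred)
per_class = per_class_accuracy(y_true, y_pred)
average_per_class = sum(per_class.values()) / len(per_class)

print(counts[(1, 1)], counts[(1, 0)])  # row "labeled as positive"
print(counts[(0, 1)], counts[(0, 0)])  # row "labeled as negative"
print(per_class, average_per_class)
```

Averaging the per-class values weights each class equally, which is why the result (88.75%) sits below the overall accuracy when the larger class is also the easier one.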
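  • 11. Log-Loss ● Very useful when the raw output of the classifier is a numeric probability instead of a class label 0 or 1 ● Mathematically, log-loss for a binary classifier: $\text{log-loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$, where $y_i \in \{0, 1\}$ is the true label and $p_i$ is the predicted probability of class 1 ● The minimum is 0, reached when the prediction and the true label match up ● Exercise: calculate it for a data point of class 1 that the classifier assigns to class 1 with probability 0.51, and again with probability 1 ● Minimizing this value pushes the predicted probabilities toward the true labels, which in turn tends to improve the accuracy of the classifier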
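The exercise on the log-loss slide can be worked through in code. This is a sketch under stated assumptions: it uses the standard binary log-loss with natural logarithms, and the `eps` clipping is an addition (not from the slides) so that a probability of exactly 0 or 1 does not produce `log(0)`. The third data point, with probability 0.01, is a hypothetical extra to show how a confident wrong answer is penalized.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log-loss; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0); an added safeguard
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# A single data point whose true label is 1.
loss_unsure = log_loss([1], [0.51])   # barely on the right side
loss_certain = log_loss([1], [1.0])   # fully confident and correct
loss_wrong = log_loss([1], [0.01])    # hypothetical: confident and wrong

print(round(loss_unsure, 3), round(loss_certain, 3), round(loss_wrong, 3))
```

Both 0.51 and 1.0 give the same predicted label at a 0.5 threshold, yet their log-loss differs sharply, which is the extra information this metric carries over plain accuracy.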
  • 12. AUC (Area Under Curve) ● The curve is the receiver operating characteristic (ROC) curve ● It provides nuanced details about the behavior of the classifier ● A bad ROC curve covers very little area ● A good ROC curve has a lot of space under it ● But, how?
  • 19. AUC (contd..) ● So, what’s the advantage of using the ROC curve over a simpler metric? The ROC curve visualizes all possible classification thresholds, whereas other metrics only represent your error rate at a single threshold
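To make "all possible classification thresholds" concrete, the sketch below sweeps the threshold over every distinct score, records the (false positive rate, true positive rate) pair at each one, and integrates the resulting curve with the trapezoidal rule. The labels and scores are made-up illustration data, and `roc_points` and `area_under_curve` are hypothetical helper names, not part of the original deck.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under (x, y) points that are ordered by increasing x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores for three positives and three negatives.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
roc_auc = area_under_curve(points)
print(points)
print(round(roc_auc, 3))
```

Here eight of the nine (positive, negative) pairs are ordered correctly by score, and the computed area is 8/9, which illustrates the usual reading of AUC as the chance that a random positive outranks a random negative.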
  • 21. Ranking ... Ranking is related to binary classification. Internet search is a good example of a ranker: given a query, it returns a ranked list of web pages relevant to that query. So ranking can be seen as a binary classification of each page as “relevant to the query” or “irrelevant to the query”. It also orders the results so that the most relevant result appears at the top. So, what can be done in the underlying implementation to account for both? Can we predict what ranking metrics will evaluate, and how?
  • 22. Some Ranking Metrics.. ● Precision - Recall ● Precision - Recall Curve and F1 Score ● NDCG
  • 23. Precision - Recall In the web search scenario, precision answers this question: “Out of the items that the ranker/classifier predicted to be relevant, how many are truly relevant?” Recall, in contrast, answers this: “Out of all the items that are truly relevant, how many are found by the ranker/classifier?”
  • 24. Precision - Recall (Contd..)
  • 25. Calculation Example of Precision - Recall

                          Predicted as Negative   Predicted as Positive
        Actual Negative   9760 (TN)               140 (FP)
        Actual Positive   40 (FN)                 60 (TP)

    Total Negative = 9760 + 140 = 9900
    Total Positive = 40 + 60 = 100
    Total Negative prediction = 9760 + 40 = 9800
    Total Positive prediction = 140 + 60 = 200
    Precision = TP / (TP + FP) = 60 / (60 + 140) = 30%
    Recall = TP / (TP + FN) = 60 / (60 + 40) = 60%
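The calculation above maps directly onto a small function. This is an illustrative sketch; the `precision_recall` helper and the zero-denominator fallback are assumptions added here, while the four counts are the ones in the table.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts from the slide's table.
tn, fp, fn, tp = 9760, 140, 40, 60

precision, recall = precision_recall(tp=tp, fp=fp, fn=fn)
overall_accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.0%} recall={recall:.0%} accuracy={overall_accuracy:.1%}")
```

With these counts the overall accuracy is 98.2% even though precision is only 30%, because the 9900 negatives dominate the total.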
  • 26. Precision - Recall Curve When the number of answers returned by the ranker changes, the precision and recall scores change too. By plotting precision versus recall over a range of values of k, the number of results returned, we get the precision - recall curve
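The points of a precision - recall curve can be computed by cutting the ranked list at each k. The sketch below assumes a hypothetical ranked list of binary relevance judgments (1 = relevant, 0 = not relevant); the data and the helper name are illustrative only.

```python
def precision_recall_at_k(ranked_relevance, k, total_relevant):
    """Precision and recall when only the top k results are returned."""
    hits = sum(ranked_relevance[:k])
    return hits / k, hits / total_relevant


# Hypothetical ranker output: relevance of each returned item, best-ranked first.
ranked_relevance = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
total_relevant = sum(ranked_relevance)

curve = [
    precision_recall_at_k(ranked_relevance, k, total_relevant)
    for k in range(1, len(ranked_relevance) + 1)
]

for k, (p, r) in enumerate(curve, start=1):
    print(f"k={k:2d}  precision={p:.2f}  recall={r:.2f}")
```

Recall can only stay flat or rise as k grows, while precision generally drifts downward, which is the trade-off the curve makes visible.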
  • 29. Trade-off between Recall and Precision
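  • 30. F-Measure ● A single measure of performance that takes both recall and precision into account ● It is the harmonic mean of recall and precision: $F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$ ● Unlike the arithmetic mean, the harmonic mean is high only when both values are high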
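A short sketch of the harmonic mean, compared with the arithmetic mean. The first pair of values (30% precision, 60% recall) is taken from the earlier calculation example; the second pair (0.9 and 0.1) is a hypothetical lopsided case added for contrast.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the precision - recall calculation example.
f1_example = f1_score(0.30, 0.60)
arithmetic_example = (0.30 + 0.60) / 2

# Hypothetical lopsided case: one value high, the other low.
f1_lopsided = f1_score(0.90, 0.10)
arithmetic_lopsided = (0.90 + 0.10) / 2

print(round(f1_example, 2), round(arithmetic_example, 2))
print(round(f1_lopsided, 2), round(arithmetic_lopsided, 2))
```

In the lopsided case the arithmetic mean is 0.5 while the harmonic mean is only 0.18, so a single strong value cannot hide a weak one.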
  • 31. NDCG ● Precision and recall treat all retrieved items equally. ● But do a relevant item in position 1 and a relevant item in position 5 carry the same significance? ● Think about a web search result ● NDCG tries to take this scenario into account.
  • 32. What? ● NDCG stands for Normalized Discounted Cumulative Gain ● First just focus on DCG (Discounted Cumulative Gain)
  • 33. Discounted Cumulative Gain ● Popular measure for evaluating web search and related tasks. ● Discounts items that are further down the search result list ● Two assumptions: - Highly relevant documents are more useful than marginally relevant documents - The lower the ranked position of a relevant document, the less useful it is for the user, since it is less likely to be examined
  • 34. Discounted Cumulative Gain ● Uses graded relevance as a measure of the usefulness, or gain, from examining a document ● Gain is accumulated starting at the top of the ranking and may be reduced, or discounted, at lower ranks ● Typical discount is 1/log(rank) - With base 2, the discount at rank 4 is 1/2, and at rank 8 it is 1/3
  • 35. Discounted Cumulative Gain ● DCG is the total gain accumulated at a particular rank p: $DCG_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$ ● Alternative formulation: $DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(1 + i)}$ - used by some web search companies - emphasis on retrieving highly relevant documents * Equation used from Addison Wesley’s
  • 36. DCG Example ● 10 ranked documents judged on a 0-3 relevance scale: 3, 2, 3, 0, 0, 1, 2, 2, 3, 0 ● Discounted gain: 3, 2/1, 3/1.59, 0, 0, 1/2.59, 2/2.81, 2/3, 3/3.17, 0 = 3, 2, 1.89, 0, 0, 0.39, 0.71, 0.67, 0.95, 0 ● DCG: 3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61 * Example used from Addison Wesley’s presentation
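The running DCG totals in the example can be checked with a short script. This sketch assumes the discount used in the example: rank 1 is not discounted and every later rank i is divided by log2(i). The function name is a hypothetical helper.

```python
import math


def dcg_at_each_rank(relevances):
    """Running DCG: gain at rank 1 undiscounted, gain at rank i divided by log2(i)."""
    total = 0.0
    running = []
    for rank, rel in enumerate(relevances, start=1):
        discount = 1.0 if rank == 1 else math.log2(rank)
        total += rel / discount
        running.append(total)
    return running


# Relevance judgments from the slide, in ranked order.
ranking = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
dcg_values = dcg_at_each_rank(ranking)

print([round(v, 2) for v in dcg_values])
# [3.0, 5.0, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61]
```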
  • 37. Normalized DCG ● Normalized version of discounted cumulative gain ● Often normalized by comparing the DCG at each rank with the DCG value for the perfect ranking ● Normalized score always lies between 0.0 and 1.0
  • 38. NDCG Example ● Let’s look back at the list of ranked documents judged on the relevance scale: 3, 2, 3, 0, 0, 1, 2, 2, 3, 0 ● Perfect ranking: 3, 3, 3, 2, 2, 2, 1, 0, 0, 0 ● Perfect discounted gain: 3, 3/1, 3/1.59, 2/2, 2/2.32, 2/2.59, 1/2.81, 0, 0, 0 = 3, 3, 1.89, 1, 0.86, 0.77, 0.36, 0, 0, 0
  • 39. NDCG Example ● Ideal DCG values: 3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 10.88, 10.88, 10.88 ● NDCG values (divide actual by ideal): 3/3, 5/6, 6.89/7.89, 6.89/8.89, 6.89/9.75, 7.28/10.52, 7.99/10.88, 8.66/10.88, 9.61/10.88, 9.61/10.88 = 1, 0.83, 0.87, 0.78, 0.71, 0.69, 0.73, 0.8, 0.88, 0.88
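The normalization step can be sketched by computing DCG twice, once for the actual ranking and once for the same judgments sorted into the perfect order, and dividing rank by rank. The helper names are hypothetical, and the discount (none at rank 1, log2 of the rank afterwards) is the one used in the DCG example.

```python
import math


def dcg_at_each_rank(relevances):
    """Running DCG: gain at rank 1 undiscounted, gain at rank i divided by log2(i)."""
    total = 0.0
    running = []
    for rank, rel in enumerate(relevances, start=1):
        total += rel if rank == 1 else rel / math.log2(rank)
        running.append(total)
    return running


def ndcg_at_each_rank(relevances):
    """Actual DCG divided by the DCG of the perfect (descending) ordering."""
    ideal_order = sorted(relevances, reverse=True)
    actual = dcg_at_each_rank(relevances)
    ideal = dcg_at_each_rank(ideal_order)
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]


# Relevance judgments from the slide, in ranked order.
ranking = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
ndcg_values = ndcg_at_each_rank(ranking)

print([round(v, 2) for v in ndcg_values])
```

Because the ideal ordering is built from the same judgments, every value lies between 0.0 and 1.0, and a ranking that is already perfect scores 1.0 at every rank.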
  • 41. What Do Regression Tasks Do? The model learns to predict numeric scores. For example, we may try to predict the price of a stock on future days given its past price history and other useful information
  • 42. Some Regression Metrics.. ● RMSE (Root Mean Square Error) ● Quantiles of Errors
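  • 43. RMSE ● The most commonly used metric for regression tasks ● Also known as RMSD (root-mean-square deviation) ● Defined as the square root of the average squared distance between the actual score and the predicted score: $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the actual score and $\hat{y}_i$ the predicted score of the i-th data point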
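The RMSE definition translates into a few lines of Python. The actual and predicted values below are hypothetical numbers chosen so the arithmetic is easy to follow; they are not from the slides.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared difference between actual and predicted."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted prices.
actual = [100.0, 102.0, 98.0, 101.0]
predicted = [101.0, 100.0, 98.0, 104.0]

# Errors are -1, 2, 0, -3, so the squared errors sum to 14 and the mean is 3.5.
example_rmse = rmse(actual, predicted)
print(round(example_rmse, 3))
```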
  • 44. Quantiles of Errors ● RMSE is an average, so it is sensitive to large outliers: if the regressor performs really badly on a single data point, the average error can become large. In that sense RMSE is not robust ● Quantiles (or percentiles) are much more robust, because they are not affected by large outliers ● It is important to look at the median absolute percentage error: $MAPE = \text{median}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right)$ ● It gives us a relative measure of the typical error.
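The contrast between RMSE and the median absolute percentage error can be seen on a small made-up data set in which one prediction is badly wrong. All numbers and helper names here are hypothetical; the sketch also assumes that no actual value is zero, since the percentage error divides by it.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def median_absolute_percentage_error(y_true, y_pred):
    """Median of |(actual - predicted) / actual|; assumes no actual value is 0."""
    return statistics.median(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Hypothetical data: four good predictions and one large outlier (the last one).
actual = [100.0, 100.0, 100.0, 100.0, 100.0]
predicted = [98.0, 101.0, 103.0, 99.0, 200.0]

rmse_with_outlier = rmse(actual, predicted)
mape_with_outlier = median_absolute_percentage_error(actual, predicted)
rmse_without_outlier = rmse(actual[:-1], predicted[:-1])

print(round(rmse_with_outlier, 2), round(rmse_without_outlier, 2))
print(f"{mape_with_outlier:.0%}")
```

A single bad point moves the RMSE from about 1.9 to about 44.8, while the median absolute percentage error stays at 2%, the size of a typical error in this data.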
  • 45. Acknowledgement ● Evaluating Machine Learning Models by Alice Zheng ● Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE), who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) ● Tutorial of Data School on ROC Curves and AUC by Kevin Markham