Evaluation Metrics for Classification and Information Retrieval
Who am I
● Katya
● Natural Language Processing
● CTO at Majio
● Sloth.Works - matches Candidates to Jobs
● Twitter - @kitkate87
● Medium - @ekaterinamihailova
Content
● Classification metrics
● Information Retrieval metrics
● Majio’s evaluation metrics
● Design your own metric
General ML flow
Define goals and metrics → Gather and clean data → Build ML model → Evaluate results → Analyze results
Classification Metrics
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures
Confusion matrix
                       Truth: Positive    Truth: Negative
Algorithm: Positive    TP                 FP
Algorithm: Negative    FN                 TN
Accuracy
acc = T / (T + F) = (TP + TN) / ALL
ALL = TP + TN + FP + FN
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures
Accuracy with 1% sloth pictures
Algorithm - always says it is not a sloth
acc = 99%
Accuracy per class
accP = TP / (TP + FN)
accN = TN / (TN + FP)
acc = (accP + accN) / 2
Accuracy per class with 1% sloth pictures
Algorithm - always says it is not a sloth
acc = 50%
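A minimal sketch of the two accuracy variants, applied to a classifier that always answers "not a sloth". The data set size (1000 pictures, 10 sloths) and the function names are assumptions chosen for illustration, not figures from the talk.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all pictures that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def accuracy_per_class(tp, tn, fp, fn):
    """Mean of the accuracy on the positive class and on the negative class."""
    acc_p = tp / (tp + fn) if (tp + fn) else 0.0
    acc_n = tn / (tn + fp) if (tn + fp) else 0.0
    return (acc_p + acc_n) / 2


# Hypothetical data set: 1000 pictures, 10 of them (1%) show a sloth.
# A classifier that always answers "not a sloth" produces these counts:
tp, fn = 0, 10     # every sloth is missed
tn, fp = 990, 0    # every non-sloth is labelled correctly

plain = accuracy(tp, tn, fp, fn)
balanced = accuracy_per_class(tp, tn, fp, fn)
print(f"accuracy = {plain:.0%}, accuracy per class = {balanced:.0%}")
```

Plain accuracy rewards the classifier for the majority class alone, while the per-class average exposes that it never finds a sloth.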
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures - accuracy per class
○ Distinguish between a sloth and non sloth and ask a person if not sure
Logarithmic Loss
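The transcript does not preserve the formula from this slide. As a hedged illustration, the usual binary definition is logloss = -(1/N) ∑ [ y(i) * log(p(i)) + (1 - y(i)) * log(1 - p(i)) ], where p(i) is the predicted probability of "sloth". The sketch below assumes that definition; the labels and probabilities are made up.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels (1 = sloth) and predicted sloth probabilities.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]        # mostly sure and right
unsure = [0.5, 0.5, 0.5, 0.5]           # "ask a person" on every picture
confident_miss = [0.9, 0.1, 0.01, 0.2]  # one sloth confidently rejected

for name, probs in [("confident", confident), ("unsure", unsure),
                    ("confident miss", confident_miss)]:
    print(f"{name}: {log_loss(y_true, probs):.3f}")
```

A confident wrong answer is penalised much more heavily than an honest 0.5, which is why log loss suits the "ask a person if not sure" goal.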
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures - accuracy per class
○ Distinguish between a sloth and non sloth and ask a person if not sure - log loss
○ Camera in the forest - 1% sloth pictures
Precision
p = TP / (TP + FP)
Precision with 1% sloth pictures
Algorithm - correctly identifies exactly one sloth and for
everything else says it is not a sloth
p = 100%
Recall (True positive rate)
r = TP / (TP + FN)
Recall with 1% sloth pictures
Algorithm - always says it is a sloth
r = 100%
f1-measure
f = 2*p*r / (p + r)
f1-measure with 1% sloth pictures
Algorithm - always says it is a sloth (p = 1%, r = 100%)
f ≈ 2%
Algorithm - always says it is NOT a sloth except for 1
f ~ 0%
Algorithm has 30% precision and 70% recall
f = 42%
Algorithm has 50% precision and 50% recall
f = 50%
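A short sketch of precision, recall and f1 computed from confusion counts. The data set size (100,000 pictures, 1,000 of them sloths) is an assumption picked so that the degenerate classifiers can be evaluated; the helper names are illustrative.

```python
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical data set: 100,000 pictures, 1,000 of them (1%) show a sloth.

# Classifier A: always says "sloth".
f_always = f1(precision(tp=1000, fp=99000), recall(tp=1000, fn=0))

# Classifier B: says "sloth" exactly once, and that guess is correct.
f_single = f1(precision(tp=1, fp=0), recall(tp=1, fn=999))

print(f"always sloth: f1 = {f_always:.1%}")
print(f"single guess: f1 = {f_single:.1%}")
print(f"p = 30%, r = 70%: f1 = {f1(0.3, 0.7):.1%}")
print(f"p = 50%, r = 50%: f1 = {f1(0.5, 0.5):.1%}")
```

Because f1 is a harmonic mean, one very low component drags the whole score down, so neither degenerate classifier scores well.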
Parametrized f-measure
f(b) = (1 + b)*p*r / (b*p + r), where b plays the role of β² in the standard F-beta formula
Parametrized f-measure with 1% sloth pictures
b = 3; f = 4*p*r/(3*p + r)
Algorithm has 30% precision and 70% recall
f = 52.5%
Algorithm has 50% precision and 50% recall
f = 50%
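The two worked values can be checked with a few lines of code. This sketch follows the parametrization used on the slide, in which b multiplies precision in the denominator; the function name is an assumption.

```python
def f_measure(p, r, b=1.0):
    """Parametrized f-measure as written on the slide; b = 1 gives f1."""
    denom = b * p + r
    return (1 + b) * p * r / denom if denom else 0.0


# With b = 3, recall counts more than precision.
recall_heavy = f_measure(0.3, 0.7, b=3)
balanced = f_measure(0.5, 0.5, b=3)

print(f"p = 30%, r = 70%, b = 3: f = {recall_heavy:.1%}")
print(f"p = 50%, r = 50%, b = 3: f = {balanced:.1%}")
print(f"p = 30%, r = 70%, b = 1: f = {f_measure(0.3, 0.7):.1%}")
```

With b = 1 the high-recall algorithm loses to the balanced one, and with b = 3 the order flips, which is the point of weighting recall for the camera-in-the-forest goal.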
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures - accuracy per class
○ Distinguish between a sloth and non sloth and ask a person if not sure - log loss
○ Camera in the forest - 1% sloth pictures - f-measure
○ Search results for sloth and non sloth - 50% sloth pictures
False positive rate
fpr = FP / (FP + TN)
False Positive Rate with 1% sloth pictures
Algorithm - always says it is not a sloth
fpr = 0%
Information Retrieval metrics
Search Results
Position: 1 2 3 4 5 6 7 8
Relevant (1) or not (0): 1 0 0 1 1 1 1 1
TPR and FPR for different points
● At point 1 - TPR = 2%, FPR = 0%
● At point 25 - TPR = 40%, FPR = 10%
● At point 50 - TPR = 74%, FPR = 26%
● At point 75 - TPR = 96%, FPR = 54%
● At point 100 - TPR = 100%, FPR = 100%
ROC Curve and AUC (Area Under the Curve)
[Figure: ROC curve. x-axis: False Positive Rate, y-axis: True Positive Rate]
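As a rough illustration, the area under the ROC curve can be approximated with the trapezoid rule from the five (FPR, TPR) points listed for this example. Adding the origin (0, 0) as a starting point is an assumption, and five points give only a coarse estimate of the real curve.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear curve through (fpr, tpr) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between two neighbours
    return area


# (FPR, TPR) at cut-off points 1, 25, 50, 75 and 100, plus the assumed origin.
roc_points = [
    (0.00, 0.00),
    (0.00, 0.02),
    (0.10, 0.40),
    (0.26, 0.74),
    (0.54, 0.96),
    (1.00, 1.00),
]

auc = auc_trapezoid(roc_points)
print(f"approximate AUC = {auc:.3f}")
```

A random ranking would follow the diagonal and give an area of 0.5, so the further the estimate lies above 0.5, the better the ranking separates sloths from non-sloths.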
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures - accuracy per class
○ Distinguish between a sloth and non sloth and ask a person if not sure - log loss
○ Camera in the forest - 1% sloth pictures - f-measure
○ Search results for sloth and non sloth - 50% sloth pictures - AUC
○ Search results for sloth and non sloth - 1% sloth pictures
Precision and Recall at different points
● At point 1 - Recall = 2%, Precision = 100%
● At point 25 - Recall = 40%, Precision = 80%
● At point 50 - Recall = 74%, Precision = 74%
● At point 75 - Recall = 96%, Precision = 64%
● At point 100 - Recall = 100%, Precision = 50%
Precision - Recall curve
[Figure: precision-recall curve. x-axis: Recall, y-axis: Precision]
Average Precision
Ranking 1 0 0 1 1 1 1 1:
AP = (1/1 + 0 + 0 + 2/4 + 3/5 + 4/6 + 5/7 + 6/8) / 6 = 70.5%
Ranking 0 0 1 1 1 1 1 1:
AP = (0 + 0 + 1/3 + 2/4 + 3/5 + 4/6 + 5/7 + 6/8) / 6 = 59.4%
Ranking 1 1 1 1 1 1 0 0:
AP = (1/1 + 2/2 + 3/3 + 4/4 + 5/5 + 6/6 + 0 + 0) / 6 = 100%
Mean Average Precision
MAP = (70.5% + 59.4% + 100%) / 3 = 76.64%
Geometric Mean Average Precision
GMAP = ∛(70.5% * 59.4% * 100%) ≈ 74.8%
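The three example rankings can be scored with a short sketch. It assumes binary relevance and that every relevant result appears somewhere in the list, so dividing by the number of hits is the same as dividing by the number of relevant results (6 in each example). Function and variable names are illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math


def average_precision(relevance):
    """Mean of precision@k taken at each position k that holds a relevant result."""
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / hits if hits else 0.0


rankings = [
    [1, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
]

aps = [average_precision(r) for r in rankings]
map_score = sum(aps) / len(aps)                   # arithmetic mean
gmap_score = math.prod(aps) ** (1 / len(aps))     # geometric mean

print("AP per ranking:", [f"{ap:.1%}" for ap in aps])
print(f"MAP = {map_score:.1%}, GMAP = {gmap_score:.1%}")
```

The geometric mean is pulled down more strongly by the weakest ranking, so GMAP rewards systems that avoid very bad queries.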
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures - accuracy per class
○ Distinguish between a sloth and non sloth and ask a person if not sure - log loss
○ Camera in the forest - 1% sloth pictures - f-measure
○ Search results for sloth and non sloth - 50% sloth pictures - AUC
○ Search results for sloth and non sloth - 1% sloth pictures - MAP, GMAP
○ Create image search for sloths with different relevance
Cumulative Gain
CG = ∑ rel(i)
Relevance: 2 0 1 2 2 1 1 2
CG = 11
Discounted Cumulative Gain
DCG = ∑ rel(i) / log2(i + 1)
Relevance: 2 0 1 2 2 1 1 2
Discounted: 2 0 0.5 0.86 0.77 0.36 0.33 0.63
DCG = 5.46
Ideal Discounted Cumulative Gain
Ideal order: 2 2 2 2 1 1 1 0
Discounted: 2 1.26 1 0.86 0.39 0.36 0.33 0
IDCG = 6.20
Normalized Discounted Cumulative Gain
NDCG = DCG / IDCG = 5.46 / 6.20 = 0.88
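A minimal sketch of DCG and NDCG for the graded relevance list used in this example. It recomputes every discounted term from the relevance values instead of reusing rounded per-position numbers, and it takes the ideal ordering to be the same relevance values sorted in descending order. Function names are assumptions.

```python
import math


def dcg(relevances):
    """Sum of rel(i) / log2(i + 1) over 1-based positions i."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the best possible ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


ranking = [2, 0, 1, 2, 2, 1, 1, 2]

print(f"CG   = {sum(ranking)}")
print(f"DCG  = {dcg(ranking):.2f}")
print(f"IDCG = {dcg(sorted(ranking, reverse=True)):.2f}")
print(f"NDCG = {ndcg(ranking):.2f}")
```

Normalizing by the ideal ordering keeps the score between 0 and 1, which makes rankings of different lengths and relevance mixes comparable.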
Evaluating Image recognition algorithms
● Setup
○ Images with sloths and images without sloths
● Goals
○ Distinguish between a sloth and non sloth - 50% sloth pictures - accuracy
○ Distinguish between a sloth and non sloth - 1% sloth pictures - accuracy per class
○ Distinguish between a sloth and non sloth and ask a person if not sure - log loss
○ Camera in the forest - 1% sloth pictures - f-measure
○ Search results for sloth and non sloth - 50% sloth pictures - AUC
○ Search results for sloth and non sloth - 1% sloth pictures - MAP, GMAP
○ Create image search for sloths with different relevance - NDCG
Majio Use Case
Matching Candidates to Job
1 2 3
Evaluating Matching Candidates to Job - 1
score = ( TP/T - FP/T + 1 ) / 2
Ranking: 1 3 2 1 1 2 2 1 2 2
score = ( 2/4 - 1/4 + 1 ) / 2 = 62.5%
Ranking: 3 1 2 2 2 2 2 2 2 2
score = ( 0/1 - 1/1 + 1 ) / 2 = 0%
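The transcript does not say how TP, FP and T are counted, so the sketch below rests on explicit assumptions: grade 1 marks a good candidate, grade 3 a bad one and grade 2 a neutral one; T is the number of good candidates in the list; TP and FP are counted among the first T positions only. Under these assumptions the code reproduces both worked values, but it is an interpretation, not Majio's implementation.

```python
def match_score(grades, good=1, bad=3):
    """( TP/T - FP/T + 1 ) / 2, counted over the top-T positions (assumed cut-off)."""
    t = grades.count(good)       # T: number of good candidates overall
    if t == 0:
        raise ValueError("ranking contains no good candidates")
    top = grades[:t]             # assumed cut-off line: first T results
    tp = top.count(good)         # good candidates above the cut-off
    fp = top.count(bad)          # bad candidates above the cut-off
    return (tp / t - fp / t + 1) / 2


first = match_score([1, 3, 2, 1, 1, 2, 2, 1, 2, 2])
second = match_score([3, 1, 2, 2, 2, 2, 2, 2, 2, 2])
print(f"first ranking: {first:.1%}, second ranking: {second:.1%}")
```

The "+ 1" and "/ 2" map the raw difference TP/T - FP/T from the range [-1, 1] onto [0, 1], and neutral candidates neither help nor hurt.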
Evaluating Matching Candidates to Job - 2
Normalized MAP at points 5, 10, 15
Ranking: 1 3 2 1 1 2 2 1 2 2
MAP = (3.3/5 + 5.7/10) / 2
31.8%
Best ordering: 1 1 1 1 2 2 2 2 2 3
best MAP = (4.3/5 + 5.7/10) / 2
32.8%
Worst ordering: 3 2 2 2 2 2 1 1 1 1
worst MAP = (1.3/5 + 5.7/10) / 2
29.8%
normalized MAP = (MAP - wMAP) / (bMAP - wMAP)
33.3%
40% 30% 20% 10% 9% 8% 7% 6% 5% 4%
0.8 * f1 + 0.2 * AP
Inter-annotator agreement
How often the annotators make the same decision for the
same search result.
The experiment
● 4 Annotators
● 60 randomly generated search results (by order, percentage and cut off line)
● The search results were evenly distributed across Majio scores between 1 and 100
● Annotators had to give each search result a score between 1 (perfect) and 4 (horrible)
● 2 of the search results appeared twice, each time in a different context
● At least 3 out of 4 annotators had to agree on a ranking for it to be accepted
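The acceptance rule in the last bullet can be sketched as a majority check per search result. The annotation rows below are invented for illustration and are not the experiment's data; the function name is an assumption.

```python
from collections import Counter


def agreed_score(scores, min_agree=3):
    """Return the score shared by at least min_agree annotators, else None."""
    score, count = Counter(scores).most_common(1)[0]
    return score if count >= min_agree else None


# Hypothetical annotations: one row per search result,
# one score (1 = perfect ... 4 = horrible) per annotator.
annotations = [
    [1, 1, 1, 2],  # accepted: three annotators chose 1
    [2, 3, 2, 3],  # rejected: two against two
    [4, 4, 4, 4],  # accepted: unanimous
    [1, 2, 3, 4],  # rejected: no agreement at all
]

accepted = [agreed_score(row) for row in annotations]
agreement_rate = sum(s is not None for s in accepted) / len(annotations)
print(accepted, f"agreement on {agreement_rate:.0%} of rankings")
```

Only the rankings that pass this check would then be used to judge whether a metric's score lines up with human opinion.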
The results
● Inter-annotator agreement on 32 out of 60 rankings
● Two groups of annotators - strict (no 1s fallen behind) and useful (can you make
do with the amount of good candidates we have sent you)
● 2 out of 4 annotators gave different scores to the trap rankings
● On the rankings with inter-annotator agreement the scoring was consistent.
This gave threshold values for good and bad rankings.
Conclusions
● There are a lot of Information Retrieval metrics in the world (only a chosen few
were shown here)
● None is perfect but some are useful
● You can craft a metric yourself, but then you have to check how good a metric
it is
● People don’t generally agree on things in the beginning. Experiment until there is
good enough agreement.
