Prof. Pier Luca Lanzi
Classification: Evaluation
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Prof. Pier Luca Lanzi
Evaluation & Credibility Issues
•  What measure should we use? 
(Classification accuracy might not be enough!)
•  How reliable are the predicted results?
•  How much should we believe in what was learned?
§ Error on the training data is not a good indicator of
performance on future data
§ The classifier was computed from the very same training data, so
any estimate based on that data will be optimistic
§ In addition, new data will probably not be exactly 
the same as the training data!
2
Prof. Pier Luca Lanzi
How to evaluate the performance of a model?
How to obtain reliable estimates?
How to compare the relative
performance among competing models?
Given two equally performing models,
which one should we prefer?
Prof. Pier Luca Lanzi
Model Evaluation
•  Metrics for Performance Evaluation
§ How to evaluate the performance of a model?
•  Methods for Performance Evaluation
§ How to obtain reliable estimates?
•  Methods for Model Comparison
§ How to compare the performance of competing models?
•  Model Selection
§ Which model should we prefer?
4
Prof. Pier Luca Lanzi
How to evaluate the performance of a model?
(Metrics for Performance Evaluation)
Prof. Pier Luca Lanzi
Metrics for Performance Evaluation
•  Focus on the predictive capability of a model
•  Confusion Matrix:
6
                    PREDICTED CLASS
                    Yes         No
ACTUAL     Yes      a (TP)      b (FN)
CLASS      No       c (FP)      d (TN)

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
Prof. Pier Luca Lanzi
Metrics for Performance Evaluation
•  The most widely-used metric is accuracy, the fraction of correctly classified examples:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (a + d) / (a + b + c + d)
7
                    PREDICTED CLASS
                    Yes         No
ACTUAL     Yes      a (TP)      b (FN)
CLASS      No       c (FP)      d (TN)
Prof. Pier Luca Lanzi
Sometimes Accuracy is not Enough
•  Consider a 2-class problem
§ Number of Class 0 examples = 9990
§ Number of Class 1 examples = 10
•  If the model predicts everything to be class 0, 
accuracy is 9990/10000 = 99.9 %
•  Accuracy is misleading because the model does not 
detect any class 1 example
8
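As a quick illustration, here is a minimal sketch in R (synthetic labels matching the class counts above) of the trivial classifier that always predicts class 0: it reaches 99.9% accuracy while never detecting a class 1 example.

actual    <- c(rep(0, 9990), rep(1, 10))   # class distribution from the slide
predicted <- rep(0, 10000)                 # model that always predicts class 0
mean(predicted == actual)                  # accuracy: 0.999
sum(predicted == 1 & actual == 1)          # detected class 1 examples: 0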
Prof. Pier Luca Lanzi
Cost Matrix 9
C(i|j): Cost of misclassifying class j example as class i
                    PREDICTED CLASS
                    Yes          No
ACTUAL     Yes      C(Yes|Yes)   C(No|Yes)
CLASS      No       C(Yes|No)    C(No|No)
Prof. Pier Luca Lanzi
Computing Cost of Classification 10
Cost Matrix
                    PREDICTED CLASS
ACTUAL     C(i|j)     +      -
CLASS        +       -1    100
             -        1      0

Model M1
                    PREDICTED CLASS
ACTUAL                 +      -
CLASS        +       150     40
             -        60    250
Accuracy = 80%, Cost = 3910

Model M2
                    PREDICTED CLASS
ACTUAL                 +      -
CLASS        +       250     45
             -         5    200
Accuracy = 90%, Cost = 4255
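A minimal sketch in R of how these numbers are obtained: accuracy is the fraction of instances on the diagonal of the confusion matrix, and cost is the element-wise product of the confusion matrix and the cost matrix, summed.

cost_matrix <- matrix(c(-1, 100, 1, 0), nrow = 2, byrow = TRUE)  # rows: actual +,-; cols: predicted +,-
m1 <- matrix(c(150, 40, 60, 250), nrow = 2, byrow = TRUE)        # confusion matrix of model M1
m2 <- matrix(c(250, 45, 5, 200),  nrow = 2, byrow = TRUE)        # confusion matrix of model M2
accuracy <- function(m) sum(diag(m)) / sum(m)
cost     <- function(m) sum(m * cost_matrix)
accuracy(m1); cost(m1)   # 0.80, 3910
accuracy(m2); cost(m2)   # 0.90, 4255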
Prof. Pier Luca Lanzi
Cost-Sensitive Measures
•  Precision is biased towards C(Yes|Yes) and C(Yes|No):
Precision = TP / (TP + FP)
•  Recall is biased towards C(Yes|Yes) and C(No|Yes):
Recall = TP / (TP + FN)
•  F1-measure is biased towards all except C(No|No),
it is high when both precision and recall are reasonably high:
F1 = 2 · Precision · Recall / (Precision + Recall)
15
The higher the precision, the
lower the FPs
The higher the recall, the
lower the FNs
The higher the F1, 
the lower the FPs and FNs
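A minimal sketch in R of these measures, computed from confusion-matrix counts (the counts used here are illustrative, taken from model M1 above: TP = 150, FP = 60, FN = 40):

TP <- 150; FP <- 60; FN <- 40              # illustrative counts (model M1)
precision <- TP / (TP + FP)                # 0.714
recall    <- TP / (TP + FN)                # 0.789
f1 <- 2 * precision * recall / (precision + recall)   # 0.75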
Prof. Pier Luca Lanzi
How to obtain reliable estimates?
(Methods for Performance Evaluation)
Prof. Pier Luca Lanzi
Classification Step 1: Split the Data Between
Train and Test Sets
17
[Diagram: the data is split into a training set and a testing set]
Prof. Pier Luca Lanzi
Classification Step 2: Compute a Model using the
Data in the Training Set
18
[Diagram: the training set is given to the model builder, which produces a classification model; the testing set is held aside]
Prof. Pier Luca Lanzi
Classification Step 3: Evaluate the Model using
the Test Set
19
[Diagram: the model built from the training set is applied to the testing set; its predictions (+/-) are compared against the true labels to evaluate the model]
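A minimal end-to-end sketch of these three steps in base R, using synthetic data and logistic regression as a stand-in for the model builder (data, names, and thresholds are illustrative, not from the slides):

set.seed(42)
n <- 1000
x <- rnorm(n)
y <- factor(ifelse(x + rnorm(n) > 0, "Yes", "No"))
d <- data.frame(x = x, y = y)

idx   <- sample(n, size = round(2/3 * n))                     # Step 1: split the data (2/3 training)
train <- d[idx, ]
test  <- d[-idx, ]

model <- glm(y ~ x, data = train, family = binomial)          # Step 2: build a model on the training set

p    <- predict(model, newdata = test, type = "response")     # Step 3: evaluate on the test set
pred <- ifelse(p > 0.5, "Yes", "No")
mean(pred == test$y)                                          # test-set accuracy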
Prof. Pier Luca Lanzi
Note on Parameter Tuning
•  It is important that the test data is not used 
in any way to create the classifier
•  Some learning schemes operate in two stages:
§ Stage 1: builds the basic structure
§ Stage 2: optimizes parameter settings
•  The test data can’t be used for parameter tuning!
•  The proper procedure uses three sets: training data, validation data,
and test data; the validation data is used to optimize parameters
20
Prof. Pier Luca Lanzi
Making the most of the data
•  Once evaluation is complete, all the data can be used to build the
final classifier
•  Generally, the larger the training set, the better the classifier 
(though returns diminish)
•  The larger the test set, the more accurate the error estimate
21
Prof. Pier Luca Lanzi
The Full Process: Training, Validation, and Testing 22
[Diagram: the data is split into a training set, a validation set, and a testing set; a model is built on the training set, its predictions on the validation set are evaluated to tune it, and the final model is then evaluated once on the testing set]
Prof. Pier Luca Lanzi
Methods for Performance Evaluation
•  How to obtain a reliable estimate of performance?
•  Performance of a model may depend on 
other factors besides the learning algorithm:
§ Class distribution
§ Cost of misclassification
§ Size of training and test sets
23
Prof. Pier Luca Lanzi
Methods of Estimation
•  Holdout
§ Reserve ½ for training and ½ for testing
§ Reserve 2/3 for training and 1/3 for testing
•  Random subsampling
§ Repeated holdout
•  Cross validation
§ Partition data into k disjoint subsets
§ k-fold: train on k-1 partitions, test on the remaining one
§ Leave-one-out: k=n
•  Stratified sampling
§ oversampling vs undersampling
•  Bootstrap
§ Sampling with replacement
24
Prof. Pier Luca Lanzi
Holdout Evaluation & “Small” Datasets
•  Reserves a certain amount for testing and uses the remainder for
training, typically,
§ Reserve ½ for training and ½ for testing
§ Reserve 2/3 for training and 1/3 for testing
•  For small or “unbalanced” datasets, samples might not be
representative
•  For instance, it might generate training or testing datasets with few or
no instances of some classes
•  Stratified sampling
§ Makes sure that each class is represented with approximately equal
proportions in both subsets
25
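A minimal sketch of a stratified holdout split in base R: 2/3 of the instances are sampled within each class, so both subsets keep roughly the original class proportions (the class variable and split ratio are illustrative):

set.seed(7)
y <- factor(c(rep("Yes", 100), rep("No", 900)))           # an unbalanced class variable
train_idx <- unlist(lapply(split(seq_along(y), y),        # split the indices by class ...
                           function(i) sample(i, size = round(2/3 * length(i)))))
table(y[train_idx]) / length(train_idx)                   # class proportions in the training set
table(y[-train_idx]) / (length(y) - length(train_idx))    # ... approximately preserved in the test set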
Prof. Pier Luca Lanzi
Repeated Holdout
•  Holdout estimate can be made more reliable by repeating the
process with different subsamples
•  In each iteration, a certain proportion is randomly selected for
training (possibly with stratification)
•  The error rates on the different iterations are averaged to yield
an overall error rate
•  Still not optimal, since the different test sets overlap
26
Prof. Pier Luca Lanzi
Cross-Validation
•  First step
§ Data is split into k subsets of equal size
•  Second step
§ Each subset in turn is used for testing and the remainder for training
•  This is called k-fold cross-validation and avoids overlapping test sets
•  Often the subsets are stratified before cross-validation is performed
•  The error estimates are averaged to yield an overall error estimate
27
Prof. Pier Luca Lanzi
Ten-Fold Cross-Validation 28
[Diagram: the data is split into 10 folds; in iteration i, fold i is used for testing and the remaining 9 folds for training, yielding a performance estimate pi]
The final performance is computed as the average of the pi
Prof. Pier Luca Lanzi
Cross-Validation
•  The standard method for evaluation is stratified ten-fold cross-validation
•  Why ten? Extensive experiments have shown that this is the best
choice to get an accurate estimate
•  Stratification reduces the estimate’s variance
•  Even better: repeated stratified cross-validation,
e.g., ten-fold cross-validation repeated ten times with the results
averaged (which further reduces the variance)
•  Other approaches also appear to be robust, e.g., 5x2 cross-validation
29
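A minimal sketch of k-fold cross-validation in base R, reusing synthetic data and logistic regression as the model builder (fold assignment here is random, not stratified):

set.seed(42)
n <- 1000
d <- data.frame(x = rnorm(n))
d$y <- factor(ifelse(d$x + rnorm(n) > 0, "Yes", "No"))

k <- 10
fold <- sample(rep(1:k, length.out = n))                  # random fold assignment
p <- sapply(1:k, function(i) {
  train <- d[fold != i, ]                                 # train on the other k-1 folds
  test  <- d[fold == i, ]                                 # test on fold i
  m <- glm(y ~ x, data = train, family = binomial)
  mean(ifelse(predict(m, test, type = "response") > 0.5, "Yes", "No") == test$y)
})
mean(p)                                                   # overall cross-validated accuracy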
Prof. Pier Luca Lanzi
Leave-One-Out Cross-Validation
•  It is a particular form of cross-validation
§ Set number of folds to number of training instances
§ I.e., for n training instances, build classifier n times
•  Makes best use of the data
•  Involves no random subsampling
•  Computationally expensive
30
Prof. Pier Luca Lanzi
Leave-One-Out and Stratification
•  A disadvantage of Leave-One-Out is that stratification is not possible
•  It guarantees a non-stratified sample because 
there is only one instance in the test set!
•  Extreme example: random dataset split equally into two classes
§ Best inducer predicts majority class
§ 50% accuracy on fresh data
§ Leave-One-Out-CV estimate is 100% error!
31
Prof. Pier Luca Lanzi
Bootstrapping
•  Cross-validation uses sampling without replacement
§ The same instance, once selected, cannot be selected again for a
particular training/test set
•  Bootstrap uses sampling with replacement to form the training set
§ Sample a dataset of n instances n times with 
replacement to form a new dataset of n instances
§ Use this data as the training set
§ Use the instances from the original dataset that don’t occur in the new
training set for testing
32
Prof. Pier Luca Lanzi
The 0.632 Bootstrap
•  In each of the n draws, an instance has a probability of 1 - 1/n of not being picked
•  Thus its probability of ending up in the test data is
(1 - 1/n)^n ≈ e^(-1) ≈ 0.368
•  This means the training data will contain approximately 63.2% of
the instances
33
Prof. Pier Luca Lanzi
Estimating Error using Bootstrap
•  The error estimate on the test data will be very pessimistic, since
training was on just ~63% of the instances
•  Therefore, combine it with the resubstitution error:
err = 0.632 · e_test + 0.368 · e_train
•  The resubstitution error (e_train) gets less weight than the error on the
test data (e_test)
•  Repeat process several times with different replacement samples;
average the results
34
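A minimal sketch of one 0.632-bootstrap round in base R, again with synthetic data and logistic regression as the model builder (in practice the rounds would be repeated and averaged, as the slide says):

set.seed(42)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- factor(ifelse(d$x + rnorm(n) > 0, "Yes", "No"))

boot_idx <- sample(n, size = n, replace = TRUE)           # sample n instances with replacement
train    <- d[boot_idx, ]
test     <- d[-unique(boot_idx), ]                        # instances never drawn form the test set

m <- glm(y ~ x, data = train, family = binomial)
err_rate <- function(data) mean(ifelse(predict(m, data, type = "response") > 0.5, "Yes", "No") != data$y)

err <- 0.632 * err_rate(test) + 0.368 * err_rate(train)   # 0.632 bootstrap estimate
err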
Prof. Pier Luca Lanzi
More on Bootstrap
•  Probably the best way of estimating performance for very small
datasets
•  However, it has some problems
§ Consider a random dataset with a 50% class distribution
§ A model that memorizes all the training data (equivalent to
kNN with k=1) will achieve 0% error on the training data
and ~50% error on the test data
§ The bootstrap estimate for this classifier will be
err = 0.632 · 50% + 0.368 · 0% = 31.6%
§ But the true expected error is 50%
35
Prof. Pier Luca Lanzi
Model Comparison
How to compare the relative
performance among competing models?
Prof. Pier Luca Lanzi
How to Compare the Performance of Two
Models?
•  Two models,
§ Model M1: accuracy = 85%, tested on 30 instances
§ Model M2: accuracy = 75%, tested on 5000 instances
•  Can we say M1 is better than M2?
•  How much confidence can we place on the accuracy of M1 and M2?
•  Can the difference in performance measure be explained as a
result of random fluctuations in the test set?
37
Prof. Pier Luca Lanzi
Confidence Interval for Accuracy
•  Prediction can be regarded as a Bernoulli trial with two possible
outcomes, correct or wrong
•  A collection of Bernoulli trials has a Binomial distribution
•  Given,
§ the number of correct test predictions s
§ the number of test instances N
§ the empirical accuracy p = s/N
•  Can we predict the true accuracy of the model from p?
38
Prof. Pier Luca Lanzi
Mean and Variance
•  The mean and variance of a Bernoulli trial with success probability p are p and p(1-p)
•  The empirical success rate is f = S/N
•  When N is large enough, f follows a normal distribution with
mean p and variance p(1-p)/N
•  The c% confidence interval [-z ≤ X ≤ z] for a random variable X of
mean 0 is given by Pr[-z ≤ X ≤ z] = c
•  With a symmetric distribution, we have
Pr[-z ≤ X ≤ z] = 1 - 2 · Pr[X ≥ z]
39
Prof. Pier Luca Lanzi
Confidence Interval for Accuracy
•  For instance, if X has mean 0 and standard deviation 1, 
the confidence interval for c% is defined by the z values in the table below
•  Thus, for the 90% confidence interval 
we have Pr[-1.65 ≤ X ≤ 1.65] = 90%
•  To compute the confidence interval for the
expected accuracy f we need to,
§ Normalize it to a distribution with mean 0 and standard deviation 1
§ Compute the confidence interval
§ Remap it to the actual distribution
40

Pr[X ≥ z]     z
0.1%          3.09
0.5%          2.58
1%            2.33
5%            1.65
10%           1.28
20%           0.84
25%           0.69
40%           0.25
Prof. Pier Luca Lanzi
Confidence Interval for Accuracy
•  First, transform f into a variable with mean 0 and standard deviation 1:
z = (f - p) / sqrt(p(1 - p)/N)
•  Compute the confidence interval:
Pr[-z ≤ (f - p) / sqrt(p(1 - p)/N) ≤ z] = c
•  Solve it for p:
p = ( f + z²/2N ± z · sqrt(f/N - f²/N + z²/4N²) ) / (1 + z²/N)
41
Prof. Pier Luca Lanzi
Examples
•  When f = 75%, N=1000, c=80% (z=1.28), the confidence
interval for the actual performance is [0.732, 0.767]
•  When f = 75%, N=100, c=80% (z=1.28), the confidence interval
for the actual performance is [0.691, 0.801]
•  Note that the assumption about f being normally distributed only
applies for large values of N (at least 30)
•  When f = 75%, N=10, c=80% (z=1.28), the confidence interval
for the actual performance is [0.549, 0.881]
42
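A minimal sketch in R of the interval from the previous slide; it reproduces the numbers above (e.g., [0.732, 0.767] for f = 0.75, N = 1000, z = 1.28):

acc_conf_int <- function(f, N, z) {
  center <- f + z^2 / (2 * N)
  spread <- z * sqrt(f / N - f^2 / N + z^2 / (4 * N^2))
  (center + c(-1, 1) * spread) / (1 + z^2 / N)   # lower and upper bound for p
}
acc_conf_int(0.75, 1000, 1.28)   # 0.732 0.767
acc_conf_int(0.75,  100, 1.28)   # 0.691 0.801
acc_conf_int(0.75,   10, 1.28)   # 0.549 0.881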
Prof. Pier Luca Lanzi
Comparing Data Mining Schemes
•  Which of two learning schemes performs better?
•  The answer is obviously domain-dependent
•  The typical approach is to compare 10-fold cross-validation (CV)
estimates using the Student’s t-test
•  Student’s t-test tells whether the means of two samples are
significantly different
•  Use a paired t-test when the individual samples are paired, that is,
when the same CV folds are used for both schemes; use an unpaired
t-test otherwise
43
Prof. Pier Luca Lanzi
William Gosset (1876-1937)
•  Obtained a post as a chemist in the Guinness
brewery in Dublin in 1899.
•  Invented the t-test to handle small samples for
quality control in brewing
•  Wrote under the pen name “Student”
•  Unpaired and paired two-sample t-test
•  Unequal/equal sample size
•  Unequal/equal sample variance
44
Prof. Pier Luca Lanzi
T-Test Using R
> j48 <- c(96, 93.33, 92.00, 94.67, 96.00, 94.00, 94.67, 94.0, 94.67, 98)
> zeror <- c(33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33)
> t.test(j48, zeror)
Welch Two Sample t-test
data: j48 and zeror
t = 117.9102, df = 9, p-value = 1.153e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
60.22594 62.58206
sample estimates:
mean of x mean of y
94.734 33.330
47
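If the two accuracy vectors come from the same cross-validation folds, a paired test is the appropriate variant. A minimal sketch, reusing the j48 vector above; the second vector is a hypothetical competitor, not from the slides:

> other <- c(95, 92.00, 91.33, 94.00, 95.33, 93.33, 94.00, 93.33, 94.00, 97)  # hypothetical per-fold accuracies
> t.test(j48, other, paired = TRUE)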
Prof. Pier Luca Lanzi
ROC (Receiver Operating Characteristic)
•  Developed in 1950s for signal detection theory 
to analyze noisy signals
•  ROC curve plots TP (on the y-axis) against FP (on the x-axis), 
or TPR vs FPR (TPR = TP/(TP+FN), FPR=FP/(TN+FP))
•  Performance of each classifier represented 
as a point on the ROC curve
•  Changing the threshold of algorithm, sample distribution or cost
matrix changes the location of the point
49
Prof. Pier Luca Lanzi
ROC Curve
•  Characteristic (TPR, FPR) points:
§ (0,0): declare everything 
to be negative class
§ (1,1): declare everything 
to be positive class
§ (1,0): ideal
•  Diagonal line:
§ Random guessing
§ Below diagonal line, 
prediction is opposite 
of the true class
50
Prof. Pier Luca Lanzi
ROC Curve 51
Prof. Pier Luca Lanzi
Using ROC for Model Comparison
•  Neither model consistently outperforms the other:
M1 is better for small FPR,
M2 is better for large FPR
•  The Area Under the ROC Curve (AUC) summarizes overall performance:
ideal model, area = 1; 
random guessing, area = 0.5
52
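A minimal sketch in base R that computes ROC points and the area under the curve from classifier scores and true labels (the scores are synthetic; in practice packages such as pROC or ROCR provide this directly):

set.seed(1)
labels <- rep(c(1, 0), each = 100)                        # 1 = positive class
scores <- c(rnorm(100, mean = 1), rnorm(100, mean = 0))   # higher score = more likely positive

thr <- sort(unique(scores), decreasing = TRUE)            # sweep the decision threshold
tpr <- sapply(thr, function(t) mean(scores[labels == 1] >= t))
fpr <- sapply(thr, function(t) mean(scores[labels == 0] >= t))

fpr <- c(0, fpr, 1); tpr <- c(0, tpr, 1)                  # add the (0,0) and (1,1) endpoints
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)   # trapezoidal rule
auc                                                       # roughly 0.76 for this synthetic data
plot(fpr, tpr, type = "l", xlab = "FPR", ylab = "TPR")    # the ROC curve
abline(0, 1, lty = 2)                                     # random-guessing diagonal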
Prof. Pier Luca Lanzi
Lift Charts
•  Lift is a measure of the effectiveness of a predictive model
calculated as the ratio between the results obtained with and
without the predictive model
•  Cumulative gains and lift charts are visual aids for measuring
model performance
•  Both charts consist of a lift curve and a baseline
•  The greater the area between the lift curve and the baseline, the
better the model
56
Prof. Pier Luca Lanzi
Example: Direct Marketing
•  Mass mailout of promotional offers to 1,000,000 households
•  The proportion who normally respond is 0.1%, that is, 1,000
responses
•  A data mining tool can identify a subset of 100,000 households for which
the response rate is 0.4%, that is, 400 responses
•  In marketing terminology, this increase in response rate (0.4%/0.1% = 4) is known
as the lift factor yielded by the model
•  The same data mining tool may be able to identify 400,000
households for which the response rate is 0.2%, that is, 800
respondents, corresponding to a lift factor of 2
•  The overall goal is to find subsets of test instances that have a
high proportion of true positives
57
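The arithmetic of this example as a minimal R sketch (the numbers are those on the slide; lift is the response rate in the selected subset divided by the overall response rate):

base_rate <- 1000 / 1000000          # 0.1% overall response rate
lift <- function(responses, contacted) (responses / contacted) / base_rate
lift(400, 100000)                    # 4: mail only the top 100,000 households
lift(800, 400000)                    # 2: mail the top 400,000 households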
Prof. Pier Luca Lanzi
An Example from Direct Marketing 58
Prof. Pier Luca Lanzi
Which model should we prefer?
(Model selection)
Prof. Pier Luca Lanzi
Model Selection Criteria
•  Model selection criteria attempt to find a good compromise
between:
§ The complexity of a model
§ Its prediction accuracy on the training data
•  Reasoning: a good model is a simple model that achieves high
accuracy on the given data
•  Also known as Occam’s Razor: the best theory is the smallest one
that describes all the facts
60
William of Ockham, born in the village of Ockham in Surrey (England)
about 1285, was the most influential philosopher of the 14th century
and a controversial theologian.
Prof. Pier Luca Lanzi
The MDL principle
•  MDL stands for minimum description length
•  The description length is defined as:
space required to describe a theory
+
space required to describe the theory’s mistakes
•  In our case the theory is the classifier and the mistakes are the
errors on the training data
•  We seek a classifier with Minimum Description Length
61
Prof. Pier Luca Lanzi
MDL and compression
•  MDL principle relates to data compression
•  The best theory is the one that compresses the data the most
•  I.e. to compress a dataset we generate a model and 
then store the model and its mistakes
•  We need to compute
(a) size of the model, and
(b) space needed to encode the errors
•  (b) easy: use the informational loss function
•  (a) need a method to encode the model
62
Prof. Pier Luca Lanzi
Which Classification Algorithm?
•  Among the several algorithms, which one is the “best”?
§ Some algorithms have a lower computational complexity
§ Different algorithms provide different representations
§ Some algorithms allow the specification of prior knowledge
•  If we are interested in the generalization performance, 
are there any reasons to prefer one classifier over another?
•  Can we expect any classification method to be superior or
inferior overall?
•  According to the No Free Lunch Theorem, 
the answer to all these questions is no
63
Prof. Pier Luca Lanzi
No Free Lunch Theorem
•  If the goal is to obtain good generalization performance, there are no
context-independent or usage-independent reasons to favor one
classification method over another
•  If one algorithm seems to outperform another in a certain situation, it is
a consequence of its fit to the particular problem, not the general
superiority of the algorithm
•  When confronting a new problem, this theorem suggests that we
should focus on the aspects that matter most
§ Prior information
§ Data distribution
§ Amount of training data
§ Cost or reward
•  The theorem also justifies skepticism regarding studies that
“demonstrate” the overall superiority of a certain algorithm
64
Prof. Pier Luca Lanzi
No Free Lunch Theorem
•  “[A]ll algorithms that search for an extremum of a cost [objective]
function perform exactly the same, when averaged over all possible
cost functions.” [1]
•  “[T]he average performance of any pair of algorithms across all possible
problems is identical.” [2]
•  [1] Wolpert, D.H., Macready, W.G. (1995), No Free Lunch Theorems for
Search, Technical Report SFI-TR-95-02-010 (Santa Fe Institute).
•  [2] Wolpert, D.H., Macready, W.G. (1997), No Free Lunch Theorems for
Optimization, IEEE Transactions on Evolutionary Computation 1, 67.
65

More Related Content

What's hot

DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
Machine Learning and Data Mining: 14 Evaluation and Credibility
Machine Learning and Data Mining: 14 Evaluation and CredibilityMachine Learning and Data Mining: 14 Evaluation and Credibility
Machine Learning and Data Mining: 14 Evaluation and CredibilityPier Luca Lanzi
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM Lecture 03 Regression
DMTM Lecture 03 RegressionDMTM Lecture 03 Regression
DMTM Lecture 03 RegressionPier Luca Lanzi
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationPier Luca Lanzi
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationPier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision TreesDMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision TreesPier Luca Lanzi
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationPier Luca Lanzi
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationPier Luca Lanzi
 
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble MethodsDongseo University
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesPier Luca Lanzi
 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligenceananth
 
Mixed Effects Models - Empirical Logit
Mixed Effects Models - Empirical LogitMixed Effects Models - Empirical Logit
Mixed Effects Models - Empirical LogitScott Fraundorf
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 

What's hot (20)

DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
Machine Learning and Data Mining: 14 Evaluation and Credibility
Machine Learning and Data Mining: 14 Evaluation and CredibilityMachine Learning and Data Mining: 14 Evaluation and Credibility
Machine Learning and Data Mining: 14 Evaluation and Credibility
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM Lecture 03 Regression
DMTM Lecture 03 RegressionDMTM Lecture 03 Regression
DMTM Lecture 03 Regression
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data Representation
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to Classification
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision TreesDMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision Trees
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data Exploration
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data Preparation
 
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligence
 
Mixed Effects Models - Empirical Logit
Mixed Effects Models - Empirical LogitMixed Effects Models - Empirical Logit
Mixed Effects Models - Empirical Logit
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 

Viewers also liked

DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1Pier Luca Lanzi
 
DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2Pier Luca Lanzi
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringPier Luca Lanzi
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringPier Luca Lanzi
 
DMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph MiningDMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph MiningPier Luca Lanzi
 
Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016Pier Luca Lanzi
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningPier Luca Lanzi
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionPier Luca Lanzi
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesPier Luca Lanzi
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesPier Luca Lanzi
 
Idea Generation and Conceptualization
Idea Generation and ConceptualizationIdea Generation and Conceptualization
Idea Generation and ConceptualizationPier Luca Lanzi
 
Text Mining Using JBoss Rules
Text Mining Using JBoss RulesText Mining Using JBoss Rules
Text Mining Using JBoss RulesMark Maslyn
 
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014Pier Luca Lanzi
 
Working with Formal Elements
Working with Formal ElementsWorking with Formal Elements
Working with Formal ElementsPier Luca Lanzi
 
Elements for the Theory of Fun
Elements for the Theory of FunElements for the Theory of Fun
Elements for the Theory of FunPier Luca Lanzi
 

Viewers also liked (19)

DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1
 
DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical Clustering
 
Course Introduction
Course IntroductionCourse Introduction
Course Introduction
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based Clustering
 
DMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph MiningDMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph Mining
 
Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data Mining
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course Introduction
 
Course Organization
Course OrganizationCourse Organization
Course Organization
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association Rules
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
 
Idea Generation and Conceptualization
Idea Generation and ConceptualizationIdea Generation and Conceptualization
Idea Generation and Conceptualization
 
Text Mining Using JBoss Rules
Text Mining Using JBoss RulesText Mining Using JBoss Rules
Text Mining Using JBoss Rules
 
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
 
Working with Formal Elements
Working with Formal ElementsWorking with Formal Elements
Working with Formal Elements
 
The Structure of Games
The Structure of GamesThe Structure of Games
The Structure of Games
 
Elements for the Theory of Fun
Elements for the Theory of FunElements for the Theory of Fun
Elements for the Theory of Fun
 
The Design Document
The Design DocumentThe Design Document
The Design Document
 

Similar to DMTM 2015 - 14 Evaluation of Classification Models

Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluationkhairulhuda242
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptxYouKnowwho28
 
6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptxmohammedalherwi1
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationThomas Ploetz
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsAlejandro Bellogin
 
eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?Rémi Bachelet
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Marina Santini
 
2015 EDM Leopard for Adaptive Tutoring Evaluation
2015 EDM Leopard for Adaptive Tutoring Evaluation2015 EDM Leopard for Adaptive Tutoring Evaluation
2015 EDM Leopard for Adaptive Tutoring EvaluationYun Huang
 
shubhampresentation-180430060134.pptx
shubhampresentation-180430060134.pptxshubhampresentation-180430060134.pptx
shubhampresentation-180430060134.pptxABINASHPADHY6
 
ISSTA'16 Summer School: Intro to Statistics
ISSTA'16 Summer School: Intro to StatisticsISSTA'16 Summer School: Intro to Statistics
ISSTA'16 Summer School: Intro to StatisticsAndrea Arcuri
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation MethodSHUBHAM GUPTA
 
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxLETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxshamsul2010
 

Similar to DMTM 2015 - 14 Evaluation of Classification Models (20)

Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluation
 
evaluation and credibility-Part 1
evaluation and credibility-Part 1evaluation and credibility-Part 1
evaluation and credibility-Part 1
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptx
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
evaluation and credibility-Part 2
evaluation and credibility-Part 2evaluation and credibility-Part 2
evaluation and credibility-Part 2
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
2015 EDM Leopard for Adaptive Tutoring Evaluation
2015 EDM Leopard for Adaptive Tutoring Evaluation2015 EDM Leopard for Adaptive Tutoring Evaluation
2015 EDM Leopard for Adaptive Tutoring Evaluation
 
shubhampresentation-180430060134.pptx
shubhampresentation-180430060134.pptxshubhampresentation-180430060134.pptx
shubhampresentation-180430060134.pptx
 
ISSTA'16 Summer School: Intro to Statistics
ISSTA'16 Summer School: Intro to StatisticsISSTA'16 Summer School: Intro to Statistics
ISSTA'16 Summer School: Intro to Statistics
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxLETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
 

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesPier Luca Lanzi
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationPier Luca Lanzi
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionPier Luca Lanzi
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningPier Luca Lanzi
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelinePier Luca Lanzi
 
VDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityVDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityPier Luca Lanzi
 
VDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationVDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationPier Luca Lanzi
 

More from Pier Luca Lanzi (17)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representation
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 Introduction
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data mining
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipeline
 
VDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityVDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with Unity
 
VDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationVDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generation
 

Recently uploaded

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 

Recently uploaded (20)

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 

DMTM 2015 - 14 Evaluation of Classification Models

  • 1. Prof. Pier Luca Lanzi Classification: Evaluation Data Mining andText Mining (UIC 583 @ Politecnico di Milano)
  • 2. Prof. Pier Luca Lanzi Evaluation Credidibility Issues •  What measure should we use? (Classification accuracy might not be enough!) •  How reliable are the predicted results? •  How much should we believe in what was learned? § Error on the training data is not a good indicator of performance on future data § The classifier was computed from the very same training data, any estimate based on that data will be optimistic. § In addition, new data will probably not be exactly the same as the training data! 2
  • 3. Prof. Pier Luca Lanzi How to evaluate the performance of a model? How to obtain reliable estimates? How to compare the relative performance among competing models? Given two equally performing models, which one should we prefer?
  • 4. Prof. Pier Luca Lanzi Model Evaluation •  Metrics for Performance Evaluation § How to evaluate the performance of a model? •  Methods for Performance Evaluation § How to obtain reliable estimates? •  Methods for Model Comparison § How to compare the performance of competing models? •  Model Selection § Which model should we prefer? 4
  • 5. Prof. Pier Luca Lanzi How to evaluate the performance of a model? (Metrics for Performance Evaluation)
  • 6. Prof. Pier Luca Lanzi Metrics for Performance Evaluation •  Focus on the predictive capability of a model •  Confusion Matrix: 6 PREDICTED CLASS ACTUAL CLASS Yes No Yes a (TP) b (FN) No c (FP) d (TN) a:TP (true positive) b: FN (false negative) c: FP (false positive) d:TN (true negative)
  • 7. Prof. Pier Luca Lanzi Metrics for Performance Evaluation •  Most widely-used metric: 7 PREDICTED CLASS ACTUAL CLASS Yes No Yes a (TP) b (FN) No c (FP) d (TN)
  • 8. Prof. Pier Luca Lanzi Sometimes Accuracy is not Enough •  Consider a 2-class problem § Number of Class 0 examples = 9990 § Number of Class 1 examples = 10 •  If model predicts everything to be class 0, accuracy is 9990/10000 = 99.9 % •  Accuracy is misleading because model does not detect any class 1 example 8
  • 9. Prof. Pier Luca Lanzi Cost Matrix 9 C(i|j): Cost of misclassifying class j example as class i PREDICTED CLASS ACTUAL CLASS Yes No Yes C(Yes|Yes) C(No|Yes) No C(Yes|No) C(No|No)
  • 10. Prof. Pier Luca Lanzi Computing Cost of Classification 10 Cost Matrix PREDICTED CLASS ACTUAL CLASS C(i|j) + - + -1 100 - 1 0 Model M1 PREDICTED CLASS ACTUAL CLASS + - + 150 40 - 60 250 Model M2 PREDICTED CLASS ACTUAL CLASS + - + 250 45 - 5 200 Accuracy = 80% Cost = 3910 Accuracy = 90% Cost = 4255
  • 15. Prof. Pier Luca Lanzi Cost-Sensitive Measures •  Precision is biased towards C(Yes|Yes) C(Yes|No), •  Recall is biased towards C(Yes|Yes) C(No|Yes) •  F1-measure is biased towards all except C(No|No), it is high when both precision and recall are reasonably high 15 The higher the precision, the lower the FPs The higher the precision, the lower the FNs The higher the F1, the lower the FPs FNs
  • 16. Prof. Pier Luca Lanzi How to obtain reliable estimates? (Methods for Performance Evaluation)
  • 17. Prof. Pier Luca Lanzi Classification Step 1: Split the Data Between Train and Test Sets 17 Training set Testing set data
  • 18. Prof. Pier Luca Lanzi Classification Step 2: Compute a Model using the Data in the Training Set 18 Training set Testing set data Model Builder Y N
  • 19. Prof. Pier Luca Lanzi Classification Step 3: Evaluate the Model using the Test Set 19 Training set Testing set data Model Builder Predictions Y N Evaluate + - + -
  • 20. Prof. Pier Luca Lanzi Note on Parameter Tuning •  It is important that the test data is not used in any way to create the classifier •  Some learning schemes operate in two stages: § Stage 1: builds the basic structure § Stage 2: optimizes parameter settings •  The test data can’t be used for parameter tuning! •  Proper procedure uses three sets: training data, validation data, and test data. Validation data is used to optimize parameters 20
  • 21. Prof. Pier Luca Lanzi Making the most of the data •  Once evaluation is complete, all the data can be used to build the final classifier •  Generally, the larger the training data the better the classifier (but returns diminish) •  The larger the test data the more accurate the error estimate 21
  • 22. Prof. Pier Luca Lanzi The Full Process: Training, Validation, and Testing 22 Training set Validation set data Model Builder Predictions Y N Evaluate + - + - Model Builder Testing set Y N + - + - Final Evaluation
  • 23. Prof. Pier Luca Lanzi Methods for Performance Evaluation •  How to obtain a reliable estimate of performance? •  Performance of a model may depend on other factors besides the learning algorithm: § Class distribution § Cost of misclassification § Size of training and test sets 23
  • 24. Prof. Pier Luca Lanzi Methods of Estimation •  Holdout § Reserve ½ for training and ½ for testing § Reserve 2/3 for training and 1/3 for testing •  Random subsampling § Repeated holdout •  Cross validation § Partition data into k disjoint subsets § k-fold: train on k-1 partitions, test on the remaining one § Leave-one-out: k=n •  Stratified sampling § oversampling vs undersampling •  Bootstrap § Sampling with replacement 24
  • 25. Prof. Pier Luca Lanzi Holdout Evaluation “Small” Datasets •  Reserves a certain amount for testing and uses the remainder for training, typically, § Reserve ½ for training and ½ for testing § Reserve 2/3 for training and 1/3 for testing •  For small or “unbalanced” datasets, samples might not be representative •  For instance, it might generate training or testing datasets with few or none instances of some classes •  Stratified sampling § Makes sure that each class is represented with approximately equal proportions in both subsets 25
  • 26. Prof. Pier Luca Lanzi Repeated Holdout •  Holdout estimate can be made more reliable by repeating the process with different subsamples •  In each iteration, a certain proportion is randomly selected for training (possibly with stratification) •  The error rates on the different iterations are averaged to yield an overall error rate •  Still not optimal, since the different test sets overlap 26
  • 27. Prof. Pier Luca Lanzi Cross-Validation •  First step § Data is split into k subsets of equal size •  Second step § Each subset in turn is used for testing and the remainder for training •  This is called k-fold cross-validation and avoids overlapping test sets •  Often the subsets are stratified before cross-validation is performed •  The error estimates are averaged to yield an overall error estimate 27
  • 28. Prof. Pier Luca Lanzi Ten-fold Cross-Validation 28 [figure: the data is partitioned into ten folds; in iteration i, fold i is used for testing and the remaining nine folds for training, yielding performances p1, p2, …, p10] The final performance is computed as the average of the pi
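A plain (non-stratified) k-fold cross-validation sketch in R; fit_model() and score() are hypothetical placeholders for whatever classifier and performance measure are being evaluated:

```r
k <- 10
set.seed(42)
folds <- sample(rep(1:k, length.out = nrow(data)))   # random fold assignment

p <- sapply(1:k, function(i) {
  train <- data[folds != i, ]
  test  <- data[folds == i, ]
  model <- fit_model(train)   # placeholder: fit the classifier under evaluation
  score(model, test)          # placeholder: e.g. accuracy on the held-out fold
})
mean(p)   # overall cross-validation estimate
```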
  • 29. Prof. Pier Luca Lanzi Cross-Validation •  Standard method for evaluation: stratified ten-fold cross-validation •  Why ten? Extensive experiments have shown that this is the best choice to get an accurate estimate •  Stratification reduces the estimate’s variance •  Even better: repeated stratified cross-validation •  E.g., ten-fold cross-validation is repeated ten times and results are averaged (reduces the variance) •  Other approaches also appear to be robust, e.g., 5x2 cross-validation 29
  • 30. Prof. Pier Luca Lanzi Leave-One-Out Cross-Validation •  It is a particular form of cross-validation § Set number of folds to number of training instances § I.e., for n training instances, build classifier n times •  Makes best use of the data •  Involves no random subsampling •  Computationally expensive 30
  • 31. Prof. Pier Luca Lanzi Leave-One-Out and Stratification •  A disadvantage of Leave-One-Out is that stratification is not possible •  It guarantees a non-stratified sample because there is only one instance in the test set! •  Extreme example: random dataset split equally into two classes § Best inducer predicts majority class § 50% accuracy on fresh data § Leave-One-Out-CV estimate is 100% error! 31
  • 32. Prof. Pier Luca Lanzi Bootstrapping •  Cross-validation uses sampling without replacement § The same instance, once selected, cannot be selected again for a particular training/test set •  Bootstrap uses sampling with replacement to form the training set § Sample a dataset of n instances n times with replacement to form a new dataset of n instances § Use this data as the training set § Use the instances from the original dataset that don’t occur in the new training set for testing 32
  • 33. Prof. Pier Luca Lanzi The 0.632 Bootstrap •  At each draw, an instance has a probability of 1−1/n of not being picked •  Thus its probability of ending up in the test data (i.e., of never being picked in n draws) is (1 − 1/n)^n ≈ e^(−1) ≈ 0.368 •  This means the training data will contain approximately 63.2% of the instances 33
  • 34. Prof. Pier Luca Lanzi Estimating Error using Bootstrap •  The error estimate on the test data will be very pessimistic, since training was on just ~63% of the instances •  Therefore, combine it with the resubstitution error: err = 0.632 · e_test instances + 0.368 · e_training instances •  The resubstitution error gets less weight than the error on the test data •  Repeat process several times with different replacement samples; average the results 34
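A sketch of a single 0.632-bootstrap estimate in R; fit_model() and error_rate() are placeholders, and in practice the whole procedure is repeated and the results averaged, as the slide notes:

```r
set.seed(42)
n        <- nrow(data)
boot_idx <- sample(n, size = n, replace = TRUE)   # n draws with replacement
train    <- data[boot_idx, ]                      # contains ~63.2% distinct instances
test     <- data[-unique(boot_idx), ]             # instances never drawn

model   <- fit_model(train)                       # placeholder
e_test  <- error_rate(model, test)                # error on the unseen instances
e_train <- error_rate(model, train)               # resubstitution error
err     <- 0.632 * e_test + 0.368 * e_train
```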
  • 35. Prof. Pier Luca Lanzi More on Bootstrap •  Probably the best way of estimating performance for very small datasets •  However, it has some problems § Consider a random dataset with a 50% class distribution § A model that memorizes all the training data (equivalent to a kNN with k=1) will achieve 0% error on the training data and ~50% error on test data § The bootstrap estimate for this classifier will be err = 0.632 · 50% + 0.368 · 0% = 31.6% § But the true expected error is 50% 35
  • 36. Prof. Pier Luca Lanzi Model Comparison How to compare the relative performance among competing models?
  • 37. Prof. Pier Luca Lanzi How to Compare the Performance of Two Models? •  Two models, § Model M1: accuracy = 85%, tested on 30 instances § Model M2: accuracy = 75%, tested on 5000 instances •  Can we say M1 is better than M2? •  How much confidence can we place on the accuracy of M1 and M2? •  Can the difference in the performance measure be explained as a result of random fluctuations in the test set? 37
  • 38. Prof. Pier Luca Lanzi Confidence Interval for Accuracy •  Prediction can be regarded as a Bernoulli trial with two possible outcomes, correct or wrong •  A collection of Bernoulli trials has a Binomial distribution •  Given, § the number of correct test predictions s § the number of test instances N § the empirical accuracy f = s/N •  Can we predict the true accuracy p of the model from f? 38
  • 39. Prof. Pier Luca Lanzi Mean and Variance •  Mean and variance for a Bernoulli trial are p and p(1−p) •  Observed success rate f = s/N •  When N is large enough, f follows a normal distribution with mean p and variance p(1−p)/N •  The c% confidence interval [−z ≤ X ≤ z] for a random variable X with mean 0 is given by Pr[−z ≤ X ≤ z] = c •  With a symmetric distribution, we have Pr[−z ≤ X ≤ z] = 1 − 2 · Pr[X ≥ z] 39
  • 40. Prof. Pier Luca Lanzi Confidence Interval for Accuracy •  For instance, if X has mean 0 and standard deviation 1, the confidence interval for c% is defined by the z values in the table: Pr[X ≥ z] = 0.1% → z = 3.09, 0.5% → 2.58, 1% → 2.33, 5% → 1.65, 10% → 1.28, 20% → 0.84, 25% → 0.69, 40% → 0.25 •  Thus, for a 90% confidence interval we have Pr[−1.65 ≤ X ≤ 1.65] = 90% •  To compute the confidence interval for the observed accuracy f we need to, § Normalize it to a distribution with mean 0 and standard deviation 1 § Compute the confidence interval § Remap it to the actual distribution 40
  • 41. Prof. Pier Luca Lanzi Confidence Interval for Accuracy •  First, transform f using, •  Compute the confidence interval, •  Solve it for p, 41
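The formulas referenced by these three steps are not reproduced in the slide text; the following is the standard normal-approximation derivation they correspond to (reconstructed, using the same symbols f, p, N, z as above):

```latex
% Normalize the observed accuracy f
\[
Z = \frac{f - p}{\sqrt{p(1-p)/N}}, \qquad \Pr[-z \le Z \le z] = c
\]
% Solving the inequality for p gives the confidence interval
\[
p = \frac{f + \dfrac{z^2}{2N} \pm z \sqrt{\dfrac{f}{N} - \dfrac{f^2}{N} + \dfrac{z^2}{4N^2}}}{1 + \dfrac{z^2}{N}}
\]
```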
  • 42. Prof. Pier Luca Lanzi Examples •  When f = 75%, N=1000, c=80% (z=1.28), the confidence interval for the actual performance is [0.732, 0.767] •  When f = 75%, N=100, c=80% (z=1.28), the confidence interval for the actual performance is [0.691, 0.801] •  Note that the assumption about f being normally distributed only applies for large values of N (at least 30) •  When f = 75%, N=10, c=80% (z=1.28), the confidence interval for the actual performance is [0.549, 0.881] 42
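These intervals can be reproduced with a few lines of R; acc_ci() below implements the interval formula from the previous slide and is a sketch, not code from the course material:

```r
acc_ci <- function(f, N, z) {
  center <- f + z^2 / (2 * N)
  margin <- z * sqrt(f / N - f^2 / N + z^2 / (4 * N^2))
  denom  <- 1 + z^2 / N
  c(lower = (center - margin) / denom, upper = (center + margin) / denom)
}

z80 <- qnorm(0.90)        # ~1.28: an 80% two-sided interval leaves 10% in each tail
acc_ci(0.75, 1000, z80)   # roughly [0.732, 0.767]
acc_ci(0.75,  100, z80)   # roughly [0.691, 0.801]
acc_ci(0.75,   10, z80)   # roughly [0.549, 0.881]
```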
  • 43. Prof. Pier Luca Lanzi Comparing Data Mining Schemes •  Which of two learning schemes performs better? •  The answer is obviously domain-dependent •  The typical approach is to compare 10-fold cross-validation (CV) estimates using Student’s t-test •  Student’s t-test tells whether the means of two samples are significantly different •  Use a paired t-test when the individual samples are paired, that is, when the same CV folds are used for both schemes; use an unpaired t-test otherwise 43
  • 44. Prof. Pier Luca Lanzi William Gosset (1876-1937) •  Obtained a post as a chemist in the Guinness brewery in Dublin in 1899 •  Invented the t-test to handle small samples for quality control in brewing •  Wrote under the name “Student” •  Unpaired and paired two-sample t-test •  Unequal/equal sample size •  Unequal/equal sample variance 44
  • 47. Prof. Pier Luca Lanzi T-Test Using R j48 <- c(96, 93.33, 92.00, 94.67, 96.00, 94.00, 94.67, 94.0, 94.67, 98) zeror <- c(33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33, 33.33) t.test(j48, zeror) Welch Two Sample t-test data: j48 and zeror t = 117.9102, df = 9, p-value = 1.153e-15 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 60.22594 62.58206 sample estimates: mean of x mean of y 94.734 33.330 47
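The call above runs an unpaired (Welch) test. If the two accuracy vectors came from the same cross-validation folds, the paired variant mentioned on the previous slide would presumably be used instead:

```r
t.test(j48, zeror, paired = TRUE)   # paired t-test on per-fold accuracies
```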
  • 49. Prof. Pier Luca Lanzi ROC (Receiver Operating Characteristic) •  Developed in the 1950s for signal detection theory to analyze noisy signals •  The ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis), i.e., TPR vs FPR (TPR = TP/(TP+FN), FPR = FP/(TN+FP)) •  The performance of each classifier is represented as a point on the ROC curve •  Changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point 49
  • 50. Prof. Pier Luca Lanzi ROC Curve •  (TPR, FPR) § (0,0): declare everything to be negative class § (1,1): declare everything to be positive class § (1,0): ideal •  Diagonal line: § Random guessing •  Below the diagonal line, the prediction is the opposite of the true class 50
  • 51. Prof. Pier Luca Lanzi ROC Curve 51
  • 52. Prof. Pier Luca Lanzi Using ROC for Model Comparison •  Neither model consistently outperforms the other •  M1 is better for small FPR •  M2 is better for large FPR •  Area Under the ROC curve (AUC) •  Ideal: area = 1 •  Random guess: area = 0.5 52
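A self-contained R sketch that computes ROC points and the area under the curve from classifier scores and true labels (scores and labels are assumed vectors, 1 = positive and 0 = negative; ties between scores are not handled specially):

```r
roc_points <- function(scores, labels) {
  ord    <- order(scores, decreasing = TRUE)   # sweep the threshold from high to low
  labels <- labels[ord]
  tpr <- cumsum(labels) / sum(labels)          # TP / (TP + FN)
  fpr <- cumsum(1 - labels) / sum(1 - labels)  # FP / (FP + TN)
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

auc <- function(roc) {
  # trapezoidal rule over the ROC points
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# roc <- roc_points(scores, labels); auc(roc); plot(roc$fpr, roc$tpr, type = "l")
```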
  • 56. Prof. Pier Luca Lanzi Lift Charts •  Lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model •  Cumulative gains and lift charts are visual aids for measuring model performance •  Both charts consist of a lift curve and a baseline •  The greater the area between the lift curve and the baseline, the better the model 56
  • 57. Prof. Pier Luca Lanzi Example: Direct Marketing •  Mass mailout of promotional offers to 1,000,000 households •  The proportion who normally respond is 0.1%, that is 1000 responses •  A data mining tool can identify a subset of 100,000 households for which the response rate is 0.4%, that is 400 responses •  In marketing terminology, this increase in response rate is known as the lift factor yielded by the model (here 0.4%/0.1% = 4) •  The same data mining tool may be able to identify 400,000 households for which the response rate is 0.2%, that is 800 respondents, corresponding to a lift factor of 2 •  The overall goal is to find subsets of test instances that have a high proportion of true positives 57
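The lift factors in this example follow directly from the definition on the previous slide (response rate in the selected subset divided by the baseline response rate):

```r
baseline <- 1000 / 1000000                  # 0.1% overall response rate

lift_top100k <- (400 / 100000) / baseline   # 0.4% / 0.1% = 4
lift_top400k <- (800 / 400000) / baseline   # 0.2% / 0.1% = 2
```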
  • 58. Prof. Pier Luca Lanzi An Example from Direct Marketing 58
  • 59. Prof. Pier Luca Lanzi Which model should we prefer? (Model selection)
  • 60. Prof. Pier Luca Lanzi Model Selection Criteria •  Model selection criteria attempt to find a good compromise between: § The complexity of a model § Its prediction accuracy on the training data •  Reasoning: a good model is a simple model that achieves high accuracy on the given data •  Also known as Occam’s Razor: the best theory is the smallest one that describes all the facts 60 William of Ockham, born in the village of Ockham in Surrey (England) about 1285, was the most influential philosopher of the 14th century and a controversial theologian.
  • 61. Prof. Pier Luca Lanzi The MDL principle •  MDL stands for minimum description length •  The description length is defined as: space required to describe a theory + space required to describe the theory’s mistakes •  In our case the theory is the classifier and the mistakes are the errors on the training data •  We seek the classifier with the Minimum Description Length 61
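In symbols (the notation below is mine, not from the slides), the MDL principle selects the classifier T minimizing the total description length:

```latex
\[
\mathrm{DL}(T) = L(T) + L(E \mid T)
\]
% L(T):   bits needed to encode the theory (the classifier)
% L(E|T): bits needed to encode its mistakes on the training data
```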
  • 62. Prof. Pier Luca Lanzi MDL and compression •  The MDL principle relates to data compression •  The best theory is the one that compresses the data the most •  I.e., to compress a dataset we generate a model and then store the model and its mistakes •  We need to compute (a) the size of the model, and (b) the space needed to encode the errors •  (b) is easy: use the informational loss function •  (a) requires a method to encode the model 62
  • 63. Prof. Pier Luca Lanzi Which Classification Algorithm? •  Among the several algorithms, which one is the “best”? § Some algorithms have a lower computational complexity § Different algorithms provide different representations § Some algorithms allow the specification of prior knowledge •  If we are interested in the generalization performance, are there any reasons to prefer one classifier over another? •  Can we expect any classification method to be superior or inferior overall? •  According to the No Free Lunch Theorem, the answer to all these questions is known 63
  • 64. Prof. Pier Luca Lanzi No Free Lunch Theorem •  If the goal is to obtain good generalization performance, there are no context-independent or usage-independent reasons to favor one classification method over another •  If one algorithm seems to outperform another in a certain situation, it is a consequence of its fit to the particular problem, not the general superiority of the algorithm •  When confronting a new problem, this theorem suggests that we should focus on the aspects that matter most § Prior information § Data distribution § Amount of training data § Cost or reward •  The theorem also justifies skepticism regarding studies that “demonstrate” the overall superiority of a certain algorithm 64
  • 65. Prof. Pier Luca Lanzi No Free Lunch Theorem •  “[A]ll algorithms that search for an extremum of a cost [objective] function perform exactly the same, when averaged over all possible cost functions.” [1] •  “[T]he average performance of any pair of algorithms across all possible problems is identical.” [2] •  [1] Wolpert, D.H., Macready, W.G. (1995), No Free Lunch Theorems for Search, Technical Report SFI-TR-95-02-010 (Santa Fe Institute). •  [2] Wolpert, D.H., Macready, W.G. (1997), No Free Lunch Theorems for Optimization, IEEE Transactions on Evolutionary Computation 1, 67. 65