Ensemble Classifiers
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Syllabus
• “The Elements of Statistical Learning: Data Mining, Inference, and
Prediction,” Second Edition, February 2009, Trevor Hastie,
Robert Tibshirani, Jerome Friedman
(http://statweb.stanford.edu/~tibs/ElemStatLearn/)
§Section 8.7
§Chapter 10
§Chapter 15
§Chapter 16
• Scikit-Learn
§http://scikit-learn.org/stable/modules/ensemble.html
What are Ensemble Methods?
(Figure: several doctors each provide a diagnosis; the final diagnosis combines them.)
Ensemble Methods
Generate a set of classifiers from the training data
Predict class label of previously unseen cases by
aggregating predictions made by multiple classifiers
What is the General Idea?
The original training data D is used to create multiple data sets D1, D2, …, Dt-1, Dt (Step 1: Create Multiple Data Sets); one classifier Ci is built from each data set Di (Step 2: Build Multiple Classifiers); the classifiers C1, C2, …, Ct are then combined into a single ensemble classifier C* (Step 3: Combine Classifiers).
Building Model Ensembles
• Basic idea
§Build different “experts”, let them vote
• Advantage
§Often improves predictive performance
• Disadvantage
§Usually produces output that is very hard to analyze
However, there are approaches that aim to
produce a single comprehensible structure
Why does it work?
• Suppose there are 25 base classifiers
• Each classifier has error rate, ε = 0.35
• Assume classifiers are independent
• The probability that the ensemble (deciding by majority vote) makes a wrong prediction is the probability that at least 13 of the 25 classifiers are wrong,

P(ensemble error) = Σi=13..25 C(25, i) ε^i (1 − ε)^(25−i) ≈ 0.06

which is much lower than the error rate of the individual classifiers
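As a sanity check, the sum above can be evaluated with a few lines of Python (standard library only; the function and variable names below are illustrative):

from math import comb

def ensemble_error(n_classifiers=25, eps=0.35):
    # the majority vote is wrong when at least 13 of the 25 base classifiers are wrong
    k_min = n_classifiers // 2 + 1
    return sum(comb(n_classifiers, k) * eps**k * (1 - eps)**(n_classifiers - k)
               for k in range(k_min, n_classifiers + 1))

print(ensemble_error())  # roughly 0.06, far below the base error rate of 0.35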
how can we generate several models using
the same data and the same approach (e.g., trees)?
we might …
randomize the selection of training data?
randomize the approach?
…
Bagging
What is Bagging?
(Bootstrap Aggregation)
• Analogy
§ Diagnosis based on multiple doctors’ majority vote
• Training
§ Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled
with replacement from D (i.e., bootstrap)
§ A classifier model Mi is learned for each training set Di
• Classification (classify an unknown sample X)
§ Each classifier Mi returns its class prediction
§ The bagged classifier M* counts the votes and
assigns the class with the most votes to X
• Combining predictions by voting/averaging
§ The simplest way is to give equal weight to each model
• Predicting continuous values
§ It can be applied to the prediction of continuous values by taking the average value
of each prediction for a given test tuple
Bagging Algorithm

Model generation
  Let n be the number of instances in the training data
  For each of t iterations:
    Sample n instances from the training set (with replacement)
    Apply the learning algorithm to the sample
    Store the resulting model

Classification
  For each of the t models:
    Predict the class of the instance using the model
  Return the class that is predicted most often
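The scikit-learn ensemble module listed in the syllabus implements this scheme directly; the following is a minimal sketch of bagging decision trees (the dataset and parameter values are only illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# t = 50 bootstrap samples of size n, one tree per sample,
# final prediction by majority vote
bagging = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=50,      # number of iterations t
                            max_samples=1.0,      # sample n instances ...
                            bootstrap=True,       # ... with replacement
                            random_state=0)
print(cross_val_score(bagging, X, y, cv=5).mean())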
Bagging works because it reduces
variance by voting/averaging
usually, the more classifiers the better
it can help a lot if data are noisy
however, in some pathological hypothetical
situations the overall error might increase
When Does Bagging Work?
Stable vs Unstable Classifiers
• A learning algorithm is unstable, if small changes to the training
set cause large changes in the learned classifier
• If the learning algorithm is unstable, then
bagging almost always improves performance
• Bagging stable classifiers is not a good idea
• Decision trees, regression trees, linear regression, neural networks
are examples of unstable classifiers
• K-nearest neighbors is a stable classifier
Randomize the Algorithm?
• We could randomize the learning algorithm instead of the data
• Some algorithms already have a random component,
e.g. initial weights in neural net
• Most algorithms can be randomized (e.g., attribute selection in
decision trees)
• More generally applicable than bagging, e.g. random subsets in
nearest-neighbor scheme
• It can be combined with bagging
the more uncorrelated the trees,
the better the variance reduction
Random Forests
learning ensemble consisting of a bagging
of unpruned decision tree learners with
randomized selection of features at each split
What is a Random Forest?
• Random forests (RF) are a combination of tree predictors
• Each tree depends on the values of a random vector sampled
independently and with the same distribution for all trees in the
forest
• The generalization error of a forest of tree classifiers depends on
the strength of the individual trees in the forest and the
correlation between them
• Using a random selection of features to split each node yields
error rates that compare favorably to AdaBoost, and are more
robust with respect to noise
How do Random Forests Work?
• D = training set, F = set of variables to test for split
• k = number of trees in the forest, n = number of variables
• for i = 1 to k do
§ build data set Di by sampling with replacement from D
§ learn tree Ti from Di using at each node only a subset F of the n variables
§ save tree Ti as it is, do not perform any pruning
• Output is computed as the majority voting (for classification) or average (for
prediction) of the set of k generated trees
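A minimal scikit-learn sketch of this procedure (dataset and parameter values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# k = 200 unpruned trees, each grown on a bootstrap sample and restricted to a
# random subset of the variables (here sqrt(n) of them) at every split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())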
Out of Bag Samples
• For each observation (xi, yi), construct its random forest predictor
by averaging only those trees corresponding to bootstrap samples
in which the observation did not appear
• The OOB error estimate is almost identical to that obtained by
N-fold cross-validation and is related to leave-one-out evaluation
• Thus, random forests can be fit in one sequence, with cross-
validation being performed along the way
• Once the OOB error stabilizes, the training can be terminated
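In scikit-learn the OOB estimate is available through the oob_score option; a minimal sketch (illustrative dataset) follows:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# with oob_score=True each observation is scored only by the trees whose
# bootstrap sample did not contain it
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB accuracy:", forest.oob_score_)   # OOB error = 1 - oob_score_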
Properties of Random Forests
• Easy to use ("off-the-shelf"), only 2 parameters
(number of trees, percentage of variables used for each split)
• Very high accuracy
• Adding more trees does not lead to overfitting, so a large number of trees can be chosen
• Insensitive to the choice of the split percentage (~20%)
• Returns an estimate of variable importance
• Random forests are an effective tool in prediction.
• Forests give results competitive with boosting and adaptive
bagging, yet do not progressively change the training set.
• Random inputs and random features produce good results in
classification - less so in regression.
• For larger data sets, we can gain accuracy by combining random
features with boosting.
Boosting
(Figure: a patient with headache, stomach pain and a hurting leg consults several doctors in sequence. The first doctor is sure the headache is caused by X but cannot explain the rest; the second agrees on X and attributes the leg to Y but cannot explain the stomach; the third agrees on X, sort of agrees on Y, and is sure the stomach is Z. The final diagnosis is the result of the weighted opinions that the doctors sequentially gave you.)
AdaBoost
• AdaBoost computes a strong classifier H(x) as a weighted combination of weak classifiers,

H(x) = sign( Σt αt ht(x) )

• Where
§ht(x) is the output of the weak classifier t
§αt is the weight assigned to the weak classifier t, computed from its estimated error ϵt as

αt = ½ ln( (1 − ϵt) / ϵt )
What is Boosting?
• Analogy
§ Consult several doctors, sequentially trying to find a doctor that can
explain what other doctors cannot
§ Final diagnosis is based on a weighted combination of all the diagnoses
§ Weights are assigned based on the accuracy of previous diagnoses
• How does Boosting work?
§ Weights are assigned to each training example
§ A series of k classifiers is iteratively learned
§ After a classifier Mi is learned, the weights are updated to allow the
subsequent classifier Mi+1 to pay more attention to the training tuples that
were misclassified by Mi
§ The final M* combines the votes of each individual classifier, where the
weight of each classifier's vote is a function of its accuracy
AdaBoost Example (1)
• Suppose our data consist of ten points with a +1/-1 label
• At the beginning, all the data points have the same weight and
thus the same probability of being selected for the training set.
With 10 data points the weight is 1/10 or 0.1
x   0     1     2     3     4     5     6     7     8     9
y   +1    +1    +1    -1    -1    -1    -1    -1    +1    +1
w   0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100
AdaBoost Example (2)
• Suppose that in the first round, we generated the training data
using the weights and obtained a weak classifier h1 that returns
+1 if x < 3 and -1 otherwise
• The error ϵ1 on the weighted dataset is 2/10 = 0.2 and α1 is 0.69
• The weights are updated by multiplying them with
§exp(-α1) if the example was correctly classified
§exp(α1) if the example was incorrectly classified
• Finally, the weights are normalized since they have to sum up to 1
w   0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100
x   0     1     2     3     4     5     6     7     8     9
y   +1    +1    +1    -1    -1    -1    -1    -1    +1    +1
h1  +1    +1    +1    -1    -1    -1    -1    -1    -1    -1
AdaBoost – α and ϵ
• The importance of a classifier t is computed from its error rate as

αt = ½ ln( (1 − ϵt) / ϵt )

• Where the error rate is computed on the weighted dataset (with the weights wi summing to 1) as

ϵt = Σi wi I( ht(xi) ≠ yi )
AdaBoost Example (3)
• The new weights are shown in the table below
• Suppose now that the second weak classifier h2 returns +1 if x > 7 and -1 otherwise (see the table)
• It has an error ϵ2 of 0.1875 and α2 is 0.7332

w   0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.25 0.25
x   0      1      2      3      4      5      6      7      8    9
y   +1     +1     +1     -1     -1     -1     -1     -1     +1   +1
h2  -1     -1     -1     -1     -1     -1     -1     -1     +1   +1
AdaBoost Example (4)
• The new weights are shown in the table below
• Finally, we sample the dataset using the weights and compute the third weak classifier h3, which returns +1 for every x
• It has an error ϵ3 of 0.19 and an α3 of 0.72

w   0.167 0.167 0.167 0.038 0.038 0.038 0.038 0.038 0.154 0.154
x   0     1     2     3     4     5     6     7     8     9
y   +1    +1    +1    -1    -1    -1    -1    -1    +1    +1
h3  +1    +1    +1    +1    +1    +1    +1    +1    +1    +1
AdaBoost Example (5)
• The final model is thus computed as

H(x) = sign( 0.69·h1(x) + 0.73·h2(x) + 0.72·h3(x) )

which classifies all ten training points correctly
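A minimal sketch that replays the toy example, with the three weak classifiers hard-coded from the tables above, and verifies that the combined model labels every point correctly:

import numpy as np

x = np.arange(10)
y = np.array([+1, +1, +1, -1, -1, -1, -1, -1, +1, +1])

# weak classifiers and weights taken from the worked example
h1 = np.where(x < 3, 1, -1)   # alpha1 = 0.69
h2 = np.where(x > 7, 1, -1)   # alpha2 = 0.73
h3 = np.ones_like(x)          # alpha3 = 0.72 (h3 always predicts +1)

H = np.sign(0.69 * h1 + 0.73 * h2 + 0.72 * h3)
print((H == y).all())         # True: the combined model fits all ten points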
AdaBoost.M1

Model generation
  Assign equal weight to each training instance
  For t iterations:
    Apply the learning algorithm to the weighted dataset,
      store the resulting model
    Compute the model's error e on the weighted dataset
    If e = 0 or e >= 0.5:
      Terminate model generation
    For each instance in the dataset:
      If classified correctly by the model:
        Multiply the instance's weight by e/(1-e)
    Normalize the weights of all instances

Classification
  Assign weight = 0 to all classes
  For each of the t (or fewer) models:
    For the class this model predicts,
      add -log(e/(1-e)) to this class's weight
  Return the class with the highest weight
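In scikit-learn, boosting of decision stumps can be run as in the sketch below (dataset and parameter values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# decision stumps (depth-1 trees) as weak learners, re-weighted at every round
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=0)
print(cross_val_score(ada, X, y, cv=5).mean())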
More on Boosting
• Boosting needs weights ... but
• Can adapt learning algorithm ... or
• Can apply boosting without weights
§ Resample with probability determined by weights
§ Disadvantage: not all instances are used
§ Advantage: if error > 0.5, can resample again
• Stems from computational learning theory
• Theoretical result: training error decreases exponentially
• It works if base classifiers are not too complex, and their error doesn’t become
too large too quickly
• Boosting works with weak learners; the only condition is that their error does not exceed 0.5
• In practice, boosting sometimes overfits (in contrast to bagging)
(Figure: boosting on ten labelled points. Initially every point has weight 0.1. Round 1 learns boundary B1 with α = 1.9459; the weights of the points it misclassifies grow (e.g., to 0.4623) while the others shrink (e.g., to 0.0094). Rounds 2 and 3 learn boundaries B2 (α = 2.9323) and B3 (α = 3.8744) on the re-weighted data; the overall classifier combines the three weighted boundaries and labels all points correctly.)
(Figure: the training sample yields classifier h1(x); the sample is then re-weighted and classifier h2(x) is learned from the weighted sample; re-weighting and learning are repeated to obtain h3(x), h4(x), …, hT(x).)
AdaBoost in Pictures
(Figure: start with equal event weights; after each round the misclassified events get larger weights, and so on.)
Boosting Stumps
• Random forests tend to grow rather bushy trees
• Boosting can work well even with shallow trees
• Even with stumps, i.e., simple two-node trees involving just one split
Boosting
• Boosting continues to reduce test error even if the training error
reaches zero
• This suggests that boosting is not using training error as the basis
for learning its trees
Boosting
• Also uses voting/averaging
• Weights models according to performance
• Iterative: new models are influenced
by the performance of previously built ones
§Encourage new model to become an “expert” for instances
misclassified by earlier models
§Intuitive justification: models should be experts that
complement each other
• Several variants
§Boosting by sampling,
the weights are used to sample the data for training
§Boosting by weighting,
the weights are used by the learning algorithm
General Boosting Algorithms
Stage-wise Additive Modeling
• Boosting builds an additive model

f(x) = Σm=1..M βm b(x; γm)

where γm is a parameter vector representing the splits of the m-th tree
• We can view boosting as a special case of well-known statistical models, such as GAMs or basis expansions, which optimize all the parameters jointly, for instance using least squares or maximum likelihood
• Boosting instead fits the parameters in a stage-wise fashion: at each step we optimize the parameters of the next tree, leaving the previous ones unmodified
• This slows down the rate at which the method overfits
Example: Least Squares Boosting
• Start with a function F0(x)=0 and residual r=y, m = 0
• Repeat
§m = m + 1
§Compute a regression tree g(x) to fit the residual r
§fm(x) = ϵg(x) (shrink it down)
§Fm(x) = Fm-1(x) + fm(x)
§r = r - fm(x)
• g(x) is typically a shallow model (for example, a small tree)
• Stage-wise fitting typically slows down overfitting
• 0 < ϵ ≤ 1 is a shrinkage factor that limits overfitting even further
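A minimal from-scratch sketch of this loop, using shallow scikit-learn regression trees as g(x); the synthetic data and the value of ϵ are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

eps, M = 0.1, 100              # shrinkage factor and number of stages
F = np.zeros_like(y)           # F0(x) = 0
r = y.copy()                   # residual r = y

for m in range(M):
    g = DecisionTreeRegressor(max_depth=2).fit(X, r)   # fit a regression tree to the residual
    fm = eps * g.predict(X)                            # fm(x) = eps * g(x), shrink it down
    F = F + fm                                         # Fm(x) = Fm-1(x) + fm(x)
    r = r - fm                                         # update the residual

print("training MSE:", np.mean((y - F) ** 2))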
AdaBoost: Stage-wise Modeling
• AdaBoost builds an additive logistic regression model by stage-wise fitting using the exponential loss function

L(y, F(x)) = exp(−y F(x))

• Instead of fitting trees to residuals, the special form of the exponential loss leads to fitting trees to weighted versions of the original data
• exp(−y F(x)) is a monotone upper bound on the misclassification loss at x; there are other, more robust loss functions such as the binomial deviance
General Boosting Algorithms
• The boosting paradigm can be extended to general loss functions
• Sometimes we introduce a shrinkage term ϵ so that Fm(x) is updated as

Fm(x) = Fm-1(x) + ϵ fm(x)

• Shrinkage slows the stage-wise model building and typically leads to better performance
• The default value of ϵ in the gbm package is 0.001
Gradient Tree Boosting
Gradient Boosting (Friedman, 2001)
• Extension of the general boosting algorithm that applies gradient
descent to minimize loss function during each step
• Inherits all the good features of trees (variable selection, missing
data, mixed predictors) and improves their predictive
performance
• Works with a variety of loss functions
• Tree size has an important role
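scikit-learn exposes this algorithm as GradientBoostingClassifier/GradientBoostingRegressor; a minimal sketch with illustrative parameter values:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# many small trees, shrinkage via learning_rate
gbm = GradientBoostingClassifier(n_estimators=200,
                                 learning_rate=0.05,   # shrinkage term
                                 max_depth=3,          # tree size matters
                                 random_state=0)
print(cross_val_score(gbm, X, y, cv=5).mean())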
https://en.wikipedia.org/wiki/Gradient_boosting
Algorithm 10.3 from "The Elements of Statistical Learning," by Trevor Hastie, Robert Tibshirani, Jerome Friedman, available at http://statweb.stanford.edu/~tibs/ElemStatLearn/
Figure 10.9 from "The Elements of Statistical Learning," by Trevor Hastie, Robert Tibshirani, Jerome Friedman, available at http://statweb.stanford.edu/~tibs/ElemStatLearn/
Table 10.2 from "The Elements of Statistical Learning," by Trevor Hastie, Robert Tibshirani, Jerome Friedman, available at http://statweb.stanford.edu/~tibs/ElemStatLearn/
eXtreme Gradient Boosting
(XGBoost)
• Efficient and scalable implementation of gradient tree boosting
framework
• Used by most of the winning and runner-up solutions in recent data science competitions
• Features
§Novel tree learning algorithm to handle sparse data
§Approximate tree learning using quantile sketch
§More than ten times faster than existing solutions on a single machine
§More regularized model formalization to control over-fitting,
which gives it better performance.
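A minimal sketch using the scikit-learn-compatible interface of the xgboost package (assuming the package is installed; parameter values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier   # requires the xgboost package

X, y = load_breast_cancer(return_X_y=True)

# regularized gradient tree boosting; reg_lambda adds the extra regularization
model = XGBClassifier(n_estimators=300, learning_rate=0.05,
                      max_depth=3, reg_lambda=1.0, subsample=0.8)
print(cross_val_score(model, X, y, cv=5).mean())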
Learning Ensembles
Learning Ensembles
• Random forests and boosting compute a set of weak models and combine them into a stronger model through an incremental optimization process
• Random forests average the models, boosting uses a weighted average
• We can use such weak models as a dictionary to build a stronger model by performing a global optimization over the ensemble
• First, we generate the dictionary of models (using random forests or boosting)
• Next, we fit a model to the data
Stacked Generalization
(Stacking)
Stacked Generalization (Stacking)
• A generalization of the previous ensemble methods that shares many similarities with the concept of learning ensembles
• Trains a learning algorithm to combine the predictions of a heterogeneous set of learning algorithms
§First, a set of base learners is trained on the data (to generate level-0 models)
§Then, a meta learner is trained using the predictions of the base classifiers as additional inputs (to generate a level-1 model)
• Applies a cross-validation-like scheme
• Typically yields better performance than any single one of the trained models
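Recent versions of scikit-learn provide this scheme as StackingClassifier, which uses internal cross-validation to build the level-1 training set; a minimal sketch (the choice of base learners is illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# heterogeneous level-0 learners, logistic regression as the level-1 meta learner
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)   # cross-validation used to build the meta-level training set
print(cross_val_score(stack, X, y, cv=5).mean())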
62

Weitere ähnliche Inhalte

Was ist angesagt?

DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification EnsemblesDMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification EnsemblesPier Luca Lanzi
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringPier Luca Lanzi
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationPier Luca Lanzi
 
VDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityVDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityPier Luca Lanzi
 
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other MethodsDMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other MethodsPier Luca Lanzi
 
DMTM 2015 - 08 Representative-Based Clustering
DMTM 2015 - 08 Representative-Based ClusteringDMTM 2015 - 08 Representative-Based Clustering
DMTM 2015 - 08 Representative-Based ClusteringPier Luca Lanzi
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationPier Luca Lanzi
 
DMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsDMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsPier Luca Lanzi
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesPier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM 2015 - 12 Classification Rules
DMTM 2015 - 12 Classification RulesDMTM 2015 - 12 Classification Rules
DMTM 2015 - 12 Classification RulesPier Luca Lanzi
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationPier Luca Lanzi
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringPier Luca Lanzi
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationPier Luca Lanzi
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationPier Luca Lanzi
 
DMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision TreesDMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision TreesPier Luca Lanzi
 

Was ist angesagt? (20)

DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification EnsemblesDMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification Ensembles
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical Clustering
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representation
 
VDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityVDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with Unity
 
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other MethodsDMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
 
DMTM 2015 - 08 Representative-Based Clustering
DMTM 2015 - 08 Representative-Based ClusteringDMTM 2015 - 08 Representative-Based Clustering
DMTM 2015 - 08 Representative-Based Clustering
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data Exploration
 
DMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsDMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification Models
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM 2015 - 12 Classification Rules
DMTM 2015 - 12 Classification RulesDMTM 2015 - 12 Classification Rules
DMTM 2015 - 12 Classification Rules
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to Classification
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based Clustering
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data Preparation
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data Representation
 
DMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision TreesDMTM 2015 - 11 Decision Trees
DMTM 2015 - 11 Decision Trees
 

Ă„hnlich wie DMTM Lecture 10 Classification ensembles

Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesPier Luca Lanzi
 
Using Tree algorithms on machine learning
Using Tree algorithms on machine learningUsing Tree algorithms on machine learning
Using Tree algorithms on machine learningRajasekhar364622
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018digitalzombie
 
BaggingBoosting.pdf
BaggingBoosting.pdfBaggingBoosting.pdf
BaggingBoosting.pdfDynamicPitch
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
Decision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learningDecision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learningAbhishek Vijayvargia
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Maninda Edirisooriya
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsPalin analytics
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
 
Random forest
Random forestRandom forest
Random forestUjjawal
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxAbhishekSingh43430
 
ensemble learning
ensemble learningensemble learning
ensemble learningbutest
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
ICTIR2016tutorial
ICTIR2016tutorialICTIR2016tutorial
ICTIR2016tutorialTetsuya Sakai
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
Probability distribution Function & Decision Trees in machine learning
Probability distribution Function  & Decision Trees in machine learningProbability distribution Function  & Decision Trees in machine learning
Probability distribution Function & Decision Trees in machine learningSadia Zafar
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009Matthew Magistrado
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning TechniquesBabu Priyavrat
 

Ă„hnlich wie DMTM Lecture 10 Classification ensembles (20)

Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
 
Using Tree algorithms on machine learning
Using Tree algorithms on machine learningUsing Tree algorithms on machine learning
Using Tree algorithms on machine learning
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
 
BaggingBoosting.pdf
BaggingBoosting.pdfBaggingBoosting.pdf
BaggingBoosting.pdf
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Decision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learningDecision tree, softmax regression and ensemble methods in machine learning
Decision tree, softmax regression and ensemble methods in machine learning
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin Analytics
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
Random forest
Random forestRandom forest
Random forest
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptx
 
ensemble learning
ensemble learningensemble learning
ensemble learning
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
ICTIR2016tutorial
ICTIR2016tutorialICTIR2016tutorial
ICTIR2016tutorial
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Probability distribution Function & Decision Trees in machine learning
Probability distribution Function  & Decision Trees in machine learningProbability distribution Function  & Decision Trees in machine learning
Probability distribution Function & Decision Trees in machine learning
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009
 
Lec 18-19.pptx
Lec 18-19.pptxLec 18-19.pptx
Lec 18-19.pptx
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning Techniques
 

Mehr von Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionPier Luca Lanzi
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningPier Luca Lanzi
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelinePier Luca Lanzi
 
VDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationVDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationPier Luca Lanzi
 
VDP2016 - Lecture 13 Data driven game design
VDP2016 - Lecture 13 Data driven game designVDP2016 - Lecture 13 Data driven game design
VDP2016 - Lecture 13 Data driven game designPier Luca Lanzi
 

Mehr von Pier Luca Lanzi (15)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 Introduction
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data mining
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipeline
 
VDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationVDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generation
 
VDP2016 - Lecture 13 Data driven game design
VDP2016 - Lecture 13 Data driven game designVDP2016 - Lecture 13 Data driven game design
VDP2016 - Lecture 13 Data driven game design
 

KĂĽrzlich hochgeladen

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

KĂĽrzlich hochgeladen (20)

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

DMTM Lecture 10 Classification ensembles

  • 1. Prof. Pier Luca Lanzi Ensemble Classifiers Data Mining andText Mining (UIC 583 @ Politecnico di Milano)
  • 2. Prof. Pier Luca Lanzi Syllabus • “The Elements of Statistical Learning: Data Mining, Inference, and Prediction,” Second Edition, February 2009, Trevor Hastie, Robert Tibshirani, Jerome Friedman (http://statweb.stanford.edu/~tibs/ElemStatLearn/) §Section 8.7 §Chapter 10 §Chapter 15 §Chapter 16 • Scikit-Learn §http://scikit-learn.org/stable/modules/ensemble.html 2
  • 3. Prof. Pier Luca Lanzi What are Ensemble Methods?
  • 4. Prof. Pier Luca Lanzi Diagnosis Diagnosis Diagnosis Final DiagnosisFinal Diagnosis
  • 5. Prof. Pier Luca Lanzi Ensemble Methods Generate a set of classifiers from the training data Predict class label of previously unseen cases by aggregating predictions made by multiple classifiers
  • 6. Prof. Pier Luca Lanzi What is the General Idea? Original Training data ....D1 D2 Dt-1 Dt D Step 1: Create Multiple Data Sets C1 C2 Ct -1 Ct Step 2: Build Multiple Classifiers C* Step 3: Combine Classifiers 6
  • 7. Prof. Pier Luca Lanzi Building models ensembles • Basic idea §Build different “experts”, let them vote • Advantage §Often improves predictive performance • Disadvantage §Usually produces output that is very hard to analyze 7 However, there are approaches that aim to produce a single comprehensible structure
  • 8. Prof. Pier Luca Lanzi Why does it work? • Suppose there are 25 base classifiers • Each classifier has error rate, ε = 0.35 • Assume classifiers are independent • The probability that the ensemble makes a wrong prediction is 8
  • 9. Prof. Pier Luca Lanzi how can we generate several models using the same data and the same approach (e.g.,Trees)?
  • 10. Prof. Pier Luca Lanzi we might … randomize the selection of training data? randomize the approach? …
  • 11. Prof. Pier Luca Lanzi Bagging
  • 12. Prof. Pier Luca Lanzi What is Bagging? (Bootstrap Aggregation) • Analogy § Diagnosis based on multiple doctors’ majority vote • Training § Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., bootstrap) § A classifier model Mi is learned for each training set Di • Classification (classify an unknown sample X) § Each classifier Mi returns its class prediction § The bagged classifier M* counts the votes and assigns the class with the most votes to X • Combining predictions by voting/averaging § The simplest way is to give equal weight to each model • Predicting continuous values § It can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple 12
  • 13. Prof. Pier Luca Lanzi Bagging Algorithm 13 Let n be the number of instances in the training data For each of t iterations: Sample n instances from training set (with replacement) Apply the learning algorithm to the sample Store the resulting model For each of the t models: Predict class of instance using model Return class that is predicted most often Model generation Classification
  • 14. Prof. Pier Luca Lanzi Bagging works because it reduces variance by voting/averaging usually, the more classifiers the better it can help a lot if data are noisy however, in some pathological hypothetical situations the overall error might increase
  • 15. Prof. Pier Luca Lanzi When Does Bagging Work? Stable vs Unstable Classifiers • A learning algorithm is unstable, if small changes to the training set cause large changes in the learned classifier • If the learning algorithm is unstable, then bagging almost always improves performance • Bagging stable classifiers is not a good idea • Decision trees, regression trees, linear regression, neural networks are examples of unstable classifiers • K-nearest neighbors is a stable classifier 15
  • 16. Prof. Pier Luca Lanzi Randomize the Algorithm? • We could randomize the learning algorithm instead of the data • Some algorithms already have a random component, e.g. initial weights in neural net • Most algorithms can be randomized (e.g., attribute selection in decision trees) • More generally applicable than bagging, e.g. random subsets in nearest-neighbor scheme • It can be combined with bagging 16
  • 17. Prof. Pier Luca Lanzi the more uncorrelated the trees, the better the variance reduction
  • 18. Prof. Pier Luca Lanzi Random ForestsRandom Forests
  • 19. Prof. Pier Luca Lanzi learning ensemble consisting of a bagging of unpruned decision tree learners with randomized selection of features at each split
  • 20. Prof. Pier Luca Lanzi What is a Random Forest? • Random forests (RF) are a combination of tree predictors • Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest • The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them • Using a random selection of features to split each node yields error rates that compare favorably to Adaboost, and are more robust with respect to noise 20
  • 21. Prof. Pier Luca Lanzi How do Random Forests Work? • D = training set F = set of variables to test for split • k = number of trees in forest n = number of variables • for i = 1 to k do § build data set Di by sampling with replacement from D § learn tree Ti from Di using at each node only a subset F of the n variables § save tree Ti as it is, do not perform any pruning • Output is computed as the majority voting (for classification) or average (for prediction) of the set of k generated trees 21
  • 23. Prof. Pier Luca Lanzi Out of Bag Samples • For each observation (xi, yi), construct its random forest predictor by averaging only those trees corresponding to bootstrap samples in which the observation did not appear • The OOB error estimate is almost identical to that obtained by n-fold crossvalidation and related to the leave-one-out evaluation • Thus, random forests can be fit in one sequence, with cross- validation being performed along the way • Once the OOB error stabilizes, the training can be terminated 23
  • 24. Prof. Pier Luca Lanzi 24
  • 25. Prof. Pier Luca Lanzi Properties of Random Forests • Easy to use ("off-the-shelve"), only 2 parameters (no. of trees, %variables for split) • Very high accuracy • No overfitting if selecting large number of trees (choose high) • Insensitive to choice of split% (~20%) • Returns an estimate of variable importance • Random forests are an effective tool in prediction. • Forests give results competitive with boosting and adaptive bagging, yet do not progressively change the training set. • Random inputs and random features produce good results in classification - less so in regression. • For larger data sets, we can gain accuracy by combining random features with boosting. 25
  • 26. Prof. Pier Luca Lanzi Boosting 26
  • 27. Prof. Pier Luca Lanzi Diagn stomach pain & leg hurts (headache because of X) agree for X, leg is because ofY but cannot explain stomach headache & stomach pain & leg hurts headache is surely caused by X but cannot explain others
  • 28. Prof. Pier Luca Lanzi headache is surely caused by X but cannot explain others agree onX, leg is because of Y but cannot explain stomach agree on X sort of agree on Y, sure stomach is Z Final Diagnosis result of the weighted opinions that the doctors (sequentially) gave you
  • 29. Prof. Pier Luca Lanzi AdaBoost • AdaBoost computes a strong classifier as a combination of weak classifiers, • Where §ht(x) is the output of the weak classifier t §αt is the weight assigned to the weak classifier t from its estimated error ϵt as 29
  • 30. Prof. Pier Luca Lanzi What is Boosting? • Analogy § Consult several doctors, sequentially trying to find a doctor that can explain what other doctors cannot § Final diagnose is based on a weighted combination of all the diagnoses § Weights are assigned based on the accuracy of previous diagnoses • How does Boosting work? § Weights are assigned to each training example § A series of k classifiers is iteratively learned § After a classifier Mi is learned, the weights are updated to allow the subsequent classifier Mi+1 to pay more attention to the training tuples that were misclassified by Mi § The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy 30
  • 31. Prof. Pier Luca Lanzi AdaBoost Example (1) • Suppose our data consist of ten points with a +1/-1 label • At the beginning, all the data points have the same weight and thus the same probability of being selected for the training set. With 10 data points the weight is 1/10 or 0.1 31 w 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1
  • 32. Prof. Pier Luca Lanzi AdaBoost Example (2) • Suppose that in the first round, we generated the training data using the weights and generated a weak classifier h1 that returns +1 if x<3 • The error ϵ1 on the weighted dataset is 2/10 and α1 is 0.69 • The weights are updated by multiplying them with §exp(-α1) if the example was correctly classified §exp(α1) if the example was incorrectly classified • Finally, the weights are normalized since they have to sum up to 1 32 w 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1 +1 +1 +1 -1 -1 -1 -1 -1 -1 -1 h1
  • 33. Prof. Pier Luca Lanzi AdaBoost – α and ϵ • The importance of a classifier is computed as • Where the error rate is computed as 33
  • 34. Prof. Pier Luca Lanzi AdaBoost Example (3) • The new weights are thus, • Suppose now that we computed the second weak classifier h2 as • That has an error ϵ2 of 0.1875 and α2 is 0.7332 34 w 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.25 0.25 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1 w 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.25 0.25 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1 -1 -1 -1 -1 -1 -1 -1 -1 +1 +1 h2
  • 35. Prof. Pier Luca Lanzi AdaBoost Example (4) • The new weights are thus, • Finally, we sample the dataset using the weights and compute the third weak classifier h3 as • That has an error ϵ3 of 0.19 and an α3 of 0.72 35 w 0.167 0.167 0.167 0.038 0.038 0.038 0.038 0.038 0.154 0.154 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1 w 0.167 0.167 0.167 0.038 0.038 0.038 0.038 0.038 0.154 0.154 x 0 1 2 3 4 5 6 7 8 9 y +1 +1 +1 -1 -1 -1 -1 -1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 h3
  • 36. Prof. Pier Luca Lanzi AdaBoost Example (5) • The final model is thus computed as, 36
  • 37. Prof. Pier Luca Lanzi AdaBoost.M1 37 Assign equal weight to each training instance For t iterations: Apply learning algorithm to weighted dataset, store resulting model Compute model’s error e on weighted dataset If e = 0 or e ≥ 0.5: Terminate model generation For each instance in dataset: If classified correctly by model: Multiply instance’s weight by e/(1-e) Normalize weight of all instances Model generation Assign weight = 0 to all classes For each of the t (or less) models: For the class this model predicts add –log e/(1-e) to this class’s weight Return class with highest weight Classification
Prof. Pier Luca Lanzi
More on Boosting
• Boosting needs weights ... but
• Can adapt the learning algorithm ... or
• Can apply boosting without weights
§ Resample with probability determined by the weights
§ Disadvantage: not all instances are used
§ Advantage: if error > 0.5, can resample again
• Stems from computational learning theory
• Theoretical result: training error decreases exponentially
• It works if the base classifiers are not too complex and their error does not become too large too quickly
• Boosting works with weak learners; the only condition is that their error does not exceed 0.5
• In practice, boosting sometimes overfits (in contrast to bagging)
39
Prof. Pier Luca Lanzi
[Figure: boosting example, round 1. The original data (ten points, each with initial weight 0.1), the data points sampled for training, and the round-1 classifier B1 with α = 1.9459.]
Prof. Pier Luca Lanzi
[Figure: boosting rounds 1, 2, and 3 and the overall model. The classifiers B1, B2, B3 have α = 1.9459, 2.9323, and 3.8744 respectively; each round is trained on a re-weighted sample and the overall prediction combines the three classifiers.]
Prof. Pier Luca Lanzi
[Diagram: the boosting pipeline. Classifier h1(x) is trained on the original training sample; the sample is then re-weighted and classifier h2(x) is trained on the weighted sample; re-weighting and training repeat for h3(x), h4(x), and so on up to classifier hT(x).]
Prof. Pier Luca Lanzi
AdaBoost in Pictures
[Figure: start with equal event weights; after each iteration the misclassified events get larger weights, and so on.]
Prof. Pier Luca Lanzi
Boosting Stumps
• Random forests tend to grow rather bushy trees
• Boosting can work well even with shallow trees
• Even with stumps, simple two-node trees involving just one split
44
[Plot: performance as a function of the number of boosting iterations.]
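As an illustration of boosting stumps, a minimal scikit-learn sketch; the dataset and parameter values are arbitrary, and the estimator parameter name follows recent scikit-learn releases (older ones call it base_estimator):

# AdaBoost with depth-1 trees (stumps) as the base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

stump_boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a stump: one split, two leaves
    n_estimators=300,
    random_state=0,
)
print(cross_val_score(stump_boost, X, y, cv=5).mean())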
Prof. Pier Luca Lanzi
Boosting
• Boosting continues to reduce the test error even after the training error reaches zero
• This suggests that boosting is not using the training error as the basis for learning its trees
45
[Plot: error as a function of the number of terms.]
Prof. Pier Luca Lanzi
Boosting
• Also uses voting/averaging
• Weights models according to performance
• Iterative: new models are influenced by the performance of previously built ones
§ Encourage the new model to become an "expert" for instances misclassified by earlier models
§ Intuitive justification: models should be experts that complement each other
• Several variants
§ Boosting by sampling: the weights are used to sample the data for training
§ Boosting by weighting: the weights are used directly by the learning algorithm
46
Prof. Pier Luca Lanzi
General Boosting Algorithms
47
Prof. Pier Luca Lanzi
Stage-wise Additive Modeling
• Boosting builds an additive model
f(x) = Σm βm · b(x; γm)
where γm is a parameter vector representing the splits
• We can view boosting as a special case of well-known statistical models, such as GAMs or basis expansions, which optimize all the parameters jointly, for instance using least squares or maximum likelihood
• Boosting fits the parameters in a stage-wise fashion: each time we optimize the parameters of the next tree, leaving the previous ones unmodified
• This slows down the rate at which the method overfits
48
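For reference, a LaTeX sketch of forward stage-wise additive modeling in the standard formulation of Hastie et al.; the notation is an assumption of mine, not copied from the slides:

% Additive model built by boosting and its forward stage-wise fit
f(x) = \sum_{m=1}^{M} \beta_m \, b(x;\gamma_m),
\qquad
(\beta_m,\gamma_m) = \arg\min_{\beta,\gamma}
  \sum_{i=1}^{N} L\bigl(y_i,\; f_{m-1}(x_i) + \beta\, b(x_i;\gamma)\bigr),
\qquad
f_m(x) = f_{m-1}(x) + \beta_m\, b(x;\gamma_m)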
Prof. Pier Luca Lanzi
Example: Least Squares Boosting
• Start with a function F0(x) = 0 and residual r = y, m = 0
• Repeat
§ m = m + 1
§ Fit a regression tree g(x) to the residual r
§ fm(x) = ϵ·g(x) (shrink it down)
§ Fm(x) = Fm-1(x) + fm(x)
§ r = r - fm(x)
• g(x) is typically a shallow model (for example, a small tree)
• Stage-wise fitting typically slows down overfitting
• 0 < ϵ ≤ 1 is a shrinkage factor that limits overfitting even further
49
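A small NumPy/scikit-learn sketch of the least-squares boosting loop above; the dataset, tree depth, number of rounds, and shrinkage value are arbitrary illustrative choices:

# Least-squares boosting: repeatedly fit a small tree to the residual,
# shrink it, and add it to the model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

eps = 0.1              # shrinkage factor, 0 < eps <= 1
trees = []
F = np.zeros_like(y)   # F0(x) = 0
r = y.copy()           # residual r = y
for m in range(200):
    g = DecisionTreeRegressor(max_depth=2).fit(X, r)  # fit a small tree to the residual
    fm = eps * g.predict(X)                           # shrink it down
    F += fm                                           # Fm = Fm-1 + fm
    r -= fm                                           # update the residual
    trees.append(g)

print(np.mean((F - y) ** 2))   # training MSE of the boosted model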
Prof. Pier Luca Lanzi
AdaBoost: Stage-wise Modeling
• AdaBoost builds an additive logistic regression model by stage-wise fitting using the exponential loss function L(y, F(x)) = exp(-y·F(x))
• Instead of fitting trees to residuals, the special form of the exponential loss leads to fitting trees to weighted versions of the original data
• exp(-y·F(x)) is a monotone upper bound on the misclassification loss at x; there are other, more robust loss functions, such as the binomial deviance
50
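For completeness, a LaTeX sketch of the standard derivation showing why the exponential loss turns the stage-wise fit into a weighted fit of the original data; the notation is an assumption of mine, not taken verbatim from the slides:

% With L(y, f) = e^{-y f}, the stage-wise criterion becomes a weighted fit
\sum_{i=1}^{N} e^{-y_i\,(f_{m-1}(x_i) + \beta\, b(x_i;\gamma))}
  = \sum_{i=1}^{N} w_i^{(m)} \, e^{-\beta\, y_i\, b(x_i;\gamma)},
\qquad
w_i^{(m)} = e^{-y_i f_{m-1}(x_i)}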
Prof. Pier Luca Lanzi
General Boosting Algorithms
• The boosting paradigm can be extended to general loss functions
• Sometimes we can introduce the shrinkage term ϵ so that Fm(x) is updated as
Fm(x) = Fm-1(x) + ϵ·fm(x)
• Shrinkage slows the stage-wise model building and typically leads to better performance
• The default value of ϵ in the gbm package is 0.001
51
Prof. Pier Luca Lanzi
Gradient Tree Boosting
52
Prof. Pier Luca Lanzi
Gradient Boosting (Friedman, 2001)
• Extension of the general boosting algorithm that applies gradient descent to minimize the loss function during each step
• Inherits all the good features of trees (variable selection, missing data, mixed predictors) and improves their predictive performance
• Works with a variety of loss functions
• Tree size plays an important role
53
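A scikit-learn example of gradient tree boosting; the dataset and the hyper-parameter values (learning rate, number of stages, tree size) are illustrative choices, not recommendations from the slides:

# Gradient tree boosting with scikit-learn; each stage fits a tree to the
# negative gradient of the loss.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbt = GradientBoostingClassifier(
    n_estimators=500,     # number of boosting stages
    learning_rate=0.05,   # shrinkage term
    max_depth=3,          # tree size plays an important role; shallow trees usually suffice
    random_state=0,
)
gbt.fit(X_train, y_train)
print(gbt.score(X_test, y_test))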
Prof. Pier Luca Lanzi
https://en.wikipedia.org/wiki/Gradient_boosting
Prof. Pier Luca Lanzi
Algorithm 10.3 from "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, Jerome Friedman, available at http://statweb.stanford.edu/~tibs/ElemStatLearn/
Prof. Pier Luca Lanzi
Figure 10.9 from "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, Jerome Friedman, available at http://statweb.stanford.edu/~tibs/ElemStatLearn/
Prof. Pier Luca Lanzi
Table 10.2 from "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, Jerome Friedman, available at http://statweb.stanford.edu/~tibs/ElemStatLearn/
Prof. Pier Luca Lanzi
eXtreme Gradient Boosting (XGBoost)
• Efficient and scalable implementation of the gradient tree boosting framework
• Used by most of the winning and runner-up solutions in recent data science competitions
• Features
§ Novel tree learning algorithm to handle sparse data
§ Approximate tree learning using quantile sketches
§ More than ten times faster on a single machine
§ More regularized model formulation to control over-fitting, which gives it better performance
58
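A sketch of the xgboost scikit-learn wrapper, assuming the xgboost Python package is installed; the parameter values are illustrative and not tuned:

# XGBoost through its scikit-learn compatible interface.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=400,
    learning_rate=0.05,   # shrinkage
    max_depth=4,
    reg_lambda=1.0,       # L2 regularization on the leaf weights
    subsample=0.8,
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))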
Prof. Pier Luca Lanzi
Learning Ensembles
59
Prof. Pier Luca Lanzi
Learning Ensembles
• Random forests and boosting compute a set of weak models and combine them to build a stronger model through an incremental optimization process
• Random forests average the models; boosting uses a weighted average
• We can use such weak models as a dictionary to build a stronger model by performing a global optimization over the ensemble
• First, we generate the dictionary of models (using random forests or boosting)
• Next, we fit a model to the data
60
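As an illustration of the idea, and not the exact procedure on the slide, the sketch below treats the trees of a random forest as a dictionary and fits a lasso over their predictions; the dataset and the regularization strength are arbitrary:

# Post-process an ensemble: use per-tree predictions as features for a
# regularized linear model (a global optimization over the dictionary).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: generate the dictionary of weak models
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Step 2: fit a regularized linear model over the per-tree predictions
def tree_features(X):
    return np.column_stack([tree.predict(X) for tree in forest.estimators_])

post = Lasso(alpha=0.5).fit(tree_features(X_train), y_train)
print(post.score(tree_features(X_test), y_test))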
Prof. Pier Luca Lanzi
Stacked Generalization (Stacking)
61
Prof. Pier Luca Lanzi
Stacked Generalization (Stacking)
• A generalization of the previous ensemble methods that bears many similarities to the concept of learning ensembles
• Trains a learning algorithm to combine the predictions of a heterogeneous set of learning algorithms
§ First, a set of base learners is trained on the data (to generate the level-0 models)
§ Then, a meta learner is trained using the predictions of the base classifiers as additional inputs (to generate a level-1 model)
• Applies a cross-validation-like scheme
• Typically yields better performance than any single one of the trained models
62
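A scikit-learn sketch of stacking, with a heterogeneous pair of level-0 learners and a logistic regression as the level-1 meta learner; the choice of base learners and the dataset are illustrative only:

# Stacked generalization with scikit-learn's StackingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[                                # level-0 base learners
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),       # level-1 meta learner
    cv=5,                                       # cross-validation-like scheme for the level-1 inputs
)
print(cross_val_score(stack, X, y, cv=5).mean())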