This presentation provides an overview of boosting approaches for classification problems. It discusses combining classifiers through bagging and boosting to create stronger classifiers. The AdaBoost algorithm is explained in detail, including its training and classification phases. An example illustrates how AdaBoost works over multiple rounds, increasing the weights of misclassified examples to improve classification accuracy. In conclusion, AdaBoost is highlighted as an effective approach for producing highly accurate strong classifiers, particularly for classification problems where misclassification has severe consequences.
3. Supervised learning is the machine learning task of
inferring a function from labeled training data.
The training data consist of a set of training examples.
In supervised learning, each example is a pair
consisting of an input object and a desired output
value, called the supervisory signal.
What is the optimal scenario?
Target: the learned function generalizes from the
training data to unseen situations in a reasonable way.
Introduction
4. Classification is a type of supervised learning.
Classification relies on a priori reference structures that
divide the space of all possible data points into a set of
classes that are usually, but not necessarily, non-overlapping.
A very familiar example is the email spam-catching
system.
Classification
5. The main issue in classification is misclassification,
which leads to the under-fitting and over-fitting problems.
In spam filtering, for example, misclassification may cause
spam to be classified as not spam, which is sometimes unacceptable.
So the major issue here is to improve the accuracy of
the classification.
Contd……
6. Combining classifiers makes use of several weak
classifiers, and combining such classifiers gives a strong
classifier.
Combining Classifiers
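To see why combining helps, consider a hypothetical example (not from the slides): three independent classifiers, each correct 70% of the time, combined by majority vote. A minimal sketch of the arithmetic, assuming independence:

from math import comb

# Hypothetical setup: 3 independent classifiers, each with accuracy p = 0.7.
p, n = 0.7, 3

# The majority vote is correct when at least 2 of the 3 classifiers are correct:
# P(k correct) = C(n, k) * p^k * (1 - p)^(n - k)
vote_acc = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2, n + 1))
print(f"single: {p:.3f}, majority vote: {vote_acc:.3f}")  # single: 0.700, majority vote: 0.784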
8. Bagging (Bootstrap aggregating) operates using
bootstrap sampling.
Given a training data set D containing m examples,
bootstrap sampling draws a sample of training
examples, Di, by selecting m examples uniformly at
random with replacement from D. The replacement
means that examples may be repeated in Di.
Bagging
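A minimal sketch of bootstrap sampling in Python (the names D and Di follow the slide; the toy data set and the use of random.choices are illustrative assumptions):

import random

def bootstrap_sample(D):
    # Draw m = |D| examples uniformly at random, with replacement, from D.
    return random.choices(D, k=len(D))

# Hypothetical toy training set of m = 5 labeled examples.
D = [("x1", 0), ("x2", 1), ("x3", 0), ("x4", 1), ("x5", 1)]
Di = bootstrap_sample(D)
print(Di)  # some examples of D may repeat in Di, others may be absent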
10. Training Phase
Initialize the parameters:
D = ∅
h = the number of classifiers
For k = 1 to h:
Take a bootstrap sample Sk from training set S
Build the classifier Dk using Sk as training set
D = D ∪ Dk
Return D
Classification Phase
Run D1, D2, …, Dh on the input x
The class with the maximum number of votes is chosen as the label
for x.
Bagging Algorithm
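A runnable sketch of both phases, assuming scikit-learn decision trees as the base classifiers (the base learner, and scikit-learn itself, are illustrative choices not specified by the slides):

import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier  # assumed base classifier

def bagging_train(X, y, h):
    # Training phase: build h classifiers, each on its own bootstrap sample of S.
    D = []
    m = len(X)
    for k in range(h):
        idx = [random.randrange(m) for _ in range(m)]   # bootstrap sample Sk
        Dk = DecisionTreeClassifier().fit([X[i] for i in idx], [y[i] for i in idx])
        D.append(Dk)                                    # D = D ∪ Dk
    return D

def bagging_classify(D, x):
    # Classification phase: run D1..Dh on x and take a majority vote.
    votes = [Dk.predict([x])[0] for Dk in D]
    return Counter(votes).most_common(1)[0][0]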
11. Boosting has been a very successful technique for solving the
two-class classification problem.
It was first introduced by Freund & Schapire (1997), with their
AdaBoost algorithm.
Rather than just combining isolated classifiers, boosting uses
the mechanism of increasing the weights of data misclassified by
the preceding classifiers.
A weak learner is defined to be a classifier which is only slightly
correlated with the true classification.
In contrast, a strong learner is a classifier that is arbitrarily well-
correlated with the true classification.
Boosting
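A common weak learner for 2-D examples like the one later in this deck is a decision stump: a one-level threshold test on a single feature. The slides do not name their weak learner, so this stump is an illustrative assumption:

import numpy as np

def fit_stump(X, t, w):
    # Fit a decision stump minimizing the weighted error
    # sum_n w[n] * I(pred(x_n) != t[n]); X is (N, d), t in {-1, +1}, w are weights.
    best = None
    for j in range(X.shape[1]):                    # each feature
        for thr in np.unique(X[:, j]):             # each candidate threshold
            for s in (+1, -1):                     # each orientation
                pred = np.where(X[:, j] <= thr, s, -s)
                err = np.sum(w * (pred != t))
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best[1:]                                # (feature, threshold, sign)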
13. 1. Initialize the data weighting coefficients {Wn} by setting
Wn^(1) = 1/N, for n = 1, 2, …, N.
2. For m = 1 to M:
a. Fit a classifier ym(x) to the training data by minimizing the
weighted error function Jm = Σn Wn^(m) I(ym(xn) ≠ tn).
b. Evaluate the quantity
εm = Σn Wn^(m) I(ym(xn) ≠ tn) / Σn Wn^(m)
The term I(ym(xn) ≠ tn) is the indicator function, with values 0/1:
0 if xn is properly classified, 1 if not.
AdaBoost Algorithm
14. And use these to evaluate
αm = ½ ln((1 − εm) / εm)
c. Update the data weighting coefficients:
Wn^(m+1) = Wn^(m) exp(αm I(ym(xn) ≠ tn))
3. Make predictions using the final model, which is given by
YM(x) = sign(Σm αm ym(x))
Contd….
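Putting the three steps together, a compact training-loop sketch (it reuses the fit_stump weak learner sketched above; all helper names are illustrative, not from the slides):

import numpy as np

def stump_predict(stump, X):
    # Predict labels in {-1, +1} with a (feature, threshold, sign) stump.
    j, thr, s = stump
    return np.where(X[:, j] <= thr, s, -s)

def adaboost_train(X, t, M):
    # X is (N, d); t holds labels in {-1, +1}; M is the number of rounds.
    N = len(t)
    w = np.full(N, 1.0 / N)                    # step 1: Wn^(1) = 1/N
    ensemble = []
    for m in range(M):
        stump = fit_stump(X, t, w)             # step 2a: minimize weighted error
        miss = stump_predict(stump, X) != t
        eps = np.sum(w * miss) / np.sum(w)     # step 2b: epsilon_m
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_m = 1/2 ln((1 - eps)/eps)
        w = w * np.exp(alpha * miss)           # step 2c: raise misclassified weights
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # Step 3: Y_M(x) = sign(sum_m alpha_m * y_m(x)).
    return np.sign(sum(a * stump_predict(s, X) for a, s in ensemble))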
15. Let us take the following training set, having 10 points represented
by plus or minus.
The assumption is that the original status assigns equal weight to all
points, i.e. W1^(1) = W2^(1) = … = W10^(1) = 1/10.
Figure 1. Training set consisting of 10 samples
Example AdaBoost
16. Round 1: Three “plus” points are not correctly classified. They
are given higher weights.
Figure 2. First hypothesis h1 misclassifies 3 plus points.
Contd…..
17. The error term and learning rate for the first hypothesis are:
ε1 = (0.1 + 0.1 + 0.1) / 1 = 0.30
α1 = ½ ln((1 − 0.30) / 0.30) = 0.42
Now we calculate the weight of each data point for the second hypothesis,
Wn^(2) = Wn^(1) exp(α1 I(y1(xn) ≠ tn)).
The 1st, 2nd, 6th, 7th, 8th, 9th and 10th data points are classified
properly, so their weights remain the same,
i.e. W1^(2) = W2^(2) = W6^(2) = W7^(2) = W8^(2) = W9^(2) = W10^(2) = 0.1,
but the 3rd, 4th and 5th data points are misclassified, so higher weights are
provided to them:
W3^(2) = W4^(2) = W5^(2) = 0.1 · e^0.42 = 0.15
Contd..
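A quick numeric check of round 1 (pure arithmetic mirroring the slide's values):

import math

w = [0.1] * 10                    # W_n^(1) = 1/10 for all 10 points
miss = [2, 3, 4]                  # 0-based indices of the 3rd, 4th, 5th points

eps1 = sum(w[i] for i in miss) / sum(w)       # (0.1 + 0.1 + 0.1) / 1 = 0.30
alpha1 = 0.5 * math.log((1 - eps1) / eps1)    # 0.42
w = [wi * math.exp(alpha1) if i in miss else wi for i, wi in enumerate(w)]
print(round(eps1, 2), round(alpha1, 2), round(w[2], 2))  # 0.3 0.42 0.15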
18. Round 2: Three “minus” points are not correctly classified. They
are given higher weights.
Figure 3. Second hypothesis h2 misclassifies 3 minus points.
Contd……
19. ε2 = (0.1 + 0.1 + 0.1) / 1.15 = 0.26
α2 = ½ ln((1 − 0.26) / 0.26) = 0.52
Now calculating the values Wn^(3):
The second hypothesis has misclassified the 6th, 7th and 8th points, so
they are provided with higher weights:
W6^(3) = W7^(3) = W8^(3) = 0.1 · e^0.52 = 0.16
whereas the data points 1, 2, 3, 4, 5, 9 and 10 are properly classified,
so their weights remain the same:
W1^(3) = W2^(3) = W9^(3) = W10^(3) = 0.1
W3^(3) = W4^(3) = W5^(3) = 0.15
Cont….
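The same check for round 2, continuing from the round-1 weights (again just arithmetic; note the slide truncates 0.168 to 0.16):

import math

# Weights after round 1: the 3rd-5th points (0-based 2-4) were raised to ~0.15.
w = [0.1, 0.1, 0.15, 0.15, 0.15, 0.1, 0.1, 0.1, 0.1, 0.1]
miss = [5, 6, 7]                              # 6th, 7th, 8th points

eps2 = sum(w[i] for i in miss) / sum(w)       # 0.3 / 1.15 = 0.26
alpha2 = 0.5 * math.log((1 - eps2) / eps2)    # 0.52
print(round(eps2, 2), round(alpha2, 2), round(0.1 * math.exp(alpha2), 2))
# -> 0.26 0.52 0.17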
20. Round 3:
Figure 4. Third hypothesis h3 misclassifies 2 plus points and 1 minus point.
Contd…
23. The AdaBoost algorithm provides a strong classification
mechanism by combining various weak classifiers into a
strong classifier, which increases accuracy and efficiency.
The final learner will have minimum error and maximum learning
rate, resulting in a high degree of accuracy.
Hence, the AdaBoost algorithm can be used quite successfully in
applications where misclassification leads to dire consequences.
Conclusions