4. Multi-class problems become imbalanced when we
compare one class against all the others.
In some cases the data set is too small to
generalize well.
Text classification is an example of imbalanced
data.
It can be used with tree kernels.
5. Effect of SMOTE and DEC (SDC)
[Figure: two panels, "After DEC alone" and "After SMOTE and DEC"]
11. Between-class imbalance (our focus).
Within-class imbalance.
It is important in text classification.
We focus on the minority class: we want high
prediction accuracy for the minority class.
Two-class problem = multi-class problem.
12. Not very good on imbalanced data.
Popular evaluation measure for
imbalanced problems.
Usually β = 1, and β = 1
in this paper.
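The formulas on this slide did not survive extraction. As a rough sketch, assuming the evaluation measure with the β parameter is the standard F-measure computed on the minority (positive) class, it could look like this. The function name `f_beta` and its argument names are illustrative, not taken from the paper.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-measure for the minority (positive) class.

    Assumption: the standard definition
        F = (1 + beta^2) * P * R / (beta^2 * P + R)
    where P is precision and R is recall.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With β = 1 the measure weights precision and recall equally, which matches the setting noted on the slide.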
14. Data level: change the distribution
◦ Make the data balanced
Algorithm level: modify the existing data mining algorithms
◦ Make new algorithms
15. Random oversampling: duplicate minority examples.
Random undersampling (can remove
important data).
Remove noise.
SMOTE.
Combine undersampling and oversampling.
Find the hard examples and oversample
them.
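The two random strategies can be sketched in a few lines. This is a minimal illustration, assuming a two-class data set held in plain Python lists with at least one minority example; the function names and the `seed` parameter are hypothetical.

```python
import random


def _split_indices(labels, minority_label):
    """Return (minority indices, majority indices)."""
    min_idx = [i for i, lab in enumerate(labels) if lab == minority_label]
    maj_idx = [i for i, lab in enumerate(labels) if lab != minority_label]
    return min_idx, maj_idx


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until the classes are balanced."""
    rng = random.Random(seed)
    min_idx, maj_idx = _split_indices(labels, minority_label)
    n_extra = max(len(maj_idx) - len(min_idx), 0)
    extra = [rng.choice(min_idx) for _ in range(n_extra)]
    keep = maj_idx + min_idx + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_undersample(rows, labels, minority_label, seed=0):
    """Drop randomly chosen majority rows until the classes are balanced.

    The discarded rows are chosen blindly, so informative examples
    can be lost.
    """
    rng = random.Random(seed)
    min_idx, maj_idx = _split_indices(labels, minority_label)
    kept_maj = rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
    keep = kept_maj + min_idx
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Oversampling adds no new information because it only repeats existing rows, whereas undersampling throws rows away; the other methods on this slide try to avoid both drawbacks.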
16. AdaBoost (increases the weights of misclassified examples)
does not perform well on imbalanced data sets.
Improve the updated weights of TP and FP; this is better
than weighting predictions based on TP and FP.
Use an SVM kernel.
Use a BMPM
(Biased Minimax Probability Machine).
There are other cost-based learning methods…
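For reference, the AdaBoost step mentioned at the top of this slide can be sketched as one round of the textbook discrete AdaBoost weight update. This assumes labels in {-1, +1} and weights that sum to 1; the function name `adaboost_reweight` is illustrative and is not the improved variant the slide alludes to.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting.

    Misclassified examples gain weight and correctly classified ones
    lose weight, by the same factor regardless of class.
    """
    # Weighted error of the current weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 for a correct prediction and -1 for a mistake.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return [w / total for w in updated], alpha
```

Note that the update does not look at which class an example belongs to, only at whether it was misclassified.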
28. No resampling: baseline.
SMOTE
Random oversampling
Borderline-SMOTE1
Borderline-SMOTE2
K = 5
10-fold cross-validation.
C4.5 classifier.
We only want to improve the prediction of the
minority class.
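To make the SMOTE entry concrete, here is a minimal sketch of its core idea: a synthetic minority point is placed at a random position on the line segment between a minority example and one of its k nearest minority neighbours. It assumes numeric feature vectors, squared Euclidean distance, and at least two minority examples. Picking the seed example at random, rather than generating a fixed number per example, is a simplification, and the function name `smote` is illustrative.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by interpolation."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        p = minority[i]
        # k nearest neighbours of p among the other minority points.
        others = [q for j, q in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda q: sq_dist(p, q))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic
```

The default `k=5` mirrors the K = 5 setting listed on the slide.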
34. Working with imbalanced data sets is a
common problem.
Borderline examples are more easily
misclassified.
Our methods are better than traditional
SMOTE.
Open to research:
◦ How to define DANGER examples.
◦ How to determine the number of examples in DANGER.
◦ How to combine with data mining algorithms.
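As a point of reference for the first open question, the sketch below applies one possible neighbour-counting rule for DANGER: a minority example is flagged when at least half, but not all, of its m nearest neighbours in the whole training set belong to the majority class. The rule as written here, the function name, the squared Euclidean distance, and the default `m=5` are assumptions made for illustration.

```python
def danger_examples(rows, labels, minority_label, m=5):
    """Return the indices of minority examples that fall in DANGER."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    danger = []
    for i, (p, lab) in enumerate(zip(rows, labels)):
        if lab != minority_label:
            continue
        # m nearest neighbours of p among all other training examples.
        others = [j for j in range(len(rows)) if j != i]
        nearest = sorted(others, key=lambda j: sq_dist(p, rows[j]))[:m]
        n_major = sum(1 for j in nearest if labels[j] != minority_label)
        # All neighbours in the majority class: treated as noise.
        # At least half but not all: borderline, so in DANGER.
        if len(nearest) / 2 <= n_major < len(nearest):
            danger.append(i)
    return danger
```

Changing the threshold or m changes how many examples land in DANGER, which is exactly the sensitivity the open questions point at.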
36. You are free:
•to copy, distribute, display, and perform the work
•to make derivative works
Under the following conditions:
•Attribution. You must give the original author credit.
•Non-Commercial. You may not use this work for commercial purposes.
•For any reuse or distribution, you must make clear to others the licence terms of this
work.
•Any of these conditions can be waived if you get permission from the copyright holder.
•Nothing in this license impairs or restricts the author's moral rights.