1. All images are from Wikimedia Commons, a freely licensed media repository.
2. “Classifiers”: R&D project by Aditya M Joshi [email_address], IIT Bombay. Under the guidance of Prof. Pushpak Bhattacharyya [email_address], IIT Bombay.
5. What is classification? A machine learning task that deals with identifying the class to which an instance belongs. A classifier performs classification: it takes a test instance described by attributes (a1, a2, …, an) and outputs a discrete-valued class label. Examples: (Age, Marital status, Health status, Salary) → Issue loan? {Yes, No}; (Perceptive inputs) → Steer? {Left, Straight, Right}; (Textual features: n-grams) → Category of document? {Politics, Movies, Biology}.
6. Classification learning. Training phase: learning the classifier from the available data, the ‘training set’ (labeled). Testing phase: testing how well the classifier performs on the ‘testing set’.
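The two phases above can be sketched in a few lines of Python. This is only an illustration: the 1-nearest-neighbour learner, the toy (age, salary) values, and the names `train_1nn` and `accuracy` are assumptions made for the example and do not come from the slides.

```python
# Toy labeled data: ((age, salary), class label). Values are made up.
labeled = [
    ((25, 20), "No"), ((30, 25), "No"), ((45, 80), "Yes"),
    ((50, 90), "Yes"), ((35, 30), "No"), ((40, 85), "Yes"),
]

# First four instances form the training set, the last two the testing set.
train_set, test_set = labeled[:4], labeled[4:]

def train_1nn(train):
    """Training phase: a 1-nearest-neighbour classifier simply stores the labeled data."""
    def classify(x):
        def sq_dist(item):
            return sum((a - b) ** 2 for a, b in zip(item[0], x))
        # Predict the label of the closest stored training instance.
        return min(train, key=sq_dist)[1]
    return classify

def accuracy(classifier, test):
    """Testing phase: fraction of test instances whose predicted label is correct."""
    correct = sum(1 for x, y in test if classifier(x) == y)
    return correct / len(test)

classifier = train_1nn(train_set)
test_accuracy = accuracy(classifier, test_set)
print(test_accuracy)
```

The testing set is kept apart from the training set so that the reported accuracy reflects instances the classifier has not seen.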
10. Example tree (diagram from Han-Kamber). Intermediate nodes: attributes. Leaf nodes: class predictions. Edges: attribute value tests. Example algorithms: ID3, C4.5, SPRINT, CART.
11. Decision Tree schematic. [Figure: a training data set with attributes a1 to a6 is split into branches X, Y, Z.] Pure node: becomes a leaf node (class RED). Impure node: select the best attribute and continue.
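The slide does not say how the "best attribute" is chosen. A minimal sketch, assuming the information-gain criterion used by ID3 (one of the algorithms named earlier), could look like this; the function names and the toy rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def is_pure(labels):
    """A node is pure when all of its instances share one class."""
    return len(set(labels)) <= 1

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest information gain (assumed criterion)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Toy data: a1 separates the classes perfectly, a2 carries no information.
rows = [
    {"a1": "X", "a2": "p"},
    {"a1": "X", "a2": "q"},
    {"a1": "Y", "a2": "p"},
    {"a1": "Y", "a2": "q"},
]
labels = ["RED", "RED", "BLUE", "BLUE"]
chosen = best_attribute(rows, labels, ["a1", "a2"])
print(chosen)
```

A full tree learner would call `best_attribute` recursively on each impure partition and stop at pure nodes; other algorithms such as CART use different criteria.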
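20. Decision List learning. Start with the list R and S' = S, given a set of candidate feature functions. For each h_i, let Q_i = P_i ∪ N_i be the instances with h_i = 1, and compute the utility $U_i = \max\{\, |P_i| - p_n |N_i|,\ |N_i| - p_p |P_i| \,\}$. Select h_k, the feature with the highest utility, and add the rule (h_k, v), where v = 1 if $|P_k| - p_n |N_k| > |N_k| - p_p |P_k|$, else 0. Then remove Q_k from the remaining instances (S' = S' - Q_k).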
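The utility computation and rule selection can be sketched as below. Several points are assumptions, since the slide does not define them: P_i and N_i are read as the positive and negative instances covered by h_i, p_n and p_p as penalty factors, the loop stops when no feature covers any remaining instance, and all names are hypothetical.

```python
def utility(P, N, pn, pp):
    """U = max(|P| - pn*|N|, |N| - pp*|P|), as given on the slide."""
    return max(len(P) - pn * len(N), len(N) - pp * len(P))

def select_rule(S, features, pn, pp):
    """Return (utility, feature name, output value, covered instances) for the best feature."""
    best = None
    for name, h in features.items():
        P = [(x, y) for x, y in S if h(x) == 1 and y == 1]  # covered positives (assumed)
        N = [(x, y) for x, y in S if h(x) == 1 and y == 0]  # covered negatives (assumed)
        u = utility(P, N, pn, pp)
        if best is None or u > best[0]:
            output = 1 if len(P) - pn * len(N) > len(N) - pp * len(P) else 0
            best = (u, name, output, P + N)
    return best

def learn_decision_list(S, features, pn=1.0, pp=1.0):
    """Greedily build a list of (feature name, output) rules."""
    remaining = list(S)
    rules = []
    while remaining:
        _, name, output, covered = select_rule(remaining, features, pn, pp)
        if not covered:  # assumed stopping condition: nothing left to cover
            break
        rules.append((name, output))
        remaining = [item for item in remaining if item not in covered]
    return rules

# Toy data: instances are (x, label) with binary attributes and labels.
S = [((1, 0), 1), ((1, 0), 1), ((1, 1), 1), ((0, 1), 0), ((0, 1), 0), ((0, 0), 0)]
features = {"h1": lambda x: x[0], "h2": lambda x: x[1]}
rules = learn_decision_list(S, features)
print(rules)
```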
41. SVM Issues. SVMs are immune to the removal of non-support-vector points. What if n classes are to be predicted? Problem: SVMs deal with two-class classification. Solution: have multiple SVMs, one for each class.
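One common way to combine "one SVM per class" is to let each binary model score the instance and pick the class whose model scores highest. The sketch below assumes linear decision functions of the form w·x + b; the weights are hand-picked placeholders rather than trained SVMs, and the names are hypothetical.

```python
def linear_score(weights, bias, x):
    """Decision value w.x + b of one binary 'this class vs. the rest' model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def one_vs_rest_predict(models, x):
    """models maps class label -> (weights, bias); return the highest-scoring class."""
    return max(models, key=lambda label: linear_score(*models[label], x))

# Hypothetical, hand-picked models for the steering example (not trained).
models = {
    "Left": ([-1.0, 0.0], 0.0),
    "Straight": ([0.0, 0.0], 0.5),
    "Right": ([1.0, 0.0], 0.0),
}

print(one_vs_rest_predict(models, (2.0, 0.0)))
print(one_vs_rest_predict(models, (-2.0, 0.0)))
print(one_vs_rest_predict(models, (0.1, 0.0)))
```

With n classes this scheme needs n binary models, each trained with its own class as positive and all other classes as negative.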
44. Bagging. [Figure: from the training dataset D (the total set), a sample D1 is drawn at random (may use bootstrap sampling with replacement) and the classifier learning scheme produces classifier model M1, and so on up to model Mn; on the test set, a majority vote over the models gives the class label.]
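A minimal sketch of the bagging scheme described on this slide is shown below. The 1-nearest-neighbour base learner, the fixed seed, and the toy data are assumptions chosen to keep the example self-contained; any classifier learning scheme could take its place.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) instances at random, with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Return the most frequent predicted label."""
    return Counter(predictions).most_common(1)[0][0]

def learn_1nn(sample):
    """Stand-in base learner: 1-nearest-neighbour over the given sample."""
    def classify(x):
        return min(sample, key=lambda item: abs(item[0] - x))[1]
    return classify

def bagging_fit(data, learn, n_models, seed=0):
    """Train one model per bootstrap sample D1 ... Dn."""
    rng = random.Random(seed)
    return [learn(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Combine the models M1 ... Mn by majority vote."""
    return majority_vote([m(x) for m in models])

# Toy one-dimensional data: (attribute value, class label).
data = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (8.0, "B"), (8.5, "B"), (9.0, "B")]
models = bagging_fit(data, learn_1nn, n_models=5)
prediction = bagging_predict(models, 1.2)
print(prediction)
```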
45. Boosting (AdaBoost). [Figure: from the training dataset D (the total set), a sample D1 is selected based on instance weights (may use bootstrap sampling with replacement) and the classifier learning scheme produces classifier model M1, and so on up to model Mn; on the test set, a weighted vote over the models gives the class label.] Initialize the weights of instances to 1/d. Weights of correctly classified instances are multiplied by error / (1 - error). If error > 0.5?
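The weight update on this slide can be sketched for a single round as follows. The sketch assumes that d is the number of training instances, that the error is the sum of the weights of misclassified instances, and that weights are renormalized to sum to 1 after the update; the slide states none of these explicitly, and the function names are hypothetical.

```python
def initial_weights(d):
    """Every instance starts with weight 1/d."""
    return [1.0 / d] * d

def weighted_error(weights, correct):
    """Sum of the weights of misclassified instances (assumed error definition)."""
    return sum(w for w, ok in zip(weights, correct) if not ok)

def update_weights(weights, correct):
    """Multiply weights of correctly classified instances by error / (1 - error)."""
    error = weighted_error(weights, correct)
    if not 0.0 < error < 1.0:
        raise ValueError("this sketch only handles 0 < error < 1")
    factor = error / (1.0 - error)
    updated = [w * factor if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    # Renormalization is an assumption; it keeps the weights summing to 1.
    return [w / total for w in updated]

weights = initial_weights(4)
# Suppose the current model classifies the first three instances correctly.
correct = [True, True, True, False]
new_weights = update_weights(weights, correct)
print(new_weights)
```

Because the factor is below 1 whenever the error is below 0.5, correctly classified instances lose weight and the next round concentrates on the instances that were misclassified.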
54. Parts of Weka. Explorer: basic interface to run ML algorithms. Experimenter: comparing experiments on different algorithms. Knowledge Flow: similar to a workflow, ‘customized’ to one’s needs.