2. Decision Trees
Dealing with numeric attributes
- Standard method: binary splits
- Steps to decide where to split:
  - Evaluate the info gain for every possible split point of the attribute
  - Choose the "best" split point
- But this is computationally intensive
3. Decision Trees
Example: split on the temperature attribute.

  Temperature: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  Class:       Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

- temperature < 71.5: 4 yes, 2 no
- temperature > 71.5: 5 yes, 3 no
- Info([4,2], [5,3]) = (6/14) * info([4,2]) + (8/14) * info([5,3]) = 0.939 bits
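To make the arithmetic concrete, here is a minimal Python sketch that reproduces the 0.939-bit figure; the entropy and split_info helpers are illustrative names, not taken from any particular toolkit:

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution such as [4, 2]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_info(values, labels, split):
    """Weighted average entropy of the two subsets created by a binary
    split of a numeric attribute at `split`."""
    left = [l for v, l in zip(values, labels) if v < split]
    right = [l for v, l in zip(values, labels) if v >= split]
    n = len(values)

    def dist(side):
        return [side.count("yes"), side.count("no")]

    return (len(left) / n * entropy(dist(left))
            + len(right) / n * entropy(dist(right)))

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes",
          "yes", "yes", "no", "yes", "yes", "no"]
print(round(split_info(temps, labels, 71.5), 3))   # 0.939 bits
```

Evaluating this for every candidate split point and keeping the best is exactly the computationally intensive step the previous slide mentions.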
4. Decision Trees
Dealing with missing values:
- Split instances with missing values into pieces
- A piece going down a branch receives a weight proportional to the popularity of the branch
- The weights sum to 1
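A minimal sketch of the weighting scheme, assuming branch popularity is simply the count of training instances going down each branch (the branch names and counts are hypothetical):

```python
# An instance with a missing value is split into fractional pieces,
# one per branch, weighted by the branch's popularity.
branch_counts = {"sunny": 4, "overcast": 2, "rainy": 2}
total = sum(branch_counts.values())
weights = {branch: n / total for branch, n in branch_counts.items()}
print(weights)                # {'sunny': 0.5, 'overcast': 0.25, 'rainy': 0.25}
print(sum(weights.values()))  # 1.0: the weights sum to 1
```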
5. Decision Trees
Pruning
- Makes the decision tree less complex by removing parts that overfit the training data
- We have two types of pruning:
  - Prepruning: deciding during tree building
  - Postpruning: pruning after the tree has been constructed
- The two types of postpruning generally used are:
  - Subtree replacement
  - Subtree raising
- To decide whether to postprune or not, we calculate the error rate before and after the pruning
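A sketch of that error-rate comparison for subtree replacement, assuming error counts on held-out pruning data are already available (the helper and argument names are hypothetical):

```python
def should_replace_subtree(subtree_errors, leaf_errors):
    """Postpruning check: replace a subtree with a single majority-class
    leaf only if that does not increase the number of errors made on the
    pruning data."""
    return leaf_errors <= subtree_errors

# e.g. the subtree gets 3 of 20 pruning instances wrong, while a leaf
# in its place would get only 2 wrong:
print(should_replace_subtree(subtree_errors=3, leaf_errors=2))  # True -> prune
```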
8. Classification rules
Criteria for choosing tests:
- p/t ratio: maximizes the proportion of positive instances among the t instances the rule covers, putting the stress on accuracy
- Information gain, p * [log(p/t) - log(P/T)], where P and T are the positive and total counts before the new test is added: favors tests that keep many positive instances covered, at the price of lower accuracy
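A small sketch of the two criteria, assuming the standard rule-learning notation above (p/t count positives/total covered after adding the test, P/T before):

```python
import math

def pt_ratio(p, t):
    """Accuracy criterion: fraction of covered instances that are positive."""
    return p / t

def info_gain(p, t, P, T):
    """Information-gain criterion p * [log(p/t) - log(P/T)]."""
    return p * (math.log2(p / t) - math.log2(P / T))

# Starting from P=20 positives among T=40 instances: a test covering
# p=2 of t=2 is perfectly accurate but gains little information, while
# p=10 of t=12 is less accurate but covers far more positives.
print(pt_ratio(2, 2), info_gain(2, 2, P=20, T=40))      # 1.0   2.0
print(pt_ratio(10, 12), info_gain(10, 12, P=20, T=40))  # 0.83  7.37
```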
9. Classification rules
Generating good rules:
- We can reduce overfitting by pruning either during rule construction or after the rules have been fully constructed
- To prune during construction, we check each newly added test: if the error rate on the pruning set increases because of this new test, we remove it
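A sketch of the during-construction check, assuming a caller-supplied error_on_pruning_set callback (a hypothetical name) that evaluates a list of tests on the pruning set:

```python
def grow_rule(candidate_tests, error_on_pruning_set):
    """Pruning during construction: add candidate tests one at a time and
    keep a new test only if it does not increase the error rate on the
    pruning set."""
    rule = []
    for test in candidate_tests:
        before = error_on_pruning_set(rule)
        after = error_on_pruning_set(rule + [test])
        if after <= before:     # the new test does not hurt: keep it
            rule.append(test)
    return rule
```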
12. Classification rules
- As node 4 was not replaced, we stop at this stage
- Each leaf node now gives us a possible rule
- Choose the leaf that covers the greatest number of instances
13. Extending linear models
Support vector machines:
- Support vector machines are algorithms for learning linear classifiers
- They use the maximum-margin hyperplane, which reduces overfitting
- The instances closest to the maximum-margin hyperplane are the support vectors; all other instances can be ignored
15. Extending linear models
Support vector machines:
- The hyperplane can be written as x = b + Σ_i α_i y_i (a(i) · a), where the sum runs over the support vectors, a(i) are the training instances, y_i their class values, and a is the instance being classified
- Support vectors: all instances for which α_i > 0
- b and the α_i are determined using software packages
- The hyperplane can also be written using a kernel function K as x = b + Σ_i α_i y_i K(a(i), a)
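As an illustration, a minimal sketch using scikit-learn's SVC (the slides name no particular package, so this library choice is an assumption); with a kernel in place of the dot product, the classifier can separate data that no hyperplane in the original input space can:

```python
from sklearn import svm

# A 1-D problem that is not linearly separable: the positive class sits
# in the middle of the interval.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0, 1, 1, 1, 0]

clf = svm.SVC(kernel="rbf")     # the kernel replaces the dot product a(i) . a
clf.fit(X, y)
print(clf.support_vectors_)     # the instances with alpha_i > 0
print(clf.predict([[2.0]]))
```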
16. Extending linear models
Multilayer perceptron:
- We can create a network of perceptrons to approximate arbitrary target concepts
- The multilayer perceptron is an example of an artificial neural network
- Consists of an input layer, one or more hidden layers, and an output layer
- The structure of an MLP is usually found by experimentation
- The parameters (weights) can be found using backpropagation
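A minimal sketch of the layer structure using scikit-learn's MLPClassifier (an assumed library choice); the hidden_layer_sizes setting is exactly the part usually found by experimentation:

```python
from sklearn.neural_network import MLPClassifier

# XOR: a target concept no single perceptron can represent, so at least
# one hidden layer is needed.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Input layer (2 units), one hidden layer (5 units), output layer.
mlp = MLPClassifier(hidden_layer_sizes=(5,), solver="lbfgs",
                    max_iter=1000, random_state=0)
mlp.fit(X, y)
print(mlp.predict(X))
```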
18. Extending linear models
Backpropagation:
- Sigmoid activation: f(x) = 1/(1 + exp(-x)), where x is the weighted sum of the inputs
- Error on an instance: E = 1/2 (y - f(x))^2
- Minimizing the error by gradient descent, and using f'(x) = f(x)(1 - f(x)), gives dE/dw_i = -(y - f(x)) f(x)(1 - f(x)) x_i
- Calculate this expression for all training instances and update w_i = w_i - L * (dE/dw_i), where L is the learning rate
- The weights w are started from assumed (e.g. small random) initial values
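A minimal sketch of this update rule for a single sigmoid unit, the simplest case; full backpropagation propagates the same gradients back through the hidden layers:

```python
import math

def train_sigmoid_unit(data, L=0.5, epochs=2000):
    """Gradient descent on E = 1/2 (y - f(x))^2 for a single sigmoid
    unit. `data` is a list of (inputs, target) pairs; the weights start
    from assumed small initial values."""
    n = len(data[0][0])
    w = [0.1] * n
    for _ in range(epochs):
        for x, y in data:
            fx = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i in range(n):
                # dE/dw_i = -(y - f(x)) * f(x) * (1 - f(x)) * x_i
                dE = -(y - fx) * fx * (1 - fx) * x[i]
                w[i] -= L * dE          # w_i = w_i - L * dE/dw_i
    return w

# Learn AND; the first input of each instance is a constant bias term.
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
print(train_sigmoid_unit(data))
```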
19. Clustering
Incremental clustering steps:
- The tree starts as an empty root node
- Add instances one by one, updating the tree appropriately at each stage
- To update, find the right leaf for the instance; this may involve restructuring the tree
- Restructuring operations: merging and splitting
- Decisions are made using category utility
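The slides do not give the category utility formula, so the following sketch assumes the standard definition for nominal attributes, CU = (1/k) Σ_l P(C_l) Σ_{i,j} [P(a_i = v_ij | C_l)^2 - P(a_i = v_ij)^2]:

```python
def category_utility(clusters):
    """Category utility of a partition. `clusters` is a list of clusters,
    each a list of instances, where an instance is a tuple of nominal
    attribute values."""
    k = len(clusters)
    all_instances = [inst for c in clusters for inst in c]
    n = len(all_instances)
    n_attrs = len(all_instances[0])

    def squared_prob_sum(instances):
        # sum over attributes i and values v of P(a_i = v)^2
        total = 0.0
        for i in range(n_attrs):
            values = [inst[i] for inst in instances]
            for v in set(values):
                total += (values.count(v) / len(values)) ** 2
        return total

    base = squared_prob_sum(all_instances)
    return sum(len(c) / n * (squared_prob_sum(c) - base) for c in clusters) / k

# Pure clusters score higher than mixed ones:
print(category_utility([[("red", "round")] * 2, [("blue", "square")] * 2]))   # 0.5
print(category_utility([[("red", "round"), ("blue", "square")],
                        [("red", "square"), ("blue", "round")]]))             # 0.0
```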
21. EM Algorithm
- EM = Expectation-Maximization
- Generalizes k-means to a probabilistic setting
- Iterative procedure:
  - E ("expectation") step: calculate the cluster membership probability for each instance
  - M ("maximization") step: estimate the distribution parameters from those cluster probabilities
- Cluster probabilities are stored as instance weights
- Stop when the improvement is negligible
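A minimal EM sketch for a mixture of two one-dimensional Gaussians with fixed equal priors (a simplifying assumption; a fixed iteration count stands in for the "improvement is negligible" test):

```python
import math

def em_two_gaussians(xs, iters=50):
    """EM for a mixture of two 1-D Gaussians with equal priors."""
    mu = [min(xs), max(xs)]            # crude initial means
    sd = [1.0, 1.0]
    for _ in range(iters):
        # E step: cluster membership probability for each instance,
        # stored as per-instance weights
        w = []
        for x in xs:
            p = [math.exp(-(x - mu[k]) ** 2 / (2 * sd[k] ** 2)) / sd[k]
                 for k in range(2)]
            s = sum(p)
            w.append([pk / s for pk in p])
        # M step: re-estimate means and standard deviations from the weights
        for k in range(2):
            wk = sum(wi[k] for wi in w)
            mu[k] = sum(wi[k] * x for wi, x in zip(w, xs)) / wk
            var = sum(wi[k] * (x - mu[k]) ** 2 for wi, x in zip(w, xs)) / wk
            sd[k] = max(math.sqrt(var), 1e-6)   # guard against collapse
    return mu, sd

xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(em_two_gaussians(xs))            # means near 1 and 5
```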