13. Learning problem — colon cancer data, Alon et al. 1999. Unsupervised learning: is there structure in the data? Supervised learning: predict an outcome y. Data matrix X: m lines = patterns (data points, examples): samples, patients, documents, images, …; n columns = features (attributes, input variables): genes, proteins, words, pixels, …
15. Artificial Neurons — McCulloch and Pitts, 1943. f(x) = w · x + b = Σᵢ wᵢ xᵢ + b. Biological analogy: inputs x₁, …, xₙ (activations of other neurons) arrive through synapses and dendrites with weights w₁, …, wₙ; their weighted sum plus the bias b forms the cell potential, which is passed through an activation function and carried out along the axon.
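A minimal sketch of such a unit in Python, with a step activation as in the original McCulloch-Pitts model; the AND weights below are an illustrative choice, not from the slide:

```python
def neuron(x, w, b):
    """Weighted sum of inputs plus bias (the cell potential),
    passed through a step activation function."""
    potential = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if potential >= 0 else 0

# Illustrative weights making the unit compute a logical AND of two binary inputs.
w, b = [1.0, 1.0], -1.5
outputs = [neuron([a, c], w, b) for a in (0, 1) for c in (0, 1)]
print(outputs)  # only the (1, 1) input makes the unit fire
```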
19. Kernel Method — potential functions, Aizerman et al. 1964. f(x) = Σᵢ αᵢ k(xᵢ, x) + b: each training point xᵢ contributes a term αᵢ k(xᵢ, x) for i = 1, …, m, plus a bias b. k(·, ·) is a similarity measure or "kernel".
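The expansion f(x) = Σᵢ αᵢ k(xᵢ, x) + b can be sketched as follows; the Gaussian kernel and the toy coefficients are assumptions for illustration, not prescribed by the slide:

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian similarity k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def f(x, X, alpha, b):
    """Kernel expansion: f(x) = sum_i alpha_i * k(x_i, x) + b."""
    return sum(a * rbf_kernel(xi, x) for a, xi in zip(alpha, X)) + b

# Toy example: two training points with opposite coefficients.
X = [[0.0, 0.0], [2.0, 2.0]]
alpha = [1.0, -1.0]
print(f([0.1, 0.1], X, alpha, b=0.0))  # positive: the query is closer to the first point
```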
27. Iris Data (Fisher, 1936). Four classifiers on the three classes setosa, versicolor, virginica: linear discriminant, tree classifier, Gaussian mixture, kernel method (SVM). Figure from Norbert Jankowski and Krzysztof Grabczewski.
29. Performance evaluation — two classifiers shown in the (x₁, x₂) plane, each with its decision boundary f(x) = 0 separating the regions f(x) > 0 and f(x) < 0.
30. Performance evaluation — same plots with the threshold shifted to f(x) = −1: regions f(x) > −1 and f(x) < −1.
31. Performance evaluation — same plots with the threshold shifted to f(x) = 1: regions f(x) > 1 and f(x) < 1.
32. ROC Curve. Axes (0 to 100%): hit rate (positive class success rate, sensitivity) versus false alarm rate (1 − negative class success rate, i.e. 1 − specificity). For a given threshold on f(x), you get one point on the ROC curve. Shown: ideal ROC curve, actual ROC, random ROC (the diagonal).
33. ROC Curve, annotated with the area under the curve: the ideal ROC curve has AUC = 1, the random ROC has AUC = 0.5, and the actual ROC lies in between, 0 ≤ AUC ≤ 1.
34. Lift Curve. Customers are ranked according to f(x); the top-ranking customers are selected. Axes (0 to 100%): fraction of customers selected versus hit rate = fraction of good customers selected. Shown: ideal lift, actual lift, random lift (the diagonal). Gini = 2·AUC − 1, with 0 ≤ Gini ≤ 1.
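The identity Gini = 2·AUC − 1 can be checked numerically. The sketch below, with hypothetical scores and labels, builds both curves from one ranking, computes the area L under the lift curve (hit rate versus fraction selected) and the trapezoidal AUC, and compares Gini = (2L − 1)/(1 − pos/tot) with 2·AUC − 1.

```python
import math

def curve_area(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def auc_and_gini(scores, labels):
    """Rank by score, sweep the cut-off, and build both curves at once."""
    pos = sum(labels)
    neg = len(labels) - pos
    tot = len(labels)
    tp = fp = 0
    roc, lift = [(0.0, 0.0)], [(0.0, 0.0)]
    for k, (_, y) in enumerate(sorted(zip(scores, labels), reverse=True), 1):
        if y == 1:
            tp += 1
        else:
            fp += 1
        roc.append((fp / neg, tp / pos))   # (false alarm rate, hit rate)
        lift.append((k / tot, tp / pos))   # (fraction selected, hit rate)
    roc_area = curve_area(roc)
    L = curve_area(lift)
    gini = (2 * L - 1) / (1 - pos / tot)
    return roc_area, gini

# Hypothetical scores and labels for the check.
a, g = auc_and_gini([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
print(math.isclose(g, 2 * a - 1))  # True
```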
Editor's notes
Problem: the background image shows through in WebEx. I have reduced the image; hope it helps. Do not mention KXEN.
Include recommendation systems here.
Best performance for each type of method, normalized by the average of these performances.
Which one is best: linear or non-linear? The decision comes when we see new data. Very often the simplest model is better; this principle is implemented in learning theory.
Explain that this is a global estimator
Proof of Gini = 2·AUC − 1. Let L be the area under the lift curve, and write
Hitrate = tp/pos, Farate = fp/neg,
Selected = sel/tot = (tp + fp)/tot = (pos/tot)·Hitrate + (neg/tot)·Farate.
Then AUC = ∫ Hitrate d(Farate), and
L = ∫ Hitrate d(Selected)
  = ∫ Hitrate d((pos/tot)·Hitrate + (neg/tot)·Farate)
  = (pos/tot) ∫ Hitrate d(Hitrate) + (neg/tot) ∫ Hitrate d(Farate)
  = ½·(pos/tot) + (neg/tot)·AUC.
Hence 2L − 1 = −(1 − pos/tot) + 2·(1 − pos/tot)·AUC = (1 − pos/tot)·(2·AUC − 1),
and Gini = (L − ½) / ((1 − pos/tot)/2) = (2L − 1)/(1 − pos/tot) = 2·AUC − 1.