13. Learning problem — colon cancer data, Alon et al. 1999. Unsupervised learning: is there structure in the data? Supervised learning: predict an outcome y. Data matrix X: m lines = patterns (data points, examples): samples, patients, documents, images, …; n columns = features (attributes, input variables): genes, proteins, words, pixels, …
15. Artificial Neurons — McCulloch and Pitts, 1943. f(x) = w · x + b = Σᵢ wᵢ xᵢ + b. Biological analogy: inputs x₁, …, xₙ (activations of other neurons) arrive through synapses and dendrites with weights w₁, …, wₙ; their weighted sum plus the bias b forms the cell potential, which is passed through an activation function and carried out along the axon.
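A minimal sketch of such a unit in Python, with a step activation as in the original McCulloch-Pitts model; the AND weights below are an illustrative choice, not from the slide:

```python
def neuron(x, w, b):
    """Weighted sum of inputs plus bias (the cell potential),
    passed through a step activation function."""
    potential = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if potential >= 0 else 0

# Illustrative weights making the unit compute a logical AND of two binary inputs.
w, b = [1.0, 1.0], -1.5
outputs = [neuron([a, c], w, b) for a in (0, 1) for c in (0, 1)]
print(outputs)  # only the (1, 1) input makes the unit fire
```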
19. Kernel Method — potential functions, Aizerman et al. 1964. f(x) = Σᵢ αᵢ k(xᵢ, x) + b: each training point xᵢ contributes a term αᵢ k(xᵢ, x) for i = 1, …, m, plus a bias b. k(·, ·) is a similarity measure or "kernel".
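The expansion f(x) = Σᵢ αᵢ k(xᵢ, x) + b can be sketched as follows; the Gaussian kernel and the toy coefficients are assumptions for illustration, not prescribed by the slide:

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian similarity k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def f(x, X, alpha, b):
    """Kernel expansion: f(x) = sum_i alpha_i * k(x_i, x) + b."""
    return sum(a * rbf_kernel(xi, x) for a, xi in zip(alpha, X)) + b

# Toy example: two training points with opposite coefficients.
X = [[0.0, 0.0], [2.0, 2.0]]
alpha = [1.0, -1.0]
print(f([0.1, 0.1], X, alpha, b=0.0))  # positive: the query is closer to the first point
```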
27. Iris Data (Fisher, 1936). Four classifiers on the three classes setosa, versicolor, virginica: linear discriminant, tree classifier, Gaussian mixture, kernel method (SVM). Figure from Norbert Jankowski and Krzysztof Grabczewski.
29. Performance evaluation — two classifiers shown in the (x₁, x₂) plane, each with its decision boundary f(x) = 0 separating the regions f(x) > 0 and f(x) < 0.
30. Performance evaluation — same plots with the threshold shifted to f(x) = −1: regions f(x) > −1 and f(x) < −1.
31. Performance evaluation — same plots with the threshold shifted to f(x) = 1: regions f(x) > 1 and f(x) < 1.
32. ROC Curve. Axes (0 to 100%): hit rate (positive class success rate, sensitivity) versus false alarm rate (1 − negative class success rate, i.e. 1 − specificity). For a given threshold on f(x), you get one point on the ROC curve. Shown: ideal ROC curve, actual ROC, random ROC (the diagonal).
33. ROC Curve, annotated with the area under the curve: the ideal ROC curve has AUC = 1, the random ROC has AUC = 0.5, and the actual ROC lies in between, 0 ≤ AUC ≤ 1.
34. Lift Curve. Customers are ranked according to f(x); the top-ranking customers are selected. Axes (0 to 100%): fraction of customers selected versus hit rate = fraction of good customers selected. Shown: ideal lift, actual lift, random lift (the diagonal). Gini = 2·AUC − 1, with 0 ≤ Gini ≤ 1.
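The identity Gini = 2·AUC − 1 can be checked numerically. The sketch below, with hypothetical scores and labels, builds both curves from one ranking, computes the area L under the lift curve (hit rate versus fraction selected) and the trapezoidal AUC, and compares Gini = (2L − 1)/(1 − pos/tot) with 2·AUC − 1.

```python
import math

def curve_area(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def auc_and_gini(scores, labels):
    """Rank by score, sweep the cut-off, and build both curves at once."""
    pos = sum(labels)
    neg = len(labels) - pos
    tot = len(labels)
    tp = fp = 0
    roc, lift = [(0.0, 0.0)], [(0.0, 0.0)]
    for k, (_, y) in enumerate(sorted(zip(scores, labels), reverse=True), 1):
        if y == 1:
            tp += 1
        else:
            fp += 1
        roc.append((fp / neg, tp / pos))   # (false alarm rate, hit rate)
        lift.append((k / tot, tp / pos))   # (fraction selected, hit rate)
    roc_area = curve_area(roc)
    L = curve_area(lift)
    gini = (2 * L - 1) / (1 - pos / tot)
    return roc_area, gini

# Hypothetical scores and labels for the check.
a, g = auc_and_gini([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
print(math.isclose(g, 2 * a - 1))  # True
```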
Editor's notes
Problem: the background image shows through in WebEx. I have reduced the image; hope it helps. Do not mention KXEN.
Include recommendation systems here.
Best performance for each type of method, normalized by the average of these performances.
Which one is best: linear or non-linear? The decision comes when we see new data. Very often the simplest model is better; this principle is implemented in learning theory.
Explain that this is a global estimator
Proof of Gini = 2·AUC − 1. Let L be the area under the lift curve, and write
Hitrate = tp/pos, Farate = fp/neg,
Selected = sel/tot = (tp + fp)/tot = (pos/tot)·Hitrate + (neg/tot)·Farate.
Then AUC = ∫ Hitrate d(Farate), and
L = ∫ Hitrate d(Selected)
  = ∫ Hitrate d((pos/tot)·Hitrate + (neg/tot)·Farate)
  = (pos/tot) ∫ Hitrate d(Hitrate) + (neg/tot) ∫ Hitrate d(Farate)
  = ½·(pos/tot) + (neg/tot)·AUC.
Hence 2L − 1 = −(1 − pos/tot) + 2·(1 − pos/tot)·AUC = (1 − pos/tot)·(2·AUC − 1),
and Gini = (L − ½) / ((1 − pos/tot)/2) = (2L − 1)/(1 − pos/tot) = 2·AUC − 1.