Slide 1: Support Vector Machines: Maximum Margin Classifiers
Machine Learning and Pattern Recognition: September 16, 2008
Piotr Mirowski
Based on slides by Sumit Chopra and Fu-Jie Huang
Slide 2: Outline
What is behind Support Vector Machines?
Constrained Optimization
Lagrange Duality
Support Vector Machines in Detail
Kernel Trick
LibSVM demo
Slide 3: Binary Classification Problem
Given: training data generated according to the distribution D:
$$(x_1, y_1), \ldots, (x_p, y_p) \in \mathbb{R}^n \times \{-1, 1\} \quad \text{(input space} \times \text{label space)}$$
Problem: find a classifier (a function)
$$h(x): \mathbb{R}^n \to \{-1, 1\}$$
such that it generalizes well on a test set obtained from the same distribution D.
Solution:
Linear approach: linear classifiers (e.g. Perceptron)
Non-linear approach: non-linear classifiers (e.g. Neural Networks, SVM)
Slide 4: Linearly Separable Data
Assume that the training data is linearly separable.
[Figure: two linearly separable classes of points in the $(x_1, x_2)$ plane]
Slide 5: Linearly Separable Data
Assume that the training data is linearly separable.
Then the classifier is: $h(x) = w \cdot x + b$, where $w \in \mathbb{R}^n$, $b \in \mathbb{R}$.
Inference: $\mathrm{sign}(h(x)) \in \{-1, 1\}$.
[Figure: separating hyperplane $w \cdot x + b = 0$ in the $(x_1, x_2)$ plane, with normal vector $w$; the offset of the hyperplane from the origin along $w$ is determined by $b$]
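To make the inference rule concrete, here is a minimal NumPy sketch (not from the slides; `w` and `b` are assumed to be already trained):

```python
import numpy as np

def svm_infer(X, w, b):
    """Linear SVM inference: sign(w . x + b) for each row x of X."""
    # h(x) = w . x + b; the predicted label is its sign in {-1, +1}
    h = X @ w + b
    return np.where(h >= 0, 1, -1)

# Toy usage with an assumed (not learned) w and b:
X = np.array([[2.0, 1.0], [-1.0, -2.0]])
w = np.array([1.0, 1.0])
b = -0.5
print(svm_infer(X, w, b))  # [ 1 -1]
```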
Slide 6: Linearly Separable Data
Assume that the training data is linearly separable.
Maximize the margin $\rho$ (or $2\rho$) so that, for the closest points:
$$h(x) = w \cdot x + b \in \{-1, 1\}$$
The margin is then
$$\rho = \frac{1}{\|w\|}, \qquad 2\rho = \frac{2}{\|w\|}$$
[Figure: decision boundary $w \cdot x + b = 0$ with the two supporting hyperplanes $w \cdot x + b = 1$ and $w \cdot x + b = -1$ at distance $\rho$ on either side, in the $(x_1, x_2)$ plane]
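Filling in the step behind $\rho = 1/\|w\|$: the distance from a point $x_0$ to the hyperplane $w \cdot x + b = 0$ is $|w \cdot x_0 + b| / \|w\|$. Fixing the scale of $(w, b)$ so that the closest points satisfy $|w \cdot x_0 + b| = 1$ gives $\rho = 1/\|w\|$, and hence a full separation of $2\rho = 2/\|w\|$ between the two supporting hyperplanes.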
Slide 7: Optimization Problem
A constrained optimization problem:
$$\min_w \; \frac{1}{2}\|w\|^2 \quad \text{s.t.:} \quad y_i (w \cdot x_i + b) \geq 1, \; i = 1, \ldots, m$$
($x_i$ is the input, $y_i$ the label.)
Since $\rho = 1/\|w\|$, this is equivalent to maximizing the margin.
A convex optimization problem:
Objective is convex
Constraints are affine, hence convex
Slide 8: Optimization Problem
Compare the constrained problem (objective plus constraints)
$$\min_w \; \frac{1}{2}\|w\|^2 \quad \text{s.t.:} \quad y_i (w \cdot x_i + b) \geq 1, \; i = 1, \ldots, m$$
with the unconstrained, regularized formulation (energy/errors term plus regularization term):
$$\min_w \; \sum_{i=1}^p \left( -y_i (w \cdot x_i + b) \right)_+ + \frac{\lambda}{2}\|w\|^2$$
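A sketch of this unconstrained objective in NumPy (the positive-part penalty $(\cdot)_+$ is a plausible reading of the slide's loss term; `lam` plays the role of $\lambda$):

```python
import numpy as np

def regularized_objective(w, b, X, y, lam):
    """Sum of positive-part penalties (-y_i (w.x_i + b))_+ plus (lam/2)||w||^2."""
    margins = -y * (X @ w + b)               # positive when sample i is misclassified
    errors = np.maximum(0.0, margins).sum()  # energy/errors term
    return errors + 0.5 * lam * np.dot(w, w) # plus L2 regularization

# Usage on a toy dataset:
X = np.array([[1.0, 2.0], [-2.0, -1.0], [0.5, -0.5]])
y = np.array([1, -1, 1])
print(regularized_objective(np.array([1.0, 1.0]), 0.0, X, y, lam=0.1))
```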
Slide 9: Optimization: Some Theory
The problem:
$$\min_x \; f_0(x) \quad \text{(objective function)}$$
$$\text{s.t.:} \quad f_i(x) \leq 0, \; i = 1, \ldots, m \quad \text{(inequality constraints)}$$
$$\qquad\;\; h_i(x) = 0, \; i = 1, \ldots, p \quad \text{(equality constraints)}$$
Solution $x^o$ of the problem:
Global (unique) optimum if the problem is convex
Local optimum if the problem is not convex
Slide 10: Optimization: Some Theory
Example: standard Linear Program (LP)
$$\min_x \; c^T x \quad \text{s.t.:} \quad Ax = b, \; x \geq 0$$
Example: least-squares solution of linear equations, with L2-norm regularization of the solution x (i.e. Ridge Regression):
$$\min_x \; x^T x \quad \text{s.t.:} \quad Ax = b$$
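This equality-constrained problem has a closed form via its Lagrangian: stationarity of $x^T x + \nu^T (Ax - b)$ gives $x^o = A^T (A A^T)^{-1} b$, the minimum-norm solution. A small NumPy check (toy `A` and `b` of my choosing):

```python
import numpy as np

# Underdetermined system: 2 equations, 3 unknowns (toy example)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

# Lagrangian stationarity: 2x + A^T nu = 0 and Ax = b
# => x = A^T (A A^T)^{-1} b (minimum-norm solution)
x = A.T @ np.linalg.solve(A @ A.T, b)
print(np.allclose(A @ x, b))  # True: x is feasible
```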
Slide 11: Big Picture
Constrained / unconstrained optimization.
Hierarchy of objective functions $f_0$:
smooth = infinitely differentiable
convex = has a global optimum
$f_0$ splits into convex (smooth or non-smooth) and non-convex (smooth or non-smooth); the SVM objective falls on the convex side, Neural Network (NN) objectives on the non-convex side.
Slide 12: Toy Example: Equality Constraint
Example 1:
$$\min \; x_1 + x_2 \equiv f \quad \text{s.t.:} \quad x_1^2 + x_2^2 - 2 = 0 \equiv h_1$$
with gradients
$$\nabla f = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{pmatrix}, \qquad \nabla h_1 = \begin{pmatrix} \partial h_1 / \partial x_1 \\ \partial h_1 / \partial x_2 \end{pmatrix}$$
At the optimal solution $(-1, -1)$: $\nabla f(x^o) = \lambda_1^o \nabla h_1(x^o)$.
[Figure: the circle $x_1^2 + x_2^2 = 2$ in the $(x_1, x_2)$ plane, with $\nabla f$ and $\nabla h_1$ drawn at several points; at $(-1, -1)$ the two gradients are parallel]
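Working the example out (a step the slide leaves to the figure): $\nabla f = (1, 1)^T$ and $\nabla h_1 = (2 x_1, 2 x_2)^T$, so $\nabla f = \lambda_1 \nabla h_1$ forces $x_1 = x_2 = 1/(2\lambda_1)$. The constraint $x_1^2 + x_2^2 = 2$ then gives $x = \pm(1, 1)$; the minimum of $x_1 + x_2$ is at $x^o = (-1, -1)$, with $\lambda_1^o = -1/2$.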
Slide 13: Toy Example: Equality Constraint
$x$ is not an optimal solution if there exists $s \neq 0$ such that
$$h_1(x + s) = 0 \quad \text{and} \quad f(x + s) < f(x)$$
Using a first-order Taylor expansion:
$$h_1(x + s) = h_1(x) + \nabla h_1(x)^T s = \nabla h_1(x)^T s = 0 \quad (1)$$
$$f(x + s) - f(x) = \nabla f(x)^T s < 0 \quad (2)$$
Such an $s$ can exist only when $\nabla h_1(x)$ and $\nabla f(x)$ are not parallel.
Slide 14: Toy Example: Equality Constraint
Thus at the solution $\nabla h_1(x) \parallel \nabla f(x)$, i.e. we have
$$\nabla f(x^o) = \lambda_1^o \nabla h_1(x^o)$$
The Lagrangian:
$$L(x, \lambda_1) = f(x) - \lambda_1 h_1(x)$$
$x^o$ being a solution implies
$$\nabla_x L(x^o, \lambda_1^o) = \nabla f(x^o) - \lambda_1^o \nabla h_1(x^o) = 0$$
$\lambda_1$ is the Lagrange multiplier, or dual variable, for $h_1$.
This is just a necessary (not a sufficient) condition.
Slide 15: Toy Example: Inequality Constraint
Example 2:
$$\min \; x_1 + x_2 \equiv f \quad \text{s.t.:} \quad 2 - x_1^2 - x_2^2 \geq 0 \equiv c_1$$
with gradients
$$\nabla f = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{pmatrix}, \qquad \nabla c_1 = \begin{pmatrix} \partial c_1 / \partial x_1 \\ \partial c_1 / \partial x_2 \end{pmatrix}$$
[Figure: the feasible disc $2 - x_1^2 - x_2^2 \geq 0$ in the $(x_1, x_2)$ plane, with $\nabla f$ and $\nabla c_1$ drawn at several points; optimum at $(-1, -1)$]
Slide 16: Toy Example: Inequality Constraint
$x$ is not an optimal solution if there exists $s \neq 0$ such that
$$c_1(x + s) \geq 0 \quad \text{and} \quad f(x + s) < f(x)$$
Using a first-order Taylor expansion:
$$c_1(x + s) = c_1(x) + \nabla c_1(x)^T s \geq 0 \quad (1)$$
$$f(x + s) - f(x) = \nabla f(x)^T s < 0 \quad (2)$$
[Figure: feasible descent directions $s$ near the boundary of the disc]
Slide 17: Toy Example: Inequality Constraint
Case 1: inactive constraint, $c_1(x) > 0$.
Any sufficiently small $s$ satisfies (1), so a descent step satisfying (2) exists as long as $\nabla f(x) \neq 0$:
$$s = -\alpha \nabla f(x), \quad \text{where } \alpha > 0$$
Case 2: active constraint, $c_1(x) = 0$.
$$\nabla f(x) = \lambda_1 \nabla c_1(x), \quad \text{where } \lambda_1 \geq 0$$
[Figure: in the active case, $\nabla f(x)$ and $\nabla c_1(x)$ point in the same direction at the optimum]
Slide 18: Toy Example: Inequality Constraint
Thus we have the Lagrangian (as before):
$$L(x, \lambda_1) = f(x) - \lambda_1 c_1(x)$$
The optimality conditions:
$$\nabla_x L(x^o, \lambda_1^o) = \nabla f(x^o) - \lambda_1^o \nabla c_1(x^o) = 0 \quad \text{for some } \lambda_1^o \geq 0$$
and the complementarity condition:
$$\lambda_1^o \, c_1(x^o) = 0, \quad \text{i.e. either } c_1(x^o) = 0 \text{ (active) or } \lambda_1^o = 0 \text{ (inactive)}$$
$\lambda_1$ is the Lagrange multiplier, or dual variable, for $c_1$.
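Checking the toy example against these conditions (not spelled out on the slide): at $x^o = (-1, -1)$ the constraint is active, $c_1(x^o) = 2 - 1 - 1 = 0$, and $\nabla c_1(x^o) = (-2 x_1, -2 x_2)^T = (2, 2)^T$, so $\nabla f(x^o) = (1, 1)^T = \lambda_1^o \nabla c_1(x^o)$ with $\lambda_1^o = 1/2 \geq 0$; complementarity $\lambda_1^o \, c_1(x^o) = 0$ holds.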
Slide 19: Same Concepts in a More General Setting
Slide 20: Lagrange Function
The problem:
$$\min_x \; f_0(x) \quad \text{(objective function)}$$
$$\text{s.t.:} \quad f_i(x) \leq 0, \; i = 1, \ldots, m \quad (m \text{ inequality constraints})$$
$$\qquad\;\; h_i(x) = 0, \; i = 1, \ldots, p \quad (p \text{ equality constraints})$$
Standard tool for constrained optimization: the Lagrange function
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
$\lambda$ and $\nu$ are the dual variables, or Lagrange multipliers.
Slide 21: Lagrange Dual Function
Defined, for $(\lambda, \nu)$, as the minimum value of the Lagrange function over $x$ ($m$ inequality constraints, $p$ equality constraints):
$$g: \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$$
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right)$$
Slide 22: Lagrange Dual Function
Interpretation of the Lagrange dual function: writing the original problem as an unconstrained problem, but with hard indicator (penalty) functions:
$$\text{minimize}_x \quad f_0(x) + \sum_{i=1}^m I_0(f_i(x)) + \sum_{i=1}^p I_1(h_i(x))$$
where the indicator functions are
$$I_0(u) = \begin{cases} 0 & u \leq 0 \text{ (satisfied)} \\ \infty & u > 0 \text{ (unsatisfied)} \end{cases} \qquad I_1(u) = \begin{cases} 0 & u = 0 \text{ (satisfied)} \\ \infty & u \neq 0 \text{ (unsatisfied)} \end{cases}$$
Slide 23: Lagrange Dual Function
Interpretation of the Lagrange dual function: the Lagrange multipliers can be seen as a "softer" version of the indicator (penalty) functions, replacing
$$\text{minimize}_x \quad f_0(x) + \sum_{i=1}^m I_0(f_i(x)) + \sum_{i=1}^p I_1(h_i(x))$$
with
$$\inf_{x \in D} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right)$$
Slides 24–27: Lagrange Dual Function
The Lagrange dual function gives a lower bound on the optimal value $p^o$ of the problem:
$$g(\lambda, \nu) \leq p^o$$
Proof: Let $\bar{x}$ be a feasible optimal point and let $\lambda \geq 0$. Then we have:
$$f_i(\bar{x}) \leq 0, \; i = 1, \ldots, m \qquad h_i(\bar{x}) = 0, \; i = 1, \ldots, p$$
Thus, since $\lambda \geq 0$ makes each term $\lambda_i f_i(\bar{x}) \leq 0$:
$$L(\bar{x}, \lambda, \nu) = f_0(\bar{x}) + \sum_{i=1}^m \lambda_i f_i(\bar{x}) + \sum_{i=1}^p \nu_i h_i(\bar{x}) \leq f_0(\bar{x})$$
Hence
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) \leq L(\bar{x}, \lambda, \nu) \leq f_0(\bar{x}) = p^o$$
Slide 28: Sufficient Condition
If $(x^o, \lambda^o, \nu^o)$ is a saddle point of $L(x, \lambda, \nu)$, i.e. if
$$\forall x \in \mathbb{R}^n, \; \forall \lambda \geq 0: \quad L(x^o, \lambda, \nu) \leq L(x^o, \lambda^o, \nu^o) \leq L(x, \lambda^o, \nu^o)$$
... then $x^o$ is a solution of $p^o$.
Slide 29: Lagrange Dual Problem
The Lagrange dual function gives a lower bound on the optimal value of the problem.
We seek the "best" (largest) lower bound, which yields the dual problem:
$$\text{maximize} \quad g(\lambda, \nu) \quad \text{s.t.:} \quad \lambda \geq 0$$
The dual optimal value and solution: $d^o = g(\lambda^o, \nu^o)$
The Lagrange dual problem is convex even if the original problem is not.
Slide 30: Primal / Dual Problems
Primal problem (optimal value $p^o$):
$$\min_{x \in D} \; f_0(x) \quad \text{s.t.:} \quad f_i(x) \leq 0, \; i = 1, \ldots, m; \quad h_i(x) = 0, \; i = 1, \ldots, p$$
Dual problem (optimal value $d^o$):
$$\max_{\lambda, \nu} \; g(\lambda, \nu) \quad \text{s.t.:} \quad \lambda \geq 0$$
where
$$g(\lambda, \nu) = \inf_{x \in D} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right)$$
Slide 31: Weak Duality
Weak duality theorem: $d^o \leq p^o$
Optimal duality gap: $p^o - d^o \geq 0$
This bound is sometimes used to get an estimate of the optimal value of an original problem that is difficult to solve.
Slide 32: Strong Duality
Strong duality: $d^o = p^o$. Strong duality does not hold in general.
Slater's condition: there exists $x \in D$ that is strictly feasible:
$$f_i(x) < 0 \; \text{for} \; i = 1, \ldots, m \qquad h_i(x) = 0 \; \text{for} \; i = 1, \ldots, p$$
Strong duality theorem: if Slater's condition holds and the problem is convex, then strong duality is attained:
$$\exists (\lambda^o, \nu^o) \; \text{with} \; d^o = g(\lambda^o, \nu^o) = \max_{\lambda, \nu} g(\lambda, \nu) = \inf_x L(x, \lambda^o, \nu^o) = p^o$$
Slide 33: Optimality Conditions: First Order
Complementary slackness: if strong duality holds, then at the optimum $(x^o, \lambda^o, \nu^o)$:
$$\lambda_i^o f_i(x^o) = 0, \quad i = 1, \ldots, m$$
Proof:
$$f_0(x^o) = g(\lambda^o, \nu^o) \quad \text{(strong duality)}$$
$$= \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^o f_i(x) + \sum_{i=1}^p \nu_i^o h_i(x) \right)$$
$$\leq f_0(x^o) + \sum_{i=1}^m \lambda_i^o f_i(x^o) + \sum_{i=1}^p \nu_i^o h_i(x^o)$$
$$\leq f_0(x^o) \quad \text{(since } \forall i: f_i(x^o) \leq 0, \; \lambda_i^o \geq 0, \text{ and } h_i(x^o) = 0\text{)}$$
The chain collapses to equalities, so $\sum_i \lambda_i^o f_i(x^o) = 0$; since every term is $\leq 0$, each $\lambda_i^o f_i(x^o) = 0$.
Slide 34: Optimality Conditions: First Order
Karush-Kuhn-Tucker (KKT) conditions: if strong duality holds, then at optimality:
$$f_i(x^o) \leq 0, \quad i = 1, \ldots, m$$
$$h_i(x^o) = 0, \quad i = 1, \ldots, p$$
$$\lambda_i^o \geq 0, \quad i = 1, \ldots, m$$
$$\lambda_i^o f_i(x^o) = 0, \quad i = 1, \ldots, m$$
$$\nabla f_0(x^o) + \sum_{i=1}^m \lambda_i^o \nabla f_i(x^o) + \sum_{i=1}^p \nu_i^o \nabla h_i(x^o) = 0$$
KKT conditions are:
➔ necessary in general (local optimum)
➔ necessary and sufficient in case of convex problems (global optimum)
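A numerical sanity check of the KKT conditions on the inequality toy example from slides 15–18 (a sketch assuming SciPy is available; the slide's $c_1 \geq 0$ is recast as $f_1 = -c_1 \leq 0$ to match the general setting):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: min x1 + x2  s.t.  f1(x) = x1^2 + x2^2 - 2 <= 0
f0 = lambda x: x[0] + x[1]
res = minimize(f0, x0=np.array([0.0, 0.0]),
               constraints=[{"type": "ineq", "fun": lambda x: 2 - x[0]**2 - x[1]**2}])
x = res.x                 # should be close to (-1, -1)

grad_f0 = np.array([1.0, 1.0])
grad_f1 = 2 * x           # gradient of x1^2 + x2^2 - 2
lam = 0.5                 # analytic multiplier from slide 17's active case

print(x)                                                   # ~[-1, -1]
print(np.allclose(grad_f0 + lam * grad_f1, 0, atol=1e-4))  # stationarity
print(abs(lam * (x @ x - 2)) < 1e-4)                       # complementary slackness
```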
Slide 35: What are Support Vector Machines?
Linear classifiers
(Mostly) binary classifiers
Supervised training
Good generalization with explicit bounds
Slide 36: Main Ideas Behind Support Vector Machines
Maximal margin
Dual space
Linear classifiers in high-dimensional space, using a non-linear mapping
Kernel trick
Slide 37: Quadratic Programming
Slide 38: Using the Lagrangian
Slide 39: Dual Space
Slide 40: Strong Duality
Slide 41: Duality Gap
Slide 42: No Duality Gap Thanks to Convexity
(The bodies of slides 37–42 were not recovered in this transcript.)
Slide 43: Dual Form
(Illustration from Prof. Mohri's lecture notes.)
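The slide content itself is an image; for reference, the textbook dual of the maximum-margin problem from slide 7 (standard form, not transcribed from the slide) is
$$\max_{\alpha \geq 0} \; \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i, j=1}^m \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.:} \quad \sum_{i=1}^m \alpha_i y_i = 0$$
with $w = \sum_i \alpha_i y_i x_i$; by complementary slackness only the closest points (the support vectors) have $\alpha_i > 0$, and the data enter only through dot products $x_i \cdot x_j$, which is what the kernel trick exploits.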
Slide 44: Non-linear separation of datasets
Linear separation is impossible in most problems.
Slide 45: Non-separable datasets
Solutions:
1) Non-linear classifiers
2) Increase the dimensionality of the dataset and add a non-linear mapping Ф
Slide 46: Kernel Trick
The kernel acts as a "similarity measure" between 2 data samples.
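A minimal sketch of the trick in Python, assuming scikit-learn (whose SVC wraps LibSVM, matching the demo mentioned in the outline); the dataset and parameters are illustrative choices, not from the slides:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the input space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# RBF kernel k(x, x') = exp(-gamma ||x - x'||^2): a dot product in an
# implicit high-dimensional feature space, never computed explicitly
clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict(X))          # recovers [-1  1  1 -1]
print(clf.support_vectors_)    # the points with non-zero dual variables
```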
Slide 47: Kernel Trick Illustrated
Slide 48: Curse of Dimensionality Due to the Non-Linear Mapping
Slide 49: Positive Symmetric Definite Kernels (Mercer Condition)
Slide 50: Advantages of SVM
Work very well in practice.
Error bounds are easy to obtain: the generalization error is small and predictable.
A fool-proof method, with (mostly) three kernels to choose from:
Gaussian
Linear and Polynomial
Sigmoid
Very small number of parameters to optimize.
Slide 51: Limitations of SVM
Size limitation: the size of the kernel matrix grows quadratically with the number of training vectors.
Speed limitations:
1) During training: a very large quadratic programming problem must be solved numerically. Solutions:
Chunking
Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems solved analytically
Hardware implementations
2) During testing: cost grows with the number of support vectors. Solution: Online SVM