1. Introduction to Support Vector Machine
Lucas Xu
September 4, 2012
Lucas Xu Introduction to Support Vector Machine September 4, 2012 1 / 20
2. 1 Classifier
2 Hyper-Plane
3 Convex Optimization
4 Kernel
5 Application
3. Classifier
Attributes and Class Labels
Training Data
$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \{-1, 1\}$
4. Classifier
Umeng Gender Classification Data
user    app1  app2  ···  appd  gender
user1   1     0     ···  0     male
user2   0     1     ···  1     female
···     ···   ···   ···  ···   ···
usern   1     1     ···  1     female
Each app belongs to exactly one category; there are about 20 categories.
Categories are mutually exclusive.
5. Classifier
Umeng Gender Classification Data
$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \{-1, 1\}$
$x_k^{(i)} \in \{0, 1\}$: 0 means not installed, 1 means installed on the device
$1 \le k \le d$, $d \approx 30{,}000$ (about 30,000 apps)
$y^{(i)} \in \{\text{male}, \text{female}\}$
6. Hyper-Plane
Figure: Hyper-plane
The hyper-plane: $w^T x + b = 0$
Classification function: $h_{w,b}(x) = g(w^T x + b)$
$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
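The classification function above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the helper names `decision_value` and `classify` and the example values of `w` and `b` are assumptions made up for the example.

```python
def decision_value(w, b, x):
    """Return w^T x + b for plain Python sequences w and x."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def g(z):
    """Threshold function from the slide: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def classify(w, b, x):
    """h_{w,b}(x) = g(w^T x + b)."""
    return g(decision_value(w, b, x))


# Hypothetical hyper-plane parameters, chosen only for illustration.
w = [1.0, -1.0]
b = -0.5

print(classify(w, b, [2.0, 0.0]))  # z = 2.0 - 0.5 = 1.5, prints 1
print(classify(w, b, [0.0, 1.0]))  # z = -1.0 - 0.5 = -1.5, prints -1
```

Points that lie exactly on the hyper-plane (z = 0) are assigned to the positive class, following the definition of g.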
7. Hyper-Plane
Functional margin:
$$\hat{\gamma}^{(i)} = y^{(i)} (w^T x^{(i)} + b)$$
Scaling: impose the normalization constraint $\|w\| = 1$
Geometric margin:
$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$$
$\gamma^{(i)}$ should be a large positive number to increase the prediction confidence.
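To see why the normalization matters, the two margins can be computed for one example. This is a small sketch with made-up numbers; the function names and the values of `w`, `b`, `x` and `y` are assumptions for illustration only.

```python
import math


def dot(u, v):
    """Inner product of two plain Python sequences."""
    return sum(a * b for a, b in zip(u, v))


def functional_margin(w, b, x, y):
    """y * (w^T x + b); changes when (w, b) is rescaled."""
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||; invariant to rescaling (w, b)."""
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))


# Hypothetical parameters with ||w|| = 5, and one positive example.
w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

print(functional_margin(w, b, x, y))  # 1 * (9 + 16 - 5) = 20.0
print(geometric_margin(w, b, x, y))   # 20 / 5 = 4.0

# Scaling (w, b) by 10 multiplies the functional margin by 10
# but leaves the geometric margin unchanged.
w10, b10 = [10 * wk for wk in w], 10 * b
print(functional_margin(w10, b10, x, y))  # 200.0
print(geometric_margin(w10, b10, x, y))   # 4.0
```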
8. Hyper-Plane
Definition
The geometric margin of $(w, b)$ with respect to the training dataset $S$:
$$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$$
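The definition takes the worst case over the training set. A minimal sketch, assuming a tiny hand-made dataset and hyper-plane (both hypothetical, as is the function name):

```python
import math


def dataset_geometric_margin(w, b, S):
    """Smallest geometric margin of (w, b) over all (x, y) pairs in S."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    return min(
        y * (sum(wk * xk for wk, xk in zip(w, x)) + b) / norm
        for x, y in S
    )


# Hypothetical training set: two positive and two negative examples.
S = [
    ([2.0, 2.0], 1),
    ([3.0, 0.0], 1),
    ([0.0, 0.0], -1),
    ([-1.0, 1.0], -1),
]
w, b = [1.0, 1.0], -2.0

margin = dataset_geometric_margin(w, b, S)
print(margin)  # the example closest to the hyper-plane is ([3, 0], 1)
```

A negative result would mean that at least one training example lies on the wrong side of the hyper-plane.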
9. Hyper-Plane
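The optimal margin classifier (intuitive):
Find a decision boundary that maximizes the margin.
$$\begin{aligned}
\max_{\gamma, w, b} \quad & \gamma \\
\text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge \gamma, \quad i = 1, \dots, m \\
& \|w\| = 1
\end{aligned}$$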
10. Hyper-Plane
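Normalization constraint: let the functional margin $\hat{\gamma} = 1$
⇓
$$\begin{aligned}
\max_{w, b} \quad & \frac{1}{\|w\|} \\
\text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}$$
⇓
$$\begin{aligned}
\min_{w, b} \quad & \frac{1}{2} \|w\|^2 \\
\text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}$$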
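The chain of transformations can be spelled out step by step. The following is a sketch in the notation of the slides; the intermediate problem in terms of $\hat{\gamma} / \|w\|$ is the usual textbook step and is not shown in the original deck.

```latex
\begin{align*}
% Geometric and functional margins differ only by the factor 1/||w||.
\gamma = \frac{\hat{\gamma}}{\|w\|}
  \quad &\Rightarrow \quad
  \max_{\hat{\gamma}, w, b} \frac{\hat{\gamma}}{\|w\|}
  \quad \text{s.t. } y^{(i)} (w^T x^{(i)} + b) \ge \hat{\gamma} \\
% Rescaling (w, b) rescales \hat{\gamma}, so we are free to fix \hat{\gamma} = 1.
\hat{\gamma} = 1
  \quad &\Rightarrow \quad
  \max_{w, b} \frac{1}{\|w\|}
  \quad \text{s.t. } y^{(i)} (w^T x^{(i)} + b) \ge 1 \\
% Maximizing 1/||w|| is the same as minimizing ||w||, and hence ||w||^2 / 2.
  \quad &\Leftrightarrow \quad
  \min_{w, b} \frac{1}{2} \|w\|^2
  \quad \text{s.t. } y^{(i)} (w^T x^{(i)} + b) \ge 1
\end{align*}
```

The last form has a quadratic objective and linear constraints, which is what makes it tractable.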
15. Hyper-Plane
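Convex function: the objective $\frac{1}{2} \|w\|^2$
Convex set: the feasible region defined by the linear constraints
This is a so-called quadratic programming (QP) problem. There are many software packages that solve it.
Basic ideas for the Support Vector Machine: done!
Is there a more efficient solution?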
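To hand the problem to a general-purpose QP package, it has to be written in a standard form. The sketch below uses the symbols $z$, $P$, $q$, $G$ and $h$, a convention shared by many QP solvers; these names are assumptions and do not appear in the original slides.

```latex
\begin{align*}
\min_{z} \quad & \tfrac{1}{2} z^T P z + q^T z
  \qquad \text{s.t. } G z \le h \\[4pt]
z &= \begin{pmatrix} w \\ b \end{pmatrix} \in \mathbb{R}^{d+1}, \qquad
P = \begin{pmatrix} I_d & 0 \\ 0 & 0 \end{pmatrix}, \qquad
q = 0 \\[4pt]
% Row i of G encodes -y^{(i)} (w^T x^{(i)} + b) \le -1.
G_{i,:} &= -y^{(i)} \begin{pmatrix} x^{(i)T} & 1 \end{pmatrix}, \qquad
h_i = -1, \qquad i = 1, \dots, m
\end{align*}
```

With these choices $\tfrac{1}{2} z^T P z = \tfrac{1}{2} \|w\|^2$, and each row of $G z \le h$ is the constraint $y^{(i)} (w^T x^{(i)} + b) \ge 1$ multiplied by $-1$.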
16. Convex Optimization
Primal Problem:
$$\begin{aligned}
\min_{w, b} \quad & \frac{1}{2} \|w\|^2 \\
\text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}$$
17. Convex Optimization
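Lagrangian for the original problem:
$$\min_{w, b} \; \max_{\alpha : \alpha_i \ge 0} \; L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)} (w^T x^{(i)} + b) - 1 \right]$$
⇓
Under the KKT conditions, this transforms into its dual problem:
$$\begin{aligned}
\max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.} \quad & \alpha_i \ge 0, \quad i = 1, \dots, m \\
& \sum_{i=1}^{m} \alpha_i y^{(i)} = 0
\end{aligned}$$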
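The step from the Lagrangian to the dual can be made explicit. The derivation below is the standard one, sketched in the notation of the slides: set the derivatives with respect to $w$ and $b$ to zero and substitute back.

```latex
\begin{align*}
% Stationarity in w gives w as a weighted sum of the training points.
\nabla_w L = w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0
  \quad &\Rightarrow \quad w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \\
% Stationarity in b gives the equality constraint of the dual.
\frac{\partial L}{\partial b} = -\sum_{i=1}^{m} \alpha_i y^{(i)} = 0
  \quad &\Rightarrow \quad \sum_{i=1}^{m} \alpha_i y^{(i)} = 0 \\
% Substituting both into L: the b term vanishes and w^T w appears twice.
L(w, b, \alpha)
  &= \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i y^{(i)} w^T x^{(i)}
     - b \sum_{i=1}^{m} \alpha_i y^{(i)} + \sum_{i=1}^{m} \alpha_i \\
  &= \sum_{i=1}^{m} \alpha_i - \frac{1}{2} w^T w \\
  &= \sum_{i=1}^{m} \alpha_i
     - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j
       \langle x^{(i)}, x^{(j)} \rangle = W(\alpha)
\end{align*}
```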
18. Convex Optimization
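Solutions:
$$w^* = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$
$$b^* = -\frac{\max_{i : y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i : y^{(i)} = 1} w^{*T} x^{(i)}}{2}$$
Predict:
$$\begin{aligned}
g(x) &= w^T x + b \\
&= \left( \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \right)^T x + b \\
&= \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b
\end{aligned}$$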
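These formulas can be checked on a toy problem. The sketch below assumes a three-point dataset whose dual solution `alpha` was worked out by hand for this example (it satisfies the dual constraints and the margin conditions); the data, the values and the function names are all hypothetical.

```python
def dot(u, v):
    """Inner product of two plain Python sequences."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy data: two positive examples and one negative example.
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
Y = [1, 1, -1]
# Hand-derived dual solution; the middle point is not a support vector.
alpha = [0.25, 0.0, 0.25]


def recover_w(alpha, X, Y):
    """w* = sum_i alpha_i y_i x_i."""
    d = len(X[0])
    return [sum(a * y * x[k] for a, x, y in zip(alpha, X, Y)) for k in range(d)]


def recover_b(w, X, Y):
    """b* = -(max over negatives of w.x + min over positives of w.x) / 2."""
    max_neg = max(dot(w, x) for x, y in zip(X, Y) if y == -1)
    min_pos = min(dot(w, x) for x, y in zip(X, Y) if y == 1)
    return -(max_neg + min_pos) / 2.0


def decision_value(x, alpha, X, Y, b):
    """sum_i alpha_i y_i <x_i, x> + b, skipping terms with alpha_i = 0."""
    return sum(a * y * dot(xi, x) for a, xi, y in zip(alpha, X, Y) if a > 0) + b


w_star = recover_w(alpha, X, Y)
b_star = recover_b(w_star, X, Y)
x_new = [3.0, 0.0]

# Both forms of the prediction give the same value.
primal_form = dot(w_star, x_new) + b_star
dual_form = decision_value(x_new, alpha, X, Y, b_star)
print(w_star, primal_form, dual_form)
```

The dual form never builds `w_star` explicitly; it only needs inner products between the new point and the training points with non-zero `alpha`.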
19. Kernel
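For most $i$, $\alpha_i = 0$.
The examples $(x^{(i)}, y^{(i)})$ with $\alpha_i > 0$ are called support vectors.
Prediction only needs the inner products $\langle x^{(i)}, x \rangle$.
If we map the feature space $(x_1^{(i)}, x_2^{(i)}, \dots, x_k^{(i)})$ to another, higher-dimensional space $(z_1^{(i)}, z_2^{(i)}, \dots, z_l^{(i)})$ with $z = \phi(x)$, the inner product becomes $\langle \phi(x^{(i)}), \phi(x) \rangle$.
With a kernel we can easily compute $\langle z^{(i)}, z \rangle = K(x^{(i)}, x)$ without forming $z$ explicitly.
In slightly different notation:
$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$
Intuitive explanation: a kernel is a measure of similarity.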
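A small numeric check makes the idea concrete. The sketch below assumes the homogeneous quadratic kernel $K(x, y) = \langle x, y \rangle^2$ on 2-dimensional inputs, for which an explicit feature map is known; this particular kernel and the example vectors are chosen for illustration and are not taken from the slides.

```python
import math


def dot(u, v):
    """Inner product of two plain Python sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for K(x, y) = <x, y>^2 on 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def quadratic_kernel(x, y):
    """Same quantity, computed without ever forming phi(x) or phi(y)."""
    return dot(x, y) ** 2


x = [1.0, 2.0]
y = [3.0, 4.0]

via_feature_map = dot(phi(x), phi(y))  # inner product in the mapped space
via_kernel = quadratic_kernel(x, y)    # <x, y>^2 = 11^2 = 121.0
print(via_feature_map, via_kernel)
```

The two numbers agree up to floating-point rounding, while the kernel version only touches the original 2-dimensional vectors.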
20. Kernel
Definition
Mercer Kernel: K is positive semi-definite
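In practice the condition says that, for any finite set of points, the Gram matrix $K_{ij} = K(x_i, x_j)$ is symmetric and satisfies $z^T K z \ge 0$ for every vector $z$. The sketch below spot-checks this for an RBF kernel on a few hypothetical points; random trials can only fail to find a violation, they do not prove positive semi-definiteness.

```python
import math
import random


def rbf(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma = 0.5 is an arbitrary choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(kernel, points):
    """Matrix of kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, z):
    """z^T K z for a square matrix K given as nested lists."""
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


# Hypothetical sample points, used only for this check.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
K = gram_matrix(rbf, points)
n = len(points)

is_symmetric = all(K[i][j] == K[j][i] for i in range(n) for j in range(n))

rng = random.Random(0)  # fixed seed so the check is repeatable
trials = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(1000)]
# Small negative tolerance to absorb floating-point rounding.
no_violation_found = all(quadratic_form(K, z) >= -1e-12 for z in trials)

print(is_symmetric, no_violation_found)
```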
21. Kernel
Primitive (linear): $\langle x, y \rangle$
Polynomial: $(\langle x, y \rangle + 1)^d$
RBF: $\exp(-\gamma \|x - y\|^2)$
Sigmoid: $\tanh(\kappa \langle x, y \rangle + c)$
String
Tree
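The kernels with closed-form expressions are short to write down. The sketch below is one possible plain-Python rendering; the function names and default parameter values are assumptions, and the string and tree kernels are left out because the slides give no formula for them.

```python
import math


def inner(x, y):
    """Inner product <x, y> of two plain Python sequences."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Primitive kernel: <x, y>."""
    return inner(x, y)


def polynomial_kernel(x, y, degree=2):
    """(<x, y> + 1)^degree."""
    return (inner(x, y) + 1) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    """tanh(kappa * <x, y> + c)."""
    return math.tanh(kappa * inner(x, y) + c)


x = [1.0, 0.0]
y = [1.0, 1.0]

print(linear_kernel(x, y))      # <x, y> = 1.0
print(polynomial_kernel(x, y))  # (1 + 1)^2 = 4.0
print(rbf_kernel(x, y))         # exp(-1), since ||x - y||^2 = 1
print(sigmoid_kernel(x, y))     # tanh(1)
```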
27. Apply to Umeng Gender Classification
Problem Description
Classify the gender of a user based on the apps they have installed and the categories of those apps.
Kernel Design
$$K(x, y) = \sum_{i,j=0}^{m} \phi(x_i, y_j)$$
$$\phi(x_i, y_j) = \begin{cases}
(1 + w)\, x_i y_j & \text{if } i = j \\
x_i y_j & \text{if } i \ne j \text{ but in the same category} \\
0 & \text{if not in the same category}
\end{cases}$$
$w \ge 0$ is the extra weight applied when two users have installed the same app; it defaults to 1.0.
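The kernel can be evaluated directly from its definition. The following is a naive sketch that loops over all pairs of apps, so it is only practical for small examples; the function name, the `category` list (app index to category id) and the three-app toy data are assumptions made for illustration.

```python
def category_kernel(x, y, category, w=1.0):
    """Sum of phi(x_i, y_j) over all app pairs (i, j).

    x, y      -- 0/1 install vectors of two users
    category  -- category[i] is the category id of app i (hypothetical input)
    w         -- extra weight for both users having the same app
    """
    total = 0.0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if i == j:
                total += (1.0 + w) * xi * yj   # same app
            elif category[i] == category[j]:
                total += xi * yj               # different apps, same category
            # apps in different categories contribute 0
    return total


# Toy data: apps 0 and 1 share a category, app 2 is in another one.
category = [0, 0, 1]
user_a = [1, 1, 0]
user_b = [1, 0, 1]

# Shared app 0 contributes (1 + 1) * 1 = 2; app 1 of user_a and app 0 of
# user_b are in the same category and contribute 1; everything else is 0.
print(category_kernel(user_a, user_b, category))  # 3.0
```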
Experiment Result
28. Apply to Umeng Gender Classification
$$(x_1, x_2, \dots, x_m)^T \;\Rightarrow\; (w \cdot x_1, w \cdot x_2, \dots, w \cdot x_m, c_1, c_2, \dots, c_{20})^T$$
$c_i$ counts the number of apps belonging to category $i$
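Read as a feature map, the transformation appends per-category counts to the weighted install vector. The sketch below assumes that $c_i$ counts the installed apps of a user in category $i$ and that the two parts are concatenated; the function name and toy data are hypothetical. With the default w = 1.0, the inner product of two mapped vectors equals the kernel value from the previous slide on this toy pair (3, computed by hand); for other values of w, matching the kernel's (1 + w) factor would require scaling by sqrt(w) instead of w, which is an observation of this sketch rather than something stated in the slides.

```python
def explicit_features(x, category, num_categories, w=1.0):
    """Map a 0/1 install vector to [w * x_1, ..., w * x_m, c_1, ..., c_K]."""
    weighted = [w * xi for xi in x]
    counts = [0.0] * num_categories
    for xi, c in zip(x, category):
        counts[c] += xi  # count installed apps per category
    return weighted + counts


def dot(u, v):
    """Inner product of two plain Python sequences."""
    return sum(a * b for a, b in zip(u, v))


# Toy data: apps 0 and 1 share a category, app 2 is in another one.
category = [0, 0, 1]
user_a = [1, 1, 0]
user_b = [1, 0, 1]

features_a = explicit_features(user_a, category, num_categories=2)
features_b = explicit_features(user_b, category, num_categories=2)

print(features_a)                   # [1.0, 1.0, 0.0, 2.0, 0.0]
print(features_b)                   # [1.0, 0.0, 1.0, 1.0, 1.0]
print(dot(features_a, features_b))  # 3.0
```

Working with explicit features keeps the input linear in the number of apps plus categories, instead of looping over all app pairs.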