Support Vector Machine
Easy Tutorial (I wish)
(1) y(x) = w^T φ(x) + b
To keep the problem simple, we make a few assumptions:
1. After mapping into a feature space through some function φ, the data are linearly separable.
2. The labels take only two values, +1 and −1.
(1) y(x) = w^T φ(x) + b
(2) Distance from the hyperplane to a feature vector:
t_n y(x_n) / ||w|| = t_n (w^T φ(x_n) + b) / ||w||   (using (1); we assume the data are linearly separable)
Perceptron algorithm
w' = w + t_n x_n   if t_n ≠ sign{y(x_n)}
The normal vector of the hyperplane is adjusted using the misclassified (mistake) samples. This is why the perceptron is called a mistake-driven algorithm.
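A minimal sketch of this mistake-driven update in Python/NumPy (the toy data `X`, labels `t`, and the number of passes are made up for illustration; the slides themselves contain no code):

```python
import numpy as np

# Toy linearly separable data; labels t are +1 / -1 as assumed above.
X = np.array([[2.0, 3.0], [1.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])          # normal vector of the hyperplane (bias omitted for brevity)

for _ in range(100):              # repeat until a full pass makes no mistakes
    mistakes = 0
    for x_n, t_n in zip(X, t):
        if np.sign(w @ x_n) != t_n:      # misclassified (mistake) sample
            w = w + t_n * x_n            # w' = w + t_n * x_n, the update on this slide
            mistakes += 1
    if mistakes == 0:
        break

print(w)
```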
Perceptron, convergence
θ^(k): the parameter vector after k updates.
• Claim: the perceptron algorithm indeed converges in a finite number of updates.
• Assume that
  ||x_t|| ≤ R for all t and some finite R, and
  ∃ γ > 0 s.t. y_t (θ*)^T x_t ≥ γ.
Convergence proof
θ' = θ + y_t x_t   if y_t ≠ f(x_t; θ)

(θ*)^T θ^(k) = (θ*)^T θ^(k−1) + y_t (θ*)^T x_t
             ≥ (θ*)^T θ^(k−1) + γ
             = (θ*)^T (θ^(k−2) + y_t x_t) + γ
             = (θ*)^T θ^(k−2) + y_t (θ*)^T x_t + γ
             ≥ (θ*)^T θ^(k−2) + 2γ
             ...
             ≥ kγ

||θ^(k)||² = ||θ^(k−1) + y_t x_t||²
           = ||θ^(k−1)||² + 2 y_t (θ^(k−1))^T x_t + ||x_t||²   (the cross term is ≤ 0 because x_t was misclassified)
           ≤ ||θ^(k−1)||² + ||x_t||²
           ≤ ||θ^(k−1)||² + R²
           ...
           ≤ kR²

1 ≥ cos(θ*, θ^(k)) = (θ*)^T θ^(k) / (||θ^(k)|| ||θ*||) ≥ kγ / (√(kR²) ||θ*||)
⇒ √k ≤ R ||θ*|| / γ,  or  k ≤ R² ||θ*||² / γ²
Definition of Support Vector and Margin
The red circles are the support vectors; the perpendicular distance between a support vector and the hyperplane is called the margin.
Motivation and intuition behind the SVM?
There are countless separating hyperplanes. Which one will have the smallest generalization error? The maximal margin classifier!!
(3) arg max_{w,b} { (1/||w||) · min_n [ t_n (w^T φ(x_n) + b) ] }
Find the parameters that maximize the margin of the sample closest to the hyperplane.
Linear Classifiers (four slides of scatter plots)
Each slide shows the data x fed into a classifier f(x, w, b) = sign(w·x − b) producing y_est, with one marker denoting +1 and the other −1, and a different candidate separating line drawn on each slide.
How would you classify this data? Any of these would be fine... but which is best?
Preprocessing before formalizing the SVM idea
t_n (w^T φ(x_n) + b) / ||w|| is unchanged when we rescale w → κw, b → κb:
t_n (κw^T φ(x_n) + κb) / ||κw|| = t_n κ (w^T φ(x_n) + b) / (κ||w||) = t_n (w^T φ(x_n) + b) / ||w||
(4) t_n (w^T φ(x_n) + b) = 1 : adjust w and b so that this holds for the point closest to the hyperplane.
(5) t_n (w^T φ(x_n) + b) ≥ 1 : holds for every point; this expresses linear separability.
From (2), the distance t_n y(x_n)/||w|| = t_n (w^T φ(x_n) + b)/||w|| of the closest point then becomes 1/||w|| = margin.
Maximizing the margin 1/||w|| ⇔ minimizing ||w||². We minimize (1/2)||w||²; the factor 1/2 is only for later convenience when differentiating.
(6) arg min_{w,b} (1/2)||w||² : find the parameters that minimize the reciprocal of the margin.
Formalizing the SVM idea
arg min_{w,b} (1/2)||w||²   subject to t_n (w^T φ(x_n) + b) ≥ 1
This completes the SVM formulation!!!!!! Now we only have to solve it.
The optimization problem resembles the linear programming we saw in high school; it is called quadratic programming (QP).
To convert this QP problem into its dual form and solve it, we introduce the concept of Lagrange multipliers.
Why convert to the dual form? → for the kernel trick.
Also note that the problem above is a convex optimization problem. Convex vs. non-convex, briefly: when the problem is convex, the "differentiate and set to 0" technique finds the global minimum.
Lagrange multiplier
g(x + ε) ≈ g(x) + ε^T ∇g(x)   (Taylor series)
g(x) = g(x + ε) if both x and x + ε lie on the constraint surface g(x) = 0
⇒ ε^T ∇g(x) ≈ 0; letting ||ε|| → 0, ε^T ∇g(x) = 0. Since ε is parallel to the surface, ∇g(x) is perpendicular to the surface g(x) = 0.
At the constrained stationary point ∇f is parallel to ∇g(x), so ∇f + λ ∇g(x) = 0 for some λ ≠ 0.
L(x, λ) = f(x) + λ g(x)   (satisfies the KKT conditions)
ex) L(x, λ) = 1 − x₁² − x₂² + λ(x₁ + x₂ − 1)
∂L/∂x₁ = −2x₁ + λ = 0
∂L/∂x₂ = −2x₂ + λ = 0
x₁ + x₂ − 1 = 0
⇒ x₁ = x₂ = 1/2.   (Try solving it another way as well.)
Converting the SVM problem to its dual form (for the SVM magic to come). Take this part slowly.
(7) L(w, b, a) = (1/2)||w||² − Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) − 1 }   (the Lagrangian)
∂L/∂w = w − Σ_{n=1}^N a_n t_n φ(x_n) = 0  ⇒  (8) w = Σ_{n=1}^N a_n t_n φ(x_n)
∂L/∂b = − Σ_{n=1}^N a_n t_n = 0
(9) 0 = Σ_{n=1}^N a_n t_n   (obtained by differentiating (7) and rearranging)
Substituting (8) and (9) into (7) and simplifying:
L(w, b, a) = (1/2) Σ_n Σ_m a_n a_m t_n t_m φ(x_n)^T φ(x_m) − Σ_n Σ_m a_n a_m t_n t_m φ(x_n)^T φ(x_m) − b Σ_n a_n t_n + Σ_n a_n
⇒ (10) L̃(a) = Σ_{n=1}^N a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m φ(x_n)^T φ(x_m) = Σ_{n=1}^N a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)
(11) a_n ≥ 0   (constraint of problem (10))
(12) Σ_{n=1}^N a_n t_n = 0   (constraint from (9))
Read the part above slowly once more to make sure you understand it.

The SVM magic
(7) L(w, b, a) = (1/2)||w||² − Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) − 1 }   (the Lagrangian)
→ Solving this QP problem costs about O(M³), where M is the dimension of the feature space.
(10) L̃(a) = Σ_{n=1}^N a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)   (the dual form of (7), obtained using (8) and (9))
→ Solving this QP problem costs about O(N³), where N is the number of training samples.
Usually M < N, so converting to the dual form looks like a bad deal. But what if M → ∞? We can then effectively map into an infinite-dimensional feature space, and the kernel function k still lets us do the computation cheaply. This is called the kernel trick.
e.g. φ(X)^T φ(X): a (3 × 10,000,000) times (10,000,000 × 3) product collapses to a 3 × 3 result!!!!!!! The huge dimension disappears.
The SVM predictor
(1) y(x) = w^T φ(x) + b, with (8) w = Σ_{n=1}^N a_n t_n φ(x_n)
(13) y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b
In an SVM, learning ultimately means learning the a_n and b.
(10) L̃(a) = Σ_{n=1}^N a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)
subject to a_n ≥ 0,  Σ_{n=1}^N a_n t_n = 0
The a_n are obtained by solving this constrained problem with QP. QP is the heaviest part of the SVM computation, so there is research on making it faster; one example is SMO (sequential minimal optimization). MATLAB provides the 'quadprog' function.
(7) L(w, b, a) = (1/2)||w||² − Σ_{n=1}^N a_n { t_n y(x_n) − 1 }   (the Lagrangian)
The solution satisfies the following conditions (KKT conditions):
(14) a_n ≥ 0
(15) t_n y(x_n) − 1 ≥ 0
(16) a_n { t_n y(x_n) − 1 } = 0,  i.e. a_n = 0 or t_n y(x_n) = 1
These conditions mean that if a_n > 0 then x_n is a support vector (SV). In other words, solving the QP tells us which points are the SVs, and the set of SVs alone is enough to classify with equation (13).
Computing b
Substituting (13) y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b into t_n y(x_n) = 1 gives
(17) t_n ( Σ_{m∈S} a_m t_m k(x_n, x_m) + b ) = 1
(18) b = (1/N_S) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ),  where S is the set of support vectors and N_S their number.
The separating hyperplane is built from the support vectors alone.
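Putting (10), (12), (13) and (18) together, here is a minimal numerical sketch. A toy 2-D dataset and a plain linear kernel k(x, z) = x·z are assumed for illustration; the slides mention MATLAB's quadprog, but this sketch uses SciPy's general-purpose SLSQP solver as a stand-in for a dedicated QP routine.

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data with labels t in {+1, -1}; linear kernel (assumption).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                   # Gram matrix k(x_n, x_m)

# Dual (10): maximize sum(a) - 0.5 * sum a_n a_m t_n t_m k(x_n, x_m) -> minimize the negative.
def neg_dual(a):
    return 0.5 * (a * t) @ K @ (a * t) - a.sum()

cons = [{"type": "eq", "fun": lambda a: a @ t}]       # (12): sum a_n t_n = 0
bnds = [(0.0, None)] * len(t)                         # (11): a_n >= 0 (hard margin)
res = minimize(neg_dual, np.zeros(len(t)), method="SLSQP", bounds=bnds, constraints=cons)
a = res.x

# (18): b averaged over the support vectors (a_n > 0 up to numerical tolerance).
sv = a > 1e-6
b = np.mean([t[n] - (a[sv] * t[sv]) @ K[sv, n] for n in np.where(sv)[0]])

# (13): y(x) = sum_n a_n t_n k(x, x_n) + b, using only the support vectors.
def predict(x):
    return (a[sv] * t[sv]) @ (X[sv] @ x) + b

print("alphas:", np.round(a, 3), " b =", round(b, 3),
      " y([2,1]) =", round(predict(np.array([2.0, 1.0])), 3))
```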
What if the data are not linearly separable?
So far we have developed the theory on the assumption that the samples become linearly separable after the high-dimensional mapping. In real problems this is rarely the case, so the optimization problem has to be modified to handle data that are not linearly separable.
def. slack: ξ_n = |t_n − y(x_n)| ; the difference between the true target and the predicted value.
(20) t_n y(x_n) ≥ 1 − ξ_n, n = 1, ..., N ; constraint (5) is replaced by this.
ξ_n = 0: correctly classified; 0 < ξ_n ≤ 1: inside the margin; ξ_n > 1: misclassified.
(21) C Σ_{n=1}^N ξ_n + (1/2)||w||² ; minimize the reciprocal of the margin while penalizing misclassified samples.
As C → ∞, this becomes identical to (6) arg min_{w,b} (1/2)||w||².
(22) L(w, b, ξ, a, μ) = (1/2)||w||² + C Σ_{n=1}^N ξ_n − Σ_{n=1}^N a_n { t_n y(x_n) − 1 + ξ_n } − Σ_{n=1}^N μ_n ξ_n
KKT conditions:
(23) a_n ≥ 0
(24) t_n y(x_n) − 1 + ξ_n ≥ 0
(25) a_n ( t_n y(x_n) − 1 + ξ_n ) = 0
(26) μ_n ≥ 0
(27) ξ_n ≥ 0
(28) μ_n ξ_n = 0
What if the data are not linearly separable? (continued)
Differentiating L with respect to each parameter:
(29) ∂L/∂w = 0  ⇒  w = Σ_{n=1}^N a_n t_n φ(x_n)
(30) ∂L/∂b = 0  ⇒  Σ_{n=1}^N a_n t_n = 0
(31) ∂L/∂ξ_n = 0  ⇒  a_n = C − μ_n
(32) L(w, b, ξ, a, μ) = (1/2)||w||² + C Σ_n ξ_n − Σ_n a_n t_n y(x_n) + Σ_n a_n − Σ_n a_n ξ_n − Σ_n μ_n ξ_n
    = (1/2)||w||² − Σ_n a_n t_n y(x_n) + Σ_n ξ_n (C − a_n − μ_n) + Σ_n a_n
    ⇒ L̃(a) = Σ_{n=1}^N a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)   (the ξ terms vanish by (31), and (29) is substituted for w)
(33) From (31) a_n = C − μ_n, (23) a_n ≥ 0 and (26) μ_n ≥ 0:  0 ≤ a_n ≤ C
(34) Σ_{n=1}^N a_n t_n = 0   (from (30))
So the dual is the same as before; the only change is the box constraint 0 ≤ a_n ≤ C on the a_n.
Example.
ω₁ : (1,5)^T, (2,4)^T
ω₂ : (2,3)^T, (1,5)^T
Classify these points with an SVM using the following polynomial kernel:
k(x, z) = (1 + x^T z)² = (1 + x₁z₁ + x₂z₂)²
        = 1 + 2x₁z₁ + 2x₂z₂ + x₁²z₁² + 2x₁z₁x₂z₂ + x₂²z₂²
        = (1, √2 x₁, √2 x₂, x₁², √2 x₁x₂, x₂²) (1, √2 z₁, √2 z₂, z₁², √2 z₁z₂, z₂²)^T
        = φ(x)^T φ(z)
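A quick numerical check of this expansion, using two of the example points (this snippet only verifies the algebra above, it does not solve the example):

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel (1 + x.z)^2."""
    x1, x2 = v
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x, z = np.array([1.0, 5.0]), np.array([2.0, 4.0])
print((1 + x @ z) ** 2)        # kernel value computed directly
print(phi(x) @ phi(z))         # same value via the explicit 6-D feature map
```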
L̃(a) = Σ_{n=1}^N a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)
subject to a_n ≥ 0,  Σ_{n=1}^N a_n t_n = 0
In principle this should be solved with QP, but because the problem is so small it was solved analytically, just like in the Lagrange multiplier example shown earlier.
(8) w = Σ_{n=1}^N a_n t_n φ(x_n)
Mapping the hyperplane back into the original (pre-mapping) space gives a parabola-shaped decision boundary.
References
1. Bishop's book (PRML)
2. MIT OpenCourseWare lecture notes:
http://ocw.mit.edu/OcwWeb/Electrical-Engineering-andComputer-Science/6-867Fall2006/LectureNotes/index.htm
3. Countless internet resources?
Fast Training of Support Vector Machines using Sequential Minimal Optimization, Platt, John.
Jung-Kyu Lee @ MLLAB
SVM revisited
• The (dual) optimization problem we covered in the SVM class:
max_α W(α) = Σ_{i=1}^m α_i − (1/2) Σ_{i,j=1}^m y_i y_j α_i α_j ⟨x_i, x_j⟩
s.t. 0 ≤ α_i ≤ C, i = 1, ..., m
     Σ_{i=1}^m α_i y_i = 0
• Solving this optimization problem for large-scale applications with generic quadratic programming is very computationally expensive.
• John Platt, at Microsoft, proposed an efficient algorithm called SMO (Sequential Minimal Optimization) to actually solve this optimization problem.
Digression: Coordinate ascent (1/3)
• Consider trying to solve the following unconstrained optimization problem:
max_α W(α₁, α₂, ..., α_m)
• Coordinate ascent: loop until convergence; for each i, set α_i := arg max over α_i of W(α₁, ..., α_i, ..., α_m), holding all the other α's fixed.
Digression: Coordinate ascent (2/3)
• How does coordinate ascent work?
• We optimize a quadratic function whose minimum is at (0, 0).
• First, minimize it w.r.t. α₁.
• Next, minimize it w.r.t. α₂.
• With the same argument, iterate until convergence.
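A small sketch of this coordinate-wise idea on a two-variable quadratic. The function W(α₁, α₂) = α₁² + α₂² + α₁α₂ is made up here so that its minimum sits at (0, 0) as described above; each step solves for one coordinate in closed form while the other is held fixed.

```python
# Quadratic with its minimum at (0, 0); the cross term makes the coordinate steps non-trivial.
def W(a1, a2):
    return a1 ** 2 + a2 ** 2 + a1 * a2

a1, a2 = 3.0, -4.0                     # arbitrary starting point
for it in range(20):
    a1 = -a2 / 2.0                     # argmin over a1 with a2 fixed: dW/da1 = 2*a1 + a2 = 0
    a2 = -a1 / 2.0                     # argmin over a2 with a1 fixed: dW/da2 = 2*a2 + a1 = 0
    print(it, round(a1, 6), round(a2, 6), round(W(a1, a2), 8))
```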
Digression: Coordinate ascent (3/3)
• Coordinate ascent will usually take many more steps than other iterative optimization algorithms such as gradient descent, Newton's method, conjugate gradient descent, etc.
• However, there are many optimization problems for which it is particularly easy to fix all but one of the parameters and optimize with respect to just that one parameter.
• It turns out that this will be true when we modify this algorithm to solve the SVM optimization problem.
SMO
• Coordinate ascent in its basic form cannot be applied to the SVM dual optimization problem:
max_α W(α) = Σ_{i=1}^m α_i − (1/2) Σ_{i,j=1}^m y_i y_j α_i α_j ⟨x_i, x_j⟩
s.t. 0 ≤ α_i ≤ C, i = 1, ..., m
     Σ_{i=1}^m α_i y_i = 0
• As coordinate ascent progresses, we can't change a single α_i without violating the constraint Σ_{i=1}^m α_i y_i = 0:
α₁ y₁ = − Σ_{i=2}^m α_i y_i
⇒ α₁ = − y₁ Σ_{i=2}^m α_i y_i   (since y₁ ∈ {−1, 1}, y₁² = 1)
so α₁ is completely determined by the remaining α's.
Outline of SMO
• The SMO (sequential minimal optimization) algorithm therefore changes two α's at a time, instead of one, so that the constraints keep being satisfied.
• The term 'minimal' refers to the fact that we're choosing the smallest number of α's that can be changed together.
Convergence of SMO
• To test for convergence of the SMO algorithm, we can check whether the KKT conditions are satisfied to within some tolerance:
α_i = 0  ⇒  y_i (w^T x_i + b) ≥ 1
α_i = C  ⇒  y_i (w^T x_i + b) ≤ 1
0 < α_i < C  ⇒  y_i (w^T x_i + b) = 1
• The key reason that SMO is an efficient algorithm is that the update of α_i, α_j can be computed very efficiently.
Detail of SMO
• Suppose we've decided to hold α₃, ..., α_m fixed and update α₁, α₂.
Σ_{i=1}^m α_i y_i = 0
⇒ α₁ y₁ + α₂ y₂ = − Σ_{i=3}^m α_i y_i
⇒ α₁ y₁ + α₂ y₂ = ζ
Detail of SMO
• We can thus picture the constraints on α₁, α₂ as follows:
• α₁ and α₂ must lie within the box [0, C] × [0, C] shown.
• α₁ and α₂ must lie on the line α₁ y₁ + α₂ y₂ = ζ.
• From these constraints, we know: L ≤ α₂ ≤ H
Detail of SMO
• From α₁ y₁ + α₂ y₂ = ζ:
α₁ y₁ = ζ − α₂ y₂
α₁ y₁² = (ζ − α₂ y₂) y₁
⇒ α₁ = (ζ − α₂ y₂) y₁   (since y₁² = 1)
• W(α) becomes
W(α₁, α₂, ..., α_m) = W((ζ − α₂ y₂) y₁, α₂, ..., α_m) = a α₂² + b α₂ + c
• This is a standard quadratic function of α₂.
Detail of SMO
• From a α₂² + b α₂ + c:
• This is really easy to optimize by setting its derivative to zero.
• If you end up with a value outside [L, H], you must clip your solution.
• That will give you the optimal solution of this quadratic optimization problem subject to your solution satisfying the box constraint and lying on the straight line.
Detail of SMO
• In conclusion,
α₂^new = H                    if α₂^(new,unclipped) > H
         α₂^(new,unclipped)   if L ≤ α₂^(new,unclipped) ≤ H
         L                    if α₂^(new,unclipped) < L
α₁ = (ζ − α₂ y₂) y₁
• We do this process very quickly, which makes the inner loop of the SMO algorithm very efficient.
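A sketch of this inner-loop update for one chosen pair. The closed-form unclipped step is written in terms of the prediction errors E_i = f(x_i) − y_i and the second derivative η = 2k₁₂ − k₁₁ − k₂₂, which come from the simplified SMO handout [2] rather than from this slide; the kernel values and errors are assumed to be supplied by the caller.

```python
def clip(a2_unclipped, L, H):
    """The clipping step from the slide: project the unconstrained optimum back into [L, H]."""
    return min(H, max(L, a2_unclipped))

def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, L, H):
    """One SMO step for the pair (alpha1, alpha2), following the simplified SMO
    handout [2]: E_i = f(x_i) - y_i are the current prediction errors and
    k_ij = k(x_i, x_j). Returns the updated pair (or the old one if no progress)."""
    eta = 2.0 * k12 - k11 - k22                  # second derivative of W along the constraint line
    if eta >= 0:                                 # degenerate direction: skip this pair
        return a1, a2
    a2_new = clip(a2 - y2 * (E1 - E2) / eta, L, H)
    a1_new = a1 + y1 * y2 * (a2 - a2_new)        # keep a1*y1 + a2*y2 constant
    return a1_new, a2_new
```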
Conclusion
• SMO
 - scales to large data sets.
 - makes SVM training computationally inexpensive.
 - can be implemented easily because it has a simple training structure (coordinate ascent).
Reference
• [1] Platt, John. Fast Training of Support Vector Machines using Sequential Minimal Optimization, in Advances in Kernel Methods – Support Vector Learning, B. Scholkopf, C. Burges, A. Smola, eds., MIT Press (1998).
• [2] The Simplified SMO Algorithm, Machine Learning course @ Stanford. http://www.stanford.edu/class/cs229/materials/smo.pdf