DA 2111 – Statistical & Machine Learning
Lecture 7 – Support Vector Machines (SVM)
Maninda Edirisooriya
manindaw@uom.lk
Classification Problem (remember?)
• Say we have two X variables
• In Binary Classification our goal
is to classify data points into two
known classes, Positive or
Negative
• When we can separate the classes
with a linear decision boundary
we call them Linearly Separable
[Figure: two classes in the (X1, X2) plane separated by the linear decision boundary Y = β0 + β1*X1 + β2*X2 = 0 (decision boundary equation); the region Y > 0 is classified Positive and Y < 0 Negative]
SVM Related Notation
• To make SVM related math easier we have to divide 𝛃 parameters into two types
• 𝛃0 as b or intercept
• 𝛃1, 𝛃2, … 𝛃n as W1 , W2 , … Wn or coefficients
• Say, W = [W1 , W2 , … Wn ]T
• Then, 𝜷 = [𝛽0, 𝛽1, 𝛽2, …, 𝛽n]T = [b, W1, W2, …, Wn]T, i.e. b stacked on top of W
SVM Classification
• Let’s consider a Linearly
Separable case (i.e. Hard SVM)
• SVM tries to find the hyperplane
(i.e. W and b) maximizing the
minimum distance from the
hyperplane to its nearest data
points
• The nearest data points (those that
satisfy Yi(WT*Xi + b) = 1, i.e. lie
exactly on the margin) are called
Support Vectors
[Figure: two classes separated by the hyperplane Y = b + W1*X1 + W2*X2 = WT*X + b = 0 (SVM decision boundary/hyperplane equation); points with Y > 1 are classified Positive, Y < -1 Negative, and the Support Vectors lie on the margin lines]
Primal Problem
• The SVM Decision Boundary can be represented in a geometric sense, and
optimized as follows to find W and b
• Minimize ½∥W∥² + C·Σi=1..N ξi
• Subject to
1. Yi(WTXi+b) ≥ 1 − ξi for all i=1, 2, ..., N
2. ξi ≥ 0 for all i=1, 2, ..., N
• This is the Geometric Problem or the Primal Problem of SVMs
• Here,
• ∥W∥ is the Euclidean Norm of the Weight Vector, W = [W1 , W2 , … Wn]T
• In order to allow misclassified data points (i.e. to generalize to a Soft SVM), ξi,
known as the Slack Variable, is subtracted from 1
• C is the Regularization Parameter that weights the total slack (an L1 penalty
on the ξi values)
• The goal is to minimize the objective function while ensuring that all data
points are correctly classified with a margin of at least 1 − ξi
Primal Problem
• By minimizing ∥W∥ we mathematically maximize the margin, since the
distance from the solution hyperplane to the support vectors is 1/∥W∥
• The constraint Yi(WTXi+b) ≥ 1 tries to maintain at least a unit
(functional) margin between the hyperplane and its nearest vectors (Support
Vectors)
• In the constraint, subtracting the term ξi from 1 (on the right hand side)
allows Soft SVMs, where this minimum margin of 1 cannot always be
maintained
• The Regularization Parameter C sets the tradeoff between maximizing the
margin and minimizing classification errors
• A smaller C emphasizes a wider margin and tolerates some misclassifications,
while a larger C results in a narrower margin and fewer misclassifications
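The primal objective above can be minimized directly with subgradient descent on the hinge loss. Below is a minimal pure-Python sketch, not the solver production SVMs use (those rely on quadratic programming or SMO); the learning rate, epoch count, and toy data are illustrative assumptions:

```python
def train_linear_svm(X, Y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on the soft-margin primal objective:
    (1/2)*||W||^2 + C * sum_i max(0, 1 - Yi*(W.Xi + b))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, Y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # hinge loss active: point violates the margin
                w = [wj - lr * (wj - C * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * C * yi
            else:            # hinge inactive: only the regularizer shrinks W
                w = [wj - lr * wj for wj in w]
    return w, b

# Toy linearly separable data
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
Y = [1, 1, -1, -1]
w, b = train_linear_svm(X, Y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in X]
```

Each update either pulls W toward C·Yi·Xi (when the point violates the margin) or only shrinks W, which is the effect of the ½∥W∥² regularizer.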
Effect of C
Source: https://stats.stackexchange.com/questions/31066/what-is-the-influence-of-c-in-svms-with-linear-kernel
Modelling Non-Linear Functions
• Though the parameters ξi and C
can accommodate misclassifications,
this can still only model nearly
linearly separable classes
• Highly non-linear classes (e.g. ones
that need a circular decision
boundary) cannot be modelled with
a linear hyperplane
• But such data may be linearly
separable if they are represented in
a higher dimensional space
[Figure: one class surrounding the other in the (X1, X2) plane, separated by the circular decision boundary Y = β0 + β1*X1² + β2*X2² = 0; Y > 0 is classified Positive and Y < 0 Negative]
Separate in a Higher Dimension
Source: https://www.hackerearth.com/blog/developers/simple-tutorial-svm-parameter-tuning-python-r
Dual Problem
• It is difficult to apply that dimension-increasing transformation within the
Primal Problem
• But Primal Problem can be converted to an equivalent problem
known as the Dual Problem or the Functional Problem to find 𝜶𝒊s
(known as Lagrange Multipliers)
• Maximize Σi=1..N αi − ½ Σi=1..N Σj=1..N αi·αj·Yi·Yj·XiTXj
• Subject to
1. C ≥ αi ≥ 0 for all i=1, 2, ..., N and
2. Σi=1..N αi·Yi = 0
Dual Problem
• With the αi values we can,
• Find the Support Vectors and then
• Find the decision boundary (hyperplane)
• Note that here we have
• Only the αi values to be found, which can be far fewer than the W and b
values when the feature space is high dimensional
• The XiTXj term, which can be used for the Kernel Trick (discussed later) to
handle non-linear decision boundaries
• Once we have solved the Dual Problem the solution to the equivalent
Primal Problem can also be found
Solution Function from Dual Problem
• First find the αi values that are greater than zero — these identify the
Support Vectors
• Using these support vectors calculate the weight vector, W = ΣS αi·Yi·Xi
• Where, ΣS : sum over all the support vectors
• Calculate the bias b using any support vector, b = Yi − WTXi
• The solution function (Hypothesis Function) output is defined by
its sign, f(X) = sign(WTX + b), where sign is a function that returns the
sign of a value
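As a small numeric check of these formulas, here is a hand-constructed dual solution in pure Python. The support vectors and α values below are hypothetical, chosen only so that the dual constraint Σ αi·Yi = 0 holds; a real solver would produce them from data:

```python
# Hand-picked toy dual solution: two support vectors on opposite sides,
# with equal alphas so that sum_i alpha_i * Y_i = 0 (hypothetical values).
support_X = [(1.0, 0.0), (-1.0, 0.0)]
support_Y = [1, -1]
alphas    = [0.5, 0.5]

# W = sum over support vectors of alpha_i * Y_i * X_i
W = [sum(a * y * x[d] for a, y, x in zip(alphas, support_Y, support_X))
     for d in range(2)]

# b from any support vector: b = Y_i - W^T X_i
b = support_Y[0] - sum(wd * xd for wd, xd in zip(W, support_X[0]))

def f(x):
    """Hypothesis function: sign of W^T x + b."""
    score = sum(wd * xd for wd, xd in zip(W, x)) + b
    return 1 if score >= 0 else -1
```

With these values W works out to [1.0, 0.0] and b to 0.0, so the decision boundary is the X2 axis and both support vectors sit exactly at functional margin 1.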
Transform to Higher Dimensions
• The data points can be transformed to a higher dimensional space by
applying some mathematical technique
• E.g.: X1, X2 → X1, X2, X3 by defining X3 such that X3 = X1² + X2²
• Then, when the data becomes linearly separable in the new space with a
higher number of dimensions, an SVM can be applied for classification
• But this becomes highly computationally expensive when the number of
dimensions in the new space is very large (e.g.: 10⁶ dimensions) or infinite
• But there is a way to modify the original function to get a similar effect,
without calculating the coordinates in the new high dimensional space
• That is possible by computing the similarity between data points as if they
were in the higher dimensional space, using a technique called the Kernel Trick
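The X3 = X1² + X2² example can be checked in a few lines of Python (the toy points below are made up for illustration):

```python
def lift(point):
    """Map (X1, X2) -> (X1, X2, X3) where X3 = X1^2 + X2^2."""
    x1, x2 = point
    return (x1, x2, x1 ** 2 + x2 ** 2)

# Toy data: one class near the origin, the other on a surrounding ring --
# no straight line separates them in 2-D.
inner = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2)]
outer = [(2.0, 0.0), (0.0, -2.1), (1.5, 1.5)]

# In 3-D the plane X3 = 1 (a linear boundary) separates the two classes.
separable = (all(lift(p)[2] < 1 for p in inner)
             and all(lift(p)[2] > 1 for p in outer))
```

The circular boundary in 2-D has become a flat plane in 3-D, which is exactly the situation a linear SVM can handle.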
Kernel Function
• Kernel Function is a function that measures the similarity/distance
between two data points in a vector space
• A Kernel Function should be Positive Definite as well
• Examples (when X and Y are vectors of the same dimension)
• Linear Kernel: K(X, Y) = XTY
• Polynomial Kernel: K(X, Y) = (XTY + r)ⁿ where r ≥ 0 and n ≥ 1
• Gaussian (Radial Basis Function) Kernel: K(X, Y) = e^(−∥X−Y∥² / (2σ²)) where σ > 0
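The three kernels can be sketched directly from the formulas above (pure Python; the default r, n, and σ values are chosen for illustration):

```python
import math

def linear_kernel(x, y):
    """K(X, Y) = X^T Y"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, r=1.0, n=2):
    """K(X, Y) = (X^T Y + r)^n, with r >= 0 and n >= 1"""
    return (linear_kernel(x, y) + r) ** n

def gaussian_kernel(x, y, sigma=1.0):
    """K(X, Y) = exp(-||X - Y||^2 / (2 * sigma^2)), with sigma > 0"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

For example, linear_kernel([1, 2], [3, 4]) gives 11, and gaussian_kernel(x, x) is always 1.0 — a point is maximally similar to itself.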
Linear Kernel
• A Linear Kernel gives the inner product (or dot product) of two
vectors in the same vector space
• The Linear Kernel does not change the number of dimensions; it measures
the similarity in the original vector space
• Example (say X and Y are vectors where X = [X1, X2]T and Y = [Y1, Y2]T)
• K(X, Y) = XTY = X1Y1 + X2Y2
Polynomial Kernel
• A Polynomial Kernel implicitly maps the X variables to all polynomial
terms up to the degree n of the function
• When r=0, K(X, Y) = (XTY + r)ⁿ becomes (XTY)ⁿ
• When n=1 it becomes a Linear Kernel XTY having a linear decision boundary
• When n=2, it becomes a quadratic kernel and the decision boundary is a
quadratic surface
• When n=3, it becomes a cubic kernel and the decision boundary is a cubic
surface
• And so on …
• When r > 0, r controls the relative weighting of the lower-degree terms
Gaussian (Radial Basis Function) Kernel
• The RBF Kernel corresponds to an implicit feature space with infinitely
many dimensions and can model highly non-linear decision boundaries
• Can also be represented as K(X, Y) = e^(−γ∥X−Y∥²) where γ = 1/(2σ²)
can tune the bias-variance tradeoff
• Low γ relates to a smoother hyperplane OR higher bias and lower variance
• High γ relates to a wiggly hyperplane OR lower bias and higher variance
• Cross-validation can be used to find the best value for 𝜸
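The effect of γ can be seen numerically: with the K(X, Y) = e^(−γ∥X−Y∥²) form, a low γ keeps far-apart points fairly similar (a smoother boundary), while a high γ drives their similarity toward zero (the toy points and γ values below are illustrative):

```python
import math

def rbf(x, y, gamma):
    """K(X, Y) = exp(-gamma * ||X - Y||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = (0.0, 0.0), (3.0, 0.0)    # two points at distance 3
low_g  = rbf(x, y, gamma=0.1)    # ~0.41: still fairly similar -> smooth boundary
high_g = rbf(x, y, gamma=10.0)   # ~0: effectively dissimilar -> wiggly boundary
```

With a high γ each training point only influences its immediate neighbourhood, which is why the fitted hyperplane can wiggle around individual points (low bias, high variance).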
Kernel Trick
• In the Dual Problem, the objective Σi=1..N αi − ½ Σi=1..N Σj=1..N αi·αj·Yi·Yj·XiTXj,
which has to be maximized, contains a Kernel Function applied to the X variables
(i.e. XiTXj, which is a Linear Kernel) that can be replaced with
another Kernel Function
• This new Kernel Function can be a Polynomial Kernel or a Gaussian Kernel
(RBF Kernel) or any other valid Kernel Function
• The new Dual Problem objective to maximize becomes,
Σi=1..N αi − ½ Σi=1..N Σj=1..N αi·αj·Yi·Yj·K(Xi, Xj)
SVM with Kernel
• The SVM with a Kernel becomes,
f(X) = sign(ΣS αi·Yi·K(X, Xi) + b)
• Where,
• K(X, Xi): Kernel function between X and each support vector Xi
• αi: Lagrange multipliers learnt during the training
• ΣS : sum over all the support vectors
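Putting the pieces together, a kernelized hypothesis function can be sketched as follows. The support vectors, α values, and b below are hypothetical stand-ins for what a trained SVM would provide:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """K(X, Y) = exp(-gamma * ||X - Y||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical trained values (a real solver would learn these from data)
support_X = [(1.0, 0.0), (-1.0, 0.0)]
support_Y = [1, -1]
alphas    = [0.5, 0.5]
bias      = 0.0

def classify(x):
    """f(X) = sign( sum over support vectors of alpha_i*Yi*K(X, Xi) + b )"""
    score = sum(a * y * rbf_kernel(x, sv)
                for a, y, sv in zip(alphas, support_Y, support_X)) + bias
    return 1 if score >= 0 else -1
```

Note that prediction only needs the kernel values between the query point and the support vectors; the high dimensional coordinates are never computed, which is the point of the Kernel Trick.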
One Hour Homework
• Officially we have one more hour to cover after the end of the lectures
• Therefore, for this week's extra hour you have a homework
• The Support Vector Machine is an important Machine Learning tool for dealing
with highly non-linear, relatively small datasets
• This lesson only explained the SVM formulas without proving any of them, as that
involves heavy mathematical calculations. You can try the proofs if you like
• Then search for real world applications of SVMs and understand when it
should be used as an ML tool
• Good Luck!
Questions?

Weitere ähnliche Inhalte

Ähnlich wie Extra Lecture - Support Vector Machines (SVM), a lecture in subject module Statistical & Machine Learning

Lecture8-SVMs-PartI-Feb17-2021.pptx
Lecture8-SVMs-PartI-Feb17-2021.pptxLecture8-SVMs-PartI-Feb17-2021.pptx
Lecture8-SVMs-PartI-Feb17-2021.pptxDuniaAbdelaziz
 
Machine learning interviews day2
Machine learning interviews   day2Machine learning interviews   day2
Machine learning interviews day2rajmohanc
 
Machine learning with neural networks
Machine learning with neural networksMachine learning with neural networks
Machine learning with neural networksLet's talk about IT
 
Data Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxData Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxSubrata Kumer Paul
 
course slides of Support-Vector-Machine.pdf
course slides of Support-Vector-Machine.pdfcourse slides of Support-Vector-Machine.pdf
course slides of Support-Vector-Machine.pdfonurenginar1
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machinesNawal Sharma
 
1629 stochastic subgradient approach for solving linear support vector
1629 stochastic subgradient approach for solving linear support vector1629 stochastic subgradient approach for solving linear support vector
1629 stochastic subgradient approach for solving linear support vectorDr Fereidoun Dejahang
 
Introduction to Support Vector Machines
Introduction to Support Vector MachinesIntroduction to Support Vector Machines
Introduction to Support Vector MachinesSilicon Mentor
 
Support vector machine
Support vector machineSupport vector machine
Support vector machineRishabh Gupta
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Dual SVM Problem.pdf
Dual SVM Problem.pdfDual SVM Problem.pdf
Dual SVM Problem.pdfssuser8547f2
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptxsurbhidutta4
 
Support Vector Machine topic of machine learning.pptx
Support Vector Machine topic of machine learning.pptxSupport Vector Machine topic of machine learning.pptx
Support Vector Machine topic of machine learning.pptxCodingChamp1
 
machine learning.pptx
machine learning.pptxmachine learning.pptx
machine learning.pptxAbdusSadik
 
13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptxKarasuLee
 
Constraint satisfaction problems (csp)
Constraint satisfaction problems (csp)   Constraint satisfaction problems (csp)
Constraint satisfaction problems (csp) Archana432045
 
support vector machine algorithm in machine learning
support vector machine algorithm in machine learningsupport vector machine algorithm in machine learning
support vector machine algorithm in machine learningSamGuy7
 

Ähnlich wie Extra Lecture - Support Vector Machines (SVM), a lecture in subject module Statistical & Machine Learning (20)

Lecture8-SVMs-PartI-Feb17-2021.pptx
Lecture8-SVMs-PartI-Feb17-2021.pptxLecture8-SVMs-PartI-Feb17-2021.pptx
Lecture8-SVMs-PartI-Feb17-2021.pptx
 
Machine learning interviews day2
Machine learning interviews   day2Machine learning interviews   day2
Machine learning interviews day2
 
Machine learning with neural networks
Machine learning with neural networksMachine learning with neural networks
Machine learning with neural networks
 
Data Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxData Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptx
 
course slides of Support-Vector-Machine.pdf
course slides of Support-Vector-Machine.pdfcourse slides of Support-Vector-Machine.pdf
course slides of Support-Vector-Machine.pdf
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machines
 
1629 stochastic subgradient approach for solving linear support vector
1629 stochastic subgradient approach for solving linear support vector1629 stochastic subgradient approach for solving linear support vector
1629 stochastic subgradient approach for solving linear support vector
 
Introduction to Support Vector Machines
Introduction to Support Vector MachinesIntroduction to Support Vector Machines
Introduction to Support Vector Machines
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Dual SVM Problem.pdf
Dual SVM Problem.pdfDual SVM Problem.pdf
Dual SVM Problem.pdf
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
 
Support Vector Machine topic of machine learning.pptx
Support Vector Machine topic of machine learning.pptxSupport Vector Machine topic of machine learning.pptx
Support Vector Machine topic of machine learning.pptx
 
machine learning.pptx
machine learning.pptxmachine learning.pptx
machine learning.pptx
 
[ML]-SVM2.ppt.pdf
[ML]-SVM2.ppt.pdf[ML]-SVM2.ppt.pdf
[ML]-SVM2.ppt.pdf
 
13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptx
 
Constraint satisfaction problems (csp)
Constraint satisfaction problems (csp)   Constraint satisfaction problems (csp)
Constraint satisfaction problems (csp)
 
Support Vector Machine.ppt
Support Vector Machine.pptSupport Vector Machine.ppt
Support Vector Machine.ppt
 
svm.ppt
svm.pptsvm.ppt
svm.ppt
 
support vector machine algorithm in machine learning
support vector machine algorithm in machine learningsupport vector machine algorithm in machine learning
support vector machine algorithm in machine learning
 

Mehr von Maninda Edirisooriya

Lecture - 10 Transformer Model, Motivation to Transformers, Principles, and ...
Lecture - 10 Transformer Model, Motivation to Transformers, Principles,  and ...Lecture - 10 Transformer Model, Motivation to Transformers, Principles,  and ...
Lecture - 10 Transformer Model, Motivation to Transformers, Principles, and ...Maninda Edirisooriya
 
Lecture 11 - Advance Learning Techniques
Lecture 11 - Advance Learning TechniquesLecture 11 - Advance Learning Techniques
Lecture 11 - Advance Learning TechniquesManinda Edirisooriya
 
Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...
Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...
Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...Maninda Edirisooriya
 
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Maninda Edirisooriya
 
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Maninda Edirisooriya
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Maninda Edirisooriya
 
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...Maninda Edirisooriya
 
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Maninda Edirisooriya
 
Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...
Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...
Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...Maninda Edirisooriya
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Maninda Edirisooriya
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Maninda Edirisooriya
 
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...Maninda Edirisooriya
 
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...Maninda Edirisooriya
 
Analyzing the effectiveness of mobile and web channels using WSO2 BAM
Analyzing the effectiveness of mobile and web channels using WSO2 BAMAnalyzing the effectiveness of mobile and web channels using WSO2 BAM
Analyzing the effectiveness of mobile and web channels using WSO2 BAMManinda Edirisooriya
 

Mehr von Maninda Edirisooriya (20)

Lecture - 10 Transformer Model, Motivation to Transformers, Principles, and ...
Lecture - 10 Transformer Model, Motivation to Transformers, Principles,  and ...Lecture - 10 Transformer Model, Motivation to Transformers, Principles,  and ...
Lecture - 10 Transformer Model, Motivation to Transformers, Principles, and ...
 
Lecture 11 - Advance Learning Techniques
Lecture 11 - Advance Learning TechniquesLecture 11 - Advance Learning Techniques
Lecture 11 - Advance Learning Techniques
 
Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...
Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...
Lecture 9 - Deep Sequence Models, Learn Recurrent Neural Networks (RNN), GRU ...
 
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
 
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
 
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
 
Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...
Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...
Lecture 6 - Logistic Regression, a lecture in subject module Statistical & Ma...
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
 
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
 
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
 
Analyzing the effectiveness of mobile and web channels using WSO2 BAM
Analyzing the effectiveness of mobile and web channels using WSO2 BAMAnalyzing the effectiveness of mobile and web channels using WSO2 BAM
Analyzing the effectiveness of mobile and web channels using WSO2 BAM
 
WSO2 BAM - Your big data toolbox
WSO2 BAM - Your big data toolboxWSO2 BAM - Your big data toolbox
WSO2 BAM - Your big data toolbox
 
Training Report
Training ReportTraining Report
Training Report
 
GViz - Project Report
GViz - Project ReportGViz - Project Report
GViz - Project Report
 
Mortivation
MortivationMortivation
Mortivation
 
Hafnium impact 2008
Hafnium impact 2008Hafnium impact 2008
Hafnium impact 2008
 
ChatCrypt
ChatCryptChatCrypt
ChatCrypt
 

Kürzlich hochgeladen

Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfJNTUA
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2T.D. Shashikala
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfEr.Sonali Nasikkar
 
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbineLow rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbineAftabkhan575376
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsMathias Magdowski
 
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Lovely Professional University
 
How to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdfHow to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdftawat puangthong
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashidFaiyazSheikh
 
Lesson no16 application of Induction Generator in Wind.ppsx
Lesson no16 application of Induction Generator in Wind.ppsxLesson no16 application of Induction Generator in Wind.ppsx
Lesson no16 application of Induction Generator in Wind.ppsxmichaelprrior
 
Artificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian ReasoningArtificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian Reasoninghotman30312
 
Theory for How to calculation capacitor bank
Theory for How to calculation capacitor bankTheory for How to calculation capacitor bank
Theory for How to calculation capacitor banktawat puangthong
 
Geometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdfGeometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdfJNTUA
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdfKamal Acharya
 
Introduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoIntroduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoAbhimanyu Sangale
 
Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdfKamal Acharya
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 
Electrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission lineElectrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission lineJulioCesarSalazarHer1
 
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...drjose256
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdfKamal Acharya
 

Kürzlich hochgeladen (20)

Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
 
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbineLow rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...

Extra Lecture - Support Vector Machines (SVM), a lecture in subject module Statistical & Machine Learning

  • 1. DA 2111 – Statistical & Machine Learning Lecture 7 – Support Vector Machines (SVM) Maninda Edirisooriya manindaw@uom.lk
  • 2. Classification Problem (remember?)
  • Say we have two X variables
  • In Binary Classification our goal is to classify data points into two known classes, Positive or Negative
  • When we can separate the classes with a linear decision boundary, we call the data Linearly Separable
  • Decision boundary equation: Y = β0 + β1*X1 + β2*X2 = 0, with Y > 0 classified Positive and Y < 0 classified Negative
  • 3. SVM Related Notation
  • To make the SVM related math easier we divide the β parameters into two types:
    • β0 as b, the intercept
    • β1, β2, …, βn as W1, W2, …, Wn, the coefficients
  • Say, W = [W1, W2, …, Wn]T
  • Then, β = [β0, β1, β2, …, βn]T = [b, WT]T
  • 4. SVM Classification
  • Let’s consider a Linearly Separable case (i.e. Hard-margin SVM)
  • SVM tries to find the hyperplane (i.e. W and b) maximizing the minimum distance from the hyperplane to its nearest data points
  • The nearest data points (which satisfy |WTX + b| = 1, i.e. lie exactly on the margin) are called Support Vectors
  • SVM decision boundary/hyperplane equation: Y = b + W1*X1 + W2*X2 = WTX + b = 0, with Y ≥ 1 Positive and Y ≤ -1 Negative
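As a small numeric illustration of the margin idea above, the geometric distance from a point to the hyperplane WTX + b = 0 is |WTX + b| / ∥W∥, so a support vector (where |WTX + b| = 1) sits exactly 1/∥W∥ away. A minimal sketch, assuming illustrative values W = [3, 4] and b = -5 that are not from the lecture:

```python
import numpy as np

# Hypothetical hyperplane for illustration: W = [3, 4], b = -5 (so ||W|| = 5)
W = np.array([3.0, 4.0])
b = -5.0

def geometric_distance(x, W, b):
    """Distance from point x to the hyperplane W.x + b = 0."""
    return abs(W @ x + b) / np.linalg.norm(W)

# A support vector satisfies |W.x + b| = 1: here W.x + b = 6 - 5 = 1
x_sv = np.array([2.0, 0.0])
print(geometric_distance(x_sv, W, b))   # 1/||W|| = 1/5 = 0.2
print(2 / np.linalg.norm(W))            # full margin width 2/||W|| = 0.4
```

This is why maximizing the margin and shrinking ∥W∥ are the same objective.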
  • 5. Primal Problem
  • The SVM Decision Boundary can be represented in a geometric sense and optimized as follows to find W and b
  • Minimize (1/2)∥W∥² + C Σᵢ ξᵢ  (i = 1, 2, ..., N)
  • Subject to
    1. Yi(WTXi + b) ≥ 1 − ξi for all i = 1, 2, ..., N
    2. ξi ≥ 0 for all i = 1, 2, ..., N
  • This is the Geometric Problem or the Primal Problem of SVMs
  • Here,
    • ∥W∥ is the Euclidean Norm of the Weight Vector, W = [W1, W2, …, Wn]T
    • In order to allow misclassified data points (i.e. to generalize to a Soft-margin SVM), ξi, known as the Slack Variable, is subtracted from 1 in each constraint
    • C is the Regularization Parameter controlling the penalty on the total slack (an L1 penalty on the ξi)
  • The goal is to minimize the objective function while ensuring that all data points are classified with a margin of at least 1 − ξi
  • 6. Primal Problem
  • By minimizing ∥W∥ we mathematically maximize the margin, i.e. the distance 1/∥W∥ between the solution hyperplane and the support vectors
  • The constraint part Yi(WTXi + b) ≥ 1 maintains at least a unit (functional) distance between the hyperplane and its nearest vectors (the Support Vectors)
  • In the constraint, subtracting the term ξi from 1 (on the right hand side) allows Soft-margin SVMs, where this minimum gap of 1 cannot always be maintained
  • The Regularization Parameter C sets the tradeoff between maximizing the margin and minimizing classification errors
  • A smaller C emphasizes a wider margin and tolerates some misclassifications, while a larger C results in a narrower margin and fewer misclassifications
  • 7. Effect of C Source: https://stats.stackexchange.com/questions/31066/what-is-the-influence-of-c-in-svms-with-linear-kernel
  • 8. Modelling Non-Linear Functions
  • Though the parameters ξi and C can absorb some misclassifications, this can still only model nearly linearly separable classes
  • Highly non-linear classes (e.g. those needing a circular decision boundary) cannot be modelled with a linear hyperplane
  • But such data may be linearly separable if they are represented in a higher dimensional space
  • Circular decision boundary: Y = β0 + β1*X1² + β2*X2², with Y > 0 Positive and Y < 0 Negative
  • 9. Separate in a Higher Dimension Source: https://www.hackerearth.com/blog/developers/simple-tutorial-svm-parameter-tuning-python-r
  • 10. Dual Problem
  • It is difficult to do that dimension-increasing transformation with the Primal Problem
  • But the Primal Problem can be converted to an equivalent problem known as the Dual Problem or the Functional Problem, solved for the αi (known as Lagrange Multipliers)
  • Maximize Σᵢ αi − (1/2) Σᵢ Σⱼ αi αj Yi Yj XiTXj  (i, j = 1, 2, ..., N)
  • Subject to
    1. 0 ≤ αi ≤ C for all i = 1, 2, ..., N and
    2. Σᵢ αi Yi = 0
  • 11. Dual Problem
  • With the αi values we can,
    • Find the Support Vectors and then
    • Find the decision boundary (hyperplane)
  • Note that here we have
    • Only the αi values to be found, which are fewer in number than the W and b values when the feature space is high dimensional
    • The XiTXj term, which enables the Kernel Trick (discussed later) to handle non-linear decision boundaries
  • Once we have solved the Dual Problem, the solution to the equivalent Primal Problem can also be found
  • 12. Solution Function from Dual Problem
  • First find the αi that are greater than zero; their data points are the Support Vectors
  • Using these support vectors calculate the weight vector, W = Σ_S αi Yi Xi
    • Where Σ_S denotes a sum over all the support vectors
  • Calculate the bias b using any support vector, b = Yi − WTXi
  • As the solution function (Hypothesis Function) output is defined by its sign, f(X) = sign(WTX + b), where sign is a function that returns the sign of a value
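For intuition, the dual can be solved by hand on a tiny dataset and then plugged into the recovery formulas above. A minimal sketch, where the two points and their analytic solution α = 1/2 are an illustrative example, not from the lecture:

```python
import numpy as np

# Two points: x1 = (1, 0) with y1 = +1 and x2 = (-1, 0) with y2 = -1.
# The constraint sum(alpha_i * y_i) = 0 forces alpha1 = alpha2 = a, so the
# dual objective reduces to 2a - 2a^2, maximized at a = 1/2 (both points
# end up as support vectors).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
Y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

# Recovery formulas from the slide:
W = (alpha * Y) @ X        # W = sum_S alpha_i * y_i * x_i
b = Y[0] - W @ X[0]        # b = y_i - W.x_i, using any support vector

print(W, b)                # W = [1. 0.], b = 0.0
print(np.sign(X @ W + b))  # recovers the labels [ 1. -1.]
```

The recovered hyperplane x1 = 0 sits midway between the two points, with both exactly on the margin, as expected.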
  • 13. Transform to Higher Dimensions
  • The data points can be transformed to a higher dimensional space by applying some mathematical technique
    • E.g.: X1, X2 → X1, X2, X3 by defining X3 such that X3 = X1² + X2²
  • When the data become linearly separable in the new space with a higher number of dimensions, an SVM can be applied for classification
  • But this becomes highly computationally expensive when the number of dimensions in the new space is very large (e.g. 10⁶ dimensions) or infinite
  • However, there is a way to modify the original function to get the same effect, without ever calculating the coordinates in the new high dimensional space
  • That is possible by simply computing the similarities between data points as if they were in the higher dimensional space, using a technique called the Kernel Trick
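The X3 = X1² + X2² lift can be seen numerically: an inner cluster surrounded by an outer ring is not linearly separable in 2-D, but a single threshold on X3 separates it in 3-D. A minimal sketch with illustrative toy data (the radii 0.5 and 2.0 are arbitrary choices):

```python
import numpy as np

# Toy data: points on an inner circle (radius 0.5) and an outer circle (radius 2.0)
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 50)
inner = np.column_stack([0.5 * np.cos(theta), 0.5 * np.sin(theta)])
outer = np.column_stack([2.0 * np.cos(theta), 2.0 * np.sin(theta)])

# The lifted third coordinate X3 = X1^2 + X2^2 is just the squared radius
x3_inner = (inner ** 2).sum(axis=1)   # = 0.25 for every inner point
x3_outer = (outer ** 2).sum(axis=1)   # = 4.0 for every outer point

# The plane X3 = 1 is a linear boundary in the lifted space; back in 2-D it
# corresponds to the circle X1^2 + X2^2 = 1
print(x3_inner.max() < 1 < x3_outer.min())   # True: linearly separable in 3-D
```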
  • 14. Kernel Function
  • A Kernel Function is a function that measures the similarity/distance between two data points in a vector space
  • A Kernel Function should be Positive Definite as well
  • Examples (when X and Y are vectors of the same dimension)
    • Linear Kernel: K(X, Y) = XTY
    • Polynomial Kernel: K(X, Y) = (XTY + r)ⁿ where r ≥ 0 and n ≥ 1
    • Gaussian (Radial Basis Function) Kernel: K(X, Y) = exp(−∥X − Y∥² / (2σ²)) where σ > 0
  • 15. Linear Kernel
  • A Linear Kernel gives the inner product (or dot product) of two vectors in the same vector space
  • The Linear Kernel does not change the number of dimensions in which the similarity between the two vectors is computed
  • Example (say X and Y are vectors where X = [X1, X2]T and Y = [Y1, Y2]T):
    • K(X, Y) = XTY = X1Y1 + X2Y2
  • 16. Polynomial Kernel
  • A Polynomial Kernel implicitly maps the X variables to polynomial features up to the degree of the kernel
  • When r = 0, K(X, Y) = (XTY + r)ⁿ becomes (XTY)ⁿ where
    • When n = 1 it becomes a Linear Kernel XTY, having a linear decision boundary
    • When n = 2 it becomes a quadratic kernel, and the decision boundary is a quadratic surface
    • When n = 3 it becomes a cubic kernel, and the decision boundary is a cubic surface
    • And so on …
  • When r > 0, r shifts the decision boundary (it weights the lower-degree terms in the implicit feature map)
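The implicit mapping can be checked numerically: for 2-D inputs, the degree-2 kernel (XTY + 1)² equals an ordinary dot product after an explicit 6-dimensional quadratic feature map φ. A minimal sketch with arbitrary example vectors:

```python
import numpy as np

# Explicit feature map whose dot product reproduces (x.y + 1)^2 for 2-D inputs
def phi(x):
    x1, x2 = x
    return np.array([1, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_trick = (x @ y + 1) ** 2        # kernel: computed in the original 2-D space
k_explicit = phi(x) @ phi(y)      # same value via the 6-D feature space
print(k_trick, k_explicit)        # both 4.0
```

This is the point of the trick: the left side never constructs the 6-D vectors, yet yields the same similarity.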
  • 17. Gaussian (Radial Basis Function) Kernel
  • The RBF Kernel corresponds to an infinite dimensional feature space and can model highly non-linear decision boundaries
  • Can also be represented as K(X, Y) = exp(−γ∥X − Y∥²), where γ can tune the bias-variance tradeoff
    • Low γ relates to a smoother hyperplane, i.e. higher bias and lower variance
    • High γ relates to a wiggly hyperplane, i.e. lower bias and higher variance
  • Cross-validation can be used to find the best value for γ
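A small numeric illustration of how γ controls the RBF kernel's notion of "nearby" (the example points and γ values here are arbitrary choices):

```python
import numpy as np

def rbf(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])       # squared distance 2 from x

print(rbf(x, x, gamma=1.0))    # 1.0: any point is maximally similar to itself
print(rbf(x, y, gamma=0.1))    # high similarity: low gamma treats y as "near"
print(rbf(x, y, gamma=10.0))   # nearly 0: high gamma treats the same y as "far"
```

With high γ only very close points look similar, so the boundary bends around individual samples (low bias, high variance); with low γ distant points still look similar, smoothing the boundary.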
  • 18. Kernel Trick
  • In the Dual Problem, the objective to be maximized, Σᵢ αi − (1/2) Σᵢ Σⱼ αi αj Yi Yj XiTXj, contains a Kernel Function applied between the X data points (i.e. XiTXj, which is a Linear Kernel) that can be replaced with another Kernel Function
  • This new Kernel Function can be a Polynomial Kernel, a Gaussian (RBF) Kernel, or any other valid Kernel Function
  • The new Dual Problem objective to maximize becomes, Σᵢ αi − (1/2) Σᵢ Σⱼ αi αj Yi Yj K(Xi, Xj)
  • 19. SVM with Kernel
  • The SVM with a Kernel will become, f(X) = Σ_S αi Yi K(X, Xi) + b
  • Where,
    • K(X, Xi): Kernel Function between X and each support vector Xi
    • αi: Lagrange Multipliers learnt during the training
    • Σ_S: a sum over all the support vectors
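The kernelized decision function above can be sketched generically and, with a Linear Kernel, checked against the explicit WTX + b form it generalizes. The support vectors, αi, and b below are illustrative hand-set values, not from the lecture:

```python
import numpy as np

def decision(x, sv_X, sv_alpha, sv_Y, b, kernel):
    """f(x) = sum_S alpha_i * y_i * K(x, x_i) + b over the support vectors."""
    return sum(a * y * kernel(x, xi)
               for a, y, xi in zip(sv_alpha, sv_Y, sv_X)) + b

linear = lambda u, v: u @ v                  # linear kernel K(u, v) = u.v

sv_X = np.array([[1.0, 0.0], [-1.0, 0.0]])   # hand-set support vectors
sv_Y = np.array([1.0, -1.0])
sv_alpha = np.array([0.5, 0.5])
b = 0.0
W = (sv_alpha * sv_Y) @ sv_X                 # explicit weight vector [1, 0]

x_new = np.array([0.3, -2.0])
print(decision(x_new, sv_X, sv_alpha, sv_Y, b, linear))  # 0.3
print(W @ x_new + b)                                     # same value: 0.3
```

Swapping `linear` for a polynomial or RBF kernel changes nothing else in the function, which is why only the support vectors and their αi need to be stored.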
  • 20. One Hour Homework
  • Officially we have one more hour to go after the end of the lectures
  • Therefore, for this week’s extra hour you have homework
  • The Support Vector Machine is an important Machine Learning tool for dealing with highly non-linear, relatively small datasets
  • This lesson only presented the SVM formulas without proving any of them, as that involves heavy mathematical derivations. You can try them if you like
  • Then search for real world applications of SVMs and understand when it should be used as an ML tool
  • Good Luck!