Hiba BELLAFKIH 2022-2023
Support Vector Machines
Supervised Machine Learning
Contents
01 Classification
02 Motivation
03 Linearly Separable Data Points: Hard Margin Classification
04 Linearly Separable Data Points: Soft Margin Classification
05 Non-Linearly Separable Data: Kernel Trick
06 SVM Regression
Classification
Binary Classification
The machine classifies each instance as exactly one of two classes, conventionally labeled +1 and -1.
Classification
Multiclass Classification (one vs. all)
The machine classifies each instance as exactly one of three or more classes. One binary classifier is trained per class, separating that class from all the others, so N classes require N classifiers.
Classification
Multiclass Classification (one vs. one)
We split a multiclass problem into one binary classification problem per pair of classes.
Classes: [Red, Blue, Green, Yellow]
❏ Red vs Blue
❏ Red vs Green
❏ Red vs Yellow
❏ Blue vs Green
❏ Blue vs Yellow
❏ Green vs Yellow
This yields N(N-1)/2 classifiers for N classes, as the short sketch below enumerates.
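A minimal sketch of this enumeration (pure Python, using the example classes above):

from itertools import combinations

classes = ["Red", "Blue", "Green", "Yellow"]

# One binary problem per unordered pair of classes.
pairs = list(combinations(classes, 2))
for a, b in pairs:
    print(f"{a} vs {b}")

n = len(classes)
assert len(pairs) == n * (n - 1) // 2  # 4 classes -> 6 classifiers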
Motivation
Classified training data set
(Example with well-separated data points)
Best separator to be chosen.
How do we choose the separator?
Motivation
We add a margin to each side of the separator.
Wanted result:
➔ The separator with the largest margin to the nearest data point.
➔ The lowest generalization error.
Generalization error: a measure of how accurately an algorithm is able to predict outcome values for previously unseen data.
Motivation
[Figure: data points in the plane of features X1 and X2, classified +1 or -1, with the decision boundary and the support vectors that define the margin.]
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
Our data form: a training set of points x_i with labels y_i ∈ {+1, -1}. The SVM problem is then the following constrained optimization:
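The equations on this slide are images in the source deck; the standard hard-margin formulation they correspond to is:

\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^2
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n.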
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
The margin is obtained by subtracting the distance from the origin to hyperplane H2 from the distance from the origin to hyperplane H1, which gives M = 2/‖w‖.
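Written out (a standard derivation, assuming H1: w⊺x + b = +1 and H2: w⊺x + b = -1 are the two margin hyperplanes):

d(H_1, O) = \frac{|1 - b|}{\lVert w\rVert}, \qquad
d(H_2, O) = \frac{|{-1} - b|}{\lVert w\rVert};

equivalently, the distance between the two parallel hyperplanes is

M = \frac{|1 - (-1)|}{\lVert w\rVert} = \frac{2}{\lVert w\rVert}.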
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
1. Formulation of the SVM problem
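The formulation itself is an image in the deck; the standard Lagrangian of the hard-margin problem, with one multiplier λ_i ≥ 0 per constraint, is:

L(w, b, \lambda) = \frac{1}{2}\lVert w\rVert^2 - \sum_{i=1}^{n} \lambda_i \left[\, y_i (w^\top x_i + b) - 1 \,\right], \qquad \lambda_i \ge 0.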
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
2. Finding the parameters with respect to w, b, and λ (the learning parameters).
The method of Lagrange multipliers finds the points where the gradient of a function is parallel to the gradients of its constraints, while the constraints themselves are satisfied.
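Setting the partial derivatives of the Lagrangian to zero (the standard stationarity conditions; the deck's own equations are images) gives:

\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \lambda_i\, y_i\, x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \lambda_i\, y_i = 0.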
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
3. Finding the values of the parameters that minimize ‖w‖.
Finding λ* => finding w*
Finding λ* => finding b*
Result: we switch to optimizing λ.
Solution? The dual optimization formulation.
When we move from the primal to the dual formulation, we switch from minimizing over (w, b) to maximizing the objective over λ.
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
1. Formulation and substitution of the values from the primal problem.
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
2. Simplify the objective after substitution.
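Substituting w = Σᵢ λᵢyᵢxᵢ and Σᵢ λᵢyᵢ = 0 into the Lagrangian yields the standard dual (the deck's derivation is an image; this is its usual end point):

\max_{\lambda}\ \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j\, y_i y_j\, x_i^\top x_j
\quad \text{subject to} \quad \lambda_i \ge 0, \quad \sum_{i=1}^{n} \lambda_i y_i = 0.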
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
3. Final optimization to get the value of λ.
The maximization above can be solved with the SMO (Sequential Minimal Optimization) algorithm.
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
Once we have the value of λ, we can get w from the equation below:
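The equation is an image in the deck; from the stationarity condition above it is:

w^* = \sum_{i=1}^{n} \lambda_i^*\, y_i\, x_i,

where only the support vectors have λᵢ* > 0, so the sum effectively runs over them.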
Linearly Separable Data Points
Hard Margin Classification
Mathematical Interpretation of Optimal Hyperplane
Using the values of w and λ, we then calculate b as follows:
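The deck's equation is again an image; the standard recovery uses any support vector x_s (one with λ_s* > 0), for which the margin constraint holds with equality:

y_s \left( {w^*}^\top x_s + b^* \right) = 1 \;\Rightarrow\; b^* = y_s - {w^*}^\top x_s,

since y_s ∈ {+1, -1} implies 1/y_s = y_s. In practice b* is often averaged over all support vectors for numerical stability.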
Linearly Separable Data Points
Soft Margin Classification
The reason behind it?
Hard margin ⇔ perfect separation ⇔ overfitting.
The goal is to allow the model to make a few mistakes while classifying the points.
Linearly Separable Data Points
Soft Margin Classification
Mathematical Interpretation of Optimal Hyperplane
Add a slack variable ξ (xi) for each data point as a penalty for violating the margin: ξ = 0 means the point is correctly classified and outside the margin, 0 < ξ ≤ 1 means it lies inside the margin but on the correct side, and ξ > 1 means it is misclassified.
Linearly Separable Data Points
Soft Margin Classification
Mathematical Interpretation of Optimal Hyperplane
The slack of every point should be as small as possible; the total slack is regularized by the hyperparameter C (a small sketch of the trade-off follows the note below).
❏ If C = 0, slack is not penalized at all, so the classifier can place the hyperplane anywhere and accept arbitrarily large misclassifications; the resulting decision boundary underfits.
❏ If C is infinitely high, even small slacks are heavily penalized and the classifier cannot afford to misclassify any point, so it overfits. Choosing C well is therefore important.
In machine learning, a hyperparameter is a parameter whose value is used to control the learning process.
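A minimal sketch of this trade-off, assuming scikit-learn's SVC on synthetic data (the dataset and parameter values are illustrative; note that scikit-learn requires C > 0, so the slide's C = 0 is best read as the limit of very small C):

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two overlapping blobs: not perfectly separable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 1000.0):
    # Small C: slack is cheap -> wide margin, risk of underfitting.
    # Large C: slack is expensive -> narrow margin, risk of overfitting.
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")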
Linearly Separable Data Points
Soft Margin Classification
Mathematical Interpretation of Optimal Hyperplane
The new formulation becomes:
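The formulation on this slide is an image; the standard soft-margin primal it refers to is:

\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i (w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.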
Linearly Separable Data Points
Soft Margin Classification
Mathematical Interpretation of Optimal Hyperplane
A primal, gradient-based optimization method uses gradient descent to update the parameters of the classifier. The optimization algorithm becomes:
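A minimal sketch of such a primal update, assuming the usual hinge-loss objective ½‖w‖² + C Σᵢ max(0, 1 - yᵢ(w⊺xᵢ + b)) and plain (sub)gradient descent; this is an illustration, not necessarily the deck's exact algorithm:

import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=1000):
    """Subgradient descent on the soft-margin primal objective."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                 # points violating the margin
        yv = y[viol][:, None]              # labels of violators, as a column
        # Subgradient of 1/2 ||w||^2 + C * sum of hinge losses:
        grad_w = w - C * (yv * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy usage with two separable points; labels must be +1 / -1.
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))  # expected: [ 1. -1.]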
Non-Linearly Separable Data Points
Kernel Trick
SVM Visualization
[Figure: SVM visualization.]
Non-Linearly Separable Data Points
Kernel Trick
[Diagram: sizes of the objects involved.]
❏ Original data: N × F matrix (N points, F features)
❏ Original inner products: N² matrix
❏ Transformed data: N × f matrix (f = dimension of the transformed space, typically much larger than F)
❏ Transformed inner products: N² matrix
? The question the kernel trick answers: can we obtain the N² transformed inner products directly from the original data, without ever materializing the N × f transformed data?
Non-Linearly Separable Data Points
Kernel Trick
Mercer's Theorem
According to Mercer's theorem, if a function K(a, b) respects a few mathematical conditions called Mercer's conditions (e.g., K must be continuous and symmetric in its arguments so that K(a, b) = K(b, a), etc.), then there exists a function ϕ that maps a and b into another space (possibly with much higher dimensions) such that K(a, b) = ϕ(a)⊺ϕ(b). You can use K as a kernel because you know ϕ exists, even if you don't know what ϕ is.
Non-Linearly Separable Data Points
Kernel Trick
Mathematical Interpretation of Optimal Hyperplane
The optimization problem becomes:
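The slide's equation is an image; the usual kernelized dual replaces every inner product xᵢ⊺xⱼ with K(xᵢ, xⱼ) (with the soft margin, the multipliers are additionally box-constrained by C):

\max_{\lambda}\ \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j\, y_i y_j\, K(x_i, x_j)
\quad \text{subject to} \quad 0 \le \lambda_i \le C, \quad \sum_{i=1}^{n} \lambda_i y_i = 0.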
Non-Linearly Separable Data Points
Kernel Trick
Most commonly used kernels (their usual formulas are sketched below):
❏ Linear kernel
❏ Polynomial kernel function
❏ Gaussian function
❏ Gaussian radial basis function (RBF)
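The kernels appear as images in the deck; their standard forms, with hyperparameters γ, r, d, and σ, are:

\text{Linear:} \quad K(a, b) = a^\top b
\text{Polynomial:} \quad K(a, b) = (\gamma\, a^\top b + r)^d
\text{Gaussian:} \quad K(a, b) = \exp\!\left( -\frac{\lVert a - b \rVert^2}{2\sigma^2} \right)
\text{Gaussian RBF:} \quad K(a, b) = \exp\!\left( -\gamma\, \lVert a - b \rVert^2 \right)

The last two are the same kernel under the reparameterization γ = 1/(2σ²).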
SVM Regression
Concept
To use SVMs for regression instead of classification, the trick is to reverse the objective: instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM Regression tries to fit as many instances as possible on the street while limiting margin violations (i.e., instances off the street). The width of the street is controlled by a hyperparameter, ϵ.
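A minimal sketch of this idea, assuming scikit-learn's SVR on synthetic data (the dataset and values are illustrative):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# A larger epsilon widens the street, so more points fit inside it
# and fewer points end up as support vectors (on or outside the street).
for eps in (0.05, 0.5):
    reg = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(reg.support_)} support vectors")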