Support vector machine (SVM) is a powerful machine learning method for data classification. Using it in applied research is easy, but understanding it deeply enough to extend it takes considerable effort. This report is a tutorial on support vector machine, complete with mathematical proofs and examples, which helps researchers move from theory to practice as quickly as possible. The report focuses on the theory of optimization, which is the foundation of support vector machine.
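To make the classification idea above concrete, here is a minimal sketch (not taken from the tutorial itself) that trains a soft-margin linear SVM on a toy two-class problem with scikit-learn and reads off the hyperplane and support vectors; the dataset and parameters are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
    y = np.array([-1] * 20 + [1] * 20)

    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    print("hyperplane: w =", clf.coef_[0], "b =", clf.intercept_[0])
    print("support vectors per class:", clf.n_support_)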
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis... (ijfls)
This document summarizes a research paper that proposes a fuzzy bi-objective support vector machine (SVM) model to identify infected COVID-19 patients. The model uses SVM classification with two objectives - maximizing margin between classes and minimizing misclassification errors. An α-cut transforms the fuzzy model into a classical bi-objective problem solved using weighting methods. This generates multiple efficient solutions. An interactive process then identifies the best compromise based on minimizing the number of support vectors in each class. The model constructs a utility function to measure COVID-19 infection levels based on the SVM classification.
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS... (ijfls)
A support vector machine (SVM) learns the decision surface from two different classes of input points. In several applications, some of the input points are misclassified and are not fully allocated to either of the two groups. In this paper, a bi-objective quadratic programming model with fuzzy parameters is utilized, and different feature quality measures are optimized simultaneously. An α-cut is defined to transform the fuzzy model into a family of classical bi-objective quadratic programming problems. The weighting method is used to optimize each of these problems. A major contribution of the proposed fuzzy bi-objective quadratic programming model is that different efficient support vectors are obtained as the weighting values change. The experimental results show the effectiveness of the α-cut with the weighting parameters in reducing the misclassification between the two classes of input points. An interactive procedure is added to identify the best compromise solution from the generated efficient solutions. The main contribution of this paper includes constructing a utility function for measuring the degree of infection with coronavirus disease (COVID-19).
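As a rough illustration of the weighting method mentioned above (the SVM objectives and the α-cut are not reproduced here), the sketch below scalarizes two stand-in quadratic objectives as w*f1 + (1-w)*f2 and sweeps the weight to trace out efficient solutions; f1 and f2 are toy assumptions.

    import numpy as np
    from scipy.optimize import minimize

    f1 = lambda x: (x[0] - 1) ** 2 + x[1] ** 2        # stand-in for the margin term
    f2 = lambda x: x[0] ** 2 + (x[1] - 2) ** 2        # stand-in for the error term

    for w in np.linspace(0.1, 0.9, 5):
        res = minimize(lambda x: w * f1(x) + (1 - w) * f2(x), x0=[0.0, 0.0])
        print(f"w={w:.1f} -> efficient solution x={res.x.round(3)}")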
This document provides an introduction to support vector machines (SVM). It discusses the history and development of SVM, including its introduction in 1992 and popularity due to success in handwritten digit recognition. The document then covers key concepts of SVM, including linear classifiers, maximum margin classification, soft margins, kernels, and nonlinear decision boundaries. Examples are provided to illustrate SVM classification and parameter selection.
This document provides an introduction to support vector machines (SVM). It discusses the history and key concepts of SVM, including how SVM finds the optimal separating hyperplane with maximum margin between classes to perform linear classification. It also describes how SVM can learn nonlinear decision boundaries using kernel tricks to implicitly map inputs to high-dimensional feature spaces. The document gives examples of commonly used kernel functions and outlines the steps to perform classification with SVM.
This document summarizes support vector machines (SVMs), a machine learning technique for classification and regression. SVMs find the optimal separating hyperplane that maximizes the margin between positive and negative examples in the training data. This is achieved by solving a convex optimization problem that minimizes a quadratic function under linear constraints. SVMs can perform non-linear classification by implicitly mapping inputs into a higher-dimensional feature space using kernel functions. They have applications in areas like text categorization due to their ability to handle high-dimensional sparse data.
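The implicit mapping can be made tangible with a small check: for the degree-2 polynomial kernel k(x, z) = (x.z + 1)^2 in two dimensions, the kernel value equals an ordinary inner product under an explicit 6-dimensional feature map, so the mapping never has to be computed. The sketch below verifies this numerically; it is a textbook illustration, not code from the document.

    import numpy as np

    def phi(v):  # explicit feature map for the degree-2 polynomial kernel in 2-D
        x1, x2 = v
        return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                         np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

    x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print((x @ z + 1) ** 2)       # kernel evaluation: 4.0
    print(phi(x) @ phi(z))        # same value via the explicit map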
Real-Time Stock Market Analysis using Spark Streaming (Sigmoid)
This document proposes using support vector machines (SVMs) to model high-frequency limit order book dynamics and predict metrics like mid-price movement and price spread crossing. It describes representing each limit order book entry as a vector of attributes, then using multi-class SVMs to build models for each metric. Experiments on real data show the selected features are effective for short-term price forecasts. The document provides background on SVMs, describing how they find an optimal separating hyperplane to classify data points into labels.
For more info visit us at: http://www.siliconmentor.com/
Support vector machines are widely used binary classifiers known for their ability to handle high-dimensional data; they classify data by separating classes with a hyperplane that maximizes the margin between them. The data points closest to the hyperplane are known as support vectors. Thus the selected decision boundary is the one that minimizes the generalization error (by maximizing the margin between classes).
This document discusses support vector machines (SVMs) and their application in agriculture. It begins with an introduction to SVMs, explaining that they are a supervised machine learning algorithm used for classification and regression. The document then covers key aspects of SVMs including: how they find the optimal separating hyperplane for classification; handling linearly separable and non-separable data using soft-margin hyperplanes and kernels; and common kernel functions. It provides an example application of using an SVM classifier to identify pests in leaf images. In conclusion, the document provides an overview of SVMs and their use in solving agricultural classification problems.
SVM is a supervised learning method that finds a hyperplane with maximum margin to separate classes. It uses kernels to map data to higher dimensions to allow for nonlinear separation. The objective is to minimize training error and model complexity by maximizing the margin between classes. SVMs solve a convex optimization problem that finds support vectors and determines the separating hyperplane using kernels, slack variables, and a cost parameter C to balance margin and errors. Parameter selection, like the kernel and its parameters, affects performance and is typically done through grid search and cross-validation.
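A hedged sketch of that parameter-selection step, using scikit-learn's grid search with 5-fold cross-validation over C and the RBF width gamma (the grid values and dataset are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                        cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)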
Support Vector Machine topic of machine learning.pptx (CodingChamp1)
Support Vector Machines (SVM) find the optimal separating hyperplane that maximizes the margin between two classes of data points. The hyperplane is chosen such that it maximizes the distance from itself to the nearest data points of each class. When data is not linearly separable, the kernel trick can be used to project the data into a higher dimensional space where it may be linearly separable. Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid kernels. Soft margin SVMs introduce slack variables to allow some misclassification and better handle non-separable data. The C parameter controls the tradeoff between margin maximization and misclassification.
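The C trade-off can be observed directly: the sketch below (an illustrative assumption, not from the document) trains linear SVMs with increasing C on overlapping blobs and prints how the number of support vectors shrinks as margin violations are penalized more heavily.

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)
    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        print(f"C={C:<6} support vectors per class: {clf.n_support_}")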
Support vector machines (SVM) are a type of supervised machine learning model that constructs hyperplanes to classify data. Least squares support vector machines (LS-SVM) are a variation of SVM that uses equality constraints instead of inequality constraints, solving a system of linear equations instead of a quadratic programming problem. LS-SVM tends to be more suitable than standard SVM for inseparable data and produces solutions that lack the sparseness of SVM.
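A minimal sketch of that linear-system view, assuming the common Suykens-style LS-SVM classifier formulation (the document does not spell out the equations): the dual variables and bias come from solving one (n+1)-by-(n+1) linear system rather than a QP.

    import numpy as np

    def rbf(A, B, s=1.0):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * s * s))

    def lssvm_fit(X, y, gamma=10.0):
        n = len(y)
        Omega = np.outer(y, y) * rbf(X, X)
        M = np.zeros((n + 1, n + 1))
        M[0, 1:] = y; M[1:, 0] = y
        M[1:, 1:] = Omega + np.eye(n) / gamma
        sol = np.linalg.solve(M, np.r_[0.0, np.ones(n)])
        return sol[0], sol[1:]                       # bias b, multipliers alpha

    def lssvm_predict(X, y, alpha, b, Xnew):
        return np.sign(rbf(Xnew, X) @ (alpha * y) + b)

    X = np.array([[0., 0.], [0., 1.], [3., 3.], [3., 4.]])
    y = np.array([-1., -1., 1., 1.])
    b, alpha = lssvm_fit(X, y)
    print(lssvm_predict(X, y, alpha, b, np.array([[0.2, 0.5], [3.1, 3.5]])))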
2012 mdsp pr13 support vector machine (nozomuhamada)
The document provides a course calendar for a class on Bayesian estimation methods. It lists the dates and topics to be covered over 15 class periods from September to January. The topics progress from basic concepts like Bayes estimation and the Kalman filter to more modern methods like particle filters, hidden Markov models, Bayesian decision theory, and applications of principal component analysis and independent component analysis. One session is noted as having no class.
This document provides an overview of linear classifiers and support vector machines (SVMs) for text classification. It explains that SVMs find the optimal separating hyperplane between classes by maximizing the margin between the hyperplane and the closest data points of each class. The document discusses how SVMs can be extended to non-linear classification through kernel methods and feature spaces. It also provides details on solving the SVM optimization problem and using SVMs for classification.
The document discusses linear and non-linear support vector machine (SVM) algorithms. SVM finds the optimal hyperplane that separates classes with the maximum margin. For linear data, the hyperplane is a straight line, while for non-linear data it is a higher dimensional surface. The data points closest to the hyperplane that determine its position are called support vectors. Non-linear SVM handles non-linear separable data by projecting it into a higher dimension where it may become linearly separable.
Two algorithms to accelerate training of back-propagation neural networks (ESCOM)
This document proposes two algorithms to initialize the weights of neural networks to accelerate training. Algorithm I performs a step-by-step orthogonalization of the input matrices to drive them towards a diagonal form, aiming to place the network closer to the convergence points of the activation function. Algorithm II aims to jointly diagonalize the input matrices to also drive them towards a diagonal form. The algorithms are shown to significantly reduce training time compared to random initialization, though Algorithm I works best when the activation function satisfies φ(0) > 1.
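The paper's two algorithms are not reproduced here; as a generic, hypothetical illustration of orthogonality-based weight initialization, the sketch below draws a random matrix and orthonormalizes it with a QR decomposition.

    import numpy as np

    def orthogonal_init(n_out, n_in, rng=np.random.default_rng(0)):
        # QR of a random Gaussian matrix yields orthonormal rows/columns
        a = rng.normal(size=(n_out, n_in))
        q, _ = np.linalg.qr(a.T if n_out < n_in else a)
        return (q.T if n_out < n_in else q)[:n_out, :n_in]

    W = orthogonal_init(4, 8)
    print(np.round(W @ W.T, 6))   # ~ identity: rows are orthonormal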
The International Journal of Engineering and Science (The IJES) (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE... (gerogepatton)
A support vector machine (SVM) learns the decision surface from two different classes of input points; in several applications, some of the input points are misclassified. In this paper a bi-objective quadratic programming model is utilized, and different feature quality measures are optimized simultaneously using the weighting method to solve the bi-objective quadratic programming problem. An important contribution of the proposed bi-objective quadratic programming model is that different efficient support vectors are obtained by changing the weighting values. The numerical examples give evidence of the effectiveness of the weighting parameters in reducing the misclassification between the two classes of input points. An interactive procedure is added to identify the best compromise solution from the generated efficient solutions.
Support Vector Machines (SVMs) were proposed in 1963 and took shape in the late 1970s as part of statistical learning theory. They became popular in the last decade for classification, regression, and optimization tasks. SVMs find the optimal separating hyperplane that maximizes the margin between positive and negative examples by solving a quadratic programming optimization with linear constraints. This is done efficiently using kernel methods that implicitly map inputs to high-dimensional feature spaces without explicitly computing the mappings. Empirically, SVMs have been shown to generalize well and are among the best performing classifiers.
This document summarizes support vector machines (SVMs). It explains that SVMs find the optimal separating hyperplane that maximizes the margin between two classes of data points. The hyperplane is determined by support vectors, which are the data points closest to the hyperplane. SVMs can be solved as a quadratic programming problem. The document also discusses how kernels can map data into higher dimensional spaces to make non-separable problems separable by SVMs.
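Assuming the standard soft-margin dual (minimize (1/2)a'Pa - 1'a subject to 0 <= a_i <= C and y'a = 0, with P_ij = y_i y_j K(x_i, x_j)), the QP can be handed to an off-the-shelf solver; the sketch below uses cvxopt and a linear kernel as an illustration, assuming cvxopt is available.

    import numpy as np
    from cvxopt import matrix, solvers

    def svm_dual(X, y, C=1.0):
        n = len(y)
        K = X @ X.T                                        # linear kernel matrix
        P = matrix((np.outer(y, y) * K).astype(float))
        q = matrix(-np.ones(n))
        G = matrix(np.vstack([-np.eye(n), np.eye(n)]))     # 0 <= alpha_i <= C
        h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
        A = matrix(y.reshape(1, -1).astype(float))         # sum_i alpha_i y_i = 0
        b = matrix(0.0)
        solvers.options["show_progress"] = False
        alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
        return alpha                                       # support vectors: alpha > 0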
Single to multiple kernel learning with four popular svm kernels (survey) (eSAT Journals)
Abstract: Machine learning applications and pattern recognition have gained great attention recently because of the variety of applications that depend on machine learning techniques; these techniques can make many processes easier and also reduce the amount of human interference (more automation). This paper researches four of the most popular kernels used with Support Vector Machines (SVM) for classification purposes. The survey uses Linear, Polynomial, Gaussian, and Sigmoid kernels, each in a single form and all together as an un-weighted sum of kernels as a form of Multi-Kernel Learning (MKL), with eleven benchmark datasets that have different types of features and different numbers of classes, so some are used for two-class classification (binary classification) and some for multi-class classification. The Shogun machine learning toolbox is used with the Python programming language to perform the classification and to handle pre-classification operations like feature scaling (normalization). Cross-validation is used to find the best-performing method among the suggested kernel methods. To compare the final results, two performance measures are used: classification accuracy and the Area Under the Receiver Operating Characteristic curve (ROC). General basics of SVM and the used kernels with classification parameters are given in the first part of the paper; then the experimental details are explained in steps, after which the experimental results are given, with final histograms representing the differences in the outputs' accuracies and the areas under the ROC curves (AUC). Finally, the best methods obtained are applied to remote sensing datasets and the results are compared to state-of-the-art work published in the field using the same set of data.
Keywords: Machine Learning, Classification, SVM, MKL, Cross Validation and ROC
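The survey runs on the Shogun toolbox; as a rough scikit-learn stand-in for the same protocol, the sketch below scores the four kernels by cross-validated accuracy and ROC AUC after feature scaling (the dataset and fold count are assumptions).

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=1)
    for kernel in ("linear", "poly", "rbf", "sigmoid"):
        clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        acc = cross_val_score(clf, X, y, cv=5).mean()
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{kernel:8s} accuracy={acc:.3f} AUC={auc:.3f}")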
This chapter discusses classification methods including linear discriminant functions and probabilistic generative and discriminative models. It covers linear decision boundaries, perceptrons, Fisher's linear discriminant, logistic regression, and the use of sigmoid and softmax activation functions. The key points are:
1) Classification involves dividing the input space into decision regions using linear or nonlinear boundaries.
2) Perceptrons and Fisher's linear discriminant find linear decision boundaries by updating weights to minimize misclassification.
3) Generative models like naive Bayes estimate joint probabilities while discriminative models like logistic regression directly model posterior probabilities.
4) Sigmoid and softmax functions are used to transform linear outputs into probabilities for binary and multiclass classification respectively (see the sketch below).
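A small numpy sketch of point 4, using the standard textbook definitions rather than anything specific to this chapter: sigmoid squashes one score into a binary-class probability, and softmax normalizes a score vector across classes.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def softmax(a):
        e = np.exp(a - np.max(a))        # shift for numerical stability
        return e / e.sum()

    print(sigmoid(0.0))                          # 0.5: maximally uncertain binary case
    print(softmax(np.array([2.0, 1.0, 0.1])))    # sums to 1 across three classes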
This document provides an introduction to support vector machines (SVMs) for text classification. It discusses how SVMs find an optimal separating hyperplane that maximizes the margin between classes. SVMs can handle non-linear classification through the use of kernels, which map data into a higher-dimensional feature space. The document outlines the mathematical formulations of linear and soft-margin SVMs, explains how the kernel trick allows evaluating inner products implicitly in that feature space, and summarizes how SVMs are used for classification tasks.
Numerical Solutions of Burgers' Equation Project Report (Shikhar Agarwal)
This report summarizes the use of finite element methods to numerically solve Burgers' equation. It introduces finite element methods and the Galerkin method for approximating solutions. MATLAB codes are presented to solve example boundary value problems and differential equations. The method of quasi-linearization is also described for solving Burgers' equation numerically. The report concludes that finite element methods can accurately predict numerical solutions that are close to exact solutions for problems where no closed-form solution exists.
This document provides an overview of Support Vector Machines (SVMs). It discusses how SVMs find the optimal separating hyperplane between two classes that maximizes the margin between them. It describes how SVMs can handle non-linearly separable data using kernels to project the data into higher dimensions where it may be linearly separable. The document also discusses multi-class classification approaches for SVMs and how they can be used for anomaly detection.
This document contains a student assignment submission for a course on Perspective in Informatics. It includes the student's responses to 3 questions:
1) Analyzing different functions to determine if they satisfy the properties of a distance measure, including max(x,y), diff(x,y), and sum(x,y).
2) Computing sketches of vectors using different random vectors and analyzing the estimated vs. true angles between the vectors.
3) Calculating the expected Jaccard similarity of two size-m subsets R and S selected at random from a universe U of n elements (simulated in the sketch below).
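Question 3 can be sanity-checked by simulation; the sketch below (with illustrative values of n and m) draws random size-m subsets repeatedly and averages their Jaccard similarity.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, trials = 100, 10, 20000
    vals = []
    for _ in range(trials):
        R = set(rng.choice(n, size=m, replace=False))
        S = set(rng.choice(n, size=m, replace=False))
        vals.append(len(R & S) / len(R | S))
    print("estimated E[Jaccard] =", np.mean(vals))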
Adversarial Variational Autoencoders to extend and improve generative model -... (Loc Nguyen)
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. Deep generative model (DGM) is a branch of GenAI which is preeminent in generating raster data such as images and sound, due to the strong points of deep neural networks (DNN) in inference and recognition. The built-in inference mechanism of a DNN, which simulates the synaptic plasticity of the human neural network, fosters the generation ability of DGM, which produces surprising results with the support of statistical flexibility. Two popular approaches in DGM are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). Both VAE and GAN have their own strong points, although they share the same underlying statistical theory as well as incredible complexity via the hidden layers of a DNN, where the DNN becomes an effective encoding/decoding function without concrete specifications. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA), in which VAE and GAN complement each other: for instance, VAE is a good data generator by encoding data via the excellent ideology of Kullback-Leibler divergence, and GAN is a significantly important method to assess the reliability of data as realistic or fake. In other words, AVA aims to improve the accuracy of generative models; besides, AVA extends the function of simple generative models. In methodology, this research focuses on a combination of applied mathematical concepts and skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
Conditional mixture model and its application for regression model (Loc Nguyen)
Expectation maximization (EM) algorithm is a powerful mathematical tool for estimating statistical parameters when a data sample contains a hidden part and an observed part. EM is applied to learn a finite mixture model in which the whole distribution of the observed variable is a weighted sum of partial distributions. The coverage ratio of every partial distribution is specified by the probability of the hidden variable. An application of the mixture model is soft clustering, in which a cluster is modeled by the hidden variable, each data point can be assigned to more than one cluster, and the degree of such assignment is represented by the probability of the hidden variable. However, such probability in the traditional mixture model is simplified as a parameter, which can cause loss of valuable information. Therefore, in this research I propose a so-called conditional mixture model (CMM) in which the probability of the hidden variable is modeled as a full probabilistic density function (PDF) that owns an individual parameter. CMM aims to extend the mixture model. I also propose an application of CMM called the adaptive regression model (ARM). The traditional regression model is effective when the data sample is scattered equally. If data points are grouped into clusters, the regression model tries to learn a unified regression function which goes through all data points. Obviously, such a unified function is not effective for evaluating the response variable based on grouped data points. The concept "adaptive" in ARM means that ARM solves this ineffectiveness problem by first selecting the best cluster of data points and then evaluating the response variable within that best cluster. In other words, ARM reduces the estimation space of the regression model so as to gain high accuracy in calculation.
Keywords: expectation maximization (EM) algorithm, finite mixture model, conditional mixture model, regression model, adaptive regression model (ARM).
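A rough sketch of ARM's "select the best cluster, then regress within it" idea, using off-the-shelf pieces as stand-ins for the paper's model (a Gaussian mixture for clustering and one linear regression per cluster; all names and data here are hypothetical):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 1)), rng.normal(6, 1, (50, 1))])
    y = np.r_[2 * X[:50, 0] + 1, -X[50:, 0] + 10] + rng.normal(0, 0.1, 100)

    gm = GaussianMixture(n_components=2, random_state=0).fit(X)
    labels = gm.predict(X)
    models = {k: LinearRegression().fit(X[labels == k], y[labels == k])
              for k in range(2)}

    x_new = np.array([[5.5]])
    k_best = gm.predict(x_new)[0]            # the most responsible cluster
    print(models[k_best].predict(x_new))     # regression inside that cluster only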
Similar to Tutorial on Support Vector Machine (20)
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets (Loc Nguyen)
Collaborative filtering (CF) is a popular technique in recommendation study. Concretely, items which are recommended to a user are determined by surveying her/his communities. There are two main CF approaches: memory-based and model-based. I propose a new model-based CF algorithm that mines frequent itemsets from the rating database, so that items which belong to frequent itemsets are recommended to the user. My CF algorithm gives an immediate response because the mining task is performed in offline process mode. I also propose another so-called Roller algorithm for improving the process of mining frequent itemsets. The Roller algorithm is built on the heuristic assumption "the larger the support of an item is, the more likely it is that this item will occur in some frequent itemset". It is modeled on the white-wash task, which rolls a roller on a wall in such a way that it is capable of picking up frequent itemsets. Moreover, I provide enhanced techniques such as bit representation, bit matching, and bit mining in order to speed up the recommendation process. These techniques take advantage of bitwise operations (AND, NOT) so as to reduce storage space and make the algorithms run faster.
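The bit-representation idea can be sketched with plain Python integers as bitsets: each item's occurrences are packed into one integer, and the support of an itemset becomes a bitwise AND plus a popcount (a generic illustration, not the Roller algorithm itself).

    transactions = [                      # rows: users, columns: items A, B, C
        {"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}, {"A", "B"},
    ]
    items = sorted({i for t in transactions for i in t})
    bits = {i: sum(1 << r for r, t in enumerate(transactions) if i in t)
            for i in items}

    def support(itemset):
        mask = ~0                         # start with all ones, AND item bitsets
        for i in itemset:
            mask &= bits[i]
        return bin(mask & ((1 << len(transactions)) - 1)).count("1")

    print(support({"A", "B"}))            # co-occurrences of A and B: 3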
Simple image deconvolution based on reverse image convolution and backpropaga... (Loc Nguyen)
The deconvolution task is not important in a convolutional neural network (CNN) because it is not imperative to recover the convoluted image when the convolutional layer's role is to extract features. However, the deconvolution task is useful in some cases for inspecting and reflecting a convolutional filter, as well as for trying to improve a generated image when information loss is not serious with regard to the trade-off between information loss and specific features such as edge detection and sharpening. This research proposes a duplicated and reversed process for recovering a filtered image. Firstly, the source layer and target layer are reversed with respect to traditional image convolution so as to train the convolutional filter. Secondly, the trained filter is reversed again to derive a deconvolutional operator for recovering the filtered image. The reverse process is associated with the backpropagation algorithm, which is most popular in learning neural networks. Experimental results show that the proposed technique is better at learning filters that focus on discovering pixel differences. Therefore, the main contribution of this research is to inspect convolutional filters from data.
Technological Accessibility: Learning Platform Among Senior High School Students (Loc Nguyen)
The document discusses technological accessibility among senior high school students. It covers several topics:
- Returning to nature through technology and ensuring technology accessibility is a life skill strengthened by advantages while avoiding problems for youth.
- Solutions to the dilemma of technology accessibility include letting youth develop independently, using security controls, and encouraging compassion.
- Researching career is optional but following passion and balancing positive and negative emotions can contribute to success.
The document discusses how engineering and technology can be used for social impact. It makes three key points:
1. Technology stems from knowledge and science, but assessing the value of knowledge is difficult. However, the importance of technology is clear through its fruits. Education fosters gaining knowledge and leads to technological development.
2. While technology increases wealth, it also widens inequality gaps. However, technology can indirectly address inequality through education by providing universal access to learning and bringing universities to people everywhere.
3. Diversity among countries should be embraced, and late innovations building on existing technologies can be particularly impactful, as was the strategy of a racing champion who overtook opponents in the final round through close observation and
Harnessing Technology for Research Education (Loc Nguyen)
The document discusses harnessing technology for research and education. It notes some paradoxes of technology, such as how it can both increase inequality but also help solve it through education. It discusses the future of education being supported by distance learning tools, AI, and virtual/augmented reality. Open universities are seen as an intermediate step towards "home universities" and as a place where students can become lifelong learners and teachers. Understanding, emotion, and compassion are discussed as being more important than just remembering facts. The document ends by discussing connecting different subjects and technologies like a jigsaw puzzle.
Future of education with support of technology (Loc Nguyen)
The document discusses the future of education with the support of technology. It notes that while technology deepens inequality by benefiting the rich, it can indirectly address inequality through education by allowing anyone to access knowledge and study from anywhere. Emerging forms of educational support include distance learning using tools like Zoom, artificial intelligence assistants, and virtual/augmented reality. Blended teaching combining different methods is emphasized. Understanding is seen as more important than memorization, and education should foster understanding, emotional development, and compassion through nature-based activities. The document argues technology can help education promote love by connecting various topics through both fusion and by assembling them like a jigsaw puzzle.
The document discusses generative artificial intelligence (AI) and its applications, using the metaphor of a dragon's flight. It introduces digital generative AI models that can generate images, sounds, and motions. As an example, it describes a model that could generate possible orbits or movements for a dragon and tiger in an image along with a bamboo background. The document then discusses how generative AI could be applied creatively in education to generate new learning materials and methods. It argues that while technology and societies are changing rapidly, climate change is a unifying challenge that nations must work together to overcome.
Adversarial Variational Autoencoders to extend and improve generative model (Loc Nguyen)
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. Deep generative model (DGM) is a branch of GenAI which is preeminent in generating raster data such as images and sound, due to the strong points of deep neural networks (DNN) in inference and recognition. The built-in inference mechanism of a DNN, which simulates the synaptic plasticity of the human neural network, fosters the generation ability of DGM, which produces surprising results with the support of statistical flexibility. Two popular approaches in DGM are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). Both VAE and GAN have their own strong points, although they share the same underlying statistical theory as well as incredible complexity via the hidden layers of a DNN, where the DNN becomes an effective encoding/decoding function without concrete specifications. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA), in which VAE and GAN complement each other: for instance, VAE is a good generator by encoding data via the excellent ideology of Kullback-Leibler divergence, and GAN is a significantly important method to assess the reliability of data as realistic or fake. In other words, AVA aims to improve the accuracy of generative models; besides, AVA extends the function of simple generative models. In methodology, this research focuses on a combination of applied mathematical concepts and skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
Learning dyadic data and predicting unaccomplished co-occurrent values by mix... (Loc Nguyen)
Dyadic data, which is also called co-occurrence data (COD), contains co-occurrences of objects. Searching for statistical models to represent dyadic data is necessary. Fortunately, the finite mixture model is a solid statistical model to learn and make inferences on dyadic data, because the mixture model is built smoothly and reliably by the expectation maximization (EM) algorithm, which suits the inherent sparseness of dyadic data. This research summarizes mixture models for dyadic data. When each co-occurrence in dyadic data is associated with a value, there are many unaccomplished values because a lot of co-occurrences are inexistent. In this research, these unaccomplished values are estimated as the mean (expectation) of the random variable given the partial probabilistic distributions inside the dyadic mixture model.
Machine learning forks into three main branches: supervised learning, unsupervised learning, and reinforcement learning, where reinforcement learning holds much potential for artificial intelligence (AI) applications because it solves real problems by a progressive process in which possible solutions are improved and fine-tuned continuously. The progressive approach, which reflects the ability of adaptation, is appropriate to the real world, where most events occur and change continuously and unexpectedly. Moreover, data is getting too huge for supervised and unsupervised learning to draw valuable knowledge from at one time. Bayesian optimization (BO) models an optimization problem in a probabilistic form called a surrogate model and then directly maximizes an acquisition function created from that surrogate model, in order to maximize implicitly and indirectly the target function and find the solution of the optimization problem. A popular surrogate model is the Gaussian process regression model. The process of maximizing the acquisition function is based on updating the posterior probability of the surrogate model repeatedly, which improves after every iteration. Taking advantage of an acquisition function or utility function is also common in decision theory, but the semantic meaning behind BO is that BO solves problems by a progressive and adaptive approach via updating the surrogate model from a small piece of data at each time, according to the ideology of reinforcement learning. Undoubtedly, BO is a reinforcement learning algorithm with many potential applications, and thus it is surveyed in this research with attention to its mathematical ideas. Moreover, the solution of the optimization problem is important not only to applied mathematics but also to AI.
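A compact sketch of the loop described above, assuming a Gaussian-process surrogate (via scikit-learn) and an expected-improvement acquisition maximized over a candidate grid; the target function is a toy assumption.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    f = lambda x: -(x - 2.0) ** 2 + 3.0            # toy target to maximize
    Xs = np.linspace(0, 4, 200).reshape(-1, 1)     # candidate grid

    X = np.array([[0.5], [3.5]]); y = f(X).ravel()
    gp = GaussianProcessRegressor()
    for _ in range(10):
        gp.fit(X, y)                               # update the surrogate posterior
        mu, sd = gp.predict(Xs, return_std=True)
        best = y.max()
        z = (mu - best) / np.maximum(sd, 1e-9)
        ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
        x_next = Xs[np.argmax(ei)]                 # next point to evaluate
        X = np.vstack([X, [x_next]]); y = np.append(y, f(x_next))
    print("best x found:", X[np.argmax(y)][0])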
A Proposal of Two-step Autoregressive Model (Loc Nguyen)
Autoregressive (AR) model and conditional autoregressive (CAR) model are specific regressive models in which the independent variables and the dependent variable refer to the same object. They are powerful statistical tools to predict values based on correlation in the time domain and space domain, which is useful in epidemiology analysis. In this research, I combine them in a simple way in which AR and CAR are estimated in two separate steps, so as to cover the time domain and space domain in spatial-temporal data analysis. Moreover, I integrate a logistic model into the CAR model, which aims to improve the competence of autoregressive models.
Extreme bound analysis based on correlation coefficient for optimal regressio... (Loc Nguyen)
Regression analysis is an important tool in statistical analysis, in which there is a demand for discovering essential independent variables among many others, especially when there is a huge number of random variables. Extreme bound analysis is a powerful approach to extract such important variables, called robust regressors. In this research, a so-called Regressive Expectation Maximization with RObust regressors (REMRO) algorithm is proposed as an alternative to other probabilistic methods for analyzing robust variables. With a different ideology from other probabilistic methods, REMRO searches for robust regressors forming the optimal regression model and sorts them in descending order by their fitness values, determined by two proposed concepts of local correlation and global correlation. Local correlation represents sufficient explanatories for possible regressive models, and global correlation reflects the independence level and stand-alone capacity of regressors. Moreover, REMRO can withstand incomplete data because it applies the Regressive Expectation Maximization (REM) algorithm to fill missing values with estimated values, based on the ideology of the expectation maximization (EM) algorithm. From experimental results, REMRO is more accurate for modeling numeric regressors than traditional probabilistic methods like the Sala-i-Martin method, but REMRO cannot yet be applied to nonnumeric regression models in this research.
The document proposes a "jagged stock investment" strategy where stocks are repeatedly purchased at intervals where the price is increasing, similar to rises and falls in a saw blade. It shows mathematically that this strategy can achieve equal or higher returns than bank depositing if the average growth rate and interest rate are the same. The strategy models purchasing a share and its replications over time periods where each replication is bought using the profits from the previous rise. It provides formulas to calculate the overall value and return on investment from this jagged stock investment approach.
This document presents a study on minima distribution and its application to explaining the convergence and convergence speed of optimization algorithms. The author proposes weaker conditions for the convergence and monotonicity properties of minima distribution compared to previous work. Specifically, the convergence condition requires the function Ï(x) to be positive and non-minimizing everywhere except at minimizers of the target function f(x). The monotonicity condition requires f(x) and the derivative of Ï(x) to be inversely proportional. The author then defines convergence speed as the derivative of the integral of f(x) with respect to the optimization iterations and shows it depends on the slope of the derivative of Ï(x).
This is chapter 4 âVariants of EM algorithmâ in my book âTutorial on EM algorithmâ, which focuses on EM variants. The main purpose of expectation maximization (EM) algorithm, also GEM algorithm, is to maximize the log-likelihood L(Î) = log(g(Y|Î)) with observed data Y by maximizing the conditional expectation Q(Îâ|Î). Such Q(Îâ|Î) is defined fixedly in E-step. Therefore, most variants of EM algorithm focus on how to maximize Q(Îâ|Î) in M-step more effectively so that EM is faster or more accurate.
This is the chapter 3 "Properties and convergence of EM algorithm" in my book âTutorial on EM algorithmâ, which focuses on mathematical explanation of the convergence of GEM algorithm given by DLR (Dempster, Laird, & Rubin, 1977, pp. 6-9).
Local optimization with convex function is solved perfectly by traditional mathematical methods such as Newton-Raphson and gradient descent but it is not easy to solve the global optimization with arbitrary function although there are some purely mathematical approaches such as approximation, cutting plane, branch and bound, and interval method which can be impractical because of their complexity and high computation cost. Recently, some evolutional algorithms which are inspired from biological activities are proposed to solve the global optimization by acceptable heuristic level. Among them is particle swarm optimization (PSO) algorithm which is proved as an effective and feasible solution for global optimization in real applications. Although the ideology of PSO is not complicated, it derives many variants, which can make new researchers confused. Therefore, this tutorial focuses on describing, systemizing, and classifying PSO by succinct and straightforward way. Moreover, a combination of PSO and another evolutional algorithm as artificial bee colony (ABC) algorithm for improving PSO itself or solving other advanced problems are mentioned too.
1. Tutorial on Support Vector Machine
Prof. Dr. Loc Nguyen, PhD, PostDoc
Director of Sunflower Soft Company, Vietnam
Email: ng_phloc@yahoo.com
Homepage: www.locnguyen.net
2. Abstract
Support vector machine is a powerful machine learning method in data classification. Using it for applied research is easy, but comprehending it for further development requires a lot of effort. This report is a tutorial on support vector machine, full of mathematical proofs and examples, which helps researchers understand it in the fastest way from theory to practice. The report focuses on the theory of optimization, which is the basis of support vector machine.
3. Table of contents
1. Support vector machine
2. Sequential minimal optimization
3. An example of data classification by SVM
4. Conclusions
4. 1. Support vector machine
Support vector machine (SVM) (Law, 2006) is a supervised learning algorithm for classification and regression. Given a set of p-dimensional vectors in vector space, SVM finds the separating hyperplane that splits vector space into subsets of vectors; each separated subset (so-called data set) is assigned one class. There is a condition for this separating hyperplane: "it must maximize the margin between two subsets". Figure 1.1 (Wikibooks, 2008) shows separating hyperplanes H1, H2, and H3, in which only H2 gets the maximum margin according to this condition. Suppose we have some p-dimensional vectors, each of which belongs to one of two classes. We can find many (p−1)-dimensional hyperplanes that classify such vectors, but there is only one hyperplane that maximizes the margin between the two classes. In other words, the distance between the hyperplane and the nearest data point on each side is maximized. Such hyperplane is called the maximum-margin hyperplane and it is considered as the SVM classifier.
Figure 1.1. Separating hyperplanes
5. 1. Support vector machine
Let {X1, X2,…, Xn} be the training set of n vectors Xi and let yi ∈ {+1, −1} be the class label of vector Xi. Each Xi is also called a data point, with attention that vectors can be identified with data points. A data point can be called a point, in brief. It is necessary to determine the maximum-margin hyperplane that separates data points belonging to yi=+1 from data points belonging to yi=−1 as clearly as possible. According to the theory of geometry, an arbitrary hyperplane is represented as a set of points satisfying the hyperplane equation specified by equation 1.1.
$W \cdot X_i - b = 0$ (1.1)
Where the sign "⋅" denotes the dot product or scalar product, W is the weight vector perpendicular to the hyperplane, and b is the bias. Vector W is also called the perpendicular vector or normal vector and it is used to specify the hyperplane. Suppose W=(w1, w2,…, wp) and Xi=(xi1, xi2,…, xip); the scalar product $W \cdot X_i$ is:
$W \cdot X_i = X_i \cdot W = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_p x_{ip} = \sum_{j=1}^{p} w_j x_{ij}$
Given a scalar value w, the multiplication of w and vector Xi, denoted wXi, is a vector as follows:
$w X_i = (w x_{i1}, w x_{i2}, \ldots, w x_{ip})$
Please distinguish the scalar product $W \cdot X_i$ from the multiplication wXi.
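As a small aside, the distinction between the scalar product and scalar multiplication can be checked directly. The following is a minimal NumPy sketch with hypothetical values, not taken from the slides:

```python
import numpy as np

W = np.array([1.0, 2.0, 3.0])   # hypothetical weight vector (p = 3)
Xi = np.array([4.0, 5.0, 6.0])  # hypothetical data point

dot = np.dot(W, Xi)   # scalar product: w1*xi1 + w2*xi2 + w3*xi3 = 32.0 (a number)
scaled = 2.0 * Xi     # multiplication by a scalar: (8.0, 10.0, 12.0) (a vector)

print(dot, scaled)
```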
6. 1. Support vector machine
The essence of the SVM method is to find the weight vector W and bias b such that the hyperplane equation specified by equation 1.1 expresses the maximum-margin hyperplane that maximizes the margin between the two classes of the training set. The value $\frac{b}{|W|}$ is the offset of the (maximum-margin) hyperplane from the origin along the weight vector W, where |W| or ||W|| denotes the length or module of vector W:
$|W| = \|W\| = \sqrt{W \cdot W} = \sqrt{w_1^2 + w_2^2 + \cdots + w_p^2} = \sqrt{\sum_{j=1}^{p} w_j^2}$
Note that we use the two notations |.| and ||.|| for denoting the length of a vector. Additionally, the value $\frac{2}{|W|}$ is the width of the margin as seen in figure 1.2. To determine the margin, two parallel hyperplanes are constructed, one on each side of the maximum-margin hyperplane. Such two parallel hyperplanes are represented by two hyperplane equations, as shown in equation 1.2 as follows:
$W \cdot X_i - b = 1$
$W \cdot X_i - b = -1$
(1.2)
7. 1. Support vector machine
Figure 1.2 (Wikibooks, 2008) illustrates the maximum-margin hyperplane, the weight vector W, and the two parallel hyperplanes. As seen in figure 1.2, the margin is limited by such two parallel hyperplanes. Exactly, there are two margins (one for each parallel hyperplane), but it is convenient to refer to both margins as a unified single margin, as usual. You can imagine such margin as a road, and the SVM method aims to maximize the width of such road. Data points lying on (or being very near to) the two parallel hyperplanes are called support vectors, because they mainly construct the maximum-margin hyperplane in the middle. This is the reason that the classification method is called support vector machine (SVM).
Figure 1.2. Maximum-margin hyperplane, parallel hyperplanes and weight vector W
8. 1. Support vector machine
To prevent vectors from falling into the margin, all vectors belonging to the two classes yi=+1 and yi=−1 satisfy the two following constraints, respectively:
$W \cdot X_i - b \ge 1$ for Xi belonging to class yi = +1
$W \cdot X_i - b \le -1$ for Xi belonging to class yi = −1
As seen in figure 1.2, vectors (data points) belonging to classes yi=+1 and yi=−1 are depicted as black circles and white circles, respectively. Such two constraints are unified into the so-called classification constraint specified by equation 1.3 as follows:
$y_i (W \cdot X_i - b) \ge 1 \iff 1 - y_i (W \cdot X_i - b) \le 0$ (1.3)
As known, yi=+1 and yi=−1 represent the two classes of data points. It is easy to infer that the maximum-margin hyperplane, which is the result of the SVM method, is the classifier that aims to determine which class (+1 or −1) a given data point X belongs to. Note that each data point Xi in the training set was assigned a class yi beforehand, and the maximum-margin hyperplane constructed from the training set is used to classify any other data point X.
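To make the classification constraint concrete, here is a minimal NumPy sketch of equation 1.3; the weight vector, bias, and points are hypothetical illustration values, not from the slides:

```python
import numpy as np

def satisfies_constraint(W, b, X, y):
    """Check the unified classification constraint y*(W . X - b) >= 1 (equation 1.3)."""
    return y * (np.dot(W, X) - b) >= 1

# Hypothetical toy classifier and points.
W, b = np.array([0.0, 0.5]), 2.0
print(satisfies_constraint(W, b, np.array([0.0, 8.0]), +1))  # True: 1*(4.0 - 2.0) >= 1
print(satisfies_constraint(W, b, np.array([0.0, 5.0]), +1))  # False: the point falls into the margin
```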
9. 1. Support vector machine
Because the maximum-margin hyperplane is defined by the weight vector W, it is easy to recognize that the essence of constructing the maximum-margin hyperplane is to solve the constrained optimization problem as follows:
$\min_{W, b} \frac{1}{2} |W|^2$ subject to $y_i (W \cdot X_i - b) \ge 1, \forall i = 1, \ldots, n$
Where |W| is the length of weight vector W and $y_i (W \cdot X_i - b) \ge 1$ is the classification constraint specified by equation 1.3. The reason for minimizing $\frac{1}{2}|W|^2$ is that the distance between the two parallel hyperplanes is $\frac{2}{|W|}$ and we need to maximize such distance in order to maximize the margin for the maximum-margin hyperplane. Maximizing $\frac{2}{|W|}$ is then equivalent to minimizing $\frac{1}{2}|W|$. Because it is complex to compute the length |W| with an arithmetic root, we substitute $\frac{1}{2}|W|^2$ for $\frac{1}{2}|W|$, where $|W|^2$ is equal to the scalar product $W \cdot W$ as follows:
$|W|^2 = \|W\|^2 = W \cdot W = w_1^2 + w_2^2 + \cdots + w_p^2$
The constrained optimization problem is re-written, as shown in equation 1.4 below:
$\min_{W, b} f(W) = \min_{W, b} \frac{1}{2} |W|^2$ subject to $g_i(W, b) = 1 - y_i (W \cdot X_i - b) \le 0, \forall i = 1, \ldots, n$ (1.4)
10. 1. Support vector machine
In equation 1.4, $f(W) = \frac{1}{2}|W|^2$ is called the target function with regard to variable W. The function $g_i(W, b) = 1 - y_i (W \cdot X_i - b)$ is called the constraint function with regard to the two variables W and b; it is derived from the classification constraint specified by equation 1.3. There are n constraints $g_i(W, b) \le 0$ because the training set {X1, X2,…, Xn} has n data points Xi. The constraints $g_i(W, b) \le 0$ inside equation 1.4 implicate the perfect separation, in which no data point falls into the margin (between the two parallel hyperplanes, see figure 1.2). On the other hand, the imperfect separation allows some data points to fall into the margin, which means that each constraint function gi(W,b) is relaxed by subtracting an error Οi ≥ 0. The constraints become (Honavar, p. 5):
$g_i(W, b) = 1 - y_i (W \cdot X_i - b) - \xi_i \le 0, \forall i = 1, \ldots, n$
Which implies that:
$W \cdot X_i - b \ge 1 - \xi_i$ for Xi belonging to class yi = +1
$W \cdot X_i - b \le -1 + \xi_i$ for Xi belonging to class yi = −1
We have an n-component error vector Ο = (Ο1, Ο2,…, Οn) for the n constraints. The penalty C ≥ 0 is added to the target function in order to penalize data points falling into the margin. The penalty C is a pre-defined constant. Thus, the target function f(W) becomes:
$f(W) = \frac{1}{2}|W|^2 + C \sum_{i=1}^{n} \xi_i$
11. 1. Support vector machine
If the positive penalty is infinity, C = +∞, then the target function $f(W) = \frac{1}{2}|W|^2 + C \sum_{i=1}^{n} \xi_i$ can only be minimized when all errors Οi are 0, which leads back to the perfect separation specified by the aforementioned equation 1.4. Equation 1.5 specifies the general form of constrained optimization originated from equation 1.4:
$\min_{W, b, \xi} \frac{1}{2}|W|^2 + C \sum_{i=1}^{n} \xi_i$
subject to
$1 - y_i (W \cdot X_i - b) - \xi_i \le 0, \forall i = 1, \ldots, n$
$-\xi_i \le 0, \forall i = 1, \ldots, n$
(1.5)
Where C ≥ 0 is the penalty. The Lagrangian function (Boyd & Vandenberghe, 2009, p. 215) is constructed from the constrained optimization problem specified by equation 1.5. Let L(W, b, Ο, λ, Ό) be the Lagrangian function, where λ=(λ1, λ2,…, λn) and Ό=(Ό1, Ό2,…, Όn) are n-component vectors, λi ≥ 0 and Όi ≥ 0, ∀i = 1,…,n. We have:
$L(W, b, \xi, \lambda, \mu) = f(W) + \sum_{i=1}^{n} \lambda_i g_i(W, b) - \sum_{i=1}^{n} \mu_i \xi_i$
$= \frac{1}{2}|W|^2 - W \cdot \sum_{i=1}^{n} \lambda_i y_i X_i + \sum_{i=1}^{n} \lambda_i + b \sum_{i=1}^{n} \lambda_i y_i + \sum_{i=1}^{n} (C - \lambda_i - \mu_i) \xi_i$
12. 1. Support vector machine
In general, equation 1.6 represents the Lagrangian function as follows:
$L(W, b, \xi, \lambda, \mu) = \frac{1}{2}|W|^2 - W \cdot \sum_{i=1}^{n} \lambda_i y_i X_i + \sum_{i=1}^{n} \lambda_i + b \sum_{i=1}^{n} \lambda_i y_i + \sum_{i=1}^{n} (C - \lambda_i - \mu_i) \xi_i$ (1.6)
Where $\lambda_i \ge 0, \mu_i \ge 0, \xi_i \ge 0, \forall i = 1, \ldots, n$.
Note that λ=(λ1, λ2,…, λn) and Ό=(Ό1, Ό2,…, Όn) are called Lagrange multipliers or Karush-Kuhn-Tucker multipliers (Wikipedia, Karush-Kuhn-Tucker conditions, 2014) or dual variables. The sign "⋅" denotes the scalar product, and every training data point Xi was assigned a class yi beforehand. Suppose (W*, b*) is the solution of the constrained optimization problem specified by equation 1.5; then the pair (W*, b*) is the minimum point of the target function f(W), or in other words, the target function f(W) gets its minimum at (W*, b*) under all constraints $g_i(W, b) = 1 - y_i (W \cdot X_i - b) - \xi_i \le 0, \forall i = 1, \ldots, n$. Note that W* is called the optimal weight vector and b* is called the optimal bias. It is easy to infer that the pair (W*, b*) represents the maximum-margin hyperplane, and it is possible to identify (W*, b*) with the maximum-margin hyperplane.
13. 1. Support vector machine
The ultimate goal of the SVM method is to find W* and b*. According to the Lagrangian duality theorem (Boyd & Vandenberghe, 2009, p. 216) (Jia, 2013, p. 8), the point (W*, b*, Ο*) and the point (λ*, Ό*) are the minimizer and maximizer of the Lagrangian function with regard to variables (W, b, Ο) and variables (λ, Ό), respectively, as follows:
$(W^*, b^*, \xi^*) = \operatorname{argmin}_{W, b, \xi} L(W, b, \xi, \lambda, \mu)$
$(\lambda^*, \mu^*) = \operatorname{argmax}_{\lambda_i \ge 0, \mu_i \ge 0} L(W, b, \xi, \lambda, \mu)$
(1.7)
Where the Lagrangian function L(W, b, Ο, λ, Ό) is specified by equation 1.6. Please pay attention that equation 1.7 specifies the Lagrangian duality theorem, in which the point (W*, b*, Ο*, λ*, Ό*) is a saddle point if the target function f is convex. In practice, the maximizer (λ*, Ό*) is determined based on the minimizer (W*, b*, Ο*) as follows:
$(\lambda^*, \mu^*) = \operatorname{argmax}_{\lambda_i \ge 0, \mu_i \ge 0} \min_{W, b, \xi} L(W, b, \xi, \lambda, \mu) = \operatorname{argmax}_{\lambda_i \ge 0, \mu_i \ge 0} L(W^*, b^*, \xi^*, \lambda, \mu)$
Now it is necessary to solve the Lagrangian duality problem represented by equation 1.7 to find W*. Here the Lagrangian function L(W, b, Ο, λ, Ό) is minimized with respect to the primal variables W, b, Ο and then maximized with respect to the dual variables λ = (λ1, λ2,…, λn) and Ό = (Ό1, Ό2,…, Όn), in turn.
14. 1. Support vector machine
If the gradient of L(W, b, Ο, λ, Ό) is equal to zero then L(W, b, Ο, λ, Ό) is expected to reach an extreme, with the note that the gradient of a multi-variable function is the vector whose components are the first-order partial derivatives of such function. Thus, setting the gradient of L(W, b, Ο, λ, Ό) with respect to W, b, and Ο to zero, we have:
$\frac{\partial L(W, b, \xi, \lambda, \mu)}{\partial W} = 0, \quad \frac{\partial L(W, b, \xi, \lambda, \mu)}{\partial b} = 0, \quad \frac{\partial L(W, b, \xi, \lambda, \mu)}{\partial \xi_i} = 0, \forall i = 1, \ldots, n$
$\implies W = \sum_{i=1}^{n} \lambda_i y_i X_i, \quad \sum_{i=1}^{n} \lambda_i y_i = 0, \quad \lambda_i = C - \mu_i, \forall i = 1, \ldots, n$
In general, W* is determined by equation 1.8, known as the Lagrange multipliers condition, as follows:
$W^* = \sum_{i=1}^{n} \lambda_i y_i X_i$
$\sum_{i=1}^{n} \lambda_i y_i = 0$
$\lambda_i = C - \mu_i, \forall i = 1, \ldots, n$
$\lambda_i \ge 0, \mu_i \ge 0, \forall i = 1, \ldots, n$
(1.8)
In equation 1.8, the condition from the zero partial derivatives, $W = \sum_{i=1}^{n} \lambda_i y_i X_i$, $\sum_{i=1}^{n} \lambda_i y_i = 0$, and λi = C − Όi, is called the stationarity condition, whereas the condition from the nonnegative dual variables, λi ≥ 0 and Όi ≥ 0, is called the dual feasibility condition.
15. 1. Support vector machine
It is required to determine the Lagrange multipliers λ=(λ1, λ2,…, λn) in order to evaluate W*. Substituting equation 1.8 into the Lagrangian function L(W, b, Ο, λ, Ό) specified by equation 1.6, we have:
$l(\lambda) = \min_{W, b} L(W, b, \xi, \lambda, \mu) = \min_{W, b} \left( \frac{1}{2}|W|^2 - W \cdot \sum_{i=1}^{n} \lambda_i y_i X_i + \sum_{i=1}^{n} \lambda_i + b \sum_{i=1}^{n} \lambda_i y_i + \sum_{i=1}^{n} (C - \lambda_i - \mu_i) \xi_i \right) = \cdots$
As a result, equation 1.9 specifies the so-called dual function l(λ):
$l(\lambda) = \min_{W, b} L(W, b, \xi, \lambda, \mu) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j (X_i \cdot X_j) + \sum_{i=1}^{n} \lambda_i$ (1.9)
According to the Lagrangian duality problem represented by equation 1.7, λ=(λ1, λ2,…, λn) is calculated as the maximum point λ*=(λ1*, λ2*,…, λn*) of the dual function l(λ). In conclusion, maximizing l(λ) is the main task of the SVM method, because the optimal weight vector W* is calculated from the optimal point λ* of the dual function l(λ) according to equation 1.8:
$W^* = \sum_{i=1}^{n} \lambda_i^* y_i X_i$
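The dual function of equation 1.9 is easy to evaluate numerically. Below is a minimal NumPy sketch for a linear kernel; the two-point data set is a hypothetical toy example, not from the slides:

```python
import numpy as np

def dual(lam, X, y):
    """Dual function l(lambda) of equation 1.9 for a linear kernel."""
    K = X @ X.T  # Gram matrix of scalar products Xi . Xj
    return -0.5 * np.sum(np.outer(lam * y, lam * y) * K) + np.sum(lam)

# Hypothetical two-point example.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([+1.0, -1.0])
print(dual(np.array([0.5, 0.5]), X, y))  # 0.75 for this toy data
```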
16. 1. Support vector machine
Maximizing l(λ) is a quadratic programming (QP) problem, specified by equation 1.10:
$\max_{\lambda} \left( -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j (X_i \cdot X_j) + \sum_{i=1}^{n} \lambda_i \right)$
subject to
$\sum_{i=1}^{n} \lambda_i y_i = 0$
$0 \le \lambda_i \le C, \forall i = 1, \ldots, n$
(1.10)
The constraints $0 \le \lambda_i \le C, \forall i = 1, \ldots, n$ are implied from the equations $\lambda_i = C - \mu_i, \forall i = 1, \ldots, n$ when $\mu_i \ge 0, \forall i = 1, \ldots, n$. When combining equation 1.8 and equation 1.10, we obtain the solution of the Lagrangian duality problem, which is of course also the solution of the constrained optimization problem. This is the well-known Lagrange multipliers method or the general Karush-Kuhn-Tucker (KKT) approach.
17. 1. Support vector machine
Anyhow, in the context of SVM, the final problem that needs to be solved is the simpler QP problem specified by equation 1.10. There are several methods to solve this QP problem, but this report focuses on the so-called Sequential Minimal Optimization (SMO) developed by Platt (Platt, 1998). The SMO algorithm is a very effective method to find the optimal (maximum) point λ* of the dual function l(λ):
$l(\lambda) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j (X_i \cdot X_j) + \sum_{i=1}^{n} \lambda_i$
Moreover, the SMO algorithm also finds the optimal bias b*, which means that the SVM classifier (W*, b*) is totally determined by the SMO algorithm. The next section describes the SMO algorithm in detail.
18. 2. Sequential minimal optimization
The ideology of the SMO algorithm is to divide the whole QP problem into many smallest optimization problems. Each smallest problem relates to only two Lagrange multipliers. For solving each smallest optimization problem, the SMO algorithm includes two nested loops, as shown in table 2.1 (Platt, 1998, pp. 8-9):
- The outer loop finds the first Lagrange multiplier λi whose associated data point Xi violates the KKT condition (Wikipedia, Karush-Kuhn-Tucker conditions, 2014). Violating the KKT condition is known as the first choice heuristic.
- The inner loop finds the second Lagrange multiplier λj according to the second choice heuristic. The second choice heuristic, which maximizes the optimization step, will be described later.
- The two Lagrange multipliers λi and λj are optimized jointly according to the QP problem specified by equation 1.10.
The SMO algorithm continues to solve another smallest optimization problem. The SMO algorithm stops when there is convergence, in which no data point violating the KKT condition is found; consequently, all Lagrange multipliers λ1, λ2,…, λn are optimized.
Table 2.1. Ideology of SMO algorithm
The ideology of violating the KKT condition as the first choice heuristic is similar to the principle that wrong things need to be fixed first.
19. 2. Sequential minimal optimization
Before describing the SMO algorithm in detail, the KKT condition with respect to SVM is mentioned first, because violating the KKT condition is the first choice heuristic of the SMO algorithm. For solving the constrained optimization problem, the KKT condition indicates that both the partial derivatives of the Lagrangian function and the complementary slackness are zero (Wikipedia, Karush-Kuhn-Tucker conditions, 2014). Referring to equations 1.8 and 1.4, the KKT condition of SVM is summarized as equation 2.1:
$W = \sum_{i=1}^{n} \lambda_i y_i X_i$
$\sum_{i=1}^{n} \lambda_i y_i = 0$
$\lambda_i = C - \mu_i, \forall i$
$1 - y_i (W \cdot X_i - b) - \xi_i \le 0, \forall i$
$-\xi_i \le 0, \forall i$
$\lambda_i \ge 0, \mu_i \ge 0, \forall i$
$\lambda_i (1 - y_i (W \cdot X_i - b) - \xi_i) = 0, \forall i$
$-\mu_i \xi_i = 0, \forall i$
(2.1)
In equation 2.1, the complementary slackness is λi(1 − yi(W⋅Xi − b) − Οi) = 0 and −ΌiΟi = 0 for all i, associated with the primal feasibility condition 1 − yi(W⋅Xi − b) − Οi ≀ 0 and −Οi ≀ 0 for all i. The KKT condition specified by equation 2.1 implies the Lagrange multipliers condition specified by equation 1.8, which aims to solve the Lagrangian duality problem whose solution (W*, b*, Ο*, λ*, Ό*) is the saddle point of the Lagrangian function. This is the reason that the KKT condition is known as the general form of the Lagrange multipliers condition. It is easy to deduce that the QP problem for SVM specified by equation 1.10 is derived from the KKT condition within Lagrangian duality theory.
20. 2. Sequential minimal optimization
The KKT condition for SVM is analyzed into three following cases (Honavar, p. 7):
1. If λi = 0 then Όi = C − λi = C. It implies Οi = 0 from the equation ΌiΟi = 0. Then, from the inequation 1 − yi(W⋅Xi − b) − Οi ≀ 0, we obtain 1 − yi(W⋅Xi − b) ≀ 0.
2. If 0 < λi < C then Όi = C − λi > 0. It implies Οi = 0 from the equation ΌiΟi = 0. Then, from the complementary slackness λi(1 − yi(W⋅Xi − b) − Οi) = 0 with λi > 0, we obtain 1 − yi(W⋅Xi − b) = 0.
3. If λi = C then Όi = 0 and Οi ≥ 0. From the complementary slackness with λi > 0, we obtain 1 − yi(W⋅Xi − b) = Οi ≥ 0.
Let $E_i = y_i - (W \cdot X_i - b)$ be the prediction error; we have:
$y_i E_i = y_i^2 - y_i (W \cdot X_i - b) = 1 - y_i (W \cdot X_i - b)$
The KKT condition implies equation 2.2:
$\lambda_i = 0 \implies y_i E_i \le 0$
$0 < \lambda_i < C \implies y_i E_i = 0$
$\lambda_i = C \implies y_i E_i \ge 0$
(2.2)
21. 2. Sequential minimal optimization
Equation 2.2 expresses direct corollaries of the KKT condition. It is commented on equation 2.2 that if Ei = 0, the KKT condition is more likely to be satisfied, because the primal feasibility condition is reduced to −Οi ≀ 0 and the complementary slackness is reduced to λiΟi = 0 and ΌiΟi = 0. So, if Ei = 0 and Οi = 0 then the KKT equation 2.2 is reduced significantly, focusing on solving the stationarity condition $W = \sum_{i=1}^{n} \lambda_i y_i X_i$ and $\sum_{i=1}^{n} \lambda_i y_i = 0$, turning back to the Lagrange multipliers condition as follows:
$W = \sum_{i=1}^{n} \lambda_i y_i X_i$
$\sum_{i=1}^{n} \lambda_i y_i = 0$
$\lambda_i = C - \mu_i, \forall i$
$\lambda_i \ge 0, \mu_i \ge 0, \forall i$
Therefore, the data points Xi satisfying the equation yiEi = 0 (also Ei = 0, where yi² = 1 and yi = ±1) lie on the margin (the two parallel hyperplanes), where they contribute to the formulation of the normal vector $W^* = \sum_{i=1}^{n} \lambda_i y_i X_i$. These points Xi are called support vectors.
Figure 2.1. Support vectors
22. 2. Sequential minimal optimization
Recall that support vectors, satisfying the equation yiEi = 0 (also Ei = 0), lie on the margin (see figure 2.1). According to the KKT condition specified by equation 2.1, support vectors are always associated with non-zero Lagrange multipliers such that 0 < λi < C, because they would not contribute to the normal vector $W^* = \sum_{i=1}^{n} \lambda_i y_i X_i$ if their λi were zero or C. Note that the errors Οi of support vectors are 0 because of the reduced complementary slackness (λiΟi = 0 and ΌiΟi = 0) when Ei = 0 along with λi > 0. When Οi = 0, Όi can be positive from the equation ΌiΟi = 0, which can violate the equation λi = C − Όi if λi = C. Such Lagrange multipliers 0 < λi < C are also called non-boundary multipliers because they do not equal the bounds 0 and C. So, support vectors are also known as non-boundary data points. It is easy to infer from equation 1.8:
$W^* = \sum_{i=1}^{n} \lambda_i y_i X_i$
that support vectors along with their non-zero Lagrange multipliers mainly form the optimal weight vector W* representing the maximum-margin hyperplane, that is, the SVM classifier. This is the reason that this classification approach is called support vector machine (SVM). Figure 2.1 (Moore, 2001, p. 5) illustrates an example of support vectors.
Figure 2.1. Support vectors
23. 2. Sequential minimal optimization
Violating the KKT condition is the first choice heuristic of the SMO algorithm. By negating the three corollaries specified by equation 2.2, the KKT condition is violated in the three following cases:
$\lambda_i = 0$ and $y_i E_i > 0$
$0 < \lambda_i < C$ and $y_i E_i \ne 0$
$\lambda_i = C$ and $y_i E_i < 0$
By logic induction, these cases are reduced into the two cases specified by equation 2.3:
$\lambda_i < C$ and $y_i E_i > 0$
$\lambda_i > 0$ and $y_i E_i < 0$
(2.3)
Where Ei is the prediction error:
$E_i = y_i - (W \cdot X_i - b)$
Equation 2.3 is used to check whether a given data point Xi violates the KKT condition. For explaining the logic induction: from (λi = 0 and yiEi > 0) and (0 < λi < C and yiEi ≠ 0) we obtain (λi < C and yiEi > 0); from (λi = C and yiEi < 0) and (0 < λi < C and yiEi ≠ 0) we obtain (λi > 0 and yiEi < 0). Conversely, from (λi < C and yiEi > 0) and (λi > 0 and yiEi < 0) we obtain (λi = 0 and yiEi > 0), (0 < λi < C and yiEi ≠ 0), and (λi = C and yiEi < 0).
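Equation 2.3 translates directly into a small predicate. The sketch below assumes yE = yiEi has already been computed from the current classifier; the example values are hypothetical:

```python
def violates_kkt(lam, yE, C):
    """Equation 2.3: Xi violates KKT if (lam < C and yE > 0) or (lam > 0 and yE < 0).
    lam is the multiplier lambda_i; yE is y_i * E_i with E_i = y_i - (W . X_i - b)."""
    return (lam < C and yE > 0) or (lam > 0 and yE < 0)

# Hypothetical values: a zero multiplier whose point lies on the wrong side of the margin.
print(violates_kkt(lam=0.0, yE=0.3, C=float("inf")))   # True
print(violates_kkt(lam=0.0, yE=-0.3, C=float("inf")))  # False
```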
24. 2. Sequential minimal optimization
The main task of the SMO algorithm (see table 2.1) is to optimize jointly two Lagrange multipliers in order to solve each smallest optimization problem, which maximizes the dual function l(λ):
$l(\lambda) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j (X_i \cdot X_j) + \sum_{i=1}^{n} \lambda_i$
Where $\sum_{i=1}^{n} \lambda_i y_i = 0$ and $0 \le \lambda_i \le C, \forall i = 1, \ldots, n$.
Without loss of generality, the two Lagrange multipliers λi and λj that will be optimized are λ1 and λ2, while all other multipliers λ3, λ4,…, λn are fixed. The old values of λ1 and λ2 are denoted λ1^old and λ2^old; note that old values are known as current values. Thus, λ1 and λ2 are optimized based on the set λ1^old, λ2^old, λ3, λ4,…, λn. The old values λ1^old and λ2^old are initialized to zero (Honavar, p. 9). From the condition $\sum_{i=1}^{n} \lambda_i y_i = 0$, we have:
$\lambda_1^{old} y_1 + \lambda_2^{old} y_2 + \lambda_3 y_3 + \lambda_4 y_4 + \cdots + \lambda_n y_n = 0$
and
$\lambda_1 y_1 + \lambda_2 y_2 + \lambda_3 y_3 + \lambda_4 y_4 + \cdots + \lambda_n y_n = 0$
25. 2. Sequential minimal optimization
It implies the following equation of a line with regard to the two variables λ1 and λ2:
$\lambda_1 y_1 + \lambda_2 y_2 = \lambda_1^{old} y_1 + \lambda_2^{old} y_2$ (2.4)
Equation 2.4 specifies the linear constraint of the two Lagrange multipliers λ1 and λ2. This constraint is drawn as diagonal lines in figure 2.2 (Honavar, p. 9).
Figure 2.2. Linear constraint of two Lagrange multipliers
In figure 2.2, the box is bounded by the interval [0, C] of Lagrange multipliers, 0 ≀ λi ≀ C. The left box and right box in figure 2.2 imply that λ1 and λ2 are proportional and inversely proportional, respectively. The SMO algorithm moves λ1 and λ2 along the diagonal lines so as to maximize the dual function l(λ).
26. 2. Sequential minimal optimization
Multiplying both sides of the equation $\lambda_1 y_1 + \lambda_2 y_2 = \lambda_1^{old} y_1 + \lambda_2^{old} y_2$ by y1, we have:
$\lambda_1 y_1 y_1 + \lambda_2 y_1 y_2 = \lambda_1^{old} y_1 y_1 + \lambda_2^{old} y_1 y_2 \implies \lambda_1 + s \lambda_2 = \lambda_1^{old} + s \lambda_2^{old}$
Where s = y1y2 = ±1. Let
$\gamma = \lambda_1 + s \lambda_2 = \lambda_1^{old} + s \lambda_2^{old}$
We have equation 2.5 as a variant of the linear constraint of the two Lagrange multipliers λ1 and λ2 (Honavar, p. 9):
$\lambda_1 = \gamma - s \lambda_2$ (2.5)
Where
$s = y_1 y_2$ and $\gamma = \lambda_1 + s \lambda_2 = \lambda_1^{old} + s \lambda_2^{old}$
By fixing the multipliers λ3, λ4,…, λn, all arithmetic combinations of λ1^old, λ2^old, λ3, λ4,…, λn are constants, denoted by the term "const". The dual function l(λ) is re-written (Honavar, pp. 9-11):
$l(\lambda) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j (X_i \cdot X_j) + \sum_{i=1}^{n} \lambda_i = \cdots$
$= -\frac{1}{2} \Big( (X_1 \cdot X_1) \lambda_1^2 + (X_2 \cdot X_2) \lambda_2^2 + 2 s (X_1 \cdot X_2) \lambda_1 \lambda_2 + 2 \sum_{i=3}^{n} \lambda_i \lambda_1 y_i y_1 (X_i \cdot X_1) + 2 \sum_{i=3}^{n} \lambda_i \lambda_2 y_i y_2 (X_i \cdot X_2) \Big) + \lambda_1 + \lambda_2 + \mathrm{const}$
27. 2. Sequential minimal optimization
Let
$K_{11} = X_1 \cdot X_1, \quad K_{12} = X_1 \cdot X_2, \quad K_{22} = X_2 \cdot X_2$
Let W^old be the weight vector $W = \sum_{i=1}^{n} \lambda_i y_i X_i$ based on the old values of the two aforementioned Lagrange multipliers. Following the linear constraint of the two Lagrange multipliers specified by equation 2.4, we have:
$W^{old} = \lambda_1^{old} y_1 X_1 + \lambda_2^{old} y_2 X_2 + \sum_{i=3}^{n} \lambda_i y_i X_i = \sum_{i=1}^{n} \lambda_i y_i X_i = W$
Let
$v_j = \sum_{i=3}^{n} \lambda_i y_i (X_i \cdot X_j) = \cdots = W^{old} \cdot X_j - \lambda_1^{old} y_1 (X_1 \cdot X_j) - \lambda_2^{old} y_2 (X_2 \cdot X_j)$
We have (Honavar, p. 10):
$l(\lambda) = -\frac{1}{2} \Big( K_{11} \lambda_1^2 + K_{22} \lambda_2^2 + 2 s K_{12} \lambda_1 \lambda_2 + 2 y_1 v_1 \lambda_1 + 2 y_2 v_2 \lambda_2 \Big) + \lambda_1 + \lambda_2 + \mathrm{const}$
$= -\frac{1}{2} \big( K_{11} + K_{22} - 2 K_{12} \big) \lambda_2^2 + \big( 1 - s + s K_{11} \gamma - s K_{12} \gamma + y_2 v_1 - y_2 v_2 \big) \lambda_2 + \mathrm{const}$
28. 2. Sequential minimal optimization
Let $\eta = K_{11} - 2 K_{12} + K_{22}$. Assessing the coefficient of λ2, we have (Honavar, p. 11):
$1 - s + s K_{11} \gamma - s K_{12} \gamma + y_2 v_1 - y_2 v_2 = \eta \lambda_2^{old} + y_2 (E_2^{old} - E_1^{old})$
According to equation 2.3, E2^old and E1^old are the old prediction errors on X2 and X1, respectively:
$E_i^{old} = y_i - (W^{old} \cdot X_i - b^{old})$
Thus, equation 2.6 specifies the dual function with respect to the second Lagrange multiplier λ2, which is optimized in conjunction with the first one λ1 by the SMO algorithm:
$l(\lambda_2) = -\frac{1}{2} \eta \lambda_2^2 + \big( \eta \lambda_2^{old} + y_2 (E_2^{old} - E_1^{old}) \big) \lambda_2 + \mathrm{const}$ (2.6)
The first-order and second-order derivatives of the dual function l(λ2) with regard to λ2 are:
$\frac{dl(\lambda_2)}{d\lambda_2} = -\eta \lambda_2 + \eta \lambda_2^{old} + y_2 (E_2^{old} - E_1^{old})$
and
$\frac{d^2 l(\lambda_2)}{d\lambda_2^2} = -\eta$
The quantity η is always non-negative due to:
$\eta = X_1 \cdot X_1 - 2 X_1 \cdot X_2 + X_2 \cdot X_2 = |X_1 - X_2|^2 \ge 0$
Recall that the goal of the QP problem is to maximize the dual function l(λ2) so as to find the optimal multiplier (maximum point) λ2*. The second-order derivative of l(λ2) is therefore always non-positive, so l(λ2) is a concave function and there always exists the maximum point λ2*.
29. 2. Sequential minimal optimization
The function l(λ2) gets maximal if its first-order derivative is equal to zero:
$\frac{dl(\lambda_2)}{d\lambda_2} = 0 \implies -\eta \lambda_2 + \eta \lambda_2^{old} + y_2 (E_2^{old} - E_1^{old}) = 0 \implies \lambda_2 = \lambda_2^{old} + \frac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
Therefore, the new values of λ1 and λ2 that are the solutions of the smallest optimization problem of the SMO algorithm are:
$\lambda_2^* = \lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
$\lambda_1^* = \lambda_1^{new} = \lambda_1^{old} + s \lambda_2^{old} - s \lambda_2^{new} = \lambda_1^{old} - s \frac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
Shortly, we have equation 2.7:
$\lambda_2^* = \lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
$\lambda_1^* = \lambda_1^{new} = \lambda_1^{old} - s \frac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
(2.7)
Obviously, λ1^new is totally determined in accordance with λ2^new; thus we should focus on λ2^new.
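Equation 2.7 can be sketched as a small helper; the values in the usage line reproduce the first optimization step of the worked example in section 3 (η = 1225, E1 = 1, E2 = −1):

```python
def smo_update(lam1, lam2, y1, y2, E1, E2, eta):
    """Unconstrained joint update of equation 2.7 (before clipping to [L, U])."""
    lam2_new = lam2 + y2 * (E2 - E1) / eta
    lam1_new = lam1 - y1 * y2 * (lam2_new - lam2)  # lam1 + s*lam2 stays constant
    return lam1_new, lam2_new

print(smo_update(0.0, 0.0, +1, -1, 1.0, -1.0, 1225.0))  # both become 2/1225
```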
30. 2. Sequential minimal optimization
Because the multipliers λi are bounded, 0 ≀ λi ≀ C, it is required to find the range of λ2^new. Let L and U be the lower bound and upper bound of λ2^new, respectively. If s = 1, then λ1 + λ2 = γ. There are two sub-cases (see figure 2.3) as follows (Honavar, pp. 11-12):
• If γ ≥ C then L = γ − C and U = C.
• If γ < C then L = 0 and U = γ.
If s = −1, then λ1 − λ2 = γ. There are two sub-cases (see figure 2.4) as follows (Honavar, pp. 11-13):
• If γ ≥ 0 then L = 0 and U = C − γ.
• If γ < 0 then L = −γ and U = C.
31. 2. Sequential minimal optimization
If η > 0 then:
$\lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
$\lambda_2^* = \lambda_2^{new,clipped} = \begin{cases} L & \text{if } \lambda_2^{new} < L \\ \lambda_2^{new} & \text{if } L \le \lambda_2^{new} \le U \\ U & \text{if } U < \lambda_2^{new} \end{cases}$
If η = 0 then $\lambda_2^* = \lambda_2^{new,clipped} = \operatorname{argmax}_{\lambda_2} \{ l(\lambda_2 = L), l(\lambda_2 = U) \}$. The lower bound L and upper bound U are described as follows:
- If s = 1 and γ > C then L = γ − C and U = C. If s = 1 and γ < C then L = 0 and U = γ.
- If s = −1 and γ > 0 then L = 0 and U = C − γ. If s = −1 and γ < 0 then L = −γ and U = C.
Let Δλ1 and Δλ2 represent the changes in the multipliers λ1 and λ2, respectively:
$\Delta\lambda_2 = \lambda_2^* - \lambda_2^{old}$ and $\Delta\lambda_1 = -s \Delta\lambda_2$
The new value of the first multiplier λ1 is re-written in accordance with the change Δλ1:
$\lambda_1^* = \lambda_1^{new} = \lambda_1^{old} + \Delta\lambda_1$
In general, table 2.2 below summarizes how the SMO algorithm optimizes jointly two Lagrange multipliers; a code sketch of the bound and clipping logic follows.
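The following sketch implements the bounds of figures 2.3 and 2.4 and the clipping rule for the η > 0 case; the η = 0 corner case is omitted, and the usage values are hypothetical:

```python
def clip_bounds(lam1, lam2, s, C):
    """Lower/upper bounds L and U for the new lambda2 (figures 2.3 and 2.4)."""
    gamma = lam1 + s * lam2
    if s == 1:
        return (gamma - C, C) if gamma >= C else (0.0, gamma)
    return (0.0, C - gamma) if gamma >= 0 else (-gamma, C)

def clip(lam2_new, L, U):
    """Clip lambda2 to the feasible segment [L, U] (the eta > 0 case)."""
    return max(L, min(U, lam2_new))

L, U = clip_bounds(lam1=0.3, lam2=0.2, s=-1, C=1.0)  # hypothetical values
print(L, U, clip(1.4, L, U))  # 0.0 0.9 0.9
```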
32. 2. Sequential minimal optimization
The basic tasks of the SMO algorithm to optimize jointly two Lagrange multipliers have now been described in detail. The ultimate goal of the SVM method is to determine the classifier (W*, b*). Thus, the SMO algorithm updates the optimal weight W* and optimal bias b* based on the new values λ1^new and λ2^new at each optimization step. Let W* = W^new be the new (optimal) weight vector; according to equation 2.1 we have:
$W^{new} = \sum_{i=1}^{n} \lambda_i y_i X_i = \lambda_1^{new} y_1 X_1 + \lambda_2^{new} y_2 X_2 + \sum_{i=3}^{n} \lambda_i y_i X_i$
Let W^old be the old weight vector:
$W^{old} = \sum_{i=1}^{n} \lambda_i y_i X_i = \lambda_1^{old} y_1 X_1 + \lambda_2^{old} y_2 X_2 + \sum_{i=3}^{n} \lambda_i y_i X_i$
It implies:
$W^* = W^{new} = \lambda_1^{new} y_1 X_1 - \lambda_1^{old} y_1 X_1 + \lambda_2^{new} y_2 X_2 - \lambda_2^{old} y_2 X_2 + W^{old}$
Let E2^new be the new prediction error on X2:
$E_2^{new} = y_2 - (W^{new} \cdot X_2 - b^{new})$
The new (optimal) bias b* = b^new is determined by setting E2^new = 0, with the reason that the optimal classifier (W*, b*) has zero error:
$E_2^{new} = 0 \iff y_2 - (W^{new} \cdot X_2 - b^{new}) = 0 \iff b^{new} = W^{new} \cdot X_2 - y_2$
33. 2. Sequential minimal optimization
In general, equation 2.8 specifies the optimal classifier (W*, b*) resulting from each optimization step of the SMO algorithm:
$W^* = W^{new} = (\lambda_1^{new} - \lambda_1^{old}) y_1 X_1 + (\lambda_2^{new} - \lambda_2^{old}) y_2 X_2 + W^{old}$
$b^* = b^{new} = W^{new} \cdot X_2 - y_2$
(2.8)
Where W^old is the old value of the weight vector; of course we have:
$W^{old} = \lambda_1^{old} y_1 X_1 + \lambda_2^{old} y_2 X_2 + \sum_{i=3}^{n} \lambda_i y_i X_i$
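Equation 2.8 as a small helper, checked against the first step of the worked example in section 3 (both multipliers move from 0 to 2/1225):

```python
import numpy as np

def update_classifier(W_old, X1, y1, d_lam1, X2, y2, d_lam2):
    """Equation 2.8: update the optimal weight vector and bias after one SMO step."""
    W_new = d_lam1 * y1 * X1 + d_lam2 * y2 * X2 + W_old
    b_new = np.dot(W_new, X2) - y2  # from setting the new prediction error E2 to zero
    return W_new, b_new

W, b = update_classifier(np.zeros(2), np.array([20.0, 55.0]), +1, 2/1225,
                         np.array([20.0, 20.0]), -1, 2/1225)
print(W, b)  # approximately (0, 2/35) and 15/7
```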
34. 2. Sequential minimal optimization
All multipliers λi, the weight vector W, and the bias b are initialized to zero. The SMO algorithm divides the whole QP problem into many smallest optimization problems. Each smallest optimization problem focuses on optimizing two joint multipliers. The SMO algorithm solves each smallest optimization problem via two nested loops:
1. The outer loop alternates one sweep through all data points with as many sweeps as possible through non-boundary data points (support vectors), so as to find a data point Xi that violates the KKT condition according to equation 2.3. The Lagrange multiplier λi associated with such Xi is selected as the first multiplier, aforementioned as λ1. Violating the KKT condition is known as the first choice heuristic of the SMO algorithm.
2. The inner loop browses all data points at the first sweep and non-boundary ones at later sweeps, so as to find the data point Xj that maximizes the deviation |Ei^old − Ej^old|, where Ei^old and Ej^old are the prediction errors on Xi and Xj, respectively, as seen in equation 2.6. The Lagrange multiplier λj associated with such Xj is selected as the second multiplier, aforementioned as λ2. Maximizing the deviation |E2^old − E1^old| is known as the second choice heuristic of the SMO algorithm.
   a. The two Lagrange multipliers λ1 and λ2 are optimized jointly, which results in the optimal multipliers λ1^new and λ2^new, as seen in table 2.2.
   b. The SMO algorithm updates the optimal weight W* and optimal bias b* based on the new values λ1^new and λ2^new according to equation 2.8.
The SMO algorithm continues to solve another smallest optimization problem. The SMO algorithm stops when there is convergence, in which no data point violating the KKT condition is found. Consequently, all Lagrange multipliers λ1, λ2,…, λn are optimized and the optimal SVM classifier (W*, b*) is totally determined.
By extending the ideology shown in table 2.1, the SMO algorithm is described in detail in table 2.3 (Platt, 1998, pp. 8-9) (Honavar, p. 14); a simplified end-to-end sketch follows.
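The sketch below strings the previous pieces together into a simplified SMO trainer for a linear kernel. It is a didactic approximation of table 2.3, not Platt's full algorithm: every sweep scans all points (the alternation with non-boundary sweeps is simplified away, so intermediate values match the slides only for the first step), the η = 0 case is skipped, and numerical tolerances are ignored:

```python
import numpy as np

def smo_train(X, y, C=np.inf, sweeps=3):
    """Simplified SMO sketch for a linear kernel, following tables 2.1-2.3."""
    n = len(y)
    lam, W, b = np.zeros(n), np.zeros(X.shape[1]), 0.0
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            # Prediction errors Ei = yi - (W . Xi - b) under the current classifier.
            E = np.array([y[k] - (np.dot(W, X[k]) - b) for k in range(n)])
            if not ((lam[i] < C and y[i] * E[i] > 0) or (lam[i] > 0 and y[i] * E[i] < 0)):
                continue  # Xi satisfies the KKT condition (equation 2.3)
            j = int(np.argmax(np.abs(E - E[i])))  # second choice heuristic
            if j == i:
                continue
            eta = np.dot(X[i], X[i]) - 2 * np.dot(X[i], X[j]) + np.dot(X[j], X[j])
            if eta == 0:
                continue  # the eta = 0 corner case is omitted in this sketch
            s = y[i] * y[j]
            gamma = lam[i] + s * lam[j]
            lam_j_new = lam[j] + y[j] * (E[j] - E[i]) / eta  # equation 2.7
            if s == 1:  # bounds of figures 2.3 and 2.4
                L, U = ((gamma - C, C) if gamma >= C else (0.0, gamma))
            else:
                L, U = ((0.0, C - gamma) if gamma >= 0 else (-gamma, C))
            lam_j_new = max(L, min(U, lam_j_new))
            d_j = lam_j_new - lam[j]
            if abs(d_j) < 1e-12:
                continue
            d_i = -s * d_j
            W = W + d_i * y[i] * X[i] + d_j * y[j] * X[j]  # equation 2.8
            b = np.dot(W, X[j]) - y[j]
            lam[i] += d_i
            lam[j] += d_j
            changed = True
        if not changed:
            break
    return lam, W, b

if __name__ == "__main__":
    # The four document vectors of section 3.
    X = np.array([[20.0, 55.0], [20.0, 20.0], [15.0, 30.0], [35.0, 10.0]])
    y = np.array([+1.0, -1.0, +1.0, -1.0])
    print(smo_train(X, y, C=np.inf, sweeps=3))
```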
35. 2. Sequential minimal optimization
Recall that non-boundary data points (support vectors) are the ones whose Lagrange multipliers are non-zero such that 0 < λi < C. When both the optimal weight vector W* and the optimal bias b* are determined by the SMO algorithm or other methods, the maximum-margin hyperplane known as the SVM classifier is totally determined. According to equation 1.1, the equation of the maximum-margin hyperplane is expressed in equation 2.9 as follows:
$W^* \cdot X - b^* = 0$ (2.9)
For any data point X, the classification rule derived from the maximum-margin hyperplane (SVM classifier) is used to classify such data point X. Let R be the classification rule; equation 2.10 specifies the classification rule as the sign function of point X:
$R(X) = \operatorname{sign}(W^* \cdot X - b^*) = \begin{cases} +1 & \text{if } W^* \cdot X - b^* \ge 0 \\ -1 & \text{if } W^* \cdot X - b^* < 0 \end{cases}$ (2.10)
After evaluating R with regard to X, if R(X) = 1 then X belongs to class +1; otherwise, X belongs to class −1. This is the simple process of data classification. The next section illustrates how to apply SMO to classifying data points, where such data points are documents.
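Equation 2.10 in code, with a hypothetical classifier (the horizontal hyperplane x2 = 4) for illustration:

```python
import numpy as np

def classify(W, b, X):
    """Classification rule R of equation 2.10: the sign of W* . X - b*."""
    return +1 if np.dot(W, X) - b >= 0 else -1

print(classify(np.array([0.0, 1.0]), 4.0, np.array([3.0, 7.0])))  # +1
print(classify(np.array([0.0, 1.0]), 4.0, np.array([3.0, 1.0])))  # -1
```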
36. 3. An example of data classification by SVM
It is necessary to have an example illustrating how to classify documents by SVM. Given a set of classes C = {computer science, math}, a set of terms T = {computer, derivative}, and the corpus D = {doc1.txt, doc2.txt, doc3.txt, doc4.txt}. The training corpus (training data) is shown in the following table 3.1, in which cell (i, j) indicates the number of times that term j (column j) occurs in document i (row i); in other words, each cell represents a term frequency and each row represents a document vector. Note that there are four documents and each document in this section belongs to only one class, either computer science or math.
Table 3.1. Term frequencies of documents (SVM)
            computer  derivative  class
doc1.txt    20        55          math
doc2.txt    20        20          computer science
doc3.txt    15        30          math
doc4.txt    35        10          computer science
37. 3. An example of data classification by SVM
Let Xi be the data points representing documents doc1.txt, doc2.txt, doc3.txt, doc4.txt. We have X1=(20,55), X2=(20,20), X3=(15,30), and X4=(35,10). Let yi=+1 and yi=−1 represent the classes "math" and "computer science", respectively. Let x and y represent the terms "computer" and "derivative", respectively; so, for example, it is interpreted that the data point X1=(20,55) has abscissa x=20 and ordinate y=55. Therefore, the term frequencies from table 3.1 are interpreted as the SVM input training corpus shown in table 3.2 below.
Data points X1, X2, X3, and X4 are depicted in figure 3.1, in which the classes "math" (yi=+1) and "computer science" (yi=−1) are represented by shaded and hollow circles, respectively.
Table 3.2. SVM input training corpus
     x   y   yi
X1   20  55  +1
X2   20  20  −1
X3   15  30  +1
X4   35  10  −1
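For reference, table 3.2 transcribed as arrays that the earlier sketches can consume:

```python
import numpy as np

# Table 3.2 as arrays: rows are the document vectors X1..X4, labels are yi.
X = np.array([[20.0, 55.0],   # X1 = doc1.txt, math (+1)
              [20.0, 20.0],   # X2 = doc2.txt, computer science (-1)
              [15.0, 30.0],   # X3 = doc3.txt, math (+1)
              [35.0, 10.0]])  # X4 = doc4.txt, computer science (-1)
y = np.array([+1.0, -1.0, +1.0, -1.0])
print(X.shape, y)  # (4, 2) [ 1. -1.  1. -1.]
```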
38. 3. An example of data classification by SVM
By applying the SMO algorithm described in table 2.3 to the training corpus shown in table 3.1, it is easy to calculate the optimal multipliers λ*, the optimal weight vector W*, and the optimal bias b*. Firstly, all multipliers λi, the weight vector W, and the bias b are initialized to zero. This example focuses on perfect separation, so C = +∞:
$\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = 0$
$W = (0, 0)$
$b = 0$
$C = +\infty$
39. 3. An example of data classification by SVM
At the first sweep:
The outer loop of the SMO algorithm searches through all data points for a data point Xi that violates the KKT condition according to equation 2.3, so as to select two multipliers that will be optimized jointly. We have:
$E_1 = y_1 - (W \cdot X_1 - b) = 1 - ((0,0) \cdot (20,55) - 0) = 1$
Due to λ1 = 0 < C = +∞ and y1E1 = 1·1 = 1 > 0, point X1 violates the KKT condition according to equation 2.3. Then, λ1 is selected as the first multiplier. The inner loop finds the data point Xj that maximizes the deviation |Ej^old − E1^old|. We have:
$E_2 = y_2 - (W \cdot X_2 - b) = -1, \quad |E_2 - E_1| = 2$
$E_3 = y_3 - (W \cdot X_3 - b) = 1, \quad |E_3 - E_1| = 0$
$E_4 = y_4 - (W \cdot X_4 - b) = -1, \quad |E_4 - E_1| = 2$
Because the deviation |E2 − E1| is maximal, the multiplier λ2 associated with X2 is selected as the second multiplier. Now λ1 and λ2 are optimized jointly according to table 2.2:
$\eta = X_1 \cdot X_1 - 2 X_1 \cdot X_2 + X_2 \cdot X_2 = 1225, \quad s = y_1 y_2 = -1, \quad \gamma = \lambda_1^{old} + s \lambda_2^{old} = 0$
$\lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2 - E_1)}{\eta} = \frac{2}{1225}, \quad \Delta\lambda_2 = \lambda_2^{new} - \lambda_2^{old} = \frac{2}{1225}$
$\lambda_1^{new} = \lambda_1^{old} - y_1 y_2 \Delta\lambda_2 = \frac{2}{1225}$
40. 3. An example of data classification by SVM
The optimal classifier (W*, b*) is updated according to equation 2.8:
$W^* = W^{new} = (\lambda_1^{new} - \lambda_1^{old}) y_1 X_1 + (\lambda_2^{new} - \lambda_2^{old}) y_2 X_2 + W^{old}$
$= \left(\frac{2}{1225} - 0\right) \cdot 1 \cdot (20, 55) + \left(\frac{2}{1225} - 0\right) \cdot (-1) \cdot (20, 20) + (0, 0) = \left(0, \frac{2}{35}\right)$
$b^* = b^{new} = W^{new} \cdot X_2 - y_2 = \left(0, \frac{2}{35}\right) \cdot (20, 20) - (-1) = \frac{15}{7}$
Now we have:
$\lambda_1 = \lambda_1^{new} = \frac{2}{1225}, \quad \lambda_2 = \lambda_2^{new} = \frac{2}{1225}, \quad \lambda_3 = \lambda_4 = 0$
$W = W^* = \left(0, \frac{2}{35}\right), \quad b = b^* = \frac{15}{7}$
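The first optimization step can be re-checked with exact rational arithmetic; the sketch below reproduces η = 1225, λ1 = λ2 = 2/1225, W = (0, 2/35), and b = 15/7 from slides 39-40:

```python
from fractions import Fraction as F

X1, X2 = (F(20), F(55)), (F(20), F(20))
y1, y2 = 1, -1
dot = lambda a, b: a[0] * b[0] + a[1] * b[1]

eta = dot(X1, X1) - 2 * dot(X1, X2) + dot(X2, X2)  # 1225
E1, E2 = 1, -1                                     # since W = (0,0) and b = 0 initially
lam2 = F(0) + y2 * (E2 - E1) / eta                 # 2/1225
lam1 = F(0) - y1 * y2 * lam2                       # 2/1225
W = (lam1 * y1 * X1[0] + lam2 * y2 * X2[0],
     lam1 * y1 * X1[1] + lam2 * y2 * X2[1])        # (0, 2/35)
b = dot(W, X2) - y2                                # 15/7
print(eta, lam1, lam2, W, b)
```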
41. 3. An example of data classification by SVM
The outer loop of the SMO algorithm continues to search through all data points for another data point Xi that violates the KKT condition according to equation 2.3, so as to select two other multipliers that will be optimized jointly. We have:
$E_2 = y_2 - (W \cdot X_2 - b) = 0, \quad E_3 = y_3 - (W \cdot X_3 - b) = \frac{10}{7}$
Due to λ3 = 0 < C and y3E3 = 1·(10/7) > 0, point X3 violates the KKT condition according to equation 2.3. Then, λ3 is selected as the first multiplier. The inner loop finds the data point Xj that maximizes the deviation |Ej^old − E3^old|. We have:
$E_1 = y_1 - (W \cdot X_1 - b) = 0, \quad |E_1 - E_3| = \frac{10}{7}, \quad E_2 = y_2 - (W \cdot X_2 - b) = 0, \quad |E_2 - E_3| = \frac{10}{7}$
$E_4 = y_4 - (W \cdot X_4 - b) = \frac{4}{7}, \quad |E_4 - E_3| = \frac{6}{7}$
Because both deviations |E1 − E3| and |E2 − E3| are maximal, the multiplier λ2 associated with X2 is selected randomly among {λ1, λ2} as the second multiplier. Now λ3 and λ2 are optimized jointly according to table 2.2:
$\eta = X_3 \cdot X_3 - 2 X_3 \cdot X_2 + X_2 \cdot X_2 = 125, \quad s = y_3 y_2 = -1, \quad \gamma = \lambda_3^{old} + s \lambda_2^{old} = -\frac{2}{1225}, \quad L = -\gamma = \frac{2}{1225}, \quad U = C = +\infty$
(L and U are the lower bound and upper bound of λ2^new.)
$\lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2 - E_3)}{\eta} = \frac{16}{1225}, \quad \Delta\lambda_2 = \lambda_2^{new} - \lambda_2^{old} = \frac{2}{175}$
$\lambda_3^{new} = \lambda_3^{old} - y_3 y_2 \Delta\lambda_2 = \frac{2}{175}$
42. 3. An example of data classification by SVM
The optimal classifier (W*, b*) is updated according to equation 2.8:
$W^* = W^{new} = (\lambda_3^{new} - \lambda_3^{old}) y_3 X_3 + (\lambda_2^{new} - \lambda_2^{old}) y_2 X_2 + W^{old} = \left(-\frac{2}{35}, \frac{6}{35}\right)$
$b^* = b^{new} = W^{new} \cdot X_2 - y_2 = \frac{23}{7}$
Now we have:
$\lambda_1 = \frac{2}{1225}, \quad \lambda_2 = \lambda_2^{new} = \frac{16}{1225}, \quad \lambda_3 = \lambda_3^{new} = \frac{2}{175}, \quad \lambda_4 = 0$
$W = W^* = \left(-\frac{2}{35}, \frac{6}{35}\right), \quad b = b^* = \frac{23}{7}$
43. 3. An example of data classification by SVM
The outer loop of the SMO algorithm continues to search through all data points for another data point Xi that violates the KKT condition according to equation 2.3, so as to select two other multipliers that will be optimized jointly. We have:
$E_4 = y_4 - (W \cdot X_4 - b) = \frac{18}{7}$
Due to λ4 = 0 < C and y4E4 = (−1)·(18/7) < 0, point X4 does not violate the KKT condition according to equation 2.3. Then, the first sweep of the outer loop stops with the results as follows:
$\lambda_1 = \frac{2}{1225}, \quad \lambda_2 = \lambda_2^{new} = \frac{16}{1225}, \quad \lambda_3 = \lambda_3^{new} = \frac{2}{175}, \quad \lambda_4 = 0$
$W = W^* = \left(-\frac{2}{35}, \frac{6}{35}\right), \quad b = b^* = \frac{23}{7}$
44. 3. An example of data classification by SVM
At the second sweep:
The outer loop of the SMO algorithm searches through the non-boundary data points for a data point Xi that violates the KKT condition according to equation 2.3, so as to select two multipliers that will be optimized jointly. Recall that non-boundary data points (support vectors) are the ones whose associated multipliers are not the bounds 0 and C (0 < λi < C). At the second sweep, there are three non-boundary data points: X1, X2, and X3. We have:
$E_1 = y_1 - (W \cdot X_1 - b) = 1 - \left( \left(-\frac{2}{35}, \frac{6}{35}\right) \cdot (20, 55) - \frac{23}{7} \right) = -4$
Due to λ1 = 2/1225 > 0 and y1E1 = 1·(−4) < 0, point X1 violates the KKT condition according to equation 2.3. Then, λ1 is selected as the first multiplier. The inner loop finds the data point Xj among the non-boundary data points (0 < λi < C) that maximizes the deviation |Ej^old − E1^old|. We have:
$E_2 = y_2 - (W \cdot X_2 - b) = 0, \quad |E_2 - E_1| = 4, \quad E_3 = y_3 - (W \cdot X_3 - b) = 0, \quad |E_3 - E_1| = 4$
Because the deviation |E2 − E1| is maximal, the multiplier λ2 associated with X2 is selected as the second multiplier. Now λ1 and λ2 are optimized jointly according to table 2.2:
$\eta = X_1 \cdot X_1 - 2 X_1 \cdot X_2 + X_2 \cdot X_2 = 1225, \quad s = y_1 y_2 = -1, \quad \gamma = \lambda_1^{old} + s \lambda_2^{old} = -\frac{14}{1225}, \quad L = -\gamma = \frac{2}{175}, \quad U = C = +\infty$
(L and U are the lower bound and upper bound of λ2^new.)
$\lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2 - E_1)}{\eta} = \frac{12}{1225} < L = \frac{2}{175} \implies \lambda_2^{new} = L = \frac{2}{175}$
$\Delta\lambda_2 = \lambda_2^{new} - \lambda_2^{old} = -\frac{2}{1225}, \quad \lambda_1^{new} = \lambda_1^{old} - y_1 y_2 \Delta\lambda_2 = 0$
45. 3. An example of data classification by SVM
The optimal classifier (W*, b*) is updated according to equation 2.8:
$W^* = W^{new} = (\lambda_1^{new} - \lambda_1^{old}) y_1 X_1 + (\lambda_2^{new} - \lambda_2^{old}) y_2 X_2 + W^{old} = \left(-\frac{2}{35}, \frac{4}{35}\right)$
$b^* = b^{new} = W^{new} \cdot X_2 - y_2 = \frac{15}{7}$
The second sweep stops with the results as follows:
$\lambda_1 = \lambda_1^{new} = 0, \quad \lambda_2 = \lambda_2^{new} = \frac{2}{175}, \quad \lambda_3 = \frac{2}{175}, \quad \lambda_4 = 0$
$W = W^* = \left(-\frac{2}{35}, \frac{4}{35}\right), \quad b = b^* = \frac{15}{7}$
46. 3. An example of data classification by SVM
At the third sweep:
The outer loop of the SMO algorithm searches through the non-boundary data points for a data point Xi that violates the KKT condition according to equation 2.3, so as to select two multipliers that will be optimized jointly. Recall that non-boundary data points (support vectors) are the ones whose associated multipliers are not the bounds 0 and C (0 < λi < C). At the third sweep, there are only two non-boundary data points: X2 and X3. We have:
$E_2 = y_2 - (W \cdot X_2 - b) = 0, \quad E_3 = y_3 - (W \cdot X_3 - b) = \frac{4}{7}$
Due to λ3 = 2/175 < C = +∞ and y3E3 = 1·(4/7) > 0, point X3 violates the KKT condition according to equation 2.3. Then, λ3 is selected as the first multiplier. Because there are only two non-boundary data points X2 and X3, the second multiplier is λ2. Now λ3 and λ2 are optimized jointly according to table 2.2:
$\eta = X_3 \cdot X_3 - 2 X_3 \cdot X_2 + X_2 \cdot X_2 = 125, \quad s = y_3 y_2 = -1, \quad \gamma = \lambda_3^{old} + s \lambda_2^{old} = 0, \quad L = 0, \quad U = C - \gamma = +\infty$
(L and U are the lower bound and upper bound of λ2^new.)
$\lambda_2^{new} = \lambda_2^{old} + \frac{y_2 (E_2 - E_3)}{\eta} = \frac{2}{125}, \quad \Delta\lambda_2 = \lambda_2^{new} - \lambda_2^{old} = \frac{4}{875}$
$\lambda_3^{new} = \lambda_3^{old} - y_3 y_2 \Delta\lambda_2 = \frac{2}{125}$
47. 3. An example of data classification by SVM
The optimal classifier (W*, b*) is updated according to equation 2.8:
$W^* = W^{new} = (\lambda_3^{new} - \lambda_3^{old}) y_3 X_3 + (\lambda_2^{new} - \lambda_2^{old}) y_2 X_2 + W^{old} = \left(\frac{18}{175}, \frac{12}{35}\right)$
$b^* = b^{new} = W^{new} \cdot X_2 - y_2 = \left(\frac{18}{175}, \frac{12}{35}\right) \cdot (20, 20) - (-1) = \frac{347}{35}$
The third sweep stops with the results as follows:
$\lambda_1 = 0, \quad \lambda_2 = \lambda_2^{new} = \frac{2}{125}, \quad \lambda_3 = \lambda_3^{new} = \frac{2}{125}, \quad \lambda_4 = 0$
$W = W^* = \left(\frac{18}{175}, \frac{12}{35}\right), \quad b = b^* = \frac{347}{35}$
48. 3. An example of data classification by SVM
After the third sweep, the two non-boundary multipliers were optimized jointly. You can sweep more times to get more optimal results, because data point X3 still violates the KKT condition, as follows:
$E_3 = y_3 - (W \cdot X_3 - b) = -\frac{32}{35}$
Due to λ3 = 2/125 > 0 and y3E3 = 1·(−32/35) < 0, point X3 violates the KKT condition according to equation 2.3. However, it would take a lot of sweeps for the SMO algorithm to reach absolute convergence (E3 = 0 and hence no KKT violation), because the penalty C is set to +∞, which implicates the perfect separation. This is the reason that we can stop the SMO algorithm at the third sweep in this example. In general, you can stop the SMO algorithm after the last two multipliers have been optimized, which implies that all multipliers were optimized.
49. 3. An example of data classification by SVM
As a result, W* and b* have been determined:
W* = (18/175, 12/35), b* = 347/35
The maximum-margin hyperplane (SVM classifier) is then fully determined:
W*∘X − b* = 0 ⟹ y = −0.3x + 28.92
The SVM classifier y = −0.3x + 28.92 (rounded to y = −0.3x + 28.9 below) is depicted in figure 3.2, where the maximum-margin hyperplane is drawn as a bold line. Data points X2 and X3 are support vectors because their associated multipliers λ2 and λ3 are non-zero (0 < λ2 = 2/125 < C = +∞, 0 < λ3 = 2/125 < C = +∞).
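Rewriting W*∘X − b* = 0 in slope-intercept form is a two-step division; the snippet below does it with exact fractions and confirms the slope −0.3 and intercept 347/12 ≈ 28.92.

```python
from fractions import Fraction as F

w1, w2 = F(18, 175), F(12, 35)                 # W*
b = F(347, 35)                                 # b*
# w1*x + w2*y - b = 0  =>  y = -(w1/w2)*x + b/w2
slope, intercept = -w1 / w2, b / w2
print(slope, intercept, float(intercept))      # -3/10 347/12 28.9166...
```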
50. 3. An example of data classification by SVM
Derived from the above classifier y = −0.3x + 28.9, the classification rule is:
R = sign(0.3x + y − 28.9) = +1 if 0.3x + y − 28.9 ≥ 0, −1 if 0.3x + y − 28.9 < 0
Now we apply the classification rule R = sign(0.3x + y − 28.9) to document classification. Suppose the terms "computer" and "derivative" occur 40 and 10 times, respectively, in document D. We need to determine which class document D = (40, 10) belongs to. We have:
R(D) = sign(0.3·40 + 10 − 28.9) = sign(−6.9) = −1
Hence, document D belongs to the class "computer science" (yi = −1).
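The classification rule is one line of code; the sketch below applies it to document D = (40, 10), using the rounded intercept 28.9 from the slide.

```python
def classify(x, y):
    """R = sign(0.3x + y - 28.9); -1 labels the class "computer science"."""
    return +1 if 0.3 * x + y - 28.9 >= 0 else -1

# "computer" occurs 40 times and "derivative" 10 times in document D
print(classify(40, 10))   # -1, so D belongs to class "computer science"
```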
51. 4. Conclusions
• In general, the main idea of SVM is to determine the separating hyperplane that maximizes the margin between two classes of training data. Based on the theory of optimization, such an optimal hyperplane is specified by the weight vector W* and the bias b*, which are the solutions of a constrained optimization problem. It is proved that these solutions always exist, but the main issue of SVM is how to find them once the constrained optimization problem is transformed into a quadratic programming (QP) problem. SMO, the most effective algorithm, divides the whole QP problem into a series of smallest possible optimization problems, each of which optimizes two multipliers jointly. It is possible to state that SMO is the best implementation of the SVM "architecture".
• SVM is extended by the concept of the kernel function. The dot product in the separating hyperplane equation is the simplest kernel function. A kernel function is useful when data transformation is required (Law, 2006, p. 21). There are many pre-defined kernel functions available for SVM. Readers are recommended to study kernel functions further (Cristianini, 2001).
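For a concrete flavor of the kernel extension mentioned above, here is a sketch of three commonly pre-defined kernels: the plain dot product used throughout this example, plus polynomial and Gaussian (RBF) kernels; the parameter values are illustrative assumptions.

```python
import math

def linear(u, v):
    """The simplest kernel: the plain dot product."""
    return sum(a * b for a, b in zip(u, v))

def polynomial(u, v, degree=2, c=1.0):
    """Polynomial kernel: (u.v + c)^degree."""
    return (linear(u, v) + c) ** degree

def rbf(u, v, gamma=0.1):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

print(linear((20, 20), (20, 20)), rbf((20, 20), (20, 20)))   # 800 1.0
```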
52. References
1. Boyd, S., & Vandenberghe, L. (2009). Convex Optimization. New York, NY, USA: Cambridge University Press.
2. Cristianini, N. (2001). Support Vector and Kernel Machines. The 28th International Conference on Machine Learning
(ICML). Bellevue, Washington, USA: Website for the book "Support Vector Machines": http://www.support-vector.net.
Retrieved 2015, from http://www.cis.upenn.edu/~mkearns/teaching/cis700/icml-tutorial.pdf
3. Honavar, V. G. (n.d.). Sequential Minimal Optimization for SVM. Iowa State University, Department of Computer Science.
Ames, Iowa, USA: Vasant Honavar homepage.
4. Jia, Y.-B. (2013). Lagrange Multipliers. Lecture notes on the course "Problem Solving Techniques for Applied Computer Science", Iowa State University of Science and Technology, USA.
5. Johansen, I. (2012, December 29). Graph software. Graph version 4.4.2 build 543. GNU General Public License.
6. Law, M. (2006). A Simple Introduction to Support Vector Machines. Lecture Notes for CSE 802 course, Michigan State
University, USA, Department of Computer Science and Engineering.
7. Moore, A. W. (2001). Support Vector Machines. Carnegie Mellon University, USA, School of Computer Science. Available at http://www.cs.cmu.edu/~awm/tutorials.
8. Platt, J. C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Microsoft
Research.
9. Wikibooks. (2008, January 1). Support Vector Machines. Retrieved 2008, from Wikibooks website:
http://en.wikibooks.org/wiki/Support_Vector_Machines
10. Wikipedia. (2014, August 4). Karush–Kuhn–Tucker conditions. (Wikimedia Foundation) Retrieved November 16, 2014, from Wikipedia website: http://en.wikipedia.org/wiki/Karush–Kuhn–Tucker_conditions
53. Thank you for listening