Exploring Support Vector Regression
for Predictive Data Analysis
Daniel Kuntz∗, Surya Chandra† and Jon Pritchard‡
Department of Electrical Engineering and Computer Science
Colorado School of Mines: Golden, CO
Email: ∗dkuntz@mines.edu, †schandra@mines.edu, ‡jpritcha@mines.edu
Abstract—The purpose of this paper is to demonstrate the
use of Support Vector Regression (SVR) in the context of
predicting the hourly use of bikes in Washington D.C.’s bike
share program. An abridged derivation of the SVR scheme is
given along with an explanation of kernel functions which are
vital to the performance of this method. Bike share data is
provided as part of a Kaggle™ competition, giving us
a firm quantitative benchmark for its predictive performance
against an array of other competitors. We also show a direct
comparison between SVR and a naive linear regression to aid
intuitive comprehension of the concepts. Our results indicate good
performance vs. linear regression and competitive performance
in the overall contest.
I. INTRODUCTION
Advances in predictive modelling are providing new in-
sights into critical data for businesses, governments and indi-
viduals. One of the most popular of these methods is SVR. It
is an efficient, highly configurable, and mathematically sound
solution for gaining this insight. In short, it is designed for the
task of fitting a non-linear function that approximates an outcome
(e.g. the number of bikes rented) from the data this outcome
is believed to depend on (e.g. time, season, weather), which
are usually called "explanatory variables".
As a test case, our team will compete in the Washington
D.C. Bike Share competition hosted by Kaggle. In this contest,
it is of interest to the city to determine when and why people
are using their bike share program. This information will allow
them to properly plan for future growth as well as provide an
analysis of customer use patterns. Our team has decided to
use SVR modelling to compete in the competition and our
approach is documented herein.
A. How The Competition Works
Kaggle provides two sets of data. One set, generally
referred to as the "training" set, provides a set of explanatory
variables along with the outcome for each. For this particular
competition the given variables and outcomes are provided
in TABLES I and II respectively. This set of data is used to
train a prediction algorithm. The second set of data, referred
to as the "test" set, provides explanatory variables but not their
outcomes. These outcomes are hidden from contestants, whose job
it is to predict them. Once a prediction is made, its
accuracy is scored with equation (1).
TABLE I. EXPLANATORY VARIABLES [1]
Name       Description
datetime   Date and time (YYYY-MM-DD HH:MM:SS)
season     Season (1 = spring, 2 = summer, 3 = fall, 4 = winter)
holiday    Whether the day is considered a holiday
weather    1: Clear, Few clouds, Partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp       Temperature in Celsius
atemp      "Feels like" temperature in Celsius
humidity   Relative humidity
windspeed  Wind speed
TABLE II. OUTCOMES
Name Description
casual Number of non-registered user rentals initiated
registered Number of registered user rentals initiated
count Number of total rentals
$$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(p_i + 1) - \log(a_i + 1)\bigr)^2} \qquad (1)$$

Where:
RMSLE : Root Mean Squared Logarithmic Error
n : Number of explanatory vectors in the test data set
pi : Prediction for vector i
ai : Actual value for vector i
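For concreteness, (1) is straightforward to evaluate in code. The short sketch below is illustrative only (it is not part of the original competition entry) and assumes p and a are NumPy arrays of predicted and actual rental counts.

```python
# Direct implementation of the RMSLE score in equation (1).
import numpy as np

def rmsle(p, a):
    p, a = np.asarray(p, dtype=float), np.asarray(a, dtype=float)
    return np.sqrt(np.mean((np.log(p + 1) - np.log(a + 1)) ** 2))
```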
B. Discussion of Parameters
Some of the parameters in TABLE I had to be modified, and
some care had to be taken that redundant and unimportant
variables were not used. The "datetime" variable was divided
into 4 different variables: year, month, day and hour. This
allowed our model to take into account variations by the hour
and month, as one would intuitively expect a strongly
cyclical pattern associated with these variables. Also, variables
such as "season", which is entirely determined by the month
and day, were generally taken out of the model so as not to
"over-train" the model.
Some experimentation was needed to determine which
variables affect the outcome most strongly; one way to do
this, which we do not discuss here, is Principal Component
Analysis (PCA). When results are discussed, we provide
a full list of the explanatory variables used in the model.
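As a hedged illustration of this preprocessing step (a sketch, not the code actually used for our submission), the "datetime" split and removal of redundant fields might look as follows in pandas, assuming the column layout of the Kaggle training file [1]:

```python
# Split "datetime" into year/month/day/hour and drop redundant or outcome columns.
import pandas as pd

train = pd.read_csv("train.csv", parse_dates=["datetime"])
train["year"] = train["datetime"].dt.year
train["month"] = train["datetime"].dt.month
train["day"] = train["datetime"].dt.day
train["hour"] = train["datetime"].dt.hour

# "season" is determined by month and day, so it is dropped to avoid over-training;
# "casual", "registered" and "count" are outcomes, not explanatory variables.
X = train.drop(columns=["datetime", "season", "casual", "registered", "count"])
y = train["count"]
```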
II. LINEAR REGRESSION
To demonstrate some of the underlying concepts of SVR
we take as inspiration a very simple linear regression for
creating a predictive model. Here we assume that each
explanatory variable used has a weight, and that the sum of each
variable times its weight, plus an offset, is a good model of the
system.
A. Problem Formulation
We assume that the predictive function that we would like
to find takes the same form as (2)
$$f(x_i) = w_0 + \sum_{j=1}^{m} w_j x_{i,j} \qquad (2)$$

Where:
xi : The ith explanatory vector
w0 : An offset weight
wj : The weight for each component of xi
m : The number of variables in xi
So, in this case, if we determine the weights $w = [w_0 \cdots w_m]^T$ we have found a predictive model. Since we
have m + 1 weights, we could use a system of equations of
the form (2) to find them. So for the training set of data we
have the system of equations (3).

$$\begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,m}
\end{bmatrix}
\begin{bmatrix}
w_0 \\ w_1 \\ \vdots \\ w_m
\end{bmatrix}
=
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix} \qquad (3)$$
Where:
n : The number of training explanatory vectors
yi : The outcome for each explanatory vector i

Using matrix notation we rewrite (2) as (4). We recognize
this as a standard overdetermined system, the least-squares
solution of which is given by (5), where $X^+$ denotes the
pseudoinverse of the data matrix X.

$$Xw = y \qquad (4)$$

$$w = X^+ y \qquad (5)$$
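A minimal sketch of this baseline, assuming NumPy and the feature matrix X and targets y from the preprocessing step above, is:

```python
# Naive linear regression via the pseudoinverse, equations (3)-(5).
import numpy as np

def fit_linear(X_raw, y):
    X = np.column_stack([np.ones(len(X_raw)), X_raw])  # leading ones column for w0
    return np.linalg.pinv(X) @ y                        # w = X+ y, equation (5)

def predict_linear(w, X_raw):
    X = np.column_stack([np.ones(len(X_raw)), X_raw])
    return X @ w                                        # f(xi) = w0 + sum_j wj xi,j
```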
B. Results
Using this naive linear regression method, with the ex-
planatory variables "year", "month", "day", "hour", "holiday",
"workingday", "weather", "temp", "humidity" and "wind-
speed", we managed to achieve the competition results in
TABLE III.
TABLE III. KAGGLE SCORE FOR LINEAR REGRESSION PREDICTION
Score (RMSLE) Rank (of approx. 1500)
1.30542 1275
C. Analysis of Linear Regression Results
As we would suspect, the linear regression did not perform
very well. The reason for this is that many variables do not
affect the outcome in a linear way. The variable "weather"
may reduce the number of riders proportionally to how
bad the weather is, and as such is a good candidate for linear
regression, but what about a variable like "hour"? One would
intuitively expect this variable to create spikes in the
outcome during rush hours. Fig. 1 confirms this intuition: it
shows the average bike rentals for each hour over the whole
data set compared to a best-fit line. We can easily see that
the linear regression is not a good representation of this
variable. Hence, we need a non-linear representation of the
data.
Fig. 1. Linear Regression Fit to Hourly Average
III. HIGHER DIMENSIONAL MAPPING
AND KERNEL FUNCTIONS
Since linear regression fails to accurately model the data,
it is obvious that we need to use a non-linear model to achieve
a better approximation. However, non-linear models are much
more complex than linear models. One strategy that could work
would be to map the low dimensional data into a higher dimen-
sional space where the relationship is linear. In a simplistic manner, linear
regression performs this kind of mapping by adding in an offset
term, i.e. the mapping $\Phi(x) : \mathbb{R}^m \rightarrow \mathbb{R}^{m+1}$. This
idea can be expanded to include higher order terms as well;
consider the mapping Φ such that:

$$\Phi(x) : \mathbb{R}^2 \rightarrow \mathbb{R}^6, \qquad \Phi([x_1\ x_2]) = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_2^2 & x_1 x_2 \end{bmatrix} \qquad (6)$$
The problem with these kinds of mappings is that the linear
regression model becomes extremely inefficient. This is because
we could be mapping to a space with a very large
number of dimensions. As an example, for an m-dimensional
vector under a simple quadratic mapping, the transformed vector
lies in an $O(m^2)$-dimensional space. This can become
computationally expensive very quickly.
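A quick count makes the growth concrete (an illustration, not part of the original analysis): the quadratic mapping sends an m-dimensional vector to one component per monomial of degree at most two, which is O(m²).

```python
# 1 constant term + m linear terms + m(m+1)/2 quadratic terms.
def quadratic_dim(m):
    return 1 + m + m * (m + 1) // 2

print([quadratic_dim(m) for m in (2, 10, 100)])  # [6, 66, 5151]
```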
A. Definition
A solution to the problem of mapping to a higher dimen-
sional space is the use of kernel functions. For a very specific
set of mappings, kernel functions allow us to compute the inner
products of the high dimensional vectors directly from the
lower dimensional vectors. This means that if we can formulate
our minimization problem to depend only on these inner products,
we can use kernel functions to drastically improve the performance
of our algorithm.
The definition of a kernel function is simply any function
that satisfies the following:

$$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle \qquad (7)$$

Where:
K : The kernel function
Φ : A mapping to a higher dimensional space
B. Example Kernel Function
In order to illustrate the relationship between mapping
functions and kernel functions, a simple example kernel
function is derived below.
Given the column vectors:

$$x = [x_1\ x_2]^T, \qquad z = [z_1\ z_2]^T, \qquad \Phi(x) = \begin{bmatrix} x_1^2 & \sqrt{2}\,x_1 x_2 & x_2^2 \end{bmatrix}^T$$

it follows that:

$$\begin{aligned}
\Phi(x)^T \Phi(z) &= \begin{bmatrix} x_1^2 & \sqrt{2}\,x_1 x_2 & x_2^2 \end{bmatrix} \begin{bmatrix} z_1^2 & \sqrt{2}\,z_1 z_2 & z_2^2 \end{bmatrix}^T \\
&= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 \\
&= (x_1 z_1 + x_2 z_2)^2 \\
&= (x^T z)^2 \\
&= K(x, z)
\end{aligned}$$
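The identity is easy to verify numerically; the snippet below (illustrative only) checks it for one arbitrary pair of vectors.

```python
# Check that the mapped inner product equals the kernel computed in the original space.
import numpy as np

phi = lambda v: np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])
x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(phi(x) @ phi(z), (x @ z) ** 2)  # both sides equal 1.0 here
```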
C. Other Types of Kernel Functions
Two of the most commonly used kernel functions are
the Gaussian Radial Basis Function (RBF) (8) and the polynomial
kernel (9).

$$K(x_1, x_2) = \exp\left(-\frac{\|x_1 - x_2\|^2}{2\sigma^2}\right), \qquad \sigma \in \mathbb{R} \qquad (8)$$

$$K(x_1, x_2) = (\langle x_1, x_2 \rangle + c)^p, \qquad c \ge 0,\ p \in \mathbb{N} \qquad (9)$$
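For reference, both kernels are one-liners; the sketch below is a plain NumPy illustration (parameter defaults are arbitrary, not the values used later).

```python
# Gaussian RBF kernel (8) and polynomial kernel (9) for NumPy vectors.
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def poly_kernel(x1, x2, c=1.0, p=2):
    return (np.dot(x1, x2) + c) ** p
```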
D. Discussion
We have defined kernel functions and shown how they
can be used to calculate high dimensional inner products using
lower dimensional vectors. With this knowledge we can move
forward to define the formulation of support vector regression,
using kernel functions to simplify calculations.
IV. DERIVATION OF THE SUPPORT VECTOR REGRESSION
METHOD
A. Primal Formulation
In order to exploit the efficiency of kernel functions,
we need a regression formulation that can be expressed
in terms of inner products of the explanatory vectors xi. To this
end we consider the minimization problem (10)
Minimize:

$$\frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*) \qquad (10)$$

Subject to:

$$y_i - \langle w, x_i \rangle - w_0 \le \epsilon + \zeta_i \qquad (11)$$

$$\langle w, x_i \rangle + w_0 - y_i \le \epsilon + \zeta_i^* \qquad (12)$$

$$\zeta_i \ge 0, \qquad \zeta_i^* \ge 0$$
In this formulation ζi and ζi∗ are the slack variables; they
allow the data to vary outside of the band ±ε. However, if a point
does fall outside this band, it penalizes the minimization
term. C > 0 determines how strongly deviations larger than ε
are penalized. As shown in Fig. 2, only the points outside
the region contribute to the cost, since deviations are penalized
linearly. These penalized data vectors are the support vectors.
Fig. 2. Visualization of Support Vectors [2]
B. Lagrangian Minimization
The minimization problem described by (10) has the La-
grangian representation (13)
$$\begin{aligned}
L := \frac{1}{2}\|w\|_2^2 &+ C\sum_{i=1}^{n}(\zeta_i + \zeta_i^*) - \sum_{i=1}^{n}(\eta_i \zeta_i + \eta_i^* \zeta_i^*) \\
&- \sum_{i=1}^{n} \alpha_i \left(\epsilon + \zeta_i + w_0 + \langle w, x_i \rangle - y_i\right) \\
&- \sum_{i=1}^{n} \alpha_i^* \left(\epsilon + \zeta_i^* - w_0 - \langle w, x_i \rangle + y_i\right)
\end{aligned} \qquad (13)$$
Taking each derivative of L with respect to the variables
{w, w0, ζi, ζi∗} yields the following expressions:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, x_i \qquad (14)$$

$$\frac{\partial L}{\partial w_0} = -\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \qquad (15)$$

$$\frac{\partial L}{\partial \zeta_i} = C - (\eta_i + \alpha_i) \qquad (16)$$

$$\frac{\partial L}{\partial \zeta_i^*} = C - (\eta_i^* + \alpha_i^*) \qquad (17)$$
Setting each derivative equal to 0 yields the following
expressions:

$$(14) = 0 \implies w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, x_i \qquad (18)$$

$$(15) = 0 \implies \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0 \qquad (19)$$

$$(16) = 0 \implies \eta_i = C - \alpha_i \qquad (20)$$

$$(17) = 0 \implies \eta_i^* = C - \alpha_i^* \qquad (21)$$
C. Dual Formulation
Plugging expressions (18), (19), (20), (21) back into (13)
then yields the dual formulation of the minimization problem
(10). This formulation is given by (22)
Maximize:

$$\sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, y_i - \epsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle \qquad (22)$$

Subject to:

$$\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i, \alpha_i^* \in [0, C]$$
Notice that the dual formulation is written in terms
of inner products of the xi. This means that we can use the
kernel functions described in SECTION III to reduce the
dimensionality of a higher order mapping zi = Φ(xi). This
allows us to write (22) as (23)

$$\sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, y_i - \epsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(x_i, x_j) \qquad (23)$$
D. Solving for α(∗)
Now the only unknowns left are the variables α and α∗.
Solving for these variables is a task that can be accomplished
numerically. One such numerical scheme is an interior point
algorithm referred to as primal-dual path-following [3]. This
technique is described in [4]. It should also be noted that a very
nice property of the SVR formulation is that it is convex [3],
so a numerical solver cannot get trapped in a spurious local
optimum.
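As a purely illustrative aside (this is not the interior-point solver of [3], [4], nor the implementation behind our results), the dual (23) can be handed to a general-purpose constrained optimizer for small problems. Here K is a precomputed n x n kernel matrix and all names are hypothetical.

```python
# Maximize the dual objective (23) subject to sum(alpha - alpha*) = 0 and box bounds.
import numpy as np
from scipy.optimize import minimize

def solve_dual(K, y, C, eps):
    n = len(y)

    def neg_dual(v):                       # v stacks [alpha, alpha_star]
        a, a_star = v[:n], v[n:]
        d = a - a_star
        return -(d @ y - eps * np.sum(a + a_star) - 0.5 * d @ K @ d)

    constraints = {"type": "eq", "fun": lambda v: np.sum(v[:n] - v[n:])}
    bounds = [(0.0, C)] * (2 * n)
    res = minimize(neg_dual, np.zeros(2 * n), bounds=bounds, constraints=constraints)
    return res.x[:n], res.x[n:]            # alpha, alpha_star
```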
E. Final Solution
Once we have solved for α and α∗, all that is left is to
compute the prediction function. Plugging (18) into (2) and
applying the kernel substitution yields (24)

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, K(x_i, x) + w_0 \qquad (24)$$
Similarly, the offset term can be solved for by plugging
(18) into (11) or (12) to obtain (25) and (26):

$$w_0 = y_i - \sum_{j=1}^{n} (\alpha_j - \alpha_j^*)\, K(x_j, x_i) - \epsilon \quad \text{for } \alpha_i \in (0, C) \qquad (25)$$

$$w_0 = y_i - \sum_{j=1}^{n} (\alpha_j - \alpha_j^*)\, K(x_j, x_i) + \epsilon \quad \text{for } \alpha_i^* \in (0, C) \qquad (26)$$

where i is any index whose coefficient αi (respectively αi∗) lies
strictly inside the interval (0, C).
F. Selection of Parameters
When selecting the parameters C and ε it helps to have an
understanding of how they affect the regression. The primal
minimization problem (10) holds some clues as to how these
variables affect the outcome. C penalizes the function that
is being minimized any time a vector goes outside the error-
insensitive tube (of width ±ε).
We can see from Fig. 3 that a small C favors a smoother
function while a larger C puts more emphasis on getting as
close to every point as possible. Thus, the C parameter is a
good way to deal with "over-fitting" the data. It can be thought
of as a gain that we apply to the slack variables.
Fig. 3. Effect of the C Parameter
As shown by Fig. 4, the size of ε controls how much small
errors in the predictive function are ignored. A small ε will
penalize most errors, while a larger value will not penalize
errors that are close enough. Thus ε determines the number of
support vectors used to calculate f(xi).
Fig. 4. Effect of the ε Parameter
G. Results
Using the parameters in TABLE IV, we achieved the
highest Kaggle score for our team. The score is provided in
TABLE V. These results show a drastic improvement over the
naive linear regression method.
TABLE IV. SVR PARAMETERS
Parameter              Value
explanatory variables  month, hour, weather, workingday
kernel                 Gaussian Radial Basis Function (RBF)
ε                      0.1
C                      30
TABLE V. KAGGLE SCORE FOR SVR PREDICTION
Score (RMSLE) Rank (of approx. 1500)
0.55815 847
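For readers who want to reproduce a comparable model, the sketch below shows one way to set this up with scikit-learn's SVR; it is an approximation of our pipeline (not the exact code used for the submission) and assumes the feature frame X, targets y and an analogous test frame X_test built as in SECTION I-B.

```python
# RBF-kernel SVR with the parameters of TABLE IV; features are standardized first
# so that a single RBF width is reasonable across variables.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

features = ["month", "hour", "weather", "workingday"]
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=30, epsilon=0.1))
model.fit(X[features], y)
predictions = model.predict(X_test[features])
```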
H. Analysis of SVR Results
We have seen that SVR boosts the predictive power substantially
over the baseline linear regression. To see why, we again
show the plot for a model trained with just the
hour as an explanatory variable (Fig. 5). Now we can see that
the non-linear function fit to the data adheres much more closely
to the hourly average. As time of day is one of the most
important variables, we can easily imagine our fit in higher
dimensions conforming much more closely to the actual data.
Fig. 5. SVR Fit to Hourly Average
V. CONCLUSION
This project has demonstrated how Support Vector Re-
gression can be used to find a functional approximation to
a nonlinear dataset. It extends the idea of linear regression
to higher dimensional spaces, and artfully utilizes kernel
functions in order to reduce the complexity of computing the
result. As our results in the Kaggle competition have shown,
SVR is a far more robust method of prediction than the naive
linear regression.
ACKNOWLEDGMENT
Special thanks to Professor Gongguo Tang for a very well
taught and interesting class this semester.
REFERENCES
[1] "Data - Bike Sharing Demand," https://www.kaggle.com/c/bike-sharing-demand/data, accessed Dec. 10, 2014.
[2] P. S. Yu et al., "Support vector regression for real-time flood stage forecasting," Journal of Hydrology, vol. 328, no. 3-4, pp. 704-716, Sep. 2006.
[3] A. Smola and B. Schölkopf, "A Tutorial on Support Vector Regression," Sep. 30, 2003.
[4] R. J. Vanderbei, "LOQO: An interior point code for quadratic programming," TR SOR-94-15, Statistics and Operations Research, Princeton Univ., NJ, 1994.