EE-660 Machine Learning Final Project
Gesture Phase Segmentation Based on Radial Basis Function (RBF) Kernel Support Vector Machine and K-Nearest Neighbor Algorithms
Yaxin Liu, yaxinliu@usc.edu
December 6, 2014
1. Abstract
People commonly perform one or several movements, or “excursions,” with their hands, arms or even their whole bodies during a natural conversation or when giving a speech. An excursion, particularly regarding the hands, is a movement from a rest position to some region in space and back to the same or another rest position. While the hands are away from the rest position, the movement is called a gesture unit. A gesture unit can in turn be divided into four phases: preparation, stroke, hold and retraction.
This project evaluates the performance of two classifiers, one based on a Support Vector Machine (SVM) with an RBF kernel and the other on the K-Nearest Neighbor (k-NN) algorithm, at segmenting the five classes formed by the rest position and the four gesture phases above. The SVM-based classifier reaches an accuracy of 53.27% on the test set; the k-NN-based one reaches 92.53%.
2. Problem Statement
As stated above, this problem can be treated as a multi-class classification problem with five classes corresponding to the rest position and the four gesture phases: 1-Rest, 2-Preparation, 3-Stroke, 4-Hold and 5-Retraction. Given a dataset containing the movements of three individuals and a label for each captured frame, our goal is to classify these five phases correctly using the best-performing model. To make the result as accurate as possible, substantial work is required in the feature selection and preprocessing procedure.
3. Project Formulation
3.1 Support Vector Machine
SVM performs a nonlinear mapping of the input vectors from their original feature space into a high-dimensional feature space, and optimizes a hyperplane capable of separating the data in that high-dimensional space [1]. We use the LibSVM toolbox in MATLAB to implement this algorithm, relying on the following functions:
- svmtrain(labels, samples, parameter): applied to the training set to obtain either the cross validation accuracy or the trained model. The third argument holds the parameters used to build the model; here we choose ‘-t 2’ for the RBF kernel;
- svmpredict(labels, samples, model): applies the selected model to the testing set and reports the prediction accuracy.
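A minimal sketch of this train/predict flow (assuming x_train, y_train, x_test and y_test are already in the workspace and the LibSVM MATLAB interface is on the path; the full scripts appear in the appendix):

addpath('libsvm-3.17/matlab');
cv_acc = svmtrain(y_train, x_train, '-t 2 -v 5 -q');   % with '-v 5', returns the 5-fold CV accuracy
model = svmtrain(y_train, x_train, '-t 2 -c 100 -q');  % without '-v', returns a trained model
[y_hat, acc, ~] = svmpredict(y_test, x_test, model, '-q');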
3.2 K-Nearest Neighbor learning algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (k-NN for short) is a non-parametric method used for classification and regression [2]. In both cases, the input consists of the k closest training examples in the feature space. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbors and assigned to the class most common among its k nearest neighbors, where k is a positive integer, typically small. If k = 1, the object is simply assigned to the class of its single nearest neighbor.
In MATLAB, the k-NN classifier can be implemented with the function fitcknn() for model construction and predict() to obtain the accuracy on the testing set.
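A minimal sketch of this flow with the Statistics Toolbox (again assuming x_train, y_train, x_test and y_test are in the workspace; the full script appears in the appendix):

KNNmodel = fitcknn(x_train, y_train, 'NumNeighbors', 1, 'Distance', 'euclidean');
cvmdl = crossval(KNNmodel, 'kfold', 5);      % 5-fold cross validation
cv_acc = 1 - kfoldLoss(cvmdl);               % cross validation accuracy
y_hat = predict(KNNmodel, x_test);           % predicted labels on the test set
test_acc = mean(y_hat == y_test);            % test accuracy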

4. Methodology
First of all, we separate the whole dataset into two parts, a training set and a testing set. The training set is where we obtain our classification model and the cross validation accuracy, an important criterion for model selection.
In this project we must work from the raw data. Despite the limitations of the raw data, we first run SVM on the original dataset to establish a baseline for the classification accuracy.
After that, we notice that the only features in the raw dataset are the positions of the hands, wrists, head and spine, plus the timestamp of each video frame. It is therefore necessary to perform feature extraction, which consists of: a pre-processing phase that makes the representation invariant to the lateral displacement and distance of the user relative to the camera; and a velocity and acceleration extraction phase.
To select features from the expanded feature vector, we apply PCA using the PRTools toolbox in MATLAB via the function pcam(). We then plot the cross validation accuracy as a function of the number of selected features for both algorithms, SVM and k-NN, choose the optimal model for each, and evaluate it on the testing set to get the final result. A sketch of this sweep follows.
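As an illustration, here is a sketch of the sweep for the k-NN case, mirroring the appendix code (pcam() returns a linear mapping W, and x*W projects the samples onto the first D principal components):

Fnum = length(x_train(1,:));
acc = zeros(1, Fnum);
for D = 1:Fnum
    [W, FRAC] = pcam(x_train, D);            % PCA mapping onto D components
    KNNmodel = fitcknn(x_train*W, y_train, 'NumNeighbors', 1);
    acc(D) = 1 - kfoldLoss(crossval(KNNmodel, 'kfold', 5));
end
[best_acc, best_D] = max(acc);               % optimal dimensionality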
5. Implementation
5.1 Feature Space and Baseline test
The dataset is provided on the UCI repository (http://archive.ics.uci.edu/ml/datasets/Gesture+Phase+Segmentation) and is organized in 7 raw files, one for each video. The file name identifies the video: the letter corresponds to the user (A, B, C) and the number to the story (1, 2, 3). Number of instances: A1-1747 frames, A2-1264 frames, A3-1834 frames, B1-1073 frames, B3-1423 frames, C1-1111 frames, C3-1448 frames. Attribute information in the raw files:
- 18 numeric attributes (double), a timestamp (integer) and a class attribute (nominal):
• 1~3: Position of left hand (x, y, z coordinates);
• 4~6: Position of right hand (x, y, z coordinates);
• 7~9: Position of head (x, y, z coordinates);
• 10~12: Position of spine (x, y, z coordinates);
• 13~15: Position of left wrist (x, y, z coordinates);
• 16~18: Position of right wrist (x, y, z coordinates);
• 19: Timestamp;
• 20: Phase: 1-Rest, 2-Preparation, 3-Stroke, 4-Hold, 5-Retraction (the labels are converted to integers);
Now we combine these 7 files into one dataset and randomly pick 1500 samples as the testing set Dtest, keeping the rest as the training set Dtrain.
We run the SVM training procedure directly on Dtrain with the original feature space; the cross validation accuracy comes out to 29.84%.
5.2 Preprocessing and Feature Extraction
Because each person may stand at a different position in front of the camera, we need some pre-processing of the features. This consists of subtracting the spine coordinates from the hand and wrist coordinates in each frame and dividing by the distance between the head and spine points:
p'_hand = (p_hand − p_spine) / dist(p_head, p_spine)
p'_wrist = (p_wrist − p_spine) / dist(p_head, p_spine)
Then each resulting 3-D position vector is normalized.
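A small sketch of this step (assuming each row of X holds the raw positions in the column order listed in Section 5.1):

for i = 1:size(X,1)
    ref = normest(X(i,7:9) - X(i,10:12));            % head-to-spine distance
    for c = [1 4 13 16]                              % left/right hand, left/right wrist
        X(i,c:c+2) = (X(i,c:c+2) - X(i,10:12))/ref;  % subtract spine, rescale
    end
end
% each 3-D position vector is then scaled to unit length (see normalize.m)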
The second change to the feature space is adding velocity and acceleration features for the hands and wrists. The velocity is estimated by

v_{i,i−d} = Δr_{i,i−d} / (t_i − t_{i−d})

where t_i is the timestamp of frame i, d is the displacement in frames, and Δr_{i,i−d} is the Euclidean distance between the normalized 3D positions of the interest point at frame i and at frame i−d. Here we use a window of size 3, that is, d = 2 in the equation above. The acceleration is estimated by

a_{i,i−1} = (v_i − v_{i−1}) / (t_i − t_{i−1})
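As a sketch for a single tracked point, with P an N-by-3 matrix of normalized positions, t the timestamp vector, and d = 2 and k = 1 as above:

N = size(P,1);
v = zeros(N,3); a = zeros(N,3);
for i = d+1:N
    v(i,:) = (P(i,:) - P(i-d,:))/(t(i) - t(i-d));    % vectorial velocity
end
for i = d+k+1:N
    a(i,:) = (v(i,:) - v(i-k,:))/(t(i) - t(i-k));    % vectorial acceleration
end
speed = sqrt(sum(v.^2, 2));                          % scalar velocity magnitude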
Due to the intrinsic temporal aspects of the gesture phase segmentation problem, the proposed approach follows a windowed strategy [3], using information from past and/or future frames to represent each frame of interest. Finally, we add the scalar magnitudes of the velocity and acceleration of each tracked point, plus a normalized timestamp. We therefore have 51 features in total:
• 19~21: Vectorial velocity of left hand (x, y, z coordinates);
• 22~24: Vectorial velocity of right hand (x, y, z coordinates);
• 25~27: Vectorial velocity of left wrist (x, y, z coordinates);
• 28~30: Vectorial velocity of right wrist (x, y, z coordinates);
• 31~33: Vectorial acceleration of left hand (x, y, z coordinates);
• 34~36: Vectorial acceleration of right hand (x, y, z coordinates);
• 37~39: Vectorial acceleration of left wrist (x, y, z coordinates);
• 40~42: Vectorial acceleration of right wrist (x, y, z coordinates);
• 43: Scalar velocity of left hand
• 44: Scalar velocity of right hand
• 45: Scalar velocity of left wrist
• 46: Scalar velocity of right wrist
• 47: Scalar acceleration of left hand
• 48: Scalar acceleration of right hand
• 49: Scalar acceleration of left wrist
• 50: Scalar acceleration of right wrist
• 51: Normalized timestamp
5.3 Training Process and Model Selection
After the preprocessing procedure, we have 8380 samples for training and 1500 samples for testing. To select the appropriate features for each model, we apply PCA in both the SVM and k-NN pipelines and plot the cross validation accuracy against the dimensionality of the feature vector. Once the optimal dimensionality is determined, we use the chosen feature vector to select the best parameters for each classifier.
- For SVM, the key parameter of svmtrain() is the cost value, so we plot the cross validation accuracy for different costs.
- For k-NN, the number of neighbors and the distance metric between samples are the key parameters, so we plot the performance of the model for different numbers of neighbors.
We choose the model with the highest cross validation accuracy in part 5.3 for each algorithm and apply it to the testing set to get the final classification results.
5.4 Training Results
The graphs above show that the optimal dimensionality for the SVM model is d = 5, with an accuracy of 53.77%, and that the accuracy rises as the cost value increases. However, the cost cannot be made arbitrarily large: a very large cost weakens the regularization and risks overfitting. The optimal SVM model thus has a 5-dimensional feature space and a cost of 101.
The two graphs above show how the parameters of the k-NN model affect its performance. The graph on the left shows that the model reaches its highest accuracy of 92.17% with 1 neighbor, and the one on the right shows that the optimal dimensionality is d = 45 with an accuracy of 92.91%. Different distance metrics between samples make little difference to the performance of the classifier, so we simply use the Euclidean distance in this model.
It turns out that the k-NN algorithm beat the RBF kernel SVM in this instance.
6. Final Results and Interpretation
Finally, we apply the selected model from each classifier to the testing set. Etest with the RBF kernel SVM classifier is 46.73%, while with the k-NN classifier it is 8.73%. The classifier generated by the k-NN algorithm therefore works better on this gesture phase segmentation problem.
7. Summary and Conclusions
From all the work that has been done so far, it seems that k-NN algorithm wins over the RBF Kernel SVM on this
specific problem. However, the toolbox, LibSVM, we used could has its limitations on selecting models in limited
parameters to choose. Also, the window size implemented while computing the velocity and the acceleration of
the hands and wrists can make difference in the result. There’re researches on the same problem that has a pre-
cision far better than 53.77% with an SVM model. According to the reference paper [3], the authors achieved a
precision of 84.3% in their approach working with SVM.
8. References
[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[2] Wikipedia: http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
[3] Madeo, R. C. B.; Lima, C. A. M.; Peres, S. M. Gesture Unit Segmentation using Support Vector Machines: Segmenting Gestures from Rest Positions. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC), Coimbra, 2013, p. 46-52.
[4] Wagner, P. K.; Peres, S. M.; Madeo, R. C. B.; Lima, C. A. M.; Freitas, F. A. Gesture Unit Segmentation Using Spatial-Temporal Information and Machine Learning. In: Proceedings of the 27th Florida Artificial Intelligence Research Society Conference (FLAIRS), Pensacola Beach. Palo Alto: The AAAI Press, 2014, p. 101-106.
Appendix.
MATLAB Code
% ----------------------------------------------------------------
% - generate_dataset.m
% - separates the dataset into a training set and a testing set
% - creates both the preprocessed and the raw dataset
% ----------------------------------------------------------------
clear all;
close all;
load dataset;
%----------- pre-processing on trainingset ------------------
d = 2;
k = 1;
[xa1_norm, ya1_pre] = preprocess( xa1, ya1, d, k );
[xa2_norm, ya2_pre] = preprocess( xa2, ya2, d, k );
[xb1_norm, yb1_pre] = preprocess( xb1, yb1, d, k );
[xb3_norm, yb3_pre] = preprocess( xb3, yb3, d, k );
[xa3_norm, ya3_pre] = preprocess( xa3, ya3, d, k );
[xc1_norm, yc1_pre] = preprocess( xc1, yc1, d, k );
[xc3_norm, yc3_pre] = preprocess( xc3, yc3, d, k );
x_data = [xa1_norm;xa2_norm;xa3_norm;xb1_norm;xb3_norm;xc1_norm;xc3_norm;];
x_raw = [xa1(d+k+1:length(xa1(:,1)),:);
xa2(d+k+1:length(xa2(:,1)),:);
xa3(d+k+1:length(xa3(:,1)),:);
xb1(d+k+1:length(xb1(:,1)),:);
xb3(d+k+1:length(xb3(:,1)),:);
xc1(d+k+1:length(xc1(:,1)),:);
xc3(d+k+1:length(xc3(:,1)),:)];
y_data = [ya1_pre;ya2_pre;ya3_pre;yb1_pre;yb3_pre;yc1_pre;yc3_pre];
y_raw = [ya1(d+k+1:length(xa1(:,1)),:);
ya2(d+k+1:length(xa2(:,1)),:);
ya3(d+k+1:length(xa3(:,1)),:);
yb1(d+k+1:length(xb1(:,1)),:);
yb3(d+k+1:length(xb3(:,1)),:);
yc1(d+k+1:length(xc1(:,1)),:);
yc3(d+k+1:length(xc3(:,1)),:)];
Num_test = 1500;
Num_data = length(x_data(:,1));
rand_idx = randperm(Num_data);
idx_test = sort(rand_idx(1,1:Num_test));
idx_train = sort(rand_idx(1,Num_test+1:Num_data));
x_train = x_data(idx_train',:);
y_train = y_data(idx_train',:);
x_test = x_data(idx_test',:);
y_test = y_data(idx_test',:);
xb_train = x_raw(idx_train',:);
yb_train = y_raw(idx_train',:);
save('rawdata.mat','xb_train','yb_train');
save('/Users/yaxinliu/Documents/MATLAB/EE660/project/dataset_p.mat','x_train','y_train','x_test','y_test');
% ----------------------------------------------------------------
% - baseline.m
% - training on the raw data
% ----------------------------------------------------------------
clear all;
close all;
load rawdata;
addpath('libsvm-3.17/matlab');
6
EE-660 Machine Learning Final Project
%-------------- baseline test ----------------
accuracy = svmtrain(yb_train, xb_train, '-t 2 -v 5 -h 0 -q');
% ----------------------------------------------------------------
% - RBFKernelSVM.m
% - SVM classifier selection and test with the optimal model
% ----------------------------------------------------------------
clear all;
close all;
load dataset_p;
addpath('libsvm-3.17/matlab');
addpath('prtools/prtools');
Fnum = length(x_train(1,:));
%------ select optimal parameters for the model -------------
ACC_PCA = zeros(1,Fnum);
for D = 1:2:Fnum
[W, FRAC] = pcam(x_train, D);
ACC_PCA(D) = svmtrain(y_train, x_train*W, '-t 2 -v 5 -h 0 -q -c 100');
end
[MaxVal_PCA, MaxIdx_PCA] = max(ACC_PCA(1:2:Fnum));
MaxIdx_PCA = (MaxIdx_PCA-1)*2 + 1;
%------ Plot -----------
figure;
plot([1:2:Fnum],ACC_PCA(1:2:Fnum),'.-');
title('RBF Kernel(SVM) Performance vs. Dimension');
ylabel('Accuracy');
xlabel('Dimension');
grid on;
ACC_C = zeros(1,6);
C = 1:20:101;
[W, FRAC] = pcam(x_train, MaxIdx_PCA);
for i = 1:6
parameter = sprintf('-t 2 -v 5 -h 0 -q -c %d', C(i));
ACC_C(i) = svmtrain(y_train, x_train*W,parameter);
end
[MaxVal_C, MaxIdx_C] = max(ACC_C);
MaxIdx_C = (MaxIdx_C-1)*20+1;
figure;
plot(C,ACC_C,'.-');
title('RBF Kernel(SVM) Performance vs. Cost (with Opt. Dim)');
ylabel('Accuracy');
xlabel('Cost');
grid on;
parameter = sprintf('-t 2 -h 0 -q -c %d ', MaxIdx_C);
SVMoptModel = svmtrain(y_train, x_train*W,parameter);
[predicted_label, accuracy, prob_estimates] = svmpredict(y_test, x_test*W, SVMoptModel, '-q');
% ----------------------------------------------------------------
% - KNN.m
% - classifier selection with KNN algo. and test with the optimal model
% ----------------------------------------------------------------
clear all;
close all;
load dataset_p;
7
EE-660 Machine Learning Final Project
addpath('prtools/prtools');
%-------------- KNN classifier ------------------------------
KNNmodel=fitcknn(x_train,y_train);
%------ select optimal parameters for the model -------------
Acc = zeros(1,10);
for D = 1:10
KNNmodel.NumNeighbors = D;
cvmdl = crossval(KNNmodel,'kfold',5);
Acc(D) = 1-kfoldLoss(cvmdl);
end
%------ Plot -----------
figure;
plot([1:10],Acc,'.-');
title('k-NN Performance vs. Num of Neighbors');
ylabel('Accuracy');
xlabel('Number of Neighbors');
grid on;
[MaxVal_K, MaxIdx_K] = max(Acc);
KNNmodel.NumNeighbors = MaxIdx_K;
Fnum = length(x_train(1,:));
%------ select optimal parameters for the model -------------
ACC_PCA = zeros(1,Fnum);
for D = 1:Fnum
[W, FRAC] = pcam(x_train, D);
KNNmodel=fitcknn(x_train*W,y_train,'NumNeighbors',MaxIdx_K);
cvmdl = crossval(KNNmodel,'kfold',5);
ACC_PCA(D) = 1-kfoldLoss(cvmdl);
end
[MaxVal_PCA, MaxIdx_PCA] = max(ACC_PCA);
%------ Plot -----------
figure;
plot([1:3:Fnum],ACC_PCA(1:3:Fnum),'.-');
title('k-NN Performance vs. Dimension');
ylabel('Accuracy');
xlabel('Dimension');
grid on;
%------------- Compute error ---------------------------------
% rebuild the projection and the model at the optimal dimensionality
[W, FRAC] = pcam(x_train, MaxIdx_PCA);
KNNmodel = fitcknn(x_train*W, y_train, 'NumNeighbors', MaxIdx_K);
y_predict = predict(KNNmodel,x_test*W);
Error = y_predict-y_test;
ErrNum = length(find(Error~=0));
Acc_test = (length(Error)-ErrNum)/length(Error);
% ----------------------------------------------------------------
% - normalize.m
% - the func. works on normalizing the 3-D position coordinates
% ----------------------------------------------------------------
function Matrix = normalize( train )
% subtract the spine coordinates (columns 10:12) from each hand/wrist
% position and divide by the head-to-spine distance, as in Section 5.2
for i = 1: length(train(:,1))
hs_dist(i) = normest(train(i,7:9)-train(i,10:12));
train(i,1:3) = (train(i,1:3) - train(i,10:12))/hs_dist(i);
train(i,4:6) = (train(i,4:6) - train(i,10:12))/hs_dist(i);
train(i,13:15) = (train(i,13:15) - train(i,10:12))/hs_dist(i);
train(i,16:18) = (train(i,16:18) - train(i,10:12))/hs_dist(i);
end
for i = 1:length(train(:,1))
for j = 1:3:length(train(1,:))-1
scale(i,j) = normest(train(i,j:(j+2)));
Matrix(i,j:(j+2)) = train(i,j:(j+2))/scale(i,j);
end
end
end
% ----------------------------------------------------------------
% - preprocess.m
% - the func. works on feature extraction
% ----------------------------------------------------------------
function [output_x, output_y] = preprocess( sample_x, sample_y, d, k )
norm_x = normalize(sample_x);
height = length(norm_x(:,1));
velocity = zeros(height,12);
accele = zeros(height,12);
scalar_v = zeros(height,4);
scalar_a = zeros(height,4);
for i = d+1:height
velocity(i,:) = (norm_x(i,[1:3,4:6,13:15,16:18])-norm_x(i-d,[1:3,4:6,13:15,16:18]))/(sample_x(i,19)-sample_x(i-d,19));
scalar_v(i,:) = [normest(velocity(i,1:3)),normest(velocity(i,4:6)),normest(velocity(i,7:9)),normest(velocity(i,10:12))];
if(i>d+k)
% use lag k in the time difference as well (k = 1 in this project)
accele(i,:) = (velocity(i,:)-velocity(i-k,:))/(sample_x(i,19)-sample_x(i-k,19));
scalar_a(i,:) = [normest(accele(i,1:3)),normest(accele(i,4:6)),normest(accele(i,7:9)),normest(accele(i,10:12))];
end
end
time = (sample_x(:,19)-mean(sample_x(:,19)))/std(sample_x(:,19));
output_x = [norm_x,velocity,accele,scalar_v,scalar_a,time];
output_x = output_x(d+k+1:height,:);
% output_x = output_x(:,[1:6,13:51]);
output_y = sample_y(d+k+1:height,:);
end