Ahsanullah University of Science & Technology

Designing a minimum distance to class mean classifier

Name: Mufakkharul Islam Nayem
ID: 12.01.04.150
Year & Semester: 4th Year, 2nd Semester
Section: C (C-2)
Assignment 1
Course Title: Pattern Recognition LAB
Course ID: CSE 4214
Date of Submission: December 26, 2015
DESIGNING A MINIMUM DISTANCE TO CLASS MEAN CLASSIFIER 1
Introduction
The "Minimum Distance to Class Mean Classifier" is used to classify unlabeled sample
vectors when labeled vectors clustered into more than one class are given. For example, in a
dataset containing n sample vectors of dimension d, some sample vectors are already
clustered into classes and some are not. We can classify the unclassified sample vectors with
the Class Mean Classifier.
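As a minimal sketch of the idea (written in Python rather than Matlab, using a few illustrative points, not the assignment's full datasets): each class is summarized by the mean of its labeled samples, and an unlabeled vector is assigned to the class whose mean is nearest.

```python
# Minimum-distance-to-class-mean classification, pure-Python sketch.
# The sample points below are illustrative only.

def class_mean(samples):
    """Mean of a list of 2-D points."""
    n = len(samples)
    return (sum(p[0] for p in samples) / n,
            sum(p[1] for p in samples) / n)

def classify(x, means):
    """Index of the class whose mean is nearest to x
    (squared Euclidean distance; the sqrt is monotone, so it can be skipped)."""
    d2 = [(x[0] - m[0])**2 + (x[1] - m[1])**2 for m in means]
    return d2.index(min(d2))

w1 = [(2, -1), (3, 0), (3, 2)]      # labeled samples of class 1
w2 = [(-2, 1), (-1, -1), (-4, 4)]   # labeled samples of class 2
means = [class_mean(w1), class_mean(w2)]
print(classify((4, 1), means))   # -> 0, i.e. class 1
```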
Task
Given the following two-class set of prototypes:
Dataset 1:
w1 = {(2 -1),(3 0),(3 2),(-1 -3),(4 1),(-2 -4),(0 -1),(-2 2),(-1 -4),(-4 1)} [input file class1_dataset1.txt]
w2 = {(0 0),(-2 1),(-1 -1),(-4 4),(-4 1),(2 6),(2 2),(3 1),(3 -1),(-1 -3)} [input file class2_dataset1.txt]
Dataset 2:
w1 = {(2 -1),(3 0),(3 2),(-1 -3),(4 1),(-2 -4),(0 -1),(-2 2),(-1 -4),(-4 1)} [input file class1_dataset2.txt]
w2 = {(0 0),(-2 1),(-1 -1),(-4 4),(-4 1),(2 6),(2 2),(3 1),(3 -1),(-1 -3)} [input file class2_dataset2.txt]
Dataset 3:
w1 = {(2 -1),(3 0),(3 2),(-1 -3),(4 1),(-2 -4),(0 -1),(-2 2),(-1 -4),(-4 1)} [input file class1_dataset3.txt]
w2 = {(0 0),(-2 1),(-1 -1),(-4 4),(-4 1),(2 6),(2 2),(3 1),(3 -1),(-1 -3)} [input file class2_dataset3.txt]
1. Plot all sample points from both classes, but samples from the same class should have the
same color and marker.
2. Using a minimum distance classifier with respect to ‘class mean’, classify the following
points by plotting them with the designated class color but a different marker.
X1 = (5 2)
X2 = (2 -4)
X3 = (-1 8)
X4 = (-2 -3)
X5 = (-2 -12)
X6 = (-10 6) ; [input file testsample.txt]
Linear Discriminant Function:
g_i(x) = x^T y_i - (1/2) y_i^T y_i, where y_i is the mean vector of class i
3. Draw the decision boundary between the two classes.
Solution
• Plotting the two-class set of prototypes
Two classes in each dataset were given. First, I plotted all the points. Samples from
the same class were plotted using the same color and marker so that the classes can be
distinguished easily. Here blue stars ‘*’ represent class 1 and red stars ‘*’ represent class 2.
• Calculating the distance from the mean of each class using the Linear Discriminant
Function
The mean points of the two classes are calculated & plotted in the same class color with
the ‘+’ marker (‘+’ for class 1 & ‘+’ for class 2).
Now, for each of the points (X1…X6), I calculated the distance from the mean of each
class using the Linear Discriminant Function:

g_i(x) = x^T y_i - (1/2) y_i^T y_i

Further derivation of the linear discriminant function: I used the discriminant function in
the form

g_i(x) = w_i^T x - (1/2) w_i^T w_i

so, for the two classes (w1, w2), the function becomes

g_1(x) = w_1^T x - (1/2) w_1^T w_1
g_2(x) = w_2^T x - (1/2) w_2^T w_2

where w_1 and w_2 denote the two class mean vectors.
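The linear discriminant above is equivalent to picking the nearest class mean. A small sketch (in Python, with assumed mean values rather than the ones computed from the datasets) checks that ranking classes by g_i(x) agrees with ranking them by Euclidean distance to the mean:

```python
# Check: maximizing g_i(x) = x.y_i - 0.5*y_i.y_i picks the same class
# as minimizing the Euclidean distance to the class mean.
# y1, y2 are assumed, illustrative means.

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

def g(x, y):
    """Linear discriminant for a class with mean y."""
    return dot(x, y) - 0.5 * dot(y, y)

y1, y2 = (2.0, 1.0), (-1.0, 0.5)
x = (5, 2)

by_g = 1 if g(x, y1) > g(x, y2) else 2
d2 = lambda a, b: (a[0]-b[0])**2 + (a[1]-b[1])**2
by_dist = 1 if d2(x, y1) < d2(x, y2) else 2
print(by_g, by_dist)  # both pick class 1
```

The equivalence holds because expanding |x - y_i|^2 leaves x^T x common to both classes, so only the g_i(x) terms differ.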
The points were assigned to the class with minimum distance from the respective class
mean. I used the same class color but a different marker while plotting those points. Here the
circles ‘o’ are the given test samples (‘o’ for class 1 & ‘o’ for class 2).
• Drawing the decision boundary between the two classes
The decision boundary between the two classes is drawn next. A point lies on the
boundary when its distance from the two class means is equal. In practice, I used the
minimum & maximum values of x1 to calculate the corresponding x2 from the following
function, which gives the boundary between the two classes:

g(x) = g_1(x) - g_2(x)
     = w_1^T x - (1/2) w_1^T w_1 - w_2^T x + (1/2) w_2^T w_2
     = (w_1^T - w_2^T) x - (1/2) (w_1^T w_1 - w_2^T w_2)

From this formula I derived the linear equation used to find the decision boundary
coordinates for plotting them in the figure. Setting g(x) = 0,

(w_1^T - w_2^T) x - (1/2) (w_1^T w_1 - w_2^T w_2) = 0
⇒ (COEF1 COEF2) (x1; x2) + CONSTANT = 0
⇒ COEF1·x1 + COEF2·x2 + CONSTANT = 0
⇒ x2 = (COEF1·x1 + CONSTANT) / (-COEF2)

where CONSTANT = -(1/2) (w_1^T w_1 - w_2^T w_2).

Here the ‘.-’ line drawn in orange (.-.-.-.-) represents the linear decision boundary
between the two classes.
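The rearrangement above can be sketched numerically (in Python, with assumed class means y1, y2, not the computed ones). Any x2 produced by the formula should be equidistant from both means:

```python
# Boundary from g1(x) - g2(x) = 0, rearranged to
# x2 = (COEF1*x1 + CONSTANT) / (-COEF2).  y1, y2 are illustrative means.

def boundary_x2(x1, y1, y2):
    coef1 = y1[0] - y2[0]
    coef2 = y1[1] - y2[1]
    constant = -0.5 * ((y1[0]**2 + y1[1]**2) - (y2[0]**2 + y2[1]**2))
    return (coef1 * x1 + constant) / (-coef2)

y1, y2 = (2.0, 1.0), (-1.0, 0.5)
x1 = 0.0
x2 = boundary_x2(x1, y1, y2)

# A boundary point must be equidistant from both class means.
d1 = (x1 - y1[0])**2 + (x2 - y1[1])**2
d2 = (x1 - y2[0])**2 + (x2 - y2[1])**2
print(abs(d1 - d2) < 1e-9)  # True
```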
Accuracy Discussion
Figures for each of the three datasets are given below.
For dataset 1,
• 2 samples (out of 10) from class 1 & 3 samples (out of 10) from class 2 are misclassified with
respect to the decision boundary.
• All the test samples are classified correctly.
• Accuracy is 75%.
For dataset 2,
• 6 samples (out of 20) from class 1 & 8 samples (out of 20) from class 2 are misclassified with
respect to the decision boundary.
• All the test samples are classified correctly.
• Accuracy is 65%.
For dataset 3,
• 15 samples (out of 30) from class 1 & 12 samples (out of 30) from class 2 are misclassified
with respect to the decision boundary.
• All the test samples are classified correctly.
• Accuracy is 55%.
So, from these observations I can say that as the number of sample points increases, the
accuracy of the linear decision boundary decreases, while the given test samples are still
assigned to the nearer class mean correctly.
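The accuracy figures above can be reproduced mechanically: count the labeled samples that fall on their own side of the boundary g1(x) - g2(x) = 0 and divide by the total. A sketch in Python (the point lists are illustrative, not the assignment's datasets):

```python
# Accuracy of the linear boundary: fraction of labeled samples lying on
# their own class's side of g1(x) - g2(x) = 0.  Points are illustrative.

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts)/n, sum(p[1] for p in pts)/n)

def accuracy(w1, w2):
    y1, y2 = mean(w1), mean(w2)
    def g(x, y):  # linear discriminant for a class with mean y
        return dot(x, y) - 0.5 * dot(y, y)
    correct = sum(1 for p in w1 if g(p, y1) > g(p, y2)) \
            + sum(1 for p in w2 if g(p, y2) > g(p, y1))
    return correct / (len(w1) + len(w2))

w1 = [(2, -1), (3, 0), (3, 2), (4, 1)]
w2 = [(-2, 1), (-4, 4), (-4, 1), (-1, -1)]
print(accuracy(w1, w2))  # 1.0 for these well-separated points
```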
Matlab Code
function CMCfinal()
clear
clc
% Given sample points
w1 = zeros(2,10);
% for dataset 2 use class1_dataset2.txt, for dataset 3 use class1_dataset3.txt
myfile = fopen('class1_dataset1.txt','r');
w1 = fscanf(myfile,'%f %f',size(w1));
w1 = w1';
w2 = zeros(2,10);
% for dataset 2 use class2_dataset2.txt, for dataset 3 use class2_dataset3.txt
myfile = fopen('class2_dataset1.txt','r');
w2 = fscanf(myfile,'%f %f',size(w2));
w2 = w2';
% Plotting the sample points
figure
title('Minimum Distance to Class Mean Classifier');
hold on
L1 = plot(w1(:,1),w1(:,2),'*','MarkerEdgeColor','b');
hold on;
L2 = plot(w2(:,1),w2(:,2),'*','MarkerEdgeColor','r');
xlabel('X1');
ylabel('X2');
% Calculating mean of two classes
y1 = [mean(w1(:,1)) mean(w1(:,2))];
y2 = [mean(w2(:,1)) mean(w2(:,2))];
% Plotting mean of two classes
hold on
L3 = plot(y1(1),y1(2),'+','MarkerEdgeColor','b');
hold on
L4 = plot(y2(1),y2(2),'+','MarkerEdgeColor','r');
% Points for testing
x = zeros(2,6);
myfile = fopen('testsample.txt','r');
x = fscanf(myfile,'%f %f',size(x));
x = x';
% For each test sample, evaluate the linear discriminant function
for n = 1:length(x)
    g1 = x(n,:)*y1' - .5*(y1*y1');
    g2 = x(n,:)*y2' - .5*(y2*y2');
    if g1 > g2
        hold on
        L5 = plot(x(n,1),x(n,2),'o','MarkerEdgeColor','b');
    else
        hold on
        L6 = plot(x(n,1),x(n,2),'o','MarkerEdgeColor','r');
    end
end
% Calculate decision boundary between two classes
minw = min(min(w1(:)),min(w2(:)));
minall = min(minw,min(x(:)));
maxw = max(max(w1(:)),max(w2(:)));
maxall = max(maxw,max(x(:)));
DBx1 = minall:0.1:maxall;
coefficient = (y1-y2);
constant = -0.5*(y1*y1' - y2*y2');   % scalar: -(1/2)(w1'w1 - w2'w2)
for i = 1:length(DBx1)
    DBx2(i,:) = (coefficient(1,1)*DBx1(i) + constant)/(-coefficient(1,2));
end
%{
for i = 1:length(DBx1)
    DBx2(i,1) = (3*DBx1(1,i) + 7.0312)/(1.5);
end
%}
DB = [DBx1' DBx2];
hold on
L7 = plot(DB(:,1),DB(:,2),'.-');
legend([L1 L2 L3 L4 L5 L6 L7],{'Class 1','Class 2','Class 1 Mean','Class 2 Mean', ...
    'Class 1 Test','Class 2 Test','Decision Boundary'},'location','northoutside');
hold off;
Conclusion
I tried to implement the algorithm in a simple way; no complex calculations were
needed. The weakness of the algorithm is that its misclassification rate is relatively high,
because the boundary between the two classes is restricted to a straight line.