This talk was presented at the workshop Bio-inspiring and Evolutionary Computation: Trends, Applications and Open Issues, 7 Nov. 2015, Faculty of Computers and Information, Cairo University.
1. PSOk-NN: A Particle Swarm Optimization Approach to Optimize k-Nearest Neighbor Classifier
Alaa Tharwat1,2,5, Aboul Ella Hassanien3,4,5
1Dept. of Electrical Engineering, Faculty of Engineering, Suez Canal University, Ismailia, Egypt.
2Faculty of Engineering, Ain Shams University, Cairo, Egypt.
3Faculty of Computers and Information, Cairo University, Cairo, Egypt.
4Faculty of Computers and Information, Beni Suef University - Egypt.
5Scientific Research Group in Egypt (SRGE) http://www.egyptscience.net.
Swarm Workshop - Nov. 7, 2015
Alaa Tharwat, Aboul Ella Hassanien. Swarm Workshop, Nov. 7, 2015. Slide 1/20.
3. Introduction
In the machine learning field, there are two main learning approaches: supervised and unsupervised learning.
Supervised learning comprises two main techniques: regression and classification.
In the unsupervised approach, the targets or responses of the input data are not required to build the model.
There are many types of classifiers; the k-Nearest Neighbour (k-NN) classifier is one of the oldest and simplest.
4. Theoretical Background: k-Nearest Neighbour (k-NN) Classifier
k-Nearest Neighbour (k-NN) is one of the most common and simplest methods for pattern classification.
In the k-NN classifier, an unknown pattern is classified based on its similarity to the known samples (i.e. labelled or training samples): the distances from the unknown sample to all labelled samples are computed, and the k nearest samples are selected as the basis for classification.
The unknown sample is assigned to the class containing the most samples among its k nearest neighbours (i.e. voting); hence, the k parameter is usually chosen odd so that, in two-class problems, the vote cannot end in a tie.
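The distance-and-voting rule above can be sketched in a few lines of Python (a minimal illustration on hypothetical data, not the experimental setup of this talk):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Classify one unknown sample x by majority vote among its k nearest labelled samples."""
    # Distances from the unknown sample to all labelled samples
    dists = np.linalg.norm(X_train - x, axis=1)
    # Select the k nearest samples as the basis for classification
    nearest = np.argsort(dists)[:k]
    # Assign the class containing the most samples among the k nearest (voting)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical two-class training data
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))  # prints 0
```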
5. Theoretical Background: Particle Swarm Optimization (PSO)
The main objective of the PSO algorithm is to search the search space for positions close to the global minimum or maximum of the objective function.
In the PSO algorithm, a number of particles (also called agents or elements), each representing a candidate solution, are randomly placed in the search space. The number of particles is chosen by the user.
The current location or position of each particle is used to evaluate the objective or fitness function at that location.
Each particle i maintains three values: a position (xi ∈ Rn), a velocity (vi), and its previous best position (pi); in addition, the swarm tracks G, the position with the best fitness value achieved so far.
6. Theoretical Background Particle Swarm Optimization (PSO)
The velocity of each particle is adjusted in each iteration as shown in
Equation (1).
The movement of any particle is then calculated by adding the new velocity to the current position of that particle, as in Equation (2).
vi(t+1) = Current Motion + Particle Memory Influence + Swarm Influence

vi(t+1) = w vi(t) + C1 r1 (pi(t) − xi(t)) + C2 r2 (G − xi(t))    (1)

xi(t+1) = xi(t) + vi(t+1)    (2)
where w represents the inertia weight, C1 is the cognition learning factor, C2 is the social learning factor, and r1, r2 are uniformly generated random numbers in the range [0, 1].
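Equations (1) and (2) translate into the following minimal PSO loop. This is a sketch: the objective function, the search bounds, the swarm size, and the coefficient values (w = 0.7, C1 = C2 = 1.5) are illustrative assumptions, not the settings used in the experiments reported here.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 lo=-10.0, hi=10.0, seed=0):
    """Minimal PSO loop implementing Equations (1) and (2)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))   # positions xi
    v = np.zeros((n_particles, dim))              # velocities vi
    p = x.copy()                                  # previous best positions pi
    fp = np.array([f(xi) for xi in x])            # fitness at pi
    G = p[np.argmin(fp)].copy()                   # best position found so far
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Equation (1): inertia + particle-memory (cognitive) + swarm (social) terms
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (G - x)
        # Equation (2): move each particle by its new velocity
        x = x + v
        fx = np.array([f(xi) for xi in x])
        better = fx < fp
        p[better], fp[better] = x[better], fx[better]
        G = p[np.argmin(fp)].copy()
    return G, float(f(G))

best, val = pso_minimize(lambda z: float(np.sum(z ** 2)), dim=2)
```

On this simple sphere function the swarm quickly contracts around the minimum at the origin.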
7. Theoretical Background: Particle Swarm Optimization (PSO)
[Figure omitted in this transcript: vector diagram of two particles, showing each particle's current and next positions, its original velocity, its velocity toward its personal best P, its velocity toward the global best G, and the resultant velocity.]
Figure: An example showing how two particles move using the PSO algorithm: (a) general movement of the two particles, (b) movement of the two particles in one-dimensional space.
8. Proposed Model: PSOk-NN
[Figure omitted in this transcript: flowchart of the PSOk-NN model. The training samples, the k parameter, and the misclassification rate feed the PSO loop: initialize PSO; for each particle, update the velocity (vi) and position (xi) and evaluate the fitness function F(xi); if F(xi) < F(Pi), set Pi = xi; if F(xi) < F(G), set G = xi; when the termination criterion is satisfied, output the best solution G, otherwise start the next iteration. The testing samples are used to compute the misclassification rate.]
Figure: The PSOk-NN algorithm searches for the optimal k parameter, which minimizes the misclassification rate on the testing samples.
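In this model, the fitness of a particle is the misclassification rate of a k-NN classifier for the k encoded by that particle's position. A minimal sketch of such a fitness function follows; rounding the particle's real-valued position to an integer k is our assumption about the encoding, not a detail confirmed by the talk.

```python
import numpy as np

def knn_error_rate(X_train, y_train, X_test, y_test, k):
    """PSOk-NN fitness: misclassification rate (%) of a k-NN classifier for a given k."""
    k = max(1, int(round(k)))  # a particle's position is real-valued; round to a valid k
    errors = 0
    for x, y in zip(X_test, y_test):
        dists = np.linalg.norm(X_train - x, axis=1)   # distances to all labelled samples
        nearest = np.argsort(dists)[:k]               # k nearest samples
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        if labels[np.argmax(counts)] != y:            # majority vote vs. true label
            errors += 1
    return 100.0 * errors / len(y_test)
```

PSO then minimizes this function over k exactly as it would minimize any other objective.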
9. Experimental Results: Simulated Example
Table: Description of the training data used in our simulated example.

              Class 1 (ω1)    Class 2 (ω2)
Pattern No.    f1     f2       f1     f2
     1          7      1        3      3
     2          5      2        4      4
     3          9      2        7      4
     4         10      4        5      5
     5          8      4        6      5
     6         11      4        6     10
     7          9      9        4     11
     8          9     11        2     11
     9         10      9        2      6
    10          8      6        5      9
10. Experimental Results: Simulated Example
[Figure omitted in this transcript: scatter plot of the training and testing patterns of the two classes in the (f1, f2) plane, with neighbourhoods of k = 1, 3, 5, 7, 9 drawn around an unknown sample. The predicted class label changes with k: C2 (false) for k = 1, C2 (false) for k = 3, C1 (true) for k = 5, C1 (true) for k = 7, and C2 (false) for k = 9.]
Figure: Example of how the k parameter controls the predicted class label of the unknown sample, and hence the misclassification rate.
11. Experimental Results: Simulated Example
Table: Description of the testing data used in our simulated example and the class labels predicted by the k-NN classifier for different values of k.

              Testing Samples   True Class        Predicted Class Labels (ŷi)
Sample No.      f1     f2       Label (yi)     k=1   k=3   k=5   k=7   k=9
     1           7      9           1           2*    2*    1     1     2*
     2           4      2           2           1*    2     2     2     2
     3           9      3           1           1     1     1     1     1
     4           2      7           2           2     2     2     2     2
Misclassification Rate (%)                      50    25    0     0     25

Values marked with * indicate a wrong class label.
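Assuming Euclidean distances and simple majority voting, the misclassification rates in the table can be reproduced directly from the training and testing data of the simulated example:

```python
import numpy as np

# Training patterns (f1, f2) of the simulated example
w1 = np.array([[7, 1], [5, 2], [9, 2], [10, 4], [8, 4],
               [11, 4], [9, 9], [9, 11], [10, 9], [8, 6]], dtype=float)
w2 = np.array([[3, 3], [4, 4], [7, 4], [5, 5], [6, 5],
               [6, 10], [4, 11], [2, 11], [2, 6], [5, 9]], dtype=float)
X_train = np.vstack([w1, w2])
y_train = np.array([1] * 10 + [2] * 10)

# Testing samples and their true class labels
X_test = np.array([[7, 9], [4, 2], [9, 3], [2, 7]], dtype=float)
y_test = np.array([1, 2, 1, 2])

rates = {}
for k in (1, 3, 5, 7, 9):
    errors = 0
    for x, y in zip(X_test, y_test):
        # k nearest training samples by Euclidean distance, then majority vote
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        errors += int(labels[np.argmax(counts)] != y)
    rates[k] = 100.0 * errors / len(y_test)
print(rates)  # {1: 50.0, 3: 25.0, 5: 0.0, 7: 0.0, 9: 25.0}
```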
12. Experimental Results: Simulated Example

Initial Values
Particle No.   Position (xi)   Velocity (vi)   Fitness (F)   Pi   G
     1               1               0             100        -   -
     2               9               0             100        -   -
     3               5               0             100        -   -
     4               3               0             100        -   -
First Iteration
     1               1             5.6              50        1   -
     2               9            -5.6              25        9   -
     3               5               0               0        5   G
     4               3             2.8              25        3   -
Second Iteration
     1               5            3.36               0        5   G
     2               5           -3.36               0        5   G
     3               5               0               0        5   G
     4               5           -1.68               0        5   G
13. Experimental Results: Simulated Example
[Figure omitted in this transcript: misclassification rate (%) versus k for the four particles. First iteration: F(x1) = 50, F(x2) = 25, F(x3) = 0, F(x4) = 25, with velocities v1 = 5.6, v2 = -5.6, v3 = 0, v4 = 2.8. Second iteration: all particles reach k = 5, where the misclassification rate is 0.]
Figure: Visualization of how the PSO algorithm searches for the best k value, which achieves the minimum misclassification rate.
14. Experimental Results: Experiments Using Real Data
Table: Data sets description.
Data set Dimension Samples Classes
Iris 4 150 3
Ionosphere 34 351 2
Liver-disorders 6 345 2
Ovarian 4000 216 2
Breast Cancer 13 683 2
Wine 13 178 3
Sonar 60 208 2
Pima Indians Diabetes 8 768 2
ORL32×32 1024 400 40
Yale32×32 1024 165 15
16. Experimental Results: Experiments Using Real Data
[Figure omitted in this transcript: total absolute velocity versus number of iterations (0 to 100) for the Iono, Iris, and Sonar datasets.]
Figure: Total absolute velocity of the PSOk-NN algorithm on the Iono, Iris, and Sonar datasets.
17. Experimental Results: Experiments Using Real Data
[Figure omitted in this transcript: fitness function versus k value for the PSO particles, (a) after the first iteration, (b) after the second iteration, and (c) after the tenth iteration.]
Figure: Visualization of the movements of all particles of the PSOk-NN algorithm until it reaches the optimal solution, which achieves the minimum misclassification rate.
18. Experimental Results: Experiments Using Real Data
[Figure omitted in this transcript: scatter plots of the setosa, versicolor, and virginica samples in the first two features, (a) after the first iteration and (b) after the tenth iteration.]
Figure: Misclassified samples after the first and tenth iterations using the PSOk-NN algorithm.
19. Conclusions
The PSOk-NN algorithm achieved the minimum misclassification error on eight of the ten datasets (80%) compared with the other two algorithms.
The PSOk-NN algorithm converges to the optimal solution faster than the other two algorithms, due to the use of a linearly decreasing inertia weight in the PSO algorithm.
The GAk-NN algorithm fluctuates up and down, while the PSOk-NN algorithm is more stable while converging to the optimal solution: in PSO, the best solution guides all the other particles toward the optimal solution, whereas in GA all agents are changed randomly, without guidance from any agent.