Journal of Information Engineering and Applications
ISSN 2224-5782 (print) ISSN 2225-0506 (online)
Vol.3, No.11, 2013

www.iiste.org

The Improved K-Means with Particle Swarm
Optimization
Mrs. Nidhi Singh, Dr.Divakar Singh
Department of Computer Science & Engg, BUIT,BU,Bhopal.India.(M.P)
nnidhi_sng14@yahoo.com divakar_singh@rediffmail.com
Abstract
Data mining has become a large field of research, and ever larger amounts of data accumulate over time.
Clustering is an important data mining task and has been used extensively by researchers in application areas
such as finding similarities in images, text data and bioinformatics data. Cluster analysis is one of the primary
data analysis methods. Clustering is defined as the process of organizing data objects into a set of disjoint
classes called clusters, and it is an example of unsupervised classification. K-Means (MacQueen) is one of the
best-known clustering algorithms. It is a partitioning algorithm with several drawbacks: the number of clusters k
must be known in advance, it is sensitive to the random selection of initial cluster centres, and it is sensitive
to outliers. In this paper, we address some of these drawbacks and propose an efficient algorithm that enhances
K-Means clustering with Particle Swarm Optimization (PSO). In recent years, PSO has been applied successfully to a
number of real-world clustering problems, with fast convergence and good effectiveness on high-dimensional data.
Keywords: Clustering, K-Means clustering, PSO (Particle Swarm Optimization), Hierarchical clustering.
I. Introduction
A. Clustering
Data mining is the process of extracting patterns from large data repositories or databases. Modern businesses
increasingly see data mining as an important tool for transforming data into business intelligence, giving an
informational advantage. Cluster analysis, or clustering, is the assignment of a set of objects into subsets
(called clusters) so that objects in the same cluster are similar in some sense. Clustering is a method of
unsupervised learning and a common technique in a variety of fields, including machine learning, data mining,
pattern recognition, image analysis and bioinformatics [1]. A good clustering method produces high-quality
clusters with high intra-cluster similarity and low inter-cluster similarity.
B. Types of Clustering
Data clustering algorithms are mainly divided into two categories: hierarchical and partition algorithms.
A hierarchical algorithm is usually either agglomerative (“bottom-up”) or divisive (“top-down”).
Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger
clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller
clusters [1]. A partition clustering algorithm partitions the data set into the desired number of sets in a single
step. Many methods have been proposed to solve the clustering problem. One of the most popular and simple
clustering methods is the K-Means clustering algorithm, developed by MacQueen in 1967. K-Means is an iterative
approach that partitions the dataset into k groups. The efficiency of the K-Means algorithm depends heavily on the
selection of initial centroids, because randomly chosen initial centroids influence the number of iterations and
also produce different clustering results for different initial centroids [2].
C. Particle Swarm Intelligence
Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems,
inspired by the biological behaviour of animals [3]. The two main Swarm Intelligence algorithms are (1) Ant Colony
Optimization (ACO) and (2) Particle Swarm Optimization (PSO). This paper mainly deals with PSO, an optimization
technique originally proposed by Kennedy and Eberhart [4] and inspired by the swarm behaviour of birds, fish and
bees when they search for food or communicate with each other.


Figure 1.1. Basic flow diagram of PSO.
The PSO approach is based on cooperation and communication among agents called particles. Each particle
represents an individual solution, while the swarm is the collection of particles and represents the solution
space. The particles move through the solution space, each maintaining a velocity value V and keeping track of the
best position it has achieved so far, known as its personal best position. The global best is the best fitness
value achieved by any of the particles. The fitness of each particle, or of the entire swarm, is evaluated by a
fitness function [7]. The flow chart of basic PSO is shown in Figure 1.1.
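The personal-best and global-best bookkeeping described above can be illustrated with a minimal sketch of a basic global-best PSO. This is not the implementation used in the paper: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, the function names and the `sphere` fitness function are all assumptions chosen for illustration.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with a basic global-best PSO (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle is one candidate solution with a position and a velocity V.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal best: best position each particle has visited so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    # Global best: best position found by any particle.
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled toward the personal and the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def sphere(x):
    """Hypothetical fitness function with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_fit = pso_minimize(sphere, dim=2)
```

For clustering, the fitness function would instead score a candidate set of centroids, for example by its squared error.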
D. K-Means Clustering Algorithm
This part briefly describes the standard K-Means algorithm. K-Means is a typical clustering algorithm in data
mining and is widely used for clustering large data sets. MacQueen first proposed the K-Means algorithm in 1967;
it is one of the simplest unsupervised learning algorithms applied to the well-known clustering problem [9].
K-Means is the most widely used partitioning clustering algorithm. It clusters the input data into k clusters,
where k is given as an input parameter.

Figure 1.2. Process of K-Means algorithm


The K-Means algorithm seeks a partition that minimizes the squared error between the centroid of each cluster and
its data points. The algorithm first takes k random data points as initial centroids, then repeatedly assigns each
data point to the nearest centroid and recomputes the centroids until the convergence criteria are met. Although
the K-Means algorithm is simple and easy to implement, it suffers from four major drawbacks:
1) The number of clusters k has to be specified as input.
2) The algorithm converges to local optima.
3) The final clusters depend on the randomly chosen initial centroids.
4) High computational complexity.
The flow diagram for the simple K-Means algorithm is shown in Figure 1.2 [11].
II. Related Work
Several studies have sought to improve the efficiency of the K-Means algorithm with Particle Swarm Optimization.
PSO provides an optimized initial seed, and with this seed K-Means produces better clusters and more accurate
results than the traditional K-Means algorithm.
A. M. Fahim et al. [5] proposed an enhanced method for assigning data points to suitable clusters. In the original
K-Means algorithm, each iteration calculates the distance between every data element and all centroids, so the
required computational time depends on the number of data elements, the number of clusters and the number of
iterations, which makes it computationally expensive.
K. A. Abdul Nazeer et al. [6] proposed an enhanced algorithm to improve the accuracy and efficiency of the K-Means
clustering algorithm. It uses two methods: one for finding better initial centroids, and another for assigning
data points to appropriate clusters efficiently, with reduced time complexity. This algorithm produces good
clusters in less computational time.
Shafiq Alam et al. [7] proposed a novel clustering algorithm called Evolutionary Particle Swarm Optimization
(EPSO)-clustering, which is based on PSO. The algorithm is based on the evolution of swarm generations: the
particles are initially uniformly distributed in the input data space, and after a specified number of iterations
a new generation of the swarm evolves. The swarm tries to adjust itself dynamically to optimal positions after
each generation. The paper describes the new algorithm and its initial implementation, and presents tests
performed on real clustering benchmark data. The proposed method is compared with K-Means clustering, a benchmark
clustering technique, and with a simple particle swarm clustering algorithm. The results show that the algorithm
is efficient and produces compact clusters.
Lekshmy P Chandran et al. [8] describe harmony search, a recently developed metaheuristic optimization algorithm
that helps to find near-global optimal solutions by searching the entire solution space, whereas K-Means performs
a localized search. Studies have shown that a hybrid algorithm combining the two ideas produces a better solution.
In [8], a new approach that combines an improved harmony search optimization technique with an enhanced K-Means
algorithm is proposed.
III. Proposed Work
The proposed algorithm works in two phases. Phase I, Algorithm 3.1, describes Particle Swarm Optimization, and
Phase II, Algorithm 3.2, describes the original K-Means algorithm. Algorithm 3.1 gives better seed selection by
calculating the force on each particle due to every other particle in each direction, and the total force on an
individual particle. The output of Algorithm 3.1 is given as input to Algorithm 3.2, which generates the final
clusters. The clusters generated by the proposed algorithm are more accurate and of better quality than those of
the K-Means algorithm.
Algorithm 3.1 Particle Swarm Optimization
Step 1. Initialization of parameters // number of particles, velocity.
Step 2. Select randomly three particle goals:
2.1 Initial goal
2.2 Average goal
2.3 Final goal
Step 3. Repeat Step 2 until the optimal particle goal is found.
Step 4. Do
4.1 The total force on one particle due to another, in the X direction due to the distance factor, will be
$FDx_{a,i}$ such that:
$FDx_{a,i} = \min\left[ Q \cdot \left( \frac{1}{d_{a,i}} \right) \cos\Phi \right]$
where $d_{a,i}$ is the distance between the two particles, Φ is the angle with the X-axis and Q is a constant.
4.2 In the same manner force in Y direction is calculated.
4.3 The total force acting on a particle can now be calculated as
$FDx_a = \sum_{i=1,\, i \neq a}^{n} FDx_{a,i}$
where n is the number of particles in the system.
Step 5. The similarity measure is calculated. A low value of the squared error between corresponding values of the
data vectors is considered a measure of similarity:
$\varepsilon = \sum_{j=1}^{m} (x_{j,1} - x_{j,i})^2$
Step 6. The distance matrix Vc is calculated as
$\vec{v}_c = \frac{1}{n} \begin{bmatrix} \sum_{i=1}^{p} v_{i,1} \\ \sum_{i=1}^{p} v_{i,2} \\ \vdots \\ \sum_{i=1}^{p} v_{i,j} \end{bmatrix}$
Step 7. Clusters are formed when one cluster combines with another. If the mean value of the cluster with the
higher cluster ID is within the 1*sigma limit of the other cluster, the two clusters are merged. Sigma is the
standard deviation of the distances of the particles from the mean value.
Step 8. Stop
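One possible reading of Step 4 is sketched below for two-dimensional particles. Several points are assumptions rather than statements from the paper: the pairwise force magnitude is taken as Q divided by the distance, the Y component is taken to use the sine of the same angle, and the `min` operator is omitted because the set it ranges over is not recoverable from the text. The function names are hypothetical.

```python
import math


def pairwise_force(a, b, q=1.0):
    """Force components on particle `a` due to particle `b` (assumed form).

    Assumption: magnitude Q / d, resolved along X with cos(phi) and along Y
    with sin(phi), where phi is the angle of the line a -> b with the X-axis.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return 0.0, 0.0  # coincident particles: direction is undefined
    magnitude = q / d
    # cos(phi) = dx / d and sin(phi) = dy / d
    return magnitude * dx / d, magnitude * dy / d


def total_force(particles, a_index, q=1.0):
    """Sum the pairwise forces on particle `a_index` over all other particles."""
    fx = fy = 0.0
    for i, p in enumerate(particles):
        if i == a_index:
            continue  # the sum skips i = a
        fxi, fyi = pairwise_force(particles[a_index], p, q)
        fx += fxi
        fy += fyi
    return fx, fy


particles = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
fx, fy = total_force(particles, 0)  # fx = 0.5, fy = 0.25
```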
Algorithm 3.2 K-Means Clustering Algorithm [6]
Require: D = {d1, d2, d3, ..., di, ..., dn} // Set of n data points.
Initial cluster centroids calculated from Algorithm 3.1.
Ensure: A set of k clusters.
Steps:
Step 1. Take the centroids calculated by Algorithm 3.1 as the initial centroids (the original algorithm [6]
arbitrarily chooses k data points from D);
Step 2. Repeat
Assign each point di to the cluster which has the closest centroid;
Calculate the new mean for each cluster;
Step 3. Until convergence criteria are met.
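The assign-and-update loop of Algorithm 3.2 can be sketched as follows, with the initial centroids passed in from outside, as they would be from Algorithm 3.1. This is an illustrative sketch, not the authors' Matlab code. The names `kmeans` and `sq_dist`, the `max_iters` cap, the choice to stop when the assignments no longer change, and the toy data are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iters=100):
    """Cluster `points` starting from the supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iters):
        # Assign each point to the cluster with the closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: treated as convergence here
        labels = new_labels
        # Calculate the new mean for each cluster.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, init_centroids=[(0, 0), (10, 10)])
```

With these well-separated toy points the loop stops after the second assignment pass, because the labels no longer change.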
The squared error can be calculated using Eq (3.1):
$E = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \bar{x}_i \|^2$    (3.1)
where $\bar{x}_i$ is the centroid of cluster $C_i$.
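As a small worked example of the squared error, the sketch below sums, over all clusters, the squared Euclidean distance of each point from its cluster mean. The function name and the toy clusters are illustrative assumptions.

```python
def squared_error(clusters):
    """Total squared error; `clusters` is a list of non-empty lists of points."""
    total = 0.0
    for cluster in clusters:
        # Centroid of the cluster: coordinate-wise mean of its points.
        mean = [sum(col) / len(cluster) for col in zip(*cluster)]
        for x in cluster:
            total += sum((v - m) ** 2 for v, m in zip(x, mean))
    return total


clusters = [[(1.0, 1.0), (3.0, 1.0)], [(8.0, 8.0)]]
E = squared_error(clusters)  # first cluster contributes 1 + 1, second contributes 0
```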

The accuracy can be calculated using Precision and Recall:
Precision is the fraction of retrieved documents that are relevant to the search. It can be calculated using Eq(3.2).
Recall is the fraction of documents that are relevant to the query that are successfully retrieved. It can be
calculated using Eq(3.3).

Precision = $\frac{t_p}{t_p + f_p}$    (3.2)

Recall = $\frac{t_p}{t_p + f_n}$    (3.3)

Where
tp= true positive (correct result)
tn= true negative (correct absence of result)
fp= false positive (unexpected result)
fn= false negative (Missing result)
Overall accuracy can be calculated using Eq (3.4).
Accuracy = $\frac{t_p + t_n}{t_p + t_n + f_p + f_n}$    (3.4)
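The three measures follow directly from the four counts. The sketch below uses made-up counts purely for illustration; they are not results from this paper, and returning 0.0 for an empty denominator is an assumed convention.

```python
def precision(tp, fp):
    """Fraction of retrieved items that are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of relevant items that were retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 40, 45, 10, 5
p = precision(tp, fp)         # 40 / 50
r = recall(tp, fn)            # 40 / 45
a = accuracy(tp, tn, fp, fn)  # 85 / 100
```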

IV. Result Analysis
The technique proposed in this paper is tested on different data sets, namely Breast Cancer [10], Thyroid [10] and
Ecoli [10], against criteria such as accuracy, time, error rate, number of iterations and number of clusters.
The same data sets are used as input for the K-Means algorithm. The proposed algorithm does not need to know the
number of clusters in advance, whereas K-Means requires both k and a set of initial centroids. The proposed
method finds initial centroids systematically, and takes additional inputs such as threshold values.
The description of the datasets is shown in Table 4.1.
Table 4.1. Description of Datasets

The comparative analysis for attributes such as time, accuracy, error rate and number of iterations is
tabulated in Table 4.2. Based on these attributes, the performance of K-Means and of the proposed algorithm is
calculated for a particular threshold. In the proposed algorithm there is no need to explicitly define the number
of clusters k.
Table 4.2. Performance Comparison of the Algorithms for Different Datasets for Threshold 0.45

The results in Table 4.2 are shown graphically in Figure 4.1, Figure 4.2, Figure 4.3 and Figure 4.4, for
threshold value 0.45.
Figure 4.1 compares the running time on the different datasets. The results show that the proposed algorithm
takes less time than K-Means.


[Chart "K-Means v/s Proposed algorithm"; x-axis: Datasets; y-axis: Time (in sec); series: K-Means, Proposed algorithm]
Figure 4.1. Comparison of different datasets with respect to time at threshold 0.45
Figure 4.2 shows the error rate for the different datasets; the error rate for K-Means is much larger than that of
the proposed algorithm.

[Chart "K-Means v/s Proposed algorithm"; x-axis: Datasets; y-axis: Error Rate (%); series: K-Means, Proposed algorithm]
Figure 4.2. Comparison of different datasets with respect to error rate at threshold 0.45
Figure 4.3 shows number of iterations for K-Means and proposed algorithm, the proposed algorithm take much
number of iterations.

[Chart "K-Means v/s Proposed algorithm"; x-axis: Datasets; y-axis: Number of iterations; series: K-Means, Proposed algorithm]
Figure 4.3. Comparison of different datasets with respect to number of iterations at threshold 0.45
Figure 4.4 shows the accuracy of K-Means and of the proposed algorithm; the results show that the proposed
algorithm is more accurate than K-Means.


[Chart "K-Means v/s Proposed algorithm"; x-axis: Datasets; y-axis: Accuracy (%); series: K-Means, Proposed algorithm]
Figure 4.4. Comparison of different datasets with respect to accuracy at threshold 0.45
Matlab 7.8.0 was used to program the experiments, on a computer with an Intel Pentium 2.80 GHz CPU and
256 MB RAM.
V. Discussion
In this paper we have discussed the improved K-Means clustering with Particle Swarm Optimization. One of the
major drawbacks of K-Means clustering is the random selection of the initial seed, which results in different
clusters that are not of good quality. The K-Means clustering algorithm needs the steps (1) declaration of k
clusters, (2) initial seed selection, (3) similarity matrix and (4) cluster generation. The PSO algorithm is
applied in step (2), where it provides the solution for seed selection. In this paper, the standard K-Means is
combined with PSO to produce results that are more accurate and efficient than those of the K-Means algorithm
alone. The proposed approach does not need the number of clusters k to be given in advance; only a threshold
value is required.
References
[1]. M.V.B.T.Santhi, V.R.N.S.S.V. Sai Leela, P.U. Anitha, D. Nagamalleswari, “Enhancing K-Means
Clustering Algorithm”, IJCST Vol 2, Issue 4, Oct- Dec 2011.
[2]. Madhu Yedla, Srinivasa Rao Pathokota, T M Srinivasa, “Enhancing K-Means Clustering Algorithm with
Improved Initial Center”, (IJCSIT) International Journal of Computer Science and Information
Technologies, Vol. 1 (2), pp:121-125, 2010.
[3]. Ajith Abraham, He Guo, and Liu Hongbo, “Swarm Intelligence: Foundations, Perspectives and
Applications”, Swarm Intelligent Systems, Nedjah N, Mourelle L (eds.),Nova Publishers, USA, 2006.
[4]. J. Kennedy and R. C. Eberhart, “Particle Swarm Optimization”, Proc. of IEEE International Conference
on Neural Networks (ICNN), Vol. IV, Perth, Australia, pp: 1942-1948, 1995.
[5]. A. M. Fahim, A. M. Salem, F. A. Torkey and M. A. Ramadan, “An Efficient enhanced K-Means
clustering algorithm”, Journal of Zhejiang University, 10(7): 1626-1633, 2006.
[6]. K. A. Abdul Nazeer and M. P. Sebastian, “Improving the accuracy and efficiency of the K-Means
clustering algorithm,” in International Conference on Data Mining and Knowledge Engineering
(ICDMKE), Proceedings of the World Congress on Engineering (WCE-2009),Vol 1, July, London, UK,
2009.
[7]. Shafiq Alam, Gillian Dobbie, Patricia Riddle, "An Evolutionary Particle Swarm Optimization Algorithm
for Data Clustering", Swarm Intelligence Symposium St. Louis MO USA, September 21-23, IEEE 2008.
[8]. Lekshmy P Chandran,K A Abdul Nazeer, “An Improved Clustering Algorithm based on K-Means and
Harmony Search Optimization”, IEEE 2011.
[9]. Shi Na, Xumin Liu, Guan Yong, “Research on K-Means Clustering Algorithm -An Improved K-Means
Clustering Algorithm”, Third International Symposium on Intelligent Information Technology and
Security Informatics, pp: 63-67, IEEE, 2010.
[10]. The UCI Repository website. [Online]. Available: http://archive.ics.uci.edu/, 2010.
[11]. Juntao Wang, Xiaolong Su, "An improved K-Means clustering algorithm", IEEE, 2011.

This academic article was published by The International Institute for Science,
Technology and Education (IISTE). The IISTE is a pioneer in the Open Access
Publishing service based in the U.S. and Europe. The aim of the institute is
Accelerating Global Knowledge Sharing.
More information about the publisher can be found in the IISTE’s homepage:
http://www.iiste.org
CALL FOR JOURNAL PAPERS
The IISTE is currently hosting more than 30 peer-reviewed academic journals and
collaborating with academic institutions around the world. There’s no deadline for
submission. Prospective authors of IISTE journals can find the submission
instruction on the following page: http://www.iiste.org/journals/
The IISTE
editorial team promises to the review and publish all the qualified submissions in a
fast manner. All the journals articles are available online to the readers all over the
world without financial, legal, or technical barriers other than those inseparable from
gaining access to the internet itself. Printed version of the journals is also available
upon request of readers and authors.
MORE RESOURCES
Book publication information: http://www.iiste.org/book/
Recent conferences: http://www.iiste.org/conference/
IISTE Knowledge Sharing Partners
EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open
Archives Harvester, Bielefeld Academic Search Engine, Elektronische
Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial
Library , NewJour, Google Scholar

More Related Content

What's hot

A study on rough set theory based
A study on rough set theory basedA study on rough set theory based
A study on rough set theory based
ijaia
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
IJECEIAES
 

What's hot (20)

Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
A study on rough set theory based
A study on rough set theory basedA study on rough set theory based
A study on rough set theory based
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 
Az36311316
Az36311316Az36311316
Az36311316
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering Algorithm
 
A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learning
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clustering
 
Column store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setColumn store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute set
 
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online Data
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 

Viewers also liked

SaúDe Do Trabalhador I Aula
SaúDe Do Trabalhador I AulaSaúDe Do Trabalhador I Aula
SaúDe Do Trabalhador I Aula
cybelly
 
A Matter Of Ethics
A Matter Of EthicsA Matter Of Ethics
A Matter Of Ethics
EJAdery1
 
Goal setting theory
Goal setting theoryGoal setting theory
Goal setting theory
Mj Payal
 
Nr 36 apresentação - padrão - workshop (1)
Nr 36 apresentação - padrão - workshop (1)Nr 36 apresentação - padrão - workshop (1)
Nr 36 apresentação - padrão - workshop (1)
Jupira Silva
 
Presentación1
Presentación1Presentación1
Presentación1
juliana
 
Lernspiele Lesepiele Software Schriftspracherwerb
Lernspiele Lesepiele Software SchriftspracherwerbLernspiele Lesepiele Software Schriftspracherwerb
Lernspiele Lesepiele Software Schriftspracherwerb
joness6
 
El lenguaje de programaciã³n prolog jaume i castellã³n
El lenguaje de programaciã³n prolog   jaume i castellã³nEl lenguaje de programaciã³n prolog   jaume i castellã³n
El lenguaje de programaciã³n prolog jaume i castellã³n
njrr
 

Viewers also liked (20)

mI pRiMeR tRaBaJo De MaNtEnImiEnTO
mI pRiMeR tRaBaJo De MaNtEnImiEnTOmI pRiMeR tRaBaJo De MaNtEnImiEnTO
mI pRiMeR tRaBaJo De MaNtEnImiEnTO
 
SaúDe Do Trabalhador I Aula
SaúDe Do Trabalhador I AulaSaúDe Do Trabalhador I Aula
SaúDe Do Trabalhador I Aula
 
Manual de TQE 2014 v1
Manual de TQE 2014 v1Manual de TQE 2014 v1
Manual de TQE 2014 v1
 
A Matter Of Ethics
A Matter Of EthicsA Matter Of Ethics
A Matter Of Ethics
 
Street Drugs Down And Dirty
Street Drugs Down And DirtyStreet Drugs Down And Dirty
Street Drugs Down And Dirty
 
Goal setting theory
Goal setting theoryGoal setting theory
Goal setting theory
 
Nr 36 apresentação - padrão - workshop (1)
Nr 36 apresentação - padrão - workshop (1)Nr 36 apresentação - padrão - workshop (1)
Nr 36 apresentação - padrão - workshop (1)
 
Angie halloween
Angie halloweenAngie halloween
Angie halloween
 
Presentación1
Presentación1Presentación1
Presentación1
 
Anyi halloween
Anyi halloweenAnyi halloween
Anyi halloween
 
Robot012013
Robot012013Robot012013
Robot012013
 
Estación 4. Mi Programa de Formación.
Estación 4. Mi Programa de Formación.Estación 4. Mi Programa de Formación.
Estación 4. Mi Programa de Formación.
 
Blended learning caminho natural para as ies
Blended learning caminho natural para as iesBlended learning caminho natural para as ies
Blended learning caminho natural para as ies
 
Packaging DNN extensions
Packaging DNN extensionsPackaging DNN extensions
Packaging DNN extensions
 
Lernspiele Lesepiele Software Schriftspracherwerb
Lernspiele Lesepiele Software SchriftspracherwerbLernspiele Lesepiele Software Schriftspracherwerb
Lernspiele Lesepiele Software Schriftspracherwerb
 
Life Cycle of Stars Stations
Life Cycle of Stars Stations Life Cycle of Stars Stations
Life Cycle of Stars Stations
 
AHS Slides_Tucker Max
AHS Slides_Tucker MaxAHS Slides_Tucker Max
AHS Slides_Tucker Max
 
Archivo leyes
Archivo leyesArchivo leyes
Archivo leyes
 
TIC: el objetivo no es innovar, el objetivo es mejorar la formación del alu...
TIC: el objetivo no es innovar,  el objetivo es mejorar  la formación del alu...TIC: el objetivo no es innovar,  el objetivo es mejorar  la formación del alu...
TIC: el objetivo no es innovar, el objetivo es mejorar la formación del alu...
 
El lenguaje de programaciã³n prolog jaume i castellã³n
El lenguaje de programaciã³n prolog   jaume i castellã³nEl lenguaje de programaciã³n prolog   jaume i castellã³n
El lenguaje de programaciã³n prolog jaume i castellã³n
 

Similar to The improved k means with particle swarm optimization

(2016)application of parallel glowworm swarm optimization algorithm for data ...
(2016)application of parallel glowworm swarm optimization algorithm for data ...(2016)application of parallel glowworm swarm optimization algorithm for data ...
(2016)application of parallel glowworm swarm optimization algorithm for data ...
Akram Pasha
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
IJERD Editor
 

Similar to The improved k means with particle swarm optimization (20)

I017235662
I017235662I017235662
I017235662
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering Ensemble
 
Max stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemMax stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problem
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
The Improved K-Means with Particle Swarm Optimization

Journal of Information Engineering and Applications
ISSN 2224-5782 (print) ISSN 2225-0506 (online)
Vol.3, No.11, 2013
www.iiste.org

The Improved K-Means with Particle Swarm Optimization
Mrs. Nidhi Singh, Dr. Divakar Singh
Department of Computer Science & Engg, BUIT, BU, Bhopal, India (M.P.)
nnidhi_sng14@yahoo.com divakar_singh@rediffmail.com

Abstract
Data mining has become a large field of research, and the volume of accumulated data keeps growing over time. Clustering is an important data mining task that has been used extensively by researchers in application areas such as finding similarities in images, text data and bioinformatics data. Cluster analysis is one of the primary data analysis methods. Clustering is defined as the process of organizing data objects into a set of disjoint classes called clusters, and it is an example of unsupervised classification. K-Means (MacQueen) is one of the best-known clustering algorithms. It is a partitioning algorithm with several drawbacks: the number of clusters k must be known in advance, it is sensitive to the random selection of the initial cluster centres, and it is sensitive to outliers. In this paper we address some of these drawbacks and propose an efficient algorithm that enhances K-Means clustering with Particle Swarm Optimization (PSO). In recent years, PSO has been successfully applied to a number of real-world clustering problems, where it converges quickly and is effective on high-dimensional data.
Keywords: Clustering, K-Means clustering, PSO (Particle Swarm Optimization), Hierarchical clustering.

I. Introduction
A. Clustering
Data mining is the process of extracting patterns from large data repositories or databases. Modern businesses increasingly see data mining as an important tool for transforming data into business intelligence, giving an informational advantage.

Cluster analysis, or clustering, is the assignment of a set of objects into subsets (called clusters) so that objects in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique in a variety of fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics [1]. A good clustering method produces high-quality clusters with high intra-cluster similarity and low inter-cluster similarity.

B. Types of Clustering
Data clustering algorithms are mainly divided into two categories: hierarchical and partition algorithms. A hierarchical algorithm is either agglomerative (“bottom-up”) or divisive (“top-down”). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and divide it into successively smaller clusters [1]. A partition clustering algorithm partitions the data set into the desired number of sets in a single step.

Many methods have been proposed to solve the clustering problem. One of the most popular and simplest is the K-Means clustering algorithm, developed by MacQueen in 1967. K-Means is an iterative approach that partitions the dataset into k groups. Its efficiency depends heavily on the selection of the initial centroids, because randomly chosen initial centroids influence the number of iterations and produce different clustering results for different initial centroids [2].

C. Particle Swarm Intelligence
Swarm Intelligence (SI) was inspired by the biological behaviour of animals and is an innovative distributed intelligent paradigm for solving optimization problems [3]. The two main Swarm Intelligence algorithms are (1) Ant Colony Optimization (ACO) and (2) Particle Swarm Optimization (PSO). This paper mainly deals with PSO, an optimization technique originally proposed by Kennedy and Eberhart [4] and inspired by the swarm behaviour of birds, fish and bees when they search for food or communicate with each other.

The PSO approach is based on cooperation and communication among agents called particles. Each particle represents an individual solution, while the swarm is the collection of particles and represents the set of candidate solutions in the solution space. Each particle moves through the solution space by maintaining a velocity value V and keeping track of the best position it has achieved so far, known as its personal best position. The global best is the position with the best fitness value achieved by any particle in the swarm. The fitness of each particle, or of the entire swarm, is evaluated by a fitness function [7]. The flow chart of basic PSO is shown in Figure 1.1.

Figure 1.1. Basic flow diagram of PSO.

D. K-Means Clustering Algorithm
This part briefly describes the standard K-Means algorithm. K-Means is a typical clustering algorithm in data mining and is widely used for clustering large data sets. MacQueen first proposed the K-Means algorithm in 1967; it is one of the simplest unsupervised learning algorithms applied to the well-known clustering problem [9]. K-Means is the most widely used partitioning clustering algorithm. It clusters the input data into k clusters, where k is given as an input parameter.

Figure 1.2. Process of K-Means algorithm.
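
The standard K-Means procedure introduced in this part can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' Matlab implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the rule that an empty cluster keeps its previous centroid are all assumptions made for the example.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Plain K-Means sketch: returns (centroids, labels, squared_error)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points at random as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)

    def assign(cents):
        # Distance from every point to every centroid, shape (n, k);
        # each point is labelled with the index of its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroids)
        # Recompute each centroid as the mean of its members
        # (assumption: an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: convergence criterion met
        centroids = new_centroids

    labels = assign(centroids)
    # Sum of squared distances between each point and its cluster centroid.
    squared_error = float(((data - centroids[labels]) ** 2).sum())
    return centroids, labels, squared_error
```

Because the initial centroids are drawn at random, different values of `seed` can in general lead to different final clusters, which is the sensitivity to initial centroids that this paper sets out to reduce.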

The K-Means algorithm finds a partition by minimizing the squared error between the centroid of each cluster and its data points. The algorithm first takes k random data points as initial centroids, then repeatedly assigns each data point to the nearest centroid and recomputes the centroids until the convergence criterion is met. Although the K-Means algorithm is simple and easy to implement, it suffers from four major drawbacks:
1) The number of clusters k has to be specified as input.
2) The algorithm converges to local optima.
3) The final clusters depend on randomly chosen initial centroids.
4) High computational complexity.
The flow diagram for the simple K-Means algorithm is shown in Figure 1.2 [11].

II. Related Work
Various studies have been carried out to improve the efficiency of the K-Means algorithm with Particle Swarm Optimization. PSO is used to find the optimal initial seed, and with this seed K-Means produces better clusters and more accurate results than the traditional K-Means algorithm.

A. M. Fahim et al. [5] proposed an enhanced method for assigning data points to suitable clusters. In the original K-Means algorithm, each iteration computes the distance between every data element and all centroids, so the required computational time depends on the number of data elements, the number of clusters and the number of iterations, which makes the algorithm computationally expensive.

K. A. Abdul Nazeer et al. [6] proposed an enhanced algorithm to improve the accuracy and efficiency of the K-Means clustering algorithm. It uses two methods: one for finding better initial centroids, and another for efficiently assigning data points to appropriate clusters with reduced time complexity. This algorithm produces good clusters in less computational time.

Shafiq Alam et al. [7] proposed a novel clustering algorithm based on PSO, called the Evolutionary Particle Swarm Optimization (EPSO) clustering algorithm. It is based on the evolution of swarm generations: the particles are initially distributed uniformly in the input data space, and after a specified number of iterations a new generation of the swarm evolves. The swarm tries to adjust itself dynamically to optimal positions after each generation. Their paper describes the new algorithm and its initial implementation, and presents tests performed on real clustering benchmark data. The method is compared with K-Means clustering, a benchmark clustering technique, and with a simple particle swarm clustering algorithm. The results show that the algorithm is efficient and produces compact clusters.

Lekshmy P Chandran et al. [8] describe harmony search, a recently developed metaheuristic optimization algorithm that helps to find near-global optimal solutions by searching the entire solution space, whereas K-Means performs a localized search. Studies have shown that a hybrid algorithm combining the two ideas produces a better solution. Their paper proposes a new approach that combines an improved harmony search optimization technique with an enhanced K-Means algorithm.

III. Proposed Work
The proposed algorithm works in two phases. Phase I (Algorithm 3.1) is Particle Swarm Optimization and Phase II (Algorithm 3.2) is the original K-Means algorithm. Algorithm 3.1 gives a better seed selection by calculating the force on each particle due to every other particle in each direction, and the total force on an individual particle. The output of Algorithm 3.1 is given as input to Algorithm 3.2, which generates the final clusters. The clusters generated by the proposed algorithm are more accurate and of better quality than those generated by the K-Means algorithm.
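
The two-phase structure described above (a swarm search that hands its best seed to K-Means) can be illustrated with the following Python sketch. It is a sketch under stated assumptions rather than the method of this paper: Phase I here is a textbook global-best PSO with an inertia weight `w` and acceleration coefficients `c1` and `c2`, not the force-based seed selection of Algorithm 3.1; the fitness function is assumed to be the squared error; the number of clusters `k` is passed in explicitly, whereas the proposed method relies on a threshold instead; and all function and parameter names are hypothetical.

```python
import numpy as np

def sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def pso_seeds(data, k, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Phase I (textbook global-best PSO, an assumption): search for k seeds."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Each particle is one candidate set of k centroids, drawn from the data.
    pos = data[rng.integers(0, n, size=(n_particles, k))].astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal best positions
    pbest_fit = np.array([sse(data, p) for p in pos])   # and their fitness
    g = int(pbest_fit.argmin())
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]    # global best so far
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull to personal best + pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([sse(data, p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmin())
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest

def kmeans_from_seeds(data, seeds, max_iter=100):
    """Phase II: ordinary K-Means started from the seeds found in Phase I."""
    centroids = np.asarray(seeds, dtype=float).copy()

    def assign(cents):
        d2 = ((data[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroids)
        # Assumption: an empty cluster keeps its previous centroid.
        new = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(centroids)
```

A call such as `kmeans_from_seeds(data, pso_seeds(data, k=2))` chains the two phases. In this sketch K-Means can only keep or lower the squared error of the seeds it receives, so the swarm search determines the starting point and K-Means performs the local refinement.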

Algorithm 3.1 Particle Swarm Optimization
Step 1. Initialization of parameters // number of particles, velocity.
Step 2. Randomly select three particle goals:
  2.1 Initial goal
  2.2 Average goal
  2.3 Final goal
Step 3. Repeat Step 2 until the optimal particle goal is found.
Step 4. Do
  4.1 The total force on one particle due to another, in the X direction due to the distance factor, is $FDx_{a,i}$ such that:
  $$FDx_{a,i} = \min\left[ Q \cdot \frac{1}{d_{a,i}} \cos(\Phi) \right]$$
  where $d_{a,i}$ is the distance between the two particles, $\Phi$ is the angle with the X-axis and $Q$ is a constant.
  4.2 In the same manner, the force in the Y direction is calculated.
  4.3 The total force acting on a particle can now be calculated as
  $$FDx_{a} = \sum_{\substack{i=1 \\ i \neq a}}^{n} FDx_{a,i}$$
  where n is the number of particles in the system.
Step 5. The similarity measure is calculated. A low value of the squared error between corresponding values of the data vectors is taken as a measure of similarity:
  $$\varepsilon = \sum_{j=1}^{m} \left( x_{j,1} - x_{j,i} \right)^2$$
Step 6. The distance matrix $V_c$ is calculated as
  $$\vec{v}_c = \frac{1}{n} \begin{bmatrix} \sum_{i=1}^{p} v_{i,1} \\ \sum_{i=1}^{p} v_{i,2} \\ \vdots \\ \sum_{i=1}^{p} v_{i,j} \end{bmatrix}$$
Step 7. Clusters are formed when one cluster combines with another. If the mean value of the cluster having the higher cluster ID is within the 1*sigma limit of the other cluster, the two clusters are merged. Sigma is the standard deviation of the distances of the particles from the mean value.
Step 8. Stop

Algorithm 3.2 K-Means Clustering Algorithm [6]
Require: D = {d1, d2, d3, ..., di, ..., dn} // Set of n data points. Initial cluster centroids calculated from Algorithm 3.1.
Ensure: A set of k clusters.
Steps:
Step 1. Arbitrarily choose k data points from D as initial centroids;
Step 2. Repeat
  Assign each point di to the cluster which has the closest centroid;
  Calculate the new mean for each cluster;
Step 3. Until convergence criteria are met.

The squared error can be calculated using Eq. (3.1), where $\bar{x}_i$ is the mean of cluster $C_i$:
$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \left| x - \bar{x}_i \right|^2 \qquad (3.1)$$

The accuracy can be calculated using precision and recall. Precision is the fraction of retrieved documents that are relevant to the search; it can be calculated using Eq. (3.2). Recall is the fraction of the documents relevant to the query that are successfully retrieved; it can be calculated using Eq. (3.3).
$$\text{Precision} = \frac{t_p}{t_p + f_p} \qquad (3.2)$$
$$\text{Recall} = \frac{t_p}{t_p + f_n} \qquad (3.3)$$
where
tp = true positive (correct result)
tn = true negative (correct absence of result)
fp = false positive (unexpected result)
fn = false negative (missing result)
Overall accuracy can be calculated using Eq. (3.4):
$$\text{Accuracy} = \frac{t_p + t_n}{t_p + t_n + f_p + f_n} \qquad (3.4)$$

IV. Result Analysis
The technique proposed in this paper is tested on different data sets, namely Breast Cancer [10], Thyroid [10] and Ecoli [10], against criteria such as accuracy, time, error rate, number of iterations and number of clusters. The same data sets are used as input for the K-Means algorithm. K-Means requires the number of clusters and a set of initial centroids as input, whereas the proposed method does not need the number of clusters in advance and finds the initial centroids systematically. The proposed method takes additional inputs such as threshold values. The description of the datasets is shown in Table 4.1.

Table 4.1. Description of Datasets

The comparative analysis for attributes such as time, accuracy, error rate and number of iterations is tabulated in Table 4.2. Based on these attributes, the performance of K-Means and of the proposed algorithm is calculated for a particular threshold. In the proposed algorithm there is no need to define the number of clusters k explicitly.

Table 4.2. Performance Comparison of the Algorithms for Different Datasets for Threshold 0.45

The graphical results based on the comparison in Table 4.2 are shown in Figures 4.1, 4.2, 4.3 and 4.4, with threshold value 0.45. Figure 4.1 compares the datasets with respect to time. The results show that the proposed algorithm takes comparatively less time than K-Means.
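
The accuracy values compared in this section rest on Eqs. (3.2) to (3.4). As a minimal sketch of those three formulas, the following Python function computes precision, recall and accuracy from the four confusion counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for the example, and the counts in the usage note are hypothetical, not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (precision, recall, accuracy) as defined in Eqs. (3.2)-(3.4)."""
    # Eq. (3.2): share of retrieved items that are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Eq. (3.3): share of relevant items that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Eq. (3.4): share of all decisions that were correct.
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy
```

For hypothetical counts tp = 40, tn = 45, fp = 5 and fn = 10, the function gives a precision of 40/45 (about 0.889), a recall of 40/50 = 0.8 and an accuracy of 85/100 = 0.85.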

Figure 4.1. Comparison of different datasets with respect to time at threshold 0.45 (chart: K-Means v/s Proposed algorithm; axes: Datasets, Time in sec).

Figure 4.2 shows the error rate for the different datasets; the error rate of K-Means is much larger than that of the proposed algorithm.

Figure 4.2. Comparison of different datasets with respect to error rate at threshold 0.45 (chart: K-Means v/s Proposed algorithm; axes: Datasets, Error Rate in %).

Figure 4.3 shows the number of iterations for K-Means and the proposed algorithm; the proposed algorithm takes a large number of iterations.

Figure 4.3. Comparison of different datasets with respect to number of iterations at threshold 0.45 (chart: K-Means v/s Proposed algorithm; axes: Datasets, Number of iterations).

Figure 4.4 shows the accuracy of K-Means and the proposed algorithm; the results show that the proposed algorithm is more accurate than K-Means.
  • 7. [Figure 4.4. K-Means vs. proposed algorithm: comparison of different datasets with respect to accuracy (%) at threshold 0.45.]
    Matlab 7.8.0 was used for programming in the experiments. The computer configuration was an Intel Pentium 2.80 GHz CPU with 256 MB RAM.
    V. Discussion
    In this paper we have discussed the improved K-Means clustering with Particle Swarm Optimization. One of the major drawbacks of K-Means clustering is the random selection of the initial seeds: different random seeds produce different clusterings, many of which are of poor quality. The K-Means clustering algorithm needs the following steps: (1) declaration of k clusters, (2) initial seed selection, (3) similarity matrix, and (4) cluster generation. The PSO algorithm is applied in step (2), where it is used to find an optimal solution for seed selection. In this paper, the standard K-Means is combined with PSO to produce results that are more accurate and efficient than those of the K-Means algorithm alone. With the proposed approach there is no need to give the number of clusters k in advance; only a threshold value is required.
    References
    [1]. M. V. B. T. Santhi, V. R. N. S. S. V. Sai Leela, P. U. Anitha, D. Nagamalleswari, "Enhancing K-Means Clustering Algorithm", IJCST, Vol. 2, Issue 4, Oct-Dec 2011.
    [2]. Madhu Yedla, Srinivasa Rao Pathokota, T. M. Srinivasa, "Enhancing K-Means Clustering Algorithm with Improved Initial Center", International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 1 (2), pp. 121-125, 2010.
    [3]. Ajith Abraham, He Guo, Liu Hongbo, "Swarm Intelligence: Foundations, Perspectives and Applications", Swarm Intelligent Systems, Nedjah N., Mourelle L. (eds.), Nova Publishers, USA, 2006.
    [4]. J. Kennedy, R. C. Eberhart, "Particle Swarm Optimization", Proc. of IEEE International Conference on Neural Networks (ICNN), Vol. IV, Perth, Australia, pp. 1942-1948, 1995.
    [5]. A. M. Fahim, A. M. Salem, F. A. Torkey, M. A. Ramadan, "An Efficient Enhanced K-Means Clustering Algorithm", Journal of Zhejiang University, 10(7): 1626-1633, 2006.
    [6]. K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the K-Means Clustering Algorithm", International Conference on Data Mining and Knowledge Engineering (ICDMKE), Proceedings of the World Congress on Engineering (WCE-2009), Vol. 1, July, London, UK, 2009.
    [7]. Shafiq Alam, Gillian Dobbie, Patricia Riddle, "An Evolutionary Particle Swarm Optimization Algorithm for Data Clustering", Swarm Intelligence Symposium, St. Louis, MO, USA, September 21-23, IEEE, 2008.
    [8]. Lekshmy P. Chandran, K. A. Abdul Nazeer, "An Improved Clustering Algorithm Based on K-Means and Harmony Search Optimization", IEEE, 2011.
    [9]. Shi Na, Xumin Liu, Guan Yong, "Research on K-Means Clustering Algorithm: An Improved K-Means Clustering Algorithm", Third International Symposium on Intelligent Information Technology and Security Informatics, pp. 63-67, IEEE, 2010.
    [10]. The UCI Repository website. [Online]. Available: http://archive.ics.uci.edu/, 2010.
    [11]. Juntao Wang, Xiaolong Su, "An Improved K-Means Clustering Algorithm", IEEE, 2011.
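    The four steps listed in the Discussion, with PSO used for seed selection in step (2), can be illustrated with a minimal sketch. This is not the authors' Matlab implementation: it is written in Python with NumPy, it assumes a fixed k (the threshold-based choice of k is not reproduced here), it assumes the sum of squared errors as the PSO fitness, and it uses commonly quoted values for the inertia and acceleration coefficients. The function names sse, pso_seeds and kmeans are hypothetical.

```python
import numpy as np


def sse(data, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def pso_seeds(data, k, n_particles=20, n_iter=50, w=0.72, c1=1.49, c2=1.49, rng=None):
    """Step (2): search for k initial seeds with a basic global-best PSO.

    Assumption: each particle encodes k candidate centers and its
    fitness is the SSE defined above (lower is better).
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Every particle starts from k randomly chosen data points.
    pos = data[rng.integers(0, n, size=(n_particles, k))]  # (particles, k, dim)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([sse(data, p) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([sse(data, p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest


def assign(data, centers):
    """Step (3): squared-distance matrix, then nearest-center labels."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(data, centers, max_iter=100):
    """Step (4): alternate assignment and center update until stable."""
    data = np.asarray(data, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(data, centers), centers


# Usage on two synthetic, well-separated 2-D blobs (illustrative data only).
demo_rng = np.random.default_rng(0)
data = np.vstack([demo_rng.normal(0.0, 0.5, (50, 2)),
                  demo_rng.normal(5.0, 0.5, (50, 2))])
seeds = pso_seeds(data, k=2, rng=0)      # step (1) is the choice k=2
labels, centers = kmeans(data, seeds)
```

    Because the seeds come from a fitness-driven search rather than a single random draw, repeated runs should depend less on the initial selection, which is the drawback of plain K-Means that the Discussion singles out.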
  • 8. This academic article was published by The International Institute for Science, Technology and Education (IISTE). More information about the publisher can be found on the IISTE homepage: http://www.iiste.org