International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print),
ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), © IAEME
META-HEURISTIC BASED CLUSTERING OF TWO-DIMENSIONAL
DATA USING NEIGHBOURHOOD SEARCH WITH DATA MINING
TECHNIQUE AS AN APPLICATION OF P-MEDIAN PROBLEM
D. Srinivas Reddy¹, Dr A. Govardhan², S.S.V.N Sharma³
¹ Vaageswari College of Engineering, Karimnagar, India
² JNTUH University, Director of Evaluation, Hyderabad, India
³ Director of Vaagdevi College of Engineering, India
ABSTRACT
The p-median problem is a well-known facility location problem. The neighborhood search
(NS) algorithm provides a solution to that problem. In this work the NS algorithm is
hybridized with a data mining (DM) technique to attain a solution better than that of the
plain NS approach [1]. Two new metaheuristic clustering algorithms are proposed, HDMNS
(Hybrid Data Mining based Neighborhood Search) and HMDMNS (Hybrid Multiple Data Mining
based Neighborhood Search), both of which are hybrid versions of the NS algorithm [10, 11
and 17]. Clustering is the process of dividing points into groups of related points.
Because of the nature of the p-median problem, the proposed NS with DM method can also be
used as a clustering algorithm. Both proposed algorithms have two phases. The first phase
in both algorithms constructs the basic solution using the NS approach. In HDMNS, the
second phase uses the DM technique to locate an improved solution, which gives a better
clustering of the two-dimensional data space. In HMDMNS, the second phase applies the DM
technique repeatedly, on the idea that further mining may yield a still better solution if
one exists. The results are compared with the well-known k-means clustering algorithm and
show that both methods outperform k-means.
Index Terms- HDMNS (Hybrid Data Mining based Neighborhood Search), HMDMNS
(Hybrid Multiple Data Mining based Neighborhood Search), neighborhood search (NS), data
mining (DM), clustering
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING
& TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 3, May-June (2013), pp. 93-100
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
I. INTRODUCTION
The p-median problem is an NP-hard combinatorial optimization problem that identifies
the facilities which serve the maximum number of locations [7, 8]. The p-median problem is
of practical use in several applications, such as developing marketing strategies in the
management sciences and locating server positions in computer networks (CNs) [12]. This
work uses the p-median problem as a clustering technique. A metaheuristic method,
k-Mean-GRASP, proposed to solve the p-median problem, is discussed in detail. Clustering
is the arrangement of similar objects into groups, i.e., the partitioning of data into
subsets (clusters) so that the elements of each subset share some common trait. Data
clustering is an important data mining task and a general technique for statistical data
analysis, used in many areas such as mechanical process industries, machine learning,
pattern recognition, image analysis and bioinformatics [15]. Metaheuristics are a
principal class of approximate methods for solving hard combinatorial optimization
problems for which the use of exact methods is impractical. They are general-purpose
high-level procedures that can be instantiated to explore the solution space of a specific
optimization problem efficiently. Metaheuristics such as genetic algorithms, tabu search,
simulated annealing, ant systems, GRASP and others have been introduced and applied to
real-life problems in several areas of science. The GRASP (Greedy Randomized Adaptive
Search Procedures) metaheuristic has been applied successfully to many optimization
problems. The search process employed by GRASP is iterative, and each iteration consists
of two phases: a construction phase and an enhancement phase. In the construction phase a
feasible solution is built; in the enhancement phase its neighbourhood is explored to find
an improved one. The outcome is the best solution found over all iterations.
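To make the objective concrete, the following is a minimal sketch of the p-median cost under the usual formulation, which matches the quality measure used later in the results (the sum of the distances from each location to its closest facility). The function name `p_median_cost`, the use of Euclidean distance and the sample coordinates are assumptions made for illustration; they are not taken from the paper.

```python
import math

def p_median_cost(points, medians):
    """Sum of Euclidean distances from each point to its nearest median.

    Assumption: points and medians are 2-D coordinate tuples, and the
    medians are chosen among the points (the usual p-median setting).
    """
    return sum(min(math.dist(pt, m) for m in medians) for pt in points)

# Hypothetical toy instance: five locations, two facilities.
points = [(0, 0), (0, 3), (4, 0), (10, 0), (10, 4)]
medians = [(0, 0), (10, 0)]
total = p_median_cost(points, medians)
print(total)
```

A lower value means that the chosen facilities sit closer to the locations they serve, which is also why the same value can be read as a cluster quality measure.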
Procedure k-Mean-GRASP()
1. Initialize best_sol as ∅ (cost(∅) is taken as infinite)
2. repeat
3.    sol ← k-Means(data points);
4.    sol ← Enhancement(sol);
5.    if cost(sol) < cost(best_sol)
6.       best_sol ← sol;
7.    end if
8. until Termination criterion;
9. return best_sol;
Figure 1: k-Mean-GRASP procedure

Procedure k-means(data points)
1. Initialize k points as cluster centers
2. Assign each data point to the nearest cluster center
3. Recompute the center of each cluster as the mean of the cluster
4. Repeat steps 2 and 3 until the means no longer change
Figure 2: k-means algorithm

The NS technique examines all possible combinations formed with the elements in the
neighbourhood of the individual elements of the solution and determines the optimal
solution, i.e., the one that serves the maximum number of locations so that the sum of the
distances from each element to the facilities is minimized. The NS approach metaheuristic
is iterative and contains two phases [16]: the construction phase and the NB search phase.
The construction phase builds the initial solution. Starting from that solution, the
second phase searches for the optimal solution using the NS approach, and the feasible
solution space is then computed to obtain the optimal solution. The NS approach mechanism
is illustrated in Figure 3.

Procedure HDMNS()
1. Initialize sol, optimal_sol, Mined_sol ← Φ
2. Initialize sol_space ← Φ
3. Initialize USS, FIS, UFIS ← Φ
4. Read list, p
5. sol ← NSApproach(list, p)
6. USS ← Update_sol_space(sol)
7. FIS ← Generate_frequent_items(USS)
8. UFIS ← Update_supportcount(FIS, USS)
9. Mined_sol ← Generate_mined_solution(UFIS)
10. Update optimal_sol
Figure 4: Hybridized Data Mining NS algorithm
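The two phases of the NS approach can be sketched as follows. This is only an illustration under stated assumptions: the paper does not spell out how the construction phase picks its initial solution or how the neighbourhood is defined, so the sketch assumes a random construction and a single-swap neighbourhood (replace one chosen facility by one unchosen point). The names `construction`, `nb_search` and `ns_approach` are hypothetical.

```python
import math
import random

def cost(points, medians):
    """p-median objective: total distance from each point to its nearest median."""
    return sum(min(math.dist(pt, m) for m in medians) for pt in points)

def construction(points, p, rng):
    """Construction phase (assumed): choose p distinct points at random."""
    return rng.sample(points, p)

def nb_search(points, sol):
    """NB search phase (assumed): accept improving single swaps until none is left."""
    sol = list(sol)
    best = cost(points, sol)
    improved = True
    while improved:
        improved = False
        for i in range(len(sol)):
            for cand in points:
                if cand in sol:
                    continue
                trial = sol[:i] + [cand] + sol[i + 1:]
                c = cost(points, trial)
                if c < best - 1e-12:  # strict improvement guarantees termination
                    sol, best, improved = trial, c, True
    return sol, best

def ns_approach(points, p, iterations=10, seed=0):
    """Repeat construction + NB search and keep the cheapest solution found."""
    rng = random.Random(seed)
    best_sol, best_cost = None, math.inf
    for _ in range(iterations):
        sol, c = nb_search(points, construction(points, p, rng))
        if c < best_cost:
            best_sol, best_cost = sol, c
    return best_sol, best_cost

# Hypothetical toy data: two well-separated groups on a line.
data = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
solution, value = ns_approach(data, p=2)
print(sorted(solution), value)
```

On this toy instance every non-optimal solution has an improving swap, so the search ends with one facility at the middle of each group.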
II. HYBRIDIZED DATA MINING NEIGHBOURHOOD SEARCH APPROACH AND
HYBRID MULTIPLE DATA MINING BASED NEIGHBORHOOD SEARCH
The HDMNS() procedure shown in Figure 4 consists of three phases. The first phase,
NSApproach(), computes the initial solution using the NS approach for the given list and
the user-specified 'P' [18]. The second phase applies the DM technique to this basic
feasible solution. The solution obtained in the first phase is the input to the second
phase, which generates the updated solution space (USS) consisting of all the probable
solutions generated from the result of NSApproach(). From the USS, the frequent item set
(FIS) is generated, which consists of all the individual items present in the USS [4, 5
and 9]. Next, the support count is calculated and updated for each item in the FIS. The
frequent items are then sorted in decreasing order of support count, and the mined
solution is constructed by taking the items with the highest support count until the
number of items in the solution equals 'P'. The final phase obtains the optimal solution:
the optimal solution is updated using the enhancement phase described in NSApproach, which
searches for the global optimal solution that optimizes the objective of the given
p-median problem [13, 14].

HMDMNS, described in Figure 5, is the metaheuristic based on Neighbourhood Search (NS)
hybridized with the data mining technique (HDMNS), applied multiple times iteratively with
frequent mining, to provide an updated solution to the p-median problem in each iteration
if one exists [6]. The local optimal solution resulting from the NS method serves as the
basis for identifying a feasible solution space that holds different possible solutions of
similar size. Applying the frequent mining technique to this space identifies the frequent
items, and the most promising solution is identified based on the support count. In HDMNS
the frequent mining technique is applied only once and yields a better solution. In
HMDMNS it is applied several times, motivated by the observation that if one application
of the mining technique derives a better solution, applying it repeatedly may improve the
solution further.
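The mining step described above (count the support of each item in the solution space, sort by decreasing support, keep the top 'P' items) can be sketched as follows. The function name mirrors Generate_mined_solution from Figure 4, but the signature, the tie-breaking rule (ties broken by item label) and the sample solution space are assumptions made for illustration.

```python
from collections import Counter

def generate_mined_solution(solution_space, p):
    """Build a mined solution from the p items with the highest support count.

    solution_space: list of candidate solutions, each a list of item labels.
    Support of an item = number of candidate solutions that contain it.
    Ties are broken by item label (an assumption; the paper gives no rule).
    """
    support = Counter(item for sol in solution_space for item in set(sol))
    ranked = sorted(support, key=lambda item: (-support[item], item))
    return ranked[:p], support

# Hypothetical updated solution space (USS) with p = 3.
uss = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C", "E"],
    ["B", "C", "A"],
]
mined_sol, support = generate_mined_solution(uss, p=3)
print(mined_sol, dict(support))
```

Here item A appears in all four candidate solutions, B and C in three each, and D and E in one each, so the mined solution keeps A, B and C.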
Procedure NSApproach()
1. optml_sol ← ∅ (cost(∅) is taken as infinite)
2. repeat
3.    sol ← Construction(data points);
4.    sol ← NBSearch(sol);
5.    if cost(sol) < cost(optml_sol)
6.       optml_sol ← sol;
7.    end if
8. until Termination criterion;
9. return optml_sol;
Figure 3: NS Approach procedure
Procedure HMDMNS()
1. sol, optimal_sol, Mined_sol ← Φ
2. sol_space ← Φ
3. USS, FIS, UFIS ← Φ
4. Read list, p
5. sol ← NSApproach(list, p)
6. repeat
7.    USS ← Update_sol_space(sol)
8.    FIS ← Generate_frequent_items(USS)
9.    UFIS ← Update_supportcount(FIS, USS)
10.   Mined_sol ← Generate_mined_solution(UFIS)
11.   Update optimal_sol
12. until the user-specified number of iterations is reached
Figure 5: Hybrid Multiple Data Mining NS (HMDMNS) algorithm
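The repeated mining loop can be sketched as below. Several details are assumptions, because the paper does not define them: the updated solution space is taken to be the current solution plus its cheapest single-swap neighbours, the loop continues from the best solution found so far, and the enhancement step used to update the optimal solution is reduced to a plain cost comparison. The names `update_sol_space`, `mine` and `hmdmns` are hypothetical.

```python
import math
from collections import Counter

def cost(points, medians):
    """p-median objective: total distance from each point to its nearest median."""
    return sum(min(math.dist(pt, m) for m in medians) for pt in points)

def update_sol_space(points, sol, keep=10):
    """Assumed USS: the current solution plus its `keep` cheapest swap neighbours."""
    neighbours = [
        sol[:i] + [cand] + sol[i + 1:]
        for i in range(len(sol))
        for cand in points
        if cand not in sol
    ]
    neighbours.sort(key=lambda s: cost(points, s))
    return [list(sol)] + neighbours[:keep]

def mine(uss, p):
    """Keep the p items with the highest support count (ties broken by value)."""
    support = Counter(item for s in uss for item in s)
    return sorted(support, key=lambda item: (-support[item], item))[:p]

def hmdmns(points, initial_sol, rounds=5):
    """Apply the mining step `rounds` times, keeping the cheapest solution seen."""
    p = len(initial_sol)
    optimal = list(initial_sol)
    optimal_cost = cost(points, optimal)
    sol = list(optimal)
    for _ in range(rounds):
        mined = mine(update_sol_space(points, sol), p)
        mined_cost = cost(points, mined)
        if mined_cost < optimal_cost:  # update optimal_sol only on improvement
            optimal, optimal_cost = mined, mined_cost
        sol = list(optimal)            # assumed: continue from the best so far
    return optimal, optimal_cost

# Hypothetical toy data; `start` stands in for the result of NSApproach(list, p).
data = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
start = [(0, 0), (2, 0)]
best, best_cost = hmdmns(data, start)
print(best, best_cost)
```

Setting `rounds=1` gives the single mining pass of HDMNS. Because the optimal solution is replaced only when the mined solution is cheaper, the returned cost is never worse than that of the starting solution.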
Figure 6: k-means algorithm (the same procedure as in Figure 2)
III. HDMNS AND HMDMNS APPROACHES AS CLUSTERING METHODS,
K-MEANS
One of the most popular heuristics for solving clustering problems is the k-means
clustering algorithm (CA), which is simple and widely used [2]. This algorithm partitions
the data into k disjoint clusters. The center of each cluster is called its centroid. The
algorithm partitions the objects so as to minimize the sum of the squared distances
between the centroid of each cluster and the objects in it. The k-means algorithm is
described in Figure 6. The cluster quality computed for k-means is the same as the
objective function value of the p-median problem. So the solution approaches proposed for
the p-median problem can also be considered as new clustering mechanisms, which are more
efficient than k-means. The cluster quality estimate for the k-means algorithm is the
squared-error criterion, given by

$E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2$

where E is the sum of the squared error for all objects in the database, p is the point
in space representing a given object, and m_i is the mean of cluster C_i. This criterion
tries to make the resulting k clusters as compact as possible.
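The k-means steps and the squared-error criterion E can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the initial centers are passed in explicitly (the paper does not say how they are chosen), an empty cluster keeps its previous center, and the names `k_means` and `squared_error` are hypothetical.

```python
import math

def k_means(points, centers, max_iter=100):
    """Alternate assignment and mean update until the centers stop changing."""
    centers = [tuple(float(x) for x in c) for c in centers]
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for pt in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(pt, centers[i]))
            clusters[nearest].append(pt)
        # Recompute each center as the mean of its cluster
        # (assumption: an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # `clusters` holds the most recent assignment step.
    return centers, clusters

def squared_error(centers, clusters):
    """E = sum over clusters C_i of the sum over p in C_i of |p - m_i|^2."""
    return sum(math.dist(pt, m) ** 2 for m, cl in zip(centers, clusters) for pt in cl)

# Hypothetical toy data: two pairs of points, one initial center per pair.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
means, groups = k_means(pts, centers=[(0, 0), (10, 0)])
E = squared_error(means, groups)
print(means, E)
```

Each point in this example ends up at distance 1 from its cluster mean, so every point contributes 1 to E.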
IV. RESULTS
The experimental results obtained for k-Mean-GRASP and k-means are presented in this
section, and the results are compared on the basis of solution quality against k.
Experiments are conducted on data sets with 50, 75 and 100 points. Results are tabulated
and graphs are plotted. The data sets under study are taken from the web site of Professor
Eric Taillard, University of Applied Sciences of Western Switzerland. The companion
website for p-median problem instances is
http://mistic.heig-vd.ch/taillard/problemes.dir/location.html.
Here the quality of the solution (cluster), i.e., the sum of the distances from each
customer location to its closest facility (cluster center), is measured for both the
k-means algorithm and the hybridized k-Mean-GRASP algorithm. In Figure 7 the
solution/cluster quality of k-means and k-Mean-GRASP is compared for the data set of size
50, with the number of facility locations (cluster centers) incremented by 5. In Figure 8
the same comparison is made for the data set of size 75, with the number of facility
locations (cluster centers) incremented by 10.
Figure 7: k-Mean vs k-Mean GRASP
Figure 8: k-Mean vs k-Mean GRASP
The experimental results obtained for the NS approach, HDMNS and HMDMNS are analyzed in
this section, and the results are evaluated on the basis of the quality of the solution
against 'P'. Experiments are carried out on data sets with 15, 25, 50 and 75 points.
Results are tabulated and graphs are plotted. The data sets under study are taken from the
web site of Professor Eric Taillard, University of Applied Sciences of Western
Switzerland. The associated website for p-median problem instances is
http://mistic.heig-vd.ch/taillard/problemes.dir/location.html. In Graph 1 the cluster
quality is compared for the HDMNS, HMDMNS and k-means algorithms, with p in intervals of 5
and n=25. HMDMNS is found to produce better-quality clusters than the other two.
The same is observed in the remaining graphs plotted for cluster quality with n=50, 15
and 75 and p=15, 3 and 20 respectively, in Graphs 2, 3 and 4. Execution times are also
compared for all three algorithms for p=5, 15 and 3, as depicted in Graphs 5 and 6. From
these comparisons it is identified that HMDMNS can also be used as a better clustering
algorithm than the existing k-means algorithm.
V. CONCLUSION
It is observed that in all the test cases the Hybrid Multiple Data Mining Neighbourhood
Search (HMDMNS) metaheuristic, used as a clustering algorithm (CA), produces
better-quality clusters than the efficient existing k-means method. HMDMNS can therefore
be used as a new clustering mechanism in place of k-means.
REFERENCES
[1] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proceedings of
the Very Large Data Bases Conference, pp. 487-499, 1994.
[2] T. A. Feo and M. G. C. Resende, A probabilistic heuristic for a computationally
difficult set covering problem, Operations Research Letters, 8 (1989), pp. 67-71.
[3] M. D. H. Gamal and S. Salhi, A cellular heuristic for the multisource Weber problem,
Computers & Operations Research, 30 (2003), pp. 1609-1624.
[4] B. Goethals and M. J. Zaki, Advances in Frequent Itemset Mining Implementations:
Introduction to FIMI03, Proceedings of the IEEE ICDM Workshop on Frequent Itemset
Mining Implementations, 2003.
[5] G. Grahne and J. Zhu, Efficiently using prefix-trees in mining frequent itemsets,
Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining
Implementations, 2003.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd Ed., Morgan
Kaufmann Publishers, 2006.
[7] O. Kariv and S. L. Hakimi, An algorithmic approach to network location problems, part II:
the p-medians, SIAM Journal on Applied Mathematics, 37 (1979), pp. 539-560.
[8] N. Mladenovic, J. Brimberg, P. Hansen and J. A. Moreno-Perez, The p-median
problem: A survey of metaheuristic approaches, European Journal of Operational
Research, 179 (2007), pp. 927-939.
[9] S. Orlando, P. Palmerini and R. Perego, Adaptive and resource-aware mining of
frequent sets, Proceedings of the IEEE International Conference on Data Mining,
pp. 338-345, 2002.
[10] M. H. F. Ribeiro, V. F. Trindade, A. Plastino and S. L. Martins, Hybridization of
GRASP metaheuristic with data mining techniques, Proceedings of the ECAI
Workshop on Hybrid Metaheuristics, pp. 69-78, 2004.
[11] E. G. Talbi, A taxonomy of hybrid metaheuristics, Journal of Heuristics, 8 (2002),
pp. 541-564.
[12] B. C. Tansel, R. L. Francis and T. J. Lowe, Location on networks: A survey,
Management Science, 29 (1983), pp. 482-511.
[13] Moh'd Belal Al-Zoubi, Ahmed Sharieh, Nedal Al-Hanbali and Ali Al-Dahoud, A
Hybrid Heuristic Algorithm for Solving the P-Median Problem, Journal of Computer
Science (Special Issue), pp. 80-83, 2005, Science Publications.
[14] Alexandre Plastino, Eric R. Fonseca, Richard Fuchshuber, Simone de L. Martins,
Alex A. Freitas, Martino Luis and Said Salhi, A Hybrid Data Mining Metaheuristic for
the p-median problem, Proceedings of SIAM journal (Data Mining), 2009.
[15] D. Srinivas Reddy, Dr A. Govardhan, S.S.V.N Sharma, A Nodal Relational Approach to
Data Virtualization, International Conference on Cloud Computing and E-governance
(Bangkok, Thailand).
[16] D. Srinivas Reddy, Dr A. Govardhan, S.S.V.N Sharma, Metaheuristic Approach based
on Neighborhood Search for Solving p-Median Problem, IOSR Journal of Computer
Engineering (IOSRJCE), Volume 7, Issue 1 (Nov-Dec 2012), pp. 01-05.
[17] D. Srinivas Reddy, Dr A. Govardhan, S.S.V.N Sharma, Hybrid K-mean GRASP for
Partition based Clustering of Two Dimensional Data Space as an application of p-Median
Problem, International Journal of Engineering Sciences Research (IJESR),
Volume 3, Issue 12, December 2012.
[18] D. Srinivas Reddy, Dr A. Govardhan, S.S.V.N Sharma, Hybridization of Neighborhood
Search Metaheuristic with Data Mining Technique to solve p-median Problem,
International Journal of Computational Engineering Research (IJCER), Vol. 2, Issue 7,
November 2012.
[19] R. Lakshman Naik, D. Ramesh and B. Manjula, "Instances Selection using Advance
Data Mining Techniques", International Journal of Computer Engineering &
Technology (IJCET), Volume 3, Issue 2, 2012, pp. 47-53, ISSN Print: 0976-6367,
ISSN Online: 0976-6375.
[20] M. Karthikeyan, M. Suriya Kumar and Dr. S. Karthikeyan, "A Literature Review on the
Data Mining and Information Security", International Journal of Computer Engineering
& Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141-146, ISSN Print:
0976-6367, ISSN Online: 0976-6375.