Survey of Combined Clustering Approaches
Mr. Santosh D. Rokade*, Mr. A. M. Bainwad**
* (M.Tech Student, CSE Department, SGGS IE&T, Nanded, India)
** (Assistant Professor, CSE Department, SGGS IE&T, Nanded, India)
ABSTRACT
Data clustering is an important technique that plays a central role in many areas, such as data mining, pattern recognition, machine learning, and bioinformatics. Many clustering approaches have been proposed in the literature, with different quality-complexity tradeoffs. Each clustering algorithm has its own characteristics and works on its own domain space; no single algorithm is optimal for all datasets of different properties, sizes, structures, and distributions. Combining multiple clusterings is considered a recent advance in the area of data clustering. In this paper, different combined clustering algorithms are discussed. Combined clustering is characterized by the level of cooperation between the clustering algorithms: they cooperate either at the intermediate level or at the end-result level. The goal of cooperation among multiple clustering techniques is to increase the homogeneity of objects within the clusters.
Keywords - Chameleon, Ensemble clustering, Generic and kernel-based ensemble clustering, Hybrid clustering
1. Introduction
The goal of clustering is to group data points or objects that are close or similar to each other and to identify such groupings in an unsupervised manner; unsupervised in the sense that no information is provided to the algorithm about which data point belongs to which cluster. In other words, data clustering is a data analysis technique that enables the abstraction of large amounts of data by forming meaningful groups or categories of objects, formally known as clusters. The grouping is such that objects in the same cluster are similar to each other, and those in different clusters are dissimilar, according to some similarity measure or criterion. The increasing importance of data clustering in different areas has led to the development of a variety of algorithms. These algorithms differ in many aspects, such as the similarity measure used, the types of attributes they use to characterize the dataset, and the representation of the clusters.
Combining different clustering algorithms is a recent line of progress in document clustering that aims to improve on the results of the individual algorithms. The cooperation is performed at the end-result level or at the intermediate level. Examples of end-result cooperation are the ensemble clustering and hybrid clustering approaches [3-9, 11, 12]; the cooperative clustering model is an example of intermediate-level cooperation [13]. Sometimes k-means and agglomerative hierarchical approaches are combined so as to obtain better results than either algorithm individually. For example, in the document domain, Scatter/Gather [1], a document browsing system based on clustering, uses a hybrid approach involving both k-means and agglomerative hierarchical clustering: k-means is used because of its run-time efficiency, and agglomerative hierarchical clustering because of its quality [2]. A survey of different ensemble clustering and hybrid clustering approaches is given in the following sections of this paper.
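To make this hybrid idea concrete, the following minimal sketch (our own illustration in the spirit of Scatter/Gather-style seeding, not code from [1]; the dataset and parameter choices are assumptions) runs agglomerative clustering on a small sample to obtain high-quality seed centroids, then lets the faster k-means refine them over the full data:

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Toy data standing in for a document-vector matrix (assumption).
X, _ = make_blobs(n_samples=2000, centers=5, random_state=0)
k = 5

# Slow but high-quality step: agglomerative clustering on a small sample.
rng = np.random.default_rng(0)
sample = X[rng.choice(len(X), size=200, replace=False)]
agg_labels = AgglomerativeClustering(n_clusters=k).fit_predict(sample)

# Fast step: k-means over all data, seeded with the sample centroids.
seeds = np.vstack([sample[agg_labels == j].mean(axis=0) for j in range(k)])
labels = KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X)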
2. Ensemble clustering
The concept of a cluster ensemble was introduced in [3] by A. Strehl and J. Ghosh. In that paper they introduce the problem of combining multiple partitionings of a set of objects without accessing the original features, which they call the cluster ensemble problem. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, they propose three effective and efficient techniques for obtaining high-quality combiners, or consensus functions. The first combiner induces a similarity measure from the partitionings and then re-clusters the objects. The second combiner is based on hypergraph partitioning. The third collapses groups of clusters into meta-clusters, which then compete for each object to determine the combined clustering. Unlike classification or regression settings, very few approaches have been proposed for combining multiple clusterings. Bradley and Fayyad [4] in 1998 proposed an approach for combining multiple clusterings: they combine the results of several clusterings of a given dataset, where each solution resides in a common known feature space, for example combining multiple sets of cluster centers obtained by running k-means with different initializations.
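A minimal sketch of this centroid-combination idea (our own illustration, not the exact procedure of [4]): pool the centers from several differently initialized k-means runs, then cluster the pooled centers to obtain a combined set of refined centers.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, random_state=1)
k = 4

# Collect cluster centers from several k-means runs with different seeds.
pooled = np.vstack([
    KMeans(n_clusters=k, n_init=1, random_state=seed).fit(X).cluster_centers_
    for seed in range(10)
])

# Cluster the pooled centers; their centroids act as the combined solution.
combined = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pooled)
labels = combined.predict(X)  # assign each object to a combined center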
According to D. Greene and P. Cunningham, recent techniques for ensemble clustering are effective in improving the accuracy and stability of standard clustering algorithms, but carry the drawback of the computational cost of generating and combining multiple clusterings of the data. Greene and Cunningham therefore proposed efficient ensemble methods for document clustering, presenting a kernel-based ensemble clustering method suitable for application to large, high-dimensional datasets [5].
2.1 Generic Ensemble Clustering
Ensemble clustering is based on the idea of combining multiple clusterings of a given dataset to produce a better combined, or aggregated, solution. The general process followed by these techniques, shown in Fig. 1, has two phases.
Fig. 1 Generic ensemble clustering process.
Phase-1. Generation: Construct a collection of τ base clustering solutions, denoted Π = {π1, π2, ..., πτ}, which represents the members of the ensemble. This is typically done by repeatedly applying a given clustering algorithm in a manner that leads to diversity among the members.
Phase-2. Integration: Once a collection of
ensemble members has been generated, a suitable
integration function is applied to combine them to
produce a final “consensus” clustering.
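The two phases can be made concrete with a small sketch (our own illustration of the generic scheme, not code from any of the surveyed papers): generate τ diverse k-means partitions, accumulate a co-association matrix during integration, and extract a consensus clustering from it.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
n, tau, k = len(X), 20, 3

# Phase 1 (generation): tau base clusterings, diversified by random seeds.
members = [KMeans(n_clusters=k, n_init=1, random_state=t).fit_predict(X)
           for t in range(tau)]

# Phase 2 (integration): co-association matrix = fraction of members that
# place each pair of objects in the same cluster.
coassoc = np.zeros((n, n))
for pi in members:
    coassoc += (pi[:, None] == pi[None, :])
coassoc /= tau

# Consensus: average-linkage clustering on the co-association distances.
condensed = (1.0 - coassoc)[np.triu_indices(n, k=1)]
consensus = fcluster(linkage(condensed, method="average"),
                     t=k, criterion="maxclust")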
2.2 Kernel-based Ensemble Clustering
In order to avoid repeated recomputation of similarity values in the original feature space, D. Greene and P. Cunningham choose to represent the data in the form of an n × n kernel matrix K, where Kij indicates the similarity between objects xi and xj. The main advantage of using kernel methods in ensemble clustering is that, after constructing a single kernel matrix, we may subsequently generate multiple partitions without using the original data. In [5], Greene and Cunningham proposed a kernel-based correspondence clustering with prototype reduction that produces more stable results than other schemes, such as those based on pair-wise co-assignments, which are highly sensitive to the choice of the final clustering algorithm.
The Kernel-based correspondence clustering
algorithm is described as follows:
1) Construct the full kernel matrix K and set the counter t = 0.
2) Increment t and generate base clustering πt:
 Produce a subsample of the data without replacement.
 Apply adjusted kernel k-means with random initialization to the samples.
 Assign each out-of-sample object to the nearest centroid in πt.
3) If t = 1, initialize V as the binary membership matrix for π1. Otherwise, update V as follows:
 Compute the current consensus clustering C from V, assigning each object to the cluster with the maximum vote in V.
 Find the optimal correspondence between the clusters in πt and C.
 For each object assigned to the jth cluster in πt, increment Vij.
4) Repeat from Step 2 until C is stable or t = τ.
5) Return the final consensus clustering C.
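A compact sketch of this loop follows (our own reconstruction under stated simplifications: plain k-means on subsamples stands in for adjusted kernel k-means on the kernel matrix, and the Hungarian method solves the cluster-correspondence step):

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, random_state=0)
n, k, tau = len(X), 3, 15
rng = np.random.default_rng(0)

V = np.zeros((n, k))  # vote matrix
for t in range(tau):
    # Base clustering pi_t: subsample without replacement, cluster, and
    # assign out-of-sample objects to the nearest centroid.
    idx = rng.choice(n, size=n // 2, replace=False)
    km = KMeans(n_clusters=k, n_init=1, random_state=t).fit(X[idx])
    pi_t = km.predict(X)

    if t == 0:
        V[np.arange(n), pi_t] = 1  # binary membership matrix for pi_1
        continue

    C = V.argmax(axis=1)  # current consensus clustering from the votes
    # Optimal correspondence between clusters of pi_t and C (Hungarian).
    overlap = np.array([[np.sum((pi_t == a) & (C == b)) for b in range(k)]
                        for a in range(k)])
    _, match = linear_sum_assignment(-overlap)
    # Increment votes under the matched cluster labels.
    V[np.arange(n), match[pi_t]] += 1

consensus = V.argmax(axis=1)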
2.3 Ensemble clustering with Kernel Reduction
The ensemble clustering approach introduced by D. Greene and P. Cunningham [5] allows each base clustering to be generated without referring back to the original feature space, but for larger datasets the computational cost of repeatedly applying an algorithm whose cost grows with the number of objects n is very high. To reduce this computational cost, the value of n should be reduced, after which the ensemble process becomes less computationally expensive. Greene and Cunningham [5] showed that the principles underlying the kernel-based prototype reduction technique may also be used to improve the efficiency of ensemble clustering. The proposed technique is performed in three main steps: applying prototype reduction, performing correspondence clustering on the reduced representation, and subsequently mapping the resulting aggregate solution back to the original data. The entire process is illustrated in Fig. 2.
Fig. 2 Ensemble clustering process with prototype reduction.
The entire ensemble process with prototype reduction is summarized in the following algorithm.
1) Construct the full n × n kernel matrix K from the original data X.
2) Apply prototype reduction to form the reduced kernel matrix K′.
3) Apply kernel-based correspondence clustering using K′, as given in the kernel-based correspondence clustering algorithm above, to produce a consensus clustering C′.
4) Construct a full clustering C by assigning a cluster label to each object in X based on the nearest cluster in C′.
5) Apply adjusted kernel k-means using C as an initial partition to produce a refined final clustering of X.
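The flavour of this reduction can be sketched as follows (our own illustration: random prototype selection stands in for the kernel-based reduction of [5], and a single k-means run stands in for the full correspondence-clustering loop):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=5000, centers=4, random_state=0)
k = 4
rng = np.random.default_rng(0)

# Step 2: prototype reduction -- keep a small set of representatives.
protos = X[rng.choice(len(X), size=250, replace=False)]

# Step 3: consensus clustering on the reduced set (simplified here).
reduced = KMeans(n_clusters=k, n_init=10, random_state=0).fit(protos)

# Step 4: map every original object to its nearest reduced cluster.
init_labels = reduced.predict(X)

# Step 5: refine over the full data, seeded with the mapped centroids.
seeds = np.vstack([X[init_labels == j].mean(axis=0) for j in range(k)])
final = KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X)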
Recent ensemble clustering techniques have thus been shown to be effective in improving the accuracy and stability of standard clustering algorithms, but computational complexity remains their main drawback.
3. Hybrid Clustering
To improve performance and efficiency, several clustering methods have been proposed that combine the features of hierarchical and partitional clustering algorithms. In general, these algorithms first partition the input data set into m subclusters and then construct a new hierarchical structure based on these m subclusters.
The idea of hybrid clustering was first proposed in [6], where a multilevel algorithm is developed. N. M. Murty and G. Krishna described a hybrid clustering algorithm based on the concepts of multilevel theory, nonhierarchical at the first level and hierarchical from the second level onwards, for clustering data sets that contain chain-like and concentric clusters. They observed that this hybrid algorithm gives the same results as the hierarchical clustering algorithm with lower computation and storage requirements. At the first level, the multilevel algorithm partitions the data set into several partitions and then performs the k-means algorithm on each partition to obtain several subclusters. In subsequent levels, the algorithm uses the centroids of the subclusters identified in the previous level as the new input data points and performs the hierarchical clustering algorithm on those points. This process continues until exactly k clusters are determined. Finally, the algorithm performs a top-down process to reassign all points of each subcluster to the cluster of their centroids [7].
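A minimal sketch of this partition-then-agglomerate pattern (our own illustration using scikit-learn, not the original procedure of [6]; the final top-down reassignment is the label mapping in the last line):

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=3000, centers=4, random_state=0)
k, m = 4, 40   # k final clusters, m first-level subclusters

# First level (nonhierarchical): many small k-means subclusters.
km = KMeans(n_clusters=m, n_init=5, random_state=0).fit(X)
centroids = km.cluster_centers_

# Higher levels (hierarchical): agglomerate the subcluster centroids.
merge = AgglomerativeClustering(n_clusters=k).fit_predict(centroids)

# Top-down reassignment: each point inherits its centroid's final cluster.
labels = merge[km.labels_]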
Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is another hybrid clustering algorithm, designed to deal with large input data sets [8, 9]. The BIRCH algorithm introduces two important concepts, the clustering feature and the clustering feature tree, which are used to summarize cluster representations. These structures help the clustering method achieve good speed and scalability in large databases and also make it effective for incremental and dynamic clustering of incoming objects. A clustering feature (CF) is a three-dimensional vector summarizing information about a cluster of objects. BIRCH uses a CF to represent a subcluster; given the CF of a subcluster, we can easily obtain the centroid, radius, and diameter of that subcluster. The CF vector of a new cluster formed by merging two subclusters can be derived directly from the CF vectors of the two subclusters by algebraic operations. A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering. The non-leaf nodes store sums of the CFs of their children and thus summarize clustering information about their children.
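This additivity is easy to state concretely. The sketch below is our own illustration of the standard CF triple from [8], CF = (N, LS, SS), where N is the number of points, LS their linear sum, and SS the sum of their squared norms; merging two subclusters is componentwise addition, and the centroid and radius fall out algebraically.

import numpy as np

class ClusteringFeature:
    """CF = (N, LS, SS): point count, linear sum, sum of squared norms."""

    def __init__(self, points):
        pts = np.atleast_2d(points)
        self.n = len(pts)
        self.ls = pts.sum(axis=0)
        self.ss = float((pts ** 2).sum())

    def merge(self, other):
        """The CF of a merged cluster is the componentwise sum."""
        cf = ClusteringFeature(np.zeros((0, self.ls.size)))
        cf.n = self.n + other.n
        cf.ls = self.ls + other.ls
        cf.ss = self.ss + other.ss
        return cf

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        # Root-mean-square distance to the centroid, from the CF alone.
        c = self.centroid
        return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))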
The BIRCH algorithm consists of four phases. In Phase 1, BIRCH scans the database to build an initial in-memory CF tree, which can be viewed as a multilevel compression of the data that tries to preserve its inherent clustering structure; the input data set is thus partitioned into many subclusters by the CF tree. In Phase 2, it reduces the size of the CF tree, that is, the number of subclusters, in order to apply a global clustering algorithm in Phase 3 on those generated subclusters. In Phase 4, each point in the data set is redistributed to the closest centroid of the clusters produced in Phase 3. Phases 2 and 4 serve to further improve the clustering quality and are therefore optional. BIRCH tries to produce the best clusters with the available resources, within a limited amount of main memory, and an important consideration is to minimize the time required for input/output. BIRCH applies a multiphase clustering technique: a single scan of the data set yields a basic good clustering, and one or more additional scans can be used to further improve the quality [9].
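For reference, scikit-learn ships a BIRCH implementation; a short usage sketch follows (the threshold and branching-factor values are illustrative assumptions, not taken from [8]):

from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10000, centers=5, random_state=0)

# threshold bounds the subcluster radius in the CF tree; n_clusters
# triggers the global clustering phase (Phase 3) on the leaf subclusters.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=5)
labels = model.fit_predict(X)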
G. Karypis, E.-H. Han, and V. Kumar in [10] proposed another hybrid clustering algorithm, named CHAMELEON. Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to determine the similarity between pairs of clusters. Its key feature is that it gives importance to both interconnectivity and closeness in identifying the most similar pair of clusters, where interconnectivity is the number of links between two clusters and closeness is the length of those links. The algorithm is described as follows and summarized in Fig. 3:
1) Construct a k-nearest-neighbour graph.
2) Partition the k-nearest-neighbour graph into many small subclusters.
3) Merge those subclusters into the final clustering results.
Mr. Santosh D. Rokade, Mr. A. M. Bainwad / International Journal of Engineering Research
and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.504-508
507 | P a g e
Fig. 3 Chameleon algorithm.
Chameleon uses a k-nearest neighbour
graph approach to construct a sparse graph. Each
vertex of this graph represents a data object. There
exists an edge between two vertices or objects if
one object is among the k most similar objects of
the other. The edges are weighted to reflect the
similarity between objects. Chameleon uses a graph
partitioning algorithm to partition the k-nearest
neighbour graph into a large number of relatively
small subclusters. It then uses an agglomerative
hierarchical clustering algorithm that repeatedly
merges subclusters based on their similarity. To
determine the pairs of most similar subclusters, it
takes into account both the interconnectivity as
well as the closeness of the clusters [9].
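The first step is straightforward to reproduce; a sketch of the sparse k-nearest-neighbour graph construction (our own illustration using scikit-learn; the graph partitioning and dynamic-model merging steps are left to a partitioner such as the one used in [10]):

from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# Sparse graph: edge (i, j) exists if j is among the 10 nearest
# neighbours of i; weights are distances, which a Chameleon-style
# implementation would convert into similarities.
knn_graph = kneighbors_graph(X, n_neighbors=10, mode="distance")
print(knn_graph.shape, knn_graph.nnz)  # (500, 500), 5000 weighted edges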
Zhao and Karypis in [11] showed that a hybrid model of bisecting k-means (BKM) and k-means (KM) clustering produces better results than either BKM or KM individually. BKM [12] is a variant of KM clustering that produces either a partitional or a hierarchical clustering by recursively applying the basic KM method. It starts by considering the whole dataset to be one cluster. At each step, one cluster V is selected and bisected into two partitions V1 and V2 using the basic KM algorithm. This process continues until the desired number of clusters, or some other specified stopping condition, is reached. There are a number of different ways to choose which cluster to split: for example, the largest cluster at each step, the cluster with the least overall similarity, or a criterion that combines both size and overall similarity. This bisecting approach is very attractive in many applications, such as document retrieval, document indexing, and gene expression analysis, as it is based on a homogeneity criterion. However, in some cases a fraction of the dataset is left behind with no way to re-cluster it at each level of the binary hierarchical tree, so a "refinement" step is needed to re-cluster the resulting solutions. In [11] it is concluded that BKM with end-result refinement using KM produces better results than either KM or BKM. A drawback of this end-result enhancement is that KM has to wait until BKM finishes its clustering, and then takes the final set of centroids as initial centres for a better refinement [2].
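A minimal sketch of BKM with end-result KM refinement (our own illustration; splitting the largest cluster is one of the selection criteria mentioned above):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, centers=6, random_state=0)
k = 6

# Bisecting k-means: repeatedly split the largest cluster in two.
clusters = [np.arange(len(X))]
while len(clusters) < k:
    clusters.sort(key=len)
    biggest = clusters.pop()  # selection criterion: largest cluster
    halves = KMeans(n_clusters=2, n_init=5,
                    random_state=0).fit_predict(X[biggest])
    clusters += [biggest[halves == 0], biggest[halves == 1]]

# End-result refinement: seed ordinary KM with the BKM centroids.
seeds = np.vstack([X[idx].mean(axis=0) for idx in clusters])
refined = KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X)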
Thus, in hybrid clustering, cascaded clustering algorithms cooperate to refine the clustering solutions produced by a former clustering algorithm. The different hybrid clustering approaches discussed above are shown to be effective in improving clustering quality, but the main drawback of the hybrid approach is that it does not allow synchronous execution of the clustering algorithms; that is, one algorithm has to wait for another algorithm to finish its clustering.
4. Conclusions
Combining multiple clusterings is considered a recent advance in the area of data clustering. In this paper, different combined clustering approaches have been discussed that are shown to be effective in improving clustering quality. However, computational complexity is the main drawback of ensemble clustering techniques, and idle-time wastage is one of the drawbacks of hybrid clustering approaches.
REFERENCES
[1] D. Cutting, D. Karger, J. Pedersen, and J. W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections", SIGIR '92, 1992, pp. 318-329.
[2] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques", in Proceedings of the KDD Workshop on Text Mining, 2000, pp. 109-110.
[3] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions", Conference on Artificial Intelligence (AAAI 2002), AAAI/MIT Press, Cambridge, MA, 2002, pp. 93-98.
[4] U. M. Fayyad, C. Reina, and P. S. Bradley, "Initialization of iterative refinement clustering algorithms", in Proc. 14th Intl. Conf. on Machine Learning (ICML), 1998, pp. 194-198.
[5] D. Greene and P. Cunningham, "Efficient ensemble methods for document clustering", Technical Report, Trinity College Dublin, Computer Science Department, 2006.
[6] N. M. Murty and G. Krishna, "A Hybrid Clustering Procedure for Concentric and Chain-Like Clusters", Int'l J. Computer and Information Sciences, vol. 10, no. 6, 1981, pp. 397-412.
[7] C. Lin and M. Chen, "Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging", IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, 2005, pp. 145-159.
[8] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: An Efficient Data Clustering Method for Very Large Databases", Proc. Conf. Management of Data (ACM SIGMOD '96), 1996, pp. 103-114.
[9] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition (Jim Gray, Series Editor, Morgan Kaufmann Publishers, March 2006).
[10] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: Hierarchical Clustering Using Dynamic Modeling", IEEE Computer, vol. 32, no. 8, Aug. 1999, pp. 68-75.
[11] Y. Zhao and G. Karypis, "Criterion functions for document clustering: experiments and analysis", Technical Report, 2002.
[12] S. M. Savaresi and D. Boley, "On the performance of bisecting k-means and PDDP", in Proceedings of the 1st SIAM International Conference on Data Mining, 2001, pp. 114.
[13] R. Kashef and M. S. Kamel, "Cooperative Clustering", Pattern Recognition, vol. 43, 2010, pp. 2315-2329.

Weitere ähnliche Inhalte

Was ist angesagt?

A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringIJMER
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET Journal
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimizationAlexander Decker
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data IJMER
 
Analysis and implementation of modified k medoids
Analysis and implementation of modified k medoidsAnalysis and implementation of modified k medoids
Analysis and implementation of modified k medoidseSAT Publishing House
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Dataidescitation
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...IJECEIAES
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
 
Chapter1_C.doc
Chapter1_C.docChapter1_C.doc
Chapter1_C.docbutest
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957IJMER
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringIJERD Editor
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...IOSR Journals
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
 

Was ist angesagt? (20)

A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimization
 
J41046368
J41046368J41046368
J41046368
 
50120130406008
5012013040600850120130406008
50120130406008
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
 
Analysis and implementation of modified k medoids
Analysis and implementation of modified k medoidsAnalysis and implementation of modified k medoids
Analysis and implementation of modified k medoids
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 
Chapter1_C.doc
Chapter1_C.docChapter1_C.doc
Chapter1_C.doc
 
Clustering
ClusteringClustering
Clustering
 
20320140501002 2
20320140501002 220320140501002 2
20320140501002 2
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 
50120130406022
5012013040602250120130406022
50120130406022
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
 

Andere mochten auch (20)

By35435437
By35435437By35435437
By35435437
 
Cb33474482
Cb33474482Cb33474482
Cb33474482
 
Cr33562566
Cr33562566Cr33562566
Cr33562566
 
Ce33493496
Ce33493496Ce33493496
Ce33493496
 
Cu33582587
Cu33582587Cu33582587
Cu33582587
 
Hi3312831286
Hi3312831286Hi3312831286
Hi3312831286
 
Ie3313971401
Ie3313971401Ie3313971401
Ie3313971401
 
Ir3415981603
Ir3415981603Ir3415981603
Ir3415981603
 
Do33694700
Do33694700Do33694700
Do33694700
 
Bu35414419
Bu35414419Bu35414419
Bu35414419
 
Bn35364376
Bn35364376Bn35364376
Bn35364376
 
Cy32624628
Cy32624628Cy32624628
Cy32624628
 
Cz32629632
Cz32629632Cz32629632
Cz32629632
 
Bx33454457
Bx33454457Bx33454457
Bx33454457
 
Bs33424429
Bs33424429Bs33424429
Bs33424429
 
Cm33537542
Cm33537542Cm33537542
Cm33537542
 
Ja3116861689
Ja3116861689Ja3116861689
Ja3116861689
 
Cd32504509
Cd32504509Cd32504509
Cd32504509
 
Cg32519523
Cg32519523Cg32519523
Cg32519523
 
Io3415841586
Io3415841586Io3415841586
Io3415841586
 

Ähnlich wie Cg33504508

84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1bPRAWEEN KUMAR
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
Assessment of Cluster Tree Analysis based on Data Linkages
Assessment of Cluster Tree Analysis based on Data LinkagesAssessment of Cluster Tree Analysis based on Data Linkages
Assessment of Cluster Tree Analysis based on Data Linkagesjournal ijrtem
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsIJMER
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2IAEME Publication
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataIRJET Journal
 
Clustering Approach Recommendation System using Agglomerative Algorithm
Clustering Approach Recommendation System using Agglomerative AlgorithmClustering Approach Recommendation System using Agglomerative Algorithm
Clustering Approach Recommendation System using Agglomerative AlgorithmIRJET Journal
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET Journal
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks IJECEIAES
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...IJECEIAES
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmIRJET Journal
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Editor IJARCET
 

Ähnlich wie Cg33504508 (20)

84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
Bl24409420
Bl24409420Bl24409420
Bl24409420
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
Assessment of Cluster Tree Analysis based on Data Linkages
Assessment of Cluster Tree Analysis based on Data LinkagesAssessment of Cluster Tree Analysis based on Data Linkages
Assessment of Cluster Tree Analysis based on Data Linkages
 
F04463437
F04463437F04463437
F04463437
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
 
Clustering Approach Recommendation System using Agglomerative Algorithm
Clustering Approach Recommendation System using Agglomerative AlgorithmClustering Approach Recommendation System using Agglomerative Algorithm
Clustering Approach Recommendation System using Agglomerative Algorithm
 
A new link based approach for categorical data clustering
A new link based approach for categorical data clusteringA new link based approach for categorical data clustering
A new link based approach for categorical data clustering
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
 
K044055762
K044055762K044055762
K044055762
 
Af4201214217
Af4201214217Af4201214217
Af4201214217
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 
Ir3116271633
Ir3116271633Ir3116271633
Ir3116271633
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
 

Kürzlich hochgeladen

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Kürzlich hochgeladen (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Cg33504508

  • 1. Mr. Santosh D. Rokade, Mr. A. M. Bainwad / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.504-508 504 | P a g e Survey of Combined Clustering Approaches Mr. Santosh D. Rokade*, Mr. A. M. Bainwad** *( M.Tech Student, CSE Department, SGGS IE&T, Nanded, India) ** (Assistant Professor, CSE Department, SGGS IE&T, Nanded, India) ABSTRACT Data clustering is one of the very important techniques and plays an important role in many areas such as data mining, pattern recognition, machine learning, bioinformatics and other fields. There are many clustering approaches proposed in the literature with different quality, complexity tradeoffs. Each clustering algorithm has its own characteristics and works on its domain space with no optimum solution for all datasets of different properties, sizes, structures, and distributions. Combining multiple clustering is considered as new progress in the area of data clustering. In this paper different combining clustering algorithms are discussed. Combining clustering is based on the level of cooperation between the clustering algorithms; either they cooperate on the intermediate level or end result level. Cooperation among multiple clustering techniques is for the goal of increasing the homogeneity of objects within the clusters Keywords - Chameleon , Ensemble clustering, Generic and Kernel based ensemble clustering, Hybrid clustering, 1. Introduction The goal of clustering is to group the data points or objects that are close or similar to each other and identify such grouping in an unsupervised manner, unsupervised is in the sense that no information is provided to the algorithm about which data point belongs to which cluster. In other words data clustering is a data analysis technique that enables the abstraction of large amounts of data by forming meaningful groups or categories of objects, these objects are formally known as clusters. This grouping is in such a way that objects in the same cluster are similar to each other, and those in different clusters are dissimilar according to some similarity measure or criteria. The increasing importance of data clustering in different areas has led to the development of a variety of algorithms. These algorithms are differing in many aspects, such as the similarity measure used, the types of attributes they use to characterize the dataset, and the representation of the clusters. Combination. of different clustering algorithm is new progress area in the document clustering to improve the result of different clustering algorithms. The cooperation is performed on end result level or intermediate level. Examples of end-result cooperation are the ensemble clustering and the hybrid clustering approaches [3-9, 11, 12]. Cooperative clustering model is an example of intermediate level cooperation [13]. Sometimes, k-means and agglomerative hierarchical approaches are combined so as to get the best results as compares to individual algorithm. For example, in the document domain Scatter/Gather [1], a document browsing system based on clustering, uses a hybrid approach involving both k-means and agglomerative hierarchical clustering. The k-means is used because of its run-time efficiency and the agglomerative hierarchical clustering is used because of its quality [2]. Survey of different ensemble clustering and hybrid clustering approaches is done in the upcoming section of this paper. 2. 
Ensemble clustering The concept of cluster ensemble is introduced in [3] by A. Strehl and J. Ghosh. In this paper, they introduce the problem of combining multiple partitioning of a set of objects without accessing the original features. They call this problem as cluster ensemble problem. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, they propose three effective and efficient techniques for obtaining high quality combiners or consensus functions. The first combiner induces a similarity measure from the partitioning and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Unlike classification or regression settings, there have been very few approaches proposed for combining multiple clusterings. Bradley and Fayyad [4] in 1998 proposed an approach for combining multiple clusterings; here they combine the results of several clusterings of a given dataset, where each solution resides in a common known feature space, for example combining multiple sets of cluster centers obtained by using k-means with different initializations. According to D. Greene, P. Cunningham, recent techniques for ensemble clustering are effective in
  • 2. Mr. Santosh D. Rokade, Mr. A. M. Bainwad / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.504-508 505 | P a g e improving the accuracy and stability of standard clustering algorithms though these techniques have drawback of computational cost of generating and combining multiple clusterings of the data. D. Greene, P. Cunningham proposed efficient ensemble methods for document clustering, in that they present an efficient kernel-based ensemble clustering method suitable for application to large, high-dimensional datasets [5]. 2.1 Generic Ensemble Clustering Ensemble clustering is based on the idea of combining multiple clusterings of a given dataset to produce a better combined or aggregated solution. The general process followed by these techniques is given in the fig.1 which has two different phases. Fig.1 Generic ensemble clustering process. Phase-1. Generation: Construct a collect-ion of τ base clustering solutions, denoted as which represents the members of the ensemble. This is typically done by repeatedly applying a given clustering algorithm in a manner that leads to diversity among the members. Phase-2. Integration: Once a collection of ensemble members has been generated, a suitable integration function is applied to combine them to produce a final “consensus” clustering. 2.2 Kernel-based Ensemble Clustering In order to avoid repeated recompution of similarity values in original feature space, D. Greene and P. Cunningham choose to represent the data in the form of an kernel matching k, where k indicates the close resemblance between object and . The main advantage of using kernel methods in the ensemble clustering is that after construction of single kernel matrix we may subsequently generate multiple partitions without using original data. In [5] Greene D.Tsymbal proposed a Kernel-based correspondence clustering with prototype reduction that produces more stable results than other schemes such as those based on pair-wise co-assignments, which are highly sensitive to the choice of final clustering algorithm. The Kernel-based correspondence clustering algorithm is described as follows: 1) Construct full kernel matrix k and set counter . 2) Increment t and generate base clustering :  Produce a sub sampling without replacement.  Apply adjusted kernel k-means with random initialization to the samples.  Assign each out-of-sample object to the nearest centroid in 3) If , initialize V as the binary membership matrix for . Otherwise, update V as follows:  Compute the current consensus clustering C from V such that:  Find the optimal correspondence between the clusters in and .  For each object assigned to the jth cluster in the , increment . 4) Repeat from Step 2 until is stable or . 5) Return the final consensus clustering . 2.3 Ensemble clustering with Kernel Reduction The ensemble clustering approach introduced by D. Greene, P. Cunningham [5] allows each base clustering to be generated without referring back to the original feature space but, for larger datasets the computational cost of repeatedly applying an algorithm is very high . To reduce this computational cost, the value of n should be reduced. After this the ensemble process becomes less computationally expensive. Greene and Cunningham [5] showed that the principles underlying the kernel-based prototype reduction technique may also be used to improve the efficiency of ensemble clustering. 
The proposed techniques mainly performed in three steps such as applying prototype reduction, performing correspondence clustering on the reduced representation and subsequently mapping the resulting aggregate solution back to the original data. The entire process is illustrated in Fig. 2. Fig. 2 Ensemble clustering process with prototype reduction The entire ensemble process with prototype reduction is summarized in following algorithm. 1) Construct full n * n kernel matrix K from the original data X.
  • 3. Mr. Santosh D. Rokade, Mr. A. M. Bainwad / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.504-508 506 | P a g e 2) Apply prototype reduction to form the reduced kernel matrix k’. 3) Apply kernel-based correspondence clustering using K’ as given in kernel- based correspondence clustering algorithm to produce a consensus clustering ’. 4) Construct a full clustering by assigning a cluster label to each based on the nearest cluster in ’. 5) Apply adjusted kernel k-means using as an initial partition to produce a refined final clustering of . Recent ensemble clustering techniques have been shown to be effective in improving the accuracy and stability of standard clustering algorithms but computational complexity is the main drawback of ensemble clustering techniques. 3. Hybrid Clustering To improve the performance and efficiency of algorithms several clustering methods have been proposed to combine the features of hierarchical and partitional clustering algorithms. In general, these algorithms first partition the input data set into m sub clusters and then a new hierarchical structure is constructed based on these m subclusters. This idea of hybrid clustering is first proposed in [6] where a multilevel algorithm is developed. N. M. Murty and G. Krishna described a hybrid clustering algorithm based on the concepts of multilevel theory which is nonhierarchical at the first level and hierarchical from second level onwards to cluster data sets having chain-like clusters and concentric clusters. N. M. Murty and G. Krishna observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm with less computation and storage requirements. At the first level, the multilevel algorithm partitions the data set into several partitions and then performs the k-means algorithm on each partition to obtain several subclusters. In subsequent levels, this algorithm uses the centroids of the subclusters identified in the previous level as the new input data points and performs the hierarchical clustering algorithm on those points. This process continues until exactly k clusters are determined. Finally, the algorithm performs a top-down process to reassign all points of each subcluster to the cluster of their centroids [7]. Balanced Iterative Reduced Clustering using Hierarchies (BIRCH) is another hybrid clustering algorithm designed to deal with large input data sets [8, 9]. BIRCH algorithm introduces two important concepts, first is cluster feature and another is cluster feature tree which are used to summarize cluster representations. These structures help the clustering method achieve good speed and scalability in large databases and also make it effective for incremental and dynamic clustering of incoming objects. A clustering feature (CF) is three dimensional vector summarizing information about clusters of objects. BIRCH uses CF to represent a subcluster. If CF of a subcluster is given, we can obtain the centroid, radius, and diameter of that subcluster easily. The CF vector of a new cluster is formed by merging two subclusters. This can be directly derived from the CF vectors of the two subclusters by algebra operations. A CF tree is a height balanced tree that stores the clustering features for a hierarchical clustering. The non leaf nodes store sums of the CFs of their children and thus summarize clustering information about their children. 
BIRCH consists of four phases. In Phase 1, BIRCH scans the database to build an initial in-memory CF tree, which can be viewed as a multilevel compression of the data that tries to preserve its inherent clustering structure; this CF tree partitions the input data set into many subclusters. In Phase 2, BIRCH reduces the size of the CF tree, that is, the number of subclusters, so that a global clustering algorithm can be applied to the generated subclusters in Phase 3. In Phase 4, each point in the data set is redistributed to the closest of the centroids produced in Phase 3. Phases 2 and 4 serve to further improve the clustering quality and are therefore optional. BIRCH tries to produce the best clusters with the available resources, in particular a limited amount of main memory, and an important consideration is minimizing the time required for input/output. BIRCH applies a multiphase clustering technique: a single scan of the data set yields a basic good clustering, and one or more additional scans can further improve its quality [9].

G. Karypis, E. H. Han, and V. Kumar in [10] proposed another hybrid clustering algorithm, named CHAMELEON. Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to determine the similarity between pairs of clusters. Its key feature is that it considers both interconnectivity and closeness when identifying the most similar pair of clusters, where interconnectivity is the number of links between two clusters and closeness is the length of those links. The algorithm, summarized in Fig. 3 and sketched in code below, proceeds as follows:
1) Construct a k-nearest neighbour graph.
2) Partition the k-nearest neighbour graph into many small subclusters.
3) Merge those subclusters into the final clustering results.

[Fig. 3: Chameleon algorithm]
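The following Python sketch follows these three steps under two stated substitutions: spectral partitioning of the similarity graph stands in for Chameleon's min-cut graph partitioner, and a plain average cross-edge similarity stands in for its relative interconnectivity and relative closeness measures. The function name and parameters are our own.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.neighbors import kneighbors_graph

def chameleon_sketch(X, k, n_subclusters=20, n_neighbors=10):
    # Step 1: sparse k-nearest-neighbour graph, reweighted so that
    # edge weights reflect similarity rather than distance.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance').toarray()
    A[A > 0] = 1.0 / (1.0 + A[A > 0])
    A = np.maximum(A, A.T)  # symmetrize

    # Step 2: partition the graph into many small subclusters.
    labels = SpectralClustering(n_clusters=n_subclusters,
                                affinity='precomputed',
                                random_state=0).fit_predict(A)
    members = [np.where(labels == c)[0] for c in range(n_subclusters)]
    clusters = {c: m for c, m in enumerate(members) if len(m)}

    # Step 3: repeatedly merge the pair of subclusters with the highest
    # average cross-edge similarity until only k clusters remain.
    while len(clusters) > k:
        best, pair = -1.0, None
        keys = list(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                cross = A[np.ix_(clusters[a], clusters[b])]
                score = cross.sum() / (len(clusters[a]) * len(clusters[b]))
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        clusters[a] = np.concatenate([clusters[a], clusters.pop(b)])

    out = np.empty(len(X), dtype=int)
    for new_id, idx in enumerate(clusters.values()):
        out[idx] = new_id
    return out
```

The merge score above deliberately normalizes cross-edge weight by cluster sizes; without some such normalization the merging phase would simply favour the largest subclusters, which is exactly the failure mode Chameleon's dynamic model is designed to avoid.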
Chameleon uses a k-nearest neighbour approach to construct a sparse graph in which each vertex represents a data object, and an edge exists between two vertices if one object is among the k most similar objects of the other; the edges are weighted to reflect the similarity between objects. Chameleon uses a graph partitioning algorithm to partition this graph into a large number of relatively small subclusters. It then uses an agglomerative hierarchical clustering algorithm that repeatedly merges subclusters based on their similarity, taking into account both the interconnectivity and the closeness of the clusters [9].

Zhao and Karypis in [11] showed that a hybrid of bisecting k-means (BKM) and k-means (KM) produces better results than either BKM or KM alone. BKM [12] is a variant of KM that produces either a partitional or a hierarchical clustering by recursively applying the basic KM method. It starts by considering the whole dataset to be one cluster. At each step, one cluster V is selected and bisected into two partitions V1 and V2 using the basic KM algorithm. This process continues until the desired number of clusters, or some other specified stopping condition, is reached. There are several ways to choose which cluster to split: for example, the largest cluster at each step, the one with the least overall similarity, or a criterion combining both size and overall similarity. This bisecting approach is attractive in applications such as document retrieval, document indexing, and gene expression analysis because it is based on a homogeneity criterion. However, when a fraction of the dataset is left behind with no way to re-cluster it at each level of the binary hierarchical tree, a refinement step is needed to re-cluster the resulting solutions. In [11] it is concluded that BKM with end-result refinement using KM produces better results than KM or BKM alone. A drawback of this end-result enhancement is that KM has to wait until BKM finishes its clustering before taking the final set of centroids as initial centres for refinement [2].
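A minimal Python sketch of BKM and of this hybrid BKM-then-KM refinement is given below. It is our own illustration, not the implementation from [11] or [12], and it assumes every cluster chosen for splitting contains at least two points; the split criterion used here is cluster size, one of the options listed above.

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, seed=0):
    # Start with the whole dataset as a single cluster.
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # Split criterion: pick the largest cluster
        # (least overall similarity is another common choice).
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        victim = clusters.pop(i)
        halves = KMeans(n_clusters=2, random_state=seed).fit_predict(X[victim])
        clusters.append(victim[halves == 0])
        clusters.append(victim[halves == 1])
    out = np.empty(len(X), dtype=int)
    for c, idx in enumerate(clusters):
        out[idx] = c
    return out

def bkm_then_km(X, k, seed=0):
    # End-result refinement: KM starts from BKM's final centroids,
    # so it cannot begin until BKM has finished.
    labels = bisecting_kmeans(X, k, seed)
    centres = np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
    return KMeans(n_clusters=k, init=centres, n_init=1).fit_predict(X)
```

Note how the sequential dependency criticized in the text is visible in bkm_then_km: the refinement pass is blocked on the complete BKM result before it can compute its initial centres.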
Thus, in hybrid clustering, cascaded clustering algorithms cooperate to refine the clustering solutions produced by a former clustering algorithm. The hybrid clustering approaches discussed above are effective in improving clustering quality, but their main drawback is that they do not allow synchronous execution of the clustering algorithms: one algorithm has to wait for another to finish its clustering.

4. Conclusions
Combining multiple clusterings is a promising recent direction for broadening progress in the area of data clustering. In this paper, different combined clustering approaches have been discussed and shown to be effective in improving clustering quality. However, computational complexity is the main drawback of ensemble clustering techniques, and the idle time spent while one algorithm waits for another is a drawback of hybrid clustering approaches.

REFERENCES
[1] D. Cutting, D. Karger, J. Pedersen, and J. W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections", SIGIR '92, 1992, pp. 318-329.
[2] M. Steinbach, G. Karypis, V. Kumar, "A comparison of document clustering techniques", In Proc. of the KDD Workshop on Text Mining, 2000, pp. 109-110.
[3] A. Strehl, J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining partitionings", Conference on Artificial Intelligence (AAAI 2002), AAAI/MIT Press, Cambridge, MA, 2002, pp. 93-98.
[4] U. M. Fayyad, C. Reina, and P. S. Bradley, "Initialization of iterative refinement clustering algorithms", In Proc. 14th Intl. Conf. on Machine Learning (ICML), 1998, pp. 194-198.
[5] D. Greene, P. Cunningham, "Efficient ensemble methods for document clustering", Technical Report, Trinity College Dublin, Computer Science Department, 2006.
[6] N. M. Murty and G. Krishna, "A Hybrid Clustering Procedure for Concentric and Chain-Like Clusters", Int'l J. Computer and Information Sciences, vol. 10, no. 6, 1981, pp. 397-412.
[7] C. Lin, M. Chen, "Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging", IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, 2005, pp. 145-159.
[8] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: An Efficient Data Clustering Method for Very Large Databases", Proc. ACM SIGMOD Conf. on Management of Data, 1996, pp. 103-114.
[9] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed. (Morgan Kaufmann Publishers, March 2006).
[10] G. Karypis, E. H. Han, and V. Kumar, "Chameleon: Hierarchical Clustering Using Dynamic Modeling", IEEE Computer, vol. 32, no. 8, Aug. 1999, pp. 68-75.
[11] Y. Zhao, G. Karypis, "Criterion functions for document clustering: experiments and analysis", Technical Report, 2002.
[12] S. M. Savaresi, D. Boley, "On the performance of bisecting k-means and PDDP", in: Proceedings of the 1st SIAM International Conference on Data Mining, 2001, pp. 1-14.
[13] R. Kashef, M. S. Kamel, "Cooperative Clustering", Pattern Recognition, vol. 43, 2010, pp. 2315-2329.