International Journal of Research in Computer Science
eISSN 2249-8265 Volume 2 Issue 1 (2011) pp. 39-43
© White Globe Publications
www.ijorcs.org


            QUALITY OF CLUSTER INDEX BASED ON
                 STUDY OF DECISION TREE
          B. Rajasekhar1, B. Sunil Kumar2, Rajesh Vibhudi3, B.V. Rama Krishna4
       1,2 Assistant Professor, Jawaharlal Nehru Institute of Technology, Hyderabad
       3 Sri Mittapalli College of Engineering, Guntur
       4 Associate Professor, St Mary's College of Engineering & Technology, Hyderabad

   Abstract: Quality of clustering is an important issue in the application of clustering techniques. Most traditional cluster validity indices are geometry-based cluster quality measures. This work proposes a cluster validity index based on the decision-theoretic rough set model by considering various loss functions. Real-time retail data show the usefulness of the proposed validity index for the evaluation of rough and crisp clustering. The measure is shown to help determine the optimal number of clusters, as well as an important parameter called the threshold in rough clustering. Experiments with a promotional campaign for the retail data illustrate the ability of the proposed measure to incorporate financial considerations in evaluating the quality of a clustering scheme. This ability to deal with monetary values distinguishes the proposed decision-theoretic measure from other distance-based measures. The proposed validity index can also be used to evaluate other clustering algorithms such as fuzzy clustering.

   Keywords: Clustering, Classification, Decision Tree, K-means.

                      I. INTRODUCTION

   Unsupervised learning by clustering is one of the techniques in data mining: it categorizes unlabeled objects into several clusters such that objects belonging to the same cluster are more similar to each other than to objects belonging to different clusters. Conventional clustering assigns an object to exactly one cluster; a rough-set-based variation makes it possible to assign an object to more than one cluster [3]. Quality of clustering is an important issue in the application of clustering techniques to real-world data. A good measure of cluster quality will help in deciding the various parameters used in clustering algorithms. One such parameter that is common to most clustering algorithms is the number of clusters. Many different indices of cluster validity have been proposed. In general, indices of cluster validity fall into one of three categories. Some validity indices measure partition validity to evaluate the properties of the crisp structure imposed on the data by the clustering algorithm, such as the Dunn index [7] and the Davies-Bouldin index [2]. These validity indices are based on a similarity measure of clusters whose bases are the dispersion measure of a cluster and the cluster dissimilarity measure. In the case of fuzzy clustering algorithms, some validity indices such as the partition coefficient [1] and classification entropy use only the information of fuzzy membership grades to evaluate clustering results. The third category consists of validity indices that make use of not only the fuzzy membership grades but also the structure of the data. All these validity indices are essentially based on the geometric characteristics of the clusters. For a decision-theoretic measure of cluster quality, the decision-theoretic framework has been helpful in providing a better understanding of classification models [4]. The decision-theoretic rough set model considers various classes of loss functions. By adjusting the loss functions, the decision-theoretic rough set model can also be extended to the multi-category problem. It is possible to construct a cluster validity index by considering various loss functions based on decision theory. Such a measure has the added advantage of being applicable to rough-set-based clustering. This work describes how to develop a cluster validity index from the decision-theoretic rough set model. Based on decision theory, the proposed rough cluster validity index is taken as a function of the total risk of grouping objects using a clustering algorithm. Since crisp clustering [5] is a special case of rough clustering, the validity index is applicable to both rough clustering and crisp clustering. Experiments with synthetic and real-world data show the usefulness of the proposed validity index for the evaluation of rough clustering and crisp clustering.

                 II. CLUSTERING TECHNIQUE

   The clustering technique K-means [7] is a prototype-based, simple partitional clustering technique which attempts to find k non-overlapping clusters. These clusters are represented by their centroids (a cluster centroid is typically the mean of the points in the cluster). The clustering process of K-means is as follows. Firstly, k initial centroids are selected, where k is specified by the user and indicates the desired number of clusters.
   Secondly, every point in the data is assigned to the closest centroid, and each collection of points assigned to a centroid forms a cluster. The centroid of each cluster is then updated based on the points assigned to the cluster. This process is repeated until no point changes clusters.

A. Clustering Crisp Method: The objective of k-means is to assign n objects to k clusters. The process begins by randomly choosing k objects as the centroids of the k clusters. Each object is assigned to the cluster with the minimum value of the distance d(\vec{x}_j, \vec{c}_i) between the object vector \vec{x}_j and the cluster centroid vector \vec{c}_i; the distance d(\vec{x}_j, \vec{c}_i) can be the standard Euclidean distance. After the assignment of all the objects to the clusters, the new centroid vectors of the clusters are calculated as

    \vec{c}_i = \frac{1}{|c_i|} \sum_{\vec{x}_j \in c_i} \vec{x}_j, \quad 1 \le i \le k

Here |c_i| is the cardinality of cluster c_i. The process stops when the centroids of the clusters stabilize, i.e., the centroid vectors from the previous iteration are identical to those generated in the current iteration.
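   To make the procedure above concrete, the following is a minimal sketch of the k-means loop in Python (NumPy only). The function name, the random initialization, and the convergence test are illustrative choices under the description given here, not an implementation taken from the paper.

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch: X is an (n, d) array, k the desired number of clusters."""
    rng = np.random.default_rng(seed)
    # Firstly: choose k initial centroids by sampling k distinct objects.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Secondly: assign every point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid as the mean of the points assigned to it.
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Stop when the centroids are identical to those of the previous iteration.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids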

B. Cluster Validity: A new validity index, conn_index, for prototype-based clustering of data sets is applicable to a wide variety of cluster characteristics: clusters of different shapes, sizes, densities and even overlaps. Conn_index is based on a weighted Delaunay triangulation called the "connectivity matrix".

   For crisp clustering, the Davies-Bouldin index and the generalized Dunn index are among the most commonly used indices; they depend on a separation measure between clusters and a distance-based measure of cluster compactness. When the clusters have a homogeneous density distribution, one effective approach to correctly evaluate the clustering of data sets is CDbw (composite density between and within clusters) [16]. CDbw finds prototypes for clusters instead of representing the clusters by their centroids, and calculates the validity measure based on inter- and intra-cluster densities and cluster separation.

C. Compactness of Clusters: Assuming k clusters and N prototypes v in a data set, with C_k and C_l two different clusters where 1 ≤ k, l ≤ K, the proposed CONN_index is defined with the help of the Intra and Inter quantities, which are considered as compactness and separation. The compactness of C_k, Intra(C_k), is the ratio of the number of data vectors in C_k whose second BMU is also in C_k to the number of data vectors in C_k. Intra(C_k) is defined by

    Intra\_Conn(C_k) = \frac{\sum_{i,j=1}^{N} \{CADJ(i,j) : v_i, v_j \in C_k\}}{\sum_{i,j=1}^{N} \{CADJ(i,j) : v_i \in C_k\}}

and Intra(C_k) ∈ (0, 1]. The greater the value of Intra, the greater the cluster compactness [1]. If the second BMUs of all data vectors in C_k are also in C_k, then Intra(C_k) = 1. The intra-cluster connectivity of all clusters (Intra) is the average compactness, which is given by

    A = \frac{1}{K} \sum_{k=1}^{K} Intra\_Conn(C_k)
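   A small sketch of this compactness computation is given below, assuming CADJ is available as an N×N connectivity (cumulative adjacency) matrix whose entry CADJ[i, j] counts how many data vectors have prototype v_i as their BMU and v_j as their second BMU; the array and function names are illustrative assumptions.

import numpy as np

def intra_conn(cadj, cluster_of, k):
    """Average compactness A from the connectivity matrix CADJ.

    cadj       : (N, N) array, CADJ[i, j] = number of data vectors for which
                 prototype i is the BMU and prototype j the second BMU.
    cluster_of : length-N array giving the cluster label (0..k-1) of each prototype.
    """
    per_cluster = np.zeros(k)
    for c in range(k):
        in_c = (cluster_of == c)
        # Numerator: connectivity that stays entirely inside C_k.
        num = cadj[np.ix_(in_c, in_c)].sum()
        # Denominator: all connectivity that starts from a prototype of C_k.
        den = cadj[in_c, :].sum()
        per_cluster[c] = num / den if den > 0 else 0.0
    return per_cluster.mean(), per_cluster   # A and the individual Intra_Conn(C_k)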
D. Cluster Quality: Several cluster validity indices have been proposed to evaluate the quality of clusters obtained by different clustering algorithms. An excellent summary of various validity measures appears in [10]. Two classical cluster validity indices are described below.

1. Davies-Bouldin Index:
   This index [6] is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. The scatter within the ith cluster, denoted S_i, and the distance between cluster centers \vec{c}_i and \vec{c}_j, denoted d_{ij}, are defined as follows:

    S_{i,q} = \left( \frac{1}{|c_i|} \sum_{\vec{x} \in c_i} \| \vec{x} - \vec{c}_i \|^q \right)^{1/q}

    d_{ij,t} = \| \vec{c}_i - \vec{c}_j \|_t

where \vec{c}_i is the center of the ith cluster and |c_i| is the number of objects in c_i. The integers q and t can be selected independently such that q, t ≥ 1. The Davies-Bouldin index for a clustering scheme (CS) is then defined as

    DB(CS) = \frac{1}{k} \sum_{i=1}^{k} R_{i,qt}, \quad \text{where } R_{i,qt} = \max_{1 \le j \le k, j \ne i} \left\{ \frac{S_{i,q} + S_{j,q}}{d_{ij,t}} \right\}

The Davies-Bouldin index considers the average case of similarity between each cluster and the one that is most similar to it. A lower Davies-Bouldin index means a better clustering scheme.
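   As a concrete illustration, here is a small sketch computing DB(CS) with q = t = 2 from a labeled data set; the function and variable names are illustrative, not part of the paper.

import numpy as np

def davies_bouldin(X, labels, q=2, t=2):
    """Davies-Bouldin index for a clustering scheme; lower is better."""
    clusters = np.unique(labels)
    k = len(clusters)
    centers = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Within-cluster scatter S_{i,q}.
    S = np.array([
        np.mean(np.linalg.norm(X[labels == c] - centers[i], axis=1) ** q) ** (1.0 / q)
        for i, c in enumerate(clusters)
    ])
    db = 0.0
    for i in range(k):
        # R_{i,qt}: largest similarity of cluster i with any other cluster.
        ratios = [
            (S[i] + S[j]) / np.linalg.norm(centers[i] - centers[j], ord=t)
            for j in range(k) if j != i
        ]
        db += max(ratios)
    return db / k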


2. Dunn Index:
   Dunn proposed another cluster validity index [7]. The index corresponding to a clustering scheme (CS) is defined by

    D(CS) = \min_{1 \le i \le k} \left\{ \min_{1 \le j \le k, j \ne i} \left\{ \frac{\delta(\vec{c}_i, \vec{c}_j)}{\max_{1 \le q \le k} \Delta(\vec{c}_q)} \right\} \right\}

    \delta(\vec{c}_i, \vec{c}_j) = \| \vec{c}_i - \vec{c}_j \|

    \Delta(\vec{c}_i) = \max_{\vec{x}_i, \vec{x}_t \in c_i} \| \vec{x}_i - \vec{x}_t \|

If a data set is well separated by a clustering scheme, the distances among the clusters, δ(c_i, c_j) (1 ≤ i, j ≤ k), are usually large and the diameters of the clusters, Δ(c_i) (1 ≤ i ≤ k), are expected to be small. Therefore, a large value of D(CS) corresponds to a good clustering scheme.
The main drawback of the Dunn index is that its calculation is computationally expensive and the index is sensitive to noise.
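   A matching sketch for the Dunn index, under the centroid-distance reading of δ given above, is shown below; higher values indicate better-separated, more compact clusterings.

import numpy as np

def dunn_index(X, labels):
    """Dunn index D(CS): minimum centroid separation over maximum cluster diameter."""
    clusters = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Cluster diameters Delta(c_i): largest pairwise distance inside a cluster.
    diameters = []
    for c in clusters:
        pts = X[labels == c]
        diffs = pts[:, None, :] - pts[None, :, :]
        diameters.append(np.linalg.norm(diffs, axis=2).max())
    max_diameter = max(diameters)
    # delta(c_i, c_j): distance between cluster centers, minimized over all pairs.
    k = len(clusters)
    separations = [
        np.linalg.norm(centers[i] - centers[j])
        for i in range(k) for j in range(k) if i != j
    ]
    return min(separations) / max_diameter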
                                                               communication between the two parties and it is
                  III. DECISION TREE

A. Decision Tree

   A decision tree depicts rules for classifying data into groups. The first rule splits the entire data set into some number of pieces; another rule may then be applied to a piece, and different rules to different pieces, forming a second generation of pieces. The tree depicts the first split into pieces as branches emanating from a root, and subsequent splits as branches emanating from nodes on older branches. The leaves of the tree are the final groups, the unsplit nodes. For some perverse reason, trees are always drawn upside down, like an organizational chart. For a tree to be useful, the data in a leaf must be similar with respect to some target measure, so that the tree represents the segregation of a mixture of data into purified groups.

   Consider an example of data collected on people in a city park in the vicinity of a hotdog and ice cream stand. The owner of the concession stand wants to know what predisposes people to buy ice cream. Among all the people observed, forty percent buy ice cream. This is represented in the root node of the tree at the top of the diagram [9]. The first rule splits the data according to the weather. Unless it is sunny and hot, only five percent buy ice cream. This is represented in the leaf on the left branch. On sunny and hot days, sixty percent buy ice cream. The tree represents this population as an internal node that is further split into two branches, one of which is split again.

   [Figure 1: Example of a decision tree. Root: 40% buy ice cream; split on "sunny and hot" (No: 5% buy, Yes: 60% buy); the 60% node splits on "have extra money?" (No: 30% buy, Yes: 80% buy); the 30% node splits on "crave ice cream" (No: 10% buy, Yes: 70% buy).]
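   The following sketch shows how such a tree could be fit in practice using scikit-learn on a toy version of the ice-cream data; the feature names, the synthetic data, and the use of DecisionTreeClassifier are illustrative assumptions, not the data or tooling used in the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: columns are [sunny_and_hot, extra_money, craves_ice_cream].
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))
# Synthetic target roughly mirroring the example: buying is much more likely
# when it is sunny and hot, and more likely still with extra money or a craving.
p_buy = 0.05 + 0.45 * X[:, 0] + 0.2 * X[:, 0] * X[:, 1] + 0.1 * X[:, 0] * X[:, 2]
y = rng.random(200) < p_buy

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["sunny_and_hot", "extra_money", "craves_ice_cream"]))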
B. Yao's Model for Decision Tree:

   The model consists of two parties holding values x ∈ X and y ∈ Y respectively, who can communicate with each other and would like to compute a function f: X × Y → {0, 1} at (x, y) with a minimal amount of interaction between them. Interaction is some measure of communication between the two parties, usually the total number of bits exchanged between them. The classification of objects according to approximation operators in rough set theory can be easily fitted into the Bayesian decision-theoretic framework. Let Ω = {A, A^c} denote the set of states indicating that an object is in A and not in A, respectively. Let A = {a1, a2, a3} be the set of actions, where a1, a2, a3 represent the three actions in classifying an object: deciding POS(A), deciding NEG(A), and deciding BND(A), respectively.
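   A minimal sketch of the Bayesian three-way decision underlying the decision-theoretic rough set model: given the conditional probability P(A | x) and a table of loss values λ(action, state), the action with the smallest expected loss is chosen. The loss values and function names below are illustrative assumptions, not those of the paper.

# Actions a1, a2, a3: accept into POS(A), reject into NEG(A), defer into BND(A).
# losses[action][state] = lambda(action | state), with states "A" and "not A".
losses = {
    "POS": {"A": 0.0, "not A": 4.0},   # cost of accepting a non-member
    "NEG": {"A": 6.0, "not A": 0.0},   # cost of rejecting a member
    "BND": {"A": 1.0, "not A": 1.0},   # cost of deferring the decision
}

def three_way_decision(p_a, losses):
    """Choose the action with minimum expected loss given P(A | x) = p_a."""
    expected = {
        action: cost["A"] * p_a + cost["not A"] * (1.0 - p_a)
        for action, cost in losses.items()
    }
    return min(expected, key=expected.get), expected

action, risks = three_way_decision(0.55, losses)
print(action, risks)   # BND here, since the evidence is inconclusive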


C. Implementation of the CRISP-DM:

   CRISP-DM is based on the process flow shown in Figure 2. The model proposes the following steps:
  1. Business Understanding – to understand the rules and business objectives of the company.
  2. Understanding Data – to collect and describe data.
  3. Data Preparation – to prepare data for import into the software.
  4. Modelling – to select the modelling technique to be used.
  5. Evaluation – to evaluate the process to see if the technique solves the problem of modelling and the creation of rules.
  6. Deployment – to deploy the system and train its users.

   [Figure 2: Example of CRISP data mining.]

                  IV. PROPOSED SYSTEM

   An unsupervised classification method is used when the only data available are unlabeled, and it needs to know the number of clusters. A cluster validity measure can provide some information about the appropriate number of clusters. Our solution makes it possible to construct a cluster validity measure by considering various loss functions based on decision theory.
   [Figure 3: Proposed system [6] — a dataset feeds a cluster validity measure, which combines a decision tree with a loss function to produce the result.]

   We choose K-means clustering because (1) it is a data-driven method with relatively few assumptions on the distributions of the underlying data, and (2) the greedy search strategy of K-means guarantees at least a local minimum of the criterion function, thereby accelerating the convergence of clustering on large datasets.
A. Cluster Quality on Decision Theory:

   Unsupervised learning methods are the techniques we apply when the only data available are unlabeled, and the algorithms need to know the number of clusters. Cluster validity measures such as Davies-Bouldin can help us assess whether a clustering method accurately represents the structure of the data set, and there are several cluster indices to evaluate crisp and fuzzy clustering. The decision-theoretic framework has been helpful in providing a better understanding of the classification model. The decision-theoretic rough set model considers various classes of loss functions, and its extension to the multi-category case makes it possible to construct a cluster validity measure by considering various loss functions based on decision theory. Within a given set of objects there may be clusters such that objects in the same cluster are more similar than those in different clusters. Clustering is finding the right groups, or clusters, for the given set of objects. Finding the right clusters requires an exponential number of comparisons and has been proved to be NP-hard. To define the framework, we assume a partition of a set of objects X = {x1, ..., xn} into clusters CS = {c1, ..., ck}; the k-means algorithm approximates the actual clustering. It is possible that each object may not necessarily belong to only one cluster. However, corresponding to each cluster within the clustering scheme there will be a hypothetical core, and the centroid of that core will be used as the cluster core. Let core(ci) be the core of the cluster ci, which is used to calculate the centroid of the cluster. Any x ∈ core(ci) cannot belong to other clusters. Therefore, core(ci) can be considered the best representation of ci to a certain extent.
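   To illustrate how such an index could be assembled, the sketch below scores a crisp clustering by the total Bayesian risk of the assignment decisions, reusing the loss-table idea from Section III; treating membership probability as proportional to inverse centroid distance and the specific loss values are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def decision_theoretic_quality(X, centroids, lam_accept_wrong=4.0, lam_reject_right=6.0):
    """Total risk of assigning each object to its nearest cluster (lower is better).

    Membership probabilities are approximated from inverse distances to the
    centroids; the loss values are illustrative.
    """
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    inv = 1.0 / (dists + 1e-12)
    probs = inv / inv.sum(axis=1, keepdims=True)   # soft membership per cluster
    total_risk = 0.0
    for p in probs:
        i = p.argmax()                             # crisp assignment: nearest cluster
        # Risk of accepting the object into cluster i (it may not belong there)
        # plus the risk of rejecting it from all the other clusters.
        total_risk += lam_accept_wrong * (1.0 - p[i])
        total_risk += lam_reject_right * p[np.arange(len(p)) != i].sum()
    return total_risk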
                                                                 known number of clusters and boundary region
B. Comparison of Clustering and Classification:

   Clustering works well for finding unlabeled clusters in small to large sets of data points. The K-means algorithm has a favorable execution time, but the user has to know in advance how many clusters are to be searched; being data driven, k-means is efficient for smaller data sets and for anomaly detection. Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster. Clustering requires the distance between every pair of objects only once and uses the distances at every stage of iteration. Compared to clustering [8], classification algorithms perform efficiently for complex datasets and for noise and outlier detection; algorithm designers have had much success with equal-width and equal-depth discretization approaches to building class descriptions. We chose decision tree learners, made popular by ID3, C4.5 and CART, because they are relatively fast and typically produce competitive classifiers. In fact, the decision tree generator C4.5, a successor to ID3, has become a standard factor for comparison in machine learning research, because it produces good classifiers quickly. For non-numeric datasets, the growth of the run time of ID3 (and C4.5) is linear in the number of examples.

   The practical run time complexity of C4.5 has been determined empirically to be worse than O(e^2) on some datasets. One possible explanation is based on the observation of Oates and Jensen (1998) that the size of C4.5 trees increases linearly with the number of examples. One of the factors in C4.5's run-time complexity corresponds to the tree depth, which cannot be larger than the number of attributes. Tree depth is related to tree size, and thereby to the number of examples. When compared with C4.5, the run time complexity of CART is satisfactory.

                    V. CONCLUSION

   A cluster quality index based on decision theory is proposed; the proposal uses a loss function to construct the quality index. Therefore, the cluster quality is evaluated by considering the total risk of classifying all the objects. Such a decision-theoretic representation of cluster quality may be more useful in business-oriented data mining than traditional geometry-based cluster quality measures. In addition to evaluating crisp clustering, the proposal is an evaluation measure for rough clustering. This is the first measure that takes into account special features of rough clustering that allow an object to belong to more than one cluster. The measure is shown to be useful in determining important aspects of a clustering exercise, such as the appropriate number of clusters and the size of the boundary region. The application of the measure to synthetic data with a known number of clusters and boundary region provides credence to the proposal.

   A real advantage of the decision-theoretic cluster validity measure is its ability to include monetary considerations in evaluating a clustering scheme.
Use of the measure to derive an appropriate clustering scheme for a promotional campaign in a retail store highlighted its unique ability to include cost and benefit considerations in commercial data mining. We can also extend it to evaluating other clustering algorithms such as fuzzy clustering. Such a cluster validity measure can be useful in further theoretical development in clustering.

                    VI. REFERENCES



[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, 1981.
[2] D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, Apr. 1979.
[3] J.C. Dunn, "Well Separated Clusters and Optimal Fuzzy Partitions," J. Cybernetics, vol. 4, pp. 95-104, 1974.
[4] S. Hirano and S. Tsumoto, "On Constructing Clusters from Non-Euclidean Dissimilarity Matrix by Using Rough Clustering," Proc. Japanese Soc. for Artificial Intelligence (JSAI) Workshops, pp. 5-16, 2005.
[5] T.B. Ho and N.B. Nguyen, "Nonhierarchical Document Clustering by a Tolerance Rough Set Model," Int'l J. Intelligent Systems, vol. 17, no. 2, pp. 199-212, 2002.
[6] P. Lingras, M. Chen, and D. Miao, "Rough Cluster Quality Index Based on Decision Theory," IEEE Trans. Knowledge and Data Engineering, vol. 21, no. 7, July 2009.
[7] W. Pedrycz and J. Waletzky, "Fuzzy Clustering with Partial Supervision," IEEE Trans. Systems, Man, and Cybernetics, vol. 27, no. 5, pp. 787-795, Sept. 1997.
[8] "Partition Algorithms – A Study and Emergence of Mining Projected Clusters in High-Dimensional Dataset," Int'l J. Computer Science and Telecommunications, vol. 2, issue 4, July 2011.
[9] D.D. Jensen and P.R. Cohen, "Multiple Comparisons in Induction Algorithms," Machine Learning, 1999 (to appear). An excellent discussion of the bias inherent in selecting an input; see http://www.cs.umass.edu/~jensen/papers.




"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE

I. INTRODUCTION

Clustering is an unsupervised learning technique in data mining that groups unlabeled objects into clusters so that objects in the same cluster are more similar to one another than to objects in other clusters. Conventional clustering assigns an object to exactly one cluster; a rough-set-based variation makes it possible to assign an object to more than one cluster [3]. The quality of clustering is an important issue when clustering techniques are applied to real-world data. A good measure of cluster quality helps in choosing the various parameters used by clustering algorithms, and one such parameter, common to most algorithms, is the number of clusters. Many different indices of cluster validity have been proposed, and in general they fall into one of three categories; some of them measure partition validity to evaluate the properties of the crisp structure imposed on the data by the clustering algorithm, such as the Dunn index [7] and the Davies-Bouldin index [2].

Based on decision theory, the proposed rough cluster validity index is taken as a function of the total risk of grouping objects using a clustering algorithm. Since crisp clustering [5] is a special case of rough clustering, the validity index is applicable to both rough and crisp clustering. Experiments with synthetic and real-world data show the usefulness of the proposed validity index for the evaluation of rough clustering and crisp clustering.
II. CLUSTERING TECHNIQUE

K-means [7] is a prototype-based, simple partitional clustering technique that attempts to find k non-overlapping clusters. The clusters are represented by their centroids (a cluster centroid is typically the mean of the points in the cluster). The clustering process of K-means is as follows. First, k initial centroids are selected, where k is specified by the user and indicates the desired number of clusters. Second, every point in the data is assigned to the closest centroid, and each collection of points assigned to a centroid forms a cluster. The centroid of each cluster is then updated based on the points assigned to it. This process is repeated until no point changes clusters.

A. Clustering Crisp Method: The objective of K-means is to assign n objects to k clusters. The process begins by randomly choosing k objects as the centroids of the k clusters. Each object is assigned to one of the k clusters based on the minimum value of the distance d(x_l, c_i) between the object vector x_l and the cluster vector c_i; the distance d(x_l, c_i) can be the standard Euclidean distance. After all the objects have been assigned to clusters, the new centroid vectors of the clusters are calculated as

c_i = \frac{1}{|c_i|} \sum_{x \in c_i} x, \quad 1 \le i \le k,

where |c_i| is the cardinality of cluster c_i. The process stops when the centroids of the clusters stabilize, i.e., the centroid vectors from the previous iteration are identical to those generated in the current iteration.
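To make the procedure concrete, here is a minimal NumPy sketch of the K-means loop just described. It is an illustration rather than the authors' implementation; the function name kmeans, the random initialization and the synthetic two-blob data set are choices made only for this example.

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means: X is an (n, d) array, k the desired number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k objects at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every object to the closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of the objects assigned to it
        # (keeping the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Stop when the centroids no longer change between iterations.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on a small synthetic data set (illustrative only).
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)

The stopping test mirrors the description above: iteration ends once the centroids produced in successive iterations coincide.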
B. Cluster Validity: A new validity index, conn_index, for prototype-based clustering is applicable to data sets with a wide variety of cluster characteristics: clusters of different shapes, sizes, densities and even overlaps. Conn_index is based on a weighted Delaunay triangulation called the "connectivity matrix". For crisp clustering, the Davies-Bouldin index and the generalized Dunn index are among the most commonly used indices; they depend on a separation measure between clusters and a distance-based measure of cluster compactness. When the clusters have a homogeneous density distribution, one effective approach to correctly evaluating the clustering of a data set is CDbw (composite density between and within clusters) [16]. CDbw finds prototypes for the clusters instead of representing them by their centroids, and calculates the validity measure based on inter- and intra-cluster densities and cluster separation.

C. Compactness of Clusters: Assume K clusters and N prototypes v in a data set, and let C_k and C_l be two different clusters, where 1 \le k, l \le K. The proposed conn_index is defined with the help of the Intra and Inter quantities, which measure compactness and separation. The compactness of C_k, Intra(C_k), is the ratio of the number of data vectors in C_k whose second BMU (best matching unit) is also in C_k to the number of data vectors in C_k:

\mathrm{Intra\_Conn}(C_k) = \frac{\sum_{i,j}^{N} \{ CADJ(i,j) : v_i, v_j \in C_k \}}{\sum_{i,j}^{N} \{ CADJ(i,j) : v_i \in C_k \}},

with Intra(C_k) \in (0, 1]. The greater the value of Intra, the greater the cluster compactness [1]. If the second BMUs of all data vectors in C_k are also in C_k, then Intra(C_k) = 1. The intra-cluster connectivity of all clusters, Intra, is the average compactness:

\mathrm{Intra} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Intra\_Conn}(C_k).

D. Cluster Quality: Several cluster validity indices exist to evaluate the quality of clusters obtained by different clustering algorithms; an excellent summary of the various validity measures is given in [10]. Two classical cluster validity indices, and one used for fuzzy clusters, are described below.

1. Davies-Bouldin Index: This index [6] is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. The scatter within the ith cluster, S_{i,q}, and the distance between clusters c_i and c_j, d_{ij,t}, are defined as

S_{i,q} = \left( \frac{1}{|c_i|} \sum_{x \in c_i} \lVert x - c_i \rVert^q \right)^{1/q}, \qquad d_{ij,t} = \lVert c_i - c_j \rVert_t,

where c_i is the center of the ith cluster and |c_i| is the number of objects in c_i. The integers q and t can be selected independently such that q, t \ge 1. The Davies-Bouldin index for a clustering scheme (CS) is then defined as

DB(CS) = \frac{1}{k} \sum_{i=1}^{k} R_{i,qt}, \qquad R_{i,qt} = \max_{1 \le j \le k,\ j \ne i} \frac{S_{i,q} + S_{j,q}}{d_{ij,t}}.

The Davies-Bouldin index considers the average similarity between each cluster and the cluster most similar to it; a lower Davies-Bouldin index means a better clustering scheme.

2. Dunn Index: Dunn proposed another cluster validity index [7]. The index corresponding to a clustering scheme (CS) is defined by

D(CS) = \min_{1 \le i \le k} \left\{ \min_{1 \le j \le k,\ j \ne i} \left\{ \frac{\delta(c_i, c_j)}{\max_{1 \le q \le k} \Delta(c_q)} \right\} \right\},

where \delta(c_i, c_j) = \lVert c_i - c_j \rVert is the distance between clusters c_i and c_j (1 \le i, j \le k, i \ne j) and \Delta(c_i) = \max_{x, x' \in c_i} \lVert x - x' \rVert is the diameter of cluster c_i. If a data set is well separated by a clustering scheme, the distances among the clusters, \delta(c_i, c_j), are usually large and the diameters of the clusters, \Delta(c_i), are expected to be small; therefore, a large value of D(CS) corresponds to a good clustering scheme. The main drawback of the Dunn index is that its calculation is computationally expensive and the index is sensitive to noise.
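The two indices above can be computed directly from a labelled data set. The sketch below is an illustrative implementation, not the code used in the experiments: it fixes q = t = 2 for the Davies-Bouldin index and, following the definitions given here, uses the distance between cluster centers for delta and the largest within-cluster pairwise distance for Delta in the Dunn index.

import numpy as np
from itertools import combinations

def davies_bouldin(X, labels, q=2, t=2):
    """Davies-Bouldin index for a crisp clustering (lower is better). Assumes k >= 2."""
    ks = np.unique(labels)
    cents = np.array([X[labels == i].mean(axis=0) for i in ks])
    # S_{i,q}: within-cluster scatter of cluster i.
    S = np.array([
        np.mean(np.linalg.norm(X[labels == i] - cents[n], axis=1) ** q) ** (1.0 / q)
        for n, i in enumerate(ks)
    ])
    k = len(ks)
    R = np.zeros(k)
    for i in range(k):
        # d_{ij,t}: Minkowski distance of order t between the centers of clusters i and j.
        ratios = [(S[i] + S[j]) / np.linalg.norm(cents[i] - cents[j], ord=t)
                  for j in range(k) if j != i]
        R[i] = max(ratios)
    return R.mean()

def dunn(X, labels):
    """Dunn index for a crisp clustering (higher is better)."""
    ks = np.unique(labels)
    cents = np.array([X[labels == i].mean(axis=0) for i in ks])
    # max_q Delta(c_q): the largest within-cluster pairwise distance over all clusters.
    diam = max(np.linalg.norm(a - b)
               for i in ks for a, b in combinations(X[labels == i], 2))
    # min delta(c_i, c_j): the smallest distance between two cluster centers.
    sep = min(np.linalg.norm(cents[i] - cents[j])
              for i in range(len(ks)) for j in range(len(ks)) if i != j)
    return sep / diam

A lower davies_bouldin value and a higher dunn value both indicate a better clustering scheme, matching the interpretation given above.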
III. DECISION TREE

A. Decision Tree: A decision tree depicts rules for classifying data into groups. A first rule splits the entire data set into some number of pieces; another rule may then be applied to a piece, with different rules applied to different pieces, forming a second generation of pieces. The tree depicts the first split as branches emanating from a root and subsequent splits as branches emanating from nodes on older branches. The leaves of the tree are the final groups, the unsplit nodes. Trees are conventionally drawn upside down, like an organizational chart. For a tree to be useful, the data in a leaf must be similar with respect to some target measure, so that the tree represents the segregation of a mixture of data into purified groups.

Consider an example of data collected on people in a city park in the vicinity of a hot dog and ice cream stand. The owner of the concession stand wants to know what predisposes people to buy ice cream. Among all the people observed, forty percent buy ice cream; this is represented in the root node of the tree at the top of the diagram [9]. The first rule splits the data according to the weather: unless it is sunny and hot, only five percent buy ice cream, which is represented in the leaf on the left branch. On sunny and hot days, sixty percent buy ice cream. The tree represents this population as an internal node that is further split into two branches, one of which is split again. An illustrative encoding of this tree is given after the figure.

[Figure 1: Example of a decision tree. Root: 40% buy ice cream. Split on "sunny and hot": No leads to a leaf with 5% buying; Yes leads to a node with 60% buying, which is split on "have extra money?": Yes leads to 80% buying, No leads to a node with 30% buying, which is split on "crave ice cream": Yes leads to 70% buying, No leads to 10% buying.]
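The sketch below encodes the ice cream tree of Figure 1 as plain nested rules. The attribute names (sunny_and_hot, has_extra_money, craves_ice_cream) and the function name buy_rate are hypothetical and chosen only for this illustration, while the buying rates come from the example above.

def buy_rate(sunny_and_hot: bool, has_extra_money: bool, craves_ice_cream: bool) -> float:
    """Return the predicted proportion of visitors who buy ice cream.

    Encodes the splits of the example tree: weather first, then
    disposable money, then craving (rates taken from Figure 1).
    """
    if not sunny_and_hot:
        return 0.05                    # left leaf: only 5% buy
    if has_extra_money:
        return 0.80                    # sunny and hot with extra money: 80% buy
    # sunny and hot but no extra money: split on craving
    return 0.70 if craves_ice_cream else 0.10

# A visitor on a sunny, hot day with no spare cash but a craving:
print(buy_rate(True, False, True))     # 0.7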
B. Yao's Model for Decision Tree: The model consists of two parties holding values x \in X and y \in Y, respectively, who can communicate with each other and would like to compute a function f: X \times Y \to \{0, 1\} at (x, y) with a minimal amount of interaction between them. Interaction is some measure of communication between the two parties, usually the total number of bits exchanged. The classification of objects according to the approximation operators of rough set theory can easily be fitted into the Bayesian decision-theoretic framework. Let \Omega = \{A, A^c\} denote the set of states, indicating that an object is in A or not in A, respectively. Let \mathcal{A} = \{a_1, a_2, a_3\} be the set of actions, where a_1, a_2 and a_3 represent the three actions in classifying an object: deciding POS(A), deciding NEG(A), and deciding BND(A), respectively.

C. Implementation of CRISP-DM: CRISP-DM is based on the process flow shown in Figure 2. The model proposes the following steps:

1. Business Understanding – understand the rules and business objectives of the company.
2. Data Understanding – collect and describe the data.
3. Data Preparation – prepare the data for import into the software.
4. Modelling – select the modelling technique to be used.
5. Evaluation – evaluate the process to see whether the technique solves the modelling problem and supports the creation of rules.
6. Deployment – deploy the system and train its users.

[Figure 2: Example of the CRISP-DM data mining process flow.]

IV. PROPOSED SYSTEM

An unsupervised classification method is used when the only data available are unlabeled, and such methods need to know the number of clusters. A cluster validity measure can provide information about the appropriate number of clusters. Our solution makes it possible to construct a cluster validity measure by considering various loss functions based on decision theory.
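To illustrate how loss functions drive the three-way decisions of Section III-B, the sketch below computes the expected loss of deciding POS(A), NEG(A) or BND(A) for an object, given P(A | x). The numeric values in LOSS are invented for the example; only the structure (one loss per action and state, and a minimum-risk choice) reflects the decision-theoretic rough set model.

# Loss of taking each action when the object is in A ('A') or not in A ('notA').
# The concrete numbers below are illustrative only.
LOSS = {
    "POS": {"A": 0.0, "notA": 4.0},   # accept the object into A
    "NEG": {"A": 6.0, "notA": 0.0},   # reject the object from A
    "BND": {"A": 1.0, "notA": 1.0},   # defer: leave the object in the boundary
}

def best_action(p_a: float):
    """Return the action with minimum expected loss given P(A | x) = p_a."""
    risks = {
        action: costs["A"] * p_a + costs["notA"] * (1.0 - p_a)
        for action, costs in LOSS.items()
    }
    action = min(risks, key=risks.get)
    return action, risks[action]

for p in (0.9, 0.5, 0.1):
    print(p, best_action(p))

With these particular losses, high probabilities lead to POS(A), low probabilities to NEG(A), and intermediate probabilities to the boundary region BND(A).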
[Figure 3: The proposed system ([6]): a dataset feeds the cluster validity measure, a decision tree supplies the loss functions, and together they produce the result.]

We choose K-means clustering because (1) it is a data-driven method that makes relatively few assumptions about the distribution of the underlying data, and (2) the greedy search strategy of K-means guarantees at least a local minimum of the criterion function, thereby accelerating the convergence of the clusters on large datasets.

A. Cluster Quality on Decision Theory: Unsupervised learning methods are applied when the only data available are unlabeled, and the algorithms need to know the number of clusters. Cluster validity measures such as Davies-Bouldin can help us assess whether a clustering method accurately represents the structure of the data set, and several cluster indices exist to evaluate crisp and fuzzy clustering. The decision-theoretic framework has been helpful in providing a better understanding of classification models. The decision-theoretic rough set model considers various classes of loss functions, and its extension to multiple categories makes it possible to construct a cluster validity measure by considering various loss functions based on decision theory. Within a given set of objects there may be clusters such that objects in the same cluster are more similar than those in different clusters; clustering is the task of finding the right groups, or clusters, for the given set of objects. Finding the right clusters requires an exponential number of comparisons and has been proved to be NP-hard. To define the framework, we assume that a clustering scheme partitions a set of objects X = {x_1, ..., x_n} into clusters CS = {c_1, ..., c_k}; the K-means algorithm approximates the actual clustering. It is possible that an object does not necessarily belong to only one cluster. However, corresponding to each cluster within the clustering scheme, the centroid of a hypothetical core is used. Let core(c_i) be the core of the cluster c_i, which is used to calculate the centroid of the cluster. Any x \in core(c_i) cannot belong to other clusters; therefore, core(c_i) can be considered the best representation of c_i to a certain extent.
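One simplified reading of the idea above is to add up, over all objects, the minimum expected loss of the three-way decision made for the cluster they are assigned to. The sketch below does exactly that, reusing kmeans, LOSS and best_action from the earlier snippets; approximating P(A | x) by the relative closeness of an object to its own centroid is an assumption made only for this sketch and is not how the index in [6] is defined.

import numpy as np

def total_risk(X, labels, centroids):
    """Total risk of a clustering scheme: the sum, over all objects, of the
    minimum expected loss of the three-way decision for the assigned cluster.
    Lower total risk indicates a better clustering scheme under these losses.
    """
    risk = 0.0
    for x, c in zip(X, labels):
        d = np.linalg.norm(centroids - x, axis=1)
        closeness = 1.0 / (d + 1e-12)
        p_a = closeness[c] / closeness.sum()   # crude membership estimate (assumption)
        _, r = best_action(p_a)                # reuse the three-way decision sketch
        risk += r
    return risk

# Compare candidate numbers of clusters on the same data (illustrative):
# for k in (2, 3, 4):
#     labels, cents = kmeans(X, k)
#     print(k, total_risk(X, labels, cents))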
B. Comparison of Clustering and Classification: Clustering works well for finding unlabeled clusters in small to large sets of data points. An advantage of the K-means algorithm is its favorable execution time, but the user has to know in advance how many clusters are to be searched for; K-means is data driven and is efficient for smaller data sets and for anomaly detection. Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster (see the sketch below). Clustering requires the distance between every pair of objects only once and uses the distances at every stage of the iteration. Compared with clustering [8], classification algorithms perform efficiently on complex datasets and for noise and outlier detection; algorithm designers have had much success with equal-width and equal-depth approaches to building class descriptions. We consider the decision tree learners made popular by ID3, C4.5 and CART, because they are relatively fast and typically produce competitive classifiers. In fact, the decision tree generator C4.5, a successor to ID3, has become a standard basis for comparison in machine learning research because it produces good classifiers quickly. For non-numeric datasets, the growth of the run time of ID3 (and C4.5) is linear in the number of examples. The practical run-time complexity of C4.5 has been determined empirically to be worse than O(e^2) on some datasets. One possible explanation is based on the observation of Oates and Jensen (1998) that the size of C4.5 trees increases linearly with the number of examples. One of the factors in C4.5's run-time complexity corresponds to the tree depth, which cannot be larger than the number of attributes; tree depth is related to tree size, and thereby to the number of examples. When compared with C4.5, the run-time complexity of CART is satisfactory.
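Where the comparison above mentions using a medoid rather than the mean, the following short sketch selects, for each cluster, the member whose total distance to the other members is smallest. It is an illustration only, not the partition algorithm of [8].

import numpy as np

def medoids(X, labels):
    """Return one medoid per cluster: the most centrally located member."""
    result = {}
    for c in np.unique(labels):
        members = X[labels == c]
        # Pairwise distances inside the cluster; the medoid minimizes the row sum.
        d = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
        result[c] = members[d.sum(axis=1).argmin()]
    return result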
V. CONCLUSION

A cluster quality index based on decision theory is proposed. The proposal uses loss functions to construct the quality index, so cluster quality is evaluated by considering the total risk of classifying all the objects. Such a decision-theoretic representation of cluster quality may be more useful in business-oriented data mining than traditional geometry-based cluster quality measures. In addition to evaluating crisp clustering, the proposal is an evaluation measure for rough clustering; it is the first measure that takes into account the special features of rough clustering that allow an object to belong to more than one cluster. The measure is shown to be useful in determining important aspects of a clustering exercise, such as the appropriate number of clusters and the size of the boundary region, and its application to synthetic data with a known number of clusters and boundary region lends credence to the proposal.

A real advantage of the decision-theoretic cluster validity measure is its ability to include monetary considerations in evaluating a clustering scheme. Use of the measure to derive an appropriate clustering scheme for a promotional campaign in a retail store highlighted its unique ability to include cost and benefit considerations in commercial data mining. The measure can also be extended to evaluate other clustering algorithms, such as fuzzy clustering, and such a cluster validity measure can be useful in further theoretical development in clustering.

VI. REFERENCES

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, 1981.
[2] D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, Apr. 1979.
[3] J.C. Dunn, "Well Separated Clusters and Optimal Fuzzy Partitions," J. Cybernetics, vol. 4, pp. 95-104, 1974.
[4] S. Hirano and S. Tsumoto, "On Constructing Clusters from Non-Euclidean Dissimilarity Matrix by Using Rough Clustering," Proc. Japanese Soc. for Artificial Intelligence (JSAI) Workshops, pp. 5-16, 2005.
[5] T.B. Ho and N.B. Nguyen, "Nonhierarchical Document Clustering by a Tolerance Rough Set Model," Int'l J. Intelligent Systems, vol. 17, no. 2, pp. 199-212, 2002.
[6] P. Lingras, M. Chen, and D. Miao, "Rough Cluster Quality Index Based on Decision Theory," IEEE Trans. Knowledge and Data Engineering, vol. 21, no. 7, July 2009.
[7] W. Pedrycz and J. Waletzky, "Fuzzy Clustering with Partial Supervision," IEEE Trans. Systems, Man, and Cybernetics, vol. 27, no. 5, pp. 787-795, Sept. 1997.
[8] "Partition Algorithms – A Study and Emergence of Mining Projected Clusters in High-Dimensional Dataset," Int'l J. Computer Science and Telecommunications, vol. 2, issue 4, July 2011.
[9] D.D. Jensen and P.R. Cohen, "Multiple Comparisons in Induction Algorithms," Machine Learning, 1999 (to appear). http://www.cs.umass.edu/~jensen/papers