4th International Summer School
Achievements and Applications of Contemporary
Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009



                                 Clustering Theory

                   Data Mining for Quality Improvement
                      with Nonsmooth Optimization
                          vs. PAM and k-Means

                Gerhard-Wilhelm Weber * and Başak Akteke-Öztürk
                                  Institute of Applied Mathematics
                          Middle East Technical University, Ankara, Turkey
               * Faculty of Economics, Management and Law, University of Siegen, Germany
                Center for Research on Optimization and Control, University of Aveiro, Portugal
Outline

 • Quality Analysis

 • Data Mining for Quality Analysis

 • Clustering Methods

 • Results and Comparison

 • Decision Tree Analysis of A Cluster

 • Conclusion
Quality Analysis


• Quality is an essential requirement of
    – products,
    – processes, and
    – services.


• This study is part of a project whose main focus is quality analysis:
  the relationship between inputs and outputs.

• Modern quality analysis takes advantage of data mining tools.
Data Mining for Quality Analysis


 Data mining tools such as
     –   decision trees (e.g. classification and regression trees (CART)),
     –   neural networks (NN),
     –   self-organizing maps (SOM),
     –   support vector machines (SVM),
 are highly preferred for modeling and producing rules for the output.

 Such tools have not yet been applied widely enough for
 industry practitioners to prefer and use them for their
 quality analysis needs.
Aim of Our Data Mining Studies

 • to identify the data mining approaches that can
   effectively improve product and process quality in industrial
   organizations:
     – classification / prediction,
     – clustering and
     – association analysis,


 • to develop new data mining software and improve the
   existing ones for quality analysis.

 • Initial study:  To identify the most influential variables that
   cause defects in the items produced by a casting company
   located in Turkey.
Our Data Set


  • Our data set:     92 objects (rows),
                      35 process variables (columns).

  • It belongs to a particular product that has a high percentage
    of defectives; the data were collected during the first
    five-month production period of 2006.

  • Missing values:    filled with the averages of the columns
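
The mean filling of missing values mentioned above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name, the use of `None` for a missing entry, and the toy data are all assumptions.

```python
from statistics import mean

def fill_missing_with_column_means(rows):
    """Replace None entries by the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: every column has at least one observed value.
        col_means.append(mean(observed))
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]

# Hypothetical toy data (3 objects, 2 variables), not the casting data set.
toy_rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled_rows = fill_missing_with_column_means(toy_rows)
print(filled_rows)
```

Mean filling keeps every object in the data set, which matters when there are only 92 rows, but it also pulls the filled entries toward the center of each variable.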
Clustering - 2 Algorithms (Model Free)

 Minimal distance procedure:
     choose a random start partition
     --> compute centroids
     --> create minimal distance partition
     --> end partition
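
The minimal distance procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the study: it assumes squared Euclidean distance, repeats the centroid and partition steps until the partition stops changing, and lets an emptied cluster keep its previous centroid. All names and the toy data are hypothetical.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]

def minimal_distance_procedure(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random start partition; the first k shuffled objects seed one
    # cluster each, so no cluster starts empty (requires len(points) >= k).
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for pos, idx in enumerate(order):
        labels[idx] = pos if pos < k else rng.randrange(k)
    centers = []
    for _ in range(max_iter):
        # Compute centroids of the current partition.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an emptied cluster keeps its previous centroid.
            new_centers.append(centroid(members) if members else centers[c])
        centers = new_centers
        # Create the minimal distance partition.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # end partition reached
            break
        labels = new_labels
    return labels, centers

# Hypothetical toy data: two well-separated groups of three objects.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centers = minimal_distance_procedure(toy_points, k=2)
print(toy_labels)
```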
Clustering - 2 Algorithms (Model Free)

 Exchange procedure:
     choose a random start partition
     --> test an object in all clusters
     --> update the centroids
     --> end partition
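
The exchange procedure can be sketched in the same spirit. The version below is an assumption about how the steps fit together, not the authors' code: each object is tried in every other cluster, it is moved when the total within-cluster sum of squares strictly decreases, and the loop ends when no single move helps. The centroids are recomputed from scratch for every trial, which is simple but only suitable for small data sets. All names and the toy data are hypothetical.

```python
import random

def sum_of_squares(points, labels, k):
    """Total squared Euclidean distance of every object to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        center = [sum(x) / len(members) for x in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, center)) for p in members
        )
    return total

def exchange_procedure(points, k, seed=0):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random start partition
    best = sum_of_squares(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            best_cluster = labels[i]
            # Test object i in all other clusters.
            for c in range(k):
                if c == best_cluster:
                    continue
                labels[i] = c
                trial = sum_of_squares(points, labels, k)  # centroids updated here
                if trial < best - 1e-12:
                    best, best_cluster, improved = trial, c, True
            labels[i] = best_cluster
    return labels, best  # end partition and its objective value

# Hypothetical toy data: two well-separated groups of three objects.
exchange_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
exchange_labels, exchange_value = exchange_procedure(exchange_points, k=2)
print(exchange_labels, exchange_value)
```

Because the objective strictly decreases with every accepted move and there are finitely many partitions, the loop always terminates.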
Our Clustering

• The data set was scaled to the interval [0,1] before the clustering analysis:

      \[ x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} . \]

• We used k-means, PAM (Partitioning Around Medoids) and
  a modified k-means based on Nonsmooth Analysis:

        • to understand the data set by examining the groups in the data,
        • to find the outliers of the data set,
        • because our data set was not big.

• These methods use the Euclidean metric by default.
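
The scaling to [0,1] described on this slide can be sketched as follows. The sketch assumes the formula is applied to each variable (column) separately, and it maps a constant column to 0.0 to avoid dividing by zero. Both choices, like the function name and toy data, are assumptions rather than details given in the slides.

```python
def min_max_scale(rows):
    """Scale every column of a row-major table to the interval [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # Assumption: a constant column (max == min) is mapped to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

# Hypothetical toy data (3 objects, 2 variables).
raw_rows = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled_rows = min_max_scale(raw_rows)
print(scaled_rows)
```

Scaling matters here because the Euclidean metric would otherwise be dominated by the variables with the largest numeric ranges.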
About the Methods


• PAM is more robust than k-means
  in the presence of noise and outliers.

• PAM minimizes a sum of dissimilarities
  instead of a sum of squared Euclidean distances.

• Medoids are less influenced by the presence of noise and outliers.

• A medoid can be defined as the object of a cluster whose
  average distance (dissimilarity) to all the objects in the cluster
  is minimal.
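
The medoid definition above translates directly into code. The sketch below assumes Euclidean distance as the dissimilarity; the function names and the toy cluster, which contains one outlier, are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def medoid(cluster, dissimilarity=euclidean):
    """Return the object with minimal average dissimilarity to all objects."""
    def average_dissimilarity(candidate):
        return sum(dissimilarity(candidate, other) for other in cluster) / len(cluster)
    return min(cluster, key=average_dissimilarity)

# Hypothetical toy cluster with one outlier at x = 10.
toy_cluster = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]]
cluster_medoid = medoid(toy_cluster)
cluster_mean = [sum(x) / len(toy_cluster) for x in zip(*toy_cluster)]
print(cluster_medoid, cluster_mean)
```

In this toy cluster the medoid stays at x = 2 while the mean is pulled to x = 3.2 by the outlier, which illustrates why medoids are less influenced by noise and outliers.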
Nonsmooth Analysis


 • k-means takes as input:
   the number of clusters and initial cluster centers.

 • This problem can be reduced to a nonsmooth optimization problem
   --> initial problem for the modified k-means.

     – global optimization techniques,
     – nonsmooth optimization algorithms and
     – derivative-free optimization for the modified k-means algorithm.


 • The minimum sum of squares problem -->
   nonsmooth and nonconvex optimization problem.
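
As a sketch of why the problem is nonsmooth, one common way to write the minimum sum of squares objective is shown below. The notation is an assumption and is not taken from the slides: a^1, ..., a^m in R^n are the data points and x^1, ..., x^k are the cluster centers.

```latex
\min_{x^1,\dots,x^k \in \mathbb{R}^n} \; f_k(x^1,\dots,x^k)
  = \frac{1}{m} \sum_{i=1}^{m} \min_{j=1,\dots,k} \lVert x^j - a^i \rVert^2
```

The inner minimum over j is what makes f_k nonsmooth, and for k > 1 the function is also nonconvex.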
k-Means Results


       k     cluster pair                                       proximity

       k=2   cluster_1 (70 objects) – cluster_2 (22 objects)   1.113769

       k=3   cluster_1 (68 objects) – cluster_2 (22 objects)   1.111567
             cluster_1 (68 objects) – cluster_3 (2 objects)    1.593595
             cluster_2 (22 objects) – cluster_3 (2 objects)    1.968277

       k=4   cluster_1 (68 objects) – cluster_2 (6 objects)    1.44533
             cluster_1 (68 objects) – cluster_3 (2 objects)    1.593595
             cluster_1 (68 objects) – cluster_4 (16 objects)   1.104353
             cluster_2 (6 objects) – cluster_3 (2 objects)     2.197992
             cluster_2 (6 objects) – cluster_4 (16 objects)    1.055844
             cluster_3 (2 objects) – cluster_4 (16 objects)    1.95292
k-Means Results


     • Best result is for k=2.

     • The proximities of clusters for k=3 and k=4 are higher.

     • But the results for k=3 and k=4 are artificial:
       one of the clusters contains only 2 objects.

     • These objects are outliers.
PAM Results


    k            cluster pair                                       proximity

    2 clusters   cluster_1 (40 objects) – cluster_2 (52 objects)   1.2838

    3 clusters   cluster_1 (33 objects) – cluster_2 (34 objects)   1.2838
                 cluster_1 (33 objects) – cluster_3 (25 objects)   1.2729
                 cluster_2 (34 objects) – cluster_3 (25 objects)   1.1242

    4 clusters   cluster_1 (20 objects) – cluster_2 (34 objects)   1.2838
                 cluster_1 (20 objects) – cluster_3 (25 objects)   1.2729
                 cluster_1 (20 objects) – cluster_4 (13 objects)   1.1374
                 cluster_2 (34 objects) – cluster_3 (25 objects)   1.1242
                 cluster_2 (34 objects) – cluster_4 (13 objects)   1.5336
                 cluster_3 (25 objects) – cluster_4 (13 objects)   1.5523
PAM Results


  • The proximities of clusters for k=4 are higher, i.e.,
    the clusters are better separated.
  • The numbers of objects in the clusters are 20, 34, 25 and 13.
  • This is a quite natural grouping of the data.
  • Best result is for k=4.
  • We can say that the clustering conducted by PAM is a
    fine tuning of the one done by k-means.


                                              PAM
                                1.00   2.00         3.00   4.00   Total
             k-Means     1.00   20     12           25     13      70
                         2.00    0     22            0      0      22
                 Total          20     34           25     13      92
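
A cross-tabulation like the one above can be computed from two label vectors. In the sketch below the label vectors are rebuilt from the counts in the table, since the per-object labels are not given in the slides; the object order is therefore arbitrary, and the function name is hypothetical.

```python
from collections import Counter

# Cell counts copied from the table: (k-means cluster, PAM cluster) -> objects.
cells = {(1, 1): 20, (1, 2): 12, (1, 3): 25, (1, 4): 13, (2, 2): 22}

# Rebuild label vectors consistent with those counts (object order is arbitrary).
kmeans_labels, pam_labels = [], []
for (km, pam), count in cells.items():
    kmeans_labels += [km] * count
    pam_labels += [pam] * count

def cross_tabulate(row_labels, col_labels):
    """Count how many objects fall into each pair of (row, column) clusters."""
    counts = Counter(zip(row_labels, col_labels))
    rows = sorted(set(row_labels))
    cols = sorted(set(col_labels))
    return [[counts[(r, c)] for c in cols] for r in rows]

table = cross_tabulate(kmeans_labels, pam_labels)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
print(table, row_totals, col_totals)
```

Reading the rows shows the fine tuning: the 70 objects of the first k-means cluster are spread over all four PAM clusters, while the 22 objects of the second one stay together in PAM cluster 2.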
Modified k-Means Results


         k=2                      k=3                      k=4
         cluster_1: 61 objects    cluster_1: 59 objects    cluster_1: 45 objects
         cluster_2: 31 objects    cluster_2: 31 objects    cluster_2: 24 objects
                                  cluster_3: 2 objects     cluster_3: 2 objects
                                                           cluster_4: 21 objects


      For k=4, k-means has 2 clusters of fewer than 10 objects.
      Modified k-means has only 1 cluster of fewer than 10 objects;
      the others all have more than 20.
      Best result is for k=2.

                                 Modified global k-Means
                                  1.00    2.00    Total
               k-Means   1.00      61       9      70
                         2.00       0      22      22
                     Total         61      31      92
Modified k-Means Results


 • Modified k-means gave more natural results than k-means.

 • The clusters found by this modified method are more balanced in
   terms of the number of objects.

 • As k increases, k-means gives artificial results;
   however, modified global k-means gives reasonable clusters
   except for one cluster.

 • This new algorithm can be used when k is not known a priori.

 • It is easy to use and the running time of the algorithm is
   short (seconds in all of our runs).
Studies on Found Clusters


   We obtained the rule sets for k-means when k = 2,3 and 4.

   These rule sets show which combinations of process-variable
   values characterize each class of objects.

   These results are meaningful for the decision maker,
   which in our case is the company.

   Rather than the rule sets, it is more informative here to look
   at the decision tree analysis of the clusters.

   We applied CART (classification and regression trees)
   of SPSS Clementine® 10.1, on the group we found from
   k-means for k=2.
Results


  • We chose the big cluster of 70 objects as our dataset for
    CART.

  • We randomly formed 7 different training sets of 60 objects each;
    in each case the remaining 10 objects formed the test set.

  • There is one output variable (i.e., response variable), which
    represents the total number of defective items.

  • We obtained 7 decision tree models from these training and
    test sets.
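
The repeated 60/10 split can be sketched as below. The slides do not say how the random sets were drawn, so the independent shuffles, the fixed seed and the function name are all assumptions made for illustration.

```python
import random

def random_splits(n_objects=70, n_train=60, n_splits=7, seed=0):
    """Draw n_splits independent random train/test index splits."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_objects))
        rng.shuffle(indices)
        train = sorted(indices[:n_train])
        test = sorted(indices[n_train:])  # the remaining objects
        splits.append((train, test))
    return splits

splits = random_splits()
for train, test in splits:
    print(len(train), len(test))
```

Each object index appears in exactly one of the two sets of a split, so a model is never tested on an object it was trained on.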
Results


      We used three measures to compare these models:
         – Mean error (ME)
         – Mean absolute error (MAE)
         – Correlation


                           Average   Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   Model 7
    Training ME             0         0.0       0.0       0.0       0.0       0.0       0.0       0.0
    Training MAE            2.8       2.6       3.1       3.0       2.5       3.2       2.4       2.8
    Training correlation    0.887     0.922     0.840     0.871     0.917     0.874     0.911     0.872
    Test ME                -0.004     0.008     0.031     0.053    -0.064     0.002    -0.02     -0.04
    Test MAE                7.74      5.2       7.7       6.9       9.5       5.5       7.7      11.7
    Test correlation        0.040    -0.453    -0.046     0.555     0.146    -0.378     0.535    -0.08
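
The three measures can be computed as in the sketch below. The slides do not define them, so two assumptions are made: the error is taken as actual minus predicted, and the correlation is the Pearson correlation between actual and predicted values. The function names and toy numbers are hypothetical and are not taken from the study.

```python
import math

def mean_error(actual, predicted):
    # Assumption: error = actual - predicted; the slides do not state the sign.
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pearson_correlation(actual, predicted):
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical toy values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
me = mean_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
corr = pearson_correlation(actual, predicted)
print(me, mae, corr)
```

In the toy example positive and negative errors cancel, so ME is zero while MAE is not. A mean error close to zero therefore says little about accuracy on its own, which is why it is reported together with MAE and the correlation.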
Results

                            Cluster of 70 objects     Whole data set of 92 objects
    Training ME                     0                          0
    Training MAE                    2.8                        3.23
    Training correlation            0.887                      0.8098
    Test ME                        -0.004                     -0.21
    Test MAE                        7.74                       6.85
    Test correlation                0.040                      0.0757



   Our studies show that it is better to perform clustering
   before building models and extracting rule sets.


   We obtained the 4 most important variables for the response
   variable.


   2 of these important variables are also the most important
   ones for the whole data set.
Conclusion

 • When the data mining techniques used for classification /
   prediction cannot produce accurate results or cannot build
   models which are capable of predicting correctly, it is better
   to find the homogeneous groups in the data set.

 • Clustering algorithms produce highly different results;
   one should choose the most efficient and natural one.

 • Modified k-Means can be preferred instead of k-Means.
References
[1] Akteke-Özturk, B., Weber, G.-W., and Kropat, E., Continuous optimization
    approaches for minimum sum of squares, in the ISI Proceedings of 20th
    Mini-EURO Conference Continuous Optimization and Knowledge-Based
    Technologies (Neringa, Lithuania, May 20-23, 2008) 253-258.

[2] Bagirov, A.M., Rubinov, A.M., Soukhoroukova, N.V., and Yearwood, J.,
   Unsupervised and supervised data classification via nonsmooth and global
   optimization, TOP 11, 1 (2003), 1-93.
[3] Bakır, B., Batmaz, Đ., Güntürkün, F.A., Đpekçi, Đ.A., Köksal, G., and
    Özdemirel, N.E., Defect Cause Modeling with Decision Tree and Regression
    Analysis, Proceedings of XVII. International Conference on Computer and
    Information Science and Engineering, Cairo, Egypt, December 08-10, 2006,
    Volume 17, pp. 266-269, ISBN 975-00803-7-8.
[4] Sugar, C.A. and James, G.M., Finding the number of clusters in a
    dataset: an information-theoretic approach, Journal of the American
    Statistical Association 98, 463 (2003) 750-763.
[5] Volkovich, Z., Barzily, Z., Weber, G.-W., and Toledano-Kitai, D., Cluster
    stabilityestimation based on a minimal spanning trees approach, Proceedings
    of the Second Global Conference on Power Control and Optimization, AIP
    Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries:
    Mathematical and Statistical Physics; ISBN 978-0-7354-0696-4 (August
    2009) 299-305; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds..

Weitere ähnliche Inhalte

Was ist angesagt?

Cluster analysis using k-means method in R
Cluster analysis using k-means method in RCluster analysis using k-means method in R
Cluster analysis using k-means method in RVladimir Bakhrushin
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in RSudhakar Chavan
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based ClusteringSSA KPI
 
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptxQingsong Guo
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means ClusteringSajib Sen
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithmDarshak Mehta
 
K means clustering
K means clusteringK means clustering
K means clusteringKuppusamy P
 
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIFast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIijeei-iaes
 
Design and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation AlgorithmsDesign and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation AlgorithmsAjay Bidyarthy
 
Customer Segmentation using Clustering
Customer Segmentation using ClusteringCustomer Segmentation using Clustering
Customer Segmentation using ClusteringDessy Amirudin
 
Image classification using neural network
Image classification using neural networkImage classification using neural network
Image classification using neural networkBhavyateja Potineni
 

Was ist angesagt? (20)

Kmeans
KmeansKmeans
Kmeans
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Cluster analysis using k-means method in R
Cluster analysis using k-means method in RCluster analysis using k-means method in R
Cluster analysis using k-means method in R
 
08 clustering
08 clustering08 clustering
08 clustering
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
 
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptx
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
Cs36565569
Cs36565569Cs36565569
Cs36565569
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIFast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
 
Data miningpresentation
Data miningpresentationData miningpresentation
Data miningpresentation
 
Design and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation AlgorithmsDesign and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation Algorithms
 
Customer Segmentation using Clustering
Customer Segmentation using ClusteringCustomer Segmentation using Clustering
Customer Segmentation using Clustering
 
K means
K meansK means
K means
 
Image classification using neural network
Image classification using neural networkImage classification using neural network
Image classification using neural network
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 

Andere mochten auch

软件加密解密技术及应用
软件加密解密技术及应用软件加密解密技术及应用
软件加密解密技术及应用yiditushe
 
I F E E L D O Y O U D R
I  F E E L  D O  Y O U  D RI  F E E L  D O  Y O U  D R
I F E E L D O Y O U D Rghanyog
 
E M P L O Y E E E M P L O Y E R & S U P E R L I V I N G D R
E M P L O Y E E  E M P L O Y E R &  S U P E R L I V I N G  D RE M P L O Y E E  E M P L O Y E R &  S U P E R L I V I N G  D R
E M P L O Y E E E M P L O Y E R & S U P E R L I V I N G D Rghanyog
 
T H E C O R E O F S E X D R S H R I N I W A S K A S H A L I K A R
T H E  C O R E  O F  S E X  D R  S H R I N I W A S  K A S H A L I K A RT H E  C O R E  O F  S E X  D R  S H R I N I W A S  K A S H A L I K A R
T H E C O R E O F S E X D R S H R I N I W A S K A S H A L I K A Rghanyog
 
Sahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas Kashalikar
Sahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas KashalikarSahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas Kashalikar
Sahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas Kashalikarghanyog
 
Locandina Invito I Bambini I Giovani E La Città Workshop Modena
Locandina Invito I Bambini I Giovani E La Città Workshop ModenaLocandina Invito I Bambini I Giovani E La Città Workshop Modena
Locandina Invito I Bambini I Giovani E La Città Workshop ModenaMarrài a Fura
 
Creating Passive Revenue in Your Business
Creating Passive Revenue in Your BusinessCreating Passive Revenue in Your Business
Creating Passive Revenue in Your BusinessFellow.app
 
M A U N A ( S I L E N C E) & S U P E R L I V I N G D R S H R I N I W A S ...
M A U N A ( S I L E N C E) &  S U P E R L I V I N G   D R  S H R I N I W A S ...M A U N A ( S I L E N C E) &  S U P E R L I V I N G   D R  S H R I N I W A S ...
M A U N A ( S I L E N C E) & S U P E R L I V I N G D R S H R I N I W A S ...ghanyog
 
Ecolife Recreio Eco Esfera E Mail
Ecolife  Recreio    Eco Esfera   E MailEcolife  Recreio    Eco Esfera   E Mail
Ecolife Recreio Eco Esfera E Mailimoveisdorio
 
Guida Ecoidea 4 - L’ufficio ecologico
Guida Ecoidea 4 - L’ufficio ecologicoGuida Ecoidea 4 - L’ufficio ecologico
Guida Ecoidea 4 - L’ufficio ecologicoMarrài a Fura
 
Guida Ecoidea 1 - Compostaggio Domestico
Guida Ecoidea 1 - Compostaggio DomesticoGuida Ecoidea 1 - Compostaggio Domestico
Guida Ecoidea 1 - Compostaggio DomesticoMarrài a Fura
 
Holistic Approach To Saving Energy Dr Shriiwas Kashalikar
Holistic  Approach To  Saving  Energy  Dr  Shriiwas  KashalikarHolistic  Approach To  Saving  Energy  Dr  Shriiwas  Kashalikar
Holistic Approach To Saving Energy Dr Shriiwas Kashalikarghanyog
 
Arogyaka Rajmarg Dr. Shriniwas Kashalikar
Arogyaka Rajmarg Dr. Shriniwas KashalikarArogyaka Rajmarg Dr. Shriniwas Kashalikar
Arogyaka Rajmarg Dr. Shriniwas Kashalikarghanyog
 
pmi 35 contact hrs
pmi 35 contact hrspmi 35 contact hrs
pmi 35 contact hrsJose Staff
 
Game Design 2 - Lecture 2 - Menu Flow
Game Design 2 - Lecture 2 - Menu FlowGame Design 2 - Lecture 2 - Menu Flow
Game Design 2 - Lecture 2 - Menu FlowDavid Farrell
 
D I A B E T E S A N D B H R A M A R I D R S H R I N I W A S K A S H A L ...
D I A B E T E S  A N D  B H R A M A R I  D R  S H R I N I W A S  K A S H A L ...D I A B E T E S  A N D  B H R A M A R I  D R  S H R I N I W A S  K A S H A L ...
D I A B E T E S A N D B H R A M A R I D R S H R I N I W A S K A S H A L ...ghanyog
 

Andere mochten auch (20)

软件加密解密技术及应用
软件加密解密技术及应用软件加密解密技术及应用
软件加密解密技术及应用
 
I F E E L D O Y O U D R
I  F E E L  D O  Y O U  D RI  F E E L  D O  Y O U  D R
I F E E L D O Y O U D R
 
E M P L O Y E E E M P L O Y E R & S U P E R L I V I N G D R
E M P L O Y E E  E M P L O Y E R &  S U P E R L I V I N G  D RE M P L O Y E E  E M P L O Y E R &  S U P E R L I V I N G  D R
E M P L O Y E E E M P L O Y E R & S U P E R L I V I N G D R
 
T H E C O R E O F S E X D R S H R I N I W A S K A S H A L I K A R
T H E  C O R E  O F  S E X  D R  S H R I N I W A S  K A S H A L I K A RT H E  C O R E  O F  S E X  D R  S H R I N I W A S  K A S H A L I K A R
T H E C O R E O F S E X D R S H R I N I W A S K A S H A L I K A R
 
Sahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas Kashalikar
Sahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas KashalikarSahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas Kashalikar
Sahastranetra A Bestseller On Vishnusahasranam Dr. Shriniwas Kashalikar
 
Locandina Invito I Bambini I Giovani E La Città Workshop Modena
Locandina Invito I Bambini I Giovani E La Città Workshop ModenaLocandina Invito I Bambini I Giovani E La Città Workshop Modena
Locandina Invito I Bambini I Giovani E La Città Workshop Modena
 
Habilidades
HabilidadesHabilidades
Habilidades
 
Deseo
DeseoDeseo
Deseo
 
Creating Passive Revenue in Your Business
Creating Passive Revenue in Your BusinessCreating Passive Revenue in Your Business
Creating Passive Revenue in Your Business
 
M A U N A ( S I L E N C E) & S U P E R L I V I N G D R S H R I N I W A S ...
M A U N A ( S I L E N C E) &  S U P E R L I V I N G   D R  S H R I N I W A S ...M A U N A ( S I L E N C E) &  S U P E R L I V I N G   D R  S H R I N I W A S ...
M A U N A ( S I L E N C E) & S U P E R L I V I N G D R S H R I N I W A S ...
 
Ecolife Recreio Eco Esfera E Mail
Ecolife  Recreio    Eco Esfera   E MailEcolife  Recreio    Eco Esfera   E Mail
Ecolife Recreio Eco Esfera E Mail
 
Guida Ecoidea 4 - L’ufficio ecologico
Guida Ecoidea 4 - L’ufficio ecologicoGuida Ecoidea 4 - L’ufficio ecologico
Guida Ecoidea 4 - L’ufficio ecologico
 
Guida Ecoidea 1 - Compostaggio Domestico
Guida Ecoidea 1 - Compostaggio DomesticoGuida Ecoidea 1 - Compostaggio Domestico
Guida Ecoidea 1 - Compostaggio Domestico
 
Holistic Approach To Saving Energy Dr Shriiwas Kashalikar
Holistic  Approach To  Saving  Energy  Dr  Shriiwas  KashalikarHolistic  Approach To  Saving  Energy  Dr  Shriiwas  Kashalikar
Holistic Approach To Saving Energy Dr Shriiwas Kashalikar
 
Arogyaka Rajmarg Dr. Shriniwas Kashalikar
Arogyaka Rajmarg Dr. Shriniwas KashalikarArogyaka Rajmarg Dr. Shriniwas Kashalikar
Arogyaka Rajmarg Dr. Shriniwas Kashalikar
 
pmi 35 contact hrs
pmi 35 contact hrspmi 35 contact hrs
pmi 35 contact hrs
 
scuola 2
scuola 2scuola 2
scuola 2
 
Game Design 2 - Lecture 2 - Menu Flow
Game Design 2 - Lecture 2 - Menu FlowGame Design 2 - Lecture 2 - Menu Flow
Game Design 2 - Lecture 2 - Menu Flow
 
D I A B E T E S A N D B H R A M A R I D R S H R I N I W A S K A S H A L ...
D I A B E T E S  A N D  B H R A M A R I  D R  S H R I N I W A S  K A S H A L ...D I A B E T E S  A N D  B H R A M A R I  D R  S H R I N I W A S  K A S H A L ...
D I A B E T E S A N D B H R A M A R I D R S H R I N I W A S K A S H A L ...
 
Ensemble Content Based Instruction
Ensemble Content Based InstructionEnsemble Content Based Instruction
Ensemble Content Based Instruction
 

Ähnlich wie Clustering Theory

Data Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareData Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareMohammed Kharma
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering108kaushik
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
Cluster Analysis : Assignment & Update
Cluster Analysis : Assignment & UpdateCluster Analysis : Assignment & Update
Cluster Analysis : Assignment & UpdateBilly Yang
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화NAVER Engineering
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniquestalktoharry
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1arogozhnikov
 
Analysis and implementation of modified k medoids
Analysis and implementation of modified k medoidsAnalysis and implementation of modified k medoids
Analysis and implementation of modified k medoidseSAT Publishing House
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
Clustering introduction
Clustering introductionClustering introduction
Clustering introductionYan Xu
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 

Ähnlich wie Clustering Theory (20)

Data Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareData Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner software
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
Data analysis of weather forecasting
Data analysis of weather forecastingData analysis of weather forecasting
Data analysis of weather forecasting
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Cluster Analysis : Assignment & Update
Cluster Analysis : Assignment & UpdateCluster Analysis : Assignment & Update
Cluster Analysis : Assignment & Update
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
Analysis and implementation of modified k medoids
Analysis and implementation of modified k medoidsAnalysis and implementation of modified k medoids
Analysis and implementation of modified k medoids
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Clustering introduction
Clustering introductionClustering introduction
Clustering introduction
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 


Clustering Theory

  • 1. 4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009. Clustering Theory: Data Mining for Quality Improvement with Nonsmooth Optimization vs. PAM and k-Means. Gerhard-Wilhelm Weber * and Başak Akteke-Öztürk, Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey. * Faculty of Economics, Management and Law, University of Siegen, Germany; Center for Research on Optimization and Control, University of Aveiro, Portugal.
  • 2. Outline • Quality Analysis • Data Mining for Quality Analysis • Clustering Methods • Results and Comparison • Decision Tree Analysis of A Cluster • Conclusion
  • 3. Quality Analysis • Quality is an essential requirement of – products, – processes, and – services. • This study is a part of a project whose main focus is on quality analysis: relationship between input and output • Modern quality analysis takes advantage of using tools of Data Mining.
  • 4. Data Mining for Quality Analysis Data mining tools such as – decision trees (e.g. classification and regression trees (CART)), – neural networks (NN), – self-organizing maps (SOM), – support vector machines (SVM), are highly preferred for modeling and producing rules for the output. However, there are not yet enough applications of such tools for industry practitioners to prefer and use them for their quality analysis needs.
  • 5. Aim of Our Data Mining Studies • to identify the data mining approaches that can effectively improve product and process quality in industrial organizations: – classification / prediction, – clustering and – association analysis, • to develop new data mining software and improve the existing ones for quality analysis. • Initial study: to identify the most influential variables that cause defects in the items produced by a casting company located in Turkey.
  • 6. Our Data Set • Our data set: 92 objects (rows), 35 process variables (columns). • It belongs to a particular product with a high percentage of defectives, and was collected during the first five months of production in 2006. • Missing values: filled with the column averages.
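The mean imputation mentioned on this slide can be sketched as follows. This is only a minimal illustration in plain Python, assuming missing entries are encoded as `None`; the function name `impute_column_means` and the toy rows are hypothetical and are not taken from the study.

```python
def impute_column_means(rows):
    """Replace None entries by the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


# Toy data (hypothetical): 3 objects, 2 process variables, two missing entries.
toy = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_column_means(toy)
print(filled)
```

Each missing cell receives the average of the observed values in its own column, so the column means are unchanged by the filling step.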
  • 7. Clustering - 2 Algorithms (Model Free). Minimal distance procedure: choose a random start partition, compute centroids, create minimal distance partition, end partition.
  • 8. Clustering - 2 Algorithms (Model Free). Exchange procedure (alongside the minimal distance procedure): choose a random start partition, test an object in all clusters, update the centroids, end partition.
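The minimal distance procedure from the two slides above can be sketched roughly as below. This is a plain-Python illustration under stated assumptions, not the implementation used in the study: the helper names, the re-seeding of an empty cluster with a random object, the `max_iter` cap, and the toy data are all hypothetical. The exchange procedure is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def minimal_distance_kmeans(data, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random start partition; labels 0..k-1 each occur at least once.
    labels = list(range(k)) + [rng.randrange(k) for _ in range(len(data) - k)]
    rng.shuffle(labels)
    centers = []
    for _ in range(max_iter):
        # Compute the centroid of every cluster.
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random object.
            centers.append(centroid(members) if members else list(rng.choice(data)))
        # Minimal distance partition: each object goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
        if new_labels == labels:  # end partition reached
            break
        labels = new_labels
    return labels, centers


# Toy data (hypothetical): two well-separated groups of three objects each.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers = minimal_distance_kmeans(toy, k=2)
print(labels)
```

The loop alternates the two boxes of the flowchart (compute centroids, create the minimal distance partition) until the partition stops changing.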
  • 9. Our Clustering • The data set was scaled to the interval [0,1] before the clustering analysis: $x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$. • We used k-means, PAM (Partitioning Around Medoids) and a modified k-means based on Nonsmooth Analysis: – to understand the data set by examining the groups in the data, – to find the outliers of the data set, – because our data set was not big. • These methods use the Euclidean metric by default.
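As a small illustration of the scaling to [0,1] described on this slide, the sketch below applies the min-max formula column by column. It is written in plain Python; the function name `min_max_scale`, the handling of constant columns, and the toy rows are assumptions made for the example.

```python
def min_max_scale(rows):
    """Scale every column to [0, 1] via (x - x_min) / (x_max - x_min)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    scaled = []
    for r in rows:
        # Assumption: a constant column (x_max == x_min) is mapped to 0.0
        # to avoid a division by zero.
        scaled.append([(x - l) / (h - l) if h > l else 0.0
                       for x, l, h in zip(r, lo, hi)])
    return scaled


# Toy data (hypothetical): 3 objects, 2 variables on different scales.
toy = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled = min_max_scale(toy)
print(scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Scaling each variable separately keeps variables with large ranges from dominating the Euclidean distances used by the clustering methods.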
  • 10. About the Methods • PAM is more robust than k-means in the presence of noise and outliers. • PAM minimizes a sum of dissimilarities instead of a sum of squared Euclidean distances. • Medoids are less influenced by the presence of noise and outliers. • A medoid can be defined as that object of a cluster, whose average distance (dissimilarity) to all the objects in the cluster is minimal.
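The medoid definition on this slide can be illustrated with a short sketch. It assumes Euclidean distance as the dissimilarity; the function names and the toy cluster (which contains one deliberate outlier) are hypothetical. The example also shows why a medoid is less affected by an outlier than a centroid.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def medoid(cluster, dissimilarity=euclidean):
    """Object of the cluster with minimal average dissimilarity to all its objects."""
    return min(cluster,
               key=lambda c: sum(dissimilarity(c, o) for o in cluster) / len(cluster))


def centroid(cluster):
    """Component-wise mean, as used by k-means."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


# Toy one-dimensional cluster (hypothetical); 20.0 plays the role of an outlier.
cluster = [[0.0], [1.0], [2.0], [3.0], [20.0]]
print(medoid(cluster))    # an actual object of the cluster
print(centroid(cluster))  # pulled towards the outlier
```

Here the medoid stays at the object `[2.0]`, while the centroid moves to about 5.2, a value that lies between the bulk of the objects and the outlier.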
  • 11. Nonsmooth Analysis • k-means takes as input the number of clusters and the initial cluster centers. • This problem can be reduced to a nonsmooth optimization problem, which leads to the initial problem for a modified k-means: – global optimization techniques, – nonsmooth optimization algorithms and – derivative free optimization for the modified k-means algorithm. • The minimum sum of squares problem is a nonsmooth and nonconvex optimization problem.
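For orientation, the minimum sum of squares clustering problem is commonly written in the following nonsmooth form. The notation is an assumption made here (the slides do not state the formula): $a^1,\dots,a^m \in \mathbb{R}^n$ are the objects and $x^1,\dots,x^k$ are the cluster centers.

```latex
% Assumed notation: a^1,...,a^m in R^n are the objects, x^1,...,x^k the cluster centers.
\[
  \min_{x^1,\dots,x^k \in \mathbb{R}^n} \; f_k(x^1,\dots,x^k)
  = \frac{1}{m} \sum_{i=1}^{m} \min_{j=1,\dots,k} \lVert x^j - a^i \rVert^2 .
\]
```

The inner minimum over the centers is what makes the objective nonsmooth, and for $k > 1$ the function is in general nonconvex. For the data set described on slide 6 one would have $m = 92$ and $n = 35$.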
  • 12. k-Means Results
      k   Cluster pair                                        Proximity
      2   cluster_1 (70 objects) – cluster_2 (22 objects)     1.113769
      3   cluster_1 (68 objects) – cluster_2 (22 objects)     1.111567
      3   cluster_1 (68 objects) – cluster_3 (2 objects)      1.593595
      3   cluster_2 (22 objects) – cluster_3 (2 objects)      1.968277
      4   cluster_1 (68 objects) – cluster_2 (6 objects)      1.44533
      4   cluster_1 (68 objects) – cluster_3 (2 objects)      1.593595
      4   cluster_1 (68 objects) – cluster_4 (16 objects)     1.104353
      4   cluster_2 (6 objects) – cluster_3 (2 objects)       2.197992
      4   cluster_2 (6 objects) – cluster_4 (16 objects)      1.055844
      4   cluster_3 (2 objects) – cluster_4 (16 objects)      1.95292
  • 13. k-Means Results • The best result is for k=2. • The proximities of the clusters for k=3 and k=4 are higher. • However, the results for k=3 and k=4 are artificial: one of the clusters contains only 2 objects. • These objects are outliers.
  • 14. PAM Results
      k   Cluster pair                                        Proximity
      2   cluster_1 (40 objects) – cluster_2 (52 objects)     1.2838
      3   cluster_1 (33 objects) – cluster_2 (34 objects)     1.2838
      3   cluster_1 (33 objects) – cluster_3 (25 objects)     1.2729
      3   cluster_2 (34 objects) – cluster_3 (25 objects)     1.1242
      4   cluster_1 (20 objects) – cluster_2 (34 objects)     1.2838
      4   cluster_1 (20 objects) – cluster_3 (25 objects)     1.2729
      4   cluster_1 (20 objects) – cluster_4 (13 objects)     1.1374
      4   cluster_2 (34 objects) – cluster_3 (25 objects)     1.1242
      4   cluster_2 (34 objects) – cluster_4 (13 objects)     1.5336
      4   cluster_3 (25 objects) – cluster_4 (13 objects)     1.5523
  • 15. PAM Results • The proximities of the clusters are higher for k=4, i.e., the clusters are better separated. • The numbers of objects in the clusters are 20, 34, 25 and 13. • This is a quite natural grouping of the data. • The best result is for k=4. • We can say that the clustering conducted by PAM is a fine tuning of the one done by k-means.
      Cross-tabulation of k-means (k=2, rows) against PAM (k=4, columns):
                     PAM 1   PAM 2   PAM 3   PAM 4   Total
      k-Means 1         20      12      25      13      70
      k-Means 2          0      22       0       0      22
      Total             20      34      25      13      92
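A cross-tabulation like the one on this slide can be computed from two label vectors as sketched below. The function name `cross_tab` and the toy labels are hypothetical; the counts are not those of the study.

```python
from collections import Counter


def cross_tab(labels_a, labels_b):
    """Count how many objects fall into each pair (cluster in A, cluster in B)."""
    return Counter(zip(labels_a, labels_b))


# Toy labels (hypothetical) for five objects under two clusterings.
kmeans_labels = [1, 1, 1, 2, 2]
pam_labels = [1, 2, 2, 2, 2]
table = cross_tab(kmeans_labels, pam_labels)

for (a, b), count in sorted(table.items()):
    print(f"k-means {a} / PAM {b}: {count}")
```

A row with a single non-zero cell means that the cluster of the first method is contained entirely in one cluster of the second method; a row spread over several cells means that the second method splits it further.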
  • 16. Modified k-Means Results
                     k=2           k=3           k=4
      cluster_1      61 objects    59 objects    45 objects
      cluster_2      31 objects    31 objects    24 objects
      cluster_3                    2 objects     2 objects
      cluster_4                                  21 objects
      For k=4, k-means has 2 clusters of fewer than 10 objects. Modified k-means has only 1 cluster of fewer than 10 objects; all the others have more than 20. The best result is for k=2.
      Cross-tabulation of k-means (k=2, rows) against modified global k-means (k=2, columns):
                     Modified 1   Modified 2   Total
      k-Means 1              61            9      70
      k-Means 2               0           22      22
      Total                  61           31      92
  • 17. Modified k-Means Results • Modified k-means gave more natural results than k-means. • The clusters found by this modified method are more balanced in terms of the number of objects. • As k increases, k-means gives artificial results; however, modified global k-means gives reasonable clusters except for one cluster. • This new algorithm can be used when k is not known a priori. • It is easy to use and its running time is short (seconds in all of our runs).
  • 18. Studies on Found Clusters We obtained the rule sets for k-means when k = 2, 3 and 4. These rule sets show which values of the process variables together characterize each class of objects. These results are meaningful for the decision maker, which in our case is the company. Instead of the rule sets, we present the decision tree analysis of the clusters. We applied CART (classification and regression trees) of SPSS Clementine® 10.1 to the groups found by k-means for k=2.
  • 19. Results • We chose the big cluster of 70 objects as our data set for CART. • We randomly formed 7 different training sets of 60 objects each, and 7 test sets from the remaining 10 objects. • There is one output variable (i.e., response variable), which represents the total number of defective items. • We obtained 7 decision tree models from these training and test sets.
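The repeated random splitting described on this slide (60 training objects, 10 test objects, 7 repetitions) could be set up as in the sketch below. The function name `random_splits` and the fixed seed are assumptions for illustration; the slides do not say how the splits were actually drawn.

```python
import random


def random_splits(n_objects, n_train, n_repeats, seed=0):
    """Draw n_repeats random (train, test) index splits of range(n_objects)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_objects))
        rng.shuffle(idx)
        # The first n_train shuffled indices train the model, the rest test it.
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits


# Sizes taken from the slide: a cluster of 70 objects, 60 for training, 7 repetitions.
splits = random_splits(n_objects=70, n_train=60, n_repeats=7)
for train, test in splits:
    print(len(train), len(test))
```

Each repetition yields a disjoint pair of index sets, so a model fitted on the 60 training objects is always evaluated on 10 objects it has not seen.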
  • 20. Results We used the following measures to compare these models: – Mean error (ME) – Mean absolute error (MAE) – Correlation
                             Average   Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   Model 7
      Training ME              0         0.0       0.0       0.0       0.0       0.0       0.0       0.0
      Training MAE             2.8       2.6       3.1       3.0       2.5       3.2       2.4       2.8
      Training correlation     0.887     0.922     0.840     0.871     0.917     0.874     0.911     0.872
      Test ME                 -0.004     0.008     0.031     0.053    -0.064     0.002    -0.02     -0.04
      Test MAE                 7.74      5.2       7.7       6.9       9.5       5.5       7.7      11.7
      Test correlation         0.040    -0.453    -0.046     0.555     0.146    -0.378     0.535    -0.08
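The three measures named on this slide can be computed as in the sketch below. Two assumptions are made for the example: the error is taken as predicted minus actual (the slides do not state the sign convention), and the correlation is the Pearson correlation between actual and predicted values. The toy numbers are hypothetical.

```python
import math


def mean_error(actual, predicted):
    """ME; assumption: error = predicted - actual."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """MAE: average magnitude of the errors, regardless of sign."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation; assumes neither input is constant."""
    ma = sum(actual) / len(actual)
    mp = sum(predicted) / len(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Toy values (hypothetical): four objects with actual and predicted responses.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 2.0, 6.0]
me = mean_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
corr = pearson_correlation(actual, predicted)
print(me, mae, corr)
```

Positive and negative errors cancel in the ME, so an ME close to zero can coexist with a large MAE; this is why the two measures are read together.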
  • 21. Results
                             Cluster of 70 objects   Whole data set of 92 objects
      Training ME              0                       0
      Training MAE             2.8                     3.23
      Training correlation     0.887                   0.8098
      Test ME                 -0.004                  -0.21
      Test MAE                 7.74                    6.85
      Test correlation         0.040                   0.0757
      Our studies show that it is better to cluster the data before building models and extracting rule sets. We identified the 4 most important variables for the response variable; 2 of them are also the most important ones for the whole data set.
  • 22. Conclusion • When the data mining techniques used for classification / prediction cannot produce accurate results or cannot build models that predict correctly, it is better to find the homogeneous groups in the data set. • Clustering algorithms produce highly different results; one should choose the most efficient and natural one. • Modified k-Means can be preferred over k-Means.