4th International Summer School
Achievements and Applications of Contemporary
Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009




     Methods from Mathematical Data Mining
                         (Supported by Optimization)


             Gerhard-Wilhelm Weber * and Başak Akteke-Öztürk
                               Institute of Applied Mathematics
                       Middle East Technical University, Ankara, Turkey

            * Faculty of Economics, Management and Law,   University of Siegen, Germany
             Center for Research on Optimization and Control, University of Aveiro, Portugal



4th International Summer School
Achievements and Applications of Contemporary
Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009

                                  Clustering Theory

              Cluster Number and Cluster Stability Estimation
                                          Z. Volkovich
     Software Engineering Department, ORT Braude College of Engineering, Karmiel 21982, Israel

                                            Z. Barzily
     Software Engineering Department, ORT Braude College of Engineering, Karmiel 21982, Israel

                                          G.-W. Weber
          Departments of Scientific Computing, Financial Mathematics and Actuarial Sciences,
      Institute of Applied Mathematics, Middle East Technical University, 06531, Ankara, Turkey

                                       D. Toledano-Kitai
     Software Engineering Department, ORT Braude College of Engineering, Karmiel 21982, Israel
Clustering
• An essential tool of “unsupervised” learning is
  cluster analysis, which categorizes data
  (objects, instances) into groups such that the
  similarity within a group is much higher than
  the similarity between groups.

• This resemblance is often quantified by a
  distance function.


Clustering

For a given set S ⊂ IR^d, a clustering algorithm CL
constructs a clustered set
   CL(S, int-part, k) = Π(S) = (π1(S), …, πk(S)),
such that CL(x) = CL(y) = i, if x and y are similar:
   x, y ∈ πi(S), for some i = 1,…,k;
and CL(x) ≠ CL(y), if x and y are dissimilar.

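As a minimal sketch of this notion, assuming scikit-learn is available, the following Python fragment treats a clustering algorithm CL as a labeling function on a data set S ⊂ IR^d; KMeans merely stands in for CL (the talk itself uses PAM), and all names are illustrative.

# Sketch: a clustering algorithm CL as a labeling function on a set S.
# Assumption: scikit-learn's KMeans stands in for CL; the talk uses PAM.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
S = rng.normal(size=(300, 2))                 # a data set S in IR^d, here d = 2

k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(S)

# The clusters pi_1(S), ..., pi_k(S) are the groups of points sharing a label.
clusters = [S[labels == i] for i in range(k)]
assert sum(len(c) for c in clusters) == len(S)   # the pi_i(S) partition S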
Clustering

The disjoint subsets πi(S), i = 1,…,k, are named
clusters:

   ⋃ i=1,…,k  πi(S) = S ,   and   πi(S) ∩ πj(S) = ∅  for i ≠ j.




Clustering




[Figure: two example partitions illustrating CL(x) = CL(y) (x and y in the same cluster) and CL(x) ≠ CL(y) (x and y in different clusters).]


Clustering
The iterative clustering process is usually carried out in two phases:
a partitioning phase and a quality assessment phase.
In the partitioning phase, a label is assigned to each element
in view of the assumption that, in addition to the observed features,
for each data item, there is a hidden, unobserved feature
representing cluster membership.
The quality assessment phase measures the grouping quality.
The outcome of the clustering process is the partition that
attains the highest quality score.
Besides the data itself, two essential input parameters are
typically required: an initial partition and a suggested number of
clusters. Here, the parameters are denoted as
• int-part ;
• k.
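For illustration only, these two parameters can be passed to an off-the-shelf routine roughly as follows; scikit-learn's KMeans is again a stand-in for the clustering procedure, and the initial partition is represented by its cluster centers (an assumption made for this sketch).

# Sketch: supplying a suggested number of clusters k and an initial partition
# (given by its centers) to a clustering routine.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

k = 3
int_part_centers = X[rng.choice(len(X), size=k, replace=False)]   # initial partition

labels = KMeans(n_clusters=k, init=int_part_centers, n_init=1).fit_predict(X)
print(np.bincount(labels))                     # cluster sizes of the resulting partition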
The Problem
Partitions generated by the iterative algorithms are commonly
sensitive to initial partitions fed in as an input parameter.
Selection of “good” initial partitions is an essential
clustering problem.
Another problem arising here is choosing the right number of
clusters. It is well known that this key task of cluster analysis
is ill-posed. For instance, the “correct” number of clusters in a
data set can depend on the scale in which the data are measured.

In this talk, we address the latter problem: determining
the number of clusters.

The Problem
Many approaches to this problem exploit the within-cluster
dispersion matrix (defined according to the pattern of a
covariance matrix). The spread captured by this matrix
usually decreases as the number of groups rises, and may have
a point at which it “falls” sharply. Such an “elbow” in the graph marks,
in several known methods, the “true” number of clusters.
Stability-based approaches to the cluster validation problem
evaluate the partitions’ variability under repeated applications
of a clustering algorithm. Low variability is understood as
high consistency of the results obtained, and the number of clusters
that maximizes cluster stability is accepted as an estimate of the
“true” number of clusters.
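As a hedged sketch of the “elbow” idea described above (k-means inertia is used here as a simple proxy for the within-cluster dispersion; all names are illustrative):

# Sketch of the "elbow" heuristic: within-cluster dispersion versus number of groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
centers = np.array([[0, 0], [4, 0], [0, 4]])
X = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in centers])

dispersion = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    dispersion.append(km.inertia_)             # total within-cluster sum of squares

# The curve drops sharply up to k = 3 and flattens afterwards;
# the "elbow" at k = 3 suggests the number of clusters.
print(dispersion)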
The Concept
In the current talk, the problem of determining the
true number of clusters is addressed via the cluster
stability approach.
We propose a method for the study of cluster stability
that assesses the geometrical stability of a partition.
• We draw samples from the source data and estimate
  the clusters by means of each of the drawn samples.
• We compare pairs of the partitions obtained.
• A pair is considered consistent if the obtained
  divisions are close to each other.
The Concept
• We quantify this closeness by the number of edges
  connecting points from different samples in a
  minimal spanning tree (MST) constructed for each one
  of the clusters.
• We use the Friedman and Rafsky two-sample test
  statistic, which measures these quantities. Under the
  null hypothesis of homogeneity of the source data,
  this statistic is approximately normally distributed.
  Thus, well-mingled samples within the clusters lead to
  an approximately normal distribution of this statistic.
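A hedged sketch of this quantity, assuming scipy is available: build the minimal spanning tree of the pooled points of two samples and count the edges joining points from different samples.

# Count the MST edges connecting a point of sample S to a point of sample T
# (the Friedman-Rafsky two-sample quantity used within a cluster).
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def cross_edge_count(S, T):
    pooled = np.vstack([S, T])
    origin = np.array([0] * len(S) + [1] * len(T))   # which sample each point came from
    mst = minimum_spanning_tree(cdist(pooled, pooled)).tocoo()
    return int(np.sum(origin[mst.row] != origin[mst.col]))

rng = np.random.default_rng(2)
S = rng.normal(size=(50, 2))
T = rng.normal(size=(50, 2))
print(cross_edge_count(S, T))                  # large count: the samples are well mingled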
The Concept
Examples of MST produced by samples within a cluster:




The Concept
The left-hand picture is an example of “a good cluster”,
where the number of edges connecting points from
different samples (marked by solid red lines) is
relatively large.
The right-hand picture shows a “poor situation”, in which
only one (and long) edge connects the (sub-)clusters.




The Two-Sample MST-Test
Henze and Penrose (1999) considered the asymptotic behavior of
Rmn:
the number of edges of the MST of S ∪ T which connect a point of S to a point of T.
Suppose that |S| = m → ∞ and |T| = n → ∞ such that
m/(m+n) → p ∈ (0, 1).
Introducing q = 1 − p and r = 2pq, they obtained

     ( 1/√(m+n) ) · ( Rmn − 2mn/(m+n) )  →  N(0, σd²),

where the convergence is in distribution and N(0, σd²) denotes
the normal distribution with expectation 0 and variance

     σd² := r ( r + Cd (1 − 2r) ),

for some constant Cd depending only on the space’s dimension d.
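As a small illustration of this limit (variable names are ours), the standardized quantity on the left-hand side can be computed directly from Rmn, m and n:

# Standardization of the cross-edge count Rmn as in the limit above:
# (Rmn - 2mn/(m+n)) / sqrt(m+n) is approximately N(0, sigma_d^2) under homogeneity.
import math

def standardized_cross_count(R_mn, m, n):
    return (R_mn - 2.0 * m * n / (m + n)) / math.sqrt(m + n)

print(standardized_cross_count(95, 100, 100))  # example: m = n = 100, 95 cross edges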
Concept
• Resting upon this fact, the standard score

      Yj := (2K/m) · ( Rj − m/K )

  of the mentioned edges quantity is calculated
  for each cluster j = 1,…,K,
  where m is the sample size and
  K denotes the number of clusters.

• The partition quality Ỹ is represented by the
  worst cluster, corresponding to the
  minimal standard score value obtained.
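A hedged sketch of this step, following the formula above (the cross-edge counts below are illustrative placeholders, not results of the method):

# Per-cluster standard score Yj = (2K/m) * (Rj - m/K); the partition quality
# is the worst (minimal) score over the clusters.
import numpy as np

def partition_quality(R, m):
    R = np.asarray(R, dtype=float)             # R[j]: cross-edge count in cluster j
    K = len(R)
    Y = (2.0 * K / m) * (R - m / K)
    return Y, Y.min()                          # all scores and the worst-cluster value

Y, quality = partition_quality([70, 64, 41], m=200)   # illustrative counts, K = 3
print(Y, quality)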
Concept
• It is natural to expect that the true number of
  clusters can be characterized by the empirical
  distribution of the partition standard score
  having the shortest left tail.
• The proposed methodology thus amounts to sequentially
  constructing the described distribution for each candidate
  number of clusters and estimating its left asymmetry.




Concept
One of the important problems appearing here is the
so-called cluster coordination problem:
the same cluster can be tagged differently
across repeated reruns of the algorithm.
This fact results from the inherent symmetry of
the partitions with respect to their cluster labels.




Concept
We solve this problem in the following way.
Let S = S1 ∪ S2. Consider three categorizations:
   ΠK   := Cl(S, K) ,
   ΠK,1 := Cl(S1, K) ,
   ΠK,2 := Cl(S2, K) .
Thus, we get two partitions for each of the samples
Si, i = 1,2: the first one is induced by ΠK, and the
second one is ΠK,i , i = 1,2.
Concept
For each one of the samples i = 1,2, our purpose is
to find the permutation ψ of the set {1,…,K} which
minimizes the number of misclassified items:

   ψi* = arg minψ  ∑ x∈Si  I( ψ(αK,i(x)) ≠ αK^(i)(x) ) ,   i = 1,2,

where I(z) is the indicator function of the event z and
αK,i , αK^(i) are the assignments defined by ΠK,i and ΠK ,
correspondingly.
Concept

The well-known Hungarian method for solving
this problem has computational complexity O(K³).
After changing the cluster labels of the partitions
ΠK,i , i = 1,2, consistently with ψi* , i = 1,2,
we can assume that these partitions are coordinated,
i.e., the clusters are consistently designated.




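A hedged sketch of the coordination step, assuming scipy is available: the label permutation is found by solving an assignment problem on the K×K agreement matrix of the two partitions; scipy's linear_sum_assignment is a Hungarian-type solver.

# Coordinate the cluster labels of two partitions of the same sample: find the
# permutation of {0, ..., K-1} minimizing the number of misclassified items.
import numpy as np
from scipy.optimize import linear_sum_assignment

def coordinate_labels(labels_ref, labels_other, K):
    agreement = np.zeros((K, K), dtype=int)    # agreement[a, b]: points labeled a (ref) and b (other)
    for a, b in zip(labels_ref, labels_other):
        agreement[a, b] += 1
    row, col = linear_sum_assignment(-agreement)      # maximize agreement
    mapping = dict(zip(col, row))                     # other label -> reference label
    return np.array([mapping[b] for b in labels_other])

ref   = np.array([0, 0, 1, 1, 2, 2])
other = np.array([2, 2, 0, 0, 1, 1])                  # same grouping, different tags
print(coordinate_labels(ref, other, K=3))             # -> [0 0 1 1 2 2]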
Algorithm
1. Choose the parameters: K*, J, m, Cl .
2. For K = 2 to K*
3.    For j = 1 to J
4.       Sj,1 = sample(X, m) ,   Sj,2 = sample(X \ Sj,1, m)
5.       Calculate
         ΠK,j   = Cl( Sj , K) ,   where Sj := Sj,1 ∪ Sj,2 ,
         ΠK,j,1 = Cl( Sj,1 , K) ,
         ΠK,j,2 = Cl( Sj,2 , K) .
6.       Solve the coordination problem.
Algorithm
7.       Calculate Yj(k), k = 1,…,K, and Ỹj(K) .
8.    end for j
9.    Calculate an asymmetry index (percentile) IK
      for { Ỹj(K) | j = 1,…,J }.
10. end for K
11. The “true” number of clusters is selected as the one which
    yields the maximal value of the index.

Here, sample(S, m) is a procedure which selects a
random sample of size m from the set S, without
replacement.
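A hedged sketch of steps 9–11: for each candidate K, a left-tail percentile of the J partition scores Ỹj(K) serves as the asymmetry index IK, and the K maximizing it is reported. The scores below are random placeholders, not output of the method.

# Steps 9-11: percentile-based asymmetry index per candidate K, then the K
# with the maximal index is reported as the "true" number of clusters.
import numpy as np

rng = np.random.default_rng(3)
K_star, J = 7, 100
scores = {K: rng.normal(size=J) for K in range(2, K_star + 1)}   # placeholders for the Y~_j(K)

percentile = 25                                   # e.g. 25%, 75% or 90% in the experiments
index = {K: np.percentile(Y, percentile) for K, Y in scores.items()}
best_K = max(index, key=index.get)
print(index, best_K)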
Numerical Experiments
We have carried out various numerical experiments on synthetic
and real data sets. We choose K* = 7 in all tests, and we perform
10 trials for each experiment.
The results are presented via error-bar plots of the sample
percentiles’ means within the trials. The sizes of the error bars
equal two standard deviations computed within the trials.
The standard version of the Partitioning Around Medoids (PAM)
algorithm has been used for clustering.
The empirical percentiles of 25%, 75% and 90% have been used
as the asymmetry indexes.
Numerical Experiments – Synthetic Data
The synthesized data are mixtures of 2-dimensional
Gaussian distributions with independent coordinates
having the same standard deviation σ.
The mean values of the components are placed on the
unit circle, at an angular distance of 2π/k̂ between neighbors.

Each data set contains 4000 items.
Here, we took J = 100 (J: number of samples) and
m = 200 (m: size of the samples).

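A hedged sketch of this generator (function and parameter names are ours): k̂ two-dimensional Gaussian components with common standard deviation σ, means equally spaced on the unit circle.

# Synthetic data: mixture of k_hat 2-dimensional Gaussians with independent
# coordinates, common standard deviation sigma, and means on the unit circle.
import numpy as np

def make_mixture(k_hat, sigma, n_total=4000, seed=0):
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(k_hat) / k_hat         # neighboring distance 2*pi/k_hat
    means = np.column_stack([np.cos(angles), np.sin(angles)])
    sizes = np.full(k_hat, n_total // k_hat)
    sizes[: n_total - sizes.sum()] += 1                   # distribute the remainder
    return np.vstack([rng.normal(means[i], sigma, size=(sizes[i], 2))
                      for i in range(k_hat)])

X = make_mixture(k_hat=4, sigma=0.3)                      # the setting of Example 1
print(X.shape)                                            # (4000, 2)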
Synthetic Data - Example 1
The first data set has the parameters k̂ = 4 and σ = 0.3.




As we see, all three indexes clearly indicate
four clusters.
Synthetic Data - Example 2
The second synthetic data set has the parameters k̂ = 5
and σ = 0.3.




The components are obviously overlapping in this case.
Synthetic Data - Example 2




As can be seen, the true number of clusters has been
successfully found by all indexes.
Numerical Experiments – Real-World Data
  First Data Sets
The first real data set was chosen from the text collection
http://ftp.cs.cornell.edu/pub/smart/ .

This set consists of the following three sub-collections:
DC0: Medlars Collection (1033 medical abstracts),
DC1: CISI Collection (1460 information science abstracts),
DC2: Cranfield Collection (1400 aerodynamics abstracts).


Numerical Experiments – Real-World Data
  First Data Sets
We picked the 600 “best” terms, following the common
bag-of-words method.
It is known that this collection is well separated
by means of its two leading principal components.
Here, we also took J = 100 and m = 200.




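A hedged sketch of this preprocessing (the toy documents merely stand in for the Medlars/CISI/Cranfield abstracts; names are illustrative): a bag-of-words representation restricted to the most frequent terms, projected onto the two leading principal components.

# Bag-of-words features limited to the most frequent terms, then a projection
# onto the two leading principal components.
# Assumption: the toy documents stand in for the real abstract collections.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA

docs = [
    "treatment of patients with chronic disease",      # medical-like
    "information retrieval and library indexing",      # information-science-like
    "boundary layer flow over a wing at high speed",   # aerodynamics-like
]

# With the real collection, max_features=600 keeps the 600 most frequent terms.
X = CountVectorizer(max_features=600).fit_transform(docs).toarray()
X2 = PCA(n_components=2).fit_transform(X)
print(X2.shape)     # (n_documents, 2): coordinates in the two leading components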
Real-World Data - First Data Sets




All the indexes attain their maximal values at K = 3,
i.e., the number of clusters is properly determined.
Numerical Experiments – Real-World Data
  Second Data Set
Another data set considered is the famous
Iris Flower Data Set, available, for example, at
http://archive.ics.uci.edu/ml/datasets/Iris .
This data set is composed of 150 4-dimensional
feature vectors of three equally sized sets of iris flowers.
We choose J = 200, and the sample size equals m = 70.


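A hedged sketch of the sampling setup for this experiment, using scikit-learn's bundled copy of the Iris data for convenience:

# Iris experiment setup: draw two disjoint samples of size m = 70,
# without replacement, from the 150 four-dimensional feature vectors.
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                      # shape (150, 4)
rng = np.random.default_rng(0)

m = 70
idx = rng.permutation(len(X))
S1, S2 = X[idx[:m]], X[idx[m:2 * m]]      # S2 is drawn from X \ S1
print(S1.shape, S2.shape)                 # (70, 4) (70, 4)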
Real-World Data – Iris Flower Data Set




Our method reveals a three-cluster structure.
Conclusions -
 The Rationale of Our Approach
• In this paper, we propose a novel approach, based on
  the minimal spanning tree two-sample test, for
  cluster stability assessment.
• The method quantifies the partitions’ features
  through the test statistic computed within the clusters
  built by means of sample pairs.
• The worst cluster, determined by the lowest
  standardized statistic value, characterizes the
  partition quality.

Conclusions -
 The Rationale of Our Approach
• The departure from the theoretical model, which
  suggests well-mingled samples within the clusters,
  is described by the left tail of the score distribution.
• The shortest tail corresponds to the “true” number
  of clusters.
• All presented experiments detect the true number
  of clusters.



Conclusions

• In the case of the five-component Gaussian data set,
  the true number of clusters was found even though
  the clusters overlap to a certain extent.
• The four-component Gaussian data set contains
  sufficiently separated components. Therefore,
  it is not surprising that the true number of clusters
  is recovered here.




Conclusions

• The analysis of the abstracts data set was carried out
  with 600 terms, and the true number of clusters
  was also detected.
• The Iris Flower data set is rather difficult to
  analyze because two of its clusters are not
  linearly separable. However, the true number
  of clusters was found here as well.




References
Barzily, Z., Volkovich, Z.V., Akteke-Öztürk, B., and Weber, G.-W., Cluster stability using minimal spanning trees,
ISI Proceedings of 20th Mini-EURO Conference Continuous Optimization and Knowledge-Based Technologies
(Neringa, Lithuania, May 20-23, 2008) 248-252.

Barzily, Z., Volkovich, Z.V., Akteke-Öztürk, B., and Weber, G.-W., On a minimal spanning tree approach in the
cluster validation problem, to appear in the special issue of INFORMATICA at the occasion of 20th Mini-EURO
Conference Continuous Optimization and Knowledge Based Technologies (Neringa, Lithuania, May 20-23, 2008),
Dzemyda, G., Miettinen, K., and Sakalauskas, L., guest editors.

Volkovich, V., Barzily, Z., Weber, G.-W., and Toledano-Kitai, D., Cluster stability estimation based on a minimal
spanning trees approach, Proceedings of the Second Global Conference on Power Control and Optimization, AIP
Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics; ISBN
978-0-7354-0696-4 (August 2009) 299-305; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.





Weitere ähnliche Inhalte

Was ist angesagt?

A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...IOSRJVSP
 
Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...zukun
 
Machine learning in science and industry — day 2
Machine learning in science and industry — day 2Machine learning in science and industry — day 2
Machine learning in science and industry — day 2arogozhnikov
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1arogozhnikov
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_applicationCemal Ardil
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practicetuxette
 
00463517b1e90c1e63000000
00463517b1e90c1e6300000000463517b1e90c1e63000000
00463517b1e90c1e63000000Ivonne Liu
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Mostafa G. M. Mostafa
 
Fuzzy entropy based optimal
Fuzzy entropy based optimalFuzzy entropy based optimal
Fuzzy entropy based optimalijsc
 
Machine learning in science and industry — day 3
Machine learning in science and industry — day 3Machine learning in science and industry — day 3
Machine learning in science and industry — day 3arogozhnikov
 
Steganographic Scheme Based on Message-Cover matching
Steganographic Scheme Based on Message-Cover matchingSteganographic Scheme Based on Message-Cover matching
Steganographic Scheme Based on Message-Cover matchingIJECEIAES
 
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task LearningMasahiro Suzuki
 
Image Denoising Based On Sparse Representation In A Probabilistic Framework
Image Denoising Based On Sparse Representation In A Probabilistic FrameworkImage Denoising Based On Sparse Representation In A Probabilistic Framework
Image Denoising Based On Sparse Representation In A Probabilistic FrameworkCSCJournals
 
Robust Image Denoising in RKHS via Orthogonal Matching Pursuit
Robust Image Denoising in RKHS via Orthogonal Matching PursuitRobust Image Denoising in RKHS via Orthogonal Matching Pursuit
Robust Image Denoising in RKHS via Orthogonal Matching PursuitPantelis Bouboulis
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural NetworksMasahiro Suzuki
 
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...wl820609
 

Was ist angesagt? (20)

A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
 
Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...
 
Machine learning in science and industry — day 2
Machine learning in science and industry — day 2Machine learning in science and industry — day 2
Machine learning in science and industry — day 2
 
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_application
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
 
00463517b1e90c1e63000000
00463517b1e90c1e6300000000463517b1e90c1e63000000
00463517b1e90c1e63000000
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)
 
Fuzzy entropy based optimal
Fuzzy entropy based optimalFuzzy entropy based optimal
Fuzzy entropy based optimal
 
Machine learning in science and industry — day 3
Machine learning in science and industry — day 3Machine learning in science and industry — day 3
Machine learning in science and industry — day 3
 
Steganographic Scheme Based on Message-Cover matching
Steganographic Scheme Based on Message-Cover matchingSteganographic Scheme Based on Message-Cover matching
Steganographic Scheme Based on Message-Cover matching
 
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
 
Image Denoising Based On Sparse Representation In A Probabilistic Framework
Image Denoising Based On Sparse Representation In A Probabilistic FrameworkImage Denoising Based On Sparse Representation In A Probabilistic Framework
Image Denoising Based On Sparse Representation In A Probabilistic Framework
 
Robust Image Denoising in RKHS via Orthogonal Matching Pursuit
Robust Image Denoising in RKHS via Orthogonal Matching PursuitRobust Image Denoising in RKHS via Orthogonal Matching Pursuit
Robust Image Denoising in RKHS via Orthogonal Matching Pursuit
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks
 
Iclr2016 vaeまとめ
Iclr2016 vaeまとめIclr2016 vaeまとめ
Iclr2016 vaeまとめ
 
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
 

Andere mochten auch

Lesson 4: Calculating Limits (Section 41 slides)
Lesson 4: Calculating Limits (Section 41 slides)Lesson 4: Calculating Limits (Section 41 slides)
Lesson 4: Calculating Limits (Section 41 slides)Mel Anthony Pepito
 
Lesson 26: Evaluating Definite Integrals
Lesson 26: Evaluating Definite IntegralsLesson 26: Evaluating Definite Integrals
Lesson 26: Evaluating Definite IntegralsMel Anthony Pepito
 
Lesson 17: Indeterminate Forms and L'Hôpital's Rule
Lesson 17: Indeterminate Forms and L'Hôpital's RuleLesson 17: Indeterminate Forms and L'Hôpital's Rule
Lesson 17: Indeterminate Forms and L'Hôpital's RuleMel Anthony Pepito
 
Lesson 16: Inverse Trigonometric Functions (Section 041 slides)
Lesson 16: Inverse Trigonometric Functions (Section 041 slides)Lesson 16: Inverse Trigonometric Functions (Section 041 slides)
Lesson 16: Inverse Trigonometric Functions (Section 041 slides)Mel Anthony Pepito
 
Lesson 11: Implicit Differentiation
Lesson 11: Implicit DifferentiationLesson 11: Implicit Differentiation
Lesson 11: Implicit DifferentiationMel Anthony Pepito
 
Lesson 27: Integration by Substitution (Section 041 slides)
Lesson 27: Integration by Substitution (Section 041 slides)Lesson 27: Integration by Substitution (Section 041 slides)
Lesson 27: Integration by Substitution (Section 041 slides)Mel Anthony Pepito
 
Lesson 22: Optimization II (Section 021 slides)
Lesson 22: Optimization II (Section 021 slides)Lesson 22: Optimization II (Section 021 slides)
Lesson 22: Optimization II (Section 021 slides)Mel Anthony Pepito
 
Lesson 8: Basic Differentiation Rules (Section 41 slides)
Lesson 8: Basic Differentiation Rules (Section 41 slides) Lesson 8: Basic Differentiation Rules (Section 41 slides)
Lesson 8: Basic Differentiation Rules (Section 41 slides) Mel Anthony Pepito
 
Lesson 22: Optimization (Section 021 slides)
Lesson 22: Optimization (Section 021 slides)Lesson 22: Optimization (Section 021 slides)
Lesson 22: Optimization (Section 021 slides)Mel Anthony Pepito
 
Lesson 13: Related Rates Problems
Lesson 13: Related Rates ProblemsLesson 13: Related Rates Problems
Lesson 13: Related Rates ProblemsMel Anthony Pepito
 
Lesson 8: Basic Differentiation Rules (Section 21 slides)
Lesson 8: Basic Differentiation Rules (Section 21 slides) Lesson 8: Basic Differentiation Rules (Section 21 slides)
Lesson 8: Basic Differentiation Rules (Section 21 slides) Mel Anthony Pepito
 
Lesson 3: Limits (Section 21 slides)
Lesson 3: Limits (Section 21 slides)Lesson 3: Limits (Section 21 slides)
Lesson 3: Limits (Section 21 slides)Mel Anthony Pepito
 
Lesson 6: Limits Involving ∞ (Section 21 slides)
Lesson 6: Limits Involving ∞ (Section 21 slides)Lesson 6: Limits Involving ∞ (Section 21 slides)
Lesson 6: Limits Involving ∞ (Section 21 slides)Mel Anthony Pepito
 
Lesson 12: Linear Approximation
Lesson 12: Linear ApproximationLesson 12: Linear Approximation
Lesson 12: Linear ApproximationMel Anthony Pepito
 
Lesson 10: The Chain Rule (Section 41 slides)
Lesson 10: The Chain Rule (Section 41 slides)Lesson 10: The Chain Rule (Section 41 slides)
Lesson 10: The Chain Rule (Section 41 slides)Mel Anthony Pepito
 
Lesson18 -maximum_and_minimum_values_slides
Lesson18 -maximum_and_minimum_values_slidesLesson18 -maximum_and_minimum_values_slides
Lesson18 -maximum_and_minimum_values_slidesMel Anthony Pepito
 

Andere mochten auch (20)

Lesson 21: Curve Sketching
Lesson 21: Curve SketchingLesson 21: Curve Sketching
Lesson 21: Curve Sketching
 
Lesson 24: Area and Distances
Lesson 24: Area and DistancesLesson 24: Area and Distances
Lesson 24: Area and Distances
 
Lesson 4: Calculating Limits (Section 41 slides)
Lesson 4: Calculating Limits (Section 41 slides)Lesson 4: Calculating Limits (Section 41 slides)
Lesson 4: Calculating Limits (Section 41 slides)
 
Lesson 26: Evaluating Definite Integrals
Lesson 26: Evaluating Definite IntegralsLesson 26: Evaluating Definite Integrals
Lesson 26: Evaluating Definite Integrals
 
Lesson 17: Indeterminate Forms and L'Hôpital's Rule
Lesson 17: Indeterminate Forms and L'Hôpital's RuleLesson 17: Indeterminate Forms and L'Hôpital's Rule
Lesson 17: Indeterminate Forms and L'Hôpital's Rule
 
Lesson 16: Inverse Trigonometric Functions (Section 041 slides)
Lesson 16: Inverse Trigonometric Functions (Section 041 slides)Lesson 16: Inverse Trigonometric Functions (Section 041 slides)
Lesson 16: Inverse Trigonometric Functions (Section 041 slides)
 
Lecture7
Lecture7Lecture7
Lecture7
 
Lesson 11: Implicit Differentiation
Lesson 11: Implicit DifferentiationLesson 11: Implicit Differentiation
Lesson 11: Implicit Differentiation
 
Lesson 27: Integration by Substitution (Section 041 slides)
Lesson 27: Integration by Substitution (Section 041 slides)Lesson 27: Integration by Substitution (Section 041 slides)
Lesson 27: Integration by Substitution (Section 041 slides)
 
Lesson 22: Optimization II (Section 021 slides)
Lesson 22: Optimization II (Section 021 slides)Lesson 22: Optimization II (Section 021 slides)
Lesson 22: Optimization II (Section 021 slides)
 
Lesson 8: Basic Differentiation Rules (Section 41 slides)
Lesson 8: Basic Differentiation Rules (Section 41 slides) Lesson 8: Basic Differentiation Rules (Section 41 slides)
Lesson 8: Basic Differentiation Rules (Section 41 slides)
 
Introduction
IntroductionIntroduction
Introduction
 
Lesson 22: Optimization (Section 021 slides)
Lesson 22: Optimization (Section 021 slides)Lesson 22: Optimization (Section 021 slides)
Lesson 22: Optimization (Section 021 slides)
 
Lesson 13: Related Rates Problems
Lesson 13: Related Rates ProblemsLesson 13: Related Rates Problems
Lesson 13: Related Rates Problems
 
Lesson 8: Basic Differentiation Rules (Section 21 slides)
Lesson 8: Basic Differentiation Rules (Section 21 slides) Lesson 8: Basic Differentiation Rules (Section 21 slides)
Lesson 8: Basic Differentiation Rules (Section 21 slides)
 
Lesson 3: Limits (Section 21 slides)
Lesson 3: Limits (Section 21 slides)Lesson 3: Limits (Section 21 slides)
Lesson 3: Limits (Section 21 slides)
 
Lesson 6: Limits Involving ∞ (Section 21 slides)
Lesson 6: Limits Involving ∞ (Section 21 slides)Lesson 6: Limits Involving ∞ (Section 21 slides)
Lesson 6: Limits Involving ∞ (Section 21 slides)
 
Lesson 12: Linear Approximation
Lesson 12: Linear ApproximationLesson 12: Linear Approximation
Lesson 12: Linear Approximation
 
Lesson 10: The Chain Rule (Section 41 slides)
Lesson 10: The Chain Rule (Section 41 slides)Lesson 10: The Chain Rule (Section 41 slides)
Lesson 10: The Chain Rule (Section 41 slides)
 
Lesson18 -maximum_and_minimum_values_slides
Lesson18 -maximum_and_minimum_values_slidesLesson18 -maximum_and_minimum_values_slides
Lesson18 -maximum_and_minimum_values_slides
 

Ähnlich wie Methods from Mathematical Data Mining (Supported by Optimization)

11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.pptSueMiu
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptSubrata Kumer Paul
 
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...Salah Amean
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
Kinetic bands versus Bollinger Bands
Kinetic bands versus Bollinger  BandsKinetic bands versus Bollinger  Bands
Kinetic bands versus Bollinger BandsAlexandru Daia
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptSubrata Kumer Paul
 
Image segmentation
Image segmentationImage segmentation
Image segmentationkhyati gupta
 
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptxthanhdowork
 
L4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics CourseL4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics CourseMohaiminur Rahman
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...butest
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...IRJET Journal
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...IJDKP
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 

Ähnlich wie Methods from Mathematical Data Mining (Supported by Optimization) (20)

11 clusadvanced
11 clusadvanced11 clusadvanced
11 clusadvanced
 
11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.ppt
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clustering
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.ppt
 
50120130406039
5012013040603950120130406039
50120130406039
 
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
Kinetic bands versus Bollinger Bands
Kinetic bands versus Bollinger  BandsKinetic bands versus Bollinger  Bands
Kinetic bands versus Bollinger Bands
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
 
L4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics CourseL4 cluster analysis NWU 4.3 Graphics Course
L4 cluster analysis NWU 4.3 Graphics Course
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 

Mehr von SSA KPI

Germany presentation
Germany presentationGermany presentation
Germany presentationSSA KPI
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energySSA KPI
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainabilitySSA KPI
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentSSA KPI
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering educationSSA KPI
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginersSSA KPI
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011SSA KPI
 
Talking with money
Talking with moneyTalking with money
Talking with moneySSA KPI
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investmentSSA KPI
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesSSA KPI
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice gamesSSA KPI
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security CostsSSA KPI
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsSSA KPI
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5SSA KPI
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4SSA KPI
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3SSA KPI
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2SSA KPI
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1SSA KPI
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biologySSA KPI
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsSSA KPI
 

Mehr von SSA KPI (20)

Germany presentation
Germany presentationGermany presentation
Germany presentation
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energy
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainability
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable development
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering education
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginers
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011
 
Talking with money
Talking with moneyTalking with money
Talking with money
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investment
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice games
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security Costs
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biology
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functions
 

Kürzlich hochgeladen

Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Methods from Mathematical Data Mining (Supported by Optimization)

  • 7. Clustering
The iterative clustering process is usually carried out in two phases: a partitioning phase and a quality-assessment phase.
In the partitioning phase, a label is assigned to each element, based on the assumption that, in addition to the observed features, each data item has a hidden, unobserved feature representing its cluster membership.
The quality-assessment phase measures the grouping quality. The outcome of the clustering process is the partition that achieves the highest quality score.
Besides the data itself, two essential input parameters are typically required: an initial partition and a suggested number of clusters. Here, these parameters are denoted as
• int-part;
• k.
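As a toy illustration (not part of the slides), the sketch below shows how these two inputs appear in a standard library routine: scikit-learn's k-means stands in for the generic algorithm CL, an array of seed points plays the role of int-part, and n_clusters plays the role of k.

```python
# Toy illustration: k-means stands in for the generic clustering routine CL;
# the seed points play the role of int-part, n_clusters the role of k.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
S = rng.normal(size=(300, 2))                           # a toy data set S in IR^2
k = 3                                                   # suggested number of clusters
seeds = S[rng.choice(len(S), size=k, replace=False)]    # plays the role of int-part

labels = KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(S)
# labels[i] = j means that the i-th point belongs to cluster pi_j(S)
```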
  • 8. The Problem
Partitions generated by iterative algorithms are commonly sensitive to the initial partition fed in as an input parameter. Selecting a "good" initial partition is therefore an essential clustering problem.
Another problem arising here is choosing the right number of clusters. It is well known that this key task of cluster analysis is ill-posed: for instance, the "correct" number of clusters in a data set can depend on the scale on which the data are measured.
In this talk, we address the latter problem, the determination of the number of clusters.
  • 10. The Problem
Many approaches to this problem exploit the within-cluster dispersion matrix (defined by analogy with a covariance matrix). A scalar summary of this dispersion usually decreases as the number of groups rises and may have a point at which it "falls"; such an "elbow" in the graph locates, in several known methods, the "true" number of clusters.
Stability-based approaches to the cluster-validation problem evaluate the variability of partitions under repeated applications of a clustering algorithm. Low variability is understood as high consistency of the results obtained, and the number of clusters that maximizes cluster stability is accepted as an estimate of the "true" number of clusters.
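For illustration, a minimal sketch of the "elbow" idea, assuming the total within-cluster dispersion (k-means inertia) as the scalar summary; the function name and the use of k-means are our choices, not the slides'.

```python
from sklearn.cluster import KMeans

def within_dispersion_curve(X, k_max=7, seed=0):
    """Total within-cluster dispersion (k-means inertia) for k = 1, ..., k_max;
    the 'elbow' is the k after which the curve stops falling steeply."""
    return [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
            for k in range(1, k_max + 1)]
```

In practice the elbow is usually read off a plot of this curve rather than computed automatically.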
  • 11. The Concept
In the current talk, the problem of determining the true number of clusters is addressed by the cluster-stability approach. We propose a method for the study of cluster stability that assesses the geometrical stability of a partition.
• We draw samples from the source data and estimate the clusters by means of each of the drawn samples.
• We compare pairs of the partitions obtained.
• A pair is considered consistent if the divisions obtained are close.
  • 12. The Concept
• We quantify this closeness by the number of edges connecting points from different samples in a minimal spanning tree (MST) constructed for each one of the clusters.
• We use the Friedman and Rafsky two-sample test statistic, which measures these quantities. Under the null hypothesis of homogeneity of the source data, this statistic is approximately normally distributed. Thus, well-mingled samples within the clusters lead to an approximately normal distribution of the considered statistic.
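A small sketch of this edge count, assuming Euclidean distances and SciPy's minimum-spanning-tree routine; the helper name cross_edge_count is ours.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def cross_edge_count(A, B):
    """Number of MST edges joining a point of sample A to a point of sample B,
    where the MST is built on the pooled points of A and B (Euclidean distances)."""
    pooled = np.vstack([A, B])
    origin = np.array([0] * len(A) + [1] * len(B))   # which sample each point came from
    mst = minimum_spanning_tree(distance_matrix(pooled, pooled))
    rows, cols = mst.nonzero()                       # the MST edges (i, j)
    return int(np.sum(origin[rows] != origin[cols]))
```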
  • 13. The Concept
Examples of MSTs produced by samples within a cluster: [two figures of spanning trees, described on the next slide].
  • 14. The Concept
The left-hand picture is an example of a "good cluster", where the number of edges connecting points from different samples (marked by solid red lines) is relatively large. The right-hand picture depicts a "poor situation", in which only one (long) edge connects the (sub-)clusters.
  • 15. The Two-Sample MST Test
Henze and Penrose (1999) studied the asymptotic behavior of R_mn, the number of edges of the MST (built on S ∪ T) which connect a point of S to a point of T.
Suppose that |S| = m → ∞ and |T| = n → ∞ such that m/(m+n) → p ∈ (0, 1). Introducing q = 1 − p and r = 2pq, they obtained
    ( R_mn − 2mn/(m+n) ) / √(m+n)  →  N(0, σ_d²),
where the convergence is in distribution and N(0, σ_d²) denotes the normal distribution with expectation 0 and variance
    σ_d² := r ( r + C_d (1 − 2r) ),
for some constant C_d depending only on the space's dimension d.
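As a rough sanity check (our own, not from the slides), one can draw both samples from the same distribution and compare the average of R_mn with the centering term 2mn/(m+n), reusing the cross_edge_count sketch above.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d = 150, 100, 2
R = [cross_edge_count(rng.normal(size=(m, d)), rng.normal(size=(n, d)))
     for _ in range(200)]
print(np.mean(R), 2 * m * n / (m + n))   # under homogeneity the two values should be close
```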
  • 16. Concept
• Resting upon this fact, the standard score of the above edge count is calculated for each cluster j = 1, …, K:
    Y_j := √(2K/m) · ( R_j − m/K ),
where m is the sample size and K denotes the number of clusters.
• The partition quality Ỹ is represented by the worst cluster, i.e., by the minimal standard-score value obtained.
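Assuming the score takes the form Y_j = √(2K/m)·(R_j − m/K) as written above, a minimal helper might look as follows; the name partition_quality is ours.

```python
import numpy as np

def partition_quality(R_counts, m):
    """Standard scores Y_j = sqrt(2K/m) * (R_j - m/K) for the K clusters,
    and the partition quality as the minimal (worst) score."""
    K = len(R_counts)
    Y = np.sqrt(2.0 * K / m) * (np.asarray(R_counts, dtype=float) - m / K)
    return Y, float(Y.min())
```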
  • 17. Concept
• It is natural to expect that the true number of clusters is characterized by the empirical distribution of the partition standard score having the shortest left tail.
• The proposed methodology therefore consists of sequentially constructing this distribution for each candidate number of clusters and estimating its left asymmetry.
  • 18. Concept
One important problem appearing here is the so-called cluster-coordination problem: the same cluster can be tagged differently across repeated reruns of the algorithm. This results from the inherent symmetry of partitions with respect to their cluster labels.
  • 19. Concept
We solve this problem in the following way. Let S = S_1 ∪ S_2 and consider three categorizations:
    Π_K := Cl(S, K),   Π_K,1 := Cl(S_1, K),   Π_K,2 := Cl(S_2, K).
Thus we get two partitions for each of the samples S_i, i = 1, 2: the first one is induced by Π_K, and the second one is Π_K,i, i = 1, 2.
  • 20. Concept
For each of the samples i = 1, 2, our purpose is to find the permutation ψ of the set {1, …, K} which minimizes the number of misclassified items:
    ψ_i* = arg min_ψ  Σ_{x ∈ S_i}  I( ψ(α_K,i(x)) ≠ α_K^(i)(x) ),   i = 1, 2,
where I(z) is the indicator function of the event z, and α_K,i, α_K^(i) are the cluster assignments defined by Π_K,i and Π_K, correspondingly.
  • 21. Concept
The well-known Hungarian method solves this assignment problem with computational complexity O(K³). After relabeling the clusters of the partitions Π_K,i, i = 1, 2, according to ψ_i*, i = 1, 2, we can assume that these partitions are coordinated, i.e., that the clusters are consistently designated.
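A compact sketch of this coordination step, assuming SciPy's linear_sum_assignment (a Hungarian-type solver) and that both label vectors refer to the same points; the function and variable names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def coordinate_labels(labels_ref, labels_other, K):
    """Relabel `labels_other` (both label vectors refer to the same points)
    so that it agrees with `labels_ref` as much as possible."""
    confusion = np.zeros((K, K), dtype=int)          # confusion[a, b]: ref label a, other label b
    for a, b in zip(labels_ref, labels_other):
        confusion[a, b] += 1
    rows, cols = linear_sum_assignment(-confusion)   # maximize agreement = minimize misclassified items
    psi = np.empty(K, dtype=int)                     # the permutation psi: other label -> ref label
    psi[cols] = rows
    return psi[labels_other]
```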
  • 22. Algorithm
1. Choose the parameters K*, J, m, and the clustering algorithm Cl.
2. For K = 2 to K*:
3.   For j = 1 to J:
4.     S_j,1 = sample(X, m);  S_j,2 = sample(X \ S_j,1, m).
5.     Calculate Π_K,j = Cl(S_j, K), Π_K,j,1 = Cl(S_j,1, K), Π_K,j,2 = Cl(S_j,2, K), where S_j = S_j,1 ∪ S_j,2.
6.     Solve the coordination problem.
  • 23. Algorithm
7.     Calculate Y_j(k), k = 1, …, K, and Ỹ_j^(K).
8.   end for j
9.   Calculate an asymmetry index (percentile) I_K for { Ỹ_j^(K) | j = 1, …, J }.
10. end for K
11. The "true" number of clusters is selected as the K which yields the maximal value of the index.
Here, sample(S, m) is a procedure which selects a random sample of size m from the set S, without replacement.
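Putting the pieces together, a simplified end-to-end sketch of this loop might look as follows. It uses k-means instead of PAM, clusters the pooled sample directly (so the coordination step is not needed), and reuses the cross_edge_count and partition_quality sketches above; it is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_cluster_number(X, K_star=7, J=50, m=200, percentile=25, seed=0):
    """Simplified sketch of the loop above: k-means instead of PAM, and the
    pooled sample is clustered directly, so no label coordination is needed."""
    rng = np.random.default_rng(seed)
    index = {}
    for K in range(2, K_star + 1):
        scores = []
        for _ in range(J):
            idx = rng.choice(len(X), size=2 * m, replace=False)    # two disjoint samples
            S1, S2 = X[idx[:m]], X[idx[m:]]
            labels = KMeans(n_clusters=K, n_init=5).fit_predict(np.vstack([S1, S2]))
            R = [cross_edge_count(S1[labels[:m] == j], S2[labels[m:] == j])
                 for j in range(K)]
            scores.append(partition_quality(R, m)[1])              # the worst-cluster score
        index[K] = np.percentile(scores, percentile)               # left-tail asymmetry index
    return max(index, key=index.get)                               # K with the maximal index
```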
  • 24. Numerical Experiments
We have carried out various numerical experiments on synthetic and real data sets. We chose K* = 7 in all tests and performed 10 trials for each experiment. The results are presented via error-bar plots of the mean of the sample percentiles over the trials; the error bars span two standard deviations computed over the trials. The standard version of the Partitioning Around Medoids (PAM) algorithm was used for clustering. The empirical 25%, 75%, and 90% percentiles were used as asymmetry indexes.
  • 25. Numerical Experiments – Synthetic Data
The synthesized data are mixtures of 2-dimensional Gaussian distributions with independent coordinates having the same standard deviation σ. The component means are placed on the unit circle, with neighboring means separated by the angle 2π/k̂. Each data set contains 4000 items. Here we took J = 100 (number of drawn samples) and m = 200 (sample size).
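A small generator for data of this kind (our own sketch, following the description above) might be:

```python
import numpy as np

def gaussian_ring_mixture(k_hat=4, sigma=0.3, n_items=4000, seed=0):
    """2-D Gaussian mixture: component means on the unit circle, neighbouring
    means an angle 2*pi/k_hat apart, isotropic std sigma in each coordinate."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(k_hat) / k_hat
    means = np.column_stack([np.cos(angles), np.sin(angles)])
    comp = rng.integers(k_hat, size=n_items)         # component of each item
    return means[comp] + sigma * rng.normal(size=(n_items, 2))
```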
  • 26. Synthetic Data - Example 1
The first data set has the parameters k̂ = 4 and σ = 0.3. As can be seen, all three indexes clearly indicate four clusters.
  • 27. Synthetic Data - Example 2
The second synthetic data set has the parameters k̂ = 5 and σ = 0.3. The components clearly overlap in this case.
  • 28. Synthetic Data - Example 2
As can be seen, the true number of clusters has been successfully found by all indexes.
  • 29. Numerical Experiments – Real-World Data: First Data Set
The first real data set was chosen from the text collection at http://ftp.cs.cornell.edu/pub/smart/ . This set consists of the following three sub-collections:
DC0: Medlars Collection (1033 medical abstracts),
DC1: CISI Collection (1460 information science abstracts),
DC2: Cranfield Collection (1400 aerodynamics abstracts).
  • 30. Numerical Experiments – Real-World Data: First Data Set
We picked the 600 "best" terms, following the common bag-of-words method. It is known that this collection is well separated by means of its first two leading principal components. Here we again took J = 100 and m = 200.
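For illustration, one way to obtain a 600-term bag-of-words representation with scikit-learn; note that max_features keeps the 600 most frequent terms, which need not coincide with the authors' "best"-term criterion, and the document list below is a placeholder.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["abstract text ...", "another abstract ..."]   # placeholders for the 3893 abstracts
X = TfidfVectorizer(max_features=600, stop_words="english").fit_transform(docs)
# X: documents x (at most 600) terms, ready for clustering or PCA
```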
  • 31. Real-World Data - First Data Set
All the indexes attain their maximal values at K = 3, i.e., the number of clusters is properly determined.
  • 32. Numerical Experiments – Real-World Data: Second Data Set
Another data set considered is the famous Iris Flower Data Set, available, for example, at http://archive.ics.uci.edu/ml/datasets/Iris . This data set consists of 150 four-dimensional feature vectors describing three equally sized classes of iris flowers. We chose J = 200 and a sample size of m = 70.
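The Iris data are also bundled with scikit-learn, so a quick way to reproduce this setting (assuming the estimate_cluster_number sketch above) is:

```python
from sklearn.datasets import load_iris

X = load_iris().data                                        # 150 x 4 feature matrix
# e.g.: estimate_cluster_number(X, K_star=7, J=200, m=70)   # sketch defined above
```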
  • 33. Real-World Data – Iris Flower Data Set
Our method reveals a three-cluster structure.
  • 34. Conclusions - The Rationale of Our Approach
• In this work, we propose a novel approach to cluster-stability assessment based on the minimal-spanning-tree two-sample test.
• The method quantifies a partition's quality through the test statistic computed within the clusters built from pairs of samples.
• The worst cluster, determined by the lowest standardized statistic value, characterizes the partition quality.
  • 35. Conclusions - The Rationale of Our Approach
• The departure from the theoretical model, which assumes well-mingled samples within the clusters, is described by the left tail of the score distribution.
• The shortest left tail corresponds to the "true" number of clusters.
• All presented experiments detect the true number of clusters.
  • 36. Conclusions
• In the case of the five-component Gaussian data set, the true number of clusters was found even though the clusters overlap to a certain extent.
• The four-component Gaussian data set contains well-separated components; it is therefore no surprise that the true number of clusters is attained here.
  • 37. Conclusions
• The analysis of the abstracts data set was carried out with 600 terms, and the true number of clusters was also detected.
• The Iris Flower data set is rather difficult to analyze because two of its clusters are not linearly separable; however, the true number of clusters was found here as well.
  • 38. References
Barzily, Z., Volkovich, Z.V., Akteke-Öztürk, B., and Weber, G.-W., Cluster stability using minimal spanning trees, ISI Proceedings of the 20th Mini-EURO Conference "Continuous Optimization and Knowledge-Based Technologies" (Neringa, Lithuania, May 20-23, 2008), 248-252.
Barzily, Z., Volkovich, Z.V., Akteke-Öztürk, B., and Weber, G.-W., On a minimal spanning tree approach in the cluster validation problem, to appear in the special issue of INFORMATICA on the occasion of the 20th Mini-EURO Conference "Continuous Optimization and Knowledge-Based Technologies" (Neringa, Lithuania, May 20-23, 2008), Dzemyda, G., Miettinen, K., and Sakalauskas, L., guest editors.
Volkovich, V., Barzily, Z., Weber, G.-W., and Toledano-Kitai, D., Cluster stability estimation based on a minimal spanning trees approach, Proceedings of the Second Global Conference on Power Control and Optimization (Bali, Indonesia, June 1-3, 2009), AIP Conference Proceedings 1159, Subseries: Mathematical and Statistical Physics, ISBN 978-0-7354-0696-4 (August 2009), 299-305; Hakim, A.H., Vasant, P., and Barsoum, N., guest editors.