Advanced Information Theory in CVPR “in a Nutshell”
CVPR Tutorial
June 13-18 2010
San Francisco, CA
Gaussian Mixtures:
Classification & PDF Estimation

Francisco Escolano & Anand Rangarajan
Gaussian Mixtures

Background. Gaussian Mixtures are ubiquitous in CVPR. For
instance, in CBIR it is sometimes interesting to model the image as a
pdf over the pixel colors and positions (see for instance [Goldberger et
al.,03], where a KL-divergence computation method is presented).
GMs often provide a model for the pdf associated with the image,
which is useful for segmentation. GMs, as we have seen in the
previous lesson, are also useful for modeling shapes.
Therefore GM estimation has been a recurrent topic in CVPR.
Traditional methods, based on the EM algorithm, have evolved
to incorporate IT elements like the MDL principle for model-order
selection [Figueiredo et al.,02], in parallel with the development of
Variational Bayes (VB) [Constantinopoulos and Likas,07].

                                                                           2/43
Uses of Gaussian Mixtures




Figure: Gaussian Mixtures for modeling images (top) and for color-based
segmentation (bottom)

                                                                          3/43
Review of Gaussian Mixtures

Definition
A d-dimensional random variable Y follows a finite-mixture
distribution when its pdf p(Y |Θ) can be described by a weighted
sum of known pdfs named kernels. When all of these kernels are
Gaussian, the mixture is named in the same way:
                       p(Y |Θ) = Σ_{i=1}^K πi p(Y |Θi ),

where 0 ≤ πi ≤ 1 for i = 1, . . . , K , Σ_{i=1}^K πi = 1, K is the number of
kernels, π1 , . . . , πK are the a priori probabilities of each kernel, and
Θi are the parameters describing the kernel. In GMs, Θi = {µi , Σi },
that is, the mean vector and covariance.
                                                                              4/43
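
As an illustration (not part of the original slides), here is a minimal NumPy/SciPy sketch that evaluates the mixture density p(Y |Θ) = Σ_{i=1}^K πi p(Y |Θi ); the function name gmm_pdf and the toy parameters are arbitrary choices.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(Y, pis, mus, Sigmas):
    # p(y|Theta) = sum_i pi_i N(y | mu_i, Sigma_i), evaluated for each row of Y
    Y = np.atleast_2d(Y)
    dens = np.zeros(Y.shape[0])
    for pi_i, mu_i, Sigma_i in zip(pis, mus, Sigmas):
        dens += pi_i * multivariate_normal.pdf(Y, mean=mu_i, cov=Sigma_i)
    return dens

# Toy 2-component mixture in 2D
pis    = [0.4, 0.6]
mus    = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_pdf(np.array([[0.0, 0.0], [3.0, 3.0]]), pis, mus, Sigmas))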
Review of Gaussian Mixtures (2)

GMs and Maximum Likelihood
The whole set of parameters of a given K-mixture is denoted by
Θ ≡ {Θ1 , . . . , ΘK , π1 , . . . , πK }. Obtaining the optimal set of
parameters Θ∗ is usually posed in terms of maximizing the
log-likelihood of the pdf to be estimated, based on a set of N i.i.d.
samples of the variable Y = {y1 , . . . , yN }:
       L(Θ, Y ) = ℓ(Y |Θ) = log p(Y |Θ) = log Π_{n=1}^N p(yn |Θ)

                = Σ_{n=1}^N log Σ_{k=1}^K πk p(yn |Θk ).

                                                                           5/43
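
A direct transcription of this log-likelihood, assuming NumPy/SciPy and the same parameter layout as the previous sketch (function name illustrative):

import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(Y, pis, mus, Sigmas):
    # L(Theta, Y) = sum_n log sum_k pi_k N(y_n | mu_k, Sigma_k)
    N, K = Y.shape[0], len(pis)
    comp = np.empty((N, K))
    for k in range(K):
        comp[:, k] = pis[k] * multivariate_normal.pdf(Y, mean=mus[k], cov=Sigmas[k])
    return np.sum(np.log(comp.sum(axis=1)))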
Review of Gaussian Mixtures (3)

GMs and EM
The EM algorithm allows us to find maximum-likelihood solutions to
problems where there are hidden variables. In the case of Gaussian
mixtures, these variables are a set of N labels Z = {z^(1) , . . . , z^(N) }
associated with the samples. Each label is a binary vector
z^(n) = [z_1^(n) , . . . , z_K^(n) ], where K is the number of components, with
z_m^(n) = 1 and z_p^(n) = 0 for p ≠ m, denoting that yn has been generated
by the kernel m. Then, given the complete set of data X = {Y , Z }, the
log-likelihood of this set is given by

             log p(Y , Z |Θ) = Σ_{n=1}^N Σ_{k=1}^K z_k^(n) log[πk p(yn |Θk )].

                                                                           6/43
Review of Gaussian Mixtures (4)

E-Step
Consists of estimating the expected value of the hidden variables
given the visible data Y and the current estimate of the
parameters Θ∗ (t):

       E [z_k^(n) |Y , Θ∗ (t)] = P[z_k^(n) = 1|yn , Θ∗ (t)]

                               = π_k∗ (t) p(yn |Θ_k∗ (t)) / Σ_{j=1}^K π_j∗ (t) p(yn |Θ_j∗ (t)).

Thus, the probability of generating yn with the kernel k is given by:

                      p(k|yn ) = πk p(yn |k) / Σ_{j=1}^K πj p(yn |j).
                                                                        7/43
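
The responsibilities p(k|yn ) can be computed in a few lines; this sketch assumes NumPy/SciPy and the parameter layout used above (for numerical robustness one would normally work with log-densities, omitted here for brevity):

import numpy as np
from scipy.stats import multivariate_normal

def e_step(Y, pis, mus, Sigmas):
    # resp[n, k] = p(k | y_n) = pi_k p(y_n|Theta_k) / sum_j pi_j p(y_n|Theta_j)
    N, K = Y.shape[0], len(pis)
    resp = np.empty((N, K))
    for k in range(K):
        resp[:, k] = pis[k] * multivariate_normal.pdf(Y, mean=mus[k], cov=Sigmas[k])
    resp /= resp.sum(axis=1, keepdims=True)
    return resp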
Review of Gaussian Mixtures (5)

M-Step
Given the expected Z , the new parameters Θ∗ (t + 1) are given by:
                      πk = (1/N) Σ_{n=1}^N p(k|yn ),

                      µk = Σ_{n=1}^N p(k|yn ) yn / Σ_{n=1}^N p(k|yn ),

                      Σk = Σ_{n=1}^N p(k|yn )(yn − µk )(yn − µk )^T / Σ_{n=1}^N p(k|yn ).


                                                                     8/43
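
The M-step updates follow directly from the expressions above; a sketch under the same assumptions (NumPy, responsibilities coming from the e_step sketch):

import numpy as np

def m_step(Y, resp):
    # Re-estimate (pi_k, mu_k, Sigma_k) from the responsibilities p(k|y_n)
    N, d = Y.shape
    K = resp.shape[1]
    Nk = resp.sum(axis=0)                      # sum_n p(k|y_n)
    pis = Nk / N                               # pi_k = (1/N) sum_n p(k|y_n)
    mus = (resp.T @ Y) / Nk[:, None]           # weighted means
    Sigmas = []
    for k in range(K):
        D = Y - mus[k]
        Sigmas.append((resp[:, k, None] * D).T @ D / Nk[k])   # weighted covariances
    return pis, mus, Sigmas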
Model Order Selection

Two Extreme Approaches
   How many kernels are needed to describe the distribution?
   In [Figueiredo and Jain,02] it is proposed to perform EM for
   different values of K and take the one optimizing ML and an
   MDL-like criterion. Starting from a high K , kernel fusions are
   performed if needed. Local optima arise.
   In EBEM [Peñalver et al., 09] we show that it is possible to apply
   MDL more efficiently and robustly by starting from a unique
   kernel and splitting only if the underlying data is not Gaussian.
   The main challenge of this approach is how to estimate
   Gaussianity for multi-dimensional data.

                                                                        9/43
Model Order Selection (2)

MDL
Minimum Description Length and related principles choose a
representation of the data that allows us to express them with the
shortest possible message from a postulated set of models.
Rissanen's MDL amounts to minimizing

            CMDL (Θ^(K) , K ) = −L(Θ^(K) , Y ) + (N(K )/2) log n,

where N(K ) is the number of parameters required to define a
K -component mixture, and n is the number of samples:

               N(K ) = (K − 1) + K (d + d(d + 1)/2).
                                                                     10/43
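
The MDL cost is straightforward to compute once the log-likelihood is available; a small sketch (illustrative names, NumPy assumed):

import numpy as np

def num_params(K, d):
    # N(K) = (K - 1) + K * (d + d(d+1)/2): mixing weights, means, covariances
    return (K - 1) + K * (d + d * (d + 1) / 2.0)

def mdl_cost(log_lik, K, d, n):
    # C_MDL(Theta^(K), K) = -L(Theta^(K), Y) + N(K)/2 * log n
    return -log_lik + 0.5 * num_params(K, d) * np.log(n)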
Gaussian Deficiency

Maximum Entropy of a Mixture
According to the 2nd Gibbs Theorem, Gaussian variables have the
maximum entropy among all the variables with equal variance. This
theoretical maximum entropy for a d-dimensional variable Y only
depends on the covariance Σ and is given by:

                  Hmax (Y ) = (1/2) log[(2πe)^d |Σ|].

Therefore, the maximum entropy of the mixture is given by

                  Hmax (Y ) = Σ_{k=1}^K πk Hmax (k).

                                                                11/43
Gaussian Deficiency (2)

Gaussian Deficiency
Instead of using the MDL principle we may compare the estimated
entropy of the underlying data with the entropy of a Gaussian. We
define the Gaussianity Deficiency GD of the whole mixture as the
normalized weighted sum of the differences between maximum and
real entropy of each kernel:

  GD = Σ_{k=1}^K πk (Hmax (k) − Hreal (k))/Hmax (k) = Σ_{k=1}^K πk (1 − Hreal (k)/Hmax (k)),

where Hreal (k) is the real entropy of the data under the k-th
kernel. We have 0 ≤ GD ≤ 1 (0 iff Gaussian). If the GD is small
enough we may stop the algorithm.
                                                                           12/43
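
A sketch of the GD computation: Hmax (k) follows the closed form above, while Hreal (k) must come from some entropy estimator (e.g. the MST or Leonenko estimators discussed later), so it is passed in here as precomputed values (NumPy assumed, names illustrative):

import numpy as np

def h_max_gaussian(Sigma):
    # Hmax = 0.5 * log((2*pi*e)^d |Sigma|)
    d = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def gaussian_deficiency(pis, Sigmas, H_real):
    # GD = sum_k pi_k * (1 - Hreal(k) / Hmax(k))
    return sum(pi_k * (1.0 - h_k / h_max_gaussian(S_k))
               for pi_k, S_k, h_k in zip(pis, Sigmas, H_real))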
Gaussian Deficiency (3)

Kernel Selection
If the GD ratio is below a given threshold, we consider that all
kernels are well fitted. Otherwise, we select the kernel with the
highest individual ratio and it is replaced by two other kernels that
are conveniently placed and initialized. Then, a new EM epoch with
K + 1 kernels starts. The worst kernel is given by

              k* = arg max_k πk (Hmax (k) − Hreal (k))/Hmax (k).

Independently of using MDL or GD, in order to decide which kernel
should be split into two other kernels (if needed), we compute the
latter expression and decide whether to split k* according to MDL or GD.

                                                                      13/43
Split Process

Split Constraints
The k* component must be decomposed into the kernels k1 and k2
with parameters Θk1 = (µk1 , Σk1 ) and Θk2 = (µk2 , Σk2 ). In
multivariate settings, the corresponding priors, the mean vectors and
the covariance matrices should satisfy the following split equations:

                            π∗ = π1 + π2 ,
                         π∗ µ∗ = π1 µ1 + π2 µ2 ,
       π∗ (Σ∗ + µ∗ µ∗^T ) = π1 (Σ1 + µ1 µ1^T ) + π2 (Σ2 + µ2 µ2^T ).

Clearly, the split move is an ill-posed problem because the number of
equations is less than the number of unknowns.

                                                                    14/43
Split Process (2)

Split
Following [Dellaportas,06], let Σ∗ = V∗ Λ∗ V∗^T . Let D also be a d × d
rotation matrix with orthonormal unit vectors as columns. Then:

                 π1 = u1 π∗ ,      π2 = (1 − u1 )π∗ ,

                 µ1 = µ∗ − (Σ_{i=1}^d u2^i √λ∗^i V∗^i ) √(π2 /π1 ),

                 µ2 = µ∗ + (Σ_{i=1}^d u2^i √λ∗^i V∗^i ) √(π1 /π2 ),

            Λ1 = diag(u3 ) diag(ι − u2 ) diag(ι + u2 ) Λ∗ (π∗ /π1 ),

            Λ2 = diag(ι − u3 ) diag(ι − u2 ) diag(ι + u2 ) Λ∗ (π∗ /π2 ),

                      V1 = DV∗ ,      V2 = D^T V∗ ,


                                                                        15/43
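
A sketch of one such split move (NumPy assumed, not part of the original slides). For simplicity the rotation D is taken as the identity, which is one admissible choice of an orthonormal matrix; the random variables u1, u2, u3 are drawn as specified on the next slide:

import numpy as np

def split_kernel(pi_star, mu_star, Sigma_star, rng=np.random.default_rng()):
    # Spectral split of (pi*, mu*, Sigma*) into two kernels obeying the split equations
    d = mu_star.shape[0]
    lam, V = np.linalg.eigh(Sigma_star)                  # Sigma* = V diag(lam) V^T

    # u1 ~ Beta(2,2); u2^1 ~ Beta(1,2d), u2^j ~ U(-1,1); u3^1 ~ Beta(1,d), u3^j ~ U(0,1)
    u1 = rng.beta(2, 2)
    u2 = np.concatenate(([rng.beta(1, 2 * d)], rng.uniform(-1, 1, d - 1)))
    u3 = np.concatenate(([rng.beta(1, d)], rng.uniform(0, 1, d - 1)))

    pi1, pi2 = u1 * pi_star, (1 - u1) * pi_star
    shift = (V * np.sqrt(lam)) @ u2                      # sum_i u2^i sqrt(lam_i) V_i
    mu1 = mu_star - shift * np.sqrt(pi2 / pi1)
    mu2 = mu_star + shift * np.sqrt(pi1 / pi2)

    lam1 = u3 * (1 - u2) * (1 + u2) * lam * (pi_star / pi1)
    lam2 = (1 - u3) * (1 - u2) * (1 + u2) * lam * (pi_star / pi2)
    Sigma1 = V @ np.diag(lam1) @ V.T                     # V1 = D V* with D = I
    Sigma2 = V @ np.diag(lam2) @ V.T
    return (pi1, mu1, Sigma1), (pi2, mu2, Sigma2)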
Split Process (3)

Split (cont.)
The latter spectral split method has a non-evident random
component, because ι is a d × 1 vector of ones, and
u1 , u2 = (u2^1 , u2^2 , . . . , u2^d )^T and u3 = (u3^1 , u3^2 , . . . , u3^d )^T are 2d + 1
random variables needed to build the priors, means and eigenvalues for
the new components in the mixture. They are calculated as:

     u1 ∼ β(2, 2),   u2^1 ∼ β(1, 2d),   u2^j ∼ U(−1, 1),   u3^1 ∼ β(1, d),   u3^j ∼ U(0, 1),

with j = 2, . . . , d, where U(·, ·) and β(·, ·) denote the Uniform and Beta
distributions, respectively.

                                                                                  16/43
Split Process (4)




Figure: Split of a 2D kernel into two new kernels.


                                               17/43
EBEM Algorithm
Alg. 1: EBEM - Entropy Based EM Algorithm

Input: convergence_th
K = 1, i = 0, π1 = 1, Θ1 = {µ1 , Σ1 } where
      µ1 = (1/N) Σ_{i=1}^N yi ,   Σ1 = (1/(N − 1)) Σ_{i=1}^N (yi − µ1 )(yi − µ1 )^T
Final = false
repeat
      i = i + 1
      repeat
             EM iteration
             Estimate log-likelihood in iteration i: ℓ(Y |Θ(i))
      until |ℓ(Y |Θ(i)) − ℓ(Y |Θ(i − 1))| < convergence_th ;
      Evaluate Hreal (Y ) and Hmax (Y )
      Select k ∗ with the highest ratio: k ∗ = arg max_k πk (Hmax (k) − Hreal (k))/Hmax (k)
      Estimate CMDL in iteration i:
            N(K ) = (K − 1) + K (d + d(d + 1)/2),   CMDL (Θ(i)) = −ℓ(Y |Θ(i)) + (N(K )/2) log n
      if (C(Θ(i)) ≥ C(Θ(i − 1))) then
            Final = true
            K = K − 1, Θ∗ = Θ(i − 1)
      end
      else
            Decompose k ∗ into k1 and k2
      end
until Final = true ;
Output: Optimal mixture model: K , Θ∗


                                                                                                                           18/43
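
A compressed sketch of Alg. 1 that reuses the helper functions sketched earlier (gmm_log_likelihood, e_step, m_step, mdl_cost, h_max_gaussian, split_kernel); entropy_estimate stands in for whichever Hreal estimator is chosen, and the hard assignment used to gather each kernel's samples is a simplification of ours:

import numpy as np

def ebem(Y, entropy_estimate, convergence_th=1e-4):
    N, d = Y.shape
    pis, mus = np.array([1.0]), Y.mean(axis=0, keepdims=True)
    Sigmas = [np.atleast_2d(np.cov(Y.T, bias=False))]
    best, C_prev = None, np.inf
    while True:
        ll_prev = -np.inf                                 # inner EM loop
        while True:
            resp = e_step(Y, pis, mus, Sigmas)
            pis, mus, Sigmas = m_step(Y, resp)
            ll = gmm_log_likelihood(Y, pis, mus, Sigmas)
            if abs(ll - ll_prev) < convergence_th:
                break
            ll_prev = ll
        K = len(pis)
        C = mdl_cost(ll, K, d, N)
        if C >= C_prev:                                   # MDL stopped improving:
            return best                                   # keep the previous model
        best, C_prev = (pis, mus, Sigmas), C
        # Worst kernel: highest weighted (Hmax - Hreal)/Hmax ratio
        hard = resp.argmax(axis=1)
        ratios = [pis[k] * (h_max_gaussian(Sigmas[k]) - entropy_estimate(Y[hard == k]))
                  / h_max_gaussian(Sigmas[k]) for k in range(K)]
        k_star = int(np.argmax(ratios))
        (p1, m1, S1), (p2, m2, S2) = split_kernel(pis[k_star], mus[k_star], Sigmas[k_star])
        pis = np.concatenate([np.delete(pis, k_star), [p1, p2]])
        mus = np.vstack([np.delete(mus, k_star, axis=0), m1, m2])
        Sigmas = [S for i, S in enumerate(Sigmas) if i != k_star] + [S1, S2]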
EBEM Algorithm (2)




Figure: Top: MML (Figueiredo & Jain), Bottom: EBEM
                                                     19/43
EBEM Algorithm (3)




Figure: Color Segmentation: EBEM (2nd col.) vs VEM (3rd col.)

                                                                20/43
EBEM Algorithm (4)

   Table: EM, VEM and EBEM in Color Image Segmentation (PSNR in dB)

   Algorithm     “Forest” (K=5)   “Sunset” (K=7)   “Lighthouse” (K=8)
   Classic EM     5.35 ±0.39      14.76 ±2.07      12.08 ±2.49
   VEM           10.96 ±0.59      18.64 ±0.40      15.88 ±1.08
   EBEM          14.1848 ±0.35    18.91 ±0.38      19.4205 ±2.11

                                                          21/43
EBEM Algorithm (5)

EBEM in Higher Dimensions
   We have also tested the algorithm with the well-known Wine
   data set, which contains 3 classes and 178 (13-dimensional)
   instances.
   The number of samples, 178, is not enough to build the pdf
   using the Parzen windows method in a 13-dimensional space.
   With the MST approach (see below), where no pdf estimation is
   needed, the algorithm has been applied to this data set.
   After EBEM ends with K = 3, a maximum a posteriori
   classifier was built. The classification performance was 96.1%.
   This result is similar to or even better than the experiments
   reported in the literature.
                                                                   22/43
Entropic Graphs

EGs and Rényi Entropy
Entropic Spanning Graphs obtained from data to estimate Rényi's
α-entropy [Hero and Michel, 02] belong to the “non plug-in” methods
for entropy estimation. Rényi's α-entropy of a probability density
function p is defined as:

                  Hα (p) = (1/(1 − α)) ln ∫_z p^α (z) dz

for α ∈ [0, 1[. The α-entropy converges to the Shannon one,
lim_{α→1} Hα (p) = H(p) ≡ −∫ p(z) ln p(z) dz, so it is possible to
obtain the Shannon entropy from the Rényi one if the latter limit
is either solved or numerically approximated.

                                                                      23/43
Entropic Graphs (2)

EGs and Rényi Entropy (cont.)
Let G be a graph consisting of a set of vertices Xn = {x1 , . . . , xn },
with xi ∈ R^d , and edges {e} that connect vertices: eij = (xi , xj ). If
we denote by M(Xn ) the possible sets of edges in the class of acyclic
graphs spanning Xn (spanning trees), the total edge length
functional of the Euclidean power-weighted Minimal Spanning Tree
is:

            L_γ^MST (Xn ) = min_{M(Xn )} Σ_{e∈M(Xn )} ||e||^γ ,

with γ ∈ [0, d] and ||·|| the Euclidean distance. The MST has been
used in order to measure the randomness of a set of points.
                                                                        24/43
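
The length functional itself can be computed with an off-the-shelf MST routine; a small sketch assuming SciPy (the complete pairwise-distance graph is built explicitly, so this is only practical for moderate n):

import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(X, gamma=1.0):
    # L_gamma^MST(X_n): total power-weighted edge length of the Euclidean MST
    W = distance_matrix(X, X) ** gamma          # complete graph of ||e||^gamma weights
    return minimum_spanning_tree(W).sum()       # sums the n-1 selected edge weights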
Entropic Graphs (3)

EGs and Rényi Entropy (cont.)
It is intuitive that the length of the MST for uniformly
distributed points increases at a greater rate than does the MST
spanning the more concentrated nonuniform set of points. For
d ≥ 2:
               Hα (Xn ) = (d/γ) [ln(Lγ (Xn )/n^α ) − ln β_{Lγ,d} ]

is an asymptotically unbiased and almost surely consistent estimator
of the α-entropy of p, where α = (d − γ)/d and β_{Lγ,d} is a constant
bias correction for which only approximations and bounds are known:
(i) Monte Carlo simulation of uniform random samples on the
unit cube [0, 1]^d ; (ii) large-d approximation: (γ/2) ln(d/(2πe)).

                                                                   25/43
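
Combining the MST length with the large-d bias approximation gives a simple entropy estimate; a sketch reusing the mst_length function above (the choice γ = 1, i.e. α = (d − 1)/d, and the bias approximation are assumptions of this illustration):

import numpy as np

def renyi_entropy_mst(X, gamma=1.0):
    # H_alpha estimate from the MST length, alpha = (d - gamma)/d,
    # with ln(beta) ~ (gamma/2) ln(d/(2 pi e)) (large-d approximation)
    n, d = X.shape
    alpha = (d - gamma) / d
    L = mst_length(X, gamma)
    log_beta = (gamma / 2.0) * np.log(d / (2 * np.pi * np.e))
    return (d / gamma) * (np.log(L / n ** alpha) - log_beta)

# Uniform points yield a longer MST, hence a higher entropy estimate, than Gaussian ones
rng = np.random.default_rng(0)
print(renyi_entropy_mst(rng.uniform(-3, 3, size=(500, 2))))
print(renyi_entropy_mst(rng.standard_normal((500, 2))))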
Entropic Graphs (4)




Figure: Uniform (left) vs Gaussian (right) distributions' EGs.


                                                                 26/43
Entropic Graphs (5)




Figure: Extrapolation to Shannon: α∗ = 1 − (a + b × e^{cd})/N




                                                        27/43
Variational Bayes

Problem Definition
Given N i.i.d. samples X = {x^1 , . . . , x^N } of a d-dimensional
random variable X , their associated hidden variables
Z = {z^1 , . . . , z^N } and the parameters Θ of the model, the Bayesian
posterior is given by [Watanabe et al.,09]:

   p(Z , Θ|X ) = p(Θ) Π_{n=1}^N p(x^n , z^n |Θ) / ∫ p(Θ) Π_{n=1}^N p(x^n , z^n |Θ) dΘ.

Since the integration w.r.t. Θ is analytically intractable, the
posterior is approximated by a factorized distribution
q(Z , Θ) = q(Z )q(Θ) and the optimal approximation is the one that
minimizes the variational free energy.
                                                                       28/43
Variational Bayes (2)

Problem Definition (cont.)
The variational free energy is given by:

  L(q) = ∫ q(Z , Θ) log[q(Z , Θ)/p(Z , Θ|X )] dΘ − log ∫ p(Θ) Π_{n=1}^N p(x^n |Θ) dΘ,

where the first term is the Kullback-Leibler divergence between the
approximation and the true posterior. As the second term is
independent of the approximation, the Variational Bayes (VB)
approach reduces to minimizing that divergence. Such
minimization is addressed in an EM-like process alternating the
updating of q(Θ) and the updating of q(Z ).

                                                                        29/43
Variational Bayes (3)


Problem Definition (cont.)
The EM-like process alternating the updating of q(Θ) and the
updating of q(Z ) is given by
                                 N
         q(Θ) ∝ p(Θ) exp             log p(x n , z n |Θ)    q(Z )
                               n=1
                          N
         q(Z ) ∝ exp           log p(x n , z n |Θ)   q(Θ)
                         n=1




                                                                    30/43
Variational Bayes (4)

Problem Definition (cont.)
In [Constantinopoulos and Likas,07], the optimization of the
variational free energy yields (where N (·) and W(·) are the Gaussian
and Wishart densities, respectively):

   q(Z ) = Π_{n=1}^N [ Π_{k=1}^s (r_k^n )^{z_k^n} ] [ Π_{k=s+1}^K (ρ_k^n )^{z_k^n} ],

   q(µ) = Π_{k=1}^K N (µk |mk , Σk ),

   q(Σ) = Π_{k=1}^K W(Σk |νk , Vk ),

   q(β) = (1 − Σ_{k=1}^s πk )^{−K+s} · [ Γ(Σ_{k=s+1}^K γ̃k ) / Π_{k=s+1}^K Γ(γ̃k ) ]
          · Π_{k=s+1}^K [ πk / (1 − Σ_{k=1}^s πk ) ]^{γ̃k −1} .

After the maximization of the free energy w.r.t. q(·), the method proceeds to
update the coefficients in α, which denote the free components.
                                                                             31/43
Model Selection in VB

Fixed and Free Components
   In the latter framework, it is assumed that a number of K − s
   components fit the data well in their region of influence (fixed
   components) and then model order selection is posed in terms
   of optimizing the parameters of the remaining s (free
   components).
   Let α = {πk }_{k=1}^s be the coefficients of the free components and
   β = {πk }_{k=s+1}^K the coefficients of the fixed components. Under
   the i.i.d. sampling assumption, the prior distribution of Z given
   α and β can be modeled by a product of multinomials:

        p(Z |α, β) = Π_{n=1}^N Π_{k=1}^s πk^{z_k^n} Π_{k=s+1}^K πk^{z_k^n} .

                                                                   32/43
Model Selection in VB (2)


Fixed and Free Components (cont.)
    Moreover, assuming conjugate Dirichlet priors over the set of
    mixing coefficients, we have that

    p(β|α) = (1 − Σ_{k=1}^s πk )^{−K+s} · [ Γ(Σ_{k=s+1}^K γk ) / Π_{k=s+1}^K Γ(γk ) ]
             · Π_{k=s+1}^K [ πk / (1 − Σ_{k=1}^s πk ) ]^{γk −1} .

    Then, considering fixed coefficients, Θ is redefined as
    Θ = {µ, Σ, β} and we have the following factorization:

                    q(Z , Θ) = q(Z ) q(µ) q(Σ) q(β).


                                                                                   33/43
Model Selection in VB (3)


Kernel Splits
    In [Constantinopoulos and Likas,07], the VBgmm method is used
    for training an initial K = 2 model. Then, in the so-called
    VBgmmSplit, they proceed by sorting the obtained kernels and
    then trying to split them recursively.
    Each splitting consists of:
         Removing the original component.
         Replacing it by two kernels with the same covariance matrix as
         the original but with means placed in opposite directions along
         the maximum variability direction.



                                                                           34/43
Model Selection in VB (4)

Kernel Splits (cont.)
    Independently of the split strategy, the critical point of
    VBgmmSplit is the number of splits needed until convergence.
    At each iteration of the latter algorithm the K currently existing
    kernels are split. Consider the case where a split is detected as
    proper (non-zero π after running the VB update described in
    the previous section, where each new kernel is considered free).
    Then, the number of components increases and a new set
    of splitting tests starts in the next iteration. This means that if
    the algorithm stops (all splits failed) with K kernels, the
    number of splits has been 1 + 2 + . . . + K = K (K + 1)/2.

                                                                      35/43
Model Selection in VB (5)

EBVS Split
   We split only one kernel per iteration. In order to do so, we
   implement a selection criterion based on measuring the entropy
   of the kernels.
   If one uses Leonenko's estimator (see the sketch after this slide)
   then there is no need for extrapolation as in EGs, and asymptotic
   consistency is ensured.
   Then, at each iteration of the algorithm we select the worst
   kernel, in terms of low entropy, to be split. If the split is
   successful we will have K + 1 kernels to feed the VB optimization
   in the next iteration. Otherwise, there is no need to add a new
   kernel and the process converges to K kernels. The key point
   here is that the overall process is linear (one split per iteration).
                                                                  36/43
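
For reference, a sketch of the classical Kozachenko-Leonenko k-NN estimator, one member of the family of estimators referred to here as Leonenko's; SciPy is assumed and k = 3 is an arbitrary illustrative choice:

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(X, k=3):
    # H ~ psi(N) - psi(k) + ln(c_d) + (d/N) sum_i ln(eps_i), in nats,
    # where eps_i is the distance from x_i to its k-th nearest neighbour
    # and c_d is the volume of the d-dimensional unit ball
    N, d = X.shape
    eps = cKDTree(X).query(X, k=k + 1)[0][:, -1]   # k+1: the first hit is the point itself
    log_cd = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(N) - digamma(k) + log_cd + d * np.mean(np.log(eps))

# Sanity check: a standard 2D Gaussian has H = log(2*pi*e) ~ 2.84 nats
rng = np.random.default_rng(0)
print(knn_entropy(rng.standard_normal((2000, 2))))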
EBVS: Fast BV




Figure: EBVS Results



                                   37/43
EBVS: Fast BV (2)




Figure: EBVS Results (more)



                                  38/43
EBVS: Fast BV (3)


MD Experiments
   With this approach using Leonenko's estimator, the
   classification performance we obtain on this data set is 86%.
   Although experiments in higher dimensions can be performed,
   when the number of samples is not high enough the risk of
   unbounded maxima of the likelihood function is higher, due to
   singular covariance matrices.
   The entropy estimation method, however, performs very well
   with thousands of dimensions.



                                                                   39/43
Conclusions


Summarizing Ideas in GMs
    In the multi-dimensional case, efficient entropy estimators
    become critical.
    In VB, where model-order selection is implicit, it is possible to
    reduce the complexity at least by one order of magnitude.
    We can use the same approach for shapes in 2D and 3D.
    Once we have the mixtures, new measures for comparing them
    are waiting to be discovered and used. Let's do it!



                                                                       40/43
References

[Goldberger et al.,03] Goldberger, J., Gordon, S., Greenspan, H.
(2003). An Efficient Image Similarity Measure Based on
Approximations of KL-Divergence Between Two Gaussian Mixtures.
ICCV'03.
[Figueiredo and Jain, 02] Figueiredo, M. and Jain, A. (2002).
Unsupervised learning of finite mixture models. IEEE Trans. Pattern
Anal. Mach. Intell., vol. 24, no. 3, pp. 381–399.
[Constantinopoulos and Likas,07] Constantinopoulos, C. and Likas, A.
(2007). Unsupervised Learning of Gaussian Mixtures based on
Variational Component Splitting. IEEE Trans. Neural Networks, vol.
18, no. 3, pp. 745–755.


                                                                    41/43
References (2)

[Peñalver et al., 09] Peñalver, A., Escolano, F., Sáez, J.M. (2009). Learning
Gaussian Mixture Models with Entropy-Based Criteria. IEEE Trans.
on Neural Networks, vol. 20, no. 11, pp. 1756–1771.
[Dellaportas,06] Dellaportas, P. and Papageorgiou, I. (2006).
Multivariate mixtures of normals with unknown number of
components. Statistics and Computing, vol. 16, no. 1, pp. 57–68.
[Hero and Michel,02] Hero, A. and Michel, O. (2002). Applications of
entropic spanning graphs. IEEE Signal Processing Magazine, vol. 19,
no. 5, pp. 85–95.
[Watanabe et al.,09] Watanabe, K., Akaho, S., Omachi, S. (2009).
Variational Bayesian mixture model on a subspace of exponential
family distributions. IEEE Transactions on Neural Networks, vol. 20,
no. 11, pp. 1783–1796.
                                                                        42/43
References (3)

[Escolano et al.,10] Escolano, F., Peñalver, A. and Bonev, B. (2010).
Entropy-based Variational Scheme for Fast Bayes Learning of
Gaussian Mixtures. SSPR'2010 (accepted).
[Rajwade et al.,09] Rajwade, A., Banerjee, A. and Rangarajan, A.
(2009). Probability Density Estimation Using Isocontours
and Isosurfaces: Applications to Information-Theoretic Image
Registration. IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3,
pp. 475–491.
[Chen et al.,10] Chen, T., Vemuri, B.C., Rangarajan, A. and
Eisenschenk, S.J. (2010). Group-wise Point-set Registration
Using a Novel CDF-based Havrda-Charvát Divergence. Int. J. Comput.
Vis., vol. 86, no. 1, pp. 111–124.

                                                                      43/43

Weitere ähnliche Inhalte

Was ist angesagt?

Kolev skalna2018 article-exact_solutiontoa_parametricline
Kolev skalna2018 article-exact_solutiontoa_parametriclineKolev skalna2018 article-exact_solutiontoa_parametricline
Kolev skalna2018 article-exact_solutiontoa_parametriclineAlina Barbulescu
 
comments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle samplercomments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle samplerChristian Robert
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted ParaproductsVjekoslavKovac1
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi DivergenceMasahiro Suzuki
 
Semi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster EnsembleSemi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster EnsembleAlexander Litvinenko
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsVjekoslavKovac1
 
Graph kernels
Graph kernelsGraph kernels
Graph kernelsLuc Brun
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsGraph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsLuc Brun
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Meanstthonet
 
Quantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesQuantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesVjekoslavKovac1
 
Trilinear embedding for divergence-form operators
Trilinear embedding for divergence-form operatorsTrilinear embedding for divergence-form operators
Trilinear embedding for divergence-form operatorsVjekoslavKovac1
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Taiji Suzuki
 
The Black-Litterman model in the light of Bayesian portfolio analysis
The Black-Litterman model in the light of Bayesian portfolio analysisThe Black-Litterman model in the light of Bayesian portfolio analysis
The Black-Litterman model in the light of Bayesian portfolio analysisDaniel Bruggisser
 
Parameter Uncertainty and Learning in Dynamic Financial Decisions
Parameter Uncertainty and Learning in Dynamic Financial DecisionsParameter Uncertainty and Learning in Dynamic Financial Decisions
Parameter Uncertainty and Learning in Dynamic Financial DecisionsDaniel Bruggisser
 
(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural NetworkMasahiro Suzuki
 

Was ist angesagt? (20)

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Kolev skalna2018 article-exact_solutiontoa_parametricline
Kolev skalna2018 article-exact_solutiontoa_parametriclineKolev skalna2018 article-exact_solutiontoa_parametricline
Kolev skalna2018 article-exact_solutiontoa_parametricline
 
comments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle samplercomments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle sampler
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
Semi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster EnsembleSemi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster Ensemble
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli... Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 
Graph kernels
Graph kernelsGraph kernels
Graph kernels
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsGraph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & Trends
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Quantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesQuantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averages
 
Trilinear embedding for divergence-form operators
Trilinear embedding for divergence-form operatorsTrilinear embedding for divergence-form operators
Trilinear embedding for divergence-form operators
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
 
The Black-Litterman model in the light of Bayesian portfolio analysis
The Black-Litterman model in the light of Bayesian portfolio analysisThe Black-Litterman model in the light of Bayesian portfolio analysis
The Black-Litterman model in the light of Bayesian portfolio analysis
 
Parameter Uncertainty and Learning in Dynamic Financial Decisions
Parameter Uncertainty and Learning in Dynamic Financial DecisionsParameter Uncertainty and Learning in Dynamic Financial Decisions
Parameter Uncertainty and Learning in Dynamic Financial Decisions
 
(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network
 
Mgm
MgmMgm
Mgm
 

Ähnlich wie CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures

CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...zukun
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applicationsFrank Nielsen
 
The Probability that a Matrix of Integers Is Diagonalizable
The Probability that a Matrix of Integers Is DiagonalizableThe Probability that a Matrix of Integers Is Diagonalizable
The Probability that a Matrix of Integers Is DiagonalizableJay Liew
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Heuristics for counterexamples to the Agrawal Conjecture
Heuristics for counterexamples to the Agrawal ConjectureHeuristics for counterexamples to the Agrawal Conjecture
Heuristics for counterexamples to the Agrawal ConjectureAmshuman Hegde
 
Ml mle_bayes
Ml  mle_bayesMl  mle_bayes
Ml mle_bayesPhong Vo
 
Chapter 9 computation of the dft
Chapter 9 computation of the dftChapter 9 computation of the dft
Chapter 9 computation of the dftmikeproud
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Frank Nielsen
 
Non-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixturesNon-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixturesChristian Robert
 
Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Pierre Jacob
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsBigMC
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012Zheng Mengdi
 

Ähnlich wie CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures (20)

CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
The Probability that a Matrix of Integers Is Diagonalizable
The Probability that a Matrix of Integers Is DiagonalizableThe Probability that a Matrix of Integers Is Diagonalizable
The Probability that a Matrix of Integers Is Diagonalizable
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Igv2008
Igv2008Igv2008
Igv2008
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
Heuristics for counterexamples to the Agrawal Conjecture
Heuristics for counterexamples to the Agrawal ConjectureHeuristics for counterexamples to the Agrawal Conjecture
Heuristics for counterexamples to the Agrawal Conjecture
 
Ml mle_bayes
Ml  mle_bayesMl  mle_bayes
Ml mle_bayes
 
Unit 3
Unit 3Unit 3
Unit 3
 
Unit 3
Unit 3Unit 3
Unit 3
 
BAYSM'14, Wien, Austria
BAYSM'14, Wien, AustriaBAYSM'14, Wien, Austria
BAYSM'14, Wien, Austria
 
Chapter 9 computation of the dft
Chapter 9 computation of the dftChapter 9 computation of the dft
Chapter 9 computation of the dft
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
 
Non-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixturesNon-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixtures
 
Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering models
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012
 

Mehr von zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Mehr von zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Kürzlich hochgeladen

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures

  • 6. Review of Gaussian Mixtures (3). GMs and EM. The EM algorithm allows us to find maximum-likelihood solutions to problems with hidden variables. In the case of Gaussian mixtures, these variables are a set of N labels Z = {z^1, ..., z^N} associated with the samples. Each label is a binary vector z^(n) = [z_1^(n), ..., z_K^(n)], K being the number of components, with z_m^(n) = 1 and z_p^(n) = 0 for p ≠ m denoting that y_n has been generated by kernel m. Then, given the complete set of data X = {Y, Z}, the log-likelihood of this set is given by
    $\log p(Y,Z|\Theta) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_k^{(n)} \log\left[\pi_k\, p(y_n|\Theta_k)\right].$
  • 7. Review of Gaussian Mixtures (4). E-Step. Consists in estimating the expected value of the hidden variables given the visible data Y and the current estimate of the parameters Θ∗(t):
    $E[z_k^{(n)}|Y,\Theta^*(t)] = P[z_k^{(n)}=1|y_n,\Theta^*(t)] = \frac{\pi_k^*(t)\, p(y_n|\Theta_k^*(t))}{\sum_{j=1}^{K}\pi_j^*(t)\, p(y_n|\Theta_j^*(t))}.$
    Thus, the probability of generating y_n with the kernel k is given by:
    $p(k|y_n) = \frac{\pi_k\, p(y_n|k)}{\sum_{j=1}^{K}\pi_j\, p(y_n|j)}.$
  • 8. Review of Gaussian Mixtures (5). M-Step. Given the expected Z, the new parameters Θ∗(t + 1) are given by:
    $\pi_k = \frac{1}{N}\sum_{n=1}^{N} p(k|y_n),\qquad \mu_k = \frac{\sum_{n=1}^{N} p(k|y_n)\, y_n}{\sum_{n=1}^{N} p(k|y_n)},\qquad \Sigma_k = \frac{\sum_{n=1}^{N} p(k|y_n)\,(y_n-\mu_k)(y_n-\mu_k)^T}{\sum_{n=1}^{N} p(k|y_n)}.$
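    The two steps above map directly onto a few lines of array code. Below is a minimal NumPy sketch of one EM iteration for a Gaussian mixture; the function names (e_step, m_step) and the use of scipy.stats.multivariate_normal are illustrative choices of ours, not part of the original tutorial.

    import numpy as np
    from scipy.stats import multivariate_normal

    def e_step(Y, pis, mus, Sigmas):
        # Responsibilities p(k|y_n): one row per sample, one column per kernel.
        N, K = Y.shape[0], len(pis)
        R = np.empty((N, K))
        for k in range(K):
            R[:, k] = pis[k] * multivariate_normal.pdf(Y, mean=mus[k], cov=Sigmas[k])
        R /= R.sum(axis=1, keepdims=True)
        return R

    def m_step(Y, R):
        # Re-estimate priors, means and covariances from the responsibilities.
        N, d = Y.shape
        Nk = R.sum(axis=0)                      # effective number of samples per kernel
        pis = Nk / N
        mus = (R.T @ Y) / Nk[:, None]
        Sigmas = []
        for k in range(R.shape[1]):
            diff = Y - mus[k]
            Sigmas.append((R[:, k, None] * diff).T @ diff / Nk[k])
        return pis, mus, np.array(Sigmas)

    Iterating e_step and m_step until the change in log-likelihood falls below a threshold reproduces the classic EM loop used throughout this lesson.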
  • 9. Model Order Selection. Two Extreme Approaches. How many kernels are needed to describe the distribution? In [Figueiredo and Jain, 02] it is proposed to perform EM for different values of K and take the one optimizing ML and an MDL-like criterion. Starting from a high K, kernel fusions are performed if needed. Local optima arise. In EBEM [Peñalver et al., 09] we show that it is possible to apply MDL more efficiently and robustly by starting from a single kernel and splitting only if the underlying data is not Gaussian. The main challenge of this approach is how to estimate Gaussianity for multi-dimensional data.
  • 10. Model Order Selection (2). MDL. The Minimum Description Length principle and related criteria choose a representation of the data that allows us to express it with the shortest possible message from a postulated set of models. Rissanen's MDL amounts to minimizing
    $C_{MDL}(\Theta^{(K)}, K) = -L(\Theta^{(K)}, Y) + \frac{N(K)}{2}\log n,$
    where N(K) is the number of parameters required to define a K-component mixture and n is the number of samples:
    $N(K) = (K-1) + K\left(d + \frac{d(d+1)}{2}\right).$
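    The order-selection term is straightforward to evaluate; here is a small sketch of it (the helper names num_params and mdl_cost, and the log_likelihood argument, are ours, introduced only for this example).

    import numpy as np

    def num_params(K, d):
        # (K-1) mixing coefficients plus, per kernel, d mean entries and d(d+1)/2 covariance entries.
        return (K - 1) + K * (d + d * (d + 1) / 2)

    def mdl_cost(log_likelihood, K, d, n):
        # C_MDL = -L + (N(K)/2) log n ; lower is better.
        return -log_likelihood + 0.5 * num_params(K, d) * np.log(n)

    In EBEM this cost is compared across consecutive epochs: if it stops decreasing after a split, the previous model is kept (see Alg. 1 below).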
  • 11. Gaussian Deficiency. Maximum Entropy of a Mixture. According to the second Gibbs theorem, Gaussian variables have the maximum entropy among all variables with equal variance. This theoretical maximum entropy for a d-dimensional variable Y depends only on the covariance Σ and is given by:
    $H_{max}(Y) = \frac{1}{2}\log\left[(2\pi e)^d |\Sigma|\right].$
    Therefore, the maximum entropy of the mixture is given by
    $H_{max}(Y) = \sum_{k=1}^{K}\pi_k H_{max}(k).$
  • 12. Gaussian Deficiency (2). Gaussian Deficiency. Instead of using the MDL principle we may compare the estimated entropy of the underlying data with the entropy of a Gaussian. We define the Gaussianity Deficiency GD of the whole mixture as the normalized weighted sum of the differences between the maximum and real entropy of each kernel:
    $GD = \sum_{k=1}^{K}\pi_k\,\frac{H_{max}(k)-H_{real}(k)}{H_{max}(k)} = \sum_{k=1}^{K}\pi_k\left(1-\frac{H_{real}(k)}{H_{max}(k)}\right),$
    where H_real(k) is the real entropy of the data under the k-th kernel. We have 0 ≤ GD ≤ 1 (0 iff Gaussian). If the GD is low enough we may stop the algorithm.
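    A minimal sketch of the deficiency computation, assuming some estimate of the real per-kernel entropies is already available (e.g. from the entropic-graph or k-NN estimators discussed later); gaussian_max_entropy and gaussianity_deficiency are names introduced here for illustration only.

    import numpy as np

    def gaussian_max_entropy(Sigma):
        # Entropy of a Gaussian with covariance Sigma: 0.5 * log((2*pi*e)^d |Sigma|).
        d = Sigma.shape[0]
        return 0.5 * np.log(((2 * np.pi * np.e) ** d) * np.linalg.det(Sigma))

    def gaussianity_deficiency(pis, Sigmas, H_real):
        # Weighted, normalized gap between maximum and estimated entropy per kernel.
        H_max = np.array([gaussian_max_entropy(S) for S in Sigmas])
        ratios = (H_max - np.asarray(H_real)) / H_max
        return float(np.dot(pis, ratios)), ratios   # total GD and per-kernel ratios

    The kernel with the largest weighted ratio pis[k] * ratios[k] is then the candidate for splitting, as formalized on the next slide.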
  • 13. Gaussian Deficiency (3). Kernel Selection. If the GD ratio is below a given threshold, we consider that all kernels are well fitted. Otherwise, we select the kernel with the highest individual ratio and replace it by two other kernels that are conveniently placed and initialized. Then, a new EM epoch with K + 1 kernels starts. The worst kernel is given by
    $k^* = \arg\max_k\ \pi_k\,\frac{H_{max}(k)-H_{real}(k)}{H_{max}(k)}.$
    Independently of whether MDL or GD is used, in order to decide which kernel should be split into two new kernels (if needed), we compute the latter expression and decide whether to split k∗ according to MDL or GD.
  • 14. Split Process. Split Constraints. The k∗ component must be decomposed into two kernels k1 and k2 with parameters Θ_{k1} = (µ_{k1}, Σ_{k1}) and Θ_{k2} = (µ_{k2}, Σ_{k2}). In multivariate settings, the corresponding priors, mean vectors and covariance matrices should satisfy the following split equations:
    $\pi_* = \pi_1 + \pi_2,$
    $\pi_*\mu_* = \pi_1\mu_1 + \pi_2\mu_2,$
    $\pi_*(\Sigma_* + \mu_*\mu_*^T) = \pi_1(\Sigma_1 + \mu_1\mu_1^T) + \pi_2(\Sigma_2 + \mu_2\mu_2^T).$
    Clearly, the split move is an ill-posed problem because the number of equations is less than the number of unknowns.
  • 15. Split Process (2). Split. Following [Dellaportas, 06], let $\Sigma_* = V_*\Lambda_* V_*^T$. Let D also be a d × d rotation matrix with orthonormal unit vectors as columns. Then:
    $\pi_1 = u_1\pi_*,\qquad \pi_2 = (1-u_1)\pi_*,$
    $\mu_1 = \mu_* - \Big(\sum_{i=1}^{d} u_2^i\,\sqrt{\lambda_*^i}\,V_*^i\Big)\sqrt{\tfrac{\pi_2}{\pi_1}},\qquad \mu_2 = \mu_* + \Big(\sum_{i=1}^{d} u_2^i\,\sqrt{\lambda_*^i}\,V_*^i\Big)\sqrt{\tfrac{\pi_1}{\pi_2}},$
    $\Lambda_1 = \mathrm{diag}(u_3)\,\mathrm{diag}(\iota-u_2)\,\mathrm{diag}(\iota+u_2)\,\Lambda_*\,\tfrac{\pi_*}{\pi_1},\qquad \Lambda_2 = \mathrm{diag}(\iota-u_3)\,\mathrm{diag}(\iota-u_2)\,\mathrm{diag}(\iota+u_2)\,\Lambda_*\,\tfrac{\pi_*}{\pi_2},$
    $V_1 = D\,V_*,\qquad V_2 = D^T V_*.$
  • 16. Split Process (3). Split (cont.). The latter spectral split method has a non-evident random component: ι is a d × 1 vector of ones, and $u_1$, $u_2 = (u_2^1, u_2^2, \ldots, u_2^d)^T$ and $u_3 = (u_3^1, u_3^2, \ldots, u_3^d)^T$ are 2d + 1 random variables needed to build the priors, means and eigenvalues of the new components in the mixture. They are drawn as:
    $u_1 \sim \beta(2,2),\quad u_2^1 \sim \beta(1,2d),\quad u_2^j \sim U(-1,1),\quad u_3^1 \sim \beta(1,d),\quad u_3^j \sim U(0,1),\quad j = 2,\ldots,d,$
    where β(·,·) and U(·,·) denote the Beta and Uniform distributions, respectively.
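    For concreteness, here is a sketch of the split move under the equations above; it assumes the square roots on the eigenvalues and prior ratios that the slide extraction appears to have dropped, and it fixes the rotation to D = I as a simplification. The name split_kernel is ours, not a routine from the tutorial.

    import numpy as np

    def split_kernel(pi_s, mu_s, Sigma_s, rng=None):
        # Spectral decomposition Sigma_* = V diag(lam) V^T (symmetric, so eigh is appropriate).
        if rng is None:
            rng = np.random.default_rng()
        d = mu_s.shape[0]
        lam, V = np.linalg.eigh(Sigma_s)
        # Random split variables u1, u2, u3 as on the slide.
        u1 = rng.beta(2, 2)
        u2 = np.concatenate([[rng.beta(1, 2 * d)], rng.uniform(-1, 1, d - 1)])
        u3 = np.concatenate([[rng.beta(1, d)], rng.uniform(0, 1, d - 1)])
        iota = np.ones(d)
        pi1, pi2 = u1 * pi_s, (1 - u1) * pi_s
        shift = V @ (u2 * np.sqrt(lam))            # sum_i u2^i sqrt(lam_i) V_i
        mu1 = mu_s - shift * np.sqrt(pi2 / pi1)
        mu2 = mu_s + shift * np.sqrt(pi1 / pi2)
        lam1 = u3 * (iota - u2) * (iota + u2) * lam * (pi_s / pi1)
        lam2 = (iota - u3) * (iota - u2) * (iota + u2) * lam * (pi_s / pi2)
        # Keep the original eigenvectors (rotation D = I).
        Sigma1 = V @ np.diag(lam1) @ V.T
        Sigma2 = V @ np.diag(lam2) @ V.T
        return (pi1, mu1, Sigma1), (pi2, mu2, Sigma2)

    The two kernels returned here only initialize the next EM epoch with K + 1 components, which then refines them.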
  • 17. Split Process (4). Figure: Split of a 2D kernel into two new kernels.
  • 18. EBEM Algorithm. Alg. 1: EBEM – Entropy-Based EM Algorithm.
    Input: convergence_th
    K = 1, i = 0, π1 = 1, Θ1 = {µ1, Σ1} where $\mu_1 = \frac{1}{N}\sum_{i=1}^{N} y_i$, $\Sigma_1 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i-\mu_1)(y_i-\mu_1)^T$
    Final = false
    repeat
      i = i + 1
      repeat (EM iteration)
        Estimate the log-likelihood at iteration i: $\ell(Y|\Theta(i))$
      until $|\ell(Y|\Theta(i)) - \ell(Y|\Theta(i-1))| <$ convergence_th
      Evaluate $H_{real}(Y)$ and $H_{max}(Y)$
      Select k∗ with the highest ratio: $k^* = \arg\max_k \pi_k \frac{H_{max}(k)-H_{real}(k)}{H_{max}(k)}$
      Estimate $C_{MDL}$ at iteration i: $N(K) = (K-1) + K\left(d + \frac{d(d+1)}{2}\right)$, $C_{MDL}(\Theta(i)) = -\ell(Y|\Theta(i)) + \frac{N(K)}{2}\log n$
      if $C_{MDL}(\Theta(i)) \ge C_{MDL}(\Theta(i-1))$ then
        Final = true
        K = K − 1, Θ∗ = Θ(i − 1)
      else
        Decompose k∗ into k1 and k2
      end
    until Final = true
    Output: Optimal mixture model: K, Θ∗
  • 19. EBEM Algorithm (2). Figure: Top: MML (Figueiredo & Jain); Bottom: EBEM.
  • 20. EBEM Algorithm (3). Figure: Color segmentation: EBEM (2nd col.) vs. VEM (3rd col.).
  • 21. EBEM Algorithm (4). Table: EM, VEM and EBEM in color image segmentation (PSNR in dB, mean ± std).
    Algorithm   | "Forest" (K=5)  | "Sunset" (K=7) | "Lighthouse" (K=8)
    Classic EM  | 5.35 ± 0.39     | 14.76 ± 2.07   | 12.08 ± 2.49
    VEM         | 10.96 ± 0.59    | 18.64 ± 0.40   | 15.88 ± 1.08
    EBEM        | 14.1848 ± 0.35  | 18.91 ± 0.38   | 19.4205 ± 2.11
  • 22. EBEM Algorithm (5). EBEM in Higher Dimensions. We have also tested the algorithm with the well-known Wine data set, which contains 3 classes of 178 (13-dimensional) instances. The number of samples, 178, is not enough to build the pdf using the Parzen windows method in a 13-dimensional space. With the MST approach (see below), where no explicit pdf estimation is needed, the algorithm has been applied to this data set. After EBEM ends with K = 3, a maximum a posteriori classifier was built. The classification performance was 96.1%, which is similar to or better than the results reported in the literature.
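    The MAP classifier mentioned above amounts to assigning each sample to the kernel with the highest posterior responsibility; below is a minimal sketch, assuming the fitted parameters pis, mus, Sigmas from the EM code earlier (map_classify is a name of our own choosing).

    import numpy as np
    from scipy.stats import multivariate_normal

    def map_classify(Y, pis, mus, Sigmas):
        # Log-posterior (up to a constant) of each kernel for each sample; argmax gives the MAP label.
        scores = np.column_stack([
            np.log(pi) + multivariate_normal.logpdf(Y, mean=mu, cov=S)
            for pi, mu, S in zip(pis, mus, Sigmas)
        ])
        return scores.argmax(axis=1)

    On Wine, the resulting kernel labels would then be matched to the ground-truth classes (e.g. by majority vote) before computing the accuracy.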
  • 23. Entropic Graphs. EGs and Rényi Entropy. Entropic Spanning Graphs obtained from data to estimate Rényi's α-entropy [Hero and Michel, 02] belong to the "non plug-in" methods for entropy estimation. Rényi's α-entropy of a probability density function p is defined as:
    $H_\alpha(p) = \frac{1}{1-\alpha}\ln\int_z p^{\alpha}(z)\,dz$
    for α ∈ [0, 1[. The α-entropy converges to the Shannon one, $\lim_{\alpha\to 1} H_\alpha(p) = H(p) \equiv -\int p(z)\ln p(z)\,dz$, so it is possible to obtain the Shannon entropy from Rényi's one if the latter limit is either solved or numerically approximated.
  • 24. Entropic Graphs (2). EGs and Rényi Entropy (cont.). Let G be a graph consisting of a set of vertices X_n = {x_1, ..., x_n}, with x_i ∈ R^d, and edges {e} that connect vertices: e_ij = (x_i, x_j). If we denote by M(X_n) the possible sets of edges in the class of acyclic graphs spanning X_n (spanning trees), the total edge-length functional of the Euclidean power-weighted Minimal Spanning Tree is:
    $L_{MST}^{\gamma}(X_n) = \min_{M(X_n)}\ \sum_{e\in M(X_n)}\|e\|^{\gamma},$
    with γ ∈ [0, d] and ||·|| the Euclidean distance. The MST has been used to measure the randomness of a set of points.
  • 25. Entropic Graphs (3). EGs and Rényi Entropy (cont.). It is intuitive that the length of the MST for uniformly distributed points increases at a greater rate than that of the MST spanning a more concentrated, non-uniform set of points. For d ≥ 2,
    $H_\alpha(X_n) = \frac{d}{\gamma}\left[\ln\frac{L_\gamma(X_n)}{n^{\alpha}} - \ln\beta_{L_\gamma,d}\right]$
    is an asymptotically unbiased and almost surely consistent estimator of the α-entropy of p, where α = (d − γ)/d and β_{L_γ,d} is a constant bias correction for which only approximations and bounds are known: (i) Monte Carlo simulation of uniform random samples on the unit cube [0, 1]^d; (ii) large-d approximation: (γ/2) ln(d/(2πe)).
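    A minimal sketch of this estimator, using SciPy's minimum spanning tree on the complete Euclidean distance graph and the large-d approximation of the bias constant; renyi_entropy_mst is our naming, and in practice the exact bias correction would come from a Monte Carlo calibration as noted above. It also assumes all pairwise distances are strictly positive.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def renyi_entropy_mst(X, gamma=1.0):
        # X: (n, d) samples. Returns the Renyi alpha-entropy estimate with alpha = (d - gamma)/d.
        n, d = X.shape
        alpha = (d - gamma) / d
        # MST over the complete Euclidean graph; edge weights raised to the power gamma.
        D = squareform(pdist(X))
        mst = minimum_spanning_tree(D)
        L_gamma = np.power(mst.data, gamma).sum()
        # Large-d approximation of the bias constant ln(beta).
        ln_beta = (gamma / 2.0) * np.log(d / (2 * np.pi * np.e))
        return (d / gamma) * (np.log(L_gamma / n ** alpha) - ln_beta)

    Extrapolating such estimates toward α → 1 (as in the figure two slides below) then yields an approximation to the Shannon entropy.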
  • 26. Entropic Graphs (4). Figure: Uniform (left) vs. Gaussian (right) distributions' EGs.
  • 27. Entropic Graphs (5). Figure: Extrapolation to Shannon: $\alpha^* = 1 - \frac{a + b\,e^{c\,d}}{N}$.
  • 28. Variational Bayes. Problem Definition. Given N i.i.d. samples X = {x^1, ..., x^N} of a d-dimensional random variable X, their associated hidden variables Z = {z^1, ..., z^N} and the parameters Θ of the model, the Bayesian posterior is given by [Watanabe et al., 09]:
    $p(Z,\Theta|X) = \frac{p(\Theta)\prod_{n=1}^{N} p(x^n, z^n|\Theta)}{\int p(\Theta)\prod_{n=1}^{N} p(x^n, z^n|\Theta)\,d\Theta}.$
    Since the integration w.r.t. Θ is analytically intractable, the posterior is approximated by a factorized distribution q(Z, Θ) = q(Z)q(Θ), and the optimal approximation is the one that minimizes the variational free energy.
  • 29. Variational Bayes (2). Problem Definition (cont.). The variational free energy is given by:
    $L(q) = \int q(Z,\Theta)\log\frac{q(Z,\Theta)}{p(Z,\Theta|X)}\,d\Theta\ -\ \log\int p(\Theta)\prod_{n=1}^{N} p(x^n|\Theta)\,d\Theta,$
    where the first term is the Kullback-Leibler divergence between the approximation and the true posterior. As the second term is independent of the approximation, the Variational Bayes (VB) approach reduces to minimizing the latter divergence. Such minimization is addressed in an EM-like process alternating the updating of q(Θ) and the updating of q(Z).
  • 30. Variational Bayes (3). Problem Definition (cont.). The EM-like process alternating the updating of q(Θ) and the updating of q(Z) is given by
    $q(\Theta) \propto p(\Theta)\exp\left\langle \sum_{n=1}^{N}\log p(x^n, z^n|\Theta)\right\rangle_{q(Z)},$
    $q(Z) \propto \exp\left\langle \sum_{n=1}^{N}\log p(x^n, z^n|\Theta)\right\rangle_{q(\Theta)}.$
  • 31. Variational Bayes (4). Problem Definition (cont.). In [Constantinopoulos and Likas, 07], the optimization of the variational free energy yields (N(·) and W(·) being the Gaussian and Wishart densities, respectively):
    $q(Z) = \prod_{n=1}^{N}\left[\prod_{k=1}^{s}(r_k^n)^{z_k^n}\prod_{k=s+1}^{K}(\rho_k^n)^{z_k^n}\right],$
    $q(\mu) = \prod_{k=1}^{K}\mathcal{N}(\mu_k|m_k,\Sigma_k),\qquad q(\Sigma) = \prod_{k=1}^{K}\mathcal{W}(\Sigma_k|\nu_k,V_k),$
    $q(\beta) = \left(1-\sum_{k=1}^{s}\pi_k\right)^{-K+s}\frac{\Gamma\!\left(\sum_{k=s+1}^{K}\tilde\gamma_k\right)}{\prod_{k=s+1}^{K}\Gamma(\tilde\gamma_k)}\ \prod_{k=s+1}^{K}\left(\frac{\pi_k}{1-\sum_{k=1}^{s}\pi_k}\right)^{\tilde\gamma_k-1}.$
    After the maximization of the free energy w.r.t. q(·), the method proceeds to update the coefficients in α, which denote the free components.
  • 32. Model Selection in VB. Fixed and Free Components. In the latter framework, it is assumed that a number of K − s components fit the data well in their region of influence (fixed components), and model-order selection is then posed in terms of optimizing the parameters of the remaining s components (free components). Let α = {π_k}_{k=1}^{s} be the coefficients of the free components and β = {π_k}_{k=s+1}^{K} the coefficients of the fixed components. Under the i.i.d. sampling assumption, the prior distribution of Z given α and β can be modeled by a product of multinomials:
    $p(Z|\alpha,\beta) = \prod_{n=1}^{N}\prod_{k=1}^{s}\pi_k^{z_k^n}\prod_{k=s+1}^{K}\pi_k^{z_k^n}.$
  • 33. Model Selection in VB (2). Fixed and Free Components (cont.). Moreover, assuming conjugate Dirichlet priors over the set of mixing coefficients, we have that
    $p(\beta|\alpha) = \left(1-\sum_{k=1}^{s}\pi_k\right)^{-K+s}\frac{\Gamma\!\left(\sum_{k=s+1}^{K}\gamma_k\right)}{\prod_{k=s+1}^{K}\Gamma(\gamma_k)}\ \prod_{k=s+1}^{K}\left(\frac{\pi_k}{1-\sum_{k=1}^{s}\pi_k}\right)^{\gamma_k-1}.$
    Then, considering fixed coefficients, Θ is redefined as Θ = {µ, Σ, β} and we have the following factorization: q(Z, Θ) = q(Z)q(µ)q(Σ)q(β).
  • 34. Model Selection in VB (3). Kernel Splits. In [Constantinopoulos and Likas, 07], the VBgmm method is used for training an initial K = 2 model. Then, in the so-called VBgmmSplit, they proceed by sorting the obtained kernels and trying to split them recursively. Each splitting consists of: removing the original component, and replacing it by two kernels with the same covariance matrix as the original but with means placed in opposite directions along the direction of maximum variability.
  • 35. Model Selection in VB (4). Kernel Splits (cont.). Independently of the split strategy, the critical point of VBgmmSplit is the number of splits needed until convergence. At each iteration of the latter algorithm, the K currently existing kernels are split. Consider the case where some split is detected as proper (non-zero π after running the VB update described in the previous section, where each new kernel is considered free). Then the number of components increases and a new set of splitting tests starts in the next iteration. This means that if the algorithm stops (all splits failed) with K kernels, the number of splits has been 1 + 2 + ... + K = K(K + 1)/2.
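    To make the complexity argument concrete, here is a toy comparison of the number of split tests; both helper names are hypothetical, and the EBVS count anticipates the one-split-per-iteration scheme of the next slide.

    def split_tests_vbgmmsplit(K):
        # All current kernels are re-split at each iteration: 1 + 2 + ... + K.
        return K * (K + 1) // 2

    def split_tests_ebvs(K):
        # Roughly one candidate split per iteration until convergence with K kernels.
        return K

    print(split_tests_vbgmmsplit(10), split_tests_ebvs(10))   # 55 vs 10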
  • 36. Model Selection in VB (5). EBVS Split. We split only one kernel per iteration. In order to do so, we implement a selection criterion based on measuring the entropy of the kernels. If one uses Leonenko's estimator, there is no need for extrapolation as in EGs, and asymptotic consistency is ensured. Then, at each iteration of the algorithm we select the worst kernel, in terms of low entropy, to be split. If the split is successful, we will have K + 1 kernels to feed the VB optimization in the next iteration. Otherwise, there is no need to add a new kernel and the process converges to K kernels. The key point here is that the overall process is linear (one split per iteration).
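    One member of this family of k-nearest-neighbour estimators, the Kozachenko-Leonenko Shannon entropy estimator, can be sketched as follows; knn_entropy and the default k = 1 are our own illustrative choices, not necessarily the exact estimator used in EBVS, and duplicate samples (zero neighbour distances) are assumed absent.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma, gammaln

    def knn_entropy(X, k=1):
        # k-NN estimate of the Shannon entropy (in nats) from samples X of shape (n, d).
        n, d = X.shape
        tree = cKDTree(X)
        # Distance to the k-th nearest neighbour (excluding the point itself).
        eps, _ = tree.query(X, k=k + 1)
        eps = eps[:, -1]
        # Log-volume of the d-dimensional unit ball.
        log_c_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
        return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))

    In EBVS, such a per-kernel entropy estimate is compared against the Gaussian maximum entropy to pick the single kernel to split at each iteration.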
  • 37. EBVS: Fast BV. Figure: EBVS results.
  • 38. EBVS: Fast BV (2). Figure: EBVS results (more).
  • 39. EBVS: Fast BV (3). MD Experiments. With this approach using Leonenko's estimator, the classification performance we obtain on this data set is 86%. Although experiments in higher dimensions can be performed, when the number of samples is not high enough the risk of unbounded maxima of the likelihood function is higher, due to singular covariance matrices. The entropy estimation method, however, performs very well with thousands of dimensions.
  • 40. Conclusions. Summarizing Ideas in GMs. In the multi-dimensional case, efficient entropy estimators become critical. In VB, where model-order selection is implicit, it is possible to reduce the complexity by at least one order of magnitude. We can use the same approach for shapes in 2D and 3D. Once we have the mixtures, new measures for comparing them are waiting to be discovered and used. Let's do it!
  • 41. References.
    [Goldberger et al., 03] Goldberger, J., Gordon, S., Greenspan, H. (2003). An Efficient Image Similarity Measure Based on Approximations of KL-Divergence Between Two Gaussian Mixtures. ICCV'03.
    [Figueiredo and Jain, 02] Figueiredo, M. and Jain, A. (2002). Unsupervised Learning of Finite Mixture Models. IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 381–399.
    [Constantinopoulos and Likas, 07] Constantinopoulos, C. and Likas, A. (2007). Unsupervised Learning of Gaussian Mixtures based on Variational Component Splitting. IEEE Trans. Neural Networks, vol. 18, no. 3, pp. 745–755.
  • 42. References (2).
    [Peñalver et al., 09] Peñalver, A., Escolano, F., Sáez, J.M. (2009). Learning Gaussian Mixture Models with Entropy-Based Criteria. IEEE Trans. on Neural Networks, 20(11), 1756–1771.
    [Dellaportas, 06] Dellaportas, P. and Papageorgiou, I. (2006). Multivariate Mixtures of Normals with Unknown Number of Components. Statistics and Computing, vol. 16, no. 1, pp. 57–68.
    [Hero and Michel, 02] Hero, A. and Michel, O. (2002). Applications of Entropic Spanning Graphs. IEEE Signal Processing Magazine, vol. 19, no. 5, pp. 85–95.
    [Watanabe et al., 09] Watanabe, K., Akaho, S., Omachi, S. (2009). Variational Bayesian Mixture Model on a Subspace of Exponential Family Distributions. IEEE Transactions on Neural Networks, 20(11), 1783–1796.
  • 43. References (3).
    [Escolano et al., 10] Escolano, F., Peñalver, A. and Bonev, B. (2010). Entropy-based Variational Scheme for Fast Bayes Learning of Gaussian Mixtures. SSPR'2010 (accepted).
    [Rajwade et al., 09] Rajwade, A., Banerjee, A., Rangarajan, A. (2009). Probability Density Estimation Using Isocontours and Isosurfaces: Applications to Information-Theoretic Image Registration. IEEE Trans. Pattern Anal. Mach. Intell., 31(3), 475–491.
    [Chen et al., 10] Chen, T., Vemuri, B.C., Rangarajan, A., Eisenschenk, S.J. (2010). Group-wise Point-set Registration Using a Novel CDF-based Havrda-Charvát Divergence. Int. J. Comput. Vis., 86(1), 111–124.