Machine Learning

Devdatt Dubhashi
Department of Computer Science and Engineering
Chalmers University
Gothenburg, Sweden

LP3 2007
Outline

1  k-Means Clustering
2  Mixtures of Gaussians and EM Algorithm
Clustering

Data set {x_1, ..., x_N} of N observations of a random d-dimensional Euclidean variable x.
The goal is to partition the data set into K clusters (K known).
Intuitively, the points within a cluster should be "close" to each other compared to points outside the cluster.
Cluster centers and assignments

Find a set of centers µk, k ∈ [K].
Assign each data point to one of the centers so as to minimize the sum of the squares of the distances to the assigned centers.
Assignment and Distortion

Introduce binary indicator variables
$$
r_{n,k} := \begin{cases} 1, & \text{if } x_n \text{ is assigned to } \mu_k \\ 0, & \text{otherwise.} \end{cases}
$$
Minimize the distortion measure
$$
J := \sum_{n \in [N]} \sum_{k \in [K]} r_{n,k}\, \|x_n - \mu_k\|^2 .
$$
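As a quick illustration (not from the original slides), the distortion J can be computed directly from the data, the centers and the assignment matrix. The NumPy sketch below assumes arrays X of shape (N, d), mu of shape (K, d) and a binary indicator matrix r of shape (N, K).

```python
import numpy as np

def distortion(X, mu, r):
    """J = sum over n, k of r[n, k] * ||x_n - mu_k||^2."""
    # Squared distance from every point to every center, shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return (r * sq_dists).sum()
```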
Two Step Optimization

Start with some initial values of µk. The basic iteration consists of two steps, repeated until convergence:
E: Minimize J w.r.t. rn,k keeping µk fixed.
M: Minimize J w.r.t. µk keeping rn,k fixed.
Two Step Optimization: E Step

Minimize J w.r.t. rn,k keeping µk fixed:
$$
r_{n,k} := \begin{cases} 1, & \text{if } k = \operatorname{argmin}_j \|x_n - \mu_j\|^2 \\ 0, & \text{otherwise.} \end{cases}
$$
Two Step Optimization: M Step

Minimize J w.r.t. µk keeping rn,k fixed: J is a quadratic function of µk, so setting the derivative to zero gives
$$
\sum_{n \in [N]} r_{n,k}\,(x_n - \mu_k) = 0,
$$
hence
$$
\mu_k = \frac{\sum_n r_{n,k}\, x_n}{\sum_n r_{n,k}} .
$$
In words: set µk to be the mean of the points assigned to cluster k; hence the name K-means algorithm.
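A minimal NumPy sketch of the two-step iteration just described; the function name, random initialization and convergence test are illustrative choices, not part of the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Alternate nearest-center assignment (E step) and mean update (M step)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, size=K, replace=False)]            # initial centers: K random points
    for _ in range(n_iters):
        # E step: assign each point to its nearest center.
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        labels = sq_dists.argmin(axis=1)
        # M step: move each center to the mean of the points assigned to it.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):                          # centers stopped moving
            break
        mu = new_mu
    return mu, labels
```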
K-Means Algorithm Analysis

Since J decreases at each iteration, convergence is guaranteed.
But the algorithm may converge to a local rather than a global optimum.
K-Means Algorithm: Example

[Figure: panels (a)-(i) showing successive iterations of K-means on a two-dimensional data set, axes from -2 to 2.]
K-Means and Image Segmentation

Image segmentation problem: partition an image into regions of homogeneous visual appearance, corresponding to objects or parts of objects.
Each pixel is a 3-dimensional point corresponding to the intensities of the red, blue and green channels.
Perform K-means and redraw the image, replacing each pixel by its corresponding center µk.
K-Means Algorithm: Example

[Figure: original image alongside K-means segmented versions; panel labels garbled in extraction.]
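A sketch of the segmentation recipe from the slide above, assuming the image is available as an H x W x 3 array of RGB intensities; scikit-learn's KMeans is used here as a convenient stand-in for the K-means procedure described on the slides.

```python
import numpy as np
from sklearn.cluster import KMeans   # standard K-means implementation

def segment_image(image, K):
    """Cluster the RGB pixels into K groups and repaint each pixel with its center."""
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)              # each pixel is a 3-dim point
    km = KMeans(n_clusters=K, n_init=10).fit(pixels)
    repainted = km.cluster_centers_[km.labels_]               # replace pixel by its center mu_k
    return repainted.reshape(H, W, 3)
```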
K-Means and Data Compression

Lossy, as opposed to lossless, compression: we accept some errors in reconstruction in return for a higher rate of compression.
Instead of storing all N data points, store only the identity of the assigned cluster for each point, plus the cluster centers.
Significant savings provided K << N.
Each data point is approximated by its nearest center µk: the code-book vectors.
New data are compressed by finding the nearest center and storing only the label k of the corresponding cluster.
This scheme is called Vector Quantization.
K-Means and Data Compression: Example

Suppose the original image has N pixels comprising {R, G, B} values, each stored with 8 bits of precision. Then the total space required is 24N bits.
If instead we first run K-means and transmit only the label of the corresponding cluster for each pixel, this takes log K bits per pixel, for a total of N log K bits.
We also need to transmit the K code-book vectors, which needs 24K bits.
In the example, the original image has 240 × 180 = 43,200 pixels, requiring 24 × 43,200 = 1,036,800 bits.
The compressed images require 43,248 (K = 2), 86,472 (K = 3) and 173,040 (K = 10) bits.
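The quoted numbers can be reproduced with a few lines, assuming each pixel label is stored in whole bits (⌈log2 K⌉ bits per pixel) plus 24 bits for each of the K code-book vectors.

```python
import math

def compressed_bits(n_pixels, K):
    """N * ceil(log2 K) bits for the labels plus 24K bits for the code-book."""
    return n_pixels * math.ceil(math.log2(K)) + 24 * K

N = 240 * 180                         # 43,200 pixels
print(24 * N)                         # 1,036,800 bits for the original image
for K in (2, 3, 10):
    print(K, compressed_bits(N, K))   # 43,248 / 86,472 / 173,040 bits
```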
Mixtures of Gaussians: Motivation

Pure Gaussian distributions have limitations when it comes to modelling real-life data.
Example: "Old Faithful" eruption durations.
The data form two dominant clumps.
A single Gaussian can't model this data well.
A linear superposition of two Gaussians does much better.
Old Faithful Eruptions

[Figure: two scatter plots of the Old Faithful eruption data; horizontal axis roughly 1 to 6, vertical axis 40 to 100.]
Mixtures of Gaussians: Modelling

A linear combination of Gaussians can give rise to complex distributions.
By using a sufficient number of Gaussians and adjusting their means and covariances, as well as the linear combination coefficients, we can model almost any continuous density to arbitrary accuracy.

[Figure: a one-dimensional density p(x) formed as a sum of several Gaussian components.]
Mixtures of Gaussians: Definition

Superposition of Gaussians of the form
$$
p(x) := \sum_{k \in [K]} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k).
$$
Each Gaussian density N(x | µk, Σk) is a component of the mixture, with its own mean and covariance.
The parameters πk are the mixing coefficients and satisfy $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$.
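A small sketch (an assumed helper, not from the slides) evaluating this mixture density at a point, with SciPy supplying the Gaussian component densities.

```python
from scipy.stats import multivariate_normal

def mixture_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))
```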
Mixtures of Gaussians: Definition

[Figure: panels (a) and (b) on the unit square; panel (a) shows three components with mixing coefficients 0.5, 0.3 and 0.2.]
Equivalent Definition: Latent Variable

Introduce a latent variable z in which exactly one component zk is 1 and the rest are zero, with p(zk = 1) = πk. This variable identifies the component.
Given z, the conditional distribution is
$$
p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k).
$$
Inverting this using Bayes' rule,
$$
\gamma(z_k) := p(z_k = 1 \mid x)
 = \frac{p(z_k = 1)\, p(x \mid z_k = 1)}{\sum_j p(z_j = 1)\, p(x \mid z_j = 1)}
 = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
$$
is the posterior probability, or responsibility, that component k takes for observation x.
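The responsibilities can be computed the same way; the helper below is an assumed illustration of the Bayes'-rule expression above, not code from the lecture.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(x, pis, mus, Sigmas):
    """gamma_k = pi_k N(x | mu_k, Sigma_k) / sum_j pi_j N(x | mu_j, Sigma_j)."""
    weighted = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
                         for pi, mu, Sigma in zip(pis, mus, Sigmas)])
    return weighted / weighted.sum()
```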
Mixtures and Responsibilities

[Figure: panels (a)-(c) on the unit square illustrating mixture samples and the corresponding responsibilities.]
Learning Mixtures

Suppose we have a data set of observations represented by an N × D matrix X := {x_1, ..., x_N} and we want to model it as a mixture of K Gaussians.
Need to find the mixing coefficients πk and the parameters of the component models, µk and Σk.
Learning Mixtures: The Means

Start with the log-likelihood function:
$$
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n \in [N]} \ln\!\left( \sum_{k \in [K]} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)
$$
Setting the derivative w.r.t. µk to zero, and assuming Σk is invertible, gives
$$
\mu_k = \frac{1}{N_k} \sum_{n \in [N]} \gamma(z_{n,k})\, x_n ,
$$
where $N_k := \sum_{n \in [N]} \gamma(z_{n,k})$.
Learning Mixtures: The Means

Interpret Nk as the "effective number of points" assigned to cluster k.
Note that the mean µk of the kth Gaussian component is a weighted mean of all the points in the data set.
The weighting factor for data point x_n is the posterior probability, or responsibility, of component k for generating x_n.
Learning Mixtures: The Covariances

Setting the derivative w.r.t. Σk to zero, and assuming Σk is invertible, gives
$$
\Sigma_k = \frac{1}{N_k} \sum_{n \in [N]} \gamma(z_{n,k})\,(x_n - \mu_k)(x_n - \mu_k)^T ,
$$
which is the same as the single-Gaussian solution, but with an average weighted by the corresponding posterior probabilities.
Learning Mixtures: Mixing Coefficients

Setting the derivative w.r.t. πk to zero, and taking into account that $\sum_k \pi_k = 1$ (Lagrange multipliers!), gives
$$
\pi_k = \frac{N_k}{N} .
$$
The mixing coefficient of the kth component is the average responsibility that the component takes for explaining the data set.
Learning Mixtures: EM Algorithm

1  Initialize the means, covariances and mixing coefficients, then repeat:
2  E step: Evaluate the responsibilities using the current parameters:
$$
\gamma(z_{n,k}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
$$
3  M step: Re-estimate the parameters using the current responsibilities:
$$
\mu_k^{\text{new}} = \frac{1}{N_k} \sum_n \gamma(z_{n,k})\, x_n
$$
$$
\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_n \gamma(z_{n,k})\,(x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T
$$
$$
\pi_k^{\text{new}} = \frac{N_k}{N} ,
$$
where $N_k := \sum_n \gamma(z_{n,k})$.
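A compact NumPy/SciPy sketch of these three steps; an illustrative implementation only, with no log-likelihood monitoring or covariance regularization.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: E step = responsibilities, M step = parameter updates."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # 1. Initialize means, covariances and mixing coefficients.
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.stack([np.cov(X, rowvar=False) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # 2. E step: evaluate responsibilities with the current parameters.
        dens = np.stack([pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
                         for k in range(K)], axis=1)                # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # 3. M step: re-estimate parameters with the current responsibilities.
        Nk = gamma.sum(axis=0)                                       # effective number of points
        mus = (gamma.T @ X) / Nk[:, None]
        Sigmas = np.stack([(gamma[:, k, None] * (X - mus[k])).T @ (X - mus[k]) / Nk[k]
                           for k in range(K)])
        pis = Nk / N
    return pis, mus, Sigmas, gamma
```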
EM Algorithm: Example

[Figure: panels (a)-(f) showing successive EM iterations on a two-dimensional data set, axes from -2 to 2; iteration labels garbled in extraction.]
EM vs K-Means

K-means performs a hard assignment of data points to clusters, i.e. each data point is assigned to a unique cluster.
The EM algorithm makes a soft assignment based on posterior probabilities.
K-means can be derived as a limit of the EM algorithm applied to a particular instance of Gaussian mixtures.
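The contrast can be made concrete with a toy responsibility matrix (illustrative values only): collapsing each row of soft responsibilities onto its largest entry recovers a hard, K-means-style assignment.

```python
import numpy as np

# Soft assignment (EM): every point carries a responsibility for every component.
gamma = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.02, 0.98]])

# Hard assignment (K-means): every point belongs entirely to one cluster.
hard = np.zeros_like(gamma)
hard[np.arange(len(gamma)), gamma.argmax(axis=1)] = 1.0
print(hard)   # the ambiguous 0.55 / 0.45 row collapses onto a single cluster
```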
