Expectation Maximization and Mixture of Gaussians
• Recommend me some music!
• Discover groups of similar songs…

[Figure: "My Music Collection" with songs labeled by tempo, e.g. "Only my railgun" (bpm 120) and Bach Sonata #1 (bpm 60), plus others around bpm 90 and 125.]
• Recommend me some music!
• Discover groups of similar songs…

[Figure: the same collection, now grouped into clusters of similar tempo, one around bpm 120 and one around bpm 60.]
K-means: an unsupervised classification method
1. Initialize K "means" µk, one for each class.
   • E.g. use random starting points, or choose K random points from the set.

[Figure: two initial means µ1 and µ2 for K = 2.]
2. Phase 1: Assign each point to its closest mean µk.
3. Phase 2: Update the means of the new clusters.

[Figures: the two phases alternate over several iterations; each point's hard assignment (1 or 0 per cluster) and the means change step by step until the clusters settle.]
4. When the means do not change anymore → clustering DONE.
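To make the loop concrete, here is a minimal NumPy sketch of the algorithm just described (illustrative code, not from the deck; the kmeans function and its empty-cluster guard are hypothetical details):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means sketch. X: (N, D) array. Returns (means, assignments)."""
    rng = np.random.default_rng(seed)
    # 1. Initialize K "means" by choosing K random points from the set.
    means = X[rng.choice(len(X), size=K, replace=False)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # Phase 1: assign each point to its closest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Phase 2: update the mean of each new cluster
        # (keep the old mean if a cluster ends up empty: a hypothetical guard).
        new_means = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                              else means[k] for k in range(K)])
        # 4. When the means do not change anymore, clustering is done.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, assign
```

With the music example, X could be an (N, 1) array of song tempos in bpm and K = 2.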
• In K-means, a point can only have 1 class.
• But what about points that lie in between groups? E.g. Jazz + Classical.
The Famous “GMM”: Gaussian Mixture Model
p(X) = N(X | µ, Σ)

µ is the mean and Σ the variance. Gaussian == “Normal” distribution.
p(X) = N(X | µ, Σ) + N(X | µ, Σ)
p(X) = N(X | µ1, Σ1) + N(X | µ2, Σ2)

Example: [Figure: two Gaussian components with different means and variances.]
p(X) = π1 N(X | µ1, Σ1) + π2 N(X | µ2, Σ2)

πk is the mixing coefficient; the coefficients sum to 1: ∑k=1..K πk = 1.

Example: π1 = 0.7, π2 = 0.3
p(X) = ∑k=1..K πk N(X | µk, Σk)

Example: K = 2
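As a sketch of what this density looks like in code, here is the K = 2 example evaluated with SciPy (the bpm means and variances are made-up illustration values):

```python
import numpy as np
from scipy.stats import multivariate_normal

# p(X) = sum_k pi_k * N(X | mu_k, Sigma_k), with K = 2 as in the example.
pis    = [0.7, 0.3]                                # mixing coefficients, sum to 1
mus    = [np.array([60.0]), np.array([120.0])]     # illustrative bpm cluster centers
sigmas = [np.array([[25.0]]), np.array([[100.0]])] # illustrative variances

def gmm_pdf(x):
    # Weighted sum of the K Gaussian densities.
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=sig)
               for pi, mu, sig in zip(pis, mus, sigmas))

print(gmm_pdf(np.array([90.0])))   # density of a bpm-90 song under the mixture
```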
• K-means is a classifier.
• Mixture of Gaussians is a probability model. We can USE it as a “soft” classifier.

K-means fits one parameter to the data:
• Mean µk

Mixture of Gaussians fits three:
• Mean µk
• Covariance Σk
• Mixing coefficient πk
EM for GMM
1. Initialize means µk.
2. E Step: Assign each point to a cluster.
3. M Step: Given the clusters, refine the mean µk of each cluster k.
4. Stop when the change in means is small.

[Hard assignments: each point belongs to one cluster with weight 1, to all others with weight 0.]
1. Initialize Gaussian* parameters: means µk, covariances Σk, and mixing coefficients πk.
2. E Step: Assign each point Xn an assignment score γ(znk) for each cluster k.
3. M Step: Given the scores, adjust µk, Σk, πk for each cluster k.
4. Evaluate the likelihood. If the likelihood or the parameters converge, stop.

[Soft assignments: a point between two clusters can score, e.g., 0.5 and 0.5.]

*There are K Gaussians.
1. Initialize µk, Σk, πk, one for each Gaussian k.

   Tip! Use the K-means result to initialize:
   µk ← µk (the K-means cluster means)
   Σk ← cov(cluster(k))
   πk ← (number of points in cluster k) / (total number of points)
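A small sketch of this initialization, reusing the output of the kmeans sketch from the K-means section (the init_from_kmeans name is hypothetical):

```python
import numpy as np

def init_from_kmeans(X, means, assign, K):
    """Initialize GMM parameters from a K-means result (means, assign)."""
    mus    = [means[k] for k in range(K)]                  # mu_k <- K-means mean
    sigmas = [np.cov(X[assign == k].T) for k in range(K)]  # Sigma_k <- cov(cluster k)
    pis    = np.bincount(assign, minlength=K) / len(X)     # pi_k <- N_k / N
    return mus, sigmas, pis
```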
2. E Step: For each point Xn, determine its assignment score to each Gaussian k:

   γ(znk) = πk N(Xn | µk, Σk) / ∑j=1..K πj N(Xn | µj, Σj)

   znk is a latent variable; γ(znk) is called a “responsibility”: how much is this Gaussian k responsible for this point Xn? (E.g. a point between two clusters might score .7 and .3.)
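A vectorized sketch of this E step, assuming the parameters are stored as in the initialization sketch above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, mus, sigmas, pis):
    """Responsibilities gamma[n, k]: how much Gaussian k explains point X[n]."""
    K = len(pis)
    # Unnormalized scores: pi_k * N(X_n | mu_k, Sigma_k) for every n, k.
    dens = np.column_stack([pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
                            for k in range(K)])
    # Normalize so each row sums to 1 (e.g. a point between clusters gets .7 / .3).
    return dens / dens.sum(axis=1, keepdims=True)
```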
3. M Step: For each Gaussian k, update the parameters using the new γ(znk).

   Mean of Gaussian k:
   µk_new = (1/Nk) ∑n γ(znk) Xn,   where Nk = ∑n γ(znk)

   Each term weights Xn by the responsibility γ(znk) for this Xn: find the mean that “fits” the assignment scores best.
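In code, this weighted mean is one line of linear algebra (a sketch; gamma is the responsibility matrix from the E-step sketch above):

```python
import numpy as np

def update_means(X, gamma):
    """mu_k = (1/N_k) * sum_n gamma[n, k] * x_n, for all k at once."""
    Nk = gamma.sum(axis=0)              # effective number of points per Gaussian
    return (gamma.T @ X) / Nk[:, None]  # (K, D) matrix of new means
```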
3. M Step (continued): For each Gaussian k, update the parameters using the new γ(znk).

   Covariance matrix of Gaussian k:
   Σk_new = (1/Nk) ∑n γ(znk) (Xn − µk_new)(Xn − µk_new)ᵀ

   (µk_new: just calculated this!)
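A matching sketch for the covariance update, using the means just computed:

```python
import numpy as np

def update_covariances(X, gamma, mus_new):
    """Sigma_k = (1/N_k) * sum_n gamma[n, k] (x_n - mu_k)(x_n - mu_k)^T."""
    Nk = gamma.sum(axis=0)
    sigmas = []
    for k in range(gamma.shape[1]):
        d = X - mus_new[k]                                # (N, D) deviations
        sigmas.append((gamma[:, k, None] * d).T @ d / Nk[k])
    return sigmas
```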
3. M Step (continued): For each Gaussian k, update the parameters using the new γ(znk).

   Mixing coefficient for Gaussian k:
   πk_new = Nk / N   (e.g. 105.6 / 200)

   where N is the total # of points.
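And the mixing-coefficient update is just the responsibility mass per Gaussian divided by N (sketch):

```python
import numpy as np

def update_mixing(gamma):
    """pi_k = N_k / N, e.g. 105.6 / 200 = 0.528 for one Gaussian."""
    return gamma.sum(axis=0) / gamma.shape[0]
```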
4. Evaluate the log likelihood:

   ln p(X | µ, Σ, π) = ∑n ln( ∑k πk N(Xn | µk, Σk) )

   If the likelihood or the parameters converge, stop. Else go to Step 2 (E step).

The likelihood is the probability that the data X was generated by the parameters you found, i.e. correctness!
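A sketch of the log-likelihood check, plus a driver loop that ties together the e_step and update_* sketches from the previous slides (all function names come from those sketches, not from the deck):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, mus, sigmas, pis):
    """ln p(X) = sum_n ln( sum_k pi_k * N(x_n | mu_k, Sigma_k) )."""
    dens = sum(pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
               for k in range(len(pis)))
    return np.log(dens).sum()

def em_gmm(X, mus, sigmas, pis, max_iters=100, tol=1e-6):
    """Alternate E and M steps until the log likelihood converges."""
    ll_old = -np.inf
    for _ in range(max_iters):
        gamma  = e_step(X, mus, sigmas, pis)        # E step: responsibilities
        mus    = update_means(X, gamma)             # M step: means
        sigmas = update_covariances(X, gamma, mus)  # M step: covariances
        pis    = update_mixing(gamma)               # M step: mixing coefficients
        ll = log_likelihood(X, mus, sigmas, pis)    # Step 4: evaluate likelihood
        if abs(ll - ll_old) < tol:                  # converged: stop
            break
        ll_old = ll
    return mus, sigmas, pis, gamma
```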
The general EM algorithm:

1. Initialize the parameters θold.
2. E Step: Evaluate p(Z | X, θold). (Z: the hidden variables; X: the observed variables.)
3. M Step: Evaluate θnew = argmaxθ Q(θ, θold),
   where Q(θ, θold) = ∑Z p(Z | X, θold) ln p(X, Z | θ) and ln p(X, Z | θ) is the likelihood.
4. Evaluate the log likelihood. If the likelihood or the parameters converge, stop. Else θold ← θnew and go to the E Step.
• K-means can be formulated as EM
• EM for Gaussian Mixtures
• EM for Bernoulli Mixtures
• EM for Bayesian Linear Regression
• “Expectation”: calculate the fixed, data-dependent parameters of the function Q.
• “Maximization”: once the parameters of Q are known, it is fully determined, so now we can maximize Q.
• We learned how to cluster data in an unsupervised manner.
• Gaussian Mixture Models are useful for modeling data with “soft” cluster assignments.
• Expectation Maximization is a method used when we have a model with latent variables (values we don’t know, but estimate with each step).
• My question: What other applications could use EM? How about EM of GMMs?
