Expectation Maximization and Mixture of Gaussians
[Figure: “My Music Collection”, with songs labeled by tempo, e.g. Bach Sonata #1 (bpm 60), Only My Railgun (bpm 120) and a third song at bpm 125. Two tasks: “Recommend me some music!” and “Discover groups of similar songs…”. The songs fall into groups around bpm 60 and bpm 120.]
K-means: an unsupervised classification (clustering) method
1.    Initialize K “means” µk, one for each class (in the example, K = 2: µ1 and µ2).
      E.g. use random starting points, or choose K random points from the set.
2.    Phase 1: Assign each point to the closest mean µk.
3.    Phase 2: Update the means of the new clusters.
4.    When the means no longer change, clustering is done. Otherwise, repeat Phases 1 and 2.
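The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's code: the function name `kmeans`, the seed, and the toy bpm values are assumptions made for the example.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means on an (N, D) array X; returns (means, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K distinct random points from the set as initial means.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Phase 1: assign each point to its closest mean (squared Euclidean distance).
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Phase 2: move each mean to the centroid of its cluster
        # (an empty cluster keeps its old mean).
        new_means = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else means[k]
            for k in range(K)
        ])
        # Step 4: stop once the means no longer change.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels

# Toy "music collection": tempos in bpm (made-up values for illustration).
bpm = np.array([[58.0], [60.0], [62.0], [118.0], [120.0], [125.0]])
means, labels = kmeans(bpm, K=2)
```

Each point ends up with exactly one label, which is the "hard" assignment that the following slides contrast with Gaussian mixtures.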




•  In K-means, a point can only have 1 class
•  But what about points that lie in between groups? E.g. Jazz + Classical
The Famous “GMM”: Gaussian Mixture Model
p(X) = N(X | µ, Σ)

where µ is the mean and Σ is the variance (the covariance matrix in more than one dimension).

Gaussian == “Normal” distribution
p(X) = N(X | µ, Σ) + N(X | µ, Σ)

p(X) = N(X | µ1, Σ1) + N(X | µ2, Σ2)

Example: [Figure: two Gaussians, each with its own mean and variance]
p(X) = π1 N(X | µ1, Σ1) + π2 N(X | µ2, Σ2)

Mixing coefficients: $\sum_{k=1}^{K} \pi_k = 1$

Example: π1 = 0.7, π2 = 0.3
$p(X) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(X \mid \mu_k, \Sigma_k)$

Example: K = 2
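As a rough numerical check of why the mixing coefficients are needed, the sketch below evaluates a one-dimensional, two-component mixture. The coefficients 0.7 and 0.3 come from the example above; the means, variances, and helper names (`normal_pdf`, `mixture_pdf`) are assumptions chosen only for illustration.

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, pis, mus, variances):
    """p(x) = sum_k pi_k * N(x | mu_k, var_k)."""
    return sum(pi * normal_pdf(x, mu, var)
               for pi, mu, var in zip(pis, mus, variances))

pis = [0.7, 0.3]            # mixing coefficients from the slide example
mus = [60.0, 120.0]         # assumed means (bpm)
variances = [100.0, 25.0]   # assumed variances

xs = np.linspace(-100.0, 300.0, 40001)
dx = xs[1] - xs[0]
# With mixing coefficients the mixture integrates to 1 ...
weighted_area = np.sum(mixture_pdf(xs, pis, mus, variances)) * dx
# ... whereas a plain sum of two Gaussians integrates to 2.
unweighted_area = np.sum(mixture_pdf(xs, [1.0, 1.0], mus, variances)) * dx
```

The constraint that the coefficients sum to 1 is what keeps p(X) a valid probability density.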
K-means                         Mixture of Gaussians
• is a classifier               • is a probability model
                                • We can USE it as a “soft” classifier

Parameter to fit to data:       Parameters to fit to data:
• Mean µk                       • Mean µk
                                • Covariance Σk
                                • Mixing coefficient πk
EM for GMM
K-means, viewed as EM:
1.    Initialize means µk
2.    E Step: Assign each point to a cluster (hard assignment, e.g. scores 1 and 0)
3.    M Step: Given clusters, refine mean µk of each cluster k
4.    Stop when change in means is small
EM for GMM:
1.    Initialize Gaussian* parameters: means µk, covariances Σk and mixing coefficients πk
2.    E Step: Assign each point Xn an assignment score γ(znk) for each cluster k (soft assignment, e.g. scores 0.5 and 0.5)
3.    M Step: Given scores, adjust µk, Σk, πk for each cluster k
4.    Evaluate likelihood. If likelihood or parameters converge, stop.

*There are K Gaussians
1.    Initialize µk, Σk, πk, one for each Gaussian k

      Tip! Use K-means result to initialize:
          µk ← µk (the K-means mean of cluster k)
          Σk ← cov(cluster(k))
          πk ← (number of points in cluster k) / (total number of points)
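The initialization tip can be sketched as follows, assuming K-means has already produced one integer label per point. The function name `init_from_kmeans` and the toy data are hypothetical, and the sketch assumes every cluster contains at least two points.

```python
import numpy as np

def init_from_kmeans(X, labels, K):
    """Turn a hard K-means clustering into initial GMM parameters."""
    N, D = X.shape
    # mu_k: mean of the points assigned to cluster k.
    mus = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    # Sigma_k: covariance of cluster k (rowvar=False: rows are points;
    # bias=True: divide by the cluster size rather than size - 1).
    sigmas = np.array([
        np.cov(X[labels == k], rowvar=False, bias=True).reshape(D, D)
        for k in range(K)
    ])
    # pi_k: fraction of all points that fell into cluster k.
    pis = np.array([(labels == k).sum() / N for k in range(K)])
    return mus, sigmas, pis

# Made-up 2-D points and labels, standing in for a K-means result.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],
              [10.0, 10.0], [12.0, 10.0], [11.0, 12.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1])
mus, sigmas, pis = init_from_kmeans(X, labels, K=2)
```

Starting EM from a K-means result typically gives it a sensible first guess, which is the motivation the slide gives for the tip.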

2.    E Step: For each point Xn, determine its assignment score to each Gaussian k:

      [equation image: γ(znk), where znk is a latent variable]

      γ(znk) is called a “responsibility”: how much is this Gaussian k responsible for this point Xn? (Example scores: .7 and .3)
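For reference, the standard form of the responsibility in a Gaussian mixture, which this slide's equation presumably matches, is the posterior probability that Gaussian k generated Xn. The index j below is introduced here only to sum over all K components.

```latex
\gamma(z_{nk}) =
  \frac{\pi_k \, \mathcal{N}(X_n \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(X_n \mid \mu_j, \Sigma_j)}
```

For each point the K responsibilities sum to 1, as in the .7 / .3 example.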
3.    M Step: For each Gaussian k, update parameters using new γ(znk)

      Mean of Gaussian k: [equation image; each Xn is weighted by its responsibility γ(znk)]
      Find the mean that “fits” the assignment scores best.

      Covariance matrix of Gaussian k: [equation image; uses the new mean just calculated]

      Mixing coefficient for Gaussian k: [equation image; e.g. 105.6/200, where 200 is the total # of points]
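The usual textbook M-step updates for a Gaussian mixture are given below as a reference; that the slide images show exactly these forms is an assumption. The symbol N_k (the effective number of points assigned to Gaussian k) and the superscript "new" are notation introduced here.

```latex
N_k = \sum_{n=1}^{N} \gamma(z_{nk})

\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, X_n

\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})
  \, (X_n - \mu_k^{\text{new}})(X_n - \mu_k^{\text{new}})^{\mathsf{T}}

\pi_k^{\text{new}} = \frac{N_k}{N}
```

In the slide's example, 105.6/200 would correspond to N_k = 105.6 and N = 200.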
4.    Evaluate log likelihood. If likelihood or parameters converge, stop. Else go to Step 2 (E step).

Likelihood is the probability (density) of the data X under the parameters you found, i.e. how well the model fits the data.
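Putting steps 1 to 4 together, a minimal one-dimensional sketch of EM for a Gaussian mixture might look like this. The function name `em_gmm_1d`, the synthetic bpm data, the starting values, and the tolerance are all assumptions for illustration; a multivariate version would replace variances with covariance matrices.

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(x, mus, variances, pis, n_iter=200, tol=1e-8):
    """EM for a 1-D Gaussian mixture; returns (mus, variances, pis, trace)."""
    mus, variances, pis = (np.asarray(a, dtype=float).copy()
                           for a in (mus, variances, pis))
    N = len(x)
    trace = []  # log likelihood at the start of each iteration
    for _ in range(n_iter):
        # pi_k * N(x_n | mu_k, var_k) for every point and Gaussian, shape (N, K).
        weighted = pis * normal_pdf(x[:, None], mus, variances)
        # Step 4: log likelihood under the current parameters.
        trace.append(np.log(weighted.sum(axis=1)).sum())
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
        # E step: responsibilities gamma[n, k]; each row sums to 1.
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the responsibilities.
        Nk = gamma.sum(axis=0)
        mus = (gamma * x[:, None]).sum(axis=0) / Nk
        variances = (gamma * (x[:, None] - mus) ** 2).sum(axis=0) / Nk
        pis = Nk / N
    return mus, variances, pis, trace

# Synthetic tempos: 140 songs near 60 bpm and 60 songs near 120 bpm.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(60.0, 5.0, 140), rng.normal(120.0, 8.0, 60)])
mus, variances, pis, trace = em_gmm_1d(
    x, mus=[50.0, 130.0], variances=[100.0, 100.0], pis=[0.5, 0.5])
```

The recorded log likelihood should never decrease from one iteration to the next, which makes it a convenient convergence check.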
1.    Initialize parameters θ^old
2.    E Step: Evaluate p(Z | X, θ^old), where Z are the hidden variables and X the observed variables
3.    M Step: Evaluate θ^new [equation image], where [equation image, with a term labeled “Likelihood”]
4.    Evaluate log likelihood. If likelihood or parameters converge, stop. Else θ^old ← θ^new
      and go to E Step.
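In the standard textbook statement of general EM, the M step maximizes the expected complete-data log likelihood, usually written Q. The form below is that standard statement; that the slide's equation images match it exactly is an assumption.

```latex
\theta^{\text{new}} = \arg\max_{\theta} \; Q(\theta, \theta^{\text{old}})

Q(\theta, \theta^{\text{old}}) =
  \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \, \ln p(X, Z \mid \theta)
```

Here p(Z | X, θ^old) is the quantity evaluated in the E step, and ln p(X, Z | θ) is the log likelihood of the observed and hidden variables together.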
•  K-means can be formulated as EM
•  EM for Gaussian Mixtures
•  EM for Bernoulli Mixtures
•  EM for Bayesian Linear Regression
•  “Expectation”: Calculate the fixed, data-dependent parameters of the function Q.
•  “Maximization”: Once the parameters of Q are known, it is fully determined, so now we can maximize Q.
•  We learned how to cluster data in an unsupervised manner
•  Gaussian Mixture Models are useful for modeling data with “soft” cluster assignments
•  Expectation Maximization is a method used when we have a model with latent variables (values we don’t know, but estimate with each step)
•  My question: What other applications could use EM? How about EM of GMMs?
