EM algorithm and its application in Probabilistic Latent Semantic Analysis (pLSA)

Duc-Hieu Tran
tdh.net [at] gmail.com
Nanyang Technological University

July 27, 2010






  Outline

   The parameter estimation problem

   EM algorithm

   Probabilistic Latent Semantic Analysis

   Reference






  Introduction

   Given the prior probabilities P(ω_i) and the class-conditional densities p(x|ω_i),
   we obtain the optimal classifier:
           P(ω_j | x) ∝ p(x|ω_j) P(ω_j)
           decide ω_i if P(ω_i | x) > P(ω_j | x), ∀ j ≠ i
   In practice, p(x|ω_i) is unknown and must be estimated from training samples
   (e.g., assume p(x|ω_i) ∼ N(µ_i, Σ_i)).






  Frequentist vs. Bayesian schools

   Frequentist
           parameters – quantities whose values are fixed but unknown.
           the best estimate of their values – the one that maximizes the
           probability of obtaining the observed samples.
   Bayesian
           parameters – random variables having some known prior distribution.
           observation of the samples converts this to a posterior density,
           revising our opinion about the true values of the parameters.






  Examples

           training samples: S = {(x^{(1)}, y^{(1)}), . . . , (x^{(m)}, y^{(m)})}
           frequentist: maximum likelihood

                   \max_{\theta} \prod_i p(y^{(i)} | x^{(i)}; \theta)

           Bayesian: P(\theta) – prior, e.g., P(\theta) ∼ N(0, I)

                   P(\theta | S) \propto \left[ \prod_{i=1}^{m} P(y^{(i)} | x^{(i)}, \theta) \right] P(\theta)

                   \theta_{MAP} = \arg\max_{\theta} P(\theta | S)




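As a concrete illustration (my own example, not from the slides), here is a minimal Python sketch contrasting the two estimates for the mean of a 1-D Gaussian with known variance; the Bayesian side assumes a N(0, 1) prior on θ, so in this conjugate case both estimates have closed forms.

```python
import numpy as np

# Hypothetical data: m samples from a Gaussian with unknown mean theta, known variance sigma2.
rng = np.random.default_rng(0)
sigma2 = 4.0
x = rng.normal(loc=2.5, scale=np.sqrt(sigma2), size=50)

# Frequentist / maximum likelihood: theta_ML is the sample mean.
theta_ml = x.mean()

# Bayesian MAP with prior theta ~ N(0, 1): the posterior is Gaussian, and the
# MAP estimate shrinks the sample mean toward the prior mean 0.
m = len(x)
theta_map = (m / sigma2) / (m / sigma2 + 1.0) * x.mean()

print(f"ML estimate:  {theta_ml:.3f}")
print(f"MAP estimate: {theta_map:.3f}  (shrunk toward the N(0,1) prior mean)")
```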


  An estimation problem

           training set of m independent samples: {x^{(1)}, x^{(2)}, . . . , x^{(m)}}
           goal: fit the parameters of a model p(x, z) to the data
           the log-likelihood:

                   \ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z} p(x^{(i)}, z; \theta)

           explicitly maximizing \ell(\theta) might be difficult.
           z – latent random variable
           if z^{(i)} were observed, then maximum likelihood estimation would be easy.
           strategy: repeatedly construct a lower bound on \ell (E-step) and
           optimize that lower bound (M-step).




  EM algorithm (1)

           digression: Jensen's inequality.
           f – convex function; E[f(X)] ≥ f(E[X])
           for each i, Q_i – a distribution over z: \sum_z Q_i(z) = 1, Q_i(z) ≥ 0

                   \ell(\theta) = \sum_i \log p(x^{(i)}; \theta)
                                = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)
                                = \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}        (1)

           applying Jensen's inequality to the concave function log:

                                \geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}        (2)

       More detail . . .
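A quick numerical check of the concave-log case used above (my own illustration, not part of the slides): for any distribution Q and positive values v, log E_Q[v] ≥ E_Q[log v].

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.dirichlet(np.ones(5))      # a distribution Q(z) over 5 values of z
v = rng.uniform(0.1, 3.0, size=5)  # positive values; p(x, z; theta) / Q(z) plays this role above

lhs = np.log(np.dot(Q, v))         # log of the expectation
rhs = np.dot(Q, np.log(v))         # expectation of the log
assert lhs >= rhs - 1e-12          # Jensen: log E[v] >= E[log v] since log is concave
print(lhs, rhs)
```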


  EM algorithm (2)

           for any set of distributions Q_i, formula (2) gives a lower bound on \ell(\theta)
           how to choose Q_i?
           strategy: make the inequality hold with equality at our particular value of \theta.
           require:

                   \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c

           c – a constant that does not depend on z^{(i)}
           choose: Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta)
           we know \sum_z Q_i(z^{(i)}) = 1, so

                   Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_z p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} | x^{(i)}; \theta)



  EM algorithm (3)

           Q_i – posterior distribution of z^{(i)} given x^{(i)} and the parameters \theta

   EM algorithm: repeat until convergence
           E-step: for each i

                   Q_i(z^{(i)}) := p(z^{(i)} | x^{(i)}; \theta)

           M-step:

                   \theta := \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

   The algorithm will converge, since \ell(\theta^{(t)}) ≤ \ell(\theta^{(t+1)})



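To make the abstract recipe concrete, the following is a minimal sketch (my own example, not part of the slides) of the same E/M alternation for a two-component 1-D Gaussian mixture with fixed unit variances, so the M-step has a simple closed form.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a 2-component 1-D Gaussian mixture with fixed unit variances.
    The latent z(i) is the component of sample i; Q_i(z) is the E-step posterior."""
    rng = np.random.default_rng(0)
    pi = 0.5                       # mixing weight of component 1
    mu = rng.normal(size=2)        # component means (the parameters theta)
    for _ in range(n_iter):
        # E-step: Q_i(z) = p(z | x_i; theta), via Bayes' rule.
        lik = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
        lik *= np.array([pi, 1.0 - pi])
        Q = lik / lik.sum(axis=1, keepdims=True)        # shape (m, 2)
        # M-step: maximize the lower bound; weighted means and mixing weight.
        mu = (Q * x[:, None]).sum(axis=0) / Q.sum(axis=0)
        pi = Q[:, 0].mean()
    return pi, mu

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])
print(em_gmm_1d(x))
```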


  EM algorithm (4)

   Digression: coordinate ascent algorithm.

           \max_{\alpha} W(\alpha_1, . . . , \alpha_m)

           loop until convergence:
              for i ∈ 1, . . . , m:

                   \alpha_i := \arg\max_{\hat{\alpha}_i} W(\alpha_1, . . . , \hat{\alpha}_i, . . . , \alpha_m)

   EM algorithm as coordinate ascent

           J(Q, \theta) = \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

           \ell(\theta) ≥ J(Q, \theta)
           EM algorithm can be viewed as coordinate ascent on J
           E-step: maximize w.r.t. Q
           M-step: maximize w.r.t. \theta
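A minimal sketch of plain coordinate ascent on a toy concave objective (my own illustration, not from the slides); each inner arg max is a 1-D quadratic with a closed form, mirroring how EM alternately maximizes J over Q and over θ.

```python
def W(a1, a2):
    # A concave toy objective; its coordinate-wise maximizers have closed forms.
    return -(a1 - 1.0) ** 2 - (a2 - 3.0) ** 2 - 0.5 * a1 * a2

a1, a2 = 0.0, 0.0
for _ in range(20):                 # loop until (approximate) convergence
    a1 = 1.0 - a2 / 4.0             # arg max over a1 with a2 held fixed
    a2 = 3.0 - a1 / 4.0             # arg max over a2 with a1 held fixed
print(a1, a2, W(a1, a2))
```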


  Probabilistic Latent Semantic Analysis (1)

           set of documents D = {d_1, . . . , d_N}
           set of words W = {w_1, . . . , w_M}
           set of unobserved classes Z = {z_1, . . . , z_K}
           conditional independence assumption:

                   P(d_i, w_j | z_k) = P(d_i | z_k) P(w_j | z_k)        (3)

           so,

                   P(w_j | d_i) = \sum_{k=1}^{K} P(z_k | d_i) P(w_j | z_k)        (4)

                   P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i)

       More detail . . .





  Probabilistic Latent Semantic Analysis (2)

           n(d_i, w_j) – number of occurrences of word w_j in document d_i
           Likelihood:

                   L = \prod_{i=1}^{N} \prod_{j=1}^{M} [P(d_i, w_j)]^{n(d_i, w_j)}
                     = \prod_{i=1}^{N} \prod_{j=1}^{M} \left[ P(d_i) \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i) \right]^{n(d_i, w_j)}

           log-likelihood \ell = \log(L):

                   \ell = \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ n(d_i, w_j) \log P(d_i) + n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i) \right]



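The second term of this log-likelihood is easy to evaluate from a term-document count matrix. A minimal NumPy sketch under assumed variable names (n is the N × M count matrix, Pw_z is M × K holding P(w_j|z_k), Pz_d is N × K holding P(z_k|d_i)); the term involving P(d_i) is omitted since it does not depend on P(w_j|z_k) or P(z_k|d_i):

```python
import numpy as np

def plsa_loglik(n, Pw_z, Pz_d, eps=1e-12):
    """Sum_i sum_j n(d_i, w_j) * log sum_k P(w_j|z_k) P(z_k|d_i),
    i.e. the part of the log-likelihood that depends on the pLSA parameters."""
    Pw_d = Pz_d @ Pw_z.T          # (N, K) @ (K, M) -> (N, M): entry (i, j) = P(w_j | d_i)
    return np.sum(n * np.log(Pw_d + eps))
```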


  Probabilistic Latent Semantic Analysis (3)

           maximize w.r.t. P(w_j | z_k), P(z_k | d_i)
           ≈ maximize

                   \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i)
                   = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} Q_k(z_k) \frac{P(w_j | z_k) P(z_k | d_i)}{Q_k(z_k)}
                   \geq \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} Q_k(z_k) \log \frac{P(w_j | z_k) P(z_k | d_i)}{Q_k(z_k)}

           choose

                   Q_k(z_k) = \frac{P(w_j | z_k) P(z_k | d_i)}{\sum_{l=1}^{K} P(w_j | z_l) P(z_l | d_i)} = P(z_k | d_i, w_j)

       More detail . . .



  Probabilistic Latent Semantic Analysis (4)

           ≈ maximize (w.r.t. P(w_j | z_k), P(z_k | d_i))

                   \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log \frac{P(w_j | z_k) P(z_k | d_i)}{P(z_k | d_i, w_j)}

           ≈ maximize

                   \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log [P(w_j | z_k) P(z_k | d_i)]






  Probabilistic Latent Semantic Analysis (5)

   EM algorithm
           E-step: update

                   P(z_k | d_i, w_j) = \frac{P(w_j | z_k) P(z_k | d_i)}{\sum_{l=1}^{K} P(w_j | z_l) P(z_l | d_i)}

           M-step: maximize w.r.t. P(w_j | z_k), P(z_k | d_i)

                   \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log [P(w_j | z_k) P(z_k | d_i)]

           subject to

                   \sum_{j=1}^{M} P(w_j | z_k) = 1,  k ∈ {1 . . . K}
                   \sum_{k=1}^{K} P(z_k | d_i) = 1,  i ∈ {1 . . . N}


  Probabilistic Latent Semantic Analysis (6)

   Solution of the maximization problem in the M-step:

                   P(w_j | z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k | d_n, w_m)}

                   P(z_k | d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j)}{n(d_i)}

   where n(d_i) = \sum_{j=1}^{M} n(d_i, w_j)

       More detail . . .






  Probabilistic Latent Semantic Analysis (7)

   All together
           E-step:

                   P(z_k | d_i, w_j) = \frac{P(w_j | z_k) P(z_k | d_i)}{\sum_{l=1}^{K} P(w_j | z_l) P(z_l | d_i)}

           M-step:

                   P(w_j | z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k | d_n, w_m)}

                   P(z_k | d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j)}{n(d_i)}




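A compact NumPy sketch of these updates (my own implementation of the formulas above, not reference code from the paper), with assumed shapes: n is the N × M document-term count matrix and K the number of latent classes; convergence checks and smoothing are omitted.

```python
import numpy as np

def plsa_em(n, K, n_iter=100, seed=0):
    """EM for pLSA on an (N documents x M words) count matrix n."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    Pw_z = rng.random((M, K)); Pw_z /= Pw_z.sum(axis=0, keepdims=True)   # P(w_j | z_k)
    Pz_d = rng.random((N, K)); Pz_d /= Pz_d.sum(axis=1, keepdims=True)   # P(z_k | d_i)
    for _ in range(n_iter):
        # E-step: P(z_k | d_i, w_j) for all i, j, k -> shape (N, M, K)
        joint = Pz_d[:, None, :] * Pw_z[None, :, :]        # P(w_j|z_k) P(z_k|d_i)
        Pz_dw = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate P(w_j | z_k) and P(z_k | d_i)
        weighted = n[:, :, None] * Pz_dw                   # n(d_i, w_j) P(z_k | d_i, w_j)
        Pw_z = weighted.sum(axis=0)                        # numerator, shape (M, K)
        Pw_z /= Pw_z.sum(axis=0, keepdims=True) + 1e-12    # normalize over words j
        Pz_d = weighted.sum(axis=1)                        # numerator, shape (N, K)
        Pz_d /= n.sum(axis=1, keepdims=True) + 1e-12       # divide by n(d_i)
    return Pw_z, Pz_d
```

In practice one would monitor the bound or the log-likelihood between iterations (e.g. with the plsa_loglik sketch given earlier) and stop once it no longer improves.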
Reference




   R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Wiley-Interscience, 2001.
   T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, vol. 42, 2001, pp. 177–196.
   Course: "Machine Learning CS229", Andrew Ng, Stanford University.




Appendix

   Generative model for word/document co-occurrence
        select a document d_i with probability (w.p.) P(d_i)
        pick a latent class z_k w.p. P(z_k | d_i)
        generate a word w_j w.p. P(w_j | z_k)

                   P(d_i, w_j) = \sum_{k=1}^{K} P(d_i, w_j | z_k) P(z_k) = \sum_{k=1}^{K} P(w_j | z_k) P(d_i | z_k) P(z_k)
                               = \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i) P(d_i)
                               = P(d_i) \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i)

                   P(d_i, w_j) = P(w_j | d_i) P(d_i)

                   =⇒ P(w_j | d_i) = \sum_{k=1}^{K} P(z_k | d_i) P(w_j | z_k)

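A small sketch of this generative story with assumed toy distributions (my own illustration, not from the slides): sample a document, then a latent class given the document, then a word given the class; only the (d, w) pair is observed.

```python
import numpy as np

rng = np.random.default_rng(0)
P_d = np.array([0.5, 0.3, 0.2])                 # P(d_i), N = 3 documents
P_z_d = np.array([[0.9, 0.1],                   # P(z_k | d_i), K = 2 latent classes
                  [0.5, 0.5],
                  [0.2, 0.8]])
P_w_z = np.array([[0.7, 0.2, 0.1],              # P(w_j | z_k), M = 3 words
                  [0.1, 0.3, 0.6]])

def sample_pair():
    d = rng.choice(3, p=P_d)                    # select a document d_i  w.p. P(d_i)
    z = rng.choice(2, p=P_z_d[d])               # pick a latent class z_k w.p. P(z_k | d_i)
    w = rng.choice(3, p=P_w_z[z])               # generate a word w_j    w.p. P(w_j | z_k)
    return d, w                                 # z stays unobserved

print([sample_pair() for _ in range(5)])
```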




                   P(w_j | d_i) = \sum_{k=1}^{K} P(z_k | d_i) P(w_j | z_k)

           since \sum_{k=1}^{K} P(z_k | d_i) = 1, P(w_j | d_i) is a convex combination of the P(w_j | z_k)
           ≈ each document is modelled as a mixture of topics

                                                                                                     Return








                   P(z_k | d_i, w_j) = \frac{P(d_i, w_j | z_k) P(z_k)}{P(d_i, w_j)}                               (5)
                                     = \frac{P(w_j | z_k) P(d_i | z_k) P(z_k)}{P(d_i, w_j)}                       (6)
                                     = \frac{P(w_j | z_k) P(z_k | d_i)}{P(w_j | d_i)}                             (7)
                                     = \frac{P(w_j | z_k) P(z_k | d_i)}{\sum_{l=1}^{K} P(w_j | z_l) P(z_l | d_i)} (8)

    From (5) to (6) by the conditional independence assumption (3). From (7) to (8) by (4).
                                                                                                     Return








    Lagrange multipliers τ_k, ρ_i:

                   H = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log [P(w_j | z_k) P(z_k | d_i)]
                       + \sum_{k=1}^{K} \tau_k \left( 1 - \sum_{j=1}^{M} P(w_j | z_k) \right) + \sum_{i=1}^{N} \rho_i \left( 1 - \sum_{k=1}^{K} P(z_k | d_i) \right)

                   \frac{\partial H}{\partial P(w_j | z_k)} = \frac{\sum_{i=1}^{N} P(z_k | d_i, w_j) n(d_i, w_j)}{P(w_j | z_k)} - \tau_k = 0

                   \frac{\partial H}{\partial P(z_k | d_i)} = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j)}{P(z_k | d_i)} - \rho_i = 0








    from \sum_{j=1}^{M} P(w_j | z_k) = 1:

                   \tau_k = \sum_{j=1}^{M} \sum_{i=1}^{N} P(z_k | d_i, w_j) n(d_i, w_j)

    from \sum_{k=1}^{K} P(z_k | d_i) = 1 together with \sum_{k=1}^{K} P(z_k | d_i, w_j) = 1:

                   \rho_i = n(d_i)

    =⇒ P(w_j | z_k), P(z_k | d_i)                                                                   Return




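Filling in the final substitution (not spelled out on the slide): solving the stationarity conditions for the parameters and plugging in τ_k and ρ_i recovers the M-step formulas of slide (6).

                   P(w_j | z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j)}{\tau_k}
                                = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k | d_n, w_m)}

                   P(z_k | d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j)}{\rho_i}
                                = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j)}{n(d_i)}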


  Applying Jensen's inequality

           f(x) = log(x), a concave function:

                   f \left( E_{z^{(i)} \sim Q_i} \left[ \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right] \right) \geq E_{z^{(i)} \sim Q_i} \left[ f \left( \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right) \right]

                                                                                                       Return



