26. Expectation-Maximization
Consider learning a naïve Bayes classifier from unlabeled data: the class C is never observed, so how can we estimate, e.g., P(A|C)?
Initialization: randomly assign numbers to P(C), P(A|C), P(B|C)
repeat {
    E-step: compute P(C|A,B) for each observed (A,B) combination
    M-step: re-compute the maximum likelihood estimates of
            P(C), P(A|C), P(B|C) from the expected counts
    Calculate the log likelihood of the data
} until (likelihood of data not improving)
[Figure: naïve Bayes network, class node C with children A and B]
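A minimal runnable sketch of this loop in Python, assuming the 154-example dataset from slides 29-30 (counts 100, 5, 9, 40 for the four (A,B) combinations); names such as joint() and the convergence tolerance are illustrative, not from the slides:

import math
import random

# Counts of the four observed feature combinations, matching slides 29-30:
# N(a,b) = 100, N(a,~b) = 5, N(~a,b) = 9, N(~a,~b) = 40  (154 examples total)
counts = {(True, True): 100, (True, False): 5,
          (False, True): 9, (False, False): 40}
n_all = sum(counts.values())  # 154

# Initialization: randomly assign numbers to P(C), P(A|C), P(B|C)
p_c = random.uniform(0.1, 0.9)
p_a = {True: random.uniform(0.1, 0.9), False: random.uniform(0.1, 0.9)}  # P(a|c), P(a|~c)
p_b = {True: random.uniform(0.1, 0.9), False: random.uniform(0.1, 0.9)}  # P(b|c), P(b|~c)

def joint(a, b, c):
    # P(A=a, B=b, C=c) = P(c) P(a|c) P(b|c) under the naive Bayes factorization
    pc = p_c if c else 1.0 - p_c
    pa = p_a[c] if a else 1.0 - p_a[c]
    pb = p_b[c] if b else 1.0 - p_b[c]
    return pc * pa * pb

def log_likelihood():
    # Sum out the hidden class: P(a,b) = P(a,b,c) + P(a,b,~c)
    return sum(n * math.log(joint(a, b, True) + joint(a, b, False))
               for (a, b), n in counts.items())

prev = -math.inf
while True:
    # E-step: responsibilities P(c | a, b) for each observed combination
    # (these are the w, x, y, z of slide 29)
    resp = {ab: joint(*ab, True) / (joint(*ab, True) + joint(*ab, False))
            for ab in counts}

    # M-step: maximum likelihood re-estimates from expected counts
    n_c = sum(n * resp[ab] for ab, n in counts.items())   # N(c)
    p_c = n_c / n_all                                     # P(c) = N(c) / N(all)
    p_a = {True:  sum(n * resp[ab] for ab, n in counts.items() if ab[0]) / n_c,
           False: sum(n * (1 - resp[ab]) for ab, n in counts.items() if ab[0]) / (n_all - n_c)}
    p_b = {True:  sum(n * resp[ab] for ab, n in counts.items() if ab[1]) / n_c,
           False: sum(n * (1 - resp[ab]) for ab, n in counts.items() if ab[1]) / (n_all - n_c)}

    # Calculate log likelihood of data; stop when it is no longer improving
    ll = log_likelihood()
    if ll - prev < 1e-9:
        break
    prev = ll

print(f"P(c) = {p_c:.3f}, P(a|c) = {p_a[True]:.3f}, P(b|c) = {p_b[True]:.3f}, logL = {ll:.3f}")

Because EM only guarantees a local optimum, different random initializations can converge to different parameter settings with different final log likelihoods.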
29. Expectation-Maximization
M-step:
Re-compute the maximum likelihood estimates of P(C), P(A|C), P(B|C).
Let w = P(c|a,b), x = P(c|a,¬b), y = P(c|¬a,b), z = P(c|¬a,¬b) be the responsibilities from the E-step, applied to the observed counts N(a,b) = 100, N(a,¬b) = 5, N(¬a,b) = 9, N(¬a,¬b) = 40:

    P(c) = N(c) / N(all) = (100w + 5x + 9y + 40z) / 154

    P(¬a|c) = N(¬a,c) / N(c) = (9y + 40z) / (100w + 5x + 9y + 40z)

    P(¬a|¬c) = N(¬a,¬c) / N(¬c)
             = (9(1−y) + 40(1−z)) / (100(1−w) + 5(1−x) + 9(1−y) + 40(1−z))

The remaining parameters are estimated analogously.
30. Expectation-Maximization
Calculate the log likelihood of the data:

    log P(d1, d2, ..., d154) = log[ P(d1) P(d2) ... P(d154) ]
                             = log[ P(a,b)^100 P(a,¬b)^5 P(¬a,b)^9 P(¬a,¬b)^40 ]
                             = 100 log P(a,b) + 5 log P(a,¬b) + 9 log P(¬a,b) + 40 log P(¬a,¬b)
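Each joint term here still involves the hidden class, so it is computed by summing C out; writing out the step the slide leaves implicit:

    P(a,b) = P(a,b,c) + P(a,b,¬c) = P(c) P(a|c) P(b|c) + P(¬c) P(a|¬c) P(b|¬c)

and likewise for the other three combinations. These are exactly the values the joint() helper in the sketch above sums over.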