A derivation of the sampling formulas for An Entity-Topic Model for Entity Linking [Han+ EMNLP-CoNLL12] and A Context-Aware Topic Model for Statistical Machine Translation [Su+ ACL15]
Tomonari MASADA @ Nagasaki University
September 17, 2015
The full joint distribution is obtained as follows.
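\begin{align*}
& p(m, w, z, e, a, \theta, \phi, \psi, \xi \mid \alpha, \beta, \gamma, \iota) \\
&= \prod_{d=1}^{D} \Big[ p(m_d \mid e_d, \psi)\, p(e_d \mid z_d, \phi)\, p(z_d \mid \theta_d)\, p(w_d \mid a_d, \xi)\, p(a_d \mid e_d) \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{d=1}^{D} \Big[ \Big\{ \prod_{i=1}^{M_d} p(m_{di} \mid \psi_{e_{di}})\, p(e_{di} \mid \phi_{z_{di}})\, p(z_{di} \mid \theta_d) \Big\} \Big\{ \prod_{n=1}^{N_d} p(w_{dn} \mid \xi_{a_{dn}})\, p(a_{dn} \mid e_d) \Big\} \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{d=1}^{D} \Big[ \Big\{ \prod_{i=1}^{M_d} \prod_{k=1}^{K} \prod_{t=1}^{T} \big( \psi_{t,m_{di}}\, \phi_{k,t}\, \theta_{d,k} \big)^{\Delta(z_{di}=k \wedge e_{di}=t)} \Big\} \Big\{ \prod_{n=1}^{N_d} \prod_{t=1}^{T} \Big( \xi_{t,w_{dn}} \frac{\sum_{i=1}^{M_d} \Delta(e_{di}=t)}{M_d} \Big)^{\Delta(a_{dn}=t)} \Big\} \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{d=1}^{D} \Big[ \Big\{ \prod_{i=1}^{M_d} \prod_{t=1}^{T} \psi_{t,m_{di}}^{\Delta(e_{di}=t)} \Big\} \Big\{ \prod_{i=1}^{M_d} \prod_{k=1}^{K} \prod_{t=1}^{T} \phi_{k,t}^{\Delta(z_{di}=k \wedge e_{di}=t)} \Big\} \Big\{ \prod_{i=1}^{M_d} \prod_{k=1}^{K} \theta_{d,k}^{\Delta(z_{di}=k)} \Big\} \Big\{ \prod_{n=1}^{N_d} \prod_{t=1}^{T} \xi_{t,w_{dn}}^{\Delta(a_{dn}=t)} \Big\} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \Big[ \prod_{n=1}^{N_d} \prod_{t=1}^{T} \Big( \frac{\sum_{i=1}^{M_d} \Delta(e_{di}=t)}{M_d} \Big)^{\Delta(a_{dn}=t)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{u=1}^{U} \prod_{t=1}^{T} \psi_{t,u}^{C_{t,u}} \cdot \prod_{k=1}^{K} \prod_{t=1}^{T} \phi_{k,t}^{C_{k,t}} \cdot \prod_{d=1}^{D} \prod_{k=1}^{K} \theta_{d,k}^{C_{d,k}} \cdot \prod_{t=1}^{T} \prod_{v=1}^{V} \xi_{t,v}^{C_{t,v}} \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M_{d,t}}{M_d} \Big)^{N_{d,t}} \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) , \tag{1}
\end{align*}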
where ∆(·) is 1 if the proposition in the parentheses is true and is 0 otherwise.
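$N_{d,t}$ and $M_{d,t}$ are defined as follows: $N_{d,t} \equiv \sum_{n=1}^{N_d} \Delta(a_{dn} = t)$; $M_{d,t} \equiv \sum_{i=1}^{M_d} \Delta(e_{di} = t)$.
The $C$s are defined as follows: $C_{t,u} \equiv \sum_{d=1}^{D} \sum_{i=1}^{M_d} \Delta(e_{di} = t \wedge m_{di} = u)$; $C_{k,t} \equiv \sum_{d=1}^{D} \sum_{i=1}^{M_d} \Delta(z_{di} = k \wedge e_{di} = t)$; $C_{d,k} \equiv \sum_{i=1}^{M_d} \Delta(z_{di} = k)$; $C_{t,v} \equiv \sum_{d=1}^{D} \sum_{n=1}^{N_d} \Delta(a_{dn} = t \wedge w_{dn} = v)$.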
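The count statistics defined above can be accumulated in a single pass over the assignments. The following is a minimal sketch, assuming the corpus is stored as nested Python lists indexed by document; the function name `count_statistics`, the data layout, and the toy corpus are illustrative assumptions and do not come from the original papers.

```python
from collections import Counter

def count_statistics(m, e, z, w, a):
    """Accumulate the counts used in Eq. (1).

    m[d][i], e[d][i], z[d][i]: mention type, entity, and topic of mention i in document d.
    w[d][n], a[d][n]: word type and entity assignment of word n in document d.
    """
    C_tu, C_kt, C_dk, C_tv = Counter(), Counter(), Counter(), Counter()
    M_dt, N_dt = Counter(), Counter()
    for d in range(len(m)):
        for u, t, k in zip(m[d], e[d], z[d]):
            C_tu[t, u] += 1   # entity t emitted mention type u
            C_kt[k, t] += 1   # topic k emitted entity t
            C_dk[d, k] += 1   # document d used topic k
            M_dt[d, t] += 1   # mentions of document d assigned to entity t
        for v, t in zip(w[d], a[d]):
            C_tv[t, v] += 1   # entity t emitted word type v
            N_dt[d, t] += 1   # words of document d assigned to entity t
    return C_tu, C_kt, C_dk, C_tv, M_dt, N_dt

# Toy corpus with two documents (illustrative values only).
m = [[0, 1], [1]]
e = [[0, 1], [1]]
z = [[0, 0], [1]]
w = [[2, 2, 3], [3]]
a = [[0, 1, 1], [1]]
C_tu, C_kt, C_dk, C_tv, M_dt, N_dt = count_statistics(m, e, z, w, a)
```

Sparse `Counter` objects keyed by index pairs keep the sketch short; a real sampler would more likely use dense arrays and update them incrementally.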
We marginalize out the multinomial parameters $\theta$, $\phi$, $\psi$, and $\xi$.
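\begin{align*}
& p(m, w, z, e, a \mid \alpha, \beta, \gamma, \iota) = \int p(m, w, z, e, a, \theta, \phi, \psi, \xi \mid \alpha, \beta, \gamma, \iota)\, d\theta\, d\phi\, d\psi\, d\xi \\
&= \prod_{t=1}^{T} \frac{\prod_u \Gamma(C_{t,u} + \gamma_u)}{\Gamma(C_t + \sum_u \gamma_u)} \frac{\Gamma(\sum_u \gamma_u)}{\prod_u \Gamma(\gamma_u)}
 \cdot \prod_{k=1}^{K} \frac{\prod_t \Gamma(C_{k,t} + \beta_t)}{\Gamma(C_k + \sum_t \beta_t)} \frac{\Gamma(\sum_t \beta_t)}{\prod_t \Gamma(\beta_t)} \\
&\quad \cdot \prod_{d=1}^{D} \frac{\prod_k \Gamma(C_{d,k} + \alpha_k)}{\Gamma(M_d + \sum_k \alpha_k)} \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)}
 \cdot \prod_{t=1}^{T} \frac{\prod_v \Gamma(C_{t,v} + \iota_v)}{\Gamma(C_t + \sum_v \iota_v)} \frac{\Gamma(\sum_v \iota_v)}{\prod_v \Gamma(\iota_v)} \\
&\quad \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M_{d,t}}{M_d} \Big)^{N_{d,t}} \tag{2}
\end{align*}
Here $C_t$ in the $\psi$ factor denotes $\sum_u C_{t,u}$, $C_t$ in the $\xi$ factor denotes $\sum_v C_{t,v}$, and $C_k \equiv \sum_t C_{k,t}$. The $\theta$ factor uses $\sum_k C_{d,k} = M_d$. Each Dirichlet-multinomial factor carries a single outer product (over $t$, $k$, $d$, and $t$ respectively); the products over $u$, $t$, $k$, and $v$ appear only inside the fractions.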
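Equation (2) is a product of Dirichlet-multinomial terms and is usually evaluated in log space, for example to monitor convergence. The sketch below assumes dense nested lists for the counts (for instance `C_tu[t][u]`) and strictly positive hyperparameters; the function names and argument layout are illustrative assumptions, not part of the source derivation.

```python
from math import lgamma, log

def log_dirichlet_multinomial(counts, prior):
    """log of [prod_j G(c_j + p_j) / G(sum c + sum p)] * [G(sum p) / prod_j G(p_j)], G = Gamma."""
    value = lgamma(sum(prior)) - lgamma(sum(counts) + sum(prior))
    for c, p in zip(counts, prior):
        value += lgamma(c + p) - lgamma(p)
    return value

def log_marginal(C_tu, C_kt, C_dk, C_tv, M_dt, N_dt, alpha, beta, gamma, iota):
    """Log of Eq. (2); each count argument is a list of rows, one row per outer index."""
    total = 0.0
    # One Dirichlet-multinomial term per row: psi (gamma), phi (beta), theta (alpha), xi (iota).
    for rows, prior in ((C_tu, gamma), (C_kt, beta), (C_dk, alpha), (C_tv, iota)):
        total += sum(log_dirichlet_multinomial(row, prior) for row in rows)
    # The (M_dt / M_d)^{N_dt} factor that links word assignments to mention entities.
    for M_row, N_row in zip(M_dt, N_dt):
        M_d = sum(M_row)
        for M, N in zip(M_row, N_row):
            if N > 0:
                if M == 0:
                    return float("-inf")  # a word points at an entity with no mention
                total += N * log(M / M_d)
    return total
```

Using `math.lgamma` avoids the overflow that direct evaluation of the Gamma function would cause for realistic counts.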
We remove the ith mention in the dth document.
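\begin{align*}
& p(m^{-di}, w, z^{-di}, e^{-di}, a \mid \alpha, \beta, \gamma, \iota) \\
&= \prod_{t=1}^{T} \frac{\prod_u \Gamma(C^{-di}_{t,u} + \gamma_u)}{\Gamma(C^{-di}_t + \sum_u \gamma_u)} \frac{\Gamma(\sum_u \gamma_u)}{\prod_u \Gamma(\gamma_u)}
 \cdot \prod_{k=1}^{K} \frac{\prod_t \Gamma(C^{-di}_{k,t} + \beta_t)}{\Gamma(C^{-di}_k + \sum_t \beta_t)} \frac{\Gamma(\sum_t \beta_t)}{\prod_t \Gamma(\beta_t)} \\
&\quad \cdot \prod_{d=1}^{D} \frac{\prod_k \Gamma(C^{-di}_{d,k} + \alpha_k)}{\Gamma(M_d - 1 + \sum_k \alpha_k)} \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)}
 \cdot \prod_{t=1}^{T} \frac{\prod_v \Gamma(C_{t,v} + \iota_v)}{\Gamma(C_t + \sum_v \iota_v)} \frac{\Gamma(\sum_v \iota_v)}{\prod_v \Gamma(\iota_v)} \\
&\quad \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M^{-di}_{d,t}}{M_d - 1} \Big)^{N_{d,t}} \tag{3}
\end{align*}
The superscript $-di$ marks a count taken with the $i$th mention of document $d$ excluded. The replacement of $M_d$ by $M_d - 1$ applies only to the document that contains the removed mention; for every other document the factors are the same as in (2).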
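We then add the same mention back with the latent variable values $z_{di} = k$ and $e_{di} = t$.
\begin{align*}
& p(m_{di}, z_{di} = k, e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(m_{di}, z_{di} = k, e_{di} = t, m^{-di}, w, z^{-di}, e^{-di}, a \mid \alpha, \beta, \gamma, \iota)}{p(m^{-di}, w, z^{-di}, e^{-di}, a \mid \alpha, \beta, \gamma, \iota)} \\
&= \frac{\Gamma(C^{-di}_{t,m_{di}} + 1 + \gamma_{m_{di}})}{\Gamma(C^{-di}_t + 1 + \sum_u \gamma_u)} \frac{\Gamma(C^{-di}_t + \sum_u \gamma_u)}{\Gamma(C^{-di}_{t,m_{di}} + \gamma_{m_{di}})}
 \cdot \frac{\Gamma(C^{-di}_{k,t} + 1 + \beta_t)}{\Gamma(C^{-di}_k + 1 + \sum_t \beta_t)} \frac{\Gamma(C^{-di}_k + \sum_t \beta_t)}{\Gamma(C^{-di}_{k,t} + \beta_t)} \\
&\quad \cdot \frac{\Gamma(C^{-di}_{d,k} + 1 + \alpha_k)}{\Gamma(M_d + \sum_k \alpha_k)} \frac{\Gamma(M_d - 1 + \sum_k \alpha_k)}{\Gamma(C^{-di}_{d,k} + \alpha_k)}
 \cdot \Big( \frac{M_d - 1}{M_d} \Big)^{N_d} \Big( \frac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \\
&= \frac{C^{-di}_{t,m_{di}} + \gamma_{m_{di}}}{C^{-di}_t + \sum_u \gamma_u}
 \cdot \frac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_t \beta_t}
 \cdot \frac{C^{-di}_{d,k} + \alpha_k}{M_d - 1 + \sum_k \alpha_k}
 \cdot \Big( \frac{M_d - 1}{M_d} \Big)^{N_d} \Big( \frac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \tag{4}
\end{align*}
The last equality uses $\Gamma(x + 1) = x\,\Gamma(x)$, which is why the $\theta$ denominator is $M_d - 1 + \sum_k \alpha_k$. The final factor is the ratio of the $(M_{d,t'}/M_d)^{N_{d,t'}}$ terms before and after the mention is added: for $t' \neq t$ the count $M_{d,t'}$ is unchanged and only the denominator moves from $M_d - 1$ to $M_d$, so collecting all $t'$ gives $((M_d - 1)/M_d)^{N_d}$ with $N_d = \sum_{t'} N_{d,t'}$, a factor that depends on neither $k$ nor $t$.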
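The step from Gamma-function ratios to simple fractions in (4) rests on the identity Γ(x + 1) = x Γ(x). The snippet below is a small numerical check of that identity using only the standard library; the helper name `gamma_ratio` and the sample values are illustrative assumptions.

```python
from math import exp, lgamma

def gamma_ratio(count, prior):
    """Return Gamma(count + 1 + prior) / Gamma(count + prior), computed in log space."""
    return exp(lgamma(count + 1 + prior) - lgamma(count + prior))

# Each ratio should equal count + prior, the numerator of the matching factor in (4).
for count, prior in [(0, 0.1), (3, 0.5), (120, 0.01)]:
    assert abs(gamma_ratio(count, prior) - (count + prior)) < 1e-9 * (count + prior)
```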
Therefore, zdi can be updated based on the following probabilities:
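\begin{align*}
& p(z_{di} = k \mid m, w, z^{-di}, e, a, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(m_{di}, z_{di} = k, e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)}{\sum_{k'=1}^{K} p(m_{di}, z_{di} = k', e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)} \\
&\propto \frac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_t \beta_t} \cdot \frac{C^{-di}_{d,k} + \alpha_k}{M_d - 1 + \sum_k \alpha_k} \tag{5}
\end{align*}
Here $t$ is the current value of $e_{di}$. Substituting (4) into the numerator and the denominator, every factor that does not depend on $k$ (the $\psi$ term and the $M_{d,t}$ term) cancels. The remaining denominator $M_d - 1 + \sum_k \alpha_k$ is also constant in $k$ and can be dropped in practice.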
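The update in (5) can be sketched as a weight computation followed by a categorical draw. This is a minimal illustration, assuming the counts passed in already exclude mention (d, i) and that the entity t = e_di is held fixed; the function names and argument layout are hypothetical, and the constant denominator of the θ term is omitted because it does not depend on k.

```python
import random

def topic_weights(C_kt_col, C_k, C_dk_row, alpha, beta_t, beta_sum):
    """Unnormalized p(z_di = k | ...) following Eq. (5).

    C_kt_col[k]: count of entity t under topic k, excluding mention (d, i).
    C_k[k]:      total count under topic k, excluding mention (d, i).
    C_dk_row[k]: count of topic k in document d, excluding mention (d, i).
    """
    return [
        (C_kt_col[k] + beta_t) / (C_k[k] + beta_sum) * (C_dk_row[k] + alpha[k])
        for k in range(len(alpha))
    ]

def sample_index(weights, rng=random):
    """Draw an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Illustrative call with K = 2 topics.
weights = topic_weights([3, 0], [4, 1], [1, 0], [0.5, 0.5], beta_t=0.1, beta_sum=0.2)
new_topic = sample_index(weights, random.Random(0))
```

`random.choices` normalizes the weights internally, so the proportionality in (5) is all that is needed.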
Further, edi can be updated based on the following probabilities:
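\begin{align*}
& p(e_{di} = t \mid m, w, z, e^{-di}, a, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(m_{di}, z_{di} = k, e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)}{\sum_{t'=1}^{T} p(m_{di}, z_{di} = k, e_{di} = t' \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)} \\
&\propto \frac{C^{-di}_{t,m_{di}} + \gamma_{m_{di}}}{C^{-di}_t + \sum_u \gamma_u}
 \cdot \frac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_t \beta_t}
 \cdot \Big( \frac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \tag{6}
\end{align*}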
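A direct implementation of (6) divides by the count of mentions assigned to entity t in document d with the current mention removed, which can be zero. The sketch below sidesteps this by evaluating the equivalent unnormalized form in log space: the last factor of (6) is replaced by the product over all entities s of (M_ds + [s = t]) raised to N_ds, which differs from it only by a factor constant in t whenever the ratio is defined. The function name, the dense list layout, and this rewriting are assumptions made for illustration. It also assumes positive hyperparameters and that at least one entity has nonzero probability.

```python
from math import exp, log

def entity_probabilities(u, k, C_tu, C_t, C_kt, beta, gamma_u, gamma_sum, M_d, N_d):
    """Normalized p(e_di = t | ...) following Eq. (6); counts exclude mention (d, i).

    u, k: mention type m_di and current topic z_di.
    C_tu[t][u], C_t[t]: mention counts per entity; C_kt[k][t]: entity counts per topic.
    M_d[t], N_d[t]: mentions and words of document d assigned to entity t.
    """
    T = len(C_t)
    log_w = []
    for t in range(T):
        lw = log((C_tu[t][u] + gamma_u) / (C_t[t] + gamma_sum))
        lw += log(C_kt[k][t] + beta[t])  # denominator of this factor is constant in t
        for s in range(T):
            if N_d[s] > 0:
                M_new = M_d[s] + (1 if s == t else 0)
                # A word assigned to an entity with no mention has probability zero.
                lw += N_d[s] * log(M_new) if M_new > 0 else float("-inf")
        log_w.append(lw)
    top = max(log_w)  # subtract the maximum before exponentiating for stability
    w = [exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

# Illustrative call with T = 2 entities, one mention type, one topic.
probs = entity_probabilities(0, 0, [[1], [0]], [1, 0], [[1, 0]],
                             [0.5, 0.5], 0.5, 0.5, [1, 0], [2, 0])
```

The inner loop makes this sketch quadratic in the number of entities; an efficient sampler would restrict the loop to entities that actually occur in the document.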
We remove the nth word token in the dth document.
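\begin{align*}
& p(m, w^{-dn}, z, e, a^{-dn} \mid \alpha, \beta, \gamma, \iota) \\
&= \prod_{t=1}^{T} \frac{\prod_u \Gamma(C_{t,u} + \gamma_u)}{\Gamma(C_t + \sum_u \gamma_u)} \frac{\Gamma(\sum_u \gamma_u)}{\prod_u \Gamma(\gamma_u)}
 \cdot \prod_{k=1}^{K} \frac{\prod_t \Gamma(C_{k,t} + \beta_t)}{\Gamma(C_k + \sum_t \beta_t)} \frac{\Gamma(\sum_t \beta_t)}{\prod_t \Gamma(\beta_t)} \\
&\quad \cdot \prod_{d=1}^{D} \frac{\prod_k \Gamma(C_{d,k} + \alpha_k)}{\Gamma(M_d + \sum_k \alpha_k)} \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)}
 \cdot \prod_{t=1}^{T} \frac{\prod_v \Gamma(C^{-dn}_{t,v} + \iota_v)}{\Gamma(C^{-dn}_t + \sum_v \iota_v)} \frac{\Gamma(\sum_v \iota_v)}{\prod_v \Gamma(\iota_v)} \\
&\quad \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M_{d,t}}{M_d} \Big)^{N^{-dn}_{d,t}} \tag{7}
\end{align*}
The superscript $-dn$ marks a count taken with the $n$th word token of document $d$ excluded, and $C^{-dn}_t \equiv \sum_v C^{-dn}_{t,v}$. Each Dirichlet-multinomial factor carries a single outer product; the products over $u$, $t$, $k$, and $v$ appear only inside the fractions.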
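We then add the same word token back with the latent variable value $a_{dn} = t$.
\begin{align*}
& p(w_{dn}, a_{dn} = t \mid m, w^{-dn}, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(w_{dn}, a_{dn} = t, m, w^{-dn}, z, e, a^{-dn} \mid \alpha, \beta, \gamma, \iota)}{p(m, w^{-dn}, z, e, a^{-dn} \mid \alpha, \beta, \gamma, \iota)} \\
&= \frac{\Gamma(C^{-dn}_{t,w_{dn}} + 1 + \iota_{w_{dn}})}{\Gamma(C^{-dn}_t + 1 + \sum_v \iota_v)} \frac{\Gamma(C^{-dn}_t + \sum_v \iota_v)}{\Gamma(C^{-dn}_{t,w_{dn}} + \iota_{w_{dn}})}
 \cdot \Big( \frac{M_{d,t}}{M_d} \Big)^{N^{-dn}_{d,t} + 1} \Big( \frac{M_d}{M_{d,t}} \Big)^{N^{-dn}_{d,t}} \\
&= \frac{C^{-dn}_{t,w_{dn}} + \iota_{w_{dn}}}{C^{-dn}_t + \sum_v \iota_v} \cdot \Big( \frac{M_{d,t}}{M_d} \Big) \tag{8}
\end{align*}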
Therefore, adn can be updated based on the following probabilities:
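\begin{align*}
& p(a_{dn} = t \mid m, w, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(w_{dn}, a_{dn} = t \mid m, w^{-dn}, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota)}{\sum_{t'=1}^{T} p(w_{dn}, a_{dn} = t' \mid m, w^{-dn}, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota)} \\
&\propto \frac{C^{-dn}_{t,w_{dn}} + \iota_{w_{dn}}}{C^{-dn}_t + \sum_v \iota_v} \cdot \Big( \frac{M_{d,t}}{M_d} \Big) \tag{9}
\end{align*}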
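The update in (9) needs only the word-entity counts and the per-document mention counts. The following is a minimal sketch, assuming the word counts passed in already exclude token (d, n); the function name and the dense list layout are hypothetical, and the division by the number of mentions in the document is omitted because it is constant in t.

```python
import random

def word_entity_weights(v, C_tv, C_t, iota_v, iota_sum, M_d):
    """Unnormalized p(a_dn = t | ...) following Eq. (9).

    v: word type w_dn.
    C_tv[t][v], C_t[t]: word counts per entity, excluding token (d, n).
    M_d[t]: number of mentions of document d assigned to entity t.
    """
    return [
        (C_tv[t][v] + iota_v) / (C_t[t] + iota_sum) * M_d[t]
        for t in range(len(M_d))
    ]

# Illustrative call with T = 3 entities; the third has no mention in the document.
weights = word_entity_weights(0, [[2, 1], [0, 0], [5, 5]], [3, 0, 10], 0.1, 0.2, [1, 3, 0])
new_entity = random.Random(0).choices(range(len(weights)), weights=weights, k=1)[0]
```

An entity with no mention in the document receives weight zero, so a word can only be assigned to entities that the document's mentions currently refer to.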