Infinite Latent Process Decomposition
Tomonari Masada
Department of Computer and Information Sciences
Nagasaki University
1-14 Bunkyo-machi, Nagasaki-shi, Nagasaki, Japan
masada@cis.nagasaki-u.ac.jp
October 7, 2010
Abstract
This paper proposes a new Bayesian probabilistic model targeting microarray data. We extend latent
process decomposition (LPD) [3] so that we can assume that there are infinite latent processes. We call
the proposed model infinite latent process decomposition (iLPD). Further, we provide a collapsed variational
Bayesian (CVB) inference for iLPD. Our CVB improves the CVB proposed for LPD in [8] with respect to the
following two aspects. First, our CVB realizes a more efficient inference by treating full posterior distributions
over the hyperparameters of Dirichlet variables based on the discussions in [6]. Second, we correct the weakness
of CVB in [8], which makes the evaluation of the variational lower bound of the log evidence dependent on
the ordering of genes. This dependency is removed by applying the second order approximation proposed in
These two aspects are independent of the assumption of infinite latent processes. Therefore, our CVB can
also be applied to the original LPD. The experiment comparing iLPD with LPD by using the proposed CVB,
and also with LDA by using the CVB in [8], is in progress. This paper mainly presents the details of the
model construction of iLPD.
1 Introduction
This paper proposes an extension of latent process decomposition (LPD) [3]. In this new version of LPD, we
can assume that there are infinite latent processes. We denote the model as iLPD, which is an abbreviation
of infinite latent process decomposition. Further, we provide a set of update formulas of collapsed variational
Bayesian (CVB) inference for iLPD. This paper includes the full details of iLPD and its CVB inference.
The rest of the paper is organized as follows. Section 2 reviews previous work related to LPD and also
to the assumption of infinite topics in the field of text mining. In Section 3, iLPD is described in full detail.
Section 4 provides all update formulas required for implementing CVB for iLPD. Section 5 shows how to obtain
the lower bound of the log evidence, which is required when we evaluate the efficiency of iLPD and compare iLPD
with LPD. Section 6 concludes the paper with planned future work.
2 Previous Works
By regarding samples as documents, genes as words, and latent processes as latent topics, we can grasp LPD
[3] as a “bioinformatics variant” of latent Dirichlet allocation (LDA) [1]. Therefore, we can say that our iLPD
extends LPD just as hierarchical Dirichlet process (HDP) [5] extends LDA by assuming that there are infinite
latent processes. Further, the CVB for LDA originally proposed in [7] is greatly improved by the CVB proposed
in [6], because we can treat full posterior distributions over the hyperparameters of Dirichlet variables based on
the discussions in [6]. The improved CVB can be applied to both LDA and HDP. In a similar manner, we provide
an improved CVB applicable to both LPD and iLPD. Our CVB inference is more efficient than the CVB for LPD
proposed in [8] with respect to the following two aspects:
1. We introduce auxiliary variables by following the approach proposed in [6]. This approach treats full
posterior distributions over the hyperparameters of Dirichlet variables and is independent of the assumption
of infinite latent processes. Therefore, our CVB is also applicable to LPD after a small modification.
2. We use a more natural approximation technique in computing the variational lower bound of the log evidence
than [8]. The approximation proposed in [8] is not technically natural, because the lower bound computation
depends on the ordering of genes. Therefore, we use the second order approximation technique proposed in
[6] and remove the dependence on the ordering of genes.
LPD models gene expression data in a completely different way from how LDA models word frequencies. The expression data
are continuous and are modeled by Gaussian distributions, whereas the word frequencies are discrete and are modeled by
multinomial distributions in LDA and HDP. Therefore, while our proposal relies heavily on the discussions in
[6], it is not a trivial task to obtain iLPD from LPD, nor to obtain a CVB for iLPD from the CVB for LPD.
3 Infinite Latent Process Decomposition (iLPD)
3.1 Generative description of iLPD
In this paper, we identify various types of entities appearing in our probabilistic model with their indices as below:
• {1, . . . , D}: the set of samples,
• {1, . . . , G}: the set of genes, and
• {1, . . . , K}: the set of latent processes.
We give a generative description of iLPD below. Note that, by regarding samples as documents, genes as words,
and latent processes as latent topics, we can grasp iLPD as a “bioinformatics variant” of HDP [5]. Therefore, the
description below can be understood in parallel with the description of HDP.
• For each sample d, the parameter θd = (θd1, . . . , θdK) of the multinomial distribution Multi(θd), defined
over latent processes {1, . . . , K}, is drawn from the Dirichlet process DP(α, π).
– We use the stick-breaking construction [4] for the center measure π of DP(α, π). We denote the param-
eter of the single parameter Beta distribution Beta(1, γ) appearing in the stick-breaking construction
as γ, which is in turn drawn from the Gamma distribution Gamma(aγ, bγ).
– The concentration parameter α of DP(α, π) is drawn from the Gamma distribution Gamma(aα, bα).
• For each pair of gene g and latent process k, a mean parameter µgk and a precision parameter λgk of the
Gaussian distribution Gauss(µgk, λgk) are drawn from the Gaussian prior distribution Gauss(µ0, ρ) and the
Gamma prior distribution Gamma(a0, b0), respectively.
– We assume that the precision parameter ρ of the Gaussian prior Gauss(µ0, ρ) is in turn drawn from the
Gamma distribution Gamma(aρ, bρ).
• For each pair of sample d and gene g, a latent process is drawn from the multinomial distribution Multi(θd).
Let zdg be the latent variable whose value is this drawn process.
• Based on the process zdg drawn from Multi(θd) for the pair of sample d and gene g, a real number is drawn
from the Gaussian distribution Gauss(µgzdg, λgzdg). Let xdg be the observed variable whose value is this
drawn real number. xdg corresponds to the expression level in the microarray data.
By using the two Gamma distributions Gamma(aα, bα) and Gamma(aγ, bγ), we can treat full posterior distribu-
tions over the hyperparameters α and γ of the Dirichlet process DP(α, π). This is a remarkable achievement given
in [6]. Therefore, we apply this technique to our CVB inference. This technique can also be applied to the latent
topic Dirichlet prior of LDA and to the latent process Dirichlet prior of LPD. The details of this application
can be deduced from the discussions on the word Dirichlet prior Dirichlet(β, τ) in [6], because the number
of different words is assumed to be finite in [6] just like the number of latent topics (resp. latent processes) is
assumed to be finite in LDA (resp. LPD).
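For concreteness, the following Python sketch simulates the generative description above with a finite truncation level K of the stick-breaking construction. The truncation level, the default hyperparameter values, and all variable names are our own illustrative assumptions, not part of the model definition.

```python
import numpy as np

def simulate_ilpd(D=20, G=50, K=30, a_alpha=1.0, b_alpha=1.0, a_gamma=1.0, b_gamma=1.0,
                  mu0=0.0, a_rho=1.0, b_rho=1.0, a0=1.0, b0=1.0, seed=0):
    """Draw a synthetic expression matrix x (D x G) from iLPD,
    truncating the stick-breaking construction at K atoms."""
    rng = np.random.default_rng(seed)
    gamma = rng.gamma(a_gamma, 1.0 / b_gamma)          # gamma ~ Gamma(a_gamma, b_gamma)
    alpha = rng.gamma(a_alpha, 1.0 / b_alpha)          # alpha ~ Gamma(a_alpha, b_alpha)
    # Stick-breaking construction of the center measure pi (Eq. (2)).
    pi_tilde = rng.beta(1.0, gamma, size=K)
    pi = pi_tilde * np.concatenate(([1.0], np.cumprod(1.0 - pi_tilde)[:-1]))
    # Per-sample process proportions; a finite Dirichlet stands in for DP(alpha, pi).
    theta = rng.dirichlet(alpha * pi + 1e-12, size=D)
    # Gaussian parameters for each (gene, process) pair.
    rho = rng.gamma(a_rho, 1.0 / b_rho)
    mu = rng.normal(mu0, 1.0 / np.sqrt(rho), size=(G, K))
    lam = rng.gamma(a0, 1.0 / b0, size=(G, K))
    # Latent process assignments z_dg and expression levels x_dg.
    z = np.array([rng.choice(K, size=G, p=theta[d]) for d in range(D)])
    x = rng.normal(mu[np.arange(G), z], 1.0 / np.sqrt(lam[np.arange(G), z]))
    return x, z
```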
3.2 Joint distribution
Based on the generative description in Section 3.1, we can give the full joint distribution of iLPD as follows:
$$
\begin{aligned}
&p(\mathbf{x},\mathbf{z},\boldsymbol{\theta},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)\\
&=\prod_d p(\theta_d|\alpha,\boldsymbol{\pi})\cdot p(\alpha|a_\alpha,b_\alpha)\cdot p(\tilde{\boldsymbol{\pi}}|\gamma)\cdot p(\gamma|a_\gamma,b_\gamma)\cdot\prod_{g,k}p(\mu_{gk}|\mu_0,\rho)\cdot p(\rho|a_\rho,b_\rho)\cdot\prod_{g,k}p(\lambda_{gk}|a_0,b_0)\cdot\prod_d\prod_g p(z_{dg}|\theta_d)\,p(x_{dg}|\mu_{gz_{dg}},\lambda_{gz_{dg}})\\
&=\prod_d\frac{\Gamma(\alpha)}{\prod_k\Gamma(\alpha\pi_k)}\prod_k\theta_{dk}^{\alpha\pi_k-1}
\cdot\frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)}\alpha^{a_\alpha-1}e^{-b_\alpha\alpha}
\cdot\prod_{k=1}^{K}\frac{\Gamma(1+\gamma)}{\Gamma(1)\Gamma(\gamma)}\tilde{\pi}_k^{1-1}(1-\tilde{\pi}_k)^{\gamma-1}
\cdot\frac{b_\gamma^{a_\gamma}}{\Gamma(a_\gamma)}\gamma^{a_\gamma-1}e^{-b_\gamma\gamma}\\
&\quad\cdot\prod_g\prod_k\sqrt{\frac{\rho}{2\pi}}\exp\Big\{-\frac{\rho}{2}(\mu_{gk}-\mu_0)^2\Big\}
\cdot\frac{b_\rho^{a_\rho}}{\Gamma(a_\rho)}\rho^{a_\rho-1}e^{-b_\rho\rho}
\cdot\prod_g\prod_k\frac{b_0^{a_0}}{\Gamma(a_0)}\lambda_{gk}^{a_0-1}e^{-b_0\lambda_{gk}}\\
&\quad\cdot\prod_d\frac{n_d!}{\prod_k n_{dk}!}\prod_k\theta_{dk}^{n_{dk}}
\cdot\prod_d\prod_g\prod_k\Bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\Bigg]^{n_{dgk}},
\end{aligned}
\tag{1}
$$
where ndgk is equal to one when gene g in sample d is assigned to latent process k and is equal to zero otherwise.
Further, we define ndk ≡ ∑g ndgk.
In Eq. (1), p(˜π|γ) denotes the density function of the single parameter Beta distribution Beta(1, γ)
appearing in the stick-breaking construction for π. Between the values ˜πk drawn from Beta(1, γ) and the
parameters πk of the center measure of the Dirichlet process DP(α, π), the following equation holds:
$$
\pi_k=\tilde{\pi}_k\prod_{l=1}^{k-1}(1-\tilde{\pi}_l)\,. \tag{2}
$$
3.3 Introducing auxiliary variables
By marginalizing out the latent process multinomial parameters θd for each sample d, we obtain the following:
$$
\begin{aligned}
&p(\mathbf{x},\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda},\alpha,\rho,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)
=\int p(\mathbf{x},\mathbf{z},\boldsymbol{\theta},\boldsymbol{\mu},\boldsymbol{\lambda},\alpha,\rho,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)\,d\boldsymbol{\theta}\\
&=\prod_d\frac{\Gamma(\alpha)}{\Gamma(n_d+\alpha)}\prod_k\frac{\Gamma(n_{dk}+\alpha\pi_k)}{\Gamma(\alpha\pi_k)}
\cdot\frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)}\alpha^{a_\alpha-1}e^{-b_\alpha\alpha}
\cdot\prod_{k=1}^{K}\frac{\Gamma(1+\gamma)}{\Gamma(1)\Gamma(\gamma)}\tilde{\pi}_k^{1-1}(1-\tilde{\pi}_k)^{\gamma-1}
\cdot\frac{b_\gamma^{a_\gamma}}{\Gamma(a_\gamma)}\gamma^{a_\gamma-1}e^{-b_\gamma\gamma}\\
&\quad\cdot\prod_g\prod_k\sqrt{\frac{\rho}{2\pi}}\exp\Big\{-\frac{\rho}{2}(\mu_{gk}-\mu_0)^2\Big\}
\cdot\frac{b_\rho^{a_\rho}}{\Gamma(a_\rho)}\rho^{a_\rho-1}e^{-b_\rho\rho}
\cdot\prod_g\prod_k\frac{b_0^{a_0}}{\Gamma(a_0)}\lambda_{gk}^{a_0-1}e^{-b_0\lambda_{gk}}
\cdot\prod_d\prod_g\prod_k\Bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\Bigg]^{n_{dgk}}.
\end{aligned}
\tag{3}
$$
Now we introduce the auxiliary variables η and s to obtain efficient variational updates [6] as follows:
$$
p(\mathbf{z},\boldsymbol{\eta},\mathbf{s}|\alpha,\boldsymbol{\pi})=p(\boldsymbol{\eta}|\alpha)\,p(\mathbf{s},\mathbf{z}|\alpha,\boldsymbol{\pi})
=\prod_d\frac{\eta_d^{\alpha-1}(1-\eta_d)^{n_d-1}\prod_k\left[{n_{dk}\atop s_{dk}}\right](\alpha\pi_k)^{s_{dk}}}{\Gamma(n_d)}\,. \tag{4}
$$
Then, the following equation holds by marginalizing out these auxiliary variables:
$$
p(\mathbf{z}|\alpha,\boldsymbol{\pi})=\int\sum_{\mathbf{s}}p(\mathbf{z},\boldsymbol{\eta},\mathbf{s}|\alpha,\boldsymbol{\pi})\,d\boldsymbol{\eta}
=\prod_d\frac{\Gamma(\alpha)}{\Gamma(n_d+\alpha)}\prod_k\frac{\Gamma(n_{dk}+\alpha\pi_k)}{\Gamma(\alpha\pi_k)}\,. \tag{5}
$$
After introducing the auxiliary variables, the distribution in Eq. (3) can be rewritten as follows:
$$
\begin{aligned}
&p(\mathbf{x},\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)\\
&=p(\boldsymbol{\eta}|\alpha)\,p(\mathbf{s},\mathbf{z}|\alpha,\boldsymbol{\pi})\,p(\alpha|a_\alpha,b_\alpha)\,p(\tilde{\boldsymbol{\pi}}|\gamma)\,p(\gamma|a_\gamma,b_\gamma)\,p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,p(\boldsymbol{\lambda}|a_0,b_0)\,p(\boldsymbol{\mu}|\mu_0,\rho)\,p(\rho|a_\rho,b_\rho)\\
&=\prod_d\frac{\eta_d^{\alpha-1}(1-\eta_d)^{n_d-1}\prod_k\left[{n_{dk}\atop s_{dk}}\right](\alpha\pi_k)^{s_{dk}}}{\Gamma(n_d)}
\cdot\frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)}\alpha^{a_\alpha-1}e^{-b_\alpha\alpha}
\cdot\prod_{k=1}^{K}\gamma(1-\tilde{\pi}_k)^{\gamma-1}
\cdot\frac{b_\gamma^{a_\gamma}}{\Gamma(a_\gamma)}\gamma^{a_\gamma-1}e^{-b_\gamma\gamma}\\
&\quad\cdot\prod_g\prod_k\sqrt{\frac{\rho}{2\pi}}\exp\Big\{-\frac{\rho}{2}(\mu_{gk}-\mu_0)^2\Big\}
\cdot\frac{b_\rho^{a_\rho}}{\Gamma(a_\rho)}\rho^{a_\rho-1}e^{-b_\rho\rho}
\cdot\prod_g\prod_k\frac{b_0^{a_0}}{\Gamma(a_0)}\lambda_{gk}^{a_0-1}e^{-b_0\lambda_{gk}}
\cdot\prod_d\prod_g\prod_k\Bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\Bigg]^{n_{dgk}}.
\end{aligned}
\tag{6}
$$
3.4 A lower bound of the log evidence
The marginal likelihood p(x) of the observed data x is often called the evidence. Following standard practice in
variational inference, we introduce a variational posterior distribution q(z, η, s, µ, λ, ρ, α, π, γ) and apply Jensen's
inequality to obtain a lower bound of the log evidence as follows:
$$
\begin{aligned}
&\log p(\mathbf{x}|\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)\\
&=\log\int\sum_{\mathbf{z}}\sum_{\mathbf{s}}p(\mathbf{x},\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)\,d\boldsymbol{\eta}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\,d\rho\,d\alpha\,d\boldsymbol{\pi}\,d\gamma\\
&=\log\int\sum_{\mathbf{z}}\sum_{\mathbf{s}}q(\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma)\,
\frac{p(\mathbf{x},\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)}{q(\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma)}\,d\boldsymbol{\eta}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\,d\rho\,d\alpha\,d\boldsymbol{\pi}\,d\gamma\\
&\geq\int\sum_{\mathbf{z}}\sum_{\mathbf{s}}q(\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma)\,
\log\frac{p(\mathbf{x},\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma\,|\,\mu_0,a_0,b_0,a_\rho,b_\rho,a_\alpha,b_\alpha,a_\gamma,b_\gamma)}{q(\mathbf{z},\boldsymbol{\eta},\mathbf{s},\boldsymbol{\mu},\boldsymbol{\lambda},\rho,\alpha,\boldsymbol{\pi},\gamma)}\,d\boldsymbol{\eta}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\,d\rho\,d\alpha\,d\boldsymbol{\pi}\,d\gamma\,.
\end{aligned}
\tag{7}
$$
We refer to the right-hand side, i.e., the lower bound of the log evidence, as L in the rest of the paper.
3.5 Posterior factorization assumption
We assume that q(z, η, s, µ, λ, ρ, α, π, γ) can be factorized as q(η, s|z)q(z)q(µ)q(λ)q(ρ)q(α)q(π)q(γ). Then, L can be
written as follows:
$$
L=\int\sum_{\mathbf{z}}\sum_{\mathbf{s}}q(\boldsymbol{\eta},\mathbf{s}|\mathbf{z})\,q(\mathbf{z})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\,q(\rho)\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\gamma)\,
\log\frac{p(\boldsymbol{\eta}|\alpha)\,p(\mathbf{s},\mathbf{z}|\alpha,\boldsymbol{\pi})\,p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,p(\alpha|a_\alpha,b_\alpha)\,p(\tilde{\boldsymbol{\pi}}|\gamma)\,p(\boldsymbol{\mu}|\mu_0,\rho)\,p(\boldsymbol{\lambda}|a_0,b_0)\,p(\rho|a_\rho,b_\rho)\,p(\gamma|a_\gamma,b_\gamma)}{q(\boldsymbol{\eta},\mathbf{s}|\mathbf{z})\,q(\mathbf{z})\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\,q(\rho)\,q(\gamma)}\,
d\boldsymbol{\eta}\,d\alpha\,d\boldsymbol{\pi}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\,d\rho\,d\gamma\,.
\tag{8}
$$
By taking a functional derivative of L in Eq. (8) with respect to q(η, s|z), it can be shown that L is maximized
when q(η, s|z) is equal to p(η, s|x, z, α, π, µ, λ). By replacing q(η, s|z) with p(η, s|x, z, α, π, µ, λ) in Eq. (8), we
obtain the following simplified form of L:
$$
L=\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\,q(\rho)\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\gamma)\,
\log\frac{p(\mathbf{x},\mathbf{z}|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})\,p(\alpha|a_\alpha,b_\alpha)\,p(\tilde{\boldsymbol{\pi}}|\gamma)\,p(\boldsymbol{\mu}|\mu_0,\rho)\,p(\boldsymbol{\lambda}|a_0,b_0)\,p(\rho|a_\rho,b_\rho)\,p(\gamma|a_\gamma,b_\gamma)}{q(\mathbf{z})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\,q(\rho)\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\gamma)}\,
d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\,d\rho\,d\alpha\,d\boldsymbol{\pi}\,d\gamma\,. \tag{9}
$$
The lower bound in Eq. (9) will be used to derive the update formula for q(z) in Section 4.5.
In Eq. (9), we set q(η, s|z) to be equal to p(η, s|x, z, α, π, µ, λ) and maximize L. On the other hand, η and
s are decoupled in Eq. (4). Therefore, we further assume that q(η, s|z) is factorized as q(η|z)q(s|z). Then, by
rewriting L in Eq. (8), we obtain the following result:
$$
\begin{aligned}
L&=\int\sum_{\mathbf{z}}q(\boldsymbol{\eta}|\mathbf{z})\,q(\mathbf{z})\,q(\alpha)\log p(\boldsymbol{\eta}|\alpha)\,d\boldsymbol{\eta}\,d\alpha
+\int\sum_{\mathbf{z}}\sum_{\mathbf{s}}q(\mathbf{s}|\mathbf{z})\,q(\mathbf{z})\,q(\alpha)\,q(\boldsymbol{\pi})\log p(\mathbf{s}|\mathbf{z},\alpha,\boldsymbol{\pi})\,d\alpha\,d\boldsymbol{\pi}\\
&\quad+\int q(\alpha)\log p(\alpha|a_\alpha,b_\alpha)\,d\alpha+\int q(\boldsymbol{\pi})\log p(\tilde{\boldsymbol{\pi}}|\gamma)\,d\boldsymbol{\pi}+\int q(\gamma)\log p(\gamma|a_\gamma,b_\gamma)\,d\gamma
+\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\\
&\quad+\int q(\boldsymbol{\mu})\,q(\rho)\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\boldsymbol{\mu}\,d\rho+\int q(\boldsymbol{\lambda})\log p(\boldsymbol{\lambda}|a_0,b_0)\,d\boldsymbol{\lambda}+\int q(\rho)\log p(\rho|a_\rho,b_\rho)\,d\rho\\
&\quad-\int\sum_{\mathbf{z}}q(\boldsymbol{\eta}|\mathbf{z})\,q(\mathbf{z})\log q(\boldsymbol{\eta}|\mathbf{z})\,d\boldsymbol{\eta}-\sum_{\mathbf{z}}\sum_{\mathbf{s}}q(\mathbf{s}|\mathbf{z})\,q(\mathbf{z})\log q(\mathbf{s}|\mathbf{z})-\sum_{\mathbf{z}}q(\mathbf{z})\log q(\mathbf{z})\\
&\quad-\int q(\alpha)\log q(\alpha)\,d\alpha-\int q(\boldsymbol{\pi})\log q(\boldsymbol{\pi})\,d\boldsymbol{\pi}-\int q(\gamma)\log q(\gamma)\,d\gamma
-\int q(\boldsymbol{\mu})\log q(\boldsymbol{\mu})\,d\boldsymbol{\mu}-\int q(\boldsymbol{\lambda})\log q(\boldsymbol{\lambda})\,d\boldsymbol{\lambda}-\int q(\rho)\log q(\rho)\,d\rho\,.
\end{aligned}
\tag{10}
$$
The lower bound in Eq. (10) will be used to derive the update formulas in Section 4.
Finally, we assume that q(z) can be factorized as q(z) = ∏d ∏g q(zdg). Note that ∑K k=1 q(zdg = k) = 1 is
satisfied for every pair of sample d and gene g.
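As an implementation note, the factorized posterior above can be stored in a handful of arrays; the sketch below only fixes one possible layout, with all names being our own: q(z) as a D × G × K array of responsibilities normalized over k, and the Gaussian and Gamma factors through their natural parameters.

```python
import numpy as np

def init_variational_params(D, G, K, seed=0):
    """One possible storage layout for the factorized variational posterior."""
    rng = np.random.default_rng(seed)
    q_z = rng.dirichlet(np.ones(K), size=(D, G))   # q(z_dg = k); sums to 1 over k
    m = np.zeros((G, K))                           # means of the Gaussian factors q(mu_gk)
    r = np.ones((G, K))                            # precisions of q(mu_gk)
    a = np.ones((G, K))                            # shapes of the Gamma factors q(lambda_gk)
    b = np.ones((G, K))                            # rates of q(lambda_gk)
    return q_z, m, r, a, b
```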
4 Posterior Updates
In this section, by taking the functional derivative of the lower bound L with respect to each factor of the
variational posterior q(z, η, s, µ, λ, ρ, α, π, γ) = q(η|z)q(s|z)q(z)q(µ)q(λ)q(ρ)q(α)q(π)q(γ), we obtain the functional form of
each factor.
4.1 Posteriors inheritable from CVB for HDP
For the variational posteriors q(α), q(π), q(γ), q(ηd|zd), and q(sdk|zdk), we can use the results of the CVB for HDP
[6] as they are. Therefore, we only show the resulting functional forms below.
$$
q(\alpha)\propto e^{\alpha(-b_\alpha+\sum_d E[\log\eta_d])}\,\alpha^{a_\alpha+E[s_{\cdot\cdot}]-1} \tag{11}
$$
$$
q(\tilde{\pi}_k)\propto\tilde{\pi}_k^{E[s_{\cdot k}]}(1-\tilde{\pi}_k)^{E[s_{\cdot>k}]+E[\gamma]-1} \tag{12}
$$
$$
q(\gamma)\propto e^{-\gamma(b_\gamma-\sum_k E[\log(1-\tilde{\pi}_k)])}\,\gamma^{a_\gamma+K-1} \tag{13}
$$
$$
q(\eta_d)\propto\eta_d^{E[\alpha]-1}(1-\eta_d)^{n_d-1} \tag{14}
$$
$$
q(s_{dk}|z_{dk})\propto\left[{n_{dk}\atop s_{dk}}\right]e^{s_{dk}E[\log\alpha]}\,e^{s_{dk}E[\log\pi_k]}\,, \tag{15}
$$
where we define $s_{\cdot k}\equiv\sum_d s_{dk}$, $s_{\cdot\cdot}\equiv\sum_d\sum_k s_{dk}$, and $s_{\cdot>k}\equiv\sum_d\sum_{l>k}s_{dl}$. The formulas above include many
expectations E[·] taken with respect to the variational posteriors. Also for these expectations, we can use the
results presented in [6] as they are. For completeness, we include the evaluation formulas of these expectations below.
$$
E[\log\eta_d]=\Psi(E[\alpha])-\Psi(n_d+E[\alpha]) \tag{16}
$$
$$
E[s_{dk}]\approx G[\alpha]G[\pi_k]\Big\{1-\prod_g q(z_{dg}\neq k)\Big\}\cdot\Big\{\Psi\big(E_+[n_{dk}]+G[\alpha]G[\pi_k]\big)-\Psi\big(G[\alpha]G[\pi_k]\big)+\frac{1}{2}V_+[n_{dk}]\Psi''\big(E_+[n_{dk}]+G[\alpha]G[\pi_k]\big)\Big\} \tag{17}
$$
$$
E[\log\alpha]=\Psi(a_\alpha+E[s_{\cdot\cdot}])-\log\Big(b_\alpha-\sum_d E[\log\eta_d]\Big) \tag{18}
$$
$$
E[\log\pi_k]=\Psi(E[s_{\cdot k}]+1)+\sum_{l=1}^{k-1}\Psi(E[s_{\cdot>l}]+E[\gamma])-\sum_{l=1}^{k}\Psi(E[s_{\cdot\geq l}]+E[\gamma]+1) \tag{19}
$$
$$
E[\log(1-\tilde{\pi}_k)]=\Psi(E[s_{\cdot>k}]+E[\gamma])-\Psi(E[s_{\cdot\geq k}]+E[\gamma]+1)\,, \tag{20}
$$
where
$$
E_+[n_{dk}]=\frac{E[n_{dk}]}{1-\prod_g q(z_{dg}\neq k)}\,, \tag{21}
$$
$$
V_+[n_{dk}]=\frac{V[n_{dk}]}{1-\prod_g q(z_{dg}\neq k)}-E_+[n_{dk}]^2\prod_g q(z_{dg}\neq k)\,. \tag{22}
$$
Our CVB inference uses the special mean E+[ndk] and the special variance V+[ndk] for ndk, which are proposed
in [6] for treating the case ndk = 0 exactly. This technique makes our CVB more efficient than the CVB in [8].
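The quantities in Eqs. (21) and (22) can be computed from the responsibilities as sketched below. The sketch assumes, as usual in CVB, that the zdg are treated as independent under q(z), so that E[ndk] and V[ndk] are sums of Bernoulli means and variances; the array names are ours.

```python
import numpy as np

def zero_truncated_moments(q_z):
    """E+[n_dk] and V+[n_dk] of Eqs. (21)-(22) from responsibilities q_z of shape (D, G, K)."""
    mean_n = q_z.sum(axis=1)                              # E[n_dk]
    var_n = (q_z * (1.0 - q_z)).sum(axis=1)               # V[n_dk]
    # prod_g q(z_dg != k), i.e., the probability that process k receives no gene in sample d
    log_p_zero = np.log1p(-np.clip(q_z, 0.0, 1.0 - 1e-12)).sum(axis=1)
    p_zero = np.exp(log_p_zero)
    p_pos = np.maximum(1.0 - p_zero, 1e-12)               # 1 - prod_g q(z_dg != k)
    mean_plus = mean_n / p_pos                             # Eq. (21)
    var_plus = var_n / p_pos - mean_plus ** 2 * p_zero     # Eq. (22)
    return mean_plus, var_plus
```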
4.2 Mean posteriors q(µ)
By taking a functional derivative of L with respect to q(µ), we obtain
$$
\frac{\delta L}{\delta q(\boldsymbol{\mu})}=\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\lambda})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\lambda}+\int q(\rho)\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\rho-\log q(\boldsymbol{\mu})+\text{const.} \tag{23}
$$
Therefore, q(µ) can be written as follows:
$$
q(\boldsymbol{\mu})\propto\exp\Big\{\int q(\rho)\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\rho\Big\}\exp\Big\{\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\lambda})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\lambda}\Big\}\,. \tag{24}
$$
The integral appearing in the first exponential function can be evaluated as follows:
$$
\int q(\rho)\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\rho=-\frac{1}{2}\sum_{g,k}(\mu_{gk}-\mu_0)^2\int q(\rho)\,\rho\,d\rho+\text{const.}=-\frac{E[\rho]}{2}\sum_{g,k}(\mu_{gk}-\mu_0)^2+\text{const.}\,, \tag{25}
$$
where we regard every term not related to µ as a constant. The integral inside the second exponential function
in Eq. (24) can be evaluated as follows:
$$
\begin{aligned}
\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\lambda})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\lambda}
&=\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\lambda})\log\Bigg[\prod_g\prod_k\prod_d\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]^{n_{dgk}}\Bigg]d\boldsymbol{\lambda}\\
&=\sum_{d,g}\sum_{k=1}^{K}q(z_{dg}=k)\int q(\lambda_{gk})\log\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]d\lambda_{gk}\\
&=-\sum_{g,k}E[\lambda_{gk}]\frac{\sum_d q(z_{dg}=k)(x_{dg}-\mu_{gk})^2}{2}+\text{const.}
=-\sum_{g,k}E[\lambda_{gk}]\frac{E[n_{gk}]\mu_{gk}^2-2\mu_{gk}x_{gk}}{2}+\text{const.}\,,
\end{aligned}
\tag{26}
$$
where we regard every term not related to µ as a constant. In Eq. (26), we refer to ∑d q(zdg = k) by E[ngk], i.e.,
the expected frequency of the assignment of gene g to latent process k. Further, we define xgk ≡ ∑d q(zdg = k)xdg.
By combining Eq. (25) and Eq. (26), we obtain
$$
\begin{aligned}
q(\mu_{gk})&\propto\exp\Big\{-\frac{E[\rho]}{2}(\mu_{gk}-\mu_0)^2\Big\}\exp\Big(-E[\lambda_{gk}]\frac{E[n_{gk}]\mu_{gk}^2-2\mu_{gk}x_{gk}}{2}\Big)\\
&\propto\exp\Big\{-\frac{1}{2}\Big(E[\rho]\mu_{gk}^2-2E[\rho]\mu_0\mu_{gk}+E[n_{gk}]E[\lambda_{gk}]\mu_{gk}^2-2E[\lambda_{gk}]x_{gk}\mu_{gk}\Big)\Big\}\\
&\propto\exp\Bigg\{-\frac{E[\rho]+E[n_{gk}]E[\lambda_{gk}]}{2}\bigg(\mu_{gk}-\frac{\mu_0E[\rho]+x_{gk}E[\lambda_{gk}]}{E[\rho]+E[n_{gk}]E[\lambda_{gk}]}\bigg)^2\Bigg\}\,.
\end{aligned}
\tag{27}
$$
Eq. (27) tells that the mean parameter mgk and the precision parameter rgk of the variational Gaussian posterior
q(µgk) can be written as follows:
$$
m_{gk}=\frac{\mu_0E[\rho]+x_{gk}E[\lambda_{gk}]}{r_{gk}}\,,\qquad r_{gk}=E[\rho]+E[n_{gk}]E[\lambda_{gk}]\,. \tag{28}
$$
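The update in Eq. (28) vectorizes directly over all (g, k) pairs. The sketch below assumes responsibilities q_z of shape (D, G, K) and an expression matrix x of shape (D, G); the variable names are ours.

```python
import numpy as np

def update_q_mu(q_z, x, E_rho, E_lam, mu0=0.0):
    """Update the Gaussian posteriors q(mu_gk) following Eq. (28).
    E_rho is the scalar E[rho]; E_lam is the (G, K) array of E[lambda_gk]."""
    E_n = q_z.sum(axis=0)                       # E[n_gk] = sum_d q(z_dg = k)
    x_gk = np.einsum('dgk,dg->gk', q_z, x)      # x_gk = sum_d q(z_dg = k) x_dg
    r = E_rho + E_n * E_lam                     # precision r_gk
    m = (mu0 * E_rho + x_gk * E_lam) / r        # mean m_gk
    return m, r
```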
4.3 Precision posteriors q(λ)
By taking a functional derivative of L with respect to q(λ), we obtain
$$
\frac{\delta L}{\delta q(\boldsymbol{\lambda})}=\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\mu})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\mu}+\log p(\boldsymbol{\lambda}|a_0,b_0)-\log q(\boldsymbol{\lambda})+\text{const.} \tag{29}
$$
Therefore, q(λ) can be written as follows:
$$
q(\boldsymbol{\lambda})\propto p(\boldsymbol{\lambda}|a_0,b_0)\exp\Big\{\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\mu})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\mu}\Big\}\,. \tag{30}
$$
The integral inside the exponential function can be evaluated as follows:
$$
\begin{aligned}
\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\mu})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\mu}
&=\sum_{d,g}\sum_{k=1}^{K}q(z_{dg}=k)\int q(\mu_{gk})\log\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]d\mu_{gk}\\
&=\sum_{g,k}\log\sqrt{\frac{\lambda_{gk}}{2\pi}}\sum_d q(z_{dg}=k)
-\sum_{g,k}\frac{\lambda_{gk}}{2}\sum_d q(z_{dg}=k)\int q(\mu_{gk})(\mu_{gk}^2-2x_{dg}\mu_{gk}+x_{dg}^2)\,d\mu_{gk}\\
&=\sum_{g,k}E[n_{gk}]\log\sqrt{\frac{\lambda_{gk}}{2\pi}}
-\sum_{g,k}\frac{\lambda_{gk}}{2}\Big(E[n_{gk}]E[\mu_{gk}^2]-2x_{gk}E[\mu_{gk}]+v_{gk}\Big)\\
&=\sum_{g,k}E[n_{gk}]\log\sqrt{\frac{\lambda_{gk}}{2\pi}}
-\sum_{g,k}\lambda_{gk}\frac{E[n_{gk}]/r_{gk}+\sum_d q(z_{dg}=k)(x_{dg}-m_{gk})^2}{2}\,,
\end{aligned}
\tag{31}
$$
where we define vgk ≡ ∑d q(zdg = k)x²dg and replace the variance E[µ²gk] − E[µgk]² of µgk with the inverse of
the precision rgk, which is introduced in Eq. (28). Consequently, we can obtain q(λgk) as follows:
$$
\begin{aligned}
q(\lambda_{gk})&\propto\lambda_{gk}^{a_0-1}e^{-b_0\lambda_{gk}}\cdot\exp\bigg[E[n_{gk}]\log\sqrt{\frac{\lambda_{gk}}{2\pi}}-\lambda_{gk}\frac{E[n_{gk}]/r_{gk}+\sum_d q(z_{dg}=k)(x_{dg}-m_{gk})^2}{2}\bigg]\\
&\propto\lambda_{gk}^{E[n_{gk}]/2+a_0-1}\exp\bigg[-\lambda_{gk}\bigg\{\frac{E[n_{gk}]/r_{gk}+\sum_d q(z_{dg}=k)(x_{dg}-m_{gk})^2}{2}+b_0\bigg\}\bigg]\,.
\end{aligned}
\tag{32}
$$
Eq. (32) tells that the shape parameter agk and the rate parameter bgk of the variational Gamma posterior q(λgk)
can be written as follows:
$$
a_{gk}=\frac{E[n_{gk}]}{2}+a_0\,,\qquad b_{gk}=\frac{E[n_{gk}]/r_{gk}+\sum_d q(z_{dg}=k)(x_{dg}-m_{gk})^2}{2}+b_0\,. \tag{33}
$$
By using Eq. (33), the expectation E[λgk] appearing in Eq. (28) can be evaluated as agk/bgk.
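Analogously, Eq. (33) amounts to a few array operations; the following sketch reuses the quantities of the q(µ) update and again uses our own variable names.

```python
import numpy as np

def update_q_lambda(q_z, x, m, r, a0=1.0, b0=1.0):
    """Update the Gamma posteriors q(lambda_gk) following Eq. (33)."""
    E_n = q_z.sum(axis=0)                                               # E[n_gk]
    sq = np.einsum('dgk,dgk->gk', q_z, (x[:, :, None] - m[None]) ** 2)  # sum_d q(z_dg=k)(x_dg - m_gk)^2
    a = 0.5 * E_n + a0                                                  # shape a_gk
    b = 0.5 * (E_n / r + sq) + b0                                       # rate b_gk
    return a, b, a / b                                                  # E[lambda_gk] = a_gk / b_gk
```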
4.4 Precision hyperparameter posterior q(ρ)
We take a functional derivative of L with respect to q(ρ) as follows:
$$
\frac{\delta L}{\delta q(\rho)}=\int q(\boldsymbol{\mu})\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\boldsymbol{\mu}+\log p(\rho|a_\rho,b_\rho)-\log q(\rho)+\text{const.} \tag{34}
$$
Then, q(ρ) can be written as
$$
q(\rho)\propto p(\rho|a_\rho,b_\rho)\cdot\exp\Big\{\int q(\boldsymbol{\mu})\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\boldsymbol{\mu}\Big\}\,. \tag{35}
$$
The integral inside the exponential function can be evaluated as follows:
$$
\begin{aligned}
\int q(\boldsymbol{\mu})\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\boldsymbol{\mu}
&=\sum_{g,k}\int q(\mu_{gk})\log\bigg[\sqrt{\frac{\rho}{2\pi}}\exp\Big\{-\frac{\rho}{2}(\mu_{gk}-\mu_0)^2\Big\}\bigg]d\mu_{gk}\\
&=\frac{GK}{2}\log\Big(\frac{\rho}{2\pi}\Big)-\frac{\rho}{2}\sum_{g,k}\int q(\mu_{gk})(\mu_{gk}-\mu_0)^2\,d\mu_{gk}
=\frac{GK}{2}\log\Big(\frac{\rho}{2\pi}\Big)-\frac{\rho}{2}\sum_{g,k}\Big(E[\mu_{gk}^2]-2\mu_0E[\mu_{gk}]+\mu_0^2\Big)\\
&=\frac{GK}{2}\log\Big(\frac{\rho}{2\pi}\Big)-\frac{\rho}{2}\sum_{g,k}\Big\{\big(E[\mu_{gk}^2]-E[\mu_{gk}]^2\big)+E[\mu_{gk}]^2-2\mu_0E[\mu_{gk}]+\mu_0^2\Big\}\\
&=\frac{GK}{2}\log\Big(\frac{\rho}{2\pi}\Big)-\frac{\rho}{2}\sum_{g,k}\Big\{\frac{1}{r_{gk}}+\big(E[\mu_{gk}]-\mu_0\big)^2\Big\}\,.
\end{aligned}
\tag{36}
$$
Therefore, q(ρ) is obtained as
$$
\begin{aligned}
q(\rho)&\propto\rho^{a_\rho-1}e^{-b_\rho\rho}\cdot\exp\bigg[\frac{GK}{2}\log\Big(\frac{\rho}{2\pi}\Big)-\frac{\rho}{2}\sum_{g,k}\Big\{\frac{1}{r_{gk}}+\big(E[\mu_{gk}]-\mu_0\big)^2\Big\}\bigg]\\
&\propto\rho^{GK/2+a_\rho-1}\cdot\exp\bigg[-\rho\bigg\{b_\rho+\sum_{g,k}\frac{r_{gk}^{-1}+(m_{gk}-\mu_0)^2}{2}\bigg\}\bigg]\,.
\end{aligned}
\tag{37}
$$
Eq. (37) tells that the shape parameter a and the rate parameter b of the variational Gamma posterior q(ρ) can
be written as follows:
$$
a=a_\rho+\frac{GK}{2}\,,\qquad b=b_\rho+\sum_{g,k}\frac{r_{gk}^{-1}+(m_{gk}-\mu_0)^2}{2}\,. \tag{38}
$$
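The scalar update of Eq. (38) only needs the current parameters of q(µ); a minimal sketch, with our own names:

```python
import numpy as np

def update_q_rho(m, r, a_rho=1.0, b_rho=1.0, mu0=0.0):
    """Update the Gamma posterior q(rho) following Eq. (38)."""
    G, K = m.shape
    a = a_rho + 0.5 * G * K                                # shape a
    b = b_rho + 0.5 * np.sum(1.0 / r + (m - mu0) ** 2)     # rate b
    return a, b, a / b                                     # E[rho] = a / b
```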
4.5 Latent process assignment posteriors q(z)
Recall that q(z) is factorized as ∏d ∏g q(zdg). For each q(zdg), we take a functional derivative of L in Eq. (9) as
follows:
$$
\frac{\delta L}{\delta q(z'_{dg})}=\int\sum_{\mathbf{z}^{\neg dg}}q(\mathbf{z}^{\neg dg})\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\log p(\mathbf{x},\mathbf{z}^{\neg dg},z'_{dg}|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\alpha\,d\boldsymbol{\pi}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}-\log q(z'_{dg})+\text{const.} \tag{39}
$$
Therefore, we obtain a functional form of q(zdg = k) as follows:
$$
q(z_{dg}=k)\propto\exp\Big\{\int\sum_{\mathbf{z}^{\neg dg}}q(\mathbf{z}^{\neg dg})\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\log p(\mathbf{x},\mathbf{z}^{\neg dg},z_{dg}=k|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\alpha\,d\boldsymbol{\pi}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\Big\}\,. \tag{40}
$$
The integral inside the exponential function can be evaluated as below. First, we rewrite p(x, z|α, π, µ, λ) as:
$$
p(\mathbf{x},\mathbf{z}|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})=p(\mathbf{z}|\alpha,\boldsymbol{\pi})\,p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})
=\prod_d\frac{\Gamma(\alpha)}{\Gamma(n_d+\alpha)}\prod_k\frac{\Gamma(n_{dk}+\alpha\pi_k)}{\Gamma(\alpha\pi_k)}
\cdot\prod_d\prod_g\prod_k\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]^{n_{dgk}}\,. \tag{41}
$$
By removing gene g from sample d, we obtain the following distribution:
$$
p(\mathbf{x}^{\neg dg},\mathbf{z}^{\neg dg}|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})=p(\mathbf{z}^{\neg dg}|\alpha,\boldsymbol{\pi})\,p(\mathbf{x}^{\neg dg}|\mathbf{z}^{\neg dg},\boldsymbol{\mu},\boldsymbol{\lambda})
=\prod_d\frac{\Gamma(\alpha)}{\Gamma(n_d^{\neg dg}+\alpha)}\prod_k\frac{\Gamma(n_{dk}^{\neg dg}+\alpha\pi_k)}{\Gamma(\alpha\pi_k)}
\cdot\prod_d\prod_{g'\neq g}\prod_k\bigg[\sqrt{\frac{\lambda_{g'k}}{2\pi}}\exp\Big\{-\frac{\lambda_{g'k}}{2}(x_{dg'}-\mu_{g'k})^2\Big\}\bigg]^{n_{dg'k}}\,. \tag{42}
$$
We divide p(x, z|α, π, µ, λ) by p(x¬dg, z¬dg|α, π, µ, λ) and obtain the following result:
$$
p(x_{dg},z_{dg}=k|\mathbf{x}^{\neg dg},\mathbf{z}^{\neg dg},\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})
=\frac{p(\mathbf{x},\mathbf{z}^{\neg dg},z_{dg}=k|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})}{p(\mathbf{x}^{\neg dg},\mathbf{z}^{\neg dg}|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})}
=\frac{n_{dk}^{\neg dg}+\alpha\pi_k}{n_d^{\neg dg}+\alpha}\cdot\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\,. \tag{43}
$$
Therefore, the integral inside the exponential function in Eq. (40) can be evaluated as follows:
$$
\begin{aligned}
&\int\sum_{\mathbf{z}^{\neg dg}}q(\mathbf{z}^{\neg dg})\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\log p(\mathbf{x},\mathbf{z}^{\neg dg},z_{dg}=k|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\alpha\,d\boldsymbol{\pi}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\\
&=\int\sum_{\mathbf{z}^{\neg dg}}q(\mathbf{z}^{\neg dg})\,q(\alpha)\,q(\boldsymbol{\pi})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\log\Big\{p(x_{dg},z_{dg}=k|\mathbf{x}^{\neg dg},\mathbf{z}^{\neg dg},\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})\,p(\mathbf{x}^{\neg dg},\mathbf{z}^{\neg dg}|\alpha,\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\lambda})\Big\}\,d\alpha\,d\boldsymbol{\pi}\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\\
&=\int\sum_{\mathbf{z}^{\neg dg}}q(\mathbf{z}^{\neg dg})\,q(\alpha)\,q(\boldsymbol{\pi})\log\frac{n_{dk}^{\neg dg}+\alpha\pi_k}{n_d^{\neg dg}+\alpha}\,d\alpha\,d\boldsymbol{\pi}
+\int q(\mu_{gk})\,q(\lambda_{gk})\log\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]\,d\mu_{gk}\,d\lambda_{gk}+\text{const.}
\end{aligned}
\tag{44}
$$
The first term in Eq. (44) can be approximated as is discussed in [6]:
$$
\int\sum_{\mathbf{z}^{\neg dg}}q(\mathbf{z}^{\neg dg})\,q(\alpha)\,q(\boldsymbol{\pi})\log\frac{n_{dk}^{\neg dg}+\alpha\pi_k}{n_d^{\neg dg}+\alpha}\,d\alpha\,d\boldsymbol{\pi}
\approx\log\big(G[\alpha\pi_k]+E[n_{dk}^{\neg dg}]\big)-\frac{V[n_{dk}^{\neg dg}]}{2\big(G[\alpha\pi_k]+E[n_{dk}^{\neg dg}]\big)^2}\,, \tag{45}
$$
where G[·] means the geometric expectation G[·] ≡ e^{E[log ·]}. The second term in Eq. (44) can be evaluated as
follows:
$$
\begin{aligned}
&\int q(\mu_{gk})\,q(\lambda_{gk})\log\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]\,d\mu_{gk}\,d\lambda_{gk}\\
&=\frac{1}{2}E[\log\lambda_{gk}]-\frac{1}{2}E[\lambda_{gk}]\big(x_{dg}^2-2x_{dg}E[\mu_{gk}]+E[\mu_{gk}]^2\big)-\frac{1}{2}E[\lambda_{gk}]\big(E[\mu_{gk}^2]-E[\mu_{gk}]^2\big)+\text{const.}\\
&=\frac{1}{2}E[\log\lambda_{gk}]-\frac{a_{gk}}{b_{gk}}\cdot\frac{(x_{dg}-m_{gk})^2+r_{gk}^{-1}}{2}+\text{const.}
\end{aligned}
\tag{46}
$$
Therefore, we have obtained q(zdg = k) approximately as below:
$$
q(z_{dg}=k)\propto\big(G[\alpha\pi_k]+E[n_{dk}^{\neg dg}]\big)\exp\bigg\{-\frac{V[n_{dk}^{\neg dg}]}{2\big(G[\alpha\pi_k]+E[n_{dk}^{\neg dg}]\big)^2}\bigg\}
\cdot\sqrt{G[\lambda_{gk}]}\exp\bigg\{-\frac{a_{gk}}{b_{gk}}\cdot\frac{(x_{dg}-m_{gk})^2+r_{gk}^{-1}}{2}\bigg\}\,. \tag{47}
$$
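A possible realization of the update in Eq. (47) is sketched below, under the usual CVB assumption that the zdg are independent under q(z), so that E[n¬dg_dk] and V[n¬dg_dk] are obtained by subtracting the contribution of the pair (d, g). The argument log_G_alpha_pi is assumed to hold E[log α] + E[log πk] computed from Eqs. (18) and (19); all other names are ours, and for simplicity the counts are taken from the current q_z instead of being refreshed gene by gene.

```python
import numpy as np
from scipy.special import digamma

def update_q_z(q_z, x, m, r, a, b, log_G_alpha_pi):
    """One sweep of the q(z_dg = k) update of Eq. (47)."""
    D, G_genes, K = q_z.shape
    E_log_lam = digamma(a) - np.log(b)                                   # E[log lambda_gk]
    quad = (a / b) * ((x[:, :, None] - m[None]) ** 2 + 1.0 / r[None])    # (a_gk/b_gk){(x-m)^2 + 1/r}
    data_term = 0.5 * E_log_lam[None] - 0.5 * quad                       # log of the Gaussian factor
    G_alpha_pi = np.exp(log_G_alpha_pi)                                  # geometric expectation G[alpha pi_k]
    new_q = np.empty_like(q_z)
    for d in range(D):
        mean_nd = q_z[d].sum(axis=0)                                     # E[n_dk]
        var_nd = (q_z[d] * (1.0 - q_z[d])).sum(axis=0)                   # V[n_dk]
        for g in range(G_genes):
            mean_minus = mean_nd - q_z[d, g]                             # E[n_dk^{-dg}]
            var_minus = var_nd - q_z[d, g] * (1.0 - q_z[d, g])           # V[n_dk^{-dg}]
            base = G_alpha_pi + mean_minus
            log_q = np.log(base) - var_minus / (2.0 * base ** 2) + data_term[d, g]
            log_q -= log_q.max()                                         # normalize in a stable way
            new_q[d, g] = np.exp(log_q) / np.exp(log_q).sum()
    return new_q
```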
5 Lower Bound
When we implement the inference for Bayesian probabilistic models, we often monitor the progress of the inference
by evaluating the lower bound of the log evidence every several iterations. The lower bound is expected to
increase as the inference proceeds. Therefore, we can use the lower bound evaluation to check the correctness
of the implementation. Further, we can also use the lower bound achieved at the final iteration of the inference
to compare, e.g., the convergence efficiency of different inference approaches over the same training data.
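As a sketch of this monitoring practice, the loop below evaluates the bound every few iterations and warns if it ever decreases; the two callbacks are placeholders standing for the updates of Section 4 and the evaluation described in this section, not functions defined in this paper.

```python
def run_inference(n_iters, update_all, compute_lower_bound, check_every=5):
    """Monitor the lower bound L, which should be non-decreasing over the iterations."""
    history = []
    for it in range(1, n_iters + 1):
        update_all()                              # one sweep of all posterior updates
        if it % check_every == 0:
            bound = compute_lower_bound()
            if history and bound < history[-1] - 1e-6:
                print(f"warning: lower bound decreased at iteration {it}")
            history.append(bound)
    return history
```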
In this section, we try to rewrite L only by using the parameters and their expectations so as to evaluate L
based on the results given in the preceding sections. First, we rewrite L in Eq. (9) as a sum of various terms
depending on different sets of parameters.
$$
\begin{aligned}
L&=\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\alpha)\,q(\boldsymbol{\pi})\log p(\mathbf{z}|\alpha,\boldsymbol{\pi})\,d\alpha\,d\boldsymbol{\pi}
+\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\mu})\,q(\boldsymbol{\lambda})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\mu}\,d\boldsymbol{\lambda}\\
&\quad+\int q(\alpha)\log p(\alpha|a_\alpha,b_\alpha)\,d\alpha+\int q(\boldsymbol{\pi})\log p(\tilde{\boldsymbol{\pi}}|\gamma)\,d\boldsymbol{\pi}+\int q(\gamma)\log p(\gamma|a_\gamma,b_\gamma)\,d\gamma\\
&\quad+\int q(\boldsymbol{\mu})\,q(\rho)\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\boldsymbol{\mu}\,d\rho+\int q(\boldsymbol{\lambda})\log p(\boldsymbol{\lambda}|a_0,b_0)\,d\boldsymbol{\lambda}+\int q(\rho)\log p(\rho|a_\rho,b_\rho)\,d\rho\\
&\quad-\sum_{\mathbf{z}}q(\mathbf{z})\log q(\mathbf{z})-\int q(\alpha)\log q(\alpha)\,d\alpha-\int q(\boldsymbol{\pi})\log q(\boldsymbol{\pi})\,d\boldsymbol{\pi}-\int q(\gamma)\log q(\gamma)\,d\gamma\\
&\quad-\int q(\boldsymbol{\mu})\log q(\boldsymbol{\mu})\,d\boldsymbol{\mu}-\int q(\boldsymbol{\lambda})\log q(\boldsymbol{\lambda})\,d\boldsymbol{\lambda}-\int q(\rho)\log q(\rho)\,d\rho\,.
\end{aligned}
\tag{48}
$$
From now on, we explain how to evaluate the terms on the right-hand side of Eq. (48) one by one.
1. The first term in Eq. (48) is related to the posterior distribution of latent process assignments. Here we use
the approximation technique proposed in [6] as it is and obtain the following result:
$$
\begin{aligned}
&\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\alpha)\,q(\boldsymbol{\pi})\log p(\mathbf{z}|\alpha,\boldsymbol{\pi})\,d\alpha\,d\boldsymbol{\pi}\\
&=D\int q(\alpha)\log\Gamma(\alpha)\,d\alpha-\sum_d\int q(\alpha)\log\Gamma(\alpha+n_d)\,d\alpha
+\sum_d\sum_k\sum_{\mathbf{z}}q(\mathbf{z})\int q(\alpha)\,q(\boldsymbol{\pi})\log\Gamma(\alpha\pi_k+n_{dk})\,d\boldsymbol{\pi}\,d\alpha
-D\sum_k\int q(\alpha)\,q(\boldsymbol{\pi})\log\Gamma(\alpha\pi_k)\,d\boldsymbol{\pi}\,d\alpha\\
&\approx D\log\Gamma(E[\alpha])-\sum_d\log\Gamma(E[\alpha]+n_d)
+\sum_d\sum_k\Big\{1-\prod_g q(z_{dg}\neq k)\Big\}\bigg\{\log\Gamma\big(G[\alpha\pi_k]+E_+[n_{dk}]\big)+\frac{V_+[n_{dk}]\,\Psi'\big(G[\alpha\pi_k]+E_+[n_{dk}]\big)}{2}\bigg\}\,.
\end{aligned}
\tag{49}
$$
The CVB for LPD [8] adopts an approximation method where we do not need to evaluate the trigamma
function Ψ'(·), which appears in Eq. (49). However, this approximation has a serious drawback. The evaluation
of this term, i.e., ∫∑z q(z)q(α)q(π) log p(z|α, π)dαdπ, depends on the ordering of genes {1, . . . , G}.
Therefore, we use the approximation proposed in [6] and remove this dependence on the ordering of genes.
Further, we can treat the case ndk = 0 exactly. The approximation method used in Eq. (49) is independent
of the assumption of infinite latent processes.
2. We focus on the second term related to the posterior of the observed data and rewrite it as follows:
$$
\begin{aligned}
\int\sum_{\mathbf{z}}q(\mathbf{z})\,q(\boldsymbol{\lambda})\,q(\boldsymbol{\mu})\log p(\mathbf{x}|\mathbf{z},\boldsymbol{\mu},\boldsymbol{\lambda})\,d\boldsymbol{\lambda}\,d\boldsymbol{\mu}
&=\sum_{d,g}\sum_{k=1}^{K}q(z_{dg}=k)\int q(\lambda_{gk})\,q(\mu_{gk})\log\bigg[\sqrt{\frac{\lambda_{gk}}{2\pi}}\exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\bigg]\,d\lambda_{gk}\,d\mu_{gk}\\
&=\sum_{g,k}\frac{E[n_{gk}]\big(E[\log\lambda_{gk}]-\log 2\pi\big)}{2}
-\sum_{g,k}E[\lambda_{gk}]\frac{E[n_{gk}]/r_{gk}+\sum_d q(z_{dg}=k)(x_{dg}-m_{gk})^2}{2}\,.
\end{aligned}
\tag{50}
$$
3. The four terms ∫q(π) log p(˜π|γ)dπ, ∫q(γ) log p(γ|aγ, bγ)dγ, −∫q(π) log q(π)dπ, and −∫q(γ) log q(γ)dγ on
the right-hand side of Eq. (48) can be combined as ∫q(γ)q(π) log [p(˜π, γ|aγ, bγ)/(q(π)q(γ))] dπdγ. This is the negative of
the Kullback-Leibler divergence of q(γ)q(π) from p(˜π, γ|aγ, bγ) and can be evaluated as follows:
$$
\begin{aligned}
&\int q(\gamma)\,q(\boldsymbol{\pi})\log\frac{p(\tilde{\boldsymbol{\pi}},\gamma|a_\gamma,b_\gamma)}{q(\boldsymbol{\pi})\,q(\gamma)}\,d\boldsymbol{\pi}\,d\gamma\\
&=a_\gamma\log b_\gamma-\log\Gamma(a_\gamma)-(a_\gamma+K)\log\Big(b_\gamma-\sum_k E[\log(1-\tilde{\pi}_k)]\Big)+\log\Gamma(a_\gamma+K)\\
&\quad-\sum_k\log\Gamma(E[s_{\cdot\geq k}]+E[\gamma]+1)+\sum_k\log\Gamma(E[s_{\cdot k}]+1)+\sum_k\log\Gamma(E[s_{\cdot>k}]+E[\gamma])\\
&\quad-\sum_k E[s_{\cdot k}]E[\log\tilde{\pi}_k]-\sum_k(E[s_{\cdot>k}]+E[\gamma])E[\log(1-\tilde{\pi}_k)]\,,
\end{aligned}
\tag{51}
$$
where E[log ˜πk] can be evaluated as Ψ(E[s·k] + 1) − Ψ(E[s·≥k] + E[γ] + 1) based on Eq. (12).
4. By combining the terms related to the concentration parameter α, we obtain the negative of the Kullback-Leibler
divergence of q(α) from p(α|aα, bα) as follows:
$$
\begin{aligned}
\int q(\alpha)\log\frac{p(\alpha|a_\alpha,b_\alpha)}{q(\alpha)}\,d\alpha
&=\bigg\{a_\alpha\log b_\alpha-\log\Gamma(a_\alpha)+(a_\alpha-1)\Psi(a_\alpha+E[s_{\cdot\cdot}])-(a_\alpha-1)\log\Big(b_\alpha-\sum_d E[\log\eta_d]\Big)-b_\alpha E[\alpha]\bigg\}\\
&\quad-\bigg\{\log\Big(b_\alpha-\sum_d E[\log\eta_d]\Big)-\log\Gamma(a_\alpha+E[s_{\cdot\cdot}])+(a_\alpha+E[s_{\cdot\cdot}]-1)\Psi(a_\alpha+E[s_{\cdot\cdot}])-(a_\alpha+E[s_{\cdot\cdot}])\bigg\}\\
&=a_\alpha\log\frac{b_\alpha}{b_\alpha-\sum_d E[\log\eta_d]}+\log\frac{\Gamma(a_\alpha+E[s_{\cdot\cdot}])}{\Gamma(a_\alpha)}-E[s_{\cdot\cdot}]\Psi(a_\alpha+E[s_{\cdot\cdot}])-E[\alpha]\sum_d E[\log\eta_d]\,.
\end{aligned}
\tag{52}
$$
5. We can rewrite the term ∫q(µ)q(ρ) log p(µ|µ0, ρ)dµdρ as follows:
$$
\begin{aligned}
\int q(\boldsymbol{\mu})\,q(\rho)\log p(\boldsymbol{\mu}|\mu_0,\rho)\,d\boldsymbol{\mu}\,d\rho
&=\sum_{g,k}\int q(\rho)\log\sqrt{\frac{\rho}{2\pi}}\,d\rho-\sum_{g,k}\int q(\mu_{gk})\,q(\rho)\,\frac{\rho}{2}(\mu_{gk}-\mu_0)^2\,d\mu_{gk}\,d\rho\\
&=\frac{GK}{2}\Big\{E[\log\rho]-\log(2\pi)\Big\}-\frac{E[\rho]}{2}\sum_{g,k}\Big\{\frac{1}{r_{gk}}+\big(E[\mu_{gk}]-\mu_0\big)^2\Big\}\,.
\end{aligned}
\tag{53}
$$
6. The terms ∫q(λ) log p(λ|a0, b0)dλ and ∫q(ρ) log p(ρ|aρ, bρ)dρ can be evaluated as follows:
$$
\int q(\boldsymbol{\lambda})\log p(\boldsymbol{\lambda}|a_0,b_0)\,d\boldsymbol{\lambda}
=\sum_{g,k}\int q(\lambda_{gk})\log\bigg[\frac{b_0^{a_0}}{\Gamma(a_0)}\lambda_{gk}^{a_0-1}e^{-b_0\lambda_{gk}}\bigg]\,d\lambda_{gk}
=GKa_0\log b_0-GK\log\Gamma(a_0)+(a_0-1)\sum_{g,k}E[\log\lambda_{gk}]-b_0\sum_{g,k}E[\lambda_{gk}]\,, \tag{54}
$$
$$
\int q(\rho)\log p(\rho|a_\rho,b_\rho)\,d\rho=a_\rho\log b_\rho-\log\Gamma(a_\rho)+(a_\rho-1)E[\log\rho]-b_\rho E[\rho]\,. \tag{55}
$$
7. The term ∫q(µ) log q(µ)dµ is evaluated by using the parameters mgk and rgk obtained in Eq. (28) as follows:
$$
\begin{aligned}
\int q(\boldsymbol{\mu})\log q(\boldsymbol{\mu})\,d\boldsymbol{\mu}
&=\sum_{g,k}\int q(\mu_{gk})\log q(\mu_{gk})\,d\mu_{gk}
=\sum_{g,k}\int q(\mu_{gk})\log\bigg[\sqrt{\frac{r_{gk}}{2\pi}}\exp\Big\{-\frac{r_{gk}}{2}(\mu_{gk}-m_{gk})^2\Big\}\bigg]\,d\mu_{gk}\\
&=\sum_{g,k}\log\sqrt{\frac{r_{gk}}{2\pi}}-\sum_{g,k}\frac{r_{gk}}{2}\int q(\mu_{gk})(\mu_{gk}-m_{gk})^2\,d\mu_{gk}
=\sum_{g,k}\frac{\log r_{gk}}{2}-\frac{GK\log(2\pi)}{2}-\frac{GK}{2}\,.
\end{aligned}
\tag{56}
$$
8. The terms ∫q(λ) log q(λ)dλ and ∫q(ρ) log q(ρ)dρ are evaluated based on Eq. (33) and Eq. (37), respectively:
$$
\begin{aligned}
\int q(\boldsymbol{\lambda})\log q(\boldsymbol{\lambda})\,d\boldsymbol{\lambda}
&=\sum_{g,k}a_{gk}\log b_{gk}-\sum_{g,k}\log\Gamma(a_{gk})+\sum_{g,k}(a_{gk}-1)E[\log\lambda_{gk}]-\sum_{g,k}b_{gk}E[\lambda_{gk}]\\
&=\sum_{g,k}\log b_{gk}-\sum_{g,k}\log\Gamma(a_{gk})+\sum_{g,k}(a_{gk}-1)\Psi(a_{gk})-\sum_{g,k}a_{gk}\,,
\end{aligned}
\tag{57}
$$
$$
\int q(\rho)\log q(\rho)\,d\rho=\log b-\log\Gamma(a)+(a-1)\Psi(a)-a\,. \tag{58}
$$
9. Finally, the term ∑z q(z) log q(z) = ∑d,g,k q(zdg = k) log q(zdg = k) can be evaluated by using the probabilities
obtained in Eq. (47).
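For reference, the closed-form terms in Eqs. (56)-(58) and in item 9 can be computed as in the following sketch; the argument names are ours, and a_rho_post and b_rho_post stand for the parameters a and b of q(ρ) obtained in Eq. (38).

```python
import numpy as np
from scipy.special import gammaln, digamma

def entropy_terms(q_z, r, a, b, a_rho_post, b_rho_post):
    """Evaluate the q log q terms of Eqs. (56)-(58) and item 9."""
    G, K = r.shape
    e_mu = 0.5 * np.sum(np.log(r)) - 0.5 * G * K * (np.log(2.0 * np.pi) + 1.0)      # Eq. (56)
    e_lam = np.sum(np.log(b) - gammaln(a) + (a - 1.0) * digamma(a) - a)              # Eq. (57)
    e_rho = (np.log(b_rho_post) - gammaln(a_rho_post)
             + (a_rho_post - 1.0) * digamma(a_rho_post) - a_rho_post)                # Eq. (58)
    e_z = np.sum(q_z * np.log(np.clip(q_z, 1e-300, None)))                           # item 9
    return e_mu, e_lam, e_rho, e_z
```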
6 Conclusion
We have implemented the proposed CVB based on the mathematical descriptions given in this paper. To achieve
computational efficiency, we have parallelized the inference with the OpenMP library, because we have
already confirmed the efficiency of OpenMP parallelization in text mining with LDA-like topic models [2].
Further, the experiment comparing iLPD with LPD is now being conducted on the microarray data available at
http://www.gems-system.org/.
References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research,
3:993–1022, 2003.
[2] T. Masada, D. Fukagawa, A. Takasu, Y. Shibata, and K. Oguri. Modeling topical trends over continuous
time with priors. In Proceedings of ISNN'10, pages 302–311, 2010.
[3] S. Rogers, M. Girolami, C. Campbell, and R. Breitling. The latent process decomposition of cDNA microarray
data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(2):143–156, 2005.
[4] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
[5] Y.-W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the
American Statistical Association, 101(476):1566–1581, 2006.
[6] Y.-W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for HDP. In Advances in Neural
Information Processing Systems 20, pages 1481–1488, 2008.
[7] Y.-W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent
Dirichlet allocation. In Advances in Neural Information Processing Systems 19, pages 1353–1360, 2007.
[8] Y.-M. Ying, P. Li, and C. Campbell. A marginalized variational Bayesian approach to the analysis of array
data. BMC Proceedings, 2(Suppl 4):S7, 2008.
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Supplementary material for my following paper: Infinite Latent Process Decomposition

2. We use a more natural approximation technique than that of [8] in computing the variational lower bound of the log evidence. The approximation proposed in [8] is not technically natural, because the lower bound computation depends on the ordering of genes. Therefore, we use the second order approximation technique proposed in [6] and remove the dependence on the ordering of genes.

LPD has a completely different model construction for the gene expression data when compared with the model construction of LDA for word frequencies. The expression data are continuous and are modeled by Gaussian distributions, whereas the word frequencies are discrete and are modeled by multinomial distributions in LDA and HDP. Therefore, while our proposal is heavily based on the discussions in [6], it is not a trivial task to obtain iLPD from LPD, and further to obtain a CVB for iLPD from the CVB for LPD.

3 Infinite Latent Process Decomposition (iLPD)

3.1 Generative description of iLPD

In this paper, we identify the various types of entities appearing in our probabilistic model with their indices as follows:

• {1, . . . , D}: the set of samples,
• {1, . . . , G}: the set of genes, and
• {1, . . . , K}: the set of latent processes.

We give a generative description of iLPD below. Note that, by regarding samples as documents, genes as words, and latent processes as latent topics, we can grasp iLPD as a "bioinformatics variant" of HDP [5]. Therefore, the description below can be understood in parallel with the description of HDP.

• For each sample d, the parameter θ_d = (θ_d1, . . . , θ_dK) of the multinomial distribution Multi(θ_d), defined over the latent processes {1, . . . , K}, is drawn from the Dirichlet process DP(α, π).
  – We use the stick-breaking construction [4] for the center measure π of DP(α, π). We denote the parameter of the single-parameter Beta distribution Beta(1, γ) appearing in the stick-breaking construction as γ, which is in turn drawn from the Gamma distribution Gamma(a_γ, b_γ).
  – The concentration parameter α of DP(α, π) is drawn from the Gamma distribution Gamma(a_α, b_α).
• For each pair of gene g and latent process k, a mean parameter µ_gk and a precision parameter λ_gk of the Gaussian distribution Gauss(µ_gk, λ_gk) are drawn from the Gaussian prior distribution Gauss(µ_0, ρ) and the Gamma prior distribution Gamma(a_0, b_0), respectively.
  – We assume that the precision parameter ρ of the Gaussian prior Gauss(µ_0, ρ) is in turn drawn from the Gamma distribution Gamma(a_ρ, b_ρ).
• For each pair of sample d and gene g, a latent process is drawn from the multinomial distribution Multi(θ_d). Let z_dg be the latent variable whose value is this drawn process.
• Based on the process z_dg drawn from Multi(θ_d) for the pair of sample d and gene g, a real number is drawn from the Gaussian distribution Gauss(µ_{g z_dg}, λ_{g z_dg}). Let x_dg be the observed variable whose value is this drawn real number. x_dg corresponds to the expression level in the microarray data.

By using the two Gamma distributions Gamma(a_α, b_α) and Gamma(a_γ, b_γ), we can treat full posterior distributions over the hyperparameters α and γ of the Dirichlet process DP(α, π). This is a remarkable achievement of [6]. Therefore, we apply this technique to our CVB inference. This technique can also be applied to the latent topic Dirichlet prior of LDA and to the latent process Dirichlet prior of LPD. The details of this application can be deduced from the discussions on the word Dirichlet prior Dirichlet(β, τ) in [6], because the number of different words is assumed to be finite in [6], just as the number of latent topics (resp. latent processes) is assumed to be finite in LDA (resp. LPD).
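To make this generative description concrete, here is a minimal NumPy sketch that draws a synthetic data set from the model, with the stick-breaking measure truncated at K processes and θ_d drawn from the finite Dirichlet approximation Dirichlet(απ). All names and hyperparameter values are illustrative assumptions, not taken from the paper or its experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
D, G, K = 20, 100, 50          # samples, genes, truncation level for the latent processes
a_alpha, b_alpha = 1.0, 1.0    # Gamma prior on the concentration parameter alpha
a_gamma, b_gamma = 1.0, 1.0    # Gamma prior on the stick-breaking parameter gamma
mu0, a0, b0 = 0.0, 2.0, 2.0    # priors on the per-(gene, process) Gaussian parameters
a_rho, b_rho = 2.0, 2.0        # Gamma prior on the precision rho of the mean prior

gamma = rng.gamma(a_gamma, 1.0 / b_gamma)
alpha = rng.gamma(a_alpha, 1.0 / b_alpha)

# Stick-breaking construction of the center measure pi, truncated at K processes.
pi_tilde = rng.beta(1.0, gamma, size=K)
pi = pi_tilde * np.concatenate(([1.0], np.cumprod(1.0 - pi_tilde[:-1])))

# Per-sample process proportions; Dirichlet(alpha * pi) is the finite-truncation stand-in for DP(alpha, pi).
theta = rng.dirichlet(alpha * pi + 1e-12, size=D)

# Per-(gene, process) Gaussian means and precisions.
rho = rng.gamma(a_rho, 1.0 / b_rho)
mu = rng.normal(mu0, 1.0 / np.sqrt(rho), size=(G, K))
lam = rng.gamma(a0, 1.0 / b0, size=(G, K))

# Latent process assignments z_dg and expression levels x_dg.
z = np.array([rng.choice(K, size=G, p=theta[d]) for d in range(D)])
x = rng.normal(mu[np.arange(G), z], 1.0 / np.sqrt(lam[np.arange(G), z]))
```

In the model itself the number of processes is unbounded; the truncation at K only matters for this sketch and for the truncated variational treatment used below.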
3.2 Joint distribution

Based on the generative description in Section 3.1, we can give the full joint distribution of iLPD as follows:

$$
\begin{aligned}
& p(x, z, \theta, \mu, \lambda, \rho, \alpha, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma) \\
&= \prod_d p(\theta_d \mid \alpha, \pi) \cdot p(\alpha \mid a_\alpha, b_\alpha) \cdot p(\tilde\pi \mid \gamma) \cdot p(\gamma \mid a_\gamma, b_\gamma)
 \cdot \prod_{g,k} p(\mu_{gk} \mid \mu_0, \rho) \cdot p(\rho \mid a_\rho, b_\rho)
 \cdot \prod_{g,k} p(\lambda_{gk} \mid a_0, b_0)
 \cdot \prod_d \prod_g p(z_{dg} \mid \theta_d)\, p(x_{dg} \mid \mu_{g z_{dg}}, \lambda_{g z_{dg}}) \\
&= \prod_d \frac{\Gamma(\alpha)}{\prod_k \Gamma(\alpha\pi_k)} \prod_k \theta_{dk}^{\alpha\pi_k - 1}
 \cdot \frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)} \alpha^{a_\alpha - 1} e^{-b_\alpha \alpha}
 \cdot \prod_{k=1}^{K} \frac{\Gamma(1+\gamma)}{\Gamma(1)\Gamma(\gamma)} \tilde\pi_k^{1-1} (1 - \tilde\pi_k)^{\gamma - 1}
 \cdot \frac{b_\gamma^{a_\gamma}}{\Gamma(a_\gamma)} \gamma^{a_\gamma - 1} e^{-b_\gamma \gamma} \\
&\quad \cdot \prod_g \prod_k \sqrt{\frac{\rho}{2\pi}} \exp\Big\{ -\frac{\rho}{2}(\mu_{gk} - \mu_0)^2 \Big\}
 \cdot \frac{b_\rho^{a_\rho}}{\Gamma(a_\rho)} \rho^{a_\rho - 1} e^{-b_\rho \rho}
 \cdot \prod_g \prod_k \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0 - 1} e^{-b_0 \lambda_{gk}} \\
&\quad \cdot \prod_d \frac{n_d!}{\prod_k n_{dk}!} \prod_k \theta_{dk}^{n_{dk}}
 \cdot \prod_d \prod_g \prod_k \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big]^{n_{dgk}} ,
\end{aligned} \tag{1}
$$

where n_dgk is equal to one when gene g in sample d is assigned to latent process k and is equal to zero otherwise. Further, we define n_dk ≡ ∑_g n_dgk. In Eq. (1), p(π̃ | γ) denotes the density function of the single-parameter Beta distribution Beta(1, γ) appearing in the stick-breaking construction for π. Between the values π̃_k drawn from Beta(1, γ) and the parameters π_k of the center measure of the Dirichlet process DP(α, π), the following equation holds:

$$\pi_k = \tilde\pi_k \prod_{l=1}^{k-1} (1 - \tilde\pi_l) . \tag{2}$$

3.3 Introducing auxiliary variables

By marginalizing out the latent process multinomial parameters θ_d for each sample d, we obtain the following:

$$
\begin{aligned}
& p(x, z, \mu, \lambda, \alpha, \rho, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma)
 = \int p(x, z, \theta, \mu, \lambda, \alpha, \rho, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma)\, d\theta \\
&= \prod_d \frac{\Gamma(\alpha)}{\Gamma(n_d + \alpha)} \prod_k \frac{\Gamma(n_{dk} + \alpha\pi_k)}{\Gamma(\alpha\pi_k)}
 \cdot \frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)} \alpha^{a_\alpha - 1} e^{-b_\alpha \alpha}
 \cdot \prod_{k=1}^{K} \frac{\Gamma(1+\gamma)}{\Gamma(1)\Gamma(\gamma)} \tilde\pi_k^{1-1} (1 - \tilde\pi_k)^{\gamma - 1}
 \cdot \frac{b_\gamma^{a_\gamma}}{\Gamma(a_\gamma)} \gamma^{a_\gamma - 1} e^{-b_\gamma \gamma} \\
&\quad \cdot \prod_g \prod_k \sqrt{\frac{\rho}{2\pi}} \exp\Big\{ -\frac{\rho}{2}(\mu_{gk} - \mu_0)^2 \Big\}
 \cdot \frac{b_\rho^{a_\rho}}{\Gamma(a_\rho)} \rho^{a_\rho - 1} e^{-b_\rho \rho}
 \cdot \prod_g \prod_k \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0 - 1} e^{-b_0 \lambda_{gk}}
 \cdot \prod_d \prod_g \prod_k \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big]^{n_{dgk}} .
\end{aligned} \tag{3}
$$

Now we introduce the auxiliary variables η and s to obtain efficient variational updates [6] as follows:

$$
p(z, \eta, s \mid \alpha, \pi) = p(\eta \mid \alpha)\, p(s, z \mid \alpha, \pi)
= \prod_d \frac{\eta_d^{\alpha - 1} (1 - \eta_d)^{n_d - 1} \prod_k \big[\begin{smallmatrix} n_{dk} \\ s_{dk} \end{smallmatrix}\big] (\alpha\pi_k)^{s_{dk}}}{\Gamma(n_d)} ,
\tag{4}
$$

where the bracket denotes the unsigned Stirling number of the first kind. Then, the following equation holds by marginalizing out these auxiliary variables:

$$
p(z \mid \alpha, \pi) = \int \sum_s p(z, \eta, s \mid \alpha, \pi)\, d\eta
= \prod_d \frac{\Gamma(\alpha)}{\Gamma(n_d + \alpha)} \prod_k \frac{\Gamma(n_{dk} + \alpha\pi_k)}{\Gamma(\alpha\pi_k)} .
\tag{5}
$$
After introducing the auxiliary variables, the distribution in Eq. (3) can be rewritten as follows:

$$
\begin{aligned}
& p(x, z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma) \\
&= p(\eta \mid \alpha)\, p(s, z \mid \alpha, \pi)\, p(\alpha \mid a_\alpha, b_\alpha)\, p(\tilde\pi \mid \gamma)\, p(\gamma \mid a_\gamma, b_\gamma)\, p(x \mid z, \mu, \lambda)\, p(\lambda \mid a_0, b_0)\, p(\mu \mid \mu_0, \rho)\, p(\rho \mid a_\rho, b_\rho) \\
&= \prod_d \frac{\eta_d^{\alpha-1}(1-\eta_d)^{n_d-1} \prod_k \big[\begin{smallmatrix} n_{dk} \\ s_{dk} \end{smallmatrix}\big] (\alpha\pi_k)^{s_{dk}}}{\Gamma(n_d)}
 \cdot \frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)} \alpha^{a_\alpha-1} e^{-b_\alpha\alpha}
 \cdot \prod_{k=1}^{K} \gamma (1-\tilde\pi_k)^{\gamma-1}
 \cdot \frac{b_\gamma^{a_\gamma}}{\Gamma(a_\gamma)} \gamma^{a_\gamma-1} e^{-b_\gamma\gamma} \\
&\quad \cdot \prod_g \prod_k \sqrt{\frac{\rho}{2\pi}} \exp\Big\{-\frac{\rho}{2}(\mu_{gk}-\mu_0)^2\Big\}
 \cdot \frac{b_\rho^{a_\rho}}{\Gamma(a_\rho)} \rho^{a_\rho-1} e^{-b_\rho\rho}
 \cdot \prod_g \prod_k \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0-1} e^{-b_0\lambda_{gk}}
 \cdot \prod_d \prod_g \prod_k \Big[\sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{-\frac{\lambda_{gk}}{2}(x_{dg}-\mu_{gk})^2\Big\}\Big]^{n_{dgk}} .
\end{aligned} \tag{6}
$$

3.4 A lower bound of the log evidence

The marginal likelihood p(x) of the observed data x is often called the evidence. Following the standard practice of variational inference, we introduce a variational posterior distribution q(z, η, s, µ, λ, ρ, α, π, γ) and apply Jensen's inequality to obtain a lower bound of the log evidence as follows:

$$
\begin{aligned}
\log p(x \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma)
&= \log \int \sum_z \sum_s p(x, z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma)\, d\eta\, d\mu\, d\lambda\, d\rho\, d\alpha\, d\pi\, d\gamma \\
&= \log \int \sum_z \sum_s q(z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma)\,
 \frac{p(x, z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma)}{q(z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma)}\, d\eta\, d\mu\, d\lambda\, d\rho\, d\alpha\, d\pi\, d\gamma \\
&\geq \int \sum_z \sum_s q(z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma)\,
 \log \frac{p(x, z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma \mid \mu_0, a_0, b_0, a_\rho, b_\rho, a_\alpha, b_\alpha, a_\gamma, b_\gamma)}{q(z, \eta, s, \mu, \lambda, \rho, \alpha, \pi, \gamma)}\, d\eta\, d\mu\, d\lambda\, d\rho\, d\alpha\, d\pi\, d\gamma .
\end{aligned} \tag{7}
$$

We refer to the right-hand side, i.e., the lower bound of the log evidence, as L throughout the rest of the paper.

3.5 Posterior factorization assumption

We assume that q(z, η, s, µ, λ, ρ, α, π, γ) can be factorized as q(η, s | z) q(z) q(µ) q(λ) q(ρ) q(α) q(π) q(γ). Then, L can be written as follows:

$$
L = \int \sum_z \sum_s q(\eta, s \mid z)\, q(z)\, q(\mu)\, q(\lambda)\, q(\rho)\, q(\alpha)\, q(\pi)\, q(\gamma)\,
 \log \frac{p(\eta \mid \alpha)\, p(s, z \mid \alpha, \pi)\, p(x \mid z, \mu, \lambda)\, p(\alpha \mid a_\alpha, b_\alpha)\, p(\tilde\pi \mid \gamma)\, p(\mu \mid \mu_0, \rho)\, p(\lambda \mid a_0, b_0)\, p(\rho \mid a_\rho, b_\rho)\, p(\gamma \mid a_\gamma, b_\gamma)}{q(\eta, s \mid z)\, q(z)\, q(\alpha)\, q(\pi)\, q(\mu)\, q(\lambda)\, q(\rho)\, q(\gamma)}\, d\eta\, d\alpha\, d\pi\, d\mu\, d\lambda\, d\rho\, d\gamma .
\tag{8}
$$

By taking a functional derivative of L in Eq. (8) with respect to q(η, s | z), it can be shown that L is maximized when q(η, s | z) is equal to p(η, s | x, z, α, π, µ, λ). By replacing q(η, s | z) with p(η, s | x, z, α, π, µ, λ) in Eq. (8), we obtain the following simplified form of L:

$$
L = \int \sum_z q(z)\, q(\mu)\, q(\lambda)\, q(\rho)\, q(\alpha)\, q(\pi)\, q(\gamma)\,
 \log \frac{p(x, z \mid \alpha, \pi, \mu, \lambda)\, p(\alpha \mid a_\alpha, b_\alpha)\, p(\tilde\pi \mid \gamma)\, p(\mu \mid \mu_0, \rho)\, p(\lambda \mid a_0, b_0)\, p(\rho \mid a_\rho, b_\rho)\, p(\gamma \mid a_\gamma, b_\gamma)}{q(z)\, q(\mu)\, q(\lambda)\, q(\rho)\, q(\alpha)\, q(\pi)\, q(\gamma)}\, d\mu\, d\lambda\, d\rho\, d\alpha\, d\pi\, d\gamma .
\tag{9}
$$

The lower bound in Eq. (9) will be used to derive the update formula for q(z) in Section 4.5.
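As a concrete picture of the factorization assumed in Section 3.5, the following sketch (an illustration under our own naming conventions, not the paper's code) lays out the variational parameters as arrays: q(z) as a D×G×K table of responsibilities, q(µ_gk) as Gaussian mean/precision pairs, q(λ_gk) and q(ρ) as Gamma shape/rate pairs, while q(α), q(γ), and the stick-breaking posteriors enter the updates only through a few expectations.

```python
import numpy as np

def init_variational_posteriors(x, K, a0=2.0, b0=2.0, a_rho=2.0, b_rho=2.0, seed=0):
    """Allocate the factorized posterior q(z)q(mu)q(lambda)q(rho)... as plain arrays.

    x : (D, G) array of expression levels;  K : truncation level for the latent processes.
    """
    rng = np.random.default_rng(seed)
    D, G = x.shape
    q_z = rng.random((D, G, K))
    q = {
        "z": q_z / q_z.sum(axis=2, keepdims=True),   # responsibilities q(z_dg = k)
        "m": np.full((G, K), x.mean()),              # q(mu_gk): mean m_gk (Eq. (28))
        "r": np.ones((G, K)),                        # q(mu_gk): precision r_gk (Eq. (28))
        "a": np.full((G, K), a0),                    # q(lambda_gk): shape a_gk (Eq. (33))
        "b": np.full((G, K), b0),                    # q(lambda_gk): rate b_gk (Eq. (33))
        "a_rho": a_rho, "b_rho": b_rho,              # q(rho): shape and rate (Eq. (38))
        "E_s": np.zeros(K),                          # E[s_.k] for the stick-breaking posteriors
        "E_alpha": 1.0, "E_gamma": 1.0,              # expectations of the concentration parameters
    }
    return q
```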
In Eq. (9), we set q(η, s | z) to be equal to p(η, s | x, z, α, π, µ, λ) and maximize L. On the other hand, η and s are decoupled in Eq. (4). Therefore, we further assume that q(η, s | z) is factorized as q(η | z) q(s | z). Then, by rewriting L in Eq. (8), we obtain the following result:

$$
\begin{aligned}
L &= \int \sum_z q(\eta \mid z)\, q(z)\, q(\alpha) \log p(\eta \mid \alpha)\, d\eta\, d\alpha
 + \int \sum_z \sum_s q(s \mid z)\, q(z)\, q(\alpha)\, q(\pi) \log p(s \mid z, \alpha, \pi)\, d\alpha\, d\pi \\
&\quad + \int q(\alpha) \log p(\alpha \mid a_\alpha, b_\alpha)\, d\alpha
 + \int q(\pi)\, q(\gamma) \log p(\tilde\pi \mid \gamma)\, d\pi\, d\gamma
 + \int q(\gamma) \log p(\gamma \mid a_\gamma, b_\gamma)\, d\gamma \\
&\quad + \int \sum_z q(z)\, q(\mu)\, q(\lambda) \log p(x \mid z, \mu, \lambda)\, d\mu\, d\lambda
 + \int q(\mu)\, q(\rho) \log p(\mu \mid \mu_0, \rho)\, d\mu\, d\rho
 + \int q(\lambda) \log p(\lambda \mid a_0, b_0)\, d\lambda
 + \int q(\rho) \log p(\rho \mid a_\rho, b_\rho)\, d\rho \\
&\quad - \int \sum_z q(\eta \mid z)\, q(z) \log q(\eta \mid z)\, d\eta
 - \sum_z \sum_s q(s \mid z)\, q(z) \log q(s \mid z)
 - \sum_z q(z) \log q(z) \\
&\quad - \int q(\alpha) \log q(\alpha)\, d\alpha
 - \int q(\pi) \log q(\pi)\, d\pi
 - \int q(\gamma) \log q(\gamma)\, d\gamma
 - \int q(\mu) \log q(\mu)\, d\mu
 - \int q(\lambda) \log q(\lambda)\, d\lambda
 - \int q(\rho) \log q(\rho)\, d\rho .
\end{aligned} \tag{10}
$$

The lower bound in Eq. (10) will be used to derive the update formulas in Section 4. Finally, we assume that q(z) can be factorized as q(z) = ∏_d ∏_g q(z_dg). Note that ∑_{k=1}^{K} q(z_dg = k) = 1 is satisfied for every pair of sample d and gene g.

4 Posterior Updates

In this section, by taking the functional derivative of the lower bound L with respect to each factor of the variational posterior q(z, η, s, µ, λ, ρ, α, π, γ) = q(η | z) q(s | z) q(z) q(µ) q(λ) q(ρ) q(α) q(π) q(γ), we obtain the functional form of each factor.

4.1 Posteriors inheritable from the CVB for HDP

For the variational posteriors q(α), q(π), q(γ), q(η_d | z_d), and q(s_dk | z_dk), we can use the results of the CVB for HDP [6] as is. Therefore, we only show the resulting functional forms below:

$$q(\alpha) \propto e^{\alpha(-b_\alpha + \sum_d E[\log \eta_d])}\, \alpha^{a_\alpha + E[s_{\cdot\cdot}] - 1} \tag{11}$$
$$q(\tilde\pi_k) \propto \tilde\pi_k^{E[s_{\cdot k}]} (1 - \tilde\pi_k)^{E[s_{\cdot >k}] + E[\gamma] - 1} \tag{12}$$
$$q(\gamma) \propto e^{-\gamma(b_\gamma - \sum_k E[\log(1 - \tilde\pi_k)])}\, \gamma^{a_\gamma + K - 1} \tag{13}$$
$$q(\eta_d) \propto \eta_d^{E[\alpha] - 1} (1 - \eta_d)^{n_d - 1} \tag{14}$$
$$q(s_{dk} \mid z_{dk}) \propto \big[\begin{smallmatrix} n_{dk} \\ s_{dk} \end{smallmatrix}\big]\, e^{s_{dk} E[\log \alpha]}\, e^{s_{dk} E[\log \pi_k]} , \tag{15}$$

where we define s_{·k} ≡ ∑_d s_dk, s_{··} ≡ ∑_d ∑_k s_dk, and s_{·>k} ≡ ∑_d ∑_{l>k} s_dl. The formulas above include many expectations E[·] taken with respect to the variational posteriors. Also for these expectations, we can use the results presented in [6] as is. For completeness, we include the evaluation formulas of these expectations below:

$$E[\log \eta_d] = \Psi(E[\alpha]) - \Psi(n_d + E[\alpha]) \tag{16}$$
$$E[s_{dk}] \approx G[\alpha]\, G[\pi_k] \Big\{ 1 - \prod_g q(z_{dg} \neq k) \Big\} \cdot \Big\{ \Psi(E_+[n_{dk}] + G[\alpha] G[\pi_k]) - \Psi(G[\alpha] G[\pi_k]) + \frac{1}{2} V_+[n_{dk}]\, \Psi''(E_+[n_{dk}] + G[\alpha] G[\pi_k]) \Big\} \tag{17}$$
$$E[\log \alpha] = \Psi(a_\alpha + E[s_{\cdot\cdot}]) - \log\Big( b_\alpha - \sum_d E[\log \eta_d] \Big) \tag{18}$$
$$E[\log \pi_k] = \Psi(E[s_{\cdot k}] + 1) + \sum_{l=1}^{k-1} \Psi(E[s_{\cdot >l}] + E[\gamma]) - \sum_{l=1}^{k} \Psi(E[s_{\cdot \geq l}] + E[\gamma] + 1) \tag{19}$$
$$E[\log(1 - \tilde\pi_k)] = \Psi(E[s_{\cdot >k}] + E[\gamma]) - \Psi(E[s_{\cdot \geq k}] + E[\gamma] + 1) . \tag{20}$$

Here, Ψ(·) denotes the digamma function, Ψ′(·) and Ψ′′(·) its derivatives, and G[·] ≡ e^{E[log ·]} denotes the geometric expectation.
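The expectations in Eqs. (16) and (18)–(20) only involve digamma functions of the accumulated statistics E[s_{·k}]. The following is a minimal sketch of their evaluation; function and variable names are our own, and the inputs are assumed to be the current posterior statistics.

```python
import numpy as np
from scipy.special import digamma

def stick_breaking_expectations(E_s, E_gamma, E_alpha, a_alpha, b_alpha, n_d):
    """Evaluate Eqs. (16) and (18)-(20) from the current posterior statistics.

    E_s : (K,) array of E[s_.k];  E_gamma, E_alpha : scalar expectations of gamma and alpha;
    a_alpha, b_alpha : hyperparameters of p(alpha);  n_d : (D,) array of per-sample gene counts.
    """
    # Eq. (16): E[log eta_d] = Psi(E[alpha]) - Psi(n_d + E[alpha]).
    E_log_eta = digamma(E_alpha) - digamma(n_d + E_alpha)
    # Eq. (18): E[log alpha], using the Gamma posterior q(alpha) of Eq. (11).
    E_log_alpha = digamma(a_alpha + E_s.sum()) - np.log(b_alpha - E_log_eta.sum())
    # Tail sums E[s_.>k] and E[s_.>=k] needed by Eqs. (19)-(20).
    E_s_gt = np.concatenate((np.cumsum(E_s[::-1])[::-1][1:], [0.0]))
    E_s_ge = E_s_gt + E_s
    # Eq. (20): E[log(1 - pi_tilde_k)].
    E_log_1m_pi = digamma(E_s_gt + E_gamma) - digamma(E_s_ge + E_gamma + 1.0)
    # Eq. (19): E[log pi_k] accumulates the contributions of the preceding sticks.
    E_log_pi = (digamma(E_s + 1.0)
                + np.concatenate(([0.0], np.cumsum(digamma(E_s_gt + E_gamma))[:-1]))
                - np.cumsum(digamma(E_s_ge + E_gamma + 1.0)))
    return E_log_eta, E_log_alpha, E_log_pi, E_log_1m_pi
```

The geometric expectation G[απ_k] used later is then exp(E[log α] + E[log π_k]).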
The quantities E_+[n_dk] and V_+[n_dk] appearing in Eq. (17) are defined as

$$E_+[n_{dk}] = \frac{E[n_{dk}]}{1 - \prod_g q(z_{dg} \neq k)} , \tag{21}$$
$$V_+[n_{dk}] = \frac{V[n_{dk}]}{1 - \prod_g q(z_{dg} \neq k)} - E_+[n_{dk}]^2 \prod_g q(z_{dg} \neq k) . \tag{22}$$

Our CVB inference uses the special mean E_+[n_dk] and the special variance V_+[n_dk] of n_dk, which are proposed in [6] for treating the case n_dk = 0 exactly. This technique makes our CVB more efficient than the CVB in [8].

4.2 Mean posteriors q(µ)

By taking a functional derivative of L with respect to q(µ), we obtain

$$\frac{\delta L}{\delta q(\mu)} = \int \sum_z q(z)\, q(\lambda) \log p(x \mid z, \mu, \lambda)\, d\lambda + \int q(\rho) \log p(\mu \mid \mu_0, \rho)\, d\rho - \log q(\mu) + \mathrm{const.} \tag{23}$$

Therefore, q(µ) can be written as follows:

$$q(\mu) \propto \exp\Big\{ \int q(\rho) \log p(\mu \mid \mu_0, \rho)\, d\rho \Big\} \exp\Big\{ \int \sum_z q(z)\, q(\lambda) \log p(x \mid z, \mu, \lambda)\, d\lambda \Big\} . \tag{24}$$

The integral appearing in the first exponential function can be evaluated as follows:

$$\int q(\rho) \log p(\mu \mid \mu_0, \rho)\, d\rho = -\frac{1}{2} \sum_{g,k} (\mu_{gk} - \mu_0)^2 \int q(\rho)\, \rho\, d\rho + \mathrm{const.} = -\frac{E[\rho]}{2} \sum_{g,k} (\mu_{gk} - \mu_0)^2 + \mathrm{const.} , \tag{25}$$

where we regard every term not related to µ as a constant. The integral inside the second exponential function in Eq. (24) can be evaluated as follows:

$$
\begin{aligned}
\int \sum_z q(z)\, q(\lambda) \log p(x \mid z, \mu, \lambda)\, d\lambda
&= \int \sum_z q(z)\, q(\lambda) \log \prod_d \prod_g \prod_k \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2} (x_{dg} - \mu_{gk})^2 \Big\} \Big]^{n_{dgk}} d\lambda \\
&= \sum_{d,g} \sum_{k=1}^{K} q(z_{dg} = k) \int q(\lambda_{gk}) \log \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2} (x_{dg} - \mu_{gk})^2 \Big\} \Big] d\lambda_{gk} \\
&= -\sum_{g,k} E[\lambda_{gk}]\, \frac{\sum_d q(z_{dg} = k)(x_{dg} - \mu_{gk})^2}{2} + \mathrm{const.}
 = -\sum_{g,k} E[\lambda_{gk}]\, \frac{E[n_{gk}]\, \mu_{gk}^2 - 2 \mu_{gk} x_{gk}}{2} + \mathrm{const.} ,
\end{aligned} \tag{26}
$$

where we regard every term not related to µ as a constant. In Eq. (26), we write E[n_gk] for ∑_d q(z_dg = k), i.e., the expected frequency of the assignment of gene g to latent process k. Further, we define x_gk ≡ ∑_d q(z_dg = k) x_dg. By combining Eq. (25) and Eq. (26), we obtain

$$
\begin{aligned}
q(\mu_{gk}) &\propto \exp\Big\{ -\frac{E[\rho]}{2} (\mu_{gk} - \mu_0)^2 \Big\} \exp\Big( -E[\lambda_{gk}]\, \frac{E[n_{gk}]\, \mu_{gk}^2 - 2 \mu_{gk} x_{gk}}{2} \Big) \\
&\propto \exp\Big\{ -\frac{1}{2} \big( E[\rho] \mu_{gk}^2 - 2 E[\rho] \mu_0 \mu_{gk} + E[n_{gk}] E[\lambda_{gk}] \mu_{gk}^2 - 2 E[\lambda_{gk}] x_{gk} \mu_{gk} \big) \Big\} \\
&\propto \exp\Big\{ -\frac{E[\rho] + E[n_{gk}] E[\lambda_{gk}]}{2} \Big( \mu_{gk} - \frac{\mu_0 E[\rho] + x_{gk} E[\lambda_{gk}]}{E[\rho] + E[n_{gk}] E[\lambda_{gk}]} \Big)^2 \Big\} .
\end{aligned} \tag{27}
$$

Eq. (27) shows that the mean parameter m_gk and the precision parameter r_gk of the variational Gaussian posterior q(µ_gk) can be written as follows:

$$m_{gk} = \frac{\mu_0 E[\rho] + x_{gk} E[\lambda_{gk}]}{r_{gk}} , \qquad r_{gk} = E[\rho] + E[n_{gk}] E[\lambda_{gk}] . \tag{28}$$
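The update of Eq. (28) is a simple moment computation over the responsibilities; a minimal sketch (names and array shapes are our own assumptions):

```python
import numpy as np

def update_q_mu(q_z, x, E_rho, E_lambda, mu0=0.0):
    """Update the Gaussian posteriors q(mu_gk) following Eq. (28).

    q_z : (D, G, K) responsibilities;  x : (D, G) expression levels;
    E_rho : scalar E[rho];  E_lambda : (G, K) array of E[lambda_gk] = a_gk / b_gk.
    """
    E_n = q_z.sum(axis=0)                       # E[n_gk] = sum_d q(z_dg = k)
    x_gk = np.einsum('dgk,dg->gk', q_z, x)      # x_gk = sum_d q(z_dg = k) x_dg
    r = E_rho + E_n * E_lambda                  # precision r_gk
    m = (mu0 * E_rho + x_gk * E_lambda) / r     # mean m_gk
    return m, r
```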
4.3 Precision posteriors q(λ)

By taking a functional derivative of L with respect to q(λ), we obtain

$$\frac{\delta L}{\delta q(\lambda)} = \int \sum_z q(z)\, q(\mu) \log p(x \mid z, \mu, \lambda)\, d\mu + \log p(\lambda \mid a_0, b_0) - \log q(\lambda) + \mathrm{const.} \tag{29}$$

Therefore, q(λ) can be written as follows:

$$q(\lambda) \propto p(\lambda \mid a_0, b_0) \exp\Big\{ \int \sum_z q(z)\, q(\mu) \log p(x \mid z, \mu, \lambda)\, d\mu \Big\} . \tag{30}$$

The integral inside the exponential function can be evaluated as follows:

$$
\begin{aligned}
\int \sum_z q(z)\, q(\mu) \log p(x \mid z, \mu, \lambda)\, d\mu
&= \sum_{d,g} \sum_{k=1}^{K} q(z_{dg} = k) \int q(\mu_{gk}) \log \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big] d\mu_{gk} \\
&= \sum_{g,k} \log\sqrt{\frac{\lambda_{gk}}{2\pi}} \sum_d q(z_{dg} = k)
 - \sum_{g,k} \frac{\lambda_{gk}}{2} \sum_d q(z_{dg} = k) \int q(\mu_{gk}) (\mu_{gk}^2 - 2 x_{dg} \mu_{gk} + x_{dg}^2)\, d\mu_{gk} \\
&= \sum_{g,k} E[n_{gk}] \log\sqrt{\frac{\lambda_{gk}}{2\pi}}
 - \sum_{g,k} \frac{\lambda_{gk}}{2} \big( E[n_{gk}] E[\mu_{gk}^2] - 2 x_{gk} E[\mu_{gk}] + v_{gk} \big) \\
&= \sum_{g,k} E[n_{gk}] \log\sqrt{\frac{\lambda_{gk}}{2\pi}}
 - \sum_{g,k} \lambda_{gk}\, \frac{E[n_{gk}]/r_{gk} + \sum_d q(z_{dg} = k)(x_{dg} - m_{gk})^2}{2} ,
\end{aligned} \tag{31}
$$

where we define v_gk ≡ ∑_d q(z_dg = k) x_dg² and replace the variance E[µ_gk²] − E[µ_gk]² of µ_gk with the inverse of the precision r_gk introduced in Eq. (28). Consequently, we can obtain q(λ_gk) as follows:

$$
\begin{aligned}
q(\lambda_{gk}) &\propto \lambda_{gk}^{a_0 - 1} e^{-b_0 \lambda_{gk}} \cdot \exp\Big[ E[n_{gk}] \log\sqrt{\frac{\lambda_{gk}}{2\pi}} - \lambda_{gk}\, \frac{E[n_{gk}]/r_{gk} + \sum_d q(z_{dg} = k)(x_{dg} - m_{gk})^2}{2} \Big] \\
&\propto \lambda_{gk}^{E[n_{gk}]/2 + a_0 - 1} \exp\Big[ -\lambda_{gk} \Big\{ \frac{E[n_{gk}]/r_{gk} + \sum_d q(z_{dg} = k)(x_{dg} - m_{gk})^2}{2} + b_0 \Big\} \Big] .
\end{aligned} \tag{32}
$$

Eq. (32) shows that the shape parameter a_gk and the rate parameter b_gk of the variational Gamma posterior q(λ_gk) can be written as follows:

$$a_{gk} = \frac{E[n_{gk}]}{2} + a_0 , \qquad b_{gk} = \frac{E[n_{gk}]/r_{gk} + \sum_d q(z_{dg} = k)(x_{dg} - m_{gk})^2}{2} + b_0 . \tag{33}$$

By using Eq. (33), the expectation E[λ_gk] appearing in Eq. (28) can be evaluated as a_gk / b_gk.
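Similarly, the update of Eq. (33) can be written in a few lines; the following sketch assumes the same array layout as above:

```python
import numpy as np

def update_q_lambda(q_z, x, m, r, a0=2.0, b0=2.0):
    """Update the Gamma posteriors q(lambda_gk) following Eq. (33).

    q_z : (D, G, K) responsibilities;  x : (D, G) expression levels;
    m, r : (G, K) mean and precision of q(mu_gk) from Eq. (28).
    """
    E_n = q_z.sum(axis=0)                                            # E[n_gk]
    # sum_d q(z_dg = k) (x_dg - m_gk)^2, broadcasting x against the (G, K) means.
    sq = np.einsum('dgk,dgk->gk', q_z, (x[:, :, None] - m[None, :, :]) ** 2)
    a = a0 + E_n / 2.0                                               # shape a_gk
    b = b0 + (E_n / r + sq) / 2.0                                    # rate b_gk
    return a, b                                                      # E[lambda_gk] = a / b
```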
4.4 Precision hyperparameter posterior q(ρ)

We take a functional derivative of L with respect to q(ρ) as follows:

$$\frac{\delta L}{\delta q(\rho)} = \int q(\mu) \log p(\mu \mid \mu_0, \rho)\, d\mu + \log p(\rho \mid a_\rho, b_\rho) - \log q(\rho) + \mathrm{const.} \tag{34}$$

Then, q(ρ) can be written as

$$q(\rho) \propto p(\rho \mid a_\rho, b_\rho) \cdot \exp\Big\{ \int q(\mu) \log p(\mu \mid \mu_0, \rho)\, d\mu \Big\} . \tag{35}$$

The integral inside the exponential function can be evaluated as follows:

$$
\begin{aligned}
\int q(\mu) \log p(\mu \mid \mu_0, \rho)\, d\mu
&= \sum_{g,k} \int q(\mu_{gk}) \log \Big[ \sqrt{\frac{\rho}{2\pi}} \exp\Big\{ -\frac{\rho}{2}(\mu_{gk} - \mu_0)^2 \Big\} \Big] d\mu_{gk}
 = \frac{GK}{2} \log\Big( \frac{\rho}{2\pi} \Big) - \frac{\rho}{2} \sum_{g,k} \int q(\mu_{gk}) (\mu_{gk} - \mu_0)^2\, d\mu_{gk} \\
&= \frac{GK}{2} \log\Big( \frac{\rho}{2\pi} \Big) - \frac{\rho}{2} \sum_{g,k} \big( E[\mu_{gk}^2] - 2 \mu_0 E[\mu_{gk}] + \mu_0^2 \big)
 = \frac{GK}{2} \log\Big( \frac{\rho}{2\pi} \Big) - \frac{\rho}{2} \sum_{g,k} \Big\{ \frac{1}{r_{gk}} + \big( E[\mu_{gk}] - \mu_0 \big)^2 \Big\} .
\end{aligned} \tag{36}
$$

Therefore, q(ρ) is obtained as

$$
q(\rho) \propto \rho^{a_\rho - 1} e^{-b_\rho \rho} \cdot \exp\Big[ \frac{GK}{2} \log\Big( \frac{\rho}{2\pi} \Big) - \frac{\rho}{2} \sum_{g,k} \Big\{ \frac{1}{r_{gk}} + \big( E[\mu_{gk}] - \mu_0 \big)^2 \Big\} \Big]
 \propto \rho^{GK/2 + a_\rho - 1} \cdot \exp\Big[ -\rho \Big\{ b_\rho + \sum_{g,k} \frac{r_{gk}^{-1} + (m_{gk} - \mu_0)^2}{2} \Big\} \Big] .
\tag{37}
$$

Eq. (37) shows that the shape parameter a and the rate parameter b of the variational Gamma posterior q(ρ) can be written as follows:

$$a = a_\rho + \frac{GK}{2} , \qquad b = b_\rho + \sum_{g,k} \frac{r_{gk}^{-1} + (m_{gk} - \mu_0)^2}{2} . \tag{38}$$
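The q(ρ) update of Eq. (38) reduces to two scalars; a minimal sketch:

```python
import numpy as np

def update_q_rho(m, r, mu0=0.0, a_rho=2.0, b_rho=2.0):
    """Update the Gamma posterior q(rho) following Eq. (38).

    m, r : (G, K) mean and precision of the Gaussian posteriors q(mu_gk).
    """
    G, K = m.shape
    a = a_rho + G * K / 2.0
    b = b_rho + 0.5 * np.sum(1.0 / r + (m - mu0) ** 2)
    return a, b    # E[rho] = a / b is what Eqs. (25) and (28) consume
```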
4.5 Latent process assignment posteriors q(z)

Recall that q(z) is factorized as ∏_d ∏_g q(z_dg). For each q(z_dg), we take a functional derivative of L in Eq. (9) as follows:

$$\frac{\delta L}{\delta q(z'_{dg})} = \int \sum_{z^{\neg dg}} q(z^{\neg dg})\, q(\alpha)\, q(\pi)\, q(\mu)\, q(\lambda) \log p(x, z^{\neg dg}, z'_{dg} \mid \alpha, \pi, \mu, \lambda)\, d\alpha\, d\pi\, d\mu\, d\lambda - \log q(z'_{dg}) + \mathrm{const.} \tag{39}$$

Therefore, we obtain a functional form of q(z_dg = k) as follows:

$$q(z_{dg} = k) \propto \exp\Big\{ \int \sum_{z^{\neg dg}} q(z^{\neg dg})\, q(\alpha)\, q(\pi)\, q(\mu)\, q(\lambda) \log p(x, z^{\neg dg}, z_{dg} = k \mid \alpha, \pi, \mu, \lambda)\, d\alpha\, d\pi\, d\mu\, d\lambda \Big\} . \tag{40}$$

The integral inside the exponential function can be evaluated as below. First, we rewrite p(x, z | α, π, µ, λ) as

$$p(x, z \mid \alpha, \pi, \mu, \lambda) = p(z \mid \alpha, \pi)\, p(x \mid z, \mu, \lambda)
= \prod_d \frac{\Gamma(\alpha)}{\Gamma(n_d + \alpha)} \prod_k \frac{\Gamma(n_{dk} + \alpha\pi_k)}{\Gamma(\alpha\pi_k)}
 \cdot \prod_d \prod_g \prod_k \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big]^{n_{dgk}} . \tag{41}$$

By removing gene g from sample d, we obtain the following distribution:

$$p(x^{\neg dg}, z^{\neg dg} \mid \alpha, \pi, \mu, \lambda) = p(z^{\neg dg} \mid \alpha, \pi)\, p(x^{\neg dg} \mid z^{\neg dg}, \mu, \lambda)
= \prod_d \frac{\Gamma(\alpha)}{\Gamma(n_d^{\neg dg} + \alpha)} \prod_k \frac{\Gamma(n_{dk}^{\neg dg} + \alpha\pi_k)}{\Gamma(\alpha\pi_k)}
 \cdot \prod_d \prod_{g' \neq g} \prod_k \Big[ \sqrt{\frac{\lambda_{g'k}}{2\pi}} \exp\Big\{ -\frac{\lambda_{g'k}}{2}(x_{dg'} - \mu_{g'k})^2 \Big\} \Big]^{n_{dg'k}} . \tag{42}$$

We divide p(x, z | α, π, µ, λ) by p(x^{¬dg}, z^{¬dg} | α, π, µ, λ) and obtain the following result:

$$p(x_{dg}, z_{dg} = k \mid x^{\neg dg}, z^{\neg dg}, \alpha, \pi, \mu, \lambda)
= \frac{p(x, z^{\neg dg}, z_{dg} = k \mid \alpha, \pi, \mu, \lambda)}{p(x^{\neg dg}, z^{\neg dg} \mid \alpha, \pi, \mu, \lambda)}
= \frac{n_{dk}^{\neg dg} + \alpha\pi_k}{n_d^{\neg dg} + \alpha} \cdot \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} . \tag{43}$$

Therefore, the integral inside the exponential function in Eq. (40) can be evaluated as follows:

$$
\begin{aligned}
& \int \sum_{z^{\neg dg}} q(z^{\neg dg})\, q(\alpha)\, q(\pi)\, q(\mu)\, q(\lambda) \log p(x, z^{\neg dg}, z_{dg} = k \mid \alpha, \pi, \mu, \lambda)\, d\alpha\, d\pi\, d\mu\, d\lambda \\
&= \int \sum_{z^{\neg dg}} q(z^{\neg dg})\, q(\alpha)\, q(\pi)\, q(\mu)\, q(\lambda) \log \big\{ p(x_{dg}, z_{dg} = k \mid x^{\neg dg}, z^{\neg dg}, \alpha, \pi, \mu, \lambda)\, p(x^{\neg dg}, z^{\neg dg} \mid \alpha, \pi, \mu, \lambda) \big\}\, d\alpha\, d\pi\, d\mu\, d\lambda \\
&= \int \sum_{z^{\neg dg}} q(z^{\neg dg})\, q(\alpha)\, q(\pi) \log \frac{n_{dk}^{\neg dg} + \alpha\pi_k}{n_d^{\neg dg} + \alpha}\, d\alpha\, d\pi
 + \int q(\mu_{gk})\, q(\lambda_{gk}) \log \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big] d\mu_{gk}\, d\lambda_{gk} + \mathrm{const.}
\end{aligned} \tag{44}
$$

The first term in Eq. (44) can be approximated as is discussed in [6]:

$$\int \sum_{z^{\neg dg}} q(z^{\neg dg})\, q(\alpha)\, q(\pi) \log \frac{n_{dk}^{\neg dg} + \alpha\pi_k}{n_d^{\neg dg} + \alpha}\, d\alpha\, d\pi
\approx \log\big( G[\alpha\pi_k] + E[n_{dk}^{\neg dg}] \big) - \frac{V[n_{dk}^{\neg dg}]}{2 \big( G[\alpha\pi_k] + E[n_{dk}^{\neg dg}] \big)^2} , \tag{45}$$

where G[·] denotes the geometric expectation G[·] ≡ e^{E[log ·]}. The second term in Eq. (44) can be evaluated as follows:

$$
\begin{aligned}
& \int q(\mu_{gk})\, q(\lambda_{gk}) \log \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big] d\mu_{gk}\, d\lambda_{gk} \\
&= \frac{1}{2} E[\log \lambda_{gk}] - \frac{1}{2} x_{dg}^2 E[\lambda_{gk}] + x_{dg} E[\lambda_{gk}] E[\mu_{gk}] - \frac{1}{2} E[\lambda_{gk}] E[\mu_{gk}]^2 - \frac{1}{2} E[\lambda_{gk}] \big( E[\mu_{gk}^2] - E[\mu_{gk}]^2 \big) + \mathrm{const.} \\
&= \frac{1}{2} E[\log \lambda_{gk}] - \frac{a_{gk}}{b_{gk}} \cdot \frac{(x_{dg} - m_{gk})^2 + r_{gk}^{-1}}{2} + \mathrm{const.}
\end{aligned} \tag{46}
$$

Therefore, q(z_dg = k) is obtained, up to the approximation above, as

$$q(z_{dg} = k) \propto \big( G[\alpha\pi_k] + E[n_{dk}^{\neg dg}] \big) \exp\Big\{ -\frac{V[n_{dk}^{\neg dg}]}{2 \big( G[\alpha\pi_k] + E[n_{dk}^{\neg dg}] \big)^2} \Big\}
 \cdot \sqrt{G[\lambda_{gk}]} \exp\Big\{ -\frac{a_{gk}}{b_{gk}} \cdot \frac{(x_{dg} - m_{gk})^2 + r_{gk}^{-1}}{2} \Big\} . \tag{47}$$
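The following is a minimal sketch of one sweep of the approximate assignment update in Eq. (47). It recomputes the leave-one-out statistics E[n_dk^{¬dg}] and V[n_dk^{¬dg}] from scratch for every gene, which is wasteful; an actual implementation would maintain them incrementally. Names are our own assumptions.

```python
import numpy as np
from scipy.special import digamma

def update_q_z(q_z, x, m, r, a, b, G_alpha_pi):
    """One sweep of the approximate update of Eq. (47) for the responsibilities q(z_dg = k).

    q_z : (D, G, K) current responsibilities;  x : (D, G) expression levels;
    m, r : (G, K) parameters of q(mu_gk);  a, b : (G, K) parameters of q(lambda_gk);
    G_alpha_pi : (K,) geometric expectations G[alpha pi_k] = exp(E[log alpha] + E[log pi_k]).
    """
    E_lambda = a / b
    G_lambda = np.exp(digamma(a) - np.log(b))                 # geometric expectation G[lambda_gk]
    # Gaussian factor of Eq. (47): 0.5 log G[lambda] - 0.5 E[lambda] ((x - m)^2 + 1/r).
    log_like = (0.5 * np.log(G_lambda)[None, :, :]
                - 0.5 * E_lambda[None, :, :] * ((x[:, :, None] - m[None, :, :]) ** 2
                                                + 1.0 / r[None, :, :]))
    new_q = np.empty_like(q_z)
    D, G, K = q_z.shape
    for d in range(D):
        for g in range(G):
            q_rest = np.delete(q_z[d], g, axis=0)             # responsibilities without gene g
            E_n = q_rest.sum(axis=0)                          # E[n_dk] excluding gene g
            V_n = (q_rest * (1.0 - q_rest)).sum(axis=0)       # V[n_dk] excluding gene g
            denom = G_alpha_pi + E_n
            log_prior = np.log(denom) - V_n / (2.0 * denom ** 2)
            logits = log_prior + log_like[d, g]
            logits -= logits.max()                            # numerical stability
            new_q[d, g] = np.exp(logits) / np.exp(logits).sum()
    return new_q
```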
5 Lower Bound

When we implement the inference for a Bayesian probabilistic model, we often monitor the progress of the inference by evaluating the lower bound of the log evidence every few iterations. The lower bound is expected to increase as the inference proceeds. Therefore, we can use the lower bound evaluation to check the correctness of the implementation. Further, we can also use the lower bound achieved at the final iteration of the inference to compare, e.g., the convergence efficiency of different inference approaches on the same training data. In this section, we rewrite L only in terms of the variational parameters and their expectations so that L can be evaluated based on the results given in the preceding sections. First, we rewrite L in Eq. (9) as a sum of terms depending on different sets of parameters:

$$
\begin{aligned}
L &= \int \sum_z q(z)\, q(\alpha)\, q(\pi) \log p(z \mid \alpha, \pi)\, d\alpha\, d\pi
 + \int \sum_z q(z)\, q(\mu)\, q(\lambda) \log p(x \mid z, \mu, \lambda)\, d\mu\, d\lambda \\
&\quad + \int q(\alpha) \log p(\alpha \mid a_\alpha, b_\alpha)\, d\alpha
 + \int q(\pi)\, q(\gamma) \log p(\tilde\pi \mid \gamma)\, d\pi\, d\gamma
 + \int q(\gamma) \log p(\gamma \mid a_\gamma, b_\gamma)\, d\gamma \\
&\quad + \int q(\mu)\, q(\rho) \log p(\mu \mid \mu_0, \rho)\, d\mu\, d\rho
 + \int q(\lambda) \log p(\lambda \mid a_0, b_0)\, d\lambda
 + \int q(\rho) \log p(\rho \mid a_\rho, b_\rho)\, d\rho \\
&\quad - \sum_z q(z) \log q(z)
 - \int q(\alpha) \log q(\alpha)\, d\alpha
 - \int q(\pi) \log q(\pi)\, d\pi
 - \int q(\gamma) \log q(\gamma)\, d\gamma
 - \int q(\mu) \log q(\mu)\, d\mu
 - \int q(\lambda) \log q(\lambda)\, d\lambda
 - \int q(\rho) \log q(\rho)\, d\rho .
\end{aligned} \tag{48}
$$

We now explain how to evaluate the terms on the right-hand side of Eq. (48) one by one.

1. The first term in Eq. (48) is related to the posterior distribution of the latent process assignments. Here we use the approximation technique proposed in [6] as is and obtain the following result:

$$
\begin{aligned}
\int \sum_z q(z)\, q(\alpha)\, q(\pi) \log p(z \mid \alpha, \pi)\, d\alpha\, d\pi
&= D \int q(\alpha) \log \Gamma(\alpha)\, d\alpha - \sum_d \int q(\alpha) \log \Gamma(\alpha + n_d)\, d\alpha \\
&\quad + \sum_d \sum_k \sum_z q(z) \int q(\alpha)\, q(\pi) \log \Gamma(\alpha\pi_k + n_{dk})\, d\pi\, d\alpha
 - D \sum_k \int q(\alpha)\, q(\pi) \log \Gamma(\alpha\pi_k)\, d\pi\, d\alpha \\
&\approx D \log \Gamma(E[\alpha]) - \sum_d \log \Gamma(E[\alpha] + n_d) \\
&\quad + \sum_d \sum_k \Big\{ 1 - \prod_g q(z_{dg} \neq k) \Big\}
 \Big\{ \log \Gamma(G[\alpha\pi_k] + E_+[n_{dk}]) - \log \Gamma(G[\alpha\pi_k])
 + \frac{V_+[n_{dk}]\, \Psi'(G[\alpha\pi_k] + E_+[n_{dk}])}{2} \Big\} .
\end{aligned} \tag{49}
$$

The CVB for LPD [8] adopts an approximation method that avoids evaluating the trigamma function Ψ′(·), which appears in Eq. (49). However, this approximation has a serious drawback: the evaluation of this term, i.e., ∫∑_z q(z)q(α)q(π) log p(z | α, π) dα dπ, then depends on the ordering of the genes {1, . . . , G}. Therefore, we use the approximation proposed in [6] and remove this dependence on the ordering of genes. Further, we can treat the case n_dk = 0 exactly. The approximation method used in Eq. (49) is independent of the assumption of infinite latent processes.
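The following sketch evaluates this first term with the second-order correction, using the statistics E[n_dk], V[n_dk], E_+[n_dk], and V_+[n_dk] of Section 4.1; it is an illustration under our own naming, not the paper's OpenMP implementation.

```python
import numpy as np
from scipy.special import gammaln, polygamma

def log_p_z_term(q_z, E_alpha, G_alpha_pi):
    """Approximate the first term of Eq. (48) as in Eq. (49).

    q_z : (D, G, K) responsibilities;  E_alpha : scalar E[alpha];
    G_alpha_pi : (K,) geometric expectations G[alpha pi_k].
    """
    D, G, K = q_z.shape
    n_d = np.full(D, G)                                  # every sample contains G genes
    E_n = q_z.sum(axis=1)                                # (D, K) means of n_dk under q(z)
    V_n = (q_z * (1.0 - q_z)).sum(axis=1)                # (D, K) variances (independent terms under q)
    p_zero = np.prod(1.0 - q_z, axis=1)                  # prod_g q(z_dg != k)
    nz = np.maximum(1.0 - p_zero, 1e-12)                 # probability that n_dk > 0
    E_plus = E_n / nz                                    # Eq. (21)
    V_plus = V_n / nz - E_plus ** 2 * p_zero             # Eq. (22)
    c = G_alpha_pi[None, :]
    per_dk = (1.0 - p_zero) * (gammaln(c + E_plus) - gammaln(c)
                               + 0.5 * V_plus * polygamma(1, c + E_plus))
    return D * gammaln(E_alpha) - gammaln(E_alpha + n_d).sum() + per_dk.sum()
```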
2. We focus on the second term, related to the posterior of the observed data, and rewrite it as follows:

$$
\begin{aligned}
\int \sum_z q(z)\, q(\lambda)\, q(\mu) \log p(x \mid z, \mu, \lambda)\, d\lambda\, d\mu
&= \sum_{d,g} \sum_{k=1}^{K} q(z_{dg} = k) \int q(\lambda_{gk})\, q(\mu_{gk}) \log \Big[ \sqrt{\frac{\lambda_{gk}}{2\pi}} \exp\Big\{ -\frac{\lambda_{gk}}{2}(x_{dg} - \mu_{gk})^2 \Big\} \Big] d\lambda_{gk}\, d\mu_{gk} \\
&= \sum_{g,k} \frac{E[n_{gk}] \big( E[\log \lambda_{gk}] - \log 2\pi \big)}{2}
 - \sum_{g,k} E[\lambda_{gk}]\, \frac{E[n_{gk}]/r_{gk} + \sum_d q(z_{dg} = k)(x_{dg} - m_{gk})^2}{2} .
\end{aligned} \tag{50}
$$

3. The four terms ∫q(π)q(γ) log p(π̃ | γ) dπ dγ, ∫q(γ) log p(γ | a_γ, b_γ) dγ, −∫q(π) log q(π) dπ, and −∫q(γ) log q(γ) dγ on the right-hand side of Eq. (48) can be combined into ∫q(γ)q(π) log [p(π̃, γ | a_γ, b_γ) / (q(π)q(γ))] dπ dγ. This is the negative of the Kullback–Leibler divergence of q(γ)q(π) from p(π̃, γ | a_γ, b_γ) and can be evaluated as follows:

$$
\begin{aligned}
\int q(\gamma)\, q(\pi) \log \frac{p(\tilde\pi, \gamma \mid a_\gamma, b_\gamma)}{q(\pi)\, q(\gamma)}\, d\pi\, d\gamma
&= a_\gamma \log b_\gamma - \log \Gamma(a_\gamma) - (a_\gamma + K) \log\Big( b_\gamma - \sum_k E[\log(1 - \tilde\pi_k)] \Big) + \log \Gamma(a_\gamma + K) \\
&\quad - \sum_k \log \Gamma(E[s_{\cdot \geq k}] + E[\gamma] + 1) + \sum_k \log \Gamma(E[s_{\cdot k}] + 1) + \sum_k \log \Gamma(E[s_{\cdot >k}] + E[\gamma]) \\
&\quad - \sum_k E[s_{\cdot k}]\, E[\log \tilde\pi_k] - \sum_k \big( E[s_{\cdot >k}] + E[\gamma] \big) E[\log(1 - \tilde\pi_k)] ,
\end{aligned} \tag{51}
$$

where E[log π̃_k] can be evaluated as Ψ(E[s_{·k}] + 1) − Ψ(E[s_{·≥k}] + E[γ] + 1) based on Eq. (12).

4. By combining the terms related to the concentration parameter α, we obtain the negative of the Kullback–Leibler divergence of q(α) from p(α | a_α, b_α) as follows:

$$
\begin{aligned}
\int q(\alpha) \log \frac{p(\alpha \mid a_\alpha, b_\alpha)}{q(\alpha)}\, d\alpha
&= \Big\{ a_\alpha \log b_\alpha - \log \Gamma(a_\alpha) + (a_\alpha - 1) \Psi(a_\alpha + E[s_{\cdot\cdot}]) - (a_\alpha - 1) \log\Big( b_\alpha - \sum_d E[\log \eta_d] \Big) - b_\alpha E[\alpha] \Big\} \\
&\quad - \Big\{ \log\Big( b_\alpha - \sum_d E[\log \eta_d] \Big) - \log \Gamma(a_\alpha + E[s_{\cdot\cdot}]) + (a_\alpha + E[s_{\cdot\cdot}] - 1) \Psi(a_\alpha + E[s_{\cdot\cdot}]) - (a_\alpha + E[s_{\cdot\cdot}]) \Big\} \\
&= a_\alpha \log \frac{b_\alpha}{b_\alpha - \sum_d E[\log \eta_d]} + \log \frac{\Gamma(a_\alpha + E[s_{\cdot\cdot}])}{\Gamma(a_\alpha)} - E[s_{\cdot\cdot}]\, \Psi(a_\alpha + E[s_{\cdot\cdot}]) - E[\alpha] \sum_d E[\log \eta_d] .
\end{aligned} \tag{52}
$$

5. The term ∫q(µ)q(ρ) log p(µ | µ_0, ρ) dµ dρ can be rewritten as follows:

$$
\begin{aligned}
\int q(\mu)\, q(\rho) \log p(\mu \mid \mu_0, \rho)\, d\mu\, d\rho
&= \sum_{g,k} \int q(\rho) \log \sqrt{\frac{\rho}{2\pi}}\, d\rho - \sum_{g,k} \int q(\mu_{gk})\, q(\rho)\, \frac{\rho}{2} (\mu_{gk} - \mu_0)^2\, d\mu_{gk}\, d\rho \\
&= \frac{GK}{2} \big\{ E[\log \rho] - \log(2\pi) \big\} - \frac{E[\rho]}{2} \sum_{g,k} \Big\{ \frac{1}{r_{gk}} + \big( E[\mu_{gk}] - \mu_0 \big)^2 \Big\} .
\end{aligned} \tag{53}
$$

6. The terms ∫q(λ) log p(λ | a_0, b_0) dλ and ∫q(ρ) log p(ρ | a_ρ, b_ρ) dρ can be evaluated as follows:

$$\int q(\lambda) \log p(\lambda \mid a_0, b_0)\, d\lambda = \sum_{g,k} \int q(\lambda_{gk}) \log \Big[ \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0 - 1} e^{-b_0 \lambda_{gk}} \Big] d\lambda_{gk}
= GK a_0 \log b_0 - GK \log \Gamma(a_0) + (a_0 - 1) \sum_{g,k} E[\log \lambda_{gk}] - b_0 \sum_{g,k} E[\lambda_{gk}] , \tag{54}$$

$$\int q(\rho) \log p(\rho \mid a_\rho, b_\rho)\, d\rho = a_\rho \log b_\rho - \log \Gamma(a_\rho) + (a_\rho - 1) E[\log \rho] - b_\rho E[\rho] . \tag{55}$$

7. The term ∫q(µ) log q(µ) dµ is evaluated by using the parameters m_gk and r_gk obtained in Eq. (28) as follows:

$$
\begin{aligned}
\int q(\mu) \log q(\mu)\, d\mu
&= \sum_{g,k} \int q(\mu_{gk}) \log q(\mu_{gk})\, d\mu_{gk}
 = \sum_{g,k} \int q(\mu_{gk}) \log \Big[ \sqrt{\frac{r_{gk}}{2\pi}} \exp\Big\{ -\frac{r_{gk}}{2} (\mu_{gk} - m_{gk})^2 \Big\} \Big] d\mu_{gk} \\
&= \sum_{g,k} \log \sqrt{\frac{r_{gk}}{2\pi}} - \sum_{g,k} \frac{r_{gk}}{2} \int q(\mu_{gk}) (\mu_{gk} - m_{gk})^2\, d\mu_{gk}
 = \sum_{g,k} \frac{\log r_{gk}}{2} - \frac{GK \log(2\pi)}{2} - \frac{GK}{2} .
\end{aligned} \tag{56}
$$

8. The terms ∫q(λ) log q(λ) dλ and ∫q(ρ) log q(ρ) dρ are evaluated based on Eq. (33) and Eq. (38), respectively:

$$\int q(\lambda) \log q(\lambda)\, d\lambda = \sum_{g,k} a_{gk} \log b_{gk} - \sum_{g,k} \log \Gamma(a_{gk}) + \sum_{g,k} (a_{gk} - 1) E[\log \lambda_{gk}] - \sum_{g,k} b_{gk} E[\lambda_{gk}]
= \sum_{g,k} \log b_{gk} - \sum_{g,k} \log \Gamma(a_{gk}) + \sum_{g,k} (a_{gk} - 1) \Psi(a_{gk}) - \sum_{g,k} a_{gk} , \tag{57}$$

$$\int q(\rho) \log q(\rho)\, d\rho = \log b - \log \Gamma(a) + (a - 1) \Psi(a) - a . \tag{58}$$

9. Finally, the term ∑_z q(z) log q(z) = ∑_{d,g,k} q(z_dg = k) log q(z_dg = k) can be evaluated by using the probabilities obtained in Eq. (47).

6 Conclusion

We have implemented the proposed CVB based on the mathematical descriptions given in this paper. To reduce the computational cost, we have parallelized the inference with the OpenMP library, because we have already confirmed the efficiency of OpenMP parallelization in text mining with LDA-like topic models [2]. Further, the experiment comparing iLPD with LPD is now being conducted on the microarray data available at http://www.gems-system.org/.
References

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[2] T. Masada, D. Fukagawa, A. Takasu, Y. Shibata, and K. Oguri. Modeling topical trends over continuous time with priors. In Proceedings of ISNN'10, pages 302–311, 2010.
[3] S. Rogers, M. Girolami, C. Campbell, and R. Breitling. The latent process decomposition of cDNA microarray data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(2):143–156, 2005.
[4] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
[5] Y.-W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[6] Y.-W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for HDP. In Advances in Neural Information Processing Systems 20, pages 1481–1488, 2008.
[7] Y.-W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 19, pages 1353–1360, 2007.
[8] Y.-M. Ying, P. Li, and C. Campbell. A marginalized variational Bayesian approach to the analysis of array data. BMC Proceedings, 2(Suppl 4):S7, 2008.