Bayesian Generalization Error and
Real Log Canonical Threshold in
Non-negative Matrix Factorization
and Latent Dirichlet Allocation
NAOKI HAYASHI (1,2)
2020/06/25
(1) NTT DATA MATHEMATICAL SYSTEMS INC.
SIMULATION & MINING DIVISION
(2) TOKYO INSTITUTE OF TECHNOLOGY
SUMIO WATANABE LABORATORY
1
Symbol Notations
q(x): the true (i.e. data-generating) distribution
p(x|w): statistical model given parameter w
φ(w): prior distribution
X^n = (X_1, …, X_n): i.i.d. sample (r.v.) from q(x)
P(X^n|w): likelihood
ψ(w|X^n): posterior distribution
Z(X^n): marginal likelihood (a.k.a. evidence)
p*(x): predictive distribution
Note: P(X^n|w), ψ(w|X^n), Z(X^n), and p*(x) depend on X^n;
thus, they are random variables in function spaces.
2
Outline
1. Singular Learning Theory
2. Parameter Region Restriction
3. Summary
3
Outline
1. Singular Learning Theory
2.
3.
4
1. Singular Learning Theory
Problem Setting
• Important random variables are the generalization
error G_n and the marginal likelihood Z_n = Z(X^n).
‒ G_n = ∫ q(x) log( q(x) / p*(x) ) dx.
 It represents how different the true and the predictive distributions
are, in the sense of a new data-generating process.
‒ Z_n = ∫ ∏_{i=1}^{n} p(X_i|w) φ(w) dw.
 It represents how similar the true distribution is to the model, in the
sense of the dataset-generating process.
5
1. Singular Learning Theory
Problem Setting
• Important random variables are the generalization
error G_n and the marginal likelihood Z_n = Z(X^n).
‒ G_n = ∫ q(x) log( q(x) / p*(x) ) dx.
 It represents how different the true and the predictive distributions
are, in the sense of a new data-generating process.
‒ Z_n = ∫ ∏_{i=1}^{n} p(X_i|w) φ(w) dw.
 It represents how different the true distribution and the model are, in the
sense of the dataset-generating process.
• How do they behave?
6
1. Singular Learning Theory
Regular Case
• From regular learning theory, if the posterior can be
approximated by a normal dist., the following hold:
‒ E[G_n] = d/(2n) + o(1/n),
‒ −log Z_n = n S_n + (d/2) log n + O_p(1),
where d is the dim. of params. and S_n is the empirical entropy.
• AIC and BIC are based on regular learning theory.
7
1. Singular Learning Theory
Regular Case
• From regular learning theory, if the posterior can be
approximated by a normal dist., the following hold:
‒ E[G_n] = d/(2n) + o(1/n),
‒ −log Z_n = n S_n + (d/2) log n + O_p(1),
where d is the dim. of params. and S_n is the empirical entropy.
• AIC and BIC are based on regular learning theory.
8
How about singular cases?
(singular = non-regular)
• Hierarchical models and latent variable
models are typical singular models.
• Their likelihood and posterior cannot
be approximated by any normal dist.
‒ Simple example:
the log likelihood is −b²(b − a³)²
in w = (a, b) space.
9
1. Singular Learning Theory
Singular Case
• Hierarchical models and latent variable
models are typical singular models.
• Their likelihood and posterior cannot
be approximated by any normal dist.
‒ Simple example:
the log likelihood is −b²(b − a³)²
in w = (a, b) space.
10
1. Singular Learning Theory
Singular Case
Regular learning theory cannot clarify the behavior of their
generalization errors and marginal likelihoods.
1. Singular Learning Theory
Singular Case
• Singular learning theory provides a general
theory for the above issue.
• Under some technical assumptions, the
following hold, even if the posterior cannot be
approximated by any normal dist.:
‒ E[G_n] = λ/n − (m − 1)/(n log n) + o(1/(n log n)),
‒ −log Z_n = n S_n + λ log n − (m − 1) log log n + O_p(1),
where λ and m are constants which depend on p(x|w), φ(w), and q(x).
11
[1] Watanabe. 2001
1. Singular Learning Theory
Singular Case
• Singular learning theory provides a general
theory for the above issue.
• Under some technical assumptions, the
following hold, even if the posterior cannot be
approximated by any normal dist.:
‒ E[G_n] = λ/n − (m − 1)/(n log n) + o(1/(n log n)),
‒ −log Z_n = n S_n + λ log n − (m − 1) log log n + O_p(1),
where λ and m are constants which depend on p(x|w), φ(w), and q(x).
12
What are these constants?
[1] Watanabe. 2001
1. Singular Learning Theory
Invariants in Algebraic Geometry
• Def. A real log canonical threshold (RLCT) is defined
via the maximum pole z = −λ of a zeta function
ζ(z) = ∫_W K(w)^z b(w) dw,
‒ where K(w) and b(w) are non-negative and analytic.
• Thm. Put K(w) = KL(q||p) and b(w) = φ(w);
then the RLCT is the learning coefficient λ, and the
order of the maximum pole is the multiplicity m.
13
This is an important result in singular learning theory.
[1] Watanabe. 2001
1. Singular Learning Theory
Invariants in Algebraic Geometry
14
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
1. Singular Learning Theory
Invariants in Algebraic Geometry
15
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
Corresponding to the
maximum pole of the
zeta function
ζ(z) = C / (z + λ)^m + ⋯
(Figure: poles of ζ(z) in the complex plane ℂ; the maximum pole z = −λ is marked O, the other poles X.)
1. Singular Learning Theory
Invariants in Algebraic Geometry
• Properties of λ and m are as follows:
‒ λ is a positive rational number and m is a positive integer.
‒ They are birational invariants.
 We can determine them using blowing-ups, mathematically supported
by Hironaka’s Singularity Resolution Theorem.
• Applications of λ and m are as follows:
‒ Nagata showed that an exchange probability in replica
MCMC is represented by using λ.
‒ Drton proposed sBIC, which approximates log Z_n by using the
RLCTs and the multiplicities of candidates (p_i, φ, q = p_j):
sBIC_ij "=" loglike(w_MLE) − λ_ij log n + (m_ij − 1) log log n.
16
[3] Nagata. 2008
[2] Hironaka. 1964
[4] Drton. 2017-a
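To make the penalty concrete, here is a minimal Python sketch of the single-candidate score displayed above. This is only that displayed formula, not Drton's full sBIC (which combines all candidate models by a fixed-point construction); `loglike_mle`, `lam`, and `mult` are assumed inputs.

```python
import numpy as np

def sbic_like_score(loglike_mle, lam, mult, n):
    """Penalized score: loglike(w_MLE) - lam*log(n) + (mult - 1)*log(log(n)).

    A sketch of the single-candidate term above; the actual sBIC [4]
    combines the RLCTs of all candidate true models.
    """
    return loglike_mle - lam * np.log(n) + (mult - 1) * np.log(np.log(n))

# A regular model recovers the BIC-type penalty with lam = d/2, mult = 1:
print(sbic_like_score(loglike_mle=-1234.5, lam=3.0, mult=1, n=1000))
```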
1. Singular Learning Theory
Invariants in Algebraic Geometry
> “We can determine them using blowing-ups”
In fact, statisticians and machine learning researchers have
studied RLCTs for concrete models:
Besides, Imai has proposed a consistent estimator of an RLCT
([9] Imai. 2019). He’ll talk about it tomorrow at this conf.
17
Singular model Author and year
Gaussian mixture model Yamazaki et al. in 2003 [5]
Reduced rank regression = MF Aoyagi et al. in 2005 [6]
Naïve Bayesian network Rusakov et al. in 2005 [7]
Markov model Zwiernik in 2011 [8]
… …
Outline
1.
2. Parameter Region Restriction
1. Introduction
2. Non-negative Matrix Factorization
3. Latent Dirichlet Allocation
4. Effect of Restriction
3.
18
Outline
1.
2. Parameter Region Restriction
1. Introduction
3.
19
2. Parameter Region Restriction
Motivation
• Parameter regions are often
restricted because of interpretability.
1. Non-negative restriction
2. Simplex restriction, etc.
20
(Figure: coefficients before and after the non-negative
restriction; legend: TVCM, DM, Rating, Reviews.)
E.g. logistic regression of purchase existence for a product.
[14] Kohjima. 2016
2. Parameter Region Restriction
Motivation
• Parameter regions are often
restricted because of interpretability.
1. Non-negative restriction
2. Simplex restriction, etc.
21
(Figure: coefficients before and after the non-negative
restriction; legend: TVCM, DM, Rating, Reviews.)
E.g. logistic regression of purchase existence for a product.
[14] Kohjima. 2016
What happens when restrictions
are added?
How does the generalization
error change?
2. Parameter Region Restriction
Motivation
• A future application of clarifying the effect of
restriction on generalization is as follows.
22
(Figure: a statistician talking with a customer.)
2. Parameter Region Restriction
Motivation
• A future application of clarifying the effect of
restriction on generalization is as follows.
23
(Figure: a statistician talking with a customer.)
We want to know what happens
when the contributions of
explanatory variables are restricted
to be non-negative. We need high
prediction accuracy and an
interpretable model.
2. Parameter Region Restriction
Motivation
• A future application of clarifying the effect of
restriction on generalization is as follows.
24
(Figure: a statistician talking with a customer.)
We want to know what happens
when the contributions of
explanatory variables are restricted
to be non-negative. We need high
prediction accuracy and an
interpretable model.
We can answer it. If the
parameter is restricted to be
non-negative, the prediction
performance is reduced by Foo
points when n = Bar.
To achieve the needed accuracy,
we recommend increasing n to
Bar+Baz.
2. Parameter Region Restriction
Revisiting Analytic Set
25
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
Corresponding to the
maximum pole of the
zeta function
ζ(z) = C / (z + λ)^m + ⋯
2. Parameter Region Restriction
Revisiting Analytic Set
26
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
When a restriction is added,
2. Parameter Region Restriction
Revisiting Analytic Set
27
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
When a restriction is added,
The deepest singularity is changed.
2. Parameter Region Restriction
Revisiting Analytic Set
28
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
When a restriction is added,
The deepest singularity is changed.
I.e.
2. Parameter Region Restriction
Revisiting Analytic Set
29
The lines are the set K(w)=0 in the parameter space
and the star is the “deepest” singularity.
When a restriction is added,
The deepest singularity is changed.
I.e.
The RLCT and the multiplicity
become different.
A simple theoretical example is in the Appendix.
2. Parameter Region Restriction
Recent Studies
Two recent studies of singular learning theory for
parameter-restricted models will be introduced.
• Non-negative matrix factorization (NMF)
‒ Based on our previous works:
https://doi.org/10.1016/j.neucom.2017.04.068 [10]
https://doi.org/10.1109/ssci.2017.8280811 [11]
https://doi.org/10.1016/j.neunet.2020.03.009 [12]
• Latent Dirichlet allocation (LDA)
‒ Based on our previous work:
https://doi.org/10.1007/s42979-020-0071-3 [13]
30
Outline
1.
2. Parameter Region Restriction
2. Non-negative Matrix Factorization
3.
31
2. Parameter Region Restriction
Non-negative matrix factorization
• NMF as a statistical model is formulated as follows.
‒ Data matrices: X^n = (X(1), …, X(n)); M × N × n
are i.i.d. and subject to q(X_ij) = Poi(X_ij | (A_0 B_0)_ij).
 True factorization: A_0: M × H_0, B_0: H_0 × N
‒ Set the model p(X_ij|A, B) = Poi(X_ij | (AB)_ij) and
the prior φ(A, B) = ∏_{i,k} Gam(A_ik|φ_A, θ_A) ∏_{k,j} Gam(B_kj|φ_B, θ_B).
 Learner factorization: A: M × H, B: H × N
32
(Graphical model: A and B generate X, inside a plate of size n.)
P(X, A, B) = P(X|A, B) P(A) P(B)
Poi(x|c) = c^x e^{−c} / x! (c > 0); δ(x) (c = 0)
Gam(a|φ, θ) = (θ^φ / Γ(φ)) a^{φ−1} e^{−θa}
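A minimal generative sketch of this Poisson–Gamma model; the sizes and hyperparameters below are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, H, n = 5, 4, 3, 100            # illustrative sizes
phi_A = theta_A = phi_B = theta_B = 1.0

# Prior: entrywise Gamma(phi, theta); numpy parameterizes scale = 1/theta.
A = rng.gamma(shape=phi_A, scale=1.0 / theta_A, size=(M, H))
B = rng.gamma(shape=phi_B, scale=1.0 / theta_B, size=(H, N))

# Model: each entry X_ij is Poisson with mean (AB)_ij, i.i.d. over n samples.
X = rng.poisson(lam=A @ B, size=(n, M, N))
print(X.shape)  # (100, 5, 4)
```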
2. Parameter Region Restriction
Non-negative matrix factorization
• NMF as a statistical model is formulated as follows.
‒ Data matrices: X^n = (X(1), …, X(n)); M × N × n
are i.i.d. and subject to q(X_ij) = Poi(X_ij | (A_0 B_0)_ij).
 True factorization: A_0: M × H_0, B_0: H_0 × N
‒ Set the model p(X_ij|A, B) = Poi(X_ij | (AB)_ij) and
the prior φ(A, B) = Gam(A|φ_A, θ_A) Gam(B|φ_B, θ_B).
 Learner factorization: A: M × H, B: H × N
33
(Graphical model: A and B generate X, inside a plate of size n.)
P(X, A, B) = P(X|A, B) P(A) P(B)
Poi(x|c) = c^x e^{−c} / x! (c > 0); δ(x) (c = 0)
Gam(a|φ, θ) = (θ^φ / Γ(φ)) a^{φ−1} e^{−θa}
Matrix X is factorized into a product of
two matrices.
(Figure: X (M × N) ≈ A (M × H) · B (H × N).)
[14] Kohjima. 2016
2. Parameter Region Restriction
RLCT of NMF
• The RLCT of NMF, λ, satisfies the following inequality:
λ ≤ (1/2) [ (H − H_0) min(Mφ_A, Nφ_B) + H_0 (M + N − 1) ].
The equality holds if H = H_0 = 1 or H_0 = 0.
• A tighter upper bound is also derived
if φ_A = φ_B = 1.
• This result gives a lower bound of
variational approximation error in NMF.
34
[15] Kohjima. 2017
[12] H. 2020-a
[11] H. and Watanabe 2017-b
[10] H. and Watanabe 2017-a
[12] H. 2020-a
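A small sketch that evaluates the upper bound above (the function name is ours); the resulting value bounds the leading term of E[G_n] ≈ λ/n.

```python
def nmf_rlct_upper_bound(M, N, H, H0, phi_A=1.0, phi_B=1.0):
    """Upper bound of the RLCT of NMF [10, 12]:
    (1/2) * ((H - H0) * min(M*phi_A, N*phi_B) + H0 * (M + N - 1)).
    Equality holds if H = H0 = 1 or H0 = 0."""
    assert H >= H0 >= 0
    return 0.5 * ((H - H0) * min(M * phi_A, N * phi_B) + H0 * (M + N - 1))

# E.g. M = N = 4, H = 2, H0 = 0 (exact, by the equality condition):
print(nmf_rlct_upper_bound(4, 4, 2, 0))  # 4.0
```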
Outline
1.
2. Parameter Region Restriction
3. Latent Dirichlet Allocation
3.
35
2. Parameter Region Restriction
Latent Dirichlet Allocation
• Situation:
‒ LDA treats a bag of words.
‒ Each document (=word list) has some latent topics.
 E.g. A mathematics paper is a document.
It is expected that there exist a “name” topic, a “math” topic, etc.
In the “name” topic, the appearance probabilities of mathematicians’
names may be high.
36
(Figure: a document has topics such as NAME {Riemann, Lebesgue,
Atiyah, Hironaka, …} and MATH {integral, measure, distribution,
singularity, …}, and each topic emits words.)
2. Parameter Region Restriction
Latent Dirichlet Allocation
• Documents 𝑧 𝑛 and words 𝑥 𝑛 are observable and
topics 𝑦 𝑛 are not.
• LDA assumes that words occur given documents.
37
(Figure: the same document–topic–word diagram, with the plate
model: x^n ∼ q(x|z) is estimated by p(x, y|z, w).)
2. Parameter Region Restriction
Latent Dirichlet Allocation
38
(Figure: the data(=word) generating process of LDA. Each of
Document 1, …, Document N is a sequence of (topic, word) pairs,
e.g. (NAME, Alice), (FOOD, sushi), (MATH, Riemann), (MATH, integral),
(FOOD, pudding), (NAME, Lebesgue), ….)
[13] H. and Watanabe 2020-b
2. Parameter Region Restriction
Latent Dirichlet Allocation
39
(Figure: the same generating process.)
A topic proportion b_j = (b_{1j}, …, b_{Hj}) corresponds to each
document (Topic 1, Topic 2, …, Topic H).
[13] H. and Watanabe 2020-b
2. Parameter Region Restriction
Latent Dirichlet Allocation
40
(Figure: the same generating process.)
A topic proportion b_j = (b_{1j}, …, b_{Hj}) corresponds to each document.
A word appearance probability a_k = (a_{1k}, …, a_{Mk}) corresponds
to each topic (Topic 1, Topic 2, …, Topic H).
[13] H. and Watanabe 2020-b
2. Parameter Region Restriction
Latent Dirichlet Allocation
• LDA is formulated as follows.
‒ One-hot-encoded words: x^n = (x(1), …, x(n)); M × n and
documents: z^n = (z(1), …, z(n)); N × n are i.i.d. and
generated from q(x|z) q(z).
‒ Set the model
p(x, y|z, A, B) = ∏_{j=1}^{N} ∏_{k=1}^{H} ( b_{kj} ∏_{i=1}^{M} a_{ik}^{x_i} )^{y_k z_j}
and the prior φ(A, B) = ∏_k Dir(a_k|φ_A) ∏_j Dir(b_j|φ_B).
 Latent topics: y^n = (y(1), …, y(n)); H × n.
 Stochastic matrices: A: M × H, B: H × N, Σ_i a_ik = 1, Σ_k b_kj = 1.
41
P(X, Y, A, B|Z) = P(X, Y|Z, A, B) P(A) P(B);
Dir(c|φ) = ( Γ(Σ_{k=1}^{H} φ_k) / ∏_{k=1}^{H} Γ(φ_k) ) ∏_{k=1}^{H} c_k^{φ_k − 1}, Σ_k c_k = 1.
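A minimal sampling sketch of this generative process for a single (document, topic, word) triple; the Dirichlet draws are only a convenient way to get illustrative column-stochastic A and B.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, H = 6, 3, 2                        # vocabulary, documents, topics

A = rng.dirichlet(np.ones(M), size=H).T  # A: M x H, columns a_k sum to 1
B = rng.dirichlet(np.ones(H), size=N).T  # B: H x N, columns b_j sum to 1

z = rng.integers(N)                      # observed document index
y = rng.choice(H, p=B[:, z])             # latent topic ~ topic proportion b_z
x = rng.choice(M, p=A[:, y])             # observed word ~ word distribution a_y
print(f"document {z} -> topic {y} -> word {x}")
```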
2. Parameter Region Restriction
Stochastic Matrix Factorization
• In NMF, consider formally replacing the non-negative
parameter matrices with stochastic matrices.
‒ A stochastic matrix is a non-negative matrix
whose columns each sum to 1.
Example:
[0.1 0.1 0.4 0]
[0.5 0.1 0.4 0]
[0.4 0.8 0.2 1]
• This replaced model is called stochastic matrix
factorization (SMF).
42
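A quick sketch of producing a column-stochastic matrix by normalizing a non-negative one; the matrix W below is chosen so that normalization recovers the example above.

```python
import numpy as np

# Columns of W scaled so that normalizing recovers the example above.
W = np.array([[0.2, 0.3, 2.0, 0.0],
              [1.0, 0.3, 2.0, 0.0],
              [0.8, 2.4, 1.0, 3.0]])

S = W / W.sum(axis=0, keepdims=True)  # divide each column by its column sum
print(S)              # [[0.1 0.1 0.4 0.] [0.5 0.1 0.4 0.] [0.4 0.8 0.2 1.]]
print(S.sum(axis=0))  # [1. 1. 1. 1.]
```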
2. Parameter Region Restriction
Equivalence of LDA and SMF
• Let K(w) = Σ_z Σ_x q(x|z) q(z) log( q(x|z) / p(x|z, A, B) )
and H(w) = ‖AB − A_0 B_0‖², where q(x|z) is
Σ_y p(x, y|z, A_0, B_0).
• It can be proved that K(w) ∼ H(w), i.e. the RLCT
of LDA is equal to that of SMF.
43
[13] H. 2020-b
2. Parameter Region Restriction
Equivalence of LDA and SMF
• Let K(w) = Σ_z Σ_x q(x|z) q(z) log( q(x|z) / p(x|z, A, B) )
and H(w) = ‖AB − A_0 B_0‖², where q(x|z) is
Σ_y p(x, y|z, A_0, B_0).
• It can be proved that K(w) ∼ H(w), i.e. the RLCT
of LDA is equal to that of SMF.
44
Thus, we only have to consider SMF
to determine 𝜆 and 𝑚 of LDA.
[13] H. 2020-b
2. Parameter Region Restriction
RLCT of LDA = RLCT of SMF
• If the prior is positive and bounded, the RLCT of LDA, λ,
satisfies the following inequality:
λ ≤ (1/2) [ (H − H_0) min(M_1, N) + (H_0 − 1)(M_1 + N − 2) + M_1 ],
where M_1 = M − 1.
The equality holds if H = H_0 = 1, 2.
• Also, if H = 2 and H_0 = 1,
λ = (1/2) ( max(M, N) + M − 2 ).
45
[13] H. 2020-b
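As with NMF, a small sketch evaluating the bound above (the function name is ours):

```python
def lda_rlct_upper_bound(M, N, H, H0):
    """Upper bound of the RLCT of LDA (= RLCT of SMF) [13], M1 = M - 1:
    (1/2) * ((H - H0)*min(M1, N) + (H0 - 1)*(M1 + N - 2) + M1)."""
    M1 = M - 1
    return 0.5 * ((H - H0) * min(M1, N) + (H0 - 1) * (M1 + N - 2) + M1)

# The bound is exact when H = H0 = 1 or H = H0 = 2:
print(lda_rlct_upper_bound(M=10, N=5, H=2, H0=2))  # 10.5
```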
Outline
1.
2. Parameter Region Restriction
4. Effect of Restriction
3.
46
2. Parameter Region Restriction
Effect of Restriction
• The parameter region of NMF is the set of non-negative matrices.
• The parameter region of SMF is the set of stochastic matrices.
• If the parameter region is the set of real matrices, then the
model is called matrix factorization (MF).
‒ In the non-restricted case, Aoyagi clarified the exact
values of λ and m.
• How do they differ?
47
[6] Aoyagi. 2005
2. Parameter Region Restriction
Effect of Restriction
• Because of the restriction, the rank of a matrix has a
different meaning.
‒ In NMF, the minimal inner dimension of the true matrix
factorization is NOT the rank but the non-negative rank.
‒ The boundary of the parameter region causes the usual
rank to differ from the non-negative rank.
‒ Since SMF is a restricted NMF, its minimal factorizations
are also affected.
48
[16] Cohen, et. al. 1993
2. Parameter Region Restriction
Effect of Restriction
• Because of the restriction, the rank of a matrix has a
different meaning.
‒ In NMF, the minimal inner dimension of the true matrix
factorization is NOT the rank but the non-negative rank.
‒ The boundary of the parameter region causes the usual
rank to differ from the non-negative rank.
‒ Since SMF is a restricted NMF, its minimal factorizations
are also affected.
49
H_0 = 0 in NMF and H_0 = 1 in SMF
are such cases.
In fact, the RLCTs are not equal to
those in the non-restricted case.
[16] Cohen, et. al. 1993
2. Parameter Region Restriction
Effect of Restriction
• In general, narrowing the parameter region
increases RLCTs; the generalization error increases.
‒ NMF is such a case. The difference of the RLCTs gives the effect.
50
2. Parameter Region Restriction
Effect of Restriction
• In general, narrowing the parameter region
increases RLCTs; the generalization error increases.
‒ NMF is such a case. The difference of the RLCTs shows the effect.
• However, a restriction that also decreases the
parameter dimension does not increase them.
‒ SMF is such a case because the simplex constraint obviously
decreases the dimension.
51
[0.1 0.1 0.4 0]
[0.5 0.1 0.4 0]
[0.4 0.8 0.2 1]
a_{Mk} = 1 − Σ_{i=1}^{M−1} a_{ik}
2. Parameter Region Restriction
Effect of Restriction
• In general, narrowing the parameter region
increases RLCTs; the generalization error increases.
‒ NMF is such a case. The difference of the RLCTs gives the effect.
• However, a restriction that also decreases the
parameter dimension does not increase them.
‒ SMF is such a case because the simplex constraint obviously
decreases the dimension.
52
[0.1 0.1 0.4 0]
[0.5 0.1 0.4 0]
[0.4 0.8 0.2 1]
a_{Mk} = 1 − Σ_{i=1}^{M−1} a_{ik}
One piece of future work is a more
precise evaluation of the effect
of parameter region restriction.
Outline
1.
2.
3. Summary
53
3. Summary
• Singular learning theory provides a general theory
for determining the behavior of the Bayesian
generalization error and the marginal likelihood by using
algebraic geometry, even if the model is singular.
• In the above theory, as a foundation for clarifying
the effect of parameter region restriction, we derive
upper bounds of the RLCTs for typical restricted
models.
54
References
[1] Watanabe S. Algebraic geometrical methods for hierarchical learning
machines. Neural Netw. 2001;13(4):1049–60.
[2] Hironaka H. Resolution of singularities of an algebraic variety over a field of
characteristic zero. Ann Math. 1964;79:109–326.
[3] Nagata K, Watanabe S. Asymptotic behavior of exchange ratio in exchange
Monte Carlo method. Neural Netw. 2008;21(7):980–8.
[4] Drton M, Plummer M. A Bayesian information criterion for singular models. J
R Stat Soc B. 2017;79:323–80 with discussion.
[5] Yamazaki K, Watanabe S. Singularities in mixture models and upper bounds
of stochastic complexity. Neural Netw. 2003;16(7):1029–38.
[6] Aoyagi M, Watanabe S. Stochastic complexities of reduced rank regression in
Bayesian estimation. Neural Netw. 2005;18(7):924–33.
[7] Rusakov D, Geiger D. Asymptotic model selection for naive Bayesian
networks. J Mach Learn Res. 2005;6(Jan):1–35.
[8] Zwiernik P. An asymptotic behaviour of the marginal likelihood for general
Markov models. J Mach Learn Res. 2011;12(Nov):3283–310.
[9] Imai T. Estimating real log canonical threshold. arXiv:1906.01341. 1–28.
55
References
[10] H N, Watanabe S. Upper bound of Bayesian generalization error in non-negative
matrix factorization. Neurocomputing. 2017;266C(29 November):21–8.
[11] H N, Watanabe S. Tighter upper bound of real log canonical
threshold of non-negative matrix factorization and its application to Bayesian
inference. In: IEEE Symposium Series on Computational Intelligence (IEEE SSCI);
2017. pp. 718–725.
[12] H N. Variational approximation error in non-negative matrix factorization. Neural
Netw. 2020;126(June):65–75.
[13] H N, Watanabe S. Asymptotic Bayesian generalization error in
latent Dirichlet allocation and stochastic matrix factorization. SN Computer
Science. 2020;1(2):1–22.
[14] Kohjima M, Matsubayashi T, Sawada H. Multiple data analysis and non-negative
matrix/tensor factorization [I]: multiple data analysis and its advances. IEICE Transaction.
2016;99(6):543–550. In Japanese.
[15] Kohjima M, Watanabe S. Phase transition structure of variational
Bayesian nonnegative matrix factorization. In: International Conference on
Artificial Neural Networks (ICANN); 2017. pp. 146–154.
[16] Cohen JE, Rothblum UG. Nonnegative ranks, decompositions, and factorizations
of nonnegative matrices. Linear Algebra Appl. 1993;190:149–168.
56
Q&A
57
Question1
How much is the difference
Q. How much is the difference of the RLCTs between
the restricted case (our result) and the non-restricted
case (Aoyagi’s result)?
A. The effect of restriction occurs at the boundary of
the parameter space.
I cannot give the exact value of the difference in
all cases (I haven’t memorized Aoyagi’s result……);
however, in the cases where our result clarifies the exact value
of the RLCT in the restricted case, the difference between
the RLCTs is the exact answer.
58
Question1
How much is the difference
Q. How much is the difference of the RLCTs between the
restricted case (our result) and the non-restricted case
(Aoyagi’s result)?
From Aoyagi’s result, in the case H_0 = 0 in NMF (same
dimension and boundary case), the difference is as follows.
diff = λ_NMF − λ_MF = (1/2) H min(M, N) − λ_MF.
Note: Aoyagi’s result assumes the prior is positive and
bounded; thus, the hyperparameters in NMF are set to 1 in
the above.
Note 2: λ_MF changes with the conditions on the matrix sizes.
The next slide describes them.
59
Question1
How much is the difference
In the case H_0 = 0, the RLCT in Aoyagi’s result is as
follows. Assume min{M, N} >= 2 and min{M, N} >= H >= 1.
If N < M + H and M < N + H and H < M + N and M + H + N is even,
λ_MF = (1/8) [ 2H(M + N) − (M − N)² − H² ],
Else if N < M + H and M < N + H and H < M + N and M + H + N is
odd, λ_MF = (1/8) [ 2H(M + N) − (M − N)² − H² + 1 ],
Else if M + H < N (⇒ M < N), λ_MF = (1/2) MH,
Else if N + H < M (⇒ N < M), λ_MF = (1/2) NH.
60
Question1
How much is the difference
In the case H_0 = 0, the difference in the question is as follows.
Assume min{M, N} >= 2 and min{M, N} >= H >= 1.
If N < M + H and M < N + H and H < M + N and M + H + N is even,
diff = (1/8) [ 4H min(M, N) − 2H(M + N) + (M − N)² + H² ],
Else if N < M + H and M < N + H and H < M + N and M + H + N is odd,
diff = (1/8) [ 4H min(M, N) − 2H(M + N) + (M − N)² + H² − 1 ],
Else if M + H < N (⇒ M < N), diff = 0,
Else if N + H < M (⇒ N < M), diff = 0.
61
E.g. If M = N = 4 and H = 2, the difference is
equal to
(1/8)·(4·2·4 − 2·2·8 + 0 + 4) = 1/2
by using the first case.
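A sketch encoding the case split of the two previous slides and checking the worked example (assuming min{M, N} ≥ 2, min{M, N} ≥ H ≥ 1, H_0 = 0, and φ_A = φ_B = 1):

```python
def rlct_mf(M, N, H):
    """lambda_MF for H0 = 0, following Aoyagi's case split [6]."""
    assert min(M, N) >= 2 and min(M, N) >= H >= 1
    if M + H < N:
        return M * H / 2
    if N + H < M:
        return N * H / 2
    lam = (2 * H * (M + N) - (M - N) ** 2 - H ** 2) / 8
    return lam if (M + H + N) % 2 == 0 else lam + 1 / 8

def rlct_nmf_h0_zero(M, N, H):
    """lambda_NMF for H0 = 0 with phi_A = phi_B = 1 (an exact case)."""
    return H * min(M, N) / 2

M, N, H = 4, 4, 2
print(rlct_nmf_h0_zero(M, N, H) - rlct_mf(M, N, H))  # 0.5, as above
```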
Question2
The exact value of the RLCT of LDA
Q. In the inequality that shows an upper bound of the
RLCT of LDA, the equality holds if H = H_0 = 1 or 2.
However, the exact value in the case of H = 2 and H_0 =
1 is also found. What does this mean?
A. When H = H_0 = 1 or 2, the upper bound is equal to
the exact value of the RLCT. If H = 2 and H_0 = 1, the
upper bound is strictly larger than the exact value, but
we can find the exact value by using a method
different from the one used to derive the upper bound
and its equality condition.
62
Appendix
63
Appendix
Simple Example
• Let K(w) = (ab + cd)² and φ(w) ≡ 1,
where w ∈ [−1, 1]⁴ =: W.
• Using the blow-up (a, b, c, d) = (a, bd, c, d), we have
K(w) = d²(ab + c)², and its Jacobian is |d|.
• Besides, applying the linear transformation
(a, b, c, d) → (a, b, c + ab, d), K(w) = c²d².
64
Appendix
Simple Example
• Then, the zeta function is
∫_{−1}^{1} ∫_{−1}^{1} ∫_{ab−1}^{ab+1} ∫_{−1}^{1} c^{2z} d^{2z} |d| dw.
• For any −1 ≤ a, b ≤ 1, we can take a neighborhood (nbhd) of
c = 0 as an open set.
• The RLCT does not change when we change the integration
interval from [ab − 1, ab + 1] to [−ε, ε] for any ε > 0.
• Thus, we consider
∫_{−1}^{1} ∫_{−1}^{1} ∫_{−ε}^{ε} ∫_{−1}^{1} |c|^{2z} |d|^{2z+1} da db dc dd ∝ 1 / [ (z + 1)(2z + 1) ]
and obtain λ = 1/2, m = 1.
65
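A tiny sympy check: starting from the factored closed form above (up to a positive constant), read λ and m off the maximum pole.

```python
import sympy as sp

z = sp.symbols('z')
# Up to positive constants: int |c|^(2z) dc over (-eps, eps) gives 1/(2z+1),
# and int |d|^(2z+1) dd over (-1, 1) gives 1/(2z+2) = 1/(2(z+1)).
zeta = 1 / ((2 * z + 1) * (z + 1))

poles = sp.roots(sp.denom(zeta), z)  # {-1/2: 1, -1: 1}
z_max = max(poles)                   # maximum pole: z = -1/2
print(-z_max, poles[z_max])          # lambda = 1/2, m = 1
```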
Appendix
Simple Example
• Then, the zeta function is
∫_{−1}^{1} ∫_{−1}^{1} ∫_{ab−1}^{ab+1} ∫_{−1}^{1} c^{2z} d^{2z} |d| dw.
• For any −1 ≤ a, b ≤ 1, we can take a neighborhood (nbhd) of
c = 0 as an open set.
• The RLCT does not change when we change the integration
interval from [ab − 1, ab + 1] to [−ε, ε] for any ε > 0.
• Thus, we consider
∫_{−1}^{1} ∫_{−1}^{1} ∫_{−ε}^{ε} ∫_{−1}^{1} |c|^{2z} |d|^{2z+1} da db dc dd ∝ 1 / [ (z + 1)(2z + 1) ]
and obtain λ = 1/2, m = 1.
66
What happens when w is non-negative?
Appendix
Simple Example
• Let K(w) = (ab + cd)² and φ(w) ≡ 1,
where w ∈ [0, 1]⁴ =: W. (Non-negative case)
• Using the blow-up (a, b, c, d) = (a, bd, c, d), we have
K(w) = d²(ab + c)², and its Jacobian is |d|.
• Besides, applying the linear transformation
(a, b, c, d) → (a, b, c + ab, d), K(w) = c²d².
• Then, the zeta function is
∫_{0}^{1} ∫_{0}^{1} ∫_{ab}^{ab+1} ∫_{0}^{1} c^{2z} d^{2z} |d| dw.
67
Appendix
Simple Example
• (The same four steps as on the previous slide.)
68
It looks the same as the previous slide.
However, …
Appendix
Simple Example
∫_{0}^{1} ∫_{0}^{1} ∫_{ab}^{ab+1} ∫_{0}^{1} c^{2z} d^{2z} |d| dw
• Because of non-negativity, we cannot take any
nbhd of c = 0 as an open set.
‒ There is no ε > 0 such that the RLCT does not change
when we change the integration interval from
[ab, ab + 1] to [0, ε].
• We must consider another method (not directly
calculating the zeta function) to determine λ and m.
69
Appendix
Simple Example
In this case, we can obtain their exact values
by using the following “equivalence lemma”.
70
• Let K: W → ℝ and H: W → ℝ be non-negative and
analytic functions.
• If there exist positive constants c_1, c_2 such that
c_1 K(w) ≤ H(w) ≤ c_2 K(w),
then their RLCTs and their multiplicities are the same:
λ_K = λ_H, m_K = m_H.
The equivalence relation K ∼ H is defined by: the RLCTs
and the multiplicities of K and H coincide.
Appendix
Simple Example
Revisiting the setting: K(w) = (ab + cd)², φ(w) ≡ 1,
where w = (a, b, c, d) ∈ [0, 1]⁴ =: W.
• 2(a²b² + c²d²) − (ab + cd)² = (ab − cd)² ≥ 0.
• (ab + cd)² − (a²b² + c²d²) = 2abcd ≥ 0.
• Thus, the following holds:
a²b² + c²d² ≤ K(w) ≤ 2(a²b² + c²d²),
i.e. the RLCT is the same as that of a²b² + c²d².
• By a simple calculation, we obtain λ = 1, m = 2.
71
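A quick numerical check of the sandwich bounds above on random points of W = [0, 1]⁴:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = rng.uniform(0, 1, size=(4, 100_000))

K = (a * b + c * d) ** 2
L = a**2 * b**2 + c**2 * d**2

# a^2 b^2 + c^2 d^2 <= K(w) <= 2 (a^2 b^2 + c^2 d^2) on the non-negative cube
print(np.all(L <= K) and np.all(K <= 2 * L))  # True
```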
Appendix
Simple Example
Revisiting the setting: K(w) = (ab + cd)², φ(w) ≡ 1,
where w = (a, b, c, d) ∈ [0, 1]⁴ =: W.
• 2(a²b² + c²d²) − (ab + cd)² = (ab − cd)² ≥ 0.
• (ab + cd)² − (a²b² + c²d²) = 2abcd ≥ 0.
• Thus, the following holds:
a²b² + c²d² ≤ K(w) ≤ 2(a²b² + c²d²),
i.e. the RLCT is the same as that of a²b² + c²d².
• By a simple calculation, we obtain λ = 1, m = 2.
72
They are different from the non-restricted case (λ = 1/2, m = 1).

Weitere ähnliche Inhalte

Was ist angesagt?

. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
butest
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learning
butest
 
Figure 1
Figure 1Figure 1
Figure 1
butest
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464
IJRAT
 

Was ist angesagt? (20)

. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learning
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
STUDY OF Ε-SMOOTH SUPPORT VECTOR REGRESSION AND COMPARISON WITH Ε- SUPPORT ...
STUDY OF Ε-SMOOTH SUPPORT VECTOR  REGRESSION AND COMPARISON WITH Ε- SUPPORT  ...STUDY OF Ε-SMOOTH SUPPORT VECTOR  REGRESSION AND COMPARISON WITH Ε- SUPPORT  ...
STUDY OF Ε-SMOOTH SUPPORT VECTOR REGRESSION AND COMPARISON WITH Ε- SUPPORT ...
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
Bayesian Core: Chapter 6
Bayesian Core: Chapter 6Bayesian Core: Chapter 6
Bayesian Core: Chapter 6
 
Polikar10missing
Polikar10missingPolikar10missing
Polikar10missing
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 
Figure 1
Figure 1Figure 1
Figure 1
 
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix Factorization
 
ppt0320defenseday
ppt0320defensedayppt0320defenseday
ppt0320defenseday
 
Handling missing data with expectation maximization algorithm
Handling missing data with expectation maximization algorithmHandling missing data with expectation maximization algorithm
Handling missing data with expectation maximization algorithm
 
FUZZY DIAGONAL OPTIMAL ALGORITHM TO SOLVE INTUITIONISTIC FUZZY ASSIGNMENT PRO...
FUZZY DIAGONAL OPTIMAL ALGORITHM TO SOLVE INTUITIONISTIC FUZZY ASSIGNMENT PRO...FUZZY DIAGONAL OPTIMAL ALGORITHM TO SOLVE INTUITIONISTIC FUZZY ASSIGNMENT PRO...
FUZZY DIAGONAL OPTIMAL ALGORITHM TO SOLVE INTUITIONISTIC FUZZY ASSIGNMENT PRO...
 
2008 spie gmm
2008 spie gmm2008 spie gmm
2008 spie gmm
 
Bq25399403
Bq25399403Bq25399403
Bq25399403
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464
 
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
 

Ähnlich wie Bayesian Generalization Error and Real Log Canonical Threshold in Non-negative Matrix Factorization and Latent Dirichlet Allocation

Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
Miaolan Xie
 

Ähnlich wie Bayesian Generalization Error and Real Log Canonical Threshold in Non-negative Matrix Factorization and Latent Dirichlet Allocation (20)

Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
Presentation
PresentationPresentation
Presentation
 
nber_slides.pdf
nber_slides.pdfnber_slides.pdf
nber_slides.pdf
 
Webinar on Graph Neural Networks
Webinar on Graph Neural NetworksWebinar on Graph Neural Networks
Webinar on Graph Neural Networks
 
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
 
[Steven h. weintraub]_jordan_canonical_form_appli(book_fi.org)
[Steven h. weintraub]_jordan_canonical_form_appli(book_fi.org)[Steven h. weintraub]_jordan_canonical_form_appli(book_fi.org)
[Steven h. weintraub]_jordan_canonical_form_appli(book_fi.org)
 
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
 
Av 738- Adaptive Filtering - Background Material
Av 738- Adaptive Filtering - Background MaterialAv 738- Adaptive Filtering - Background Material
Av 738- Adaptive Filtering - Background Material
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
 
A simple method to find a robust output feedback controller by random search ...
A simple method to find a robust output feedback controller by random search ...A simple method to find a robust output feedback controller by random search ...
A simple method to find a robust output feedback controller by random search ...
 
Learning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaLearning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat Minima
 
Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02
 
Enm fy17nano qsar
Enm fy17nano qsarEnm fy17nano qsar
Enm fy17nano qsar
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
 
Lesson 26
Lesson 26Lesson 26
Lesson 26
 
AI Lesson 26
AI Lesson 26AI Lesson 26
AI Lesson 26
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural Networks
 

Mehr von Naoki Hayashi

Mehr von Naoki Hayashi (19)

【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
 
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
 
ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介
 
ベイズ統計学の概論的紹介-old
ベイズ統計学の概論的紹介-oldベイズ統計学の概論的紹介-old
ベイズ統計学の概論的紹介-old
 
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
 
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
 
201803NC
201803NC201803NC
201803NC
 
201703NC
201703NC201703NC
201703NC
 
201709ibisml
201709ibisml201709ibisml
201709ibisml
 
すずかけはいいぞ
すずかけはいいぞすずかけはいいぞ
すずかけはいいぞ
 
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
 
RPG世界の形状及び距離の幾何学的考察(rogyconf61)
RPG世界の形状及び距離の幾何学的考察(rogyconf61)RPG世界の形状及び距離の幾何学的考察(rogyconf61)
RPG世界の形状及び距離の幾何学的考察(rogyconf61)
 
Rogyゼミ7thスライドpublic
Rogyゼミ7thスライドpublicRogyゼミ7thスライドpublic
Rogyゼミ7thスライドpublic
 
Rogyゼミスライド6th
Rogyゼミスライド6thRogyゼミスライド6th
Rogyゼミスライド6th
 
Rogy目覚まし(仮)+おまけ
Rogy目覚まし(仮)+おまけRogy目覚まし(仮)+おまけ
Rogy目覚まし(仮)+おまけ
 
ぼくのつくったこうだいさいてんじぶつ
ぼくのつくったこうだいさいてんじぶつぼくのつくったこうだいさいてんじぶつ
ぼくのつくったこうだいさいてんじぶつ
 
情報統計力学のすすめ
情報統計力学のすすめ情報統計力学のすすめ
情報統計力学のすすめ
 
Rogyゼミ2014 10
Rogyゼミ2014 10Rogyゼミ2014 10
Rogyゼミ2014 10
 
Rogyzemi
RogyzemiRogyzemi
Rogyzemi
 

Kürzlich hochgeladen

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 

Kürzlich hochgeladen (20)

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 

Bayesian Generalization Error and Real Log Canonical Threshold in Non-negative Matrix Factorization and Latent Dirichlet Allocation

  • 1. Bayesian Generalization Error and Real Log Canonical Threshold in Non-negative Matrix Factorization and Latent Dirichlet Allocation NAOKI HAYASHI (1,2) 2020/06/25 (1) NTT DATA MATHEMATICAL SYSTEMS INC. SIMULATION & MINING DIVISION (2) TOKYO INSTITUTE OF TECHNOLOGY SUMIO WATANABE LABORATORY 1
  • 2. Symbol Notations 𝑞 𝑥 : the true (i.e. data generating) distribution 𝑝 𝑥|𝑤 : statistical model given parameter w 𝜑 𝑤 : prior distribution 𝑋 𝑛 = 𝑋1, … , 𝑋 𝑛 : i.i.d. sample (r.v.) from 𝑞 𝑥 𝑃 𝑋 𝑛|𝑤 : likelihood 𝜓 𝑤 𝑋 𝑛 : posterior distribution 𝑍 𝑋 𝑛 : marginal likelihood (a.k.a. evidence) 𝑝∗ 𝑥 : predictive distribution Note: 𝑃 𝑋 𝑛|𝑤 , 𝜓 𝑤 𝑋 𝑛 , 𝑍 𝑋 𝑛 , 𝑝∗ 𝑥 depend on 𝑋 𝑛 thus, they are random variables in function spaces. 2
  • 3. Outline 1. Singular Learning Theory 2. Parameter Region Restriction 3. Summary 3
  • 5. 1. Singular Learning Theory Problem Setting • Important random variables are the generalization error 𝐺 𝑛 and the marginal likelihood 𝑍 𝑛 = 𝑍 𝑋 𝑛 . ‒ 𝐺 𝑛 = 𝑞 𝑥 log 𝑞 𝑥 𝑝∗ 𝑥 d𝑥.  It represents how different between the true and the predictive in the sense of a new data generating process. ‒ 𝑍 𝑛 = 𝑖=1 𝑛 𝑝 𝑋𝑖 𝑤 𝜑 𝑤 d𝑤 .  It represents how similar the true to the model in the sense of the dataset generating process. 5
  • 6. 1. Singular Learning Theory Problem Setting • Important random variables are the generalization error 𝐺 𝑛 and the marginal likelihood 𝑍 𝑛 = 𝑍 𝑋 𝑛 . ‒ 𝐺 𝑛 = 𝑞 𝑥 log 𝑞 𝑥 𝑝∗ 𝑥 d𝑥.  It represents how different between the true and the predictive in the sense of a new data generating process. ‒ 𝑍 𝑛 = 𝑖=1 𝑛 𝑝 𝑋𝑖 𝑤 𝜑 𝑤 d𝑤 .  It represents how different between the true and the model in the sense of the dataset generating process. • How do they behave? 6
  • 7. 1. Singular Learning Theory Regular Case • From regular learning theory, if the posterior can be approximated by a normal dist., the followings hold: ‒ 𝔼 𝐺 𝑛 = 𝑑 2𝑛 + 𝑜 1 𝑛 , ‒ − log 𝑍 𝑛 = 𝑛𝑆 𝑛 + 𝑑 2 log 𝑛 + 𝑂𝑝 1 , where 𝑑 is the dim. of params. and 𝑆 𝑛 is the empirical entropy. • AIC and BIC are based on regular learning theory. 7
  • 8. 1. Singular Learning Theory Regular Case • From regular learning theory, if the posterior can be approximated by a normal dist., the followings hold: ‒ 𝔼 𝐺 𝑛 = 𝑑 2𝑛 + 𝑜 1 𝑛 , ‒ − log 𝑍 𝑛 = 𝑛𝑆 𝑛 + 𝑑 2 log 𝑛 + 𝑂𝑝 1 , where 𝑑 is the dim. of params. and 𝑆 𝑛 is the empirical entropy. • AIC and BIC are based on regular learning theory. 8 How about singular cases? (singular = non-regular)
  • 9. • Hierarchical models and latent variable models are typical singular models. • Their likelihood and posterior cannot be approximated by any normal dist. ‒ Simple example: the log likelihood is −𝑏2 𝑏 − 𝑎3 2 in w=(a, b) space. 9 1. Singular Learning Theory Singular Case
  • 10. • Hierarchical models and latent variable models are typical singular models. • Their likelihood and posterior cannot be approximated by any normal dist. ‒ Simple example: the log likelihood is −𝑏2 𝑏 − 𝑎3 2 in w=(a, b) space. 10 1. Singular Learning Theory Singular Case Regular learning theory cannot clarify the behavior of their generalization errors and marginal likelihoods.
  • 11. 1. Singular Learning Theory Singular Case • Singular learning theory provides a general theory for the above issue. • Suppose some technical assumption, the followings hold, even if the posterior cannot be approximated by any normal dist.: ‒ 𝔼 𝐺 𝑛 = 𝜆 𝑛 − 𝑚−1 𝑛 log 𝑛 + 𝑜 1 𝑛 log 𝑛 , ‒ − log 𝑍 𝑛 = 𝑛𝑆 𝑛 + 𝜆 log 𝑛 − 𝑚 − 1 log log 𝑛 + 𝑂𝑝 1 , where 𝜆, 𝑚 are constants which depend on 𝑝 𝑥 𝑤 , 𝜑 𝑤 , and 𝑞 𝑥 . 11 [1] Watanabe. 2001
  • 12. 1. Singular Learning Theory Singular Case • Singular learning theory provides a general theory for the above issue. • Suppose some technical assumption, the followings hold, even if the posterior cannot be approximated by any normal dist.: ‒ 𝔼 𝐺 𝑛 = 𝜆 𝑛 − 𝑚−1 𝑛 log 𝑛 + 𝑜 1 𝑛 log 𝑛 , ‒ − log 𝑍 𝑛 = 𝑛𝑆 𝑛 + 𝜆 log 𝑛 − 𝑚 − 1 log log 𝑛 + 𝑂𝑝 1 , where 𝜆, 𝑚 are constants which depend on 𝑝 𝑥 𝑤 , 𝜑 𝑤 , and 𝑞 𝑥 . 12 What are these constants? [1] Watanabe. 2001
  • 13. 1. Singular Learning Theory Invariants in Algebraic Geometry • Def. A real log canonical threshold (RLCT) is defined by a negative maximum pole of a zeta function 𝜁 𝑧 = 𝑊 𝐾 𝑤 𝑧 𝑏 𝑤 d𝑤 , ‒ where K(w) and b(w) are non-negative and analytic. • Thm. Put 𝐾 𝑤 = KL 𝑞||𝑝 and 𝑏 𝑤 = 𝜑 𝑤 , then the RLCT is the learning coefficient 𝜆 and the order of the maximum pole is the multiplicity 𝑚. 13 This is an important result in singular learning theory. [1] Watanabe. 2001
  • 14. 1. Singular Learning Theory Invariants in Algebraic Geometry 14 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity.
  • 15. 1. Singular Learning Theory Invariants in Algebraic Geometry 15 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity. Corresponding to the maximum pole of the zeta function 𝜁 𝑧 = 𝐶 𝑧 + 𝜆 𝑚 + ⋯ 𝐎𝐗 𝐗 𝐗 𝒛 = −𝝀 ℂ
  • 16. 1. Singular Learning Theory Invariants in Algebraic Geometry • Properties of 𝜆 and 𝑚 are as follows: ‒ 𝜆 is a positive rational # and 𝑚 is a positive integer. ‒ They are birational invariants.  We can determine them using blowing-ups, mathematically supported by Hironaka’s Singularity Resolution Theorem. • Application of 𝜆 and 𝑚 are as follows: ‒ Nagata showed that an exchange probability in replica MCMC is represented by using 𝜆. ‒ Drton proposed sBIC which approximates log 𝑍 𝑛 by using the RLCTs and the multiplicities of candidates 𝑝𝑖, 𝜑, 𝑞 = 𝑝𝑗 : 𝑠𝐵𝐼𝐶𝑖𝑗 "=" loglike 𝑤MLE − 𝜆𝑖𝑗 log 𝑛 + 𝑚𝑖𝑗 − 1 log log 𝑛 . 16 [3] Nagata. 2008 [2] Hironaka. 1964 [4] Drton. 2017-a
  • 17. 1. Singular Learning Theory Invariants in Algebraic Geometry > “We can determine them using blowing-ups” In fact, statistician and machine learning researchers have studied RLCTs for concreate models: Besides, Imai has proposed a consistent estimator of an RLCT ([9] Imai. 2019). He’ll talk about it tomorrow at this conf. 17 Singular model Author and year Gaussian mixture model Yamazaki et. al. in 2003 [5] Reduced rank regression = MF Aoyagi et. al. in 2005 [6] Naïve Bayesian network Rusakov et. al. in 2005 [7] Markov model Zwiernik in 2011 [8] … …
  • 18. Outline 1. 2. Parameter Region Restriction 1. Introduction 2. Non-negative Matrix Factorization 3. Latent Dirichlet Allocation 4. Effect of Restriction 3. 18
  • 19. Outline 1. 2. Parameter Region Restriction 1. Introduction 3. 19
  • 20. 2. Parameter Region Restriction Motivation • Sometimes, parameter regions are often restricted because of interpretability. 1. Non-negative restriction 2. Simplex restriction, etc. 20 Coefficients Coefficients Non-negative restriction Legend ・TVCM ・DM ・Rating ・Reviews E.g. Logistic regression of purchase existence for a product. [14] Kohjima. 2016
  • 21. 2. Parameter Region Restriction Motivation • Sometimes, parameter regions are often restricted because of interpretability. 1. Non-negative restriction 2. Simplex restriction, etc. 21 Coefficients Coefficients Non-negative restriction Legend ・TVCM ・DM ・Rating ・Reviews E.g. Logistic regression of purchase existence for a product. [14] Kohjima. 2016 What happens when restrictions are added? How does the generalization error change?
  • 22. 2. Parameter Region Restriction Motivation • A future application of clarifying the effect of restriction to generalization is as follows. 22 CustomerStatistician
  • 23. 2. Parameter Region Restriction Motivation • A future application of clarifying the effect of restriction to generalization is as follows. 23 CustomerStatistician We want to know what happens when the contributions of explanatories are restricted to non-negative. We need high prediction accuracy and an interpretable model.
  • 24. 2. Parameter Region Restriction Motivation • A future application of clarifying the effect of restriction to generalization is as follows. 24 CustomerStatistician We want to know what happens when the contributions of explanatories are restricted to non-negative. We need high prediction accuracy and an interpretable model. We can answer it. If the parameter is restricted to non- negative, the prediction performance is reduced by Foo points when n = Bar. To achieve the needed accuracy, we recommend increasing n to Bar+Baz.
  • 25. 2. Parameter Region Restriction Revisiting Analytic Set 25 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity. Corresponding to the maximum pole of the zeta function 𝜁 𝑧 = 𝐶 𝑧 + 𝜆 𝑚 + ⋯
  • 26. 2. Parameter Region Restriction Revisiting Analytic Set 26 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity. When a restriction is added,
  • 27. 2. Parameter Region Restriction Revisiting Analytic Set 27 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity. When a restriction is added, The deepest singularity is changed.
  • 28. 2. Parameter Region Restriction Revisiting Analytic Set 28 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity. When a restriction is added, The deepest singularity is changed. I.e.
  • 29. 2. Parameter Region Restriction Revisiting Analytic Set 29 The lines are the set K(w)=0 in the parameter space and the star is the “deepest” singularity. When a restriction is added, The deepest singularity is changed. I.e. The RLCT and the multiplicity become different. A theoretical simple example is in Appendix.
  • 30. 2. Parameter Region Restriction Recent Studies Two recent studies of singular learning theory for parameter restricted models will be introduced. • Non-negative matrix factorization (NMF) ‒ Based on our previous works: https://doi.org/10.1016/j.neucom.2017.04.068 [10] https://doi.org/10.1109/ssci.2017.8280811 [11] https://doi.org/10.1016/j.neunet.2020.03.009 [12] • Latent Dirichlet allocation (LDA) ‒ Based on our previous work: https://doi.org/10.1007/s42979-020-0071-3 [13] 30
  • 31. Outline 1. 2. Parameter Region Restriction 2. Non-negative Matrix Factorization 3. 31
  • 32. 2. Parameter Region Restriction Non-negative matrix factorization • NMF as a statistical model is formulized as follows. ‒ Data matrices: 𝑋 𝑛 = 𝑋 1 , … , 𝑋 𝑛 ; 𝑀 × 𝑁 × 𝑛 are i.i.d. and subject to 𝑞 𝑋𝑖𝑗 = Poi 𝑋𝑖𝑗| 𝐴0 𝐵0 𝑖𝑗 .  True factorization: 𝐴; 𝑀 × 𝐻0, 𝐵; 𝐻0 × 𝑁 ‒ Set the model 𝑝 𝑋𝑖𝑗|𝐴, 𝐵 = Poi 𝑋𝑖𝑗| 𝐴𝐵 𝑖𝑗 and the prior 𝜑 𝐴, 𝐵 = Gam 𝐴𝑖𝑘|𝜙 𝐴, 𝜃 𝐴 Gam 𝐵 𝑘𝑗|𝜙 𝐵, 𝜃 𝐵 .  Learner factorization: 𝐴; 𝑀 × 𝐻, 𝐵; 𝐻 × 𝑁 32 n X A B 𝑃 𝑋, 𝐴, 𝐵 = 𝑃 𝑋 𝐴, 𝐵 𝑃 𝐴 𝑃 𝐵 Poi 𝑥|𝑐 = 𝑐 𝑥 𝑒−𝑐 𝑥! 𝑐 > 0 + 𝛿 𝑥 𝑐 = 0 Gam 𝑎|𝜙, 𝜃 = 𝜃 𝜙 Γ 𝜃 𝑎 𝜙 𝑒−𝜃𝑎
  • 33. 2. Parameter Region Restriction Non-negative matrix factorization • NMF as a statistical model is formulized as follows. ‒ Data matrices: 𝑋 𝑛 = 𝑋 1 , … , 𝑋 𝑛 ; 𝑀 × 𝑁 × 𝑛 are i.i.d. and subject to 𝑞 𝑋𝑖𝑗 = Poi 𝑋𝑖𝑗| 𝐴0 𝐵0 𝑖𝑗 .  True factorization: 𝐴; 𝑀 × 𝐻0, 𝐵; 𝐻0 × 𝑁 ‒ Set the model 𝑝 𝑋𝑖𝑗|𝐴, 𝐵 = Poi 𝑋𝑖𝑗| 𝐴𝐵 𝑖𝑗 and the prior 𝜑 𝐴, 𝐵 = Gam 𝐴|𝜙 𝐴, 𝜃 𝐴 Gam 𝐵|𝜙 𝐵, 𝜃 𝐵 .  Learner factorization: 𝐴; 𝑀 × 𝐻, 𝐵; 𝐻 × 𝑁 33 n X A B 𝑃 𝑋, 𝐴, 𝐵 = 𝑃 𝑋 𝐴, 𝐵 𝑃 𝐴 𝑃 𝐵 Poi 𝑥|𝑐 = 𝑐 𝑥 𝑒−𝑐 𝑥! 𝑐 > 0 + 𝛿 𝑥 𝑐 = 0 Gam 𝑎|𝜙, 𝜃 = 𝜃 𝜙 Γ 𝜃 𝑎 𝜙 𝑒−𝜃𝑎 Matrix X is factorized to a product of two matrices. 𝑿𝑴 𝑵 𝑯 𝑵 𝑴 𝑨 𝑩𝑯 [14] Kohjima. 2016
  • 34. 2. Parameter Region Restriction RLCT of NMF • The RLCT of NMF 𝝀 satisfies the following inequality: 𝝀 ≤ 𝟏 𝟐 𝑯 − 𝑯 𝟎 𝐦𝐢𝐧 𝑴𝝓 𝑨, 𝑵𝝓 𝑩 + 𝑯 𝟎 𝑴 + 𝑵 − 𝟏 . The equality holds if 𝑯 = 𝑯 𝟎 = 𝟏 or 𝑯 𝟎 = 𝟎. • Tighter upper bound is also derived if 𝝓 𝑨 = 𝝓 𝑩 = 𝟏. • This result gives a lower bound of variational approximation error in NMF. 34 [15] Kohjima. 2017 [12] H. 2020-a [11] H. and Watanabe 2017-b [10] H. and Watanabe 2017-a [12] H. 2020-a
  • 35. Outline 1. 2. Parameter Region Restriction 3. Latent Dirichlet Allocation 3. 35
  • 36. 2. Parameter Region Restriction Latent Dirichlet Allocation • Situation: ‒ LDA treats a bag of words. ‒ Each document (=word list) has some latent topics.  E.g. A mathematics paper is a document. It is expected that there exists “name” topic, “math” topic, etc. In “name” topic, appearance probability of mathematician’s names may be high. 36 MATH NAME … Riemann, Lebesgue, Atiyah, Hironaka, … integral, measure, distribution, singularity, … document topic word word
  • 37. 2. Parameter Region Restriction Latent Dirichlet Allocation • Documents 𝑧 𝑛 and words 𝑥 𝑛 are observable and topics 𝑦 𝑛 are not. • LDA assumes that words occur given documents. 37 MATH NAME … Riemann, Lebesgue, Atiyah, Hironaka, … integral, measure, distribution, singularity, … document topic word word n xyz 𝑥 𝑛 ∼ 𝑞 𝑥 𝑧 𝑝 𝑥, 𝑦 𝑧, 𝑤 estimate
• 38. 2. Parameter Region Restriction Latent Dirichlet Allocation 38 (Figure: the data (= word) generating process of LDA for Documents 1, …, N; each word is drawn together with its topic, e.g. NAME → Alice, FOOD → sushi, MATH → Riemann, NAME → Lebesgue, FOOD → pudding.) [13] H. and Watanabe 2020-b
• 39. 2. Parameter Region Restriction Latent Dirichlet Allocation 39 (Figure: the same word generating process, annotated with Topics 1, 2, …, H.) A topic proportion b_j = (b_{1j}, …, b_{Hj}) corresponds to each document. [13] H. and Watanabe 2020-b
• 40. 2. Parameter Region Restriction Latent Dirichlet Allocation 40 (Figure: the same word generating process, annotated with Topics 1, 2, …, H.) A topic proportion b_j = (b_{1j}, …, b_{Hj}) corresponds to each document. A word appearance probability a_k = (a_{1k}, …, a_{Mk}) corresponds to each topic. [13] H. and Watanabe 2020-b
• 41. 2. Parameter Region Restriction Latent Dirichlet Allocation • LDA is formalized as follows. ‒ One-hot encoded words: x^n = (x^(1), …, x^(n)); M × n and documents: z^n = (z^(1), …, z^(n)); N × n are i.i.d. and generated from q(x | z) q(z). ‒ Set the model p(x, y | z, A, B) = ∏_{j=1}^{N} ∏_{k=1}^{H} (b_{kj} ∏_{i=1}^{M} a_{ik}^{x_i})^{y_k z_j} and the prior φ(A, B) = ∏_k Dir(a_k | ϕ_A) ∏_j Dir(b_j | ϕ_B).  Latent topics: y^n = (y^(1), …, y^(n)); H × n.  Stochastic matrices: A; M × H, B; H × N, Σ_i a_{ik} = 1, Σ_k b_{kj} = 1. 41 P(X, Y, A, B | Z) = P(X, Y | Z, A, B) P(A) P(B); Dir(c | ϕ) = (Γ(Σ_k ϕ_k) / ∏_k Γ(ϕ_k)) ∏_k c_k^{ϕ_k − 1}, Σ_k c_k = 1.
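The generative process can be sketched in a few lines of NumPy; the sizes and hyperparameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, H, n = 20, 5, 3, 1000   # vocabulary size M, #documents N, #topics H, #words n (illustrative)
phi_A, phi_B = 1.0, 1.0       # symmetric Dirichlet hyperparameters (illustrative)

# Columns of A are word distributions a_k per topic; columns of B are topic proportions b_j per document.
A = rng.dirichlet(np.full(M, phi_A), size=H).T   # M x H, each column sums to 1
B = rng.dirichlet(np.full(H, phi_B), size=N).T   # H x N, each column sums to 1

z = rng.integers(N, size=n)                             # document label of each word
y = np.array([rng.choice(H, p=B[:, j]) for j in z])     # latent topic drawn from b_z
x = np.array([rng.choice(M, p=A[:, k]) for k in y])     # observed word drawn from a_y
```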
• 42. 2. Parameter Region Restriction Stochastic Matrix Factorization • In the NMF, consider formally replacing the non-negative parameter matrices with stochastic matrices. ‒ A stochastic matrix is a non-negative matrix each of whose columns sums to 1. Example: the 3 × 4 matrix with rows (0.1, 0.1, 0.4, 0), (0.5, 0.1, 0.4, 0), (0.4, 0.8, 0.2, 1). • This replaced model is called stochastic matrix factorization (SMF). 42
• 43. 2. Parameter Region Restriction Equivalence of LDA and SMF • Let K(w) = Σ_z Σ_x q(x | z) q(z) log [q(x | z) / p(x | z, A, B)] and H(w) = ‖AB − A₀B₀‖², where q(x | z) = Σ_y p(x, y | z, A₀, B₀). • It can be proved that K(w) ∼ H(w), i.e. the RLCT of LDA is equal to that of SMF. 43 [13] H. 2020-b
• 44. 2. Parameter Region Restriction Equivalence of LDA and SMF • Let K(w) = Σ_z Σ_x q(x | z) q(z) log [q(x | z) / p(x | z, A, B)] and H(w) = ‖AB − A₀B₀‖², where q(x | z) = Σ_y p(x, y | z, A₀, B₀). • It can be proved that K(w) ∼ H(w), i.e. the RLCT of LDA is equal to that of SMF. 44 Thus, we only have to consider SMF to determine λ and m of LDA. [13] H. 2020-b
• 45. 2. Parameter Region Restriction RLCT of LDA = RLCT of SMF • If the prior is positive and bounded, the RLCT of LDA λ satisfies the following inequality: λ ≤ (1/2)[(H − H₀) min{M₁, N} + (H₀ − 1)(M₁ + N − 2) + M₁], where M₁ = M − 1. The equality holds if H = H₀ = 1, 2. • Also, if H = 2 and H₀ = 1, λ = (1/2)(max{M, N} + M − 2). 45 [13] H. 2020-b
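The bound can be evaluated in the same way as the NMF bound. A sketch, where the grouping of terms follows our reading of the slide's flattened formula (an assumption) and the function name is ours:

```python
def rlct_lda_upper_bound(M, N, H, H0):
    """Upper bound on the RLCT of LDA/SMF stated above, with M1 = M - 1;
    equality is claimed for H = H0 = 1 or H = H0 = 2."""
    M1 = M - 1
    return ((H - H0) * min(M1, N) + (H0 - 1) * (M1 + N - 2) + M1) / 2
```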
  • 46. Outline 1. 2. Parameter Region Restriction 4. Effect of Restriction 3. 46
• 47. 2. Parameter Region Restriction Effect of Restriction • The parameter region of NMF is the set of non-negative matrices. • The parameter region of SMF is the set of stochastic matrices. • If the parameter region is the set of real matrices, then the model is called matrix factorization. ‒ In the non-restricted case, Aoyagi clarified the exact values of λ and m. • How different are they? 47 [6] Aoyagi. 2005
• 48. 2. Parameter Region Restriction Effect of Restriction • Because of the restriction, the rank of a matrix has a different meaning. ‒ In NMF, the minimal inner dimension of the true matrix factorization is NOT the rank but the non-negative rank. ‒ The boundary of the parameter region causes the usual rank to differ from the non-negative rank. ‒ Since SMF is a restricted NMF, its minimal factorizations are also affected. 48 [16] Cohen, et al. 1993
• 49. 2. Parameter Region Restriction Effect of Restriction • Because of the restriction, the rank of a matrix has a different meaning. ‒ In NMF, the minimal inner dimension of the true matrix factorization is NOT the rank but the non-negative rank. ‒ The boundary of the parameter region causes the usual rank to differ from the non-negative rank. ‒ Since SMF is a restricted NMF, its minimal factorizations are also affected. 49 H₀ = 0 in NMF and H₀ = 1 in SMF are such cases. In fact, the RLCTs are not equal to those in the non-restricted case. [16] Cohen, et al. 1993
• 50. 2. Parameter Region Restriction Effect of Restriction • In general, narrowing the parameter region increases RLCTs; the generalization error increases. ‒ NMF is such a case. The difference of the RLCTs quantifies the effect. 50
• 51. 2. Parameter Region Restriction Effect of Restriction • In general, narrowing the parameter region increases RLCTs; the generalization error increases. ‒ NMF is such a case. The difference of the RLCTs shows the effect. • However, a restriction that also decreases the parameter dimension does not increase them. ‒ SMF is such a case because the simplex constraint obviously decreases the dimension: a_{Mk} = 1 − Σ_{i=1}^{M−1} a_{ik}. 51 (Example: the stochastic matrix from slide 42; the last entry of each column is determined by the others.)
• 52. 2. Parameter Region Restriction Effect of Restriction • In general, narrowing the parameter region increases RLCTs; the generalization error increases. ‒ NMF is such a case. The difference of the RLCTs shows the effect. • However, a restriction that also decreases the parameter dimension does not increase them. ‒ SMF is such a case because the simplex constraint obviously decreases the dimension: a_{Mk} = 1 − Σ_{i=1}^{M−1} a_{ik}. 52 A more precise evaluation of the effect of parameter region restriction is one of our future works.
• 54. 3. Summary • Singular learning theory provides a general framework for determining the behavior of the Bayesian generalization error and the marginal likelihood by using algebraic geometry, even if the model is singular. • Within this theory, as a foundation for clarifying the effect of parameter region restriction, we derived upper bounds of the RLCTs for typical restricted models. 54
• 55. References
[1] Watanabe S. Algebraic geometrical methods for hierarchical learning machines. Neural Netw. 2001;13(4):1049–60.
[2] Hironaka H. Resolution of singularities of an algebraic variety over a field of characteristic zero. Ann Math. 1964;79:109–326.
[3] Nagata K, Watanabe S. Asymptotic behavior of exchange ratio in exchange Monte Carlo method. Neural Netw. 2008;21(7):980–8.
[4] Drton M, Plummer M. A Bayesian information criterion for singular models. J R Stat Soc B. 2017;79:323–80, with discussion.
[5] Yamazaki K, Watanabe S. Singularities in mixture models and upper bounds of stochastic complexity. Neural Netw. 2003;16(7):1029–38.
[6] Aoyagi M, Watanabe S. Stochastic complexities of reduced rank regression in Bayesian estimation. Neural Netw. 2005;18(7):924–33.
[7] Rusakov D, Geiger D. Asymptotic model selection for naive Bayesian networks. J Mach Learn Res. 2005;6(Jan):1–35.
[8] Zwiernik P. An asymptotic behaviour of the marginal likelihood for general Markov models. J Mach Learn Res. 2011;12(Nov):3283–310.
[9] Imai T. Estimating real log canonical threshold. arXiv:1906.01341. 1–28.
55
• 56. References
[10] H N, Watanabe S. Upper bound of Bayesian generalization error in non-negative matrix factorization. Neurocomputing. 2017;266C:21–8.
[11] H N, Watanabe S. Tighter upper bound of real log canonical threshold of non-negative matrix factorization and its application to Bayesian inference. In: IEEE Symposium Series on Computational Intelligence (IEEE SSCI); 2017. p. 718–25.
[12] H N. Variational approximation error in non-negative matrix factorization. Neural Netw. 2020;126:65–75.
[13] H N, Watanabe S. Asymptotic Bayesian generalization error in latent Dirichlet allocation and stochastic matrix factorization. SN Computer Science. 2020;1(2):1–22.
[14] Kohjima M, Matsubayashi T, Sawada H. Multiple data analysis and non-negative matrix/tensor factorization [I]: multiple data analysis and its advances. IEICE Transactions. 2016;99(6):543–50. In Japanese.
[15] Kohjima M, Watanabe S. Phase transition structure of variational Bayesian nonnegative matrix factorization. In: International Conference on Artificial Neural Networks (ICANN); 2017. p. 146–54.
[16] Cohen JE, Rothblum UG. Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra Appl. 1993;190:149–68.
56
• 58. Question 1: How much is the difference? Q. How much is the difference of the RLCTs between the restricted case (our result) and the non-restricted case (Aoyagi’s result)? A. The effect of restriction occurs on the boundary of the parameter space. I cannot state the exact value of the difference in all cases (I have not memorized Aoyagi’s result); however, in the cases where our result clarifies the exact value of the RLCT in the restricted case, the difference between the RLCTs is exact. 58
• 59. Question 1: How much is the difference? Q. How much is the difference of the RLCTs between the restricted case (our result) and the non-restricted case (Aoyagi’s result)? From Aoyagi’s result, in the case H₀ = 0 in NMF (the same parameter dimension, so the effect comes from the boundary), the difference is as follows: diff = λ_NMF − λ_MF = (1/2) H min{M, N} − λ_MF. Note 1: Aoyagi’s result assumes the prior is positive and bounded; thus, the hyperparameters in NMF are set to 1 in the above. Note 2: λ_MF depends on the condition of the matrix sizes; the next slide describes it. 59
• 60. Question 1: How much is the difference? In the case H₀ = 0, the RLCT in Aoyagi’s result is as follows. Assume min{M, N} ≥ 2 and min{M, N} ≥ H ≥ 1.
If N < M + H, M < N + H, H < M + N, and M + H + N is even: λ_MF = (1/8)[2H(M + N) − (M − N)² − H²].
Else if N < M + H, M < N + H, H < M + N, and M + H + N is odd: λ_MF = (1/8)[2H(M + N) − (M − N)² − H² + 1].
Else if M + H < N (⇒ M < N): λ_MF = (1/2)MH.
Else if N + H < M (⇒ N < M): λ_MF = (1/2)NH.
60
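The case analysis translates directly into code. A sketch (the function name is ours; boundary cases such as N = M + H are not listed on the slide and are left uncovered):

```python
def rlct_mf(M, N, H):
    """lambda_MF for H0 = 0, quoting Aoyagi's result from this slide.
    Assumes min(M, N) >= 2 and min(M, N) >= H >= 1."""
    if N < M + H and M < N + H and H < M + N:
        val = 2 * H * (M + N) - (M - N) ** 2 - H ** 2
        if (M + N + H) % 2 == 1:   # the odd case adds 1
            val += 1
        return val / 8
    if M + H < N:
        return M * H / 2
    if N + H < M:
        return N * H / 2
    raise ValueError("boundary case not covered on the slide")
```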
• 61. Question 1: How much is the difference? In the case H₀ = 0, the difference in the question is as follows. Assume min{M, N} ≥ 2 and min{M, N} ≥ H ≥ 1.
If N < M + H, M < N + H, H < M + N, and M + H + N is even: diff = (1/8)[4H min{M, N} − 2H(M + N) + (M − N)² + H²].
Else if N < M + H, M < N + H, H < M + N, and M + H + N is odd: diff = (1/8)[4H min{M, N} − 2H(M + N) + (M − N)² + H² − 1].
Else if M + H < N (⇒ M < N): diff = 0.
Else if N + H < M (⇒ N < M): diff = 0.
E.g. if M = N = 4 and H = 2, the difference equals (1/8)(4·2·4 − 2·2·8 + 0 + 4) = 1/2 by the first case.
61
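Combining λ_NMF = (1/2) H min{M, N} with `rlct_mf` from the previous sketch reproduces the worked example:

```python
def rlct_diff(M, N, H):
    """diff = lambda_NMF - lambda_MF for H0 = 0 and hyperparameters phi_A = phi_B = 1."""
    return H * min(M, N) / 2 - rlct_mf(M, N, H)

print(rlct_diff(4, 4, 2))   # 0.5, matching the example on this slide
```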
• 62. Question 2: The exact value of the RLCT of LDA Q. In the inequality that gives an upper bound of the RLCT of LDA, the equality holds if H = H₀ = 1 or 2. However, the exact value in the case H = 2 and H₀ = 1 is also found. What does this mean? A. When H = H₀ = 1 or 2, the upper bound is equal to the exact value of the RLCT. If H = 2 and H₀ = 1, the upper bound is strictly larger than the exact value, but we can find the exact value by a method different from the one used to derive the upper bound and its equality condition. 62
• 64. Appendix Simple Example • Let K(w) = (ab + cd)² and φ(w) ≡ 1, where w ∈ [−1, 1]⁴ =: W. • Using the blowing-up (a, b, c, d) ↦ (a, bd, c, d), we have K(w) = d²(ab + c)², and its Jacobian is |d|. • Besides, applying the linear transformation (a, b, c, d) ↦ (a, b, c + ab, d), K(w) = c²d². 64
• 65. Appendix Simple Example • Then, the zeta function is ζ(z) = ∫_{−1}^{1} ∫_{−1}^{1} ∫_{ab−1}^{ab+1} ∫_{−1}^{1} c^{2z} d^{2z} |d| dw. • For any −1 ≤ a, b ≤ 1, we can take a neighborhood (nbhd) of c = 0 as an open set. • The RLCT does not change when we change the integration interval from (ab − 1, ab + 1) to (−ε, ε) for any ε > 0. • Thus, we consider ∫_{−1}^{1} ∫_{−1}^{1} ∫_{−ε}^{ε} ∫_{−1}^{1} c^{2z} |d|^{2z+1} da db dc dd ∝ 1/((z + 1)(2z + 1)) and obtain λ = 1/2, m = 1. 65
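The two one-dimensional factors of this zeta function can be checked symbolically. A quick sketch, assuming SymPy (the integrals are computed for z > 0 and then continued analytically):

```python
import sympy as sp

z = sp.symbols('z', positive=True)    # compute for z > 0, then continue analytically
c, d = sp.symbols('c d', positive=True)
eps = sp.Rational(1, 2)               # any 0 < eps <= 1 gives the same poles

zeta_c = 2 * sp.integrate(c ** (2 * z), (c, 0, eps))      # = 2*eps**(2*z+1)/(2*z+1), by symmetry in c
zeta_d = 2 * sp.integrate(d ** (2 * z + 1), (d, 0, 1))    # = 2/(2*z+2) = 1/(z+1), by symmetry in d
print(sp.simplify(zeta_c * zeta_d))   # simple poles at z = -1/2 and z = -1; the deepest gives lambda = 1/2, m = 1
```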
• 66. Appendix Simple Example • Then, the zeta function is ζ(z) = ∫_{−1}^{1} ∫_{−1}^{1} ∫_{ab−1}^{ab+1} ∫_{−1}^{1} c^{2z} d^{2z} |d| dw. • For any −1 ≤ a, b ≤ 1, we can take a neighborhood (nbhd) of c = 0 as an open set. • The RLCT does not change when we change the integration interval from (ab − 1, ab + 1) to (−ε, ε) for any ε > 0. • Thus, we consider ∫_{−1}^{1} ∫_{−1}^{1} ∫_{−ε}^{ε} ∫_{−1}^{1} c^{2z} |d|^{2z+1} da db dc dd ∝ 1/((z + 1)(2z + 1)) and obtain λ = 1/2, m = 1. 66 What about the case where w is non-negative?
• 67. Appendix Simple Example • Let K(w) = (ab + cd)² and φ(w) ≡ 1, where w ∈ [0, 1]⁴ =: W (the non-negative case). • Using the blowing-up (a, b, c, d) ↦ (a, bd, c, d), we have K(w) = d²(ab + c)², and its Jacobian is |d|. • Besides, applying the linear transformation (a, b, c, d) ↦ (a, b, c + ab, d), K(w) = c²d². • Then, the zeta function is ζ(z) = ∫_{0}^{1} ∫_{0}^{1} ∫_{ab}^{ab+1} ∫_{0}^{1} c^{2z} d^{2z} d dw (here |d| = d). 67
• 68. Appendix Simple Example 68 It looks the same as the previous slide. However, …
• 69. Appendix Simple Example ζ(z) = ∫_{0}^{1} ∫_{0}^{1} ∫_{ab}^{ab+1} ∫_{0}^{1} c^{2z} d^{2z} d dw • Because of the non-negativity, we cannot take any neighborhood of c = 0 as an open set. ‒ There is no ε > 0 such that the RLCT does not change when we change the integration interval from (ab, ab + 1) to (0, ε). • We must consider another method (not directly calculating the zeta function) to determine λ and m. 69
• 70. Appendix Simple Example In this case, we can obtain the exact values by using the following “equivalence lemma”. 70 • Let K: W → ℝ and H: W → ℝ be non-negative analytic functions. • If there exist positive constants c₁, c₂ such that c₁K(w) ≤ H(w) ≤ c₂K(w), then their RLCTs and their multiplicities are the same: λ_K = λ_H, m_K = m_H. The equivalence relation K ∼ H is defined by: the RLCTs and the multiplicities of K and H are the same.
• 71. Appendix Simple Example Revisiting the setting: K(w) = (ab + cd)², φ(w) ≡ 1, where w = (a, b, c, d) ∈ [0, 1]⁴ =: W. • 2(a²b² + c²d²) − (ab + cd)² = (ab − cd)² ≥ 0. • (ab + cd)² − (a²b² + c²d²) = 2abcd ≥ 0. • Thus, a²b² + c²d² ≤ K(w) ≤ 2(a²b² + c²d²), i.e. the RLCT is the same as that of a²b² + c²d². • By a direct calculation (each term a²b² has (λ, m) = (1/2, 2) on [0, 1]², and the sum over disjoint variables gives λ = λ₁ + λ₂ and m = m₁ + m₂ − 1), we obtain λ = 1, m = 3. 71
• 72. Appendix Simple Example Revisiting the setting: K(w) = (ab + cd)², φ(w) ≡ 1, where w = (a, b, c, d) ∈ [0, 1]⁴ =: W. • 2(a²b² + c²d²) − (ab + cd)² = (ab − cd)² ≥ 0. • (ab + cd)² − (a²b² + c²d²) = 2abcd ≥ 0. • Thus, a²b² + c²d² ≤ K(w) ≤ 2(a²b² + c²d²), i.e. the RLCT is the same as that of a²b² + c²d². • By a direct calculation, we obtain λ = 1, m = 3. 72 These values differ from the non-restricted case (λ = 1/2, m = 1).
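The two inequalities behind the equivalence can be verified symbolically. A minimal sketch, assuming SymPy:

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d', nonnegative=True)
K = (a * b + c * d) ** 2
S = a**2 * b**2 + c**2 * d**2   # the equivalent function used above

print(sp.factor(2 * S - K))     # (a*b - c*d)**2 >= 0, i.e. K <= 2*S
print(sp.expand(K - S))         # 2*a*b*c*d >= 0 on W, i.e. S <= K
```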

Speaker notes

If asked by NMF practitioners about the usefulness of this research: when NMF is viewed as a statistical model, selecting the size of the inner dimension appropriately is useful, and we conduct this theoretical research for that purpose.