Hedibert Lopes' talk at BigMC

1. Parsimonious Bayesian factor analysis when
   the number of factors is unknown¹

Hedibert Freitas Lopes
Booth School of Business
University of Chicago

Séminaire BIG'MC: Méthodes de Monte Carlo en grande dimension
Institut Henri Poincaré
March 11th 2011

¹ Joint work with Sylvia Frühwirth-Schnatter.
2. Outline

1 Example 1
2 Normal factor model
    Variance structure
    Identification issues
    Number of parameters
    Ordering of the variables
3 Example 2
    Reduced-rank loading matrix
4 Structured loadings
5 Parsimonious BFA
    New identification conditions
    Theorem
6 Example 3
7 Example 4: simulating large data
8 Example 1: revisited
9 Conclusion
3. Example 1. Early origins of health

Conti, Heckman, Lopes and Piatek (2011) Constructing economically justified
aggregates: an application to the early origins of health. Journal of
Econometrics (to appear).

Here we focus on a subset of the British Cohort Study started in 1970.

7 continuous cognitive tests (Picture Language Comprehension, Friendly Math,
Reading, Matrices, Recall Digits, Similarities, Word Definition),

10 continuous noncognitive measurements (Child Developmental Scale).

A total of m = 17 measurements on T = 2397 10-year old individuals.
5. Normal factor model

For any specified positive integer k ≤ m, the standard k-factor model
relates each y_t to an underlying k-vector of random variables f_t, the
common factors, via

    y_t | f_t ~ N(β f_t, Σ)

where

    f_t ~ N(0, I_k)

and

    Σ = diag(σ_1², ..., σ_m²).

Unconditional covariance matrix:

    V(y_t | β, Σ) = Ω = ββ' + Σ
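The covariance identity above is easy to verify by simulation. The sketch below (my own illustration, not from the talk; the dimensions m = 17, k = 2 echo Example 1, everything else is made up) draws from the k-factor model and checks that the sample covariance of y_t approaches Ω = ββ' + Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, T = 17, 2, 200_000   # m as in Example 1; k and T are illustrative

beta = rng.uniform(-1.0, 1.0, size=(m, k))   # loading matrix β
sigma2 = rng.uniform(0.2, 1.0, size=m)       # idiosyncratic variances σ_i²
Sigma = np.diag(sigma2)

# Simulate y_t = β f_t + ε_t with f_t ~ N(0, I_k), ε_t ~ N(0, Σ)
f = rng.normal(size=(T, k))
eps = rng.normal(size=(T, m)) * np.sqrt(sigma2)
y = f @ beta.T + eps

Omega = beta @ beta.T + Sigma        # implied covariance ββ' + Σ
S = np.cov(y, rowvar=False)          # sample covariance of the draws
max_err = np.abs(S - Omega).max()
```

With T = 200,000 draws the entrywise error is small, confirming that the common factors plus diagonal noise generate exactly the covariance Ω.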
6. Variance structure

    Conditional                     Unconditional
    var(y_it | f) = σ_i²            var(y_it) = Σ_{l=1}^k β_il² + σ_i²
    cov(y_it, y_jt | f) = 0         cov(y_it, y_jt) = Σ_{l=1}^k β_il β_jl

Common factors explain all the dependence structure among the m variables.
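The two columns of the table can be checked exactly, not just by simulation: the diagonal of Ω must equal the communality Σ_l β_il² plus σ_i², and each off-diagonal entry must equal Σ_l β_il β_jl. A minimal deterministic check (arbitrary β and σ², my own example):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 6, 2
beta = rng.uniform(-1.0, 1.0, size=(m, k))
sigma2 = rng.uniform(0.2, 1.0, size=m)

Omega = beta @ beta.T + np.diag(sigma2)

# Unconditional variance: var(y_it) = sum_l beta_il^2 + sigma_i^2
var_unc = (beta ** 2).sum(axis=1) + sigma2

# Unconditional covariance: cov(y_it, y_jt) = sum_l beta_il beta_jl
i, j = 0, 3
cov_ij = (beta[i] * beta[j]).sum()
```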
7. Identification issues

A rather trivial non-identifiability problem is sign-switching.

A more serious problem is factor rotation: invariance under any
transformation of the form

    β* = βP'  and  f_t* = P f_t,

where P is any orthogonal k × k matrix.
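Rotation invariance means ββ' (and hence Ω) is unchanged under β* = βP' for any orthogonal P. A short numerical sketch (a random orthogonal P built via QR, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 6, 3
beta = rng.normal(size=(m, k))

# Random orthogonal k x k matrix: Q factor of a Gaussian matrix
P, _ = np.linalg.qr(rng.normal(size=(k, k)))

beta_star = beta @ P.T   # rotated loadings β* = βP'
# β* f* = βP'Pf = βf, so the implied covariance ββ' is unchanged
```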
8. Solutions

1. β'Σ⁻¹β = I.
   Pretty hard to impose!

2. β is block lower triangular².

    β = [ β_11      0         0        ···  0
          β_21      β_22      0        ···  0
          β_31      β_32      β_33     ···  0
           ⋮         ⋮         ⋮        ⋱    ⋮
          β_k1      β_k2      β_k3     ···  β_kk
          β_k+1,1   β_k+1,2   β_k+1,3  ···  β_k+1,k
           ⋮         ⋮         ⋮        ⋱    ⋮
          β_m1      β_m2      β_m3     ···  β_mk ]

   Somewhat restrictive, but useful for estimation.

² Geweke and Zhou (1996) and Lopes and West (2004).
9. Number of parameters

The resulting factor form of Ω has

    m(k + 1) − k(k − 1)/2

parameters, compared with the total

    m(m + 1)/2

in an unconstrained (or k = m) model, leading to the constraint that

    k ≤ m + 3/2 − √(2m + 9/4).

For example,
• m = 6 implies k ≤ 3,
• m = 7 implies k ≤ 4,
• m = 10 implies k ≤ 6,
• m = 17 implies k ≤ 12.
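Both the parameter count and the bound on k are one-liners; the sketch below implements the slide's formulas and reproduces the four examples (function names are mine):

```python
import math

def n_params_factor(m: int, k: int) -> int:
    """Parameter count of the k-factor form of Omega: m(k+1) - k(k-1)/2."""
    return m * (k + 1) - k * (k - 1) // 2

def k_bound(m: int) -> int:
    """Largest integer k satisfying k <= m + 3/2 - sqrt(2m + 9/4)."""
    return math.floor(m + 1.5 - math.sqrt(2 * m + 2.25))
```

For m = 6 and k = 3 the factor form already uses 21 = 6·7/2 parameters, i.e. as many as the unconstrained covariance, which is why k = 3 is the limit there.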
10. Ordering of the variables

Alternative orderings are trivially produced via

    y_t* = A y_t

for some switching matrix A.

The new rotation has the same latent factors but transformed loadings
matrix Aβ:

    y* = Aβ f_t + ε_t = β* f_t + ε_t

Problem: β* is not necessarily block lower triangular.
11. Solution

We can always find an orthonormal matrix P such that

    β̃ = β*P = AβP

is block lower triangular and the common factors

    f̃_t = P' f_t

are still N(0, I_k) (Lopes and West, 2004).

The order of the variables in y_t is immaterial when k is properly chosen,
i.e. when β is full-rank.
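One concrete way to find such a P (my own sketch, not necessarily the construction used in the talk): take the top k × k block B₁ of the reordered loadings Aβ and compute the QR factorization of B₁'. Since B₁' = QR implies B₁Q = R', setting P = Q makes the top block of (Aβ)P lower triangular while leaving (Aβ)(Aβ)' unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 6, 3
# Block lower triangular loadings: zero out the top block's upper triangle
beta = rng.normal(size=(m, k))
beta[np.triu_indices(k, 1)] = 0.0

A = np.eye(m)[rng.permutation(m)]   # a switching (permutation) matrix
beta_star = A @ beta                # reordered: triangularity is lost

# QR of the top block's transpose: B1' = Q R  =>  B1 @ Q = R' (lower tri.)
B1 = beta_star[:k, :]
Q, R = np.linalg.qr(B1.T)
beta_tilde = beta_star @ Q          # block lower triangular again
```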
12. Example 2. Exchange rate data

• Monthly international exchange rates.
• The data span the period from 1/1975 to 12/1986 inclusive.
• Time series are the exchange rates in British pounds of
    • US dollar (US)
    • Canadian dollar (CAN)
    • Japanese yen (JAP)
    • French franc (FRA)
    • Italian lira (ITA)
    • (West) German (Deutsch)mark (GER)
• Example taken from Lopes and West (2004)
13. Posterior inference of (β, Σ)

1st ordering

    E(β|y):   US    0.99   0.00      E(Σ|y) = diag(0.05, 0.13, 0.62,
              CAN   0.95   0.05                     0.04, 0.25, 0.28)
              JAP   0.46   0.42
              FRA   0.39   0.91
              ITA   0.41   0.77
              GER   0.40   0.77

2nd ordering

    E(β|y):   US    0.98   0.00      E(Σ|y) = diag(0.06, 0.62, 0.12,
              JAP   0.45   0.42                     0.04, 0.25, 0.26)
              CAN   0.95   0.03
              FRA   0.39   0.91
              ITA   0.41   0.77
              GER   0.40   0.77
14. Posterior inference of f_t

[Figure]
15. Reduced-rank loading matrix

Geweke and Singleton (1980) show that, if β has rank r < k, then there
exists a matrix Q such that

    βQ = 0  and  Q'Q = I

and, for any orthogonal matrix M,

    ββ' + Σ = (β + MQ')(β + MQ')' + (Σ − MM').

This translation invariance of Ω under the factor model implies lack of
identification and, in application, induces symmetries and potential
multimodalities in resulting likelihood functions.

This issue relates intimately to the question of uncertainty of the number
of factors.
16. Example 2. Multimodality

[Figure]
17. Structured loadings

Proposed by Kaiser (1958), the varimax method of orthogonal rotation aims
at providing axes with as few large loadings and as many near-zero loadings
as possible.

They are also known as dedicated factors.

Notice that the method can be applied to classical or Bayesian estimates in
order to obtain more interpretable factors.
18. Modern structured factor analysis

• Time-varying factor loadings: dynamic correlations
  Lopes and Carvalho (2007)

• Spatial dynamic factor analysis
  Lopes, Salazar and Gamerman (2008)

• Spatial hierarchical factors: ranking vulnerability
  Lopes, Schmidt, Salazar, Gomez and Achkar (2010)

• Sparse factor models
  Frühwirth-Schnatter and Lopes (2010)
19. Example 1: ML estimates

     i   Measurement          β̂_i1    β̂_i2    σ̂_i²
     1   Picture Language      0.32    0.46    0.68
     2   Friendly Math         0.57    0.59    0.34
     3   Reading               0.56    0.62    0.30
     4   Matrices              0.41    0.51    0.57
     5   Recall digits         0.31    0.29    0.82
     6   Similarities          0.39    0.53    0.57
     7   Word definition       0.43    0.55    0.52
     8   Child Dev 2          -0.67    0.18    0.52
     9   Child Dev 17         -0.69   -0.11    0.51
    10   Child Dev 19         -0.74    0.32    0.35
    11   Child Dev 20         -0.45    0.33    0.69
    12   Child Dev 23         -0.75    0.42    0.26
    13   Child Dev 30         -0.52    0.28    0.66
    14   Child Dev 31         -0.45    0.25    0.73
    15   Child Dev 33         -0.42    0.31    0.73
    16   Child Dev 52          0.41   -0.28    0.76
    17   Child Dev 53         -0.69    0.41    0.36
20. Example 1: varimax rotation

     i   Measurement          β̂_i1    β̂_i2    σ̂_i²
     1   Picture Language      0.00    0.56    0.68
     2   Friendly Math         0.12    0.81    0.34
     3   Reading               0.10    0.83    0.30
     4   Matrices              0.04    0.66    0.57
     5   Recall digits         0.09    0.41    0.82
     6   Similarities          0.01    0.66    0.57
     7   Word definition       0.03    0.69    0.52
     8   Child Dev 2          -0.65   -0.25    0.52
     9   Child Dev 17         -0.50   -0.49    0.51
    10   Child Dev 19         -0.79   -0.17    0.35
    11   Child Dev 20         -0.56    0.01    0.69
    12   Child Dev 23         -0.85   -0.09    0.26
    13   Child Dev 30         -0.58   -0.07    0.66
    14   Child Dev 31         -0.51   -0.05    0.73
    15   Child Dev 33         -0.52    0.02    0.73
    16   Child Dev 52          0.49    0.01    0.76
    17   Child Dev 53         -0.80   -0.06    0.36
21. Example 1: Parsimonious BFA

     i   Measurement          β̂_i1    β̂_i2    σ̂_i²
     1   Picture Language      0       0.57    0.68
     2   Friendly Math         0       0.81    0.34
     3   Reading               0       0.83    0.31
     4   Matrices              0       0.66    0.56
     5   Recall digits         0       0.42    0.83
     6   Similarities          0       0.66    0.56
     7   Word definition       0       0.70    0.51
     8   Child Dev 2          -0.68    0       0.54
     9   Child Dev 17         -0.56    0       0.68
    10   Child Dev 19         -0.81    0       0.35
    11   Child Dev 20         -0.55    0       0.70
    12   Child Dev 23         -0.86    0       0.27
    13   Child Dev 30         -0.59    0       0.66
    14   Child Dev 31         -0.51    0       0.74
    15   Child Dev 33         -0.51    0       0.74
    16   Child Dev 52          0.49    0       0.76
    17   Child Dev 53         -0.80    0       0.36
22. Parsimonious BFA

We introduce a more general set of identifiability conditions for the basic
model

    y_t = Λ f_t + ε_t,   ε_t ~ N_m(0, Σ_0),

which handles the ordering problem in a more flexible way:

C1. Λ has full column-rank, i.e. r = rank(Λ).
C2. Λ is a generalized lower triangular matrix, i.e. l_1 < ... < l_r, where
    l_j denotes for j = 1, ..., r the row index of the top non-zero entry in
    the jth column of Λ, i.e. Λ_{l_j,j} ≠ 0 and Λ_{ij} = 0 for all i < l_j.
C3. Λ does not contain any column j where Λ_{l_j,j} is the only non-zero
    element in column j.
23. The regression-type representation

Assume that data y = {y_1, ..., y_T} were generated by the previous model
and that the number of factors r, as well as Λ and Σ_0, should be
estimated.

The usual procedure is to fit a model with k factors,

    y_t = β f_t + ε_t,   ε_t ~ N_m(0, Σ),

where β is an m × k coefficient matrix with elements β_ij and Σ is a
diagonal matrix.
24. Theorem

Theorem. Assume that the data were generated by a basic factor model
obeying conditions C1-C3 and that a regression-type representation with
k ≥ r potential factors is fitted. Assume that the following condition
holds for β:

B1. The row indices of the top non-zero entry in each non-zero column of β
    are different.

Then (r, Λ, Σ_0) are related in the following way to (β, Σ):

(a) r columns of β are identical to the r columns of Λ.
(b) If rank(β) = r, then the remaining k − r columns of β are zero columns.
    The number of factors is equal to r and Σ_0 = Σ.
(c) If rank(β) > r, then only k − rank(β) of the remaining k − r columns
    are zero columns, while s = rank(β) − r columns with column indices
    j_1, ..., j_s differ from a zero column for a single element lying in s
    different rows r_1, ..., r_s. The number of factors is equal to
    r = rank(β) − s, while Σ_0 = Σ + D, where D is an m × m diagonal matrix
    of rank s with non-zero diagonal elements D_{r_l,r_l} = β²_{r_l,j_l}
    for l = 1, ..., s.
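Part (c) can be illustrated with a toy example (all numbers below are my own, chosen only to make the algebra visible): append to Λ a spurious column whose single non-zero element c sits in row r₁. Then ββ' adds c² to that diagonal entry, and subtracting D from Σ₀ restores exactly the same Ω.

```python
import numpy as np

rng = np.random.default_rng(4)
m, r = 6, 2
# True loadings Λ: full column rank, generalized lower triangular
Lam = rng.normal(size=(m, r))
Lam[0, 1] = 0.0                     # l_1 = 1 < l_2, satisfying C2
Sigma0 = np.diag(rng.uniform(0.6, 1.0, size=m))

# Regression-type representation with k = 3: one spurious column whose
# only non-zero element c lies in row r1
k, r1, c = 3, 4, 0.7
beta = np.column_stack([Lam, np.zeros(m)])
beta[r1, 2] = c

# Theorem, part (c): Sigma0 = Sigma + D with D[r1, r1] = c^2
D = np.zeros((m, m))
D[r1, r1] = c ** 2
Sigma = Sigma0 - D

# Both parameterizations imply the same covariance Omega
```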
25. Indicator matrix

Since the identification of r and Λ from β by Theorem 1 relies on
identifying zero and non-zero elements in β, we follow previous work on
parsimonious factor modeling and consider the selection of the elements of
β as a variable selection problem.

We introduce for each element β_ij a binary indicator δ_ij and define β_ij
to be 0 if δ_ij = 0, and leave β_ij unconstrained otherwise.

In this way, we obtain an indicator matrix δ of the same dimension as β.
26. Number of factors

Theorem 1 allows the identification of the true number of factors r
directly from the indicator matrix δ:

    r = Σ_{j=1}^k I{ Σ_{i=1}^m δ_ij > 1 },

where I{·} is the indicator function, by taking spurious factors into
account.

This expression is invariant to permuting the columns of δ, which is
helpful for MCMC-based inference with respect to r.

Our approach provides a principled way for inference on r, as opposed to
previous work based on rather heuristic procedures to infer this quantity
(Carvalho et al., 2008; Bhattacharya and Dunson, 2009).
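The count is a one-liner: a column of δ contributes a factor only if it has at least two non-zero indicators, so single-element (spurious) columns are discounted. A minimal sketch with a hand-made δ:

```python
import numpy as np

def n_factors(delta: np.ndarray) -> int:
    """r = sum_j I{ sum_i delta_ij > 1 }: columns with at least two
    non-zero indicators count as factors; spurious single-element
    columns and zero columns do not."""
    return int((delta.sum(axis=0) > 1).sum())

# Column 1 is a real factor, column 2 is spurious, column 3 is empty
delta = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [1, 0, 0]])
```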
27. Model size

The model size d is defined as the number of non-zero elements in Λ, i.e.

    d = Σ_{j=1}^k d_j I{d_j > 1},   d_j = Σ_{i=1}^m δ_ij.

Model size could be estimated by the posterior mode d̂ or the posterior
mean E(d|y) of p(d|y).
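Model size uses the same column sums d_j as the factor count, again ignoring spurious single-element columns. A minimal sketch:

```python
import numpy as np

def model_size(delta: np.ndarray) -> int:
    """d = sum_j d_j I{d_j > 1}, with d_j the number of non-zero
    indicators in column j; spurious columns contribute nothing."""
    dj = delta.sum(axis=0)
    return int(dj[dj > 1].sum())

# Column sums are (3, 1, 0): only the first column counts, so d = 3
delta = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [1, 0, 0]])
```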
28. The Prior of the Indicator Matrix

To define p(δ), we use a hierarchical prior which allows different
occurrence probabilities τ = (τ_1, ..., τ_k) of non-zero elements in the
different columns of β and assume that all indicators are independent a
priori given τ:

    Pr(δ_ij = 1 | τ_j) = τ_j,   τ_j ~ B(a_0, b_0).

A priori, r may be represented as r = Σ_{j=1}^k I{X_j > 1}, where
X_1, ..., X_k are independent random variables and X_j follows a
Beta-binomial distribution with parameters N_j = m − j + 1, a_0, and b_0.

We recommend tuning the hyperparameters a_0 and b_0, for a given data set
with known values m and k, in such a way that p(r | a_0, b_0, m, k) is in
accordance with prior expectations concerning the number of factors.
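The implied prior p(r | a_0, b_0, m, k) is easy to inspect by Monte Carlo: draw τ_j from the Beta, X_j from the Binomial with N_j = m − j + 1 trials, and count columns with X_j > 1. A sketch (my own helper, with illustrative hyperparameters a_0 = b_0 = 1):

```python
import numpy as np

def prior_r_samples(m, k, a0, b0, n_draws=20_000, seed=0):
    """Monte Carlo draws from the prior p(r | a0, b0, m, k):
    X_j ~ Beta-binomial(N_j = m - j + 1, a0, b0), r = sum_j I{X_j > 1}."""
    rng = np.random.default_rng(seed)
    N = m - np.arange(1, k + 1) + 1          # N_j = m - j + 1
    tau = rng.beta(a0, b0, size=(n_draws, k))
    X = rng.binomial(N, tau)                 # broadcasts N over the draws
    return (X > 1).sum(axis=1)

r_draws = prior_r_samples(m=17, k=6, a0=1, b0=1)
```

Tabulating `r_draws` for candidate (a_0, b_0) pairs is one way to carry out the tuning the slide recommends.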
29. The Prior on the Idiosyncratic Variances

Heywood problems typically occur if the constraint

    1/σ_i² ≥ (Ω⁻¹)_ii   ⟺   σ_i² ≤ 1/(Ω⁻¹)_ii

is violated, where the matrix Ω is the covariance matrix of y_t.

Improper priors on the idiosyncratic variances such as

    p(σ_i²) ∝ 1/σ_i²

are not able to prevent Heywood problems.
30. We assume instead a proper inverted Gamma prior

    σ_i² ~ G⁻¹(c_0, C_i0)

for each of the idiosyncratic variances σ_i².

c_0 is large enough to bound the prior away from 0, typically c_0 = 2.5.

C_i0 is chosen by controlling the probability of a Heywood problem,

    Pr(X ≤ C_i0 (Ω⁻¹)_ii),

where X ~ G(c_0, 1). So, C_i0 is the largest value for which

    C_i0 / (c_0 − 1) ≤ 1/(S_y⁻¹)_ii,

or

    σ_i² ~ G⁻¹(c_0, (c_0 − 1)/(S_y⁻¹)_ii).
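Given a sample covariance S_y, the scales C_i0 follow directly from the inequality above; the prior mean C_i0/(c_0 − 1) then sits exactly at the Heywood bound 1/(S_y⁻¹)_ii. A sketch (function name and the 2 × 2 example matrix are mine):

```python
import numpy as np

def heywood_prior_scale(Sy: np.ndarray, c0: float = 2.5) -> np.ndarray:
    """C_i0 = (c0 - 1) / (S_y^{-1})_ii: the largest scale keeping the
    inverted-Gamma prior mean of sigma_i^2 at the Heywood bound."""
    Sy_inv_diag = np.diag(np.linalg.inv(Sy))
    return (c0 - 1.0) / Sy_inv_diag

Sy = np.array([[1.0, 0.3],
               [0.3, 2.0]])
C0 = heywood_prior_scale(Sy)   # one scale per measured variable
```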
31. The Prior on the Factor Loadings

We assume that the rows of the coefficient matrix β are independent a
priori given the factors f_1, ..., f_T.

Let β^δ_i· be the vector of unconstrained elements in the ith row of β
corresponding to δ. For each i = 1, ..., m, we assume that

    β^δ_i· | σ_i² ~ N(b^δ_i0, B^δ_i0 σ_i²).

The prior moments are either chosen as in Lopes and West (2004) or Ghosh
and Dunson (2009), who considered a "unit scale" prior where b^δ_i0 = 0
and B^δ_i0 = I.
32. Fractional prior

We use a fractional prior (O'Hagan, 1995) which was applied by several
authors for variable selection in latent variable models³.

The fractional prior can be interpreted as the posterior of a
non-informative prior and a fraction b of the data.

We consider a conditionally fractional prior for the "regression model"

    ỹ_i = X^δ_i β^δ_i· + ε̃_i,

where ỹ_i = (y_i1 ··· y_iT)' and ε̃_i = (ε_i1 ··· ε_iT)'. X^δ_i is a
regressor matrix for β^δ_i· constructed from the latent factor matrix
F = (f_1 ··· f_T)'.

³ Smith & Kohn, 2002; Frühwirth-Schnatter & Tüchler, 2008; Tüchler, 2008.
33. Using

    p(β^δ_i· | σ_i²) ∝ p(ỹ_i | β^δ_i·, σ_i²)^b

we obtain from the regression model:

    β^δ_i· | σ_i² ~ N(b_iT, B_iT σ_i²/b),

where b_iT and B_iT are the posterior moments under a non-informative
prior:

    B_iT = ((X^δ_i)' X^δ_i)⁻¹,   b_iT = B_iT (X^δ_i)' ỹ_i.

It is not entirely clear how to choose the fraction b for a factor model.
If the regressors f_1, ..., f_T were observed, then we would deal with m
independent regression models, for each of which T observations are
available, and the choice b = 1/T would be appropriate.
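The posterior moments b_iT and B_iT are just the ordinary least-squares quantities of the row-wise regression; the fraction b only rescales the covariance. A sketch with simulated regressors standing in for the (latent) factors — the coefficients 0.8, −0.5 and noise scale are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
T, q = 200, 2                       # T observations, q unconstrained loadings
X = rng.normal(size=(T, q))         # stands in for X_i^delta built from F
y = X @ np.array([0.8, -0.5]) + 0.3 * rng.normal(size=T)

# Posterior moments under a non-informative prior
B_iT = np.linalg.inv(X.T @ X)
b_iT = B_iT @ X.T @ y               # equals the OLS estimate

b = 1.0 / T                         # observed-factor choice of the fraction
prior_cov_scale = B_iT / b          # covariance factor in N(b_iT, B_iT sigma^2 / b)
```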
34. The factors, however, are latent and are estimated together with the
other parameters. This ties the m regression models together.

If we consider the multivariate regression model as a whole, then the total
number N = mT of observations has to be taken into account, which motivates
choosing b_N = 1/(Tm).

In cases where the number of regressors d is of the same magnitude as the
number of observations, Ley & Steel (2009) recommend choosing instead the
risk inflation criterion b_R = 1/d² suggested by Foster & George (1994),
because b_N implies a fairly small penalty for model size and may lead to
overfitting models.

In the present context this implies choosing b_R = 1/d(k, m)², where
d(k, m) = km − k(k − 1)/2 is the number of free elements in the coefficient
matrix β.
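Both default fractions are simple functions of (T, m, k). With T = 148, m = 10 and k = 6 (the maximal k for m = 10 from the earlier counting argument) they reproduce the values b_N ≈ 6.8·10⁻⁴ and b_R ≈ 4.9·10⁻⁴ quoted in the Maxwell tables. A sketch (function names are mine):

```python
def d_free(k: int, m: int) -> int:
    """Free elements in beta: d(k, m) = km - k(k-1)/2."""
    return k * m - k * (k - 1) // 2

def fraction_choices(T: int, m: int, k: int):
    bN = 1.0 / (T * m)                # total-observations choice
    bR = 1.0 / d_free(k, m) ** 2      # risk inflation criterion
    return bN, bR
```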
35. Most visited configuration

Let l = (l_1, ..., l_rM) be the most visited configuration.

We may then estimate for each indicator the marginal inclusion probability

    Pr(δ_ij = 1 | y, l)

under l as the average over the corresponding draws of δ.

Note that Pr(δ_{l_j,j} = 1 | y, l) = 1 for j = 1, ..., r_M.

Following Scott and Berger (2006), we determine the median probability
model (MPM) by setting the indicators δ_ij in δ_M to 1 iff
Pr(δ_ij = 1 | y, l) ≥ 0.5.
36. The number of non-zero top elements r_M in the identifiability
constraint l is a third estimator of the number of factors, while the
number d_M of non-zero elements in the MPM is yet another estimator of
model size.

A discrepancy between the various estimators of the number of factors r is
often a sign of a weakly informative likelihood, and it might help to use a
more informative prior for p(r) by choosing the hyperparameters a_0 and
b_0 accordingly.

Also, the structure of the indicator matrices δ_H and δ_M corresponding,
respectively, to the HPM and the MPM may differ, in particular if the
frequency p_H is small and some of the inclusion probabilities
Pr(δ_ij = 1 | y, l) are close to 0.5.
37. MCMC

Highly efficient MCMC scheme:

(a) Sample from p(δ | f_1, ..., f_T, τ, y):
    (a-1) Try to turn all non-zero columns of δ into zero columns.
    (a-2) Update the indicators jointly for each of the remaining non-zero
          columns j of δ:
          (a-2-1) try to move the top non-zero element l_j;
          (a-2-2) update the indicators in the rows l_j + 1, ..., m.
    (a-3) Try to turn all (or at least some) zero columns of δ into a
          non-zero column.
(b) Sample from p(β, σ_1², ..., σ_m² | δ, f_1, ..., f_T, y).
(c) Sample from p(f_1, ..., f_T | β, σ_1², ..., σ_m², y).
(d) Perform an acceleration step (expanded factor model).
(e) For each j = 1, ..., k, perform a random sign switch: substitute the
    draws of {f_jt}_{t=1}^T and {β_ij}_{i=j}^m with probability 0.5 by
    {−f_jt}_{t=1}^T and {−β_ij}_{i=j}^m, otherwise leave these draws
    unchanged.
(f) Sample τ_j for j = 1, ..., k from τ_j | δ ~ B(a_0 + d_j, b_0 + p_j),
    where p_j is the number of free elements and d_j = Σ_{i=1}^m δ_ij is
    the number of non-zero elements in column j.

To generate sensible values for the latent factors in the initial model
specification, we found it useful to run the first few steps without
variable selection.
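Step (e) works because flipping a column of β together with the matching factor row leaves βf, and hence the likelihood, unchanged. The sketch below flips entire columns, which coincides with the slide's {β_ij}_{i=j}^m in the triangular case where entries above the diagonal are zero (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, k, T = 6, 3, 50
beta = rng.normal(size=(m, k))
f = rng.normal(size=(k, T))         # factor draws, one row per factor

# Step (e): flip the sign of column j of beta and row j of f together,
# each with probability 0.5
flip = np.where(rng.random(k) < 0.5, -1.0, 1.0)
beta_new = beta * flip              # column-wise sign switch
f_new = f * flip[:, None]           # matching row-wise switch
```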
38. Adding a new factor

    [ X 0 0 ]        [ X 0 0 0 ]
    [ X 0 0 ]        [ X 0 0 0 ]
    [ X 0 X ]        [ X 0 X 0 ]
    [ X X X ]  =⇒   [ X X X 0 ]
    [ X X X ]        [ X X X X ]
    [ X X X ]        [ X X X 0 ]
    [ X X X ]        [ X X X X ]
39. Removing a factor

    [ X 0 0 0 ]       [ X 0 0 0 ]       [ X 0 0 ]
    [ X 0 0 0 ]       [ X 0 0 0 ]       [ X 0 0 ]
    [ X 0 X 0 ]       [ X 0 X 0 ]       [ X 0 X ]
    [ X X X 0 ]  =⇒  [ X X X 0 ]  =⇒  [ X X X ]
    [ X X X X ]       [ X X X X ]       [ X X X ]
    [ X X X 0 ]       [ X X X 0 ]       [ X X X ]
    [ X X X X ]       [ X X X 0 ]       [ X X X ]
40. Example 3: Maxwell's Data

Scores on 10 tests for a sample of T = 148 children attending a psychiatric
clinic as well as a sample of T = 810 normal children.

The first five tests are cognitive tests: (1) verbal ability, (2) spatial
ability, (3) reasoning, (4) numerical ability and (5) verbal fluency. The
remaining tests are inventories for assessing orectic tendencies, namely
(6) neurotic questionnaire, (7) ways to be different, (8) worries and
anxieties, (9) interests and (10) annoyances.

While psychological theory suggests that a 2-factor model is sufficient to
account for the variation between the test scores, the significance test
considered in Maxwell (1961) suggested fitting a 3-factor model to the
first and a 4-factor model to the second data set.
41. Table: Maxwell's Children Data - neurotic children; posterior
distribution p(r|y) of the number r of factors (bold number corresponding
to the posterior mode r̂) and highest posterior identifiability constraint
l with corresponding posterior probability p(l|y) for various priors;
number of visited models N_v; frequency p_H, number of factors r_H, and
model size d_H of the HPM; number of factors r_M and model size d_M of the
MPM corresponding to l; inefficiency factor τ_d of the posterior draws of
the model size d.

                            p(r|y)
    Prior               2      3      4      5-6     l           p(l|y)
    b = 10⁻³            0.755  0.231  0.014  0       (1,6)       0.532
    b_N = 6.8 · 10⁻⁴    0.828  0.160  0.006  0       (1,6)       0.623
    b_R = 4.9 · 10⁻⁴    0.871  0.127  0.001  0       (1,6)       0.589
    b = 10⁻⁴            0.897  0.098  0.005  0       (1,6)       0.802
    GD                  0.269  0.482  0.246  0.003   (1,2,3)     0.174
    LW                  0.027  0.199  0.752  0.023   (1,2,3,6)   0.249

    Prior               N_v    p_H    r_H    d_H     r_M    d_M    τ_d
    b = 10⁻³            1472   0.20   2      12      2      12     30.9
    b_N = 6.8 · 10⁻⁴    976    0.27   2      12      2      12     27.5
    b_R = 4.9 · 10⁻⁴    768    0.34   2      12      2      12     22.6
    b = 10⁻⁴            421    0.45   2      12      2      12     18.1
    GD                  4694   0.06   2      15      3      19     40.6
    LW                  7253   0.01   4      24      4      24     32.5
42. Example 1

Table: Maxwell's Children Data - normal children; same summaries as in the previous table (posterior distribution p(r|y), highest posterior identifiability constraint l with p(l|y), number of visited models Nv, HPM frequency pH with rH and dH, MPM values rM and dM, and inefficiency factor τd).

                          p(r|y)
Prior              3  4      5      6      l            p(l|y)
b  = 10⁻³          0  0.391  0.604  0.005  (1,2,4,5,6)  0.254
bN = 1.2·10⁻⁴      0  0.884  0.116  0      (1,2,4,5)    0.366
b  = 10⁻⁴          0  0.891  0.104  0.005  (1,2,4,5)    0.484
GD                 0  0.396  0.594  0      (1,2,4,5,6)  0.229
LW                 0  0.262  0.727  0.011  (1,2,4,5,6)  0.259

Prior              Nv    pH     rH  dH  rM  dM  τd
b  = 10⁻³          4045  1.79   5   26  5   27  30.1
bN = 1.2·10⁻⁴      1272  11.41  4   23  4   23  28.5
b  = 10⁻⁴          1296  12.17  4   23  4   23  29.1
GD                 4568  1.46   5   29  5   28  32.7
LW                 5387  1.56   5   28  5   28  30.7
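The tables' summaries (the posterior p(r|y), its mode r̂, and the inefficiency factor τd) are simple functions of the MCMC output. A minimal sketch, assuming vectors of posterior draws of r and of the model size d; the draws below are synthetic stand-ins for the sampler's output, with probabilities loosely matching the first row of the neurotic-children table:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for MCMC draws of the number of factors r and the
# model size d (real draws would come from the authors' sampler).
r_draws = rng.choice([2, 3, 4], size=5000, p=[0.755, 0.231, 0.014])
d_draws = rng.choice([12, 15, 19], size=5000, p=[0.90, 0.07, 0.03])

# Posterior p(r | y) and its mode, as reported in the tables.
values, counts = np.unique(r_draws, return_counts=True)
p_r = counts / counts.sum()
r_hat = values[p_r.argmax()]

def inefficiency_factor(x, max_lag=100):
    """IF = 1 + 2 * sum of autocorrelations; values >> 1 mean slow mixing."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.var(x) * n)
    # Truncate the sum at the first negative autocorrelation (a simple rule).
    tau = 1.0
    for rho in acf[1:max_lag]:
        if rho < 0:
            break
        tau += 2 * rho
    return tau

print(dict(zip(values, np.round(p_r, 3))), r_hat, inefficiency_factor(d_draws))
```

Because the synthetic draws are independent, the inefficiency factor comes out near 1; the values of 18-41 in the tables reflect the autocorrelation of the actual posterior chains.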
46. Example 1: revisited

Outcome system
• Education: A-level or above (i.e., Diploma or Degree)
• Health (age 30): self-reported poor health, obesity, daily smoking
• Log hourly wage (age 30)

Measurement system (age 10)
• Cognitive tests: 7 scales
• Personality traits: 5 sub-scales ⇒ see next slide

Control variables: mother's age and education at birth, father's high social class at birth, total gross family income and number of siblings at age 10, broken family, number of previous livebirths.

Exclusions in choice equation: deviation of gender-specific county-level unemployment rate and of gross weekly wages from their long-run averages, for the years 1987-88.
47. Measurement system: 126 items (age 10)

Cognitive tests (continuous scales)
• Picture Language Comprehension Test
• Friendly Math Test
• Shortened Edinburgh Reading Test
British Ability Scales (BAS):
• BAS - Matrices
• BAS - Recall Digits
• BAS - Similarities
• BAS - Word Definition

Personality trait scales
• Rutter Parental 'A' Scale of Behavioral Disorder: 19 continuous items
• Conners Hyperactivity Scale: 19 continuous items
• Child Developmental Scale [CDS]: 53 continuous items
• Locus of Control Scale [LoC]: 16 binary items
• Self-Esteem Scale [S-E]: 12 binary items
48. Model ingredients
52. Conclusion

• Summary
  • New take on factor model identifiability
  • Generalized triangular parametrization
  • Customized MCMC scheme
  • Highly efficient model search strategy
  • Posterior distribution for the number of common factors

• Extensions
  • Nonlinear (discrete choice) factor models
  • Non-normal (mixture of) factor models
  • Dynamic factor models
53. Thank you!