Hedibert Lopes' talk at BigMC

1. Parsimonious Bayesian factor analysis when
   the number of factors is unknown¹

Hedibert Freitas Lopes
Booth School of Business
University of Chicago

Séminaire BIG'MC: Méthodes de Monte Carlo en grande dimension
Institut Henri Poincaré
March 11th 2011

¹ Joint work with Sylvia Frühwirth-Schnatter.
2. Outline

1 Example 1
2 Normal factor model
    Variance structure
    Identification issues
    Number of parameters
    Ordering of the variables
3 Example 2
    Reduced-rank loading matrix
4 Structured loadings
5 Parsimonious BFA
    New identification conditions
    Theorem
6 Example 3
7 Example 4: simulating large data
8 Example 1: revisited
9 Conclusion
3. Example 1. Early origins of health

Conti, Heckman, Lopes and Piatek (2011) Constructing economically justified
aggregates: an application to the early origins of health. Journal of
Econometrics (to appear).

Here we focus on a subset of the British Cohort Study started in 1970.

7 continuous cognitive tests (Picture Language Comprehension, Friendly Math,
Reading, Matrices, Recall Digits, Similarities, Word Definition),

10 continuous noncognitive measurements (Child Developmental Scale).

A total of m = 17 measurements on T = 2397 10-year old individuals.
5. Normal factor model

For any specified positive integer k ≤ m, the standard k-factor model
relates each y_t to an underlying k-vector of random variables f_t, the
common factors, via

    y_t | f_t ~ N(β f_t, Σ)

where

    f_t ~ N(0, I_k)

and

    Σ = diag(σ_1², ..., σ_m²).

Unconditional covariance matrix:

    V(y_t | β, Σ) = Ω = ββ' + Σ
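The covariance identity above is easy to verify by simulation. The sketch below (my own illustration, not from the talk; the dimensions m = 17, k = 2 echo Example 1, everything else is made up) draws from the k-factor model and checks that the sample covariance of y_t approaches Ω = ββ' + Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, T = 17, 2, 200_000   # m as in Example 1; k and T are illustrative

beta = rng.uniform(-1.0, 1.0, size=(m, k))   # loading matrix β
sigma2 = rng.uniform(0.2, 1.0, size=m)       # idiosyncratic variances σ_i²
Sigma = np.diag(sigma2)

# Simulate y_t = β f_t + ε_t with f_t ~ N(0, I_k), ε_t ~ N(0, Σ)
f = rng.normal(size=(T, k))
eps = rng.normal(size=(T, m)) * np.sqrt(sigma2)
y = f @ beta.T + eps

Omega = beta @ beta.T + Sigma        # implied covariance ββ' + Σ
S = np.cov(y, rowvar=False)          # sample covariance of the draws
max_err = np.abs(S - Omega).max()
```

With T = 200,000 draws the entrywise error is small, confirming that the common factors plus diagonal noise generate exactly the covariance Ω.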
6. Variance structure

    Conditional                     Unconditional
    var(y_it | f) = σ_i²            var(y_it) = Σ_{l=1}^k β_il² + σ_i²
    cov(y_it, y_jt | f) = 0         cov(y_it, y_jt) = Σ_{l=1}^k β_il β_jl

Common factors explain all the dependence structure among the m variables.
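The two columns of the table can be checked exactly, not just by simulation: the diagonal of Ω must equal the communality Σ_l β_il² plus σ_i², and each off-diagonal entry must equal Σ_l β_il β_jl. A minimal deterministic check (arbitrary β and σ², my own example):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 6, 2
beta = rng.uniform(-1.0, 1.0, size=(m, k))
sigma2 = rng.uniform(0.2, 1.0, size=m)

Omega = beta @ beta.T + np.diag(sigma2)

# Unconditional variance: var(y_it) = sum_l beta_il^2 + sigma_i^2
var_unc = (beta ** 2).sum(axis=1) + sigma2

# Unconditional covariance: cov(y_it, y_jt) = sum_l beta_il beta_jl
i, j = 0, 3
cov_ij = (beta[i] * beta[j]).sum()
```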
7. Identification issues

A rather trivial non-identifiability problem is sign-switching.

A more serious problem is factor rotation: invariance under any
transformation of the form

    β* = βP'  and  f_t* = P f_t,

where P is any orthogonal k × k matrix.
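Rotation invariance means ββ' (and hence Ω) is unchanged under β* = βP' for any orthogonal P. A short numerical sketch (a random orthogonal P built via QR, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 6, 3
beta = rng.normal(size=(m, k))

# Random orthogonal k x k matrix: Q factor of a Gaussian matrix
P, _ = np.linalg.qr(rng.normal(size=(k, k)))

beta_star = beta @ P.T   # rotated loadings β* = βP'
# β* f* = βP'Pf = βf, so the implied covariance ββ' is unchanged
```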
8. Solutions

1. β'Σ⁻¹β = I.
   Pretty hard to impose!

2. β is block lower triangular².

    β = [ β_11      0         0        ···  0
          β_21      β_22      0        ···  0
          β_31      β_32      β_33     ···  0
           ⋮         ⋮         ⋮        ⋱    ⋮
          β_k1      β_k2      β_k3     ···  β_kk
          β_k+1,1   β_k+1,2   β_k+1,3  ···  β_k+1,k
           ⋮         ⋮         ⋮        ⋱    ⋮
          β_m1      β_m2      β_m3     ···  β_mk ]

   Somewhat restrictive, but useful for estimation.

² Geweke and Zhou (1996) and Lopes and West (2004).
9. Number of parameters

The resulting factor form of Ω has

    m(k + 1) − k(k − 1)/2

parameters, compared with the total

    m(m + 1)/2

in an unconstrained (or k = m) model, leading to the constraint that

    k ≤ m + 3/2 − √(2m + 9/4).

For example,
• m = 6 implies k ≤ 3,
• m = 7 implies k ≤ 4,
• m = 10 implies k ≤ 6,
• m = 17 implies k ≤ 12.
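Both the parameter count and the bound on k are one-liners; the sketch below implements the slide's formulas and reproduces the four examples (function names are mine):

```python
import math

def n_params_factor(m: int, k: int) -> int:
    """Parameter count of the k-factor form of Omega: m(k+1) - k(k-1)/2."""
    return m * (k + 1) - k * (k - 1) // 2

def k_bound(m: int) -> int:
    """Largest integer k satisfying k <= m + 3/2 - sqrt(2m + 9/4)."""
    return math.floor(m + 1.5 - math.sqrt(2 * m + 2.25))
```

For m = 6 and k = 3 the factor form already uses 21 = 6·7/2 parameters, i.e. as many as the unconstrained covariance, which is why k = 3 is the limit there.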
10. Ordering of the variables

Alternative orderings are trivially produced via

    y_t* = A y_t

for some switching matrix A.

The new rotation has the same latent factors but transformed loadings
matrix Aβ:

    y* = Aβ f_t + ε_t = β* f_t + ε_t

Problem: β* is not necessarily block lower triangular.
11. Solution

We can always find an orthonormal matrix P such that

    β̃ = β*P = AβP

is block lower triangular and the common factors

    f̃_t = P' f_t

are still N(0, I_k) (Lopes and West, 2004).

The order of the variables in y_t is immaterial when k is properly chosen,
i.e. when β is full-rank.
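One concrete way to find such a P (my own sketch, not necessarily the construction used in the talk): take the top k × k block B₁ of the reordered loadings Aβ and compute the QR factorization of B₁'. Since B₁' = QR implies B₁Q = R', setting P = Q makes the top block of (Aβ)P lower triangular while leaving (Aβ)(Aβ)' unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 6, 3
# Block lower triangular loadings: zero out the top block's upper triangle
beta = rng.normal(size=(m, k))
beta[np.triu_indices(k, 1)] = 0.0

A = np.eye(m)[rng.permutation(m)]   # a switching (permutation) matrix
beta_star = A @ beta                # reordered: triangularity is lost

# QR of the top block's transpose: B1' = Q R  =>  B1 @ Q = R' (lower tri.)
B1 = beta_star[:k, :]
Q, R = np.linalg.qr(B1.T)
beta_tilde = beta_star @ Q          # block lower triangular again
```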
12. Example 2. Exchange rate data

• Monthly international exchange rates.
• The data span the period from 1/1975 to 12/1986 inclusive.
• Time series are the exchange rates in British pounds of
    • US dollar (US)
    • Canadian dollar (CAN)
    • Japanese yen (JAP)
    • French franc (FRA)
    • Italian lira (ITA)
    • (West) German (Deutsch)mark (GER)
• Example taken from Lopes and West (2004)
13. Posterior inference of (β, Σ)

1st ordering

    E(β|y):   US    0.99   0.00      E(Σ|y) = diag(0.05, 0.13, 0.62,
              CAN   0.95   0.05                     0.04, 0.25, 0.28)
              JAP   0.46   0.42
              FRA   0.39   0.91
              ITA   0.41   0.77
              GER   0.40   0.77

2nd ordering

    E(β|y):   US    0.98   0.00      E(Σ|y) = diag(0.06, 0.62, 0.12,
              JAP   0.45   0.42                     0.04, 0.25, 0.26)
              CAN   0.95   0.03
              FRA   0.39   0.91
              ITA   0.41   0.77
              GER   0.40   0.77
14. Posterior inference of f_t

[Figure]
15. Reduced-rank loading matrix

Geweke and Singleton (1980) show that, if β has rank r < k, then there
exists a matrix Q such that

    βQ = 0  and  Q'Q = I

and, for any orthogonal matrix M,

    ββ' + Σ = (β + MQ')(β + MQ')' + (Σ − MM').

This translation invariance of Ω under the factor model implies lack of
identification and, in application, induces symmetries and potential
multimodalities in resulting likelihood functions.

This issue relates intimately to the question of uncertainty of the number
of factors.
16. Example 2. Multimodality

[Figure]
17. Structured loadings

Proposed by Kaiser (1958), the varimax method of orthogonal rotation aims
at providing axes with as few large loadings and as many near-zero loadings
as possible.

They are also known as dedicated factors.

Notice that the method can be applied to classical or Bayesian estimates in
order to obtain more interpretable factors.
18. Modern structured factor analysis

• Time-varying factor loadings: dynamic correlations
  Lopes and Carvalho (2007)

• Spatial dynamic factor analysis
  Lopes, Salazar and Gamerman (2008)

• Spatial hierarchical factors: ranking vulnerability
  Lopes, Schmidt, Salazar, Gomez and Achkar (2010)

• Sparse factor models
  Frühwirth-Schnatter and Lopes (2010)
19. Example 1: ML estimates

     i   Measurement          β̂_i1    β̂_i2    σ̂_i²
     1   Picture Language      0.32    0.46    0.68
     2   Friendly Math         0.57    0.59    0.34
     3   Reading               0.56    0.62    0.30
     4   Matrices              0.41    0.51    0.57
     5   Recall digits         0.31    0.29    0.82
     6   Similarities          0.39    0.53    0.57
     7   Word definition       0.43    0.55    0.52
     8   Child Dev 2          -0.67    0.18    0.52
     9   Child Dev 17         -0.69   -0.11    0.51
    10   Child Dev 19         -0.74    0.32    0.35
    11   Child Dev 20         -0.45    0.33    0.69
    12   Child Dev 23         -0.75    0.42    0.26
    13   Child Dev 30         -0.52    0.28    0.66
    14   Child Dev 31         -0.45    0.25    0.73
    15   Child Dev 33         -0.42    0.31    0.73
    16   Child Dev 52          0.41   -0.28    0.76
    17   Child Dev 53         -0.69    0.41    0.36
20. Example 1: varimax rotation

     i   Measurement          β̂_i1    β̂_i2    σ̂_i²
     1   Picture Language      0.00    0.56    0.68
     2   Friendly Math         0.12    0.81    0.34
     3   Reading               0.10    0.83    0.30
     4   Matrices              0.04    0.66    0.57
     5   Recall digits         0.09    0.41    0.82
     6   Similarities          0.01    0.66    0.57
     7   Word definition       0.03    0.69    0.52
     8   Child Dev 2          -0.65   -0.25    0.52
     9   Child Dev 17         -0.50   -0.49    0.51
    10   Child Dev 19         -0.79   -0.17    0.35
    11   Child Dev 20         -0.56    0.01    0.69
    12   Child Dev 23         -0.85   -0.09    0.26
    13   Child Dev 30         -0.58   -0.07    0.66
    14   Child Dev 31         -0.51   -0.05    0.73
    15   Child Dev 33         -0.52    0.02    0.73
    16   Child Dev 52          0.49    0.01    0.76
    17   Child Dev 53         -0.80   -0.06    0.36
21. Example 1: Parsimonious BFA

     i   Measurement          β̂_i1    β̂_i2    σ̂_i²
     1   Picture Language      0       0.57    0.68
     2   Friendly Math         0       0.81    0.34
     3   Reading               0       0.83    0.31
     4   Matrices              0       0.66    0.56
     5   Recall digits         0       0.42    0.83
     6   Similarities          0       0.66    0.56
     7   Word definition       0       0.70    0.51
     8   Child Dev 2          -0.68    0       0.54
     9   Child Dev 17         -0.56    0       0.68
    10   Child Dev 19         -0.81    0       0.35
    11   Child Dev 20         -0.55    0       0.70
    12   Child Dev 23         -0.86    0       0.27
    13   Child Dev 30         -0.59    0       0.66
    14   Child Dev 31         -0.51    0       0.74
    15   Child Dev 33         -0.51    0       0.74
    16   Child Dev 52          0.49    0       0.76
    17   Child Dev 53         -0.80    0       0.36
22. Parsimonious BFA

We introduce a more general set of identifiability conditions for the basic
model

    y_t = Λ f_t + ε_t,   ε_t ~ N_m(0, Σ_0),

which handles the ordering problem in a more flexible way:

C1. Λ has full column-rank, i.e. r = rank(Λ).
C2. Λ is a generalized lower triangular matrix, i.e. l_1 < ... < l_r, where
    l_j denotes for j = 1, ..., r the row index of the top non-zero entry in
    the jth column of Λ, i.e. Λ_{l_j,j} ≠ 0 and Λ_{ij} = 0 for all i < l_j.
C3. Λ does not contain any column j where Λ_{l_j,j} is the only non-zero
    element in column j.
23. The regression-type representation

Assume that data y = {y_1, ..., y_T} were generated by the previous model
and that the number of factors r, as well as Λ and Σ_0, should be
estimated.

The usual procedure is to fit a model with k factors,

    y_t = β f_t + ε_t,   ε_t ~ N_m(0, Σ),

where β is an m × k coefficient matrix with elements β_ij and Σ is a
diagonal matrix.
24. Theorem

Theorem. Assume that the data were generated by a basic factor model
obeying conditions C1-C3 and that a regression-type representation with
k ≥ r potential factors is fitted. Assume that the following condition
holds for β:

B1. The row indices of the top non-zero entry in each non-zero column of β
    are different.

Then (r, Λ, Σ_0) are related in the following way to (β, Σ):

(a) r columns of β are identical to the r columns of Λ.
(b) If rank(β) = r, then the remaining k − r columns of β are zero columns.
    The number of factors is equal to r and Σ_0 = Σ.
(c) If rank(β) > r, then only k − rank(β) of the remaining k − r columns
    are zero columns, while s = rank(β) − r columns with column indices
    j_1, ..., j_s differ from a zero column for a single element lying in s
    different rows r_1, ..., r_s. The number of factors is equal to
    r = rank(β) − s, while Σ_0 = Σ + D, where D is an m × m diagonal matrix
    of rank s with non-zero diagonal elements D_{r_l,r_l} = β²_{r_l,j_l}
    for l = 1, ..., s.
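Part (c) can be illustrated with a toy example (all numbers below are my own, chosen only to make the algebra visible): append to Λ a spurious column whose single non-zero element c sits in row r₁. Then ββ' adds c² to that diagonal entry, and subtracting D from Σ₀ restores exactly the same Ω.

```python
import numpy as np

rng = np.random.default_rng(4)
m, r = 6, 2
# True loadings Λ: full column rank, generalized lower triangular
Lam = rng.normal(size=(m, r))
Lam[0, 1] = 0.0                     # l_1 = 1 < l_2, satisfying C2
Sigma0 = np.diag(rng.uniform(0.6, 1.0, size=m))

# Regression-type representation with k = 3: one spurious column whose
# only non-zero element c lies in row r1
k, r1, c = 3, 4, 0.7
beta = np.column_stack([Lam, np.zeros(m)])
beta[r1, 2] = c

# Theorem, part (c): Sigma0 = Sigma + D with D[r1, r1] = c^2
D = np.zeros((m, m))
D[r1, r1] = c ** 2
Sigma = Sigma0 - D

# Both parameterizations imply the same covariance Omega
```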
25. Indicator matrix

Since the identification of r and Λ from β by Theorem 1 relies on
identifying zero and non-zero elements in β, we follow previous work on
parsimonious factor modeling and consider the selection of the elements of
β as a variable selection problem.

We introduce for each element β_ij a binary indicator δ_ij and define β_ij
to be 0 if δ_ij = 0, and leave β_ij unconstrained otherwise.

In this way, we obtain an indicator matrix δ of the same dimension as β.
26. Number of factors

Theorem 1 allows the identification of the true number of factors r
directly from the indicator matrix δ:

    r = Σ_{j=1}^k I{ Σ_{i=1}^m δ_ij > 1 },

where I{·} is the indicator function, by taking spurious factors into
account.

This expression is invariant to permuting the columns of δ, which is
helpful for MCMC-based inference with respect to r.

Our approach provides a principled way for inference on r, as opposed to
previous work based on rather heuristic procedures to infer this quantity
(Carvalho et al., 2008; Bhattacharya and Dunson, 2009).
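The count is a one-liner: a column of δ contributes a factor only if it has at least two non-zero indicators, so single-element (spurious) columns are discounted. A minimal sketch with a hand-made δ:

```python
import numpy as np

def n_factors(delta: np.ndarray) -> int:
    """r = sum_j I{ sum_i delta_ij > 1 }: columns with at least two
    non-zero indicators count as factors; spurious single-element
    columns and zero columns do not."""
    return int((delta.sum(axis=0) > 1).sum())

# Column 1 is a real factor, column 2 is spurious, column 3 is empty
delta = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [1, 0, 0]])
```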
27. Model size

The model size d is defined as the number of non-zero elements in Λ, i.e.

    d = Σ_{j=1}^k d_j I{d_j > 1},   d_j = Σ_{i=1}^m δ_ij.

Model size could be estimated by the posterior mode d̂ or the posterior
mean E(d|y) of p(d|y).
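Model size uses the same column sums d_j as the factor count, again ignoring spurious single-element columns. A minimal sketch:

```python
import numpy as np

def model_size(delta: np.ndarray) -> int:
    """d = sum_j d_j I{d_j > 1}, with d_j the number of non-zero
    indicators in column j; spurious columns contribute nothing."""
    dj = delta.sum(axis=0)
    return int(dj[dj > 1].sum())

# Column sums are (3, 1, 0): only the first column counts, so d = 3
delta = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [1, 0, 0]])
```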
28. The Prior of the Indicator Matrix

To define p(δ), we use a hierarchical prior which allows different
occurrence probabilities τ = (τ_1, ..., τ_k) of non-zero elements in the
different columns of β and assume that all indicators are independent a
priori given τ:

    Pr(δ_ij = 1 | τ_j) = τ_j,   τ_j ~ B(a_0, b_0).

A priori, r may be represented as r = Σ_{j=1}^k I{X_j > 1}, where
X_1, ..., X_k are independent random variables and X_j follows a
Beta-binomial distribution with parameters N_j = m − j + 1, a_0, and b_0.

We recommend tuning the hyperparameters a_0 and b_0, for a given data set
with known values m and k, in such a way that p(r | a_0, b_0, m, k) is in
accordance with prior expectations concerning the number of factors.
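The implied prior p(r | a_0, b_0, m, k) is easy to inspect by Monte Carlo: draw τ_j from the Beta, X_j from the Binomial with N_j = m − j + 1 trials, and count columns with X_j > 1. A sketch (my own helper, with illustrative hyperparameters a_0 = b_0 = 1):

```python
import numpy as np

def prior_r_samples(m, k, a0, b0, n_draws=20_000, seed=0):
    """Monte Carlo draws from the prior p(r | a0, b0, m, k):
    X_j ~ Beta-binomial(N_j = m - j + 1, a0, b0), r = sum_j I{X_j > 1}."""
    rng = np.random.default_rng(seed)
    N = m - np.arange(1, k + 1) + 1          # N_j = m - j + 1
    tau = rng.beta(a0, b0, size=(n_draws, k))
    X = rng.binomial(N, tau)                 # broadcasts N over the draws
    return (X > 1).sum(axis=1)

r_draws = prior_r_samples(m=17, k=6, a0=1, b0=1)
```

Tabulating `r_draws` for candidate (a_0, b_0) pairs is one way to carry out the tuning the slide recommends.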
29. The Prior on the Idiosyncratic Variances

Heywood problems typically occur if the constraint

    1/σ_i² ≥ (Ω⁻¹)_ii   ⟺   σ_i² ≤ 1/(Ω⁻¹)_ii

is violated, where the matrix Ω is the covariance matrix of y_t.

Improper priors on the idiosyncratic variances such as

    p(σ_i²) ∝ 1/σ_i²

are not able to prevent Heywood problems.
30. We assume instead a proper inverted Gamma prior

    σ_i² ~ G⁻¹(c_0, C_i0)

for each of the idiosyncratic variances σ_i².

c_0 is large enough to bound the prior away from 0, typically c_0 = 2.5.

C_i0 is chosen by controlling the probability of a Heywood problem,

    Pr(X ≤ C_i0 (Ω⁻¹)_ii),

where X ~ G(c_0, 1). So, C_i0 is the largest value for which

    C_i0 / (c_0 − 1) ≤ 1/(S_y⁻¹)_ii,

or

    σ_i² ~ G⁻¹(c_0, (c_0 − 1)/(S_y⁻¹)_ii).
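Given a sample covariance S_y, the scales C_i0 follow directly from the inequality above; the prior mean C_i0/(c_0 − 1) then sits exactly at the Heywood bound 1/(S_y⁻¹)_ii. A sketch (function name and the 2 × 2 example matrix are mine):

```python
import numpy as np

def heywood_prior_scale(Sy: np.ndarray, c0: float = 2.5) -> np.ndarray:
    """C_i0 = (c0 - 1) / (S_y^{-1})_ii: the largest scale keeping the
    inverted-Gamma prior mean of sigma_i^2 at the Heywood bound."""
    Sy_inv_diag = np.diag(np.linalg.inv(Sy))
    return (c0 - 1.0) / Sy_inv_diag

Sy = np.array([[1.0, 0.3],
               [0.3, 2.0]])
C0 = heywood_prior_scale(Sy)   # one scale per measured variable
```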
31. The Prior on the Factor Loadings

We assume that the rows of the coefficient matrix β are independent a
priori given the factors f_1, ..., f_T.

Let β^δ_i· be the vector of unconstrained elements in the ith row of β
corresponding to δ. For each i = 1, ..., m, we assume that

    β^δ_i· | σ_i² ~ N(b^δ_i0, B^δ_i0 σ_i²).

The prior moments are either chosen as in Lopes and West (2004) or Ghosh
and Dunson (2009), who considered a "unit scale" prior where b^δ_i0 = 0
and B^δ_i0 = I.
32. Fractional prior

We use a fractional prior (O'Hagan, 1995) which was applied by several
authors for variable selection in latent variable models³.

The fractional prior can be interpreted as the posterior of a
non-informative prior and a fraction b of the data.

We consider a conditionally fractional prior for the "regression model"

    ỹ_i = X^δ_i β^δ_i· + ε̃_i,

where ỹ_i = (y_i1 ··· y_iT)' and ε̃_i = (ε_i1 ··· ε_iT)'. X^δ_i is a
regressor matrix for β^δ_i· constructed from the latent factor matrix
F = (f_1 ··· f_T)'.

³ Smith & Kohn, 2002; Frühwirth-Schnatter & Tüchler, 2008; Tüchler, 2008.
33. Using

    p(β^δ_i· | σ_i²) ∝ p(ỹ_i | β^δ_i·, σ_i²)^b

we obtain from the regression model:

    β^δ_i· | σ_i² ~ N(b_iT, B_iT σ_i²/b),

where b_iT and B_iT are the posterior moments under a non-informative
prior:

    B_iT = ((X^δ_i)' X^δ_i)⁻¹,   b_iT = B_iT (X^δ_i)' ỹ_i.

It is not entirely clear how to choose the fraction b for a factor model.
If the regressors f_1, ..., f_T were observed, then we would deal with m
independent regression models, for each of which T observations are
available, and the choice b = 1/T would be appropriate.
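The posterior moments b_iT and B_iT are just the ordinary least-squares quantities of the row-wise regression; the fraction b only rescales the covariance. A sketch with simulated regressors standing in for the (latent) factors — the coefficients 0.8, −0.5 and noise scale are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
T, q = 200, 2                       # T observations, q unconstrained loadings
X = rng.normal(size=(T, q))         # stands in for X_i^delta built from F
y = X @ np.array([0.8, -0.5]) + 0.3 * rng.normal(size=T)

# Posterior moments under a non-informative prior
B_iT = np.linalg.inv(X.T @ X)
b_iT = B_iT @ X.T @ y               # equals the OLS estimate

b = 1.0 / T                         # observed-factor choice of the fraction
prior_cov_scale = B_iT / b          # covariance factor in N(b_iT, B_iT sigma^2 / b)
```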
34. The factors, however, are latent and are estimated together with the
other parameters. This ties the m regression models together.

If we consider the multivariate regression model as a whole, then the total
number N = mT of observations has to be taken into account, which motivates
choosing b_N = 1/(Tm).

In cases where the number of regressors d is of the same magnitude as the
number of observations, Ley & Steel (2009) recommend choosing instead the
risk inflation criterion b_R = 1/d² suggested by Foster & George (1994),
because b_N implies a fairly small penalty for model size and may lead to
overfitting models.

In the present context this implies choosing b_R = 1/d(k, m)², where
d(k, m) = km − k(k − 1)/2 is the number of free elements in the coefficient
matrix β.
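Both default fractions are simple functions of (T, m, k). With T = 148, m = 10 and k = 6 (the maximal k for m = 10 from the earlier counting argument) they reproduce the values b_N ≈ 6.8·10⁻⁴ and b_R ≈ 4.9·10⁻⁴ quoted in the Maxwell tables. A sketch (function names are mine):

```python
def d_free(k: int, m: int) -> int:
    """Free elements in beta: d(k, m) = km - k(k-1)/2."""
    return k * m - k * (k - 1) // 2

def fraction_choices(T: int, m: int, k: int):
    bN = 1.0 / (T * m)                # total-observations choice
    bR = 1.0 / d_free(k, m) ** 2      # risk inflation criterion
    return bN, bR
```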
35. Most visited configuration

Let l = (l_1, ..., l_rM) be the most visited configuration.

We may then estimate for each indicator the marginal inclusion probability

    Pr(δ_ij = 1 | y, l)

under l as the average over the corresponding draws of δ.

Note that Pr(δ_{l_j,j} = 1 | y, l) = 1 for j = 1, ..., r_M.

Following Scott and Berger (2006), we determine the median probability
model (MPM) by setting the indicators δ_ij in δ_M to 1 iff
Pr(δ_ij = 1 | y, l) ≥ 0.5.
36. The number of non-zero top elements r_M in the identifiability
constraint l is a third estimator of the number of factors, while the
number d_M of non-zero elements in the MPM is yet another estimator of
model size.

A discrepancy between the various estimators of the number of factors r is
often a sign of a weakly informative likelihood, and it might help to use a
more informative prior for p(r) by choosing the hyperparameters a_0 and
b_0 accordingly.

Also, the structure of the indicator matrices δ_H and δ_M corresponding,
respectively, to the HPM and the MPM may differ, in particular if the
frequency p_H is small and some of the inclusion probabilities
Pr(δ_ij = 1 | y, l) are close to 0.5.
37. MCMC

Highly efficient MCMC scheme:

(a) Sample from p(δ | f_1, ..., f_T, τ, y):
    (a-1) Try to turn all non-zero columns of δ into zero columns.
    (a-2) Update the indicators jointly for each of the remaining non-zero
          columns j of δ:
          (a-2-1) try to move the top non-zero element l_j;
          (a-2-2) update the indicators in the rows l_j + 1, ..., m.
    (a-3) Try to turn all (or at least some) zero columns of δ into a
          non-zero column.
(b) Sample from p(β, σ_1², ..., σ_m² | δ, f_1, ..., f_T, y).
(c) Sample from p(f_1, ..., f_T | β, σ_1², ..., σ_m², y).
(d) Perform an acceleration step (expanded factor model).
(e) For each j = 1, ..., k, perform a random sign switch: substitute the
    draws of {f_jt}_{t=1}^T and {β_ij}_{i=j}^m with probability 0.5 by
    {−f_jt}_{t=1}^T and {−β_ij}_{i=j}^m, otherwise leave these draws
    unchanged.
(f) Sample τ_j for j = 1, ..., k from τ_j | δ ~ B(a_0 + d_j, b_0 + p_j),
    where p_j is the number of free elements and d_j = Σ_{i=1}^m δ_ij is
    the number of non-zero elements in column j.

To generate sensible values for the latent factors in the initial model
specification, we found it useful to run the first few steps without
variable selection.
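Step (e) works because flipping a column of β together with the matching factor row leaves βf, and hence the likelihood, unchanged. The sketch below flips entire columns, which coincides with the slide's {β_ij}_{i=j}^m in the triangular case where entries above the diagonal are zero (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, k, T = 6, 3, 50
beta = rng.normal(size=(m, k))
f = rng.normal(size=(k, T))         # factor draws, one row per factor

# Step (e): flip the sign of column j of beta and row j of f together,
# each with probability 0.5
flip = np.where(rng.random(k) < 0.5, -1.0, 1.0)
beta_new = beta * flip              # column-wise sign switch
f_new = f * flip[:, None]           # matching row-wise switch
```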
38. Adding a new factor

    [ X 0 0 ]        [ X 0 0 0 ]
    [ X 0 0 ]        [ X 0 0 0 ]
    [ X 0 X ]        [ X 0 X 0 ]
    [ X X X ]  =⇒   [ X X X 0 ]
    [ X X X ]        [ X X X X ]
    [ X X X ]        [ X X X 0 ]
    [ X X X ]        [ X X X X ]
39. Removing a factor

    [ X 0 0 0 ]       [ X 0 0 0 ]       [ X 0 0 ]
    [ X 0 0 0 ]       [ X 0 0 0 ]       [ X 0 0 ]
    [ X 0 X 0 ]       [ X 0 X 0 ]       [ X 0 X ]
    [ X X X 0 ]  =⇒  [ X X X 0 ]  =⇒  [ X X X ]
    [ X X X X ]       [ X X X X ]       [ X X X ]
    [ X X X 0 ]       [ X X X 0 ]       [ X X X ]
    [ X X X X ]       [ X X X 0 ]       [ X X X ]
40. Example 3: Maxwell's Data

Scores on 10 tests for a sample of T = 148 children attending a psychiatric
clinic as well as a sample of T = 810 normal children.

The first five tests are cognitive tests: (1) verbal ability, (2) spatial
ability, (3) reasoning, (4) numerical ability and (5) verbal fluency. The
remaining tests are inventories for assessing orectic tendencies, namely
(6) neurotic questionnaire, (7) ways to be different, (8) worries and
anxieties, (9) interests and (10) annoyances.

While psychological theory suggests that a 2-factor model is sufficient to
account for the variation between the test scores, the significance test
considered in Maxwell (1961) suggested fitting a 3-factor model to the
first and a 4-factor model to the second data set.
41. Table: Maxwell's Children Data - neurotic children; posterior
distribution p(r|y) of the number r of factors (bold number corresponding
to the posterior mode r̂) and highest posterior identifiability constraint
l with corresponding posterior probability p(l|y) for various priors;
number of visited models N_v; frequency p_H, number of factors r_H, and
model size d_H of the HPM; number of factors r_M and model size d_M of the
MPM corresponding to l; inefficiency factor τ_d of the posterior draws of
the model size d.

                            p(r|y)
    Prior               2      3      4      5-6     l           p(l|y)
    b = 10⁻³            0.755  0.231  0.014  0       (1,6)       0.532
    b_N = 6.8 · 10⁻⁴    0.828  0.160  0.006  0       (1,6)       0.623
    b_R = 4.9 · 10⁻⁴    0.871  0.127  0.001  0       (1,6)       0.589
    b = 10⁻⁴            0.897  0.098  0.005  0       (1,6)       0.802
    GD                  0.269  0.482  0.246  0.003   (1,2,3)     0.174
    LW                  0.027  0.199  0.752  0.023   (1,2,3,6)   0.249

    Prior               N_v    p_H    r_H    d_H     r_M    d_M    τ_d
    b = 10⁻³            1472   0.20   2      12      2      12     30.9
    b_N = 6.8 · 10⁻⁴    976    0.27   2      12      2      12     27.5
    b_R = 4.9 · 10⁻⁴    768    0.34   2      12      2      12     22.6
    b = 10⁻⁴            421    0.45   2      12      2      12     18.1
    GD                  4694   0.06   2      15      3      19     40.6
    LW                  7253   0.01   4      24      4      24     32.5
42. Example 1

Table: Maxwell's Children Data - normal children; same summaries as in the previous table (posterior distribution p(r|y), highest posterior identifiability constraint l with p(l|y), number of visited models Nv, HPM frequency pH with rH and dH, MPM values rM and dM, and inefficiency factor τd).

                          p(r|y)
Prior              3  4      5      6      l            p(l|y)
b  = 10⁻³          0  0.391  0.604  0.005  (1,2,4,5,6)  0.254
bN = 1.2·10⁻⁴      0  0.884  0.116  0      (1,2,4,5)    0.366
b  = 10⁻⁴          0  0.891  0.104  0.005  (1,2,4,5)    0.484
GD                 0  0.396  0.594  0      (1,2,4,5,6)  0.229
LW                 0  0.262  0.727  0.011  (1,2,4,5,6)  0.259

Prior              Nv    pH     rH  dH  rM  dM  τd
b  = 10⁻³          4045  1.79   5   26  5   27  30.1
bN = 1.2·10⁻⁴      1272  11.41  4   23  4   23  28.5
b  = 10⁻⁴          1296  12.17  4   23  4   23  29.1
GD                 4568  1.46   5   29  5   28  32.7
LW                 5387  1.56   5   28  5   28  30.7
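The tables' summaries (the posterior p(r|y), its mode r̂, and the inefficiency factor τd) are simple functions of the MCMC output. A minimal sketch, assuming vectors of posterior draws of r and of the model size d; the draws below are synthetic stand-ins for the sampler's output, with probabilities loosely matching the first row of the neurotic-children table:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for MCMC draws of the number of factors r and the
# model size d (real draws would come from the authors' sampler).
r_draws = rng.choice([2, 3, 4], size=5000, p=[0.755, 0.231, 0.014])
d_draws = rng.choice([12, 15, 19], size=5000, p=[0.90, 0.07, 0.03])

# Posterior p(r | y) and its mode, as reported in the tables.
values, counts = np.unique(r_draws, return_counts=True)
p_r = counts / counts.sum()
r_hat = values[p_r.argmax()]

def inefficiency_factor(x, max_lag=100):
    """IF = 1 + 2 * sum of autocorrelations; values >> 1 mean slow mixing."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.var(x) * n)
    # Truncate the sum at the first negative autocorrelation (a simple rule).
    tau = 1.0
    for rho in acf[1:max_lag]:
        if rho < 0:
            break
        tau += 2 * rho
    return tau

print(dict(zip(values, np.round(p_r, 3))), r_hat, inefficiency_factor(d_draws))
```

Because the synthetic draws are independent, the inefficiency factor comes out near 1; the values of 18-41 in the tables reflect the autocorrelation of the actual posterior chains.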
46. Example 1: revisited

Outcome system
• Education: A-level or above (i.e., Diploma or Degree)
• Health (age 30): self-reported poor health, obesity, daily smoking
• Log hourly wage (age 30)

Measurement system (age 10)
• Cognitive tests: 7 scales
• Personality traits: 5 sub-scales ⇒ see next slide

Control variables: mother's age and education at birth, father's high social class at birth, total gross family income and number of siblings at age 10, broken family, number of previous livebirths.

Exclusions in choice equation: deviation of gender-specific county-level unemployment rate and of gross weekly wages from their long-run averages, for the years 1987-88.
47. Measurement system: 126 items (age 10)

Cognitive tests (continuous scales)
• Picture Language Comprehension Test
• Friendly Math Test
• Shortened Edinburgh Reading Test
British Ability Scales (BAS):
• BAS - Matrices
• BAS - Recall Digits
• BAS - Similarities
• BAS - Word Definition

Personality trait scales
• Rutter Parental 'A' Scale of Behavioral Disorder: 19 continuous items
• Conners Hyperactivity Scale: 19 continuous items
• Child Developmental Scale [CDS]: 53 continuous items
• Locus of Control Scale [LoC]: 16 binary items
• Self-Esteem Scale [S-E]: 12 binary items
48. Model ingredients
52. Conclusion

• Summary
  • New take on factor model identifiability
  • Generalized triangular parametrization
  • Customized MCMC scheme
  • Highly efficient model search strategy
  • Posterior distribution for the number of common factors

• Extensions
  • Nonlinear (discrete choice) factor models
  • Non-normal (mixture of) factor models
  • Dynamic factor models
53. Thank you!