ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS
DEPARTMENT OF STATISTICS

Efficient

Bayesian Marginal Likelihood
estimation in

Generalised Linear Latent Variable Models
thesis submitted by

Silia Vitoratou
advisors

Ioannis Ntzoufras
Irini Moustaki
Athens, 2013
Overview

Thesis structure

Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
2
Chapter 1

Key ideas and origins of the latent variable models (LVM).
“...co-relation must be the consequence of the variations
of the two organs being partly due to common causes ...“
Francis Galton, 1888.

• Suppose we want to draw inferences about concepts that cannot be measured directly (such as emotions, attitudes, perceptions, proficiency etc.).
• We assume that they can be measured indirectly through other, observed items.
• The key idea is that all dependencies among the p manifest variables (observed items) are attributed to k latent (unobserved) ones.
• In principle, k << p. Hence, at the same time, the LVM methodology is a multivariate analysis technique which aims to reduce dimensionality with as little loss of information as possible.
3
Chapter 1

A unified approach: Generalised linear latent variable
models (GLLVM).
Generalized linear latent variable model (GLLVM; Bartholomew & Knott, 1999; Skrondal & Rabe-Hesketh, 2004). The model assumes that the response variables are linear combinations of the latent ones, and it consists of three components:
(a) the multivariate random component: each observed item Yj (j = 1, ..., p) has a distribution from the exponential family (Bernoulli, Multinomial, Normal, Gamma),
(b) the systematic component: the latent variables Zℓ, ℓ = 1, ..., k, produce the linear predictor ηj for each Yj,
$$\eta_j = \alpha_{j0} + \sum_{\ell=1}^{k} \alpha_{j\ell} Z_\ell,$$
(c) the link function, which connects the previous two components:
$$\eta_j = g_j(\mu_j), \qquad \mu_j = E(Y_j \mid Z_1, \ldots, Z_k).$$
4
Chapter 1

A unified approach: Generalised linear latent variable
models (GLLVM).
Special case: the generalized linear latent trait model with binary items (Moustaki & Knott, 2000).

The conditionals $f(y_j \mid \mathbf{z})$ are in this case Bernoulli$(\pi_j(\mathbf{z}))$, where $\pi_j(\mathbf{z}) = P(Y_j = 1 \mid \mathbf{z})$ is the conditional probability of a positive response to the observed item. The logistic model is used for the response probabilities:
$$\operatorname{logit}\{\pi_j(\mathbf{z})\} = \alpha_{j0} + \sum_{\ell=1}^{k} \alpha_{j\ell} z_\ell.$$

• The item parameters $\alpha_{j0}$ and $\alpha_{j\ell}$ are often referred to as the difficulty and the discrimination parameters (respectively) of item j.
All examples considered in this thesis refer to multivariate IRT (2-PL) models. The findings apply directly, or can be extended, to any type of GLLVM.
5
Chapter 1

A unified approach: Generalised linear latent variable
models (GLLVM).
As only the p items can be observed, any inference must be based on their joint distribution.

All data dependencies are attributed to the existence of the latent variables. Hence, the observed variables are assumed independent given the latent ones (local independence assumption):
$$f(\mathbf{y} \mid \boldsymbol{\theta}) = \int \prod_{j=1}^{p} f(y_j \mid \mathbf{z}, \boldsymbol{\theta})\, \varphi(\mathbf{z})\, d\mathbf{z},$$
where $\varphi(\mathbf{z})$ is the prior distribution for the latent variables. A fully Bayesian approach requires that the item parameter vector $\boldsymbol{\theta}$ is also stochastic, associated with a prior distribution.
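To make the local independence structure concrete, here is a minimal R sketch (not thesis code; the item parameter values are purely illustrative) that approximates $f(\mathbf{y} \mid \boldsymbol{\theta})$ for one binary response pattern under a 2-PL model with k = 1, by Monte Carlo over the latent variable:

```r
set.seed(1)
a0 <- c(-0.5, 0.2, 0.8)   # difficulty parameters (illustrative values)
a1 <- c(1.0, 1.5, 0.7)    # discrimination parameters (illustrative values)
y  <- c(1, 0, 1)          # one response pattern for p = 3 items

z  <- rnorm(1e4)                              # draws from the N(0,1) latent prior
pz <- plogis(sweep(outer(z, a1), 2, a0, "+")) # pi_j(z) for every draw and item
lik_z <- apply(pz, 1, function(p) prod(p^y * (1 - p)^(1 - y)))  # local independence
f_y <- mean(lik_z)                            # MC estimate of f(y | theta)
```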
6
Chapter 2

The fully Bayesian analogue: GLLTM with binary items
A) Priors
All model parameters are assumed a priori independent. The prior for the item parameters follows Ntzoufras et al. (2000) and Fouskakis et al. (2009). For a unique solution, the Cholesky decomposition is applied to the loadings matrix B.
7
Chapter 2

The fully Bayesian analogue: GLLTM with binary items
B) Sampling from the posterior
• A Metropolis-within-Gibbs algorithm, initially presented for IRT models by Patz and Junker (1999), was used here for the multivariate case (k > 1).
• Each item is updated in one block, as are the latent variables for each person.

C) Model evaluation
• In this thesis, the Bayes factor (BF; Jeffreys, 1961; Kass and Raftery, 1995) was used for model comparison.
• The BF is defined as the ratio of the posterior odds of two competing models (say m1 and m2) to their corresponding prior odds. Provided that the models have equal prior probabilities, it is given by
$$BF_{12} = \frac{f(\mathbf{y} \mid m_1)}{f(\mathbf{y} \mid m_2)},$$
that is, the ratio of the two models' marginal or integrated likelihoods (hereafter Bayesian marginal likelihood; BML).
8
Chapter 2

Estimating the Bayesian marginal likelihood
The BML (also known as the prior predictive distribution) is defined as the expected model likelihood over the prior of the model parameters:
$$f(\mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})\, d\boldsymbol{\theta},$$
which is quite often a high-dimensional integral that is not available in closed form. Monte Carlo integration is often used to estimate it, as for instance the arithmetic mean:
$$\hat{f}(\mathbf{y}) = \frac{1}{R} \sum_{r=1}^{R} f(\mathbf{y} \mid \boldsymbol{\theta}^{(r)}), \qquad \boldsymbol{\theta}^{(r)} \sim f(\boldsymbol{\theta}).$$
This simple estimator rarely works adequately, and a plethora of Markov chain Monte Carlo (MCMC) techniques are employed instead in the literature.
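As a concrete illustration (a minimal sketch, not from the thesis), the arithmetic-mean estimator for a toy beta-binomial model, where the exact BML is known in closed form:

```r
set.seed(1)
y <- rbinom(20, size = 1, prob = 0.8)           # 20 Bernoulli observations
loglik <- function(theta) sum(dbinom(y, 1, theta, log = TRUE))

R <- 1e5
theta <- rbeta(R, 1, 1)                         # draws from the Uniform(0,1) prior
bml_am <- mean(exp(sapply(theta, loglik)))      # arithmetic-mean estimator

s <- sum(y); n <- length(y)
bml_exact <- beta(s + 1, n - s + 1)             # exact BML under the uniform prior
c(estimate = bml_am, exact = bml_exact)
```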
9
Chapter 2

Estimating the Bayesian marginal likelihood
 The point-based estimators (PBE) employ the candidate's formula (Besag, 1989) at a point of high posterior density:
• Laplace-Metropolis (LM; Lewis & Raftery, 1997)
• Gaussian copula (GC; Nott et al., 2008)
• Chib & Jeliazkov (CJ; Chib & Jeliazkov, 2001)

 The bridge sampling estimators (BSE) employ a bridge function; depending on its form, several BML identities can be derived (including pre-existing ones):
• Harmonic mean (HM; Newton & Raftery, 1994)
• Reciprocal mean (RM; Gelfand & Dey, 1994)
• Bridge harmonic (BH; Meng & Wong, 1996)
• Bridge geometric (BG; Meng & Wong, 1996)

 The path sampling estimators (PSE) employ a continuous and differentiable path linking two unnormalised densities, and compute the ratio of the corresponding normalising constants:
• Power posteriors (PPT; Friel & Pettitt, 2008; Lartillot & Philippe, 2006)
• Stepping-stone (PPS; Xie et al., 2011)
• Generalised stepping-stone (IPS; Fan et al., 2011)
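As a side note, the harmonic-mean identity is easy to try on the toy beta-binomial example above; a self-contained sketch (not thesis code) follows. Its instability is well documented, which foreshadows its failure in the comparisons of Chapter 6:

```r
set.seed(1)
y <- rbinom(20, 1, 0.8); s <- sum(y); n <- length(y)
loglik <- function(theta) sum(dbinom(y, 1, theta, log = TRUE))

theta_post <- rbeta(1e5, s + 1, n - s + 1)             # exact posterior (uniform prior)
bml_hm <- 1 / mean(exp(-sapply(theta_post, loglik)))   # harmonic-mean estimator
c(harmonic_mean = bml_hm, exact = beta(s + 1, n - s + 1))
```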

10
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration: the case of GLLVM
From the earliest literature, the methods applied for parameter estimation in models with latent variables relied either on the joint likelihood, $f(\mathbf{y}, \mathbf{z} \mid \boldsymbol{\theta})$ (Lord and Novick, 1968; Lord, 1980), or on the marginal likelihood, $f(\mathbf{y} \mid \boldsymbol{\theta})$ (Bock and Aitkin, 1981; Moustaki and Knott, 2000).

Under the conditional independence assumptions of the GLLVMs, there are two equivalent formulations of the BML, which lead to different MC estimators, namely the joint BML,
$$f(\mathbf{y}) = \iint f(\mathbf{y} \mid \mathbf{z}, \boldsymbol{\theta})\, \varphi(\mathbf{z})\, f(\boldsymbol{\theta})\, d\mathbf{z}\, d\boldsymbol{\theta},$$
and the marginal BML,
$$f(\mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})\, d\boldsymbol{\theta}.$$
11
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration: the case of GLLVM
A motivating example
A simulated data set with p = 6 items, N = 600 cases and k = 2 factors was considered. Three popular BSE were computed under both approaches (R = 50,000 posterior observations, after a burn-in period of 10,000 and a thinning interval of 10).

• BH: largest difference in the error, but rather close estimates.
• BG: largest difference in the estimates, without a large difference in the error.
The differences are due to Monte Carlo integration under independence assumptions.
12
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration: the case of GLLVM

The joint version of BH comes with a much higher MCE than the RM, but it is the joint version of RM that fails to converge to the true value. Why?

13
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration under independence
• Consider any integral of the form
$$\mu = E_h\{g(\mathbf{x})\} = \int g(\mathbf{x})\, h(\mathbf{x})\, d\mathbf{x}.$$
• The corresponding MC estimator, assuming a random sample of points $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(R)}$ drawn from h, is
$$\hat{\mu} = \frac{1}{R} \sum_{r=1}^{R} g(\mathbf{x}^{(r)}).$$
• The corresponding Monte Carlo error (MCE) is
$$MCE(\hat{\mu}) = \sqrt{\mathrm{Var}_h\{g(\mathbf{x})\}/R}.$$
• Assume independence, that is, $h(\mathbf{x}) = \prod_{i=1}^{N} h_i(x_i)$ and $g(\mathbf{x}) = \prod_{i=1}^{N} g_i(x_i)$; hence $\mu = \prod_{i=1}^{N} E_{h_i}\{g_i(x_i)\}$, which can be estimated either jointly (the average of the products) or marginally (the product of the averages).
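A minimal numerical illustration (not thesis code) of the two estimators for a product of two independent lognormal variables, where the true value is known:

```r
set.seed(1)
R <- 1e4
x1 <- rlnorm(R); x2 <- rlnorm(R)        # independent draws from h1, h2
joint    <- mean(x1 * x2)               # average of the products
marginal <- mean(x1) * mean(x2)         # product of the averages
true_val <- exp(0.5)^2                  # E(X1)E(X2) for lognormal(0,1) factors
c(joint = joint, marginal = marginal, true = true_val)
```

Replicating this experiment many times shows that the joint estimator has the larger variance, in line with the Goodman (1962) results discussed next.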

14
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration under independence
The two estimators are associated with different MCEs. Based on the early results of Goodman (1962) for the variance of a product of N independent variables, the variances of the estimators are
$$\mathrm{Var}(\hat{\mu}_{joint}) = \frac{1}{R}\left\{\prod_{i=1}^{N}(\mu_i^2 + \sigma_i^2) - \prod_{i=1}^{N}\mu_i^2\right\}, \qquad \mathrm{Var}(\hat{\mu}_{marginal}) = \prod_{i=1}^{N}\left(\mu_i^2 + \frac{\sigma_i^2}{R}\right) - \prod_{i=1}^{N}\mu_i^2,$$
where $\mu_i$ and $\sigma_i^2$ denote the mean and variance of each term. In finite settings, the difference can be outstanding.

15
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration under independence
In particular, the difference in the variances naturally depends on R. Note, however, that it also depends on
• the dimensionality (N), since more positive terms are added, and
• the means and variances of the N variables involved.

At the same time, the difference in the means of the two estimators is given by the total covariation index (TCI), a multivariate extension of the covariance:
• Under independence the index should be zero (the reverse statement does not hold).
• In the sample, the covariances, no matter how small, are non-zero, leading to a non-zero TCI.
• The TCI also depends on the number of variables (N), their means, and their variation through the covariances.
16
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration: the case of GLLVM
A motivating example, revisited

Under the two formulations, different variables are being averaged, leading to different variance components. The total covariance cancels out for the BH.
17
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Monte Carlo integration & independence
Refer to Chapter 3 of the current thesis for:
• more results on the error difference,
• properties of the TCI,
• extension to conditional independence,
• and more illustrative examples.

18
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Basic idea
Based on the work of Chib & Jeliazkov (2001), it is shown in Chapter 4 that the Metropolis kernel can be used to marginalise out any subset of the parameter vector when this would not otherwise be feasible.
• Consider the kernel of the Metropolis-Hastings algorithm, which denotes the transition probability of sampling $\boldsymbol{\theta}'$, given that $\boldsymbol{\theta}$ has already been generated:
$$k(\boldsymbol{\theta}' \mid \boldsymbol{\theta}) = \alpha(\boldsymbol{\theta}, \boldsymbol{\theta}')\, q(\boldsymbol{\theta}' \mid \boldsymbol{\theta}),$$
where $k$ is the transition probability, $\alpha$ the acceptance probability and $q$ the proposal density.
• Then, the latent vector can be marginalised out directly from the Metropolis kernel.
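For reference, a generic one-block random-walk Metropolis step in R (a minimal sketch, not the thesis algorithm), with the kernel components made explicit:

```r
## One random-walk Metropolis step for a parameter vector theta.
mh_step <- function(theta, log_post, sd = 0.5) {
  prop  <- rnorm(length(theta), theta, sd)                # proposal density q
  alpha <- min(1, exp(log_post(prop) - log_post(theta)))  # acceptance probability
  if (runif(1) < alpha) prop else theta                   # realised transition
}
```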

19
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Chib & Jeliazkov estimator
Let us suppose that the parameter space is divided into p blocks of parameters. Then, by the multiplication rule of probability, the posterior at a specific point $\tilde{\boldsymbol{\theta}}$ can be decomposed as
$$f(\tilde{\boldsymbol{\theta}} \mid \mathbf{y}) = \prod_{b=1}^{p} f(\tilde{\boldsymbol{\theta}}_b \mid \mathbf{y}, \tilde{\boldsymbol{\theta}}_1, \ldots, \tilde{\boldsymbol{\theta}}_{b-1}).$$
• If the posterior ordinate is analytically available, the candidate's formula (Besag, 1989) computes the BML directly.
• If the full conditionals are known, Chib (1995) uses the output from the Gibbs sampler to estimate them.
• Otherwise, Chib and Jeliazkov (2001) show that each posterior ordinate can be computed from the MH output, which requires p sequential (reduced) MCMC runs.
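A minimal sketch of the one-block Chib and Jeliazkov identity (toy normal-mean model with a known BML; not thesis code): the posterior ordinate at a point $\tilde\theta$ is the ratio of the posterior expectation of $\alpha(\theta, \tilde\theta)\,q(\tilde\theta \mid \theta)$ to the proposal expectation of $\alpha(\tilde\theta, \theta)$.

```r
set.seed(1)
y <- rnorm(50, 1, 1)                 # data with known unit variance
tau2 <- 4                            # prior variance for the mean
lpost <- function(m) sum(dnorm(y, m, 1, log = TRUE)) + dnorm(m, 0, sqrt(tau2), log = TRUE)
s <- 0.5                             # random-walk proposal sd

M <- 2e4; draws <- numeric(M); cur <- 0          # Metropolis run
for (i in 1:M) {
  prop <- rnorm(1, cur, s)
  if (log(runif(1)) < lpost(prop) - lpost(cur)) cur <- prop
  draws[i] <- cur
}

mstar <- mean(draws)                             # point of high posterior density
lp_draws <- sapply(draws, lpost); lp_star <- lpost(mstar)
num <- mean(pmin(1, exp(lp_star - lp_draws)) * dnorm(mstar, draws, s))
prop2 <- rnorm(M, mstar, s)                      # draws from q(. | mstar)
den <- mean(pmin(1, exp(sapply(prop2, lpost) - lp_star)))
log_bml_cj <- lp_star - log(num / den)           # log f(y|m*)f(m*) - log fhat(m*|y)

n <- length(y); a <- n + 1 / tau2                # exact BML for this conjugate model
log_bml_exact <- -n / 2 * log(2 * pi) - 0.5 * log(n * tau2 + 1) -
  0.5 * sum(y^2) + (n * mean(y))^2 / (2 * a)
c(CJ = log_bml_cj, exact = log_bml_exact)
```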

20
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Chib & Jeliazkov estimator for models with latent vectors
The number of latent variables can be in the hundreds, if not thousands; hence the method is time consuming. Chib & Jeliazkov suggest using the last ordinate to marginalise out the latent vector, provided that $f(\mathbf{y} \mid \boldsymbol{\theta})$ is analytically tractable (often it is not).
In Chapter 4 of the thesis, it is shown that the latent vector can be marginalised out directly from the MH kernel. Hence the dimension of the latent vector is not an issue.

This observation, however, leads to another result. Assuming local independence, prior independence and a Metropolis-within-Gibbs algorithm, as in the case of the GLLVM, the Chib & Jeliazkov identity is drastically simplified. Hence the number of blocks is not an issue either.

• The latent vector is marginalised out as previously.
• Moreover, even though there are p blocks of model parameters, only the full MCMC run is required.
• The identity can also be used under data augmentation schemes that produce independence.

21
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Independence Chib & Jeliazkov estimator
Three simulated data sets were considered under different scenarios, comparing the CJI with ML estimators. The estimates were computed over 30 batches of 1,000, 2,000 and 3,000 iterations each (Rtotal).
22
Chapter 6

Implementation in simulated and real life datasets

Some results: p = 6 items, N = 600 individuals, k = 1 factor (kmodel = ktrue).

23
Chapter 6

Implementation in simulated and real life datasets

Some results: p = 6 items, N = 600 individuals, k = 2 factors (kmodel = ktrue).

24
Chapter 6

Implementation in simulated and real life datasets

Some results: p = 8 items, N = 700 individuals, k = 3 factors (kmodel = ktrue).

25
Chapter 6

Implementation in simulated and real life datasets

Some results: p = 6 items, N = 600 individuals, k = 1 factor (kmodel < ktrue).

26
Chapter 6

Implementation in simulated and real life datasets

Some results: p = 6 items, N = 600 individuals, k = 2 factors (kmodel > ktrue).

27
Chapter 6

Implementation in simulated and real life datasets

Concluding comments
 Refer to Chapter 4 of the current thesis for more details on the implementation of the CJI (or see Vitoratou et al., 2013).
More comparisons are presented in Chapter 6 of the thesis, in simulated and real data sets. Some comments:
• The harmonic mean failed in all cases.
• The BSE were successful in all examples.
o The BG estimator was consistently associated with the smallest error.
o The RM was also well behaved in all cases.
o The BH was associated with more error than the former two BSE.
• The PBE are well behaved:
o LM is very quick and efficient, but might fail if the posterior is not symmetric.
o Similarly for the GC.
o The CJI is well behaved but time consuming. Since it is distribution free, it can be used as a benchmark method to get an idea of the BML.
28
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamics and Bayes
Ideas initially implemented in thermodynamics are currently explored in Bayesian model evaluation. Assume two unnormalised densities, q1 and q0, with normalising constants z1 and z0, and suppose we are interested in their ratio λ = z1/z0. For that purpose we use a continuous and differentiable function, the geometric path, which links the endpoint densities via a temperature parameter t ∈ [0, 1]:
$$q_t(\boldsymbol{\theta}) = q_1(\boldsymbol{\theta})^{\,t}\, q_0(\boldsymbol{\theta})^{\,1-t},$$
with normalised density $p_t = q_t / z_t$ (the Boltzmann-Gibbs distribution), where $z_t = \int q_t(\boldsymbol{\theta})\, d\boldsymbol{\theta}$ is the partition function.

Then the ratio λ can be computed via the thermodynamic integration (TI) identity:
$$\log \lambda = \log \frac{z_1}{z_0} = \int_0^1 E_{p_t}\left\{ \log \frac{q_1(\boldsymbol{\theta})}{q_0(\boldsymbol{\theta})} \right\} dt,$$
the quantity $\log \lambda$ being referred to as the Bayes free energy.
29
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamics and BML: Power Posteriors
The first application of the TI to the problem of estimating the BML is the power posteriors (PP) method (Friel and Pettitt, 2008; Lartillot and Philippe, 2006). Let
$$q_0(\boldsymbol{\theta}) = f(\boldsymbol{\theta}) \quad \text{and} \quad q_1(\boldsymbol{\theta}) = f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta});$$
then the geometric path becomes the prior-posterior path
$$q_t(\boldsymbol{\theta}) = f(\mathbf{y} \mid \boldsymbol{\theta})^{\,t} f(\boldsymbol{\theta}),$$
with normalised density $p_t(\boldsymbol{\theta}) \propto f(\mathbf{y} \mid \boldsymbol{\theta})^{\,t} f(\boldsymbol{\theta})$, the power posterior, leading via the thermodynamic integration to the Bayesian marginal likelihood:
$$\log f(\mathbf{y}) = \int_0^1 E_{p_t}\{\log f(\mathbf{y} \mid \boldsymbol{\theta})\}\, dt.$$
For values of t close to 0 we sample from densities close to the prior, where the variability is typically high.
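A minimal sketch of the PPT estimator on the toy beta-binomial model (not thesis code): under a Uniform(0,1) prior each power posterior is a Beta density, so it can be sampled exactly instead of by MCMC, and the TI integral is approximated by the trapezoidal rule.

```r
set.seed(1)
y <- rbinom(20, 1, 0.8); s <- sum(y); n <- length(y)
loglik <- function(theta) sum(dbinom(y, 1, theta, log = TRUE))

ts <- (0:50 / 50)^5                 # temperature schedule, dense near t = 0
Elog <- sapply(ts, function(t) {
  th <- rbeta(5e3, t * s + 1, t * (n - s) + 1)  # exact power-posterior draws
  mean(sapply(th, loglik))                      # E_t{ log f(y | theta) }
})
log_bml_ti <- sum(diff(ts) * (head(Elog, -1) + tail(Elog, -1)) / 2)  # trapezoid
c(TI = log_bml_ti, exact = lbeta(s + 1, n - s + 1))
```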
30
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamics and BML: Importance Posteriors
Lefebvre et al. (2010) considered options other than the prior for the zero endpoint, keeping the unnormalised posterior at the unit endpoint. Any proper density g(θ) will do:
$$q_0(\boldsymbol{\theta}) = g(\boldsymbol{\theta}), \qquad q_1(\boldsymbol{\theta}) = f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta}).$$
An appealing option is to use an importance (envelope) function, that is, a density as close as possible to the posterior. The geometric path then becomes the importance-posterior path
$$q_t(\boldsymbol{\theta}) = \{f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})\}^{t}\, g(\boldsymbol{\theta})^{1-t},$$
with the importance posterior $p_t \propto q_t$. For values of t close to 0 we sample from densities close to the importance function, solving the problem of high variability.
31
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

An alternative approach: stepping-stone identities
Xie et al. (2011), using the prior and the posterior as endpoint densities, considered a different approach to compute the BML, also related to thermodynamics (Neal, 1993). First, the interval [0,1] is partitioned into n points and the free energy is computed via the stepping-stone identity
$$\frac{z_1}{z_0} = \prod_{i=1}^{n} \frac{z_{t_i}}{z_{t_{i-1}}}, \qquad \widehat{\frac{z_{t_i}}{z_{t_{i-1}}}} = \frac{1}{R} \sum_{r=1}^{R} \frac{q_{t_i}(\boldsymbol{\theta}^{(r)})}{q_{t_{i-1}}(\boldsymbol{\theta}^{(r)})}, \quad \boldsymbol{\theta}^{(r)} \sim p_{t_{i-1}}.$$
• Under the power-posteriors path, Xie et al. (2011) showed how the BML arises from this identity.
• Under the importance-posteriors path, Fan et al. (2011) derived the corresponding (generalised stepping-stone) estimator.
However, the stepping-stone identity (SI) is even more general and can be used under different paths, as an alternative to the TI.
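The same toy beta-binomial example, now with the stepping-stone identity under the prior-posterior path (a minimal sketch, not thesis code); each ratio is estimated by importance sampling from the preceding power posterior, with a log-sum-exp step for stability:

```r
set.seed(1)
y <- rbinom(20, 1, 0.8); s <- sum(y); n <- length(y)
loglik <- function(theta) sum(dbinom(y, 1, theta, log = TRUE))

ts <- (0:50 / 50)^5
log_bml_ss <- 0
for (i in 2:length(ts)) {
  th <- rbeta(5e3, ts[i - 1] * s + 1, ts[i - 1] * (n - s) + 1)  # draws at t_{i-1}
  lw <- (ts[i] - ts[i - 1]) * sapply(th, loglik)   # log{ q_{t_i} / q_{t_{i-1}} }
  log_bml_ss <- log_bml_ss + max(lw) + log(mean(exp(lw - max(lw))))
}
c(SS = log_bml_ss, exact = lbeta(s + 1, n - s + 1))
```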

32
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Path sampling identities for the BML- revisited
Hence, there are two general identities to compute a ratio of normalising constants within the path sampling framework, the TI and the SI. Different paths lead to different expressions for the BML:

Path: prior-posterior
• TI: Power posteriors (PPT; Friel and Pettitt, 2008; Lartillot and Philippe, 2006)
• SI: Stepping-stone (PPS; Xie et al., 2011)

Path: importance-posterior
• TI: Importance posteriors (IPT; inspired by Lefebvre et al., 2010)
• SI: Generalised stepping-stone (IPS; Fan et al., 2011)

Other paths can be used, under both approaches, to derive identities for the BML or any other ratio of normalising constants. Hereafter, the identities will be named by the path employed, with a subscript denoting the method implemented, e.g. IPS.
33
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamics & direct BF identities: Model switching
Lartillot and Philippe (2006) considered as endpoint densities the unnormalised posteriors of two competing models,
$$q_i(\boldsymbol{\theta}_i) = f(\mathbf{y} \mid \boldsymbol{\theta}_i, m_i)\, f(\boldsymbol{\theta}_i \mid m_i), \quad i = 0, 1,$$
leading to the model-switching path
$$q_t = q_1^{\,t}\, q_0^{\,1-t},$$
and, via the thermodynamic integration, directly to the Bayes factor, within a bidirectional melting-annealing sampling scheme. The SI counterpart expression is also easy to derive.
34
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamics & direct BF identities: Quadrivials
Building on the idea of Lartillot and Philippe (2006), we may proceed with compound paths, which consist of
• a hyper (geometric) path, which links the two competing models, and
• a nested (geometric) path for each endpoint function Qi, i = 0, 1.

The two intersecting paths form a quadrivial, which can be used either with the TI or the SI approach. If the ratio of interest is the BF, the two BMLs should be derived at the endpoints of [0,1]. The PP and the IP paths are natural choices for the nested part of the identity.
35
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Sources of error in path sampling estimators
a) The integral over [0,1] in the TI is typically approximated via numerical approaches, such as the trapezoidal or Simpson's rule (Neal, 1993; Gelman and Meng, 1998), which require an n-point discretisation of [0,1] (the temperature schedule).

Note that the temperature schedule is also required for the SI method (it defines the stepping-stone ratios). The discretisation introduces error to the TI and SI estimators, referred to as the discretisation error. It can be reduced (a) by increasing the number of points n and/or (b) by assigning more points closer to the endpoint that is associated with higher variability.

b) At each point $t_i$, a separate MCMC run is performed with target distribution $p_{t_i}$. Hence, Monte Carlo error also occurs at each run.

c) The path-related error can be considered a third source of error.
We may gain insight into a) and c) by considering the measures of entropy related to the TI.
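For instance, a schedule commonly used in the power-posteriors literature places points as $t_i = (i/n)^c$, concentrating them near t = 0 (a small sketch; the value c = 5 is illustrative):

```r
n <- 20; c_exp <- 5
ts_uniform   <- (0:n) / n            # evenly spaced temperatures
ts_geometric <- ((0:n) / n)^c_exp    # more points near t = 0, where p_t ~ prior
round(ts_geometric, 4)
```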
36
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Performance: Pine data-a simple regression example
Measurements were taken on 42 specimens. A linear regression model was fitted for each specimen's maximum compressive strength (y), using its density (x) as the independent variable.
The objective in this example is to illustrate how each method and path combination responds to prior uncertainty. To do so, we use three different prior schemes.

The ratios of the corresponding BMLs under the three priors were estimated over n1 = 50 and n2 = 100 evenly spaced temperatures. At each temperature, a Gibbs algorithm was implemented and 30,000 posterior observations were generated, after discarding 5,000 as a burn-in period.
37
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Performance: Pine data-a simple regression example
Implementing a uniform temperature schedule (figure annotations):
• differences between paths reflect the difference in the path-related error,
• differences between n1 and n2 reflect the difference in the discretisation error,
• all quadrivials come with a smaller batch mean error.
Note: PP works just fine under a geometric temperature schedule that samples more points from the prior.

38
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
Based on the prior-posterior path, Friel and Pettitt (2008) and Lefebvre et al. (2010) showed that the PP method is connected with the Kullback-Leibler divergence (KL; Kullback & Leibler, 1951), which can be expressed in terms of the relative, differential and cross entropies:
$$KL(p_1 \,\|\, p_0) = \int p_1(\boldsymbol{\theta}) \log \frac{p_1(\boldsymbol{\theta})}{p_0(\boldsymbol{\theta})}\, d\boldsymbol{\theta}.$$
Here we present their findings in a general form, that is, for any geometric path. According to the TI, it holds that
$$J(p_0, p_1) = KL(p_1 \,\|\, p_0) + KL(p_0 \,\|\, p_1) = E_{p_1}\left\{\log \frac{q_1}{q_0}\right\} - E_{p_0}\left\{\log \frac{q_1}{q_0}\right\},$$
the symmetrised KL (J) divergence.
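A quick numerical check of this symmetrised-KL expression for two normal endpoint densities (a sketch; for equal-variance normals, J(p0, p1) = (μ1 − μ0)²/σ²):

```r
set.seed(1)
lr <- function(x) dnorm(x, 2, 1, log = TRUE) - dnorm(x, 0, 1, log = TRUE)
x1 <- rnorm(1e5, 2, 1)                 # draws from p1
x0 <- rnorm(1e5, 0, 1)                 # draws from p0
J_mc <- mean(lr(x1)) - mean(lr(x0))    # E_{p1}{log(p1/p0)} - E_{p0}{log(p1/p0)}
c(MC = J_mc, exact = (2 - 0)^2 / 1)
```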
39
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
Graphical representation of the TI (figure): what about the intermediate points?
40
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
TI minus the free energy at each point: instead of integrating the mean energy over the entire interval [0,1], there is an optimal temperature at which the mean energy equals the free energy.
41
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
Graphical representation of the NTI (figure): the functional KL, that is, the difference in the KL distance of the sampling distribution pt from p1 and from p0. The ratio of interest occurs at the point where the sampling distribution is equidistant from the endpoint densities.
42
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
The normalised thermodynamic integral (NTI). Hence:
• According to the PPT method, the BML occurs at the point where the sampling distribution is equidistant from the prior and the posterior.
• According to the QMST method, the BF occurs at the point where the sampling distribution is equidistant from the two posteriors.

The sampling distribution pt is the Boltzmann-Gibbs distribution pertaining to the Hamiltonian (energy function) $H(\boldsymbol{\theta}) = -\log\{q_1(\boldsymbol{\theta})/q_0(\boldsymbol{\theta})\}$. Therefore:
• According to the NTI, when geometric paths are employed, the free energy occurs at the point where the Boltzmann-Gibbs distribution is equidistant from the distributions at the endpoint states.
43
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
Graphical representation of the NTI (figure): what do the areas stand for?
44
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
The normalised thermodynamic integral and probability distribution divergencies
A key observation here is that the sampling distribution embodies the Chernoff coefficient (Chernoff, 1952),
$$c_t(p_0, p_1) = \int p_1(\boldsymbol{\theta})^{t}\, p_0(\boldsymbol{\theta})^{1-t}\, d\boldsymbol{\theta}.$$
Based on that, the NTI can be rewritten in terms of $-\log c_t$, and therefore the areas correspond to the Chernoff t-divergence. At t = t*, we obtain the so-called Chernoff information:
$$C(p_0, p_1) = -\log c_{t^*}(p_0, p_1) = \max_{t \in (0,1)} \{-\log c_t(p_0, p_1)\}.$$
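A small sketch (assuming two univariate normal endpoint densities) that computes the Chernoff t-coefficient numerically and locates t* by optimisation; for equal-variance normals t* = 0.5, where the Chernoff divergence coincides with the Bhattacharyya distance:

```r
chernoff_coef <- function(t, m0 = 0, s0 = 1, m1 = 2, s1 = 1) {
  integrate(function(x) dnorm(x, m1, s1)^t * dnorm(x, m0, s0)^(1 - t),
            -Inf, Inf)$value
}
cher_div <- function(t) -log(chernoff_coef(t))      # Chernoff t-divergence
opt <- optimize(cher_div, interval = c(0.01, 0.99), maximum = TRUE)
c(t_star = opt$maximum, chernoff_information = opt$objective)
```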

45
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
Using the output from path sampling, the Chernoff divergence can be computed easily (see Chapter 5 of the thesis for a step-by-step algorithm). Along with the Chernoff estimate, a number of other f-divergences can be directly estimated, namely
• the Bhattacharyya distance (Bhattacharyya, 1943) at t = 0.5,
• the Hellinger distance (Bhattacharyya, 1943; Hellinger, 1909),
• the Rényi t-divergence (Rényi, 1961) and
• the Tsallis t-relative entropy (Tsallis, 2001).
These measures of entropy are commonly used in
• information theory, pattern recognition, cryptography, machine learning,
• hypothesis testing,
• and, recently, non-equilibrium thermodynamics.

46
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Thermodynamic integration & distribution divergencies
Measures of entropy and the NTI

47
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Path selection, temperature schedule and error.
These results also provide insight into the error of the path sampling estimators. To begin with, Lefebvre et al. (2010) showed that the total variance is associated with the J-divergence of the endpoint densities, and therefore with the choice of the path. Graphically:
• The J-distance coincides with the slope of the secant defined at the endpoint densities; the shape of the curve is a graphical representation of the total variance.
• The slope of the tangent at a particular point ti coincides with the local variance; local variances are higher at the points where the curve is steeper.
• The graphical representation of two competing paths provides information about the estimators' variances: paths with smaller cliffs are easier to take!

48
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Path selection, temperature schedule and error.
Numerical approximation of the TI: assign more points ti where the curve is steeper (higher local variances). This yields a different level of accuracy towards the two endpoints; the discretisation error depends primarily on the path.
49
Future work
Currently developing an R library for BML estimation in the GLLTM, with Danny Arends.

Expand the results (and the R library) to account for other types of data.
Further study the TCI (Chapter 3).
Use the ideas in Chapter 4 to construct a better Metropolis algorithm for GLLVMs.
Proceed further with the ideas presented in Chapter 5, with regard to the quadrivials, the temperature schedule and the optimal t*. Explore applications to information criteria.

50
Bibliography
Bartholomew, D. and Knott, M. (1999). Latent variable models and factor analysis. Kendall’s Library of Statistics, 7. Wiley.
Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of
the Calcutta Mathematical Society, 35:99–109.
Besag, J. (1989). A candidate’s formula: A curious result in Bayesian prediction. Biometrika, 76:183.
Bock, R. and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika,
46:443–459.
Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical
Statistics, 23(4).
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313–1321.
Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association,
96:270–281.
Fan, Y., Wu, R., Chen, M., Kuo, L., and Lewis, P. (2011). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and
Evolution, 28(2):523–532.
Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective
measurement of quality of healthcare. Annals of Applied Statistics, 3:663–690.
Friel, N. and Pettitt, N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society Series B (Statistical
Methodology), 70(3):589–607.
Gelfand, A. E. and Dey, D. K. (1994). Bayesian Model Choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society. Series
B (Methodological), 56(3):501–514.
Gelman, A. and Meng, X. (1998). Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical
Science, 13(2):163–185.
Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association, 57:54–60.
Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und
angewandte Mathematik, 136:210–271.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A,
Mathematical and Physical Sciences, 186(1007):453–461.
Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:773–795.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:49–86.
Lewis, S. and Raftery, A. (1997). Estimating Bayes factors via posterior simulation with the Laplace Metropolis estimator. Journal of the
American Statistical Association, 92:648–655.
Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using Thermodynamic Integration. Systematic Biology, 55:195–207.
Lefebvre, G., Steele, R., and Vandal, A. C. (2010). A path sampling identity for computing the Kullback-Leibler and J divergences.
Computational Statistics and Data Analysis, 54(7):1719–1731.
Lord, F. M. (1980). Applications of Item Response Theory to practical testing problems.Erlbaum Associates, Hillsdale, NJ.
Lord, F. M. and Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley, Oxford, UK

51
Meng, X.-L. and Wong, W.-H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica
Sinica, 6:831–860.
Moustaki, I. and Knott, M. (2000). Generalized Latent Trait Models. Psychometrika, 65:391–411.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods.Technical Report CRG-TR-93-1, University of Toronto.
Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, 56:3–48.
Nott, D., Kohn, R., and Fielding, M. (2008). Approximating the marginal likelihood using copula. arXiv:0810.5474v1. Available at
http://arxiv.org/abs/0810.5474v1
Ntzoufras, I., Dellaportas, P., and Forster, J. (2000). Bayesian variable and link determination for Generalised Linear Models. Journal of Statistical Planning
and Inference,111(1-2):165–180.
Patz, R. J. and Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational
and Behavioral Statistics, 24(2):146–178.
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models
with nested random effects. Journal of Econometrics, 128:301–323.
Raftery, A. and Banfield, J. (1991). Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of
Statistical Mathematics, 43(430):32–43.
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Paedagogiske Institut, Copenhagen.
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pages
547–561.
Tsallis, C. (2001). In Abe, S. and Okamoto, Y., editors, Nonextensive Statistical Mechanics and Its Applications. Springer-Verlag, Heidelberg. See also the
comprehensive list of references at http://tsallis.cat.cbpf.br/biblio.htm.
Vitoratou, S., Ntzoufras, I., and Moustaki, I. (2013). Marginal likelihood estimation from the Metropolis output: tips and tricks for efficient implementation in
generalized linear latent variable models. To appear in: Journal of Statistical Computation and Simulation.
Xie, W., Lewis, P., Fan, Y., Kuo, L., and Chen, M. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic
Biology, 60(2):150–160.

This thesis is dedicated to

52

Weitere ähnliche Inhalte

Was ist angesagt?

Final generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugcFinal generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugcId'rees Waris
 
Generalized Linear Models
Generalized Linear ModelsGeneralized Linear Models
Generalized Linear ModelsAvinash Chamwad
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESijnlc
 
Structural Dynamic Reanalysis of Beam Elements Using Regression Method
Structural Dynamic Reanalysis of Beam Elements Using Regression MethodStructural Dynamic Reanalysis of Beam Elements Using Regression Method
Structural Dynamic Reanalysis of Beam Elements Using Regression MethodIOSR Journals
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONijnlc
 
Use of eigenvalues and eigenvectors to analyze bipartivity of network graphs
Use of eigenvalues and eigenvectors to analyze bipartivity of network graphsUse of eigenvalues and eigenvectors to analyze bipartivity of network graphs
Use of eigenvalues and eigenvectors to analyze bipartivity of network graphscsandit
 
PRML Chapter 5
PRML Chapter 5PRML Chapter 5
PRML Chapter 5Sunwoo Kim
 
Capital market applications of neural networks etc
Capital market applications of neural networks etcCapital market applications of neural networks etc
Capital market applications of neural networks etc23tino
 
FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...
FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...
FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...ijfls
 
Novel algorithms for detection of unknown chemical molecules with specific bi...
Novel algorithms for detection of unknown chemical molecules with specific bi...Novel algorithms for detection of unknown chemical molecules with specific bi...
Novel algorithms for detection of unknown chemical molecules with specific bi...Aboul Ella Hassanien
 
Morse-Smale Regression for Risk Modeling
Morse-Smale Regression for Risk ModelingMorse-Smale Regression for Risk Modeling
Morse-Smale Regression for Risk ModelingColleen Farrelly
 
PRML Chapter 9
PRML Chapter 9PRML Chapter 9
PRML Chapter 9Sunwoo Kim
 
Islamic University Pattern Recognition & Neural Network 2019
Islamic University Pattern Recognition & Neural Network 2019 Islamic University Pattern Recognition & Neural Network 2019
Islamic University Pattern Recognition & Neural Network 2019 Rakibul Hasan Pranto
 
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...IJECEIAES
 
PRML Chapter 8
PRML Chapter 8PRML Chapter 8
PRML Chapter 8Sunwoo Kim
 
PRML Chapter 4
PRML Chapter 4PRML Chapter 4
PRML Chapter 4Sunwoo Kim
 

Was ist angesagt? (20)

Final generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugcFinal generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugc
 
Generalized Linear Models
Generalized Linear ModelsGeneralized Linear Models
Generalized Linear Models
 
10.1.1.34.7361
10.1.1.34.736110.1.1.34.7361
10.1.1.34.7361
 
C1802041824
C1802041824C1802041824
C1802041824
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
 
Structural Dynamic Reanalysis of Beam Elements Using Regression Method
Structural Dynamic Reanalysis of Beam Elements Using Regression MethodStructural Dynamic Reanalysis of Beam Elements Using Regression Method
Structural Dynamic Reanalysis of Beam Elements Using Regression Method
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
 
A0310112
A0310112A0310112
A0310112
 
Use of eigenvalues and eigenvectors to analyze bipartivity of network graphs
Use of eigenvalues and eigenvectors to analyze bipartivity of network graphsUse of eigenvalues and eigenvectors to analyze bipartivity of network graphs
Use of eigenvalues and eigenvectors to analyze bipartivity of network graphs
 
PRML Chapter 5
PRML Chapter 5PRML Chapter 5
PRML Chapter 5
 
Capital market applications of neural networks etc
Capital market applications of neural networks etcCapital market applications of neural networks etc
Capital market applications of neural networks etc
 
FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...
FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...
FUZZY CONTROL OF A SERVOMECHANISM: PRACTICAL APPROACH USING MAMDANI AND TAKAG...
 
Matters of State
Matters of StateMatters of State
Matters of State
 
Novel algorithms for detection of unknown chemical molecules with specific bi...
Novel algorithms for detection of unknown chemical molecules with specific bi...Novel algorithms for detection of unknown chemical molecules with specific bi...
Novel algorithms for detection of unknown chemical molecules with specific bi...
 
Morse-Smale Regression for Risk Modeling
Morse-Smale Regression for Risk ModelingMorse-Smale Regression for Risk Modeling
Morse-Smale Regression for Risk Modeling
 
PRML Chapter 9
PRML Chapter 9PRML Chapter 9
PRML Chapter 9
 
Islamic University Pattern Recognition & Neural Network 2019
Islamic University Pattern Recognition & Neural Network 2019 Islamic University Pattern Recognition & Neural Network 2019
Islamic University Pattern Recognition & Neural Network 2019
 
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...
 
PRML Chapter 8
PRML Chapter 8PRML Chapter 8
PRML Chapter 8
 
PRML Chapter 4
PRML Chapter 4PRML Chapter 4
PRML Chapter 4
 

Andere mochten auch

An Overview of ROC Curves in SAS PROC LOGISTIC
An Overview of ROC Curves in SAS PROC LOGISTICAn Overview of ROC Curves in SAS PROC LOGISTIC
An Overview of ROC Curves in SAS PROC LOGISTICQuanticate
 
Quantitative methods
Quantitative methodsQuantitative methods
Quantitative methodsHimanshu Shah
 
MLPI Lecture 2: Monte Carlo Methods (Basics)
MLPI Lecture 2: Monte Carlo Methods (Basics)MLPI Lecture 2: Monte Carlo Methods (Basics)
MLPI Lecture 2: Monte Carlo Methods (Basics)Dahua Lin
 
Anderson-Darling Test and ROC Curve
Anderson-Darling Test and ROC CurveAnderson-Darling Test and ROC Curve
Anderson-Darling Test and ROC CurveKavi
 
DV Analytics and SAS Training in Bangalore
DV Analytics and SAS Training in BangaloreDV Analytics and SAS Training in Bangalore
DV Analytics and SAS Training in BangaloreDV Analytics
 
Improving Effeciency with Options in SAS
Improving Effeciency with Options in SASImproving Effeciency with Options in SAS
Improving Effeciency with Options in SASguest2160992
 
General Introduction to ROC Curves
General Introduction to ROC CurvesGeneral Introduction to ROC Curves
General Introduction to ROC CurvesAustin Powell
 
Quantitative method intro variable_levels_measurement
Quantitative method intro variable_levels_measurementQuantitative method intro variable_levels_measurement
Quantitative method intro variable_levels_measurementKeiko Ono
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questionsDr P Deepak
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitizationVenkata Reddy Konasani
 
How to read a receiver operating characteritic (ROC) curve
How to read a receiver operating characteritic (ROC) curveHow to read a receiver operating characteritic (ROC) curve
How to read a receiver operating characteritic (ROC) curveSamir Haffar
 
Kendall coefficient of concordance
Kendall coefficient of concordance Kendall coefficient of concordance
Kendall coefficient of concordance Zha Jie
 

Andere mochten auch (20)

An Overview of ROC Curves in SAS PROC LOGISTIC
An Overview of ROC Curves in SAS PROC LOGISTICAn Overview of ROC Curves in SAS PROC LOGISTIC
An Overview of ROC Curves in SAS PROC LOGISTIC
 
Quantitative methods
Quantitative methodsQuantitative methods
Quantitative methods
 
MLPI Lecture 2: Monte Carlo Methods (Basics)
MLPI Lecture 2: Monte Carlo Methods (Basics)MLPI Lecture 2: Monte Carlo Methods (Basics)
MLPI Lecture 2: Monte Carlo Methods (Basics)
 
Anderson-Darling Test and ROC Curve
Anderson-Darling Test and ROC CurveAnderson-Darling Test and ROC Curve
Anderson-Darling Test and ROC Curve
 
Variance reduction techniques (vrt)
Variance reduction techniques (vrt)Variance reduction techniques (vrt)
Variance reduction techniques (vrt)
 
DV Analytics and SAS Training in Bangalore
DV Analytics and SAS Training in BangaloreDV Analytics and SAS Training in Bangalore
DV Analytics and SAS Training in Bangalore
 
Non Parametric Statistics
Non Parametric StatisticsNon Parametric Statistics
Non Parametric Statistics
 
linear classification
linear classificationlinear classification
linear classification
 
Improving Effeciency with Options in SAS
Improving Effeciency with Options in SASImproving Effeciency with Options in SAS
Improving Effeciency with Options in SAS
 
Excel/R
Excel/RExcel/R
Excel/R
 
General Introduction to ROC Curves
General Introduction to ROC CurvesGeneral Introduction to ROC Curves
General Introduction to ROC Curves
 
Quantitative method intro variable_levels_measurement
Quantitative method intro variable_levels_measurementQuantitative method intro variable_levels_measurement
Quantitative method intro variable_levels_measurement
 
R- Introduction
R- IntroductionR- Introduction
R- Introduction
 
Big data Introduction by Mohan
Big data Introduction by MohanBig data Introduction by Mohan
Big data Introduction by Mohan
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questions
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitization
 
Statistical Distributions
Statistical DistributionsStatistical Distributions
Statistical Distributions
 
Data analysis Design Document
Data analysis Design DocumentData analysis Design Document
Data analysis Design Document
 
How to read a receiver operating characteritic (ROC) curve
How to read a receiver operating characteritic (ROC) curveHow to read a receiver operating characteritic (ROC) curve
How to read a receiver operating characteritic (ROC) curve
 
Kendall coefficient of concordance
Kendall coefficient of concordance Kendall coefficient of concordance
Kendall coefficient of concordance
 

Ähnlich wie Viva extented final

A researcher in attempting to run a regression model noticed a neg.docx
A researcher in attempting to run a regression model noticed a neg.docxA researcher in attempting to run a regression model noticed a neg.docx
A researcher in attempting to run a regression model noticed a neg.docxevonnehoggarth79783
 
Samplying in Factored Dynamic Systems_Fadel.pdf
Samplying in Factored Dynamic Systems_Fadel.pdfSamplying in Factored Dynamic Systems_Fadel.pdf
Samplying in Factored Dynamic Systems_Fadel.pdfFadel Adoe
 
Recommender system
Recommender systemRecommender system
Recommender systemBhumi Patel
 
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...Zac Darcy
 
Prob and statistics models for outlier detection
Prob and statistics models for outlier detectionProb and statistics models for outlier detection
Prob and statistics models for outlier detectionTrilochan Panigrahi
 
Logistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental researchLogistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental researchAdrian Olszewski
 
Matt Fagan MSM Analysis
Matt Fagan MSM AnalysisMatt Fagan MSM Analysis
Matt Fagan MSM AnalysisMatt Fagan
 
1645 track2 brandenburger_lempola
1645 track2 brandenburger_lempola1645 track2 brandenburger_lempola
1645 track2 brandenburger_lempolaRising Media, Inc.
 
Multinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdfMultinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdfAlemAyahu
 
Advanced microeconometric project
Advanced microeconometric projectAdvanced microeconometric project
Advanced microeconometric projectLaurentCyrus
 
Decentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis ModelDecentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis ModelSayed Abulhasan Quadri
 

Ähnlich wie Viva extented final (20)

A researcher in attempting to run a regression model noticed a neg.docx
A researcher in attempting to run a regression model noticed a neg.docxA researcher in attempting to run a regression model noticed a neg.docx
A researcher in attempting to run a regression model noticed a neg.docx
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Samplying in Factored Dynamic Systems_Fadel.pdf
Samplying in Factored Dynamic Systems_Fadel.pdfSamplying in Factored Dynamic Systems_Fadel.pdf
Samplying in Factored Dynamic Systems_Fadel.pdf
 
Afta2.pptx
Afta2.pptxAfta2.pptx
Afta2.pptx
 
gamdependence_revision1
gamdependence_revision1gamdependence_revision1
gamdependence_revision1
 
Recommender system
Recommender systemRecommender system
Recommender system
 
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
 
Panel Data Models
Panel Data ModelsPanel Data Models
Panel Data Models
 
Prob and statistics models for outlier detection
Prob and statistics models for outlier detectionProb and statistics models for outlier detection
Prob and statistics models for outlier detection
 
Logistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental researchLogistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental research
 
Telecom customer churn prediction
Telecom customer churn predictionTelecom customer churn prediction
Telecom customer churn prediction
 
Matt Fagan MSM Analysis
Matt Fagan MSM AnalysisMatt Fagan MSM Analysis
Matt Fagan MSM Analysis
 
1645 track2 brandenburger_lempola
1645 track2 brandenburger_lempola1645 track2 brandenburger_lempola
1645 track2 brandenburger_lempola
 
Report
ReportReport
Report
 
Types of models
Types of modelsTypes of models
Types of models
 
Multinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdfMultinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdf
 
Advanced microeconometric project
Advanced microeconometric projectAdvanced microeconometric project
Advanced microeconometric project
 
Decentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis ModelDecentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis Model
 

Kürzlich hochgeladen

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
C) Model evaluation
• In this thesis, the Bayes factor (BF; Jeffreys, 1961; Kass and Raftery, 1995) was used for model comparison.
• The BF of two competing models (say m1 and m2) is defined as the ratio of their posterior odds to their prior odds. Provided that the models have equal prior probabilities, it is given by

BF12 = f(y|m1) / f(y|m2),

that is, the ratio of the two models' marginal or integrated likelihoods (hereafter Bayesian marginal likelihood; BML).
8
Chapter 2

Estimating the Bayesian marginal likelihood
The BML (also known as the prior predictive distribution) is defined as the expected model likelihood over the prior of the model parameters:

f(y) = ∫ f(y|θ) f(θ) dθ,

which is quite often a high-dimensional integral that is not available in closed form. Monte Carlo integration is often used to estimate it, for instance via the arithmetic mean of the likelihood over prior draws:

f̂(y) = (1/R) Σr f(y|θ(r)),   θ(r) ~ f(θ), r = 1, ..., R.

This simple estimator rarely works adequately in practice, and a plethora of Markov chain Monte Carlo (MCMC) techniques are employed instead in the literature.
9
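To make the instability concrete, here is a minimal sketch in R, assuming a conjugate normal toy model rather than a GLLVM, so that the exact BML is available for comparison:

```r
## Arithmetic-mean estimator of the BML on a toy model: y_i ~ N(theta, 1),
## prior theta ~ N(0, tau2). All settings below are illustrative assumptions.
set.seed(1)
n <- 50; tau2 <- 1
y <- rnorm(n, mean = 1.5, sd = 1)
loglik <- function(theta) sum(dnorm(y, theta, 1, log = TRUE))

a <- n + 1 / tau2                                     # posterior precision
log_bml_exact <- -0.5 * n * log(2 * pi) - 0.5 * log(n * tau2 + 1) -
  0.5 * (sum(y^2) - sum(y)^2 / a)                     # closed-form log BML

R <- 1e5
theta_prior <- rnorm(R, 0, sqrt(tau2))                # draws from the prior
ll <- vapply(theta_prior, loglik, numeric(1))
log_bml_am <- max(ll) + log(mean(exp(ll - max(ll))))  # stable log-mean-exp
c(exact = log_bml_exact, arithmetic_mean = log_bml_am)
```

Even in this one-dimensional example, the average is dominated by the few prior draws that land in the high-likelihood region, which is precisely why the estimator becomes unreliable as the dimension grows.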
Chapter 2

Estimating the Bayesian marginal likelihood
 The point-based estimators (PBE) employ the candidate's formula (Besag, 1989) at a point θ* of high posterior density,

f(y) = f(y|θ*) f(θ*) / f(θ*|y),

and differ in how the posterior ordinate f(θ*|y) is estimated:
• Laplace-Metropolis (LM; Lewis & Raftery, 1997)
• Gaussian copula (GC; Nott et al., 2008)
• Chib & Jeliazkov (CJ; Chib & Jeliazkov, 2001)
 The bridge sampling estimators (BSE) employ a bridge function, based on the form of which several BML identities (even pre-existing ones) can be derived:
• Harmonic mean (HM; Newton & Raftery, 1994)
• Reciprocal mean (RM; Gelfand & Dey, 1994)
• Bridge harmonic (BH; Meng & Wong, 1996)
• Bridge geometric (BG; Meng & Wong, 1996)
 The path sampling estimators (PSE) employ a continuous and differentiable path linking two unnormalised densities, in order to compute the ratio of the corresponding normalising constants:
• Power posteriors (PPT; Friel & Pettitt, 2008; Lartillot & Philippe, 2006)
• Steppingstone (PPS; Xie et al., 2011)
• Generalised steppingstone (IPS; Fan et al., 2011)
10
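As a hedged companion sketch (continuing the toy model above, where the posterior is conjugate and can be sampled exactly), the harmonic mean illustrates the identity 1/f(y) = Epost[1/f(y|θ)]:

```r
## Harmonic-mean estimator from exact posterior draws, N(S/a, 1/a); the
## identity is standard, the settings are illustrative assumptions.
S <- sum(y)
theta_post <- rnorm(R, S / a, sqrt(1 / a))
ll_post <- vapply(theta_post, loglik, numeric(1))
neg <- -ll_post                                       # log of 1/f(y|theta)
log_bml_hm <- -(max(neg) + log(mean(exp(neg - max(neg)))))
c(exact = log_bml_exact, harmonic_mean = log_bml_hm)
```

Its Monte Carlo error is notoriously large, which foreshadows the failures reported in Chapter 6.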
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration: the case of GLLVM
From the earliest literature, the methods applied for parameter estimation in models with latent variables relied either on the joint likelihood (Lord and Novick, 1968; Lord, 1980) or on the marginal likelihood (Bock and Aitkin, 1981; Moustaki and Knott, 2000). Under the conditional independence assumptions of the GLLVMs, there are likewise two equivalent formulations of the BML, which lead to different MC estimators, namely the joint BML

f(y) = ∫∫ f(y|z, θ) f(z) f(θ) dz dθ

and the marginal BML

f(y) = ∫ f(y|θ) f(θ) dθ,   where f(y|θ) = ∫ f(y|z, θ) f(z) dz.
11
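The distinction can be mimicked outside the GLLVM setting. The following R sketch uses an assumed toy model (not one of the thesis examples) in which yi | zi ~ N(zi, 1) and zi ~ N(0, 1) independently, so that f(y) = Πi N(yi; 0, 2) exactly:

```r
## Joint estimator (mean of products) versus marginal estimator (product of
## means) for the toy latent-variable model described above.
set.seed(2)
p <- 6
yy <- rnorm(p, 0, sqrt(2))                            # simulated items
R <- 1e4
z <- matrix(rnorm(R * p), R, p)                       # R prior draws of z
lik <- dnorm(matrix(yy, R, p, byrow = TRUE), mean = z, sd = 1)

log_joint    <- log(mean(apply(lik, 1, prod)))        # mean of row products
log_marginal <- sum(log(colMeans(lik)))               # product of column means
log_exact    <- sum(dnorm(yy, 0, sqrt(2), log = TRUE))
c(exact = log_exact, joint = log_joint, marginal = log_marginal)
```

Both estimators use the same draws, yet their Monte Carlo behaviour differs; this is the phenomenon studied in this chapter.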
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration: the case of GLLVM
A motivating example
A simulated data set with p = 6 items, N = 600 cases and k = 2 factors was considered. Three popular BSE were computed under both approaches (R = 50,000 posterior observations, after a burn-in period of 10,000 and a thinning interval of 10).
• BH: the largest difference in error, yet rather close estimates under the two approaches.
• BG: the largest difference in the estimates, without a large difference in error.
The differences are due to Monte Carlo integration under independence assumptions.
12
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration: the case of GLLVM
The joint version of BH comes with a much higher MCE than the RM...
...but it is the joint version of RM that fails to converge to the true value. Why?
13
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration under independence
• Consider any integral of the form

I = ∫ g1(x1) g2(x2) · · · gN(xN) h(x) dx.

• The corresponding MC estimator, assuming a random sample of points x(1), ..., x(R) drawn from h, is

Î = (1/R) Σr Πi gi(xi(r)).

• The corresponding Monte Carlo error (MCE) is

MCE(Î) = sqrt( Var[Πi gi(Xi)] / R ).

• Assume independence, that is, h(x) = Πi hi(xi); hence

I = Πi ∫ gi(xi) hi(xi) dxi,

so that I can alternatively be estimated by the product of the N univariate MC averages.
14
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration under independence
The two estimators are associated with different MCEs. Based on the early results of Goodman (1962) for the variance of a product of N independent variables, and writing μi and σi² for the mean and variance of each term gi(Xi), the variances of the estimators are

Var(Îjoint) = { Πi (σi² + μi²) - Πi μi² } / R,
Var(Îmarginal) = Πi (σi²/R + μi²) - Πi μi².

In finite settings, the difference can be substantial.
15
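These expressions are easy to verify numerically; a sketch with illustrative values of μi and σi² (not taken from the thesis) compares Goodman's formulas with the empirical variances of the two estimators:

```r
## Theoretical versus empirical variances of the mean-of-products (joint) and
## product-of-means (marginal) estimators, for independent normal terms.
set.seed(3)
mu <- c(1.0, 0.8, 1.2); s2 <- c(0.5, 0.4, 0.6); R <- 100

var_joint    <- (prod(s2 + mu^2) - prod(mu^2)) / R
var_marginal <- prod(s2 / R + mu^2) - prod(mu^2)

sim <- replicate(20000, {
  x <- matrix(rnorm(R * 3, mean = rep(mu, each = R), sd = rep(sqrt(s2), each = R)),
              R, 3)
  c(joint = mean(apply(x, 1, prod)), marginal = prod(colMeans(x)))
})
rbind(theory = c(joint = var_joint, marginal = var_marginal),
      empirical = apply(sim, 1, var))
```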
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration under independence
In particular, the difference in the variances naturally depends on R. Note, however, that it also depends on
• the dimensionality (N), since more positive terms are added, and
• the means and variances of the N variables involved.
At the same time, the difference in the means of the two estimators is given by the total covariation index (TCI),

TCI(X1, ..., XN) = E[Πi Xi] - Πi E[Xi],

a multivariate extension of the covariance:
• Under independence the index is zero (the reverse statement does not hold).
• In a finite sample the estimated covariances, no matter how small, are non-zero, leading to a non-zero TCI.
• The TCI also depends on the number of variables (N), their means, and their variation through the covariances.
16
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration: the case of GLLVM
A motivating example, revisited
Different variables are being averaged under the two approaches, leading to different variance components. The total covariance cancels out for the BH estimator.
17
Chapter 3

The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Monte Carlo integration & independence
Refer to Chapter 3 of the thesis for:
• more results on the error difference,
• properties of the TCI,
• the extension to conditional independence,
• and more illustrative examples.
18
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Basic idea
Building on the work of Chib & Jeliazkov (2001), it is shown in Chapter 4 that the Metropolis kernel can be used to marginalise out any subset of the parameter vector that otherwise would not be feasible to integrate out.
• Consider the kernel of the Metropolis-Hastings algorithm, which denotes the transition probability of sampling θ', given that θ has already been generated:

K(θ, θ') = α(θ, θ') q(θ'|θ),   θ' ≠ θ,

where α(θ, θ') is the acceptance probability and q(θ'|θ) the proposal density.
• Then the latent vector can be marginalised out directly from the Metropolis kernel, by averaging the conditional kernel over the posterior distribution of the latent vector.
19
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Chib & Jeliazkov estimator
Suppose that the parameter space is divided into p blocks of parameters. Then, using the law of total probability, the posterior ordinate at a specific point θ* can be decomposed as

f(θ*|y) = f(θ1*|y) f(θ2*|y, θ1*) · · · f(θp*|y, θ1*, ..., θp-1*).

• If analytically available, use the candidate's formula (Besag, 1989) to compute the BML directly.
• If the full conditionals are known, Chib (1995) uses the output from the Gibbs sampler to estimate them.
• Otherwise, Chib and Jeliazkov (2001) show that each posterior ordinate can be computed from the Metropolis-Hastings output as

f(θ*|y) = E1[ α(θ, θ*) q(θ*|θ) ] / E2[ α(θ*, θ) ],

where E1 is taken over the posterior f(θ|y) and E2 over the proposal q(θ|θ*). This requires p sequential (reduced) MCMC runs.
20
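For a single parameter block the identity reduces to a few lines of code. The sketch below (R, the same conjugate toy model as earlier, with illustrative tuning) recovers the exact log BML through the candidate's formula:

```r
## Chib & Jeliazkov (2001) posterior ordinate from random-walk Metropolis
## output; the model, proposal variance and run lengths are assumptions.
set.seed(4)
n <- 50; tau2 <- 1; y <- rnorm(n, 1.5, 1)
loglik  <- function(th) sum(dnorm(y, th, 1, log = TRUE))
logpost <- function(th) loglik(th) + dnorm(th, 0, sqrt(tau2), log = TRUE)
a <- n + 1 / tau2
log_bml_exact <- -0.5 * n * log(2 * pi) - 0.5 * log(n * tau2 + 1) -
  0.5 * (sum(y^2) - sum(y)^2 / a)

v <- 0.05; R <- 20000; chain <- numeric(R); th <- 0
for (r in 1:R) {                                      # random-walk Metropolis
  cand <- rnorm(1, th, sqrt(v))
  if (log(runif(1)) < logpost(cand) - logpost(th)) th <- cand
  chain[r] <- th
}
th_star <- mean(chain)                                # a point of high density
alpha <- function(from, to) min(1, exp(logpost(to) - logpost(from)))

## numerator: posterior expectation of alpha(theta, theta*) q(theta*|theta)
num <- mean(vapply(chain, function(t) alpha(t, th_star) *
                     dnorm(th_star, t, sqrt(v)), numeric(1)))
## denominator: expectation over q(.|theta*) of alpha(theta*, theta)
prop <- rnorm(R, th_star, sqrt(v))
den <- mean(vapply(prop, function(t) alpha(th_star, t), numeric(1)))

log_bml_cj <- loglik(th_star) + dnorm(th_star, 0, sqrt(tau2), log = TRUE) -
  (log(num) - log(den))                               # candidate's formula
c(exact = log_bml_exact, chib_jeliazkov = log_bml_cj)
```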
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Chib & Jeliazkov estimator for models with latent vectors
The number of latent variables can be in the hundreds, if not thousands, so the method is time consuming. Chib & Jeliazkov suggest using the last ordinate to marginalise out the latent vector, provided that it is analytically tractable (often it is not). In Chapter 4 of the thesis, it is shown that the latent vector can instead be marginalised out directly from the MH kernel; hence the dimension of the latent vector is not an issue.
This observation leads to a further result. Assuming local independence, prior independence and a Metropolis-within-Gibbs algorithm, as in the case of the GLLVM, the Chib & Jeliazkov identity is drastically simplified, so that the number of blocks is not an issue either:
• The latent vector is marginalised out as previously.
• Moreover, even if there are p blocks of model parameters, only the full MCMC run is required.
• The identity can also be used under data augmentation schemes that produce independence.
21
Chapter 4

Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Independence Chib & Jeliazkov estimator
Three simulated data sets were considered, under different scenarios, and the independence Chib & Jeliazkov estimator (CJI) was compared with ML estimators. Estimates were monitored over Rtotal iterations, from the first batch up to 30 batches, with 1,000, 2,000 and 3,000 iterations per batch.
22
Chapter 6

Implementation in simulated and real-life datasets
Some results
• p = 6 items, N = 600 individuals, k = 1 factor; kmodel = ktrue
23
Chapter 6

Implementation in simulated and real-life datasets
Some results
• p = 6 items, N = 600 individuals, k = 2 factors; kmodel = ktrue
24
Chapter 6

Implementation in simulated and real-life datasets
Some results
• p = 8 items, N = 700 individuals, k = 3 factors; kmodel = ktrue
25
Chapter 6

Implementation in simulated and real-life datasets
Some results
• p = 6 items, N = 600 individuals, k = 1 factor; kmodel < ktrue
26
Chapter 6

Implementation in simulated and real-life datasets
Some results
• p = 6 items, N = 600 individuals, k = 2 factors; kmodel > ktrue
27
Chapter 6

Implementation in simulated and real-life datasets
Concluding comments
Refer to Chapter 4 of the thesis for more details on the implementation of the CJI (or see Vitoratou et al., 2013). More comparisons, on simulated and real data sets, are presented in Chapter 6 of the thesis. Some comments:
• The harmonic mean failed in all cases.
• The BSE were successful in all examples.
  o The BG estimator was consistently associated with the smallest error.
  o The RM was also well behaved in all cases.
  o The BH was associated with more error than the former two BSE.
• The PBE are well behaved:
  o LM is very quick and efficient, but might fail if the posterior is not symmetric.
  o Similarly for the GC.
  o The CJI is well behaved but time consuming. Since it is distribution-free, it can be used as a benchmark method to get an idea of the BML.
28
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamics and Bayes
Ideas initially implemented in thermodynamics are currently explored in Bayesian model evaluation. Assume two unnormalised densities, q1 and q0, and suppose we are interested in the ratio λ of their normalising constants. For that purpose we use a continuous and differentiable geometric path, indexed by a temperature parameter t ∈ [0, 1], which links the endpoint densities:

qt(θ) = q1(θ)^t q0(θ)^(1-t).

The normalised density pt(θ) = qt(θ)/z(t) is the corresponding Boltzmann-Gibbs distribution, with partition function z(t) = ∫ qt(θ) dθ. The ratio λ = z(1)/z(0) can then be computed via the thermodynamic integration (TI) identity

log λ = ∫0^1 Ept[ log{q1(θ)/q0(θ)} ] dt,

with log λ playing the role of the Bayes free energy.
29
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamics and BML: Power posteriors
The first application of the TI to the problem of estimating the BML is the power posteriors (PP) method (Friel and Pettitt, 2008; Lartillot and Philippe, 2006). Let q1(θ) = f(y|θ) f(θ) (the unnormalised posterior) and q0(θ) = f(θ) (the prior). The prior-posterior path is then

qt(θ) = f(y|θ)^t f(θ),

defining the power posterior pt(θ) ∝ f(y|θ)^t f(θ) and leading, via thermodynamic integration, to the Bayesian marginal likelihood:

log f(y) = ∫0^1 Ept[ log f(y|θ) ] dt.

For ts close to 0 we sample from densities close to the prior, where the variability is typically high.
30
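On a toy conjugate model each power posterior can be sampled exactly, which allows a compact sketch of the PP estimator (in a GLLVM each temperature would require its own MCMC run; the schedule and sample sizes below are illustrative assumptions):

```r
## Power-posterior (TI) estimator with a trapezoidal rule over the schedule.
## Toy model: y_i ~ N(theta, 1), theta ~ N(0, tau2); the tempered posterior
## is N(t*S/a_t, 1/a_t) with a_t = t*n + 1/tau2, so it can be sampled exactly.
set.seed(5)
n <- 50; tau2 <- 1; y <- rnorm(n, 1.5, 1); S <- sum(y)
loglik <- function(th) sum(dnorm(y, th, 1, log = TRUE))
a <- n + 1 / tau2
log_bml_exact <- -0.5 * n * log(2 * pi) - 0.5 * log(n * tau2 + 1) -
  0.5 * (sum(y^2) - S^2 / a)

nt <- 50
ts <- ((0:nt) / nt)^5                           # schedule denser near the prior
Ebar <- sapply(ts, function(t) {
  at <- t * n + 1 / tau2
  th <- rnorm(5000, t * S / at, sqrt(1 / at))   # exact draws from p_t
  mean(vapply(th, loglik, numeric(1)))          # mean energy at temperature t
})
log_bml_pp <- sum(diff(ts) * (head(Ebar, -1) + tail(Ebar, -1)) / 2)
c(exact = log_bml_exact, power_posterior = log_bml_pp)
```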
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamics and BML: Importance posteriors
Lefebvre et al. (2010) considered options other than the prior for the zero endpoint, keeping the unnormalised posterior at the unit endpoint. Any proper density g(θ) will do; an appealing option is an importance (envelope) function, that is, a density as close as possible to the posterior. The importance-posterior path

qt(θ) = {f(y|θ) f(θ)}^t g(θ)^(1-t)

defines the importance posterior pt(θ) ∝ qt(θ). For ts close to 0 we now sample from densities close to the importance function, solving the problem of high variability at the prior endpoint.
31
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
An alternative approach: stepping-stone identities
Xie et al. (2011), using the prior and the posterior as endpoint densities, considered a different approach to computing the BML, also related to thermodynamics (Neal, 1993). First, the interval [0,1] is partitioned into n points, 0 = t0 < t1 < ... < tn = 1, and the free energy is computed as a telescoping product of stepping-stone ratios:

λ = z(1)/z(0) = Πi { z(ti) / z(ti-1) },

where each ratio is estimated by importance sampling, using draws from pti-1:

z(ti)/z(ti-1) ≈ (1/R) Σr qti(θ(r)) / qti-1(θ(r)),   θ(r) ~ pti-1.

• Under the power posteriors path, Xie et al. (2011) showed that the BML arises as the product of such ratios.
• Under the importance posteriors path, Fan et al. (2011) showed that the BML arises analogously, with the importance function in place of the prior.
However, the stepping-stone identity (SI) is even more general and can be used under different paths, as an alternative to the TI.
32
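Continuing the power-posterior sketch above (same toy model, same schedule ts and helper loglik), the stepping-stone estimator reuses draws at each ti-1 as importance samples for the next ratio:

```r
## Stepping-stone estimator: each ratio z(t_i)/z(t_{i-1}) is estimated by
## importance sampling from p_{t_{i-1}}; settings remain illustrative.
set.seed(6)
log_ratios <- vapply(seq_len(nt), function(i) {
  t0 <- ts[i]; t1 <- ts[i + 1]
  at <- t0 * n + 1 / tau2
  th <- rnorm(5000, t0 * S / at, sqrt(1 / at))      # draws from p_{t0}
  lw <- (t1 - t0) * vapply(th, loglik, numeric(1))  # log importance weights
  max(lw) + log(mean(exp(lw - max(lw))))            # stable log-mean-exp
}, numeric(1))
log_bml_ss <- sum(log_ratios)
c(exact = log_bml_exact, stepping_stone = log_bml_ss)
```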
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Path sampling identities for the BML, revisited
Hence, there are two general identities for computing a ratio of normalising constants within the path sampling framework, the TI and the SI. Different paths lead to different expressions for the BML:

Path                  TI identity                          SI identity
Prior-posterior       Power posteriors (PPT;               Stepping-stone (PPS;
                      Friel and Pettitt, 2008;             Xie et al., 2011)
                      Lartillot and Philippe, 2006)
Importance-posterior  Importance posteriors (IPT;          Generalised stepping-stone
                      inspired by Lefebvre et al., 2010)   (IPS; Fan et al., 2011)

Other paths can be used, under both approaches, to derive identities for the BML or any other ratio of normalising constants. Hereafter, the identities will be named by the path employed, with a subscript denoting the method implemented, e.g. IPS.
33
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamics & direct BF identities: Model switching
Lartillot and Philippe (2006) considered as endpoint densities the unnormalised posteriors of two competing models,

q0(θ) = f(y|θ, m0) f(θ|m0)   and   q1(θ) = f(y|θ, m1) f(θ|m1),

leading to the model switching path qt = q1^t q0^(1-t) and, via thermodynamic integration, directly to the Bayes factor:

log BF10 = ∫0^1 Ept[ log{q1(θ)/q0(θ)} ] dt,

computed under a bidirectional melting-annealing sampling scheme. It is also easy to derive the SI counterpart expression.
34
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamics & direct BF identities: Quadrivials
Based on the idea of Lartillot and Philippe (2006), we may proceed with compound paths, which consist of
• a hyper (outer) geometric path, which links the two competing models, and
• a nested geometric path for each endpoint function Qi, i = 0, 1.
The two intersecting paths form a quadrivial, which can be used with either the TI or the SI approach. If the ratio of interest is the BF, the two BMLs should be derived at the endpoints of [0,1]. The PP and the IP paths are natural choices for the nested part of the identity.
35
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Sources of error in path sampling estimators
a) The integral over [0,1] in the TI is typically approximated via numerical methods such as the trapezoidal or Simpson's rule (Neal, 1993; Gelman and Meng, 1998), which require an n-point discretisation (temperature schedule) of [0,1], e.g.

log λ ≈ Σi (ti - ti-1) (Ēti + Ēti-1) / 2,

where Ēt denotes the estimated mean energy at temperature t. Note that a temperature schedule is also required for the SI method (it defines the stepping-stone ratios). The discretisation introduces error to the TI and SI estimators, referred to as the discretisation error. It can be reduced (i) by increasing the number of points n and/or (ii) by assigning more points closer to the endpoint associated with higher variability; a one-line scheduling sketch follows below.
b) At each point ti, a separate MCMC run is performed, with target distribution pti. Hence Monte Carlo error also occurs at each run.
c) A third source of error is the path-related error.
We may gain insight into a) and c) by considering the measures of entropy related to the TI.
36
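As referenced above under (ii), a commonly used remedy for the discretisation error is a non-uniform schedule of the form ti = (i/n)^c with c > 1, which packs temperatures near the volatile prior endpoint (c = 5 is a popular choice in the power-posteriors literature; the function name is illustrative):

```r
## Non-uniform temperature schedule; with power > 1 the points cluster near t = 0.
schedule <- function(n, power = 5) ((0:n) / n)^power
round(schedule(10), 4)
```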
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Performance: Pine data, a simple regression example
Measurements were taken on 42 specimens. A linear regression model was fitted for the specimens' maximum compressive strength (y), using their density (x) as the independent variable. The objective in this example is to illustrate how each method-and-path combination responds to prior uncertainty; to do so, three different prior schemes were used. The ratios of the corresponding BMLs under the three priors were estimated over n1 = 50 and n2 = 100 evenly spaced temperatures. At each temperature, a Gibbs algorithm was implemented and 30,000 posterior observations were generated, after discarding 5,000 as a burn-in period.
37
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Performance: Pine data, a simple regression example
Implementing a uniform temperature schedule:
• differences across paths reflect differences in the path-related error;
• differences between n1 = 50 and n2 = 100 reflect differences in the discretisation error;
• all quadrivials come with a smaller batch-mean error.
Note: PP works just fine under a geometric temperature schedule that samples more points close to the prior.
38
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
Based on the prior-posterior path, Friel and Pettitt (2008) and Lefebvre et al. (2010) showed that the PP method is connected with the Kullback-Leibler divergence (KL; Kullback & Leibler, 1951), and thereby with the relative, differential and cross entropies. Here we present their findings in a general form, that is, for any geometric path: according to the TI it holds that

log λ = Ep1[ log(q1/q0) ] - KL(p1 ‖ p0) = Ep0[ log(q1/q0) ] + KL(p0 ‖ p1),

so that the difference of the two endpoint expectations equals the symmetrised KL divergence

J(p0, p1) = KL(p0 ‖ p1) + KL(p1 ‖ p0).
39
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
Graphical representation of the TI: the mean energy Ept[ log(q1/q0) ] is plotted as a curve over t ∈ [0,1], and log λ is the area under that curve. What about the intermediate points?
40
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
(Figure: TI minus free energy at each point.)
Instead of integrating the mean energy over the entire interval [0,1], there is an optimal temperature t* at which the mean energy equals the free energy.
41
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
Graphical representation of the NTI: the functional

KL(t) = KL(pt ‖ p0) - KL(pt ‖ p1)

measures the difference in the KL distance of the sampling distribution pt from p0 and p1. The ratio of interest occurs at the point where the sampling distribution is equidistant from the endpoint densities.
42
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
The normalised thermodynamic integral (NTI). Since for geometric paths

Ept[ log(q1/q0) ] = log λ + KL(pt ‖ p0) - KL(pt ‖ p1),

it follows that:
• according to the PPT method, the BML occurs at the point where the sampling distribution is equidistant from the prior and the posterior;
• according to the QMST method, the BF occurs at the point where the sampling distribution is equidistant from the two posteriors.
The sampling distribution pt is the Boltzmann-Gibbs distribution pertaining to the Hamiltonian (energy function) E(θ) = -log{q1(θ)/q0(θ)}. Therefore,
• according to the NTI, when geometric paths are employed, the free energy occurs at the point where the Boltzmann-Gibbs distribution is equidistant from the distributions at the endpoint states.
43
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
Graphical representation of the NTI. What do the areas stand for?
44
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
The normalised thermodynamic integral and probability distribution divergences
A key observation here is that the sampling distribution embodies the Chernoff t-coefficient (Chernoff, 1952),

ct(p0, p1) = ∫ p1(θ)^t p0(θ)^(1-t) dθ = z(t) / { z(1)^t z(0)^(1-t) }.

Based on that, the NTI can be written in terms of log ct, meaning that the areas in the graphical representation correspond to the Chernoff t-divergence, -log ct(p0, p1). At t = t*, we obtain the so-called Chernoff information:

C(p0, p1) = -log ct*(p0, p1) = maxt { -log ct(p0, p1) }.
45
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
Using the output from path sampling, the Chernoff divergence can be computed easily (see Chapter 5 of the thesis for a step-by-step algorithm, and the sketch below). Along with the Chernoff estimate, a number of other f-divergences can be directly estimated, namely
• the Bhattacharyya distance (Bhattacharyya, 1943) at t = 0.5,
• the Hellinger distance (Hellinger, 1909; Bhattacharyya, 1943),
• the Rényi t-divergence (Rényi, 1961), and
• the Tsallis t-relative entropy (Tsallis, 2001).
These measures of entropy are commonly used in
• information theory, pattern recognition, cryptography, machine learning,
• hypothesis testing,
• and, recently, non-equilibrium thermodynamics.
46
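As a hedged illustration (continuing the stepping-stone sketch above and reusing log_ratios, ts and log_bml_ss; the orientation p1 = posterior, p0 = prior and the divergence conventions are assumptions consistent with the slides), these divergences fall out of the tempered output along the prior-posterior path:

```r
## Divergences from path sampling output. Along the prior-posterior path
## z(0) = 1 and z(1) = f(y), so log c_t = log z(t) - t * log f(y).
log_zt <- c(0, cumsum(log_ratios))              # log z(t_i) along the schedule
log_ct <- log_zt - ts * log_bml_ss              # log Chernoff t-coefficient
chernoff_div <- -log_ct                         # Chernoff t-divergence
t_star <- ts[which.max(chernoff_div)]           # maximiser: Chernoff information

log_c_half <- approx(ts, log_ct, xout = 0.5)$y  # interpolate at t = 0.5
bhattacharyya <- -log_c_half                    # Bhattacharyya distance
hellinger <- sqrt(1 - exp(log_c_half))          # Hellinger distance
c(t_star = t_star, chernoff_information = max(chernoff_div),
  bhattacharyya = bhattacharyya, hellinger = hellinger)
```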
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Thermodynamic integration & distribution divergencies
Measures of entropy and the NTI
47
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Path selection, temperature schedule and error
These results also provide insight into the error of the path sampling estimators. To begin with, Lefebvre et al. (2010) showed that the total variance is associated with the J-divergence of the endpoint densities, and therefore with the choice of the path. Graphically:
• the J-distance coincides with the slope of the secant defined at the endpoint densities, and the shape of the curve is a graphical representation of the total variance;
• the slope of the tangent at a particular point ti coincides with the local variance, so higher local variances occur at the points where the curve is steeper;
• the graphical representation of two competing paths provides information about the estimators' variances. Paths with smaller cliffs are easier to take!
48
Chapter 5

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Path selection, temperature schedule and error
Numerical approximation of the TI:
• assign more tis at the points where the curve is steeper (higher local variances);
• this yields a different level of accuracy towards the two endpoints;
• the discretisation error depends primarily on the path.
49
Future work
• Currently developing an R library for BML estimation in the GLLTM, with Danny Arends.
• Expand the results (and the R library) to account for other types of data.
• Further study of the TCI (Chapter 3).
• Use the ideas in Chapter 4 to construct a better Metropolis algorithm for GLLVMs.
• Proceed further with the ideas presented in Chapter 5, with regard to the quadrivials, the temperature schedule and the optimal t*.
• Explore applications to information criteria.
50
Bibliography
Bartholomew, D. and Knott, M. (1999). Latent Variable Models and Factor Analysis. Kendall's Library of Statistics, 7. Wiley.
Besag, J. (1989). A candidate's formula: A curious result in Bayesian prediction. Biometrika, 76:183.
Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99-109.
Bock, R. and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46:443-459.
Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4).
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313-1321.
Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96:270-281.
Fan, Y., Wu, R., Chen, M., Kuo, L., and Lewis, P. (2011). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, 28(2):523-532.
Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care. Annals of Applied Statistics, 3:663-690.
Friel, N. and Pettitt, A. N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B, 70(3):589-607.
Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B, 56(3):501-514.
Gelman, A. and Meng, X.-L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163-185.
Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association, 57:54-60.
Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210-271.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, 186(1007):453-461.
Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:773-795.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:49-86.
Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55:195-207.
Lefebvre, G., Steele, R., and Vandal, A. C. (2010). A path sampling identity for computing the Kullback-Leibler and J divergences. Computational Statistics and Data Analysis, 54(7):1719-1731.
Lewis, S. and Raftery, A. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92:648-655.
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Erlbaum Associates, Hillsdale, NJ.
Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Oxford, UK.
51
Meng, X.-L. and Wong, W.-H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6:831-860.
Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika, 65:391-411.
Neal, R. M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, University of Toronto.
Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B, 56:3-48.
Nott, D., Kohn, R., and Fielding, M. (2008). Approximating the marginal likelihood using copula. arXiv:0810.5474v1. Available at http://arxiv.org/abs/0810.5474v1
Ntzoufras, I., Dellaportas, P., and Forster, J. (2000). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference, 111(1-2):165-180.
Patz, R. J. and Junker, B. W. (1999b). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2):146-178.
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128:301-323.
Raftery, A. and Banfield, J. (1991). Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(430):32-43.
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Paedagogiske Institut, Copenhagen.
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pages 547-561.
Tsallis, C. (2001). In: Nonextensive Statistical Mechanics and Its Applications (S. Abe and Y. Okamoto, editors). Springer-Verlag, Heidelberg. See also the list of references at http://tsallis.cat.cbpf.br/biblio.htm.
Vitoratou, S., Ntzoufras, I., and Moustaki, I. (2013). Marginal likelihood estimation from the Metropolis output: Tips and tricks for efficient implementation in generalized linear latent variable models. To appear in: Journal of Statistical Computation and Simulation.
Xie, W., Lewis, P., Fan, Y., Kuo, L., and Chen, M. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150-160.

This thesis is dedicated to
52