QMC and thinning for empirical datasets
Mike Giles, Fei Xie
Mathematical Institute, University of Oxford
SAMSI Opening Workshop on QMC and High-Dimensional
Sampling Methods for Applied Mathematics
August 28 – Sept 1, 2017
Mike Giles (Oxford) QMC & thinning August 28 – Sept 1, 2017 1 / 26
MLMC Motivation
In an MLMC nested expectation application (EVPPI), a new challenge
is that some of the random inputs come from a Bayesian posterior
distribution, with samples generated by MCMC methods.
[Diagram: standard Normal inputs x1 ∼ N(0, 1), x2 ∼ N(0, 1), together with MCMC sampler outputs (y1, y2) and (z1, z2), feed into f_d(X, Y, Z) to produce the output f_d]
The standard procedure (?) would be to generate MCMC samples on
demand, and then use them immediately.
But how can this be extended to MLMC?
MCMC within MLMC
An alternative idea is to first pre-generate and store a large collection of
MCMC samples:
{Y1, Y2, Y3, . . . , YN}
Then, random samples for Y can be obtained from this collection by
uniform random selection.
This avoids the problem of strong correlation between successive MCMC
samples, and also works fine for the MLMC calculation.
This approach also leads to ideas on QMC for empirical datasets, and
dataset thinning, reducing the number of samples which are stored.
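As a sketch of this pre-generate-and-store approach (with a toy random-walk Metropolis chain targeting N(0, 1) standing in for the actual Bayesian posterior sampler; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def mcmc_samples(N, step=1.0):
    """Toy random-walk Metropolis chain targeting N(0,1)."""
    y, out = 0.0, np.empty(N)
    for n in range(N):
        prop = y + step * rng.normal()
        # accept with probability min(1, pi(prop)/pi(y)), pi = exp(-x^2/2)
        if rng.random() < np.exp(0.5 * (y**2 - prop**2)):
            y = prop
        out[n] = y
    return out

store = mcmc_samples(10_000)                    # pre-generate and store once
Y = store[rng.integers(0, len(store), size=5)]  # later: uniform random selection
```

Uniform selection from the stored collection breaks the serial correlation of the chain, which is what the MLMC coupling needs.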
Motivation
QMC methods generate quasi-uniformly distributed points in [0, 1]^d,
then map them to variables in some target domain X.
Requires a mapping f : [0, 1]^d → X giving the desired distribution on X.
Question 1: what do we do if we want a uniform distribution on a large
empirical dataset X = {X_0, X_1, . . . , X_{N−1}} ?
This question arose in a medical statistics application with some random
inputs coming from a Bayesian posterior distribution generated by MCMC.
Motivation
Our aim is to end up with the following mappings when N = 2^{dk}
for some integer k:

[0, 1]^d −→ {0, 1, . . . , 2^k − 1}^d −→ {0, 1, . . . , 2^{dk} − 1} −→ X

the first step is simply U → ⌊2^k U⌋ applied to each dimension
the other two steps are 1–1, giving a uniform distribution on X
Question 2: if N is very large, can we thin it down to a subset of size
M ≪ N without introducing much error?
e.g. for a 3D dataset, maybe generate 2^{3×8} ≈ 16M samples, then thin
them down to perhaps 2^{3×6} ≈ 260k samples
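A minimal sketch of the composed mapping above, assuming the dataset X has already been re-ordered so that the flat index i1 + 2^k i2 + . . . addresses spatial subset (i1, . . . , id) (the function name is illustrative):

```python
import numpy as np

def qmc_select(U, X, k):
    """Map points U in [0,1)^d to elements of the re-ordered dataset X.
    Assumes len(X) == 2**(d*k) and X is ordered so that the flat index
    i1 + 2^k*i2 + ... + (2^k)**(d-1)*id addresses subset (i1,...,id)."""
    d = U.shape[1]
    cells = np.minimum((U * 2**k).astype(int), 2**k - 1)     # U -> floor(2^k U)
    flat = sum(cells[:, j] * (2**k) ** j for j in range(d))  # 1-1 onto 0..2^{dk}-1
    return X[flat]
```

With a scrambled low-discrepancy point set as U, this gives QMC-style sampling of the empirical dataset.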
Recursive bisection
A simple idea: apply recursive bisection in alternating directions.
Step 1: sort points based on first coordinate to bisect into two halves
[Figure: scatter plot of the 2D dataset (both axes −4 to 4), split into two halves by the first-coordinate sort]
Recursive bisection
Step 2: within each half, sort points based on second coordinate to again
bisect
[Figure: scatter plot (both axes −4 to 4), each half now bisected again by the second-coordinate sort]
Recursive bisection
Step 3: in 2D, within each quarter, sort points based on first coordinate
[Figure: scatter plot (both axes −4 to 4), each quarter bisected by the first-coordinate sort]
Recursive bisection
Repeat until there is just one point in each subset
[Figure: scatter plot (both axes −4 to 4) after repeated bisection, one point per subset]
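The recursive bisection of Steps 1–3 can be sketched as follows (a simple in-memory version; assumes the number of points is a power of two):

```python
import numpy as np

def bisect_order(X, splits, axis=0):
    """Re-order points by recursive bisection in alternating directions.
    After `splits` levels, consecutive blocks of len(X)/2**splits points
    each occupy one spatial subset."""
    if splits == 0 or len(X) <= 1:
        return X
    X = X[np.argsort(X[:, axis], kind="stable")]  # sort along current axis
    half = len(X) // 2
    nxt = (axis + 1) % X.shape[1]                 # alternate direction
    return np.vstack([bisect_order(X[:half], splits - 1, nxt),
                      bisect_order(X[half:], splits - 1, nxt)])
```

For a 2D dataset with N = 2^{2k} points, splits = 2k leaves exactly one point per subset.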
Recursive bisection
In this 2D example, we end up with 2^k × 2^k subsets (with one point each),
with the single entry for subset (i1, i2) stored at index I formed by
interleaving the bits:

i1 =  1 0 1
i2 =   0 1 0
=⇒ I = 1 0 0 1 1 0

This gives us our mapping: {0, 1, . . . , 2^k − 1}^2 −→ {0, 1, . . . , 2^{2k} − 1}
for the re-ordered data, but it is convenient to do a further re-ordering
so that we can access the same data through the simpler mapping

(i1, i2) −→ i1 + 2^k i2
This all generalises naturally to higher dimensions.
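The bit interleaving above (a Morton/Z-order index) can be sketched for any dimension as:

```python
def interleave(indices, k):
    """Interleave the k bits of each coordinate index, most significant
    bits first, matching the (i1, i2) -> I example on the slide."""
    I = 0
    for b in reversed(range(k)):        # from bit k-1 down to bit 0
        for i in indices:
            I = (I << 1) | ((i >> b) & 1)
    return I
```

For i1 = 101 and i2 = 010 (binary), this reproduces I = 100110.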
QMC tests
We hope that this new approach applied to a dataset generated from
a known distribution is comparable to applying standard QMC to the
same distribution.
To test this, we consider the following 4 distributions:

2D independent Gaussians: X1 ∼ N(0, 1), X2 ∼ N(0, 1)
2D multivariate Gaussian: (X1, X2) ∼ N(0, Σ), Σ = (1 0; 1/√2 1)
2D distorted Gaussian: X1 = Z1, X2 = Z2 + Z1^2, where Z1 ∼ N(0, 1), Z2 ∼ N(0, 1)
double well distribution: X1 = Z1, X2 = Z2, where U1 ∼ U(0, 1), Z2 ∼ N(0, 1)

and in each case use the test function f(x1, x2) = sin(x1 + x2).
QMC tests
In each case, we

generate multiple random datasets of 2^{20} points
for each, generate randomised QMC estimates using varying
numbers of Sobol points with digital scrambling
compare the overall RMS error from 256 randomisations (of datasets and
scramblings) to that given by standard QMC and MC applied to the same
distributions
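The randomisation step can be illustrated with a pure-Python stand-in: a base-2 van der Corput sequence with a random digital (XOR) shift. The actual tests used Sobol points with digital scrambling; this is just the one-dimensional idea:

```python
import random

def van_der_corput(n, shift=0, bits=32):
    """Base-2 radical inverse of n, XORed with a random digital shift."""
    x = 0
    for b in range(bits):                       # reverse the bits of n
        x |= ((n >> b) & 1) << (bits - 1 - b)
    return (x ^ shift) / 2.0**bits

shift = random.getrandbits(32)                  # one randomisation
points = [van_der_corput(n, shift) for n in range(16)]
```

Each independent shift gives one randomised-QMC replication; the RMS error is then estimated over, say, 256 such replications.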
QMC tests
Results for independent Gaussians
[Figure: RMS error vs number of samples, Independent Normals; curves: std MC, std QMC, new selection]
QMC tests
Results for multivariate Gaussian
[Figure: RMS error vs number of samples, Correlated Normals; curves: std MC, QMC+PCA, new selection]
QMC tests
Results for distorted Gaussian
[Figure: RMS error vs number of samples, Distorted Gaussian; curves: std MC, std QMC, new selection]
QMC tests
Results for double well:
[Figure: RMS error vs number of samples, Double well; curves: std MC, std QMC, new selection]
Thinning
The thinning idea builds on the same recursive bisection:
start with N = 2^{dk1} samples
recursively bisect into M = 2^{dk2} subsets, with k2 < k1
from each subset S_k, randomly select one point Y^(k)
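These three steps can be sketched as follows (assuming the dataset has already been re-ordered by recursive bisection, so each subset is a contiguous block):

```python
import numpy as np

def thin(X_ordered, M, rng=None):
    """Keep M of N points: one uniformly-random point from each of the
    M contiguous subsets. Assumes len(X_ordered) is a multiple of M."""
    rng = np.random.default_rng() if rng is None else rng
    block = len(X_ordered) // M                       # points per subset
    picks = block * np.arange(M) + rng.integers(0, block, size=M)
    return X_ordered[picks]
```

Each retained point is a uniform draw from its own subset, which is what makes the estimator below unbiased.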
Note that the expectation over all random selections is unbiased, i.e.

E[ (1/M) Σ_k f(Y^(k)) ] = (1/M) Σ_k ( (M/N) Σ_{X^(j) ∈ S_k} f(X^(j)) ) = (1/N) Σ_j f(X^(j))
Surprisingly (?), this is better in general (particularly in higher dimensions)
than using the mean of each subset, which gives a biased estimator.
Thinning
We are able to do some error analysis if we make the following assumption:
Assumption
The recursive bisection results in subsets S_k, k = 1, . . . , M, in which

max_k  max_{X^(i), X^(j) ∈ S_k}  ‖X^(i) − X^(j)‖ = O(M^(−1/d)).
This seems reasonable, at least for distributions on bounded domains with
strictly positive densities, since there are O(M1/d ) blocks in each direction.
Thinning
Theorem
If the Assumption is satisfied, and the function f(x) is Lipschitz, with
|f(x) − f(y)| ≤ ‖x − y‖, then the RMS thinning error is O(M^(−1/2 − 1/d)).
Proof: the error in subset S_k,

e_k ≡ f(Y^(k)) − (M/N) Σ_{X^(j) ∈ S_k} f(X^(j)),

is zero mean, and because of the Assumption and the Lipschitz property
we have |e_k| ≤ c M^(−1/d).

Hence, due to independence, the variance of the overall error is

V[ M^(−1) Σ_k e_k ] = M^(−2) Σ_k V[e_k] ≤ M^(−1) c^2 M^(−2/d).
Thinning
A planar cut through a regular d-dimensional grid of blocks with M^(1/d) in
each direction cuts at most d M^((d−1)/d) of the blocks. This inspires the
following theorem, which considers functions with discontinuities.
Theorem
If the Assumption is satisfied, and the function f(x) is such that
|f(x) − f(y)| ≤ 1 for pairs (x, y) in O(M^((d−1)/d)) of the subsets,
|f(x) − f(y)| ≤ ‖x − y‖ for pairs (x, y) in the other subsets,
then the RMS thinning error is O(M^(−1/2 − 1/2d)).
Proof: very similar to the previous one, with the dominant error
contribution from the first category of subsets.
Thinning
In both cases, the thinning error should be compared to the O(N^(−1/2))
sampling error arising from the original sample set being a collection of iid
samples from the true distribution.
Equating the two errors gives

M = O(N^(d/(d+2))) in the Lipschitz case,  M = O(N^(d/(d+1))) in the discontinuous case.

This indicates the potential for a significant reduction from N to M when
the dimension is low, but not so much in higher dimensions.
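For the d = 3 example mentioned earlier (N = 2^{3×8} ≈ 16M samples), these scalings give roughly:

```python
# thinned-set sizes matching the O(N^{-1/2}) sampling error, for d = 3
d, N = 3, 2**24
M_lip = round(N ** (d / (d + 2)))    # Lipschitz integrands:     N^{3/5}
M_disc = round(N ** (d / (d + 1)))   # discontinuous integrands: N^{3/4}
print(M_lip, M_disc)                 # about 2.2e4 and 2.6e5
```

so a Lipschitz integrand permits far more aggressive thinning (about 22k points) than a discontinuous one (about 262k), consistent with the 2^{3×6} ≈ 260k target quoted earlier.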
Thinning tests
[Figure: RMS error vs number of samples, Independent Normals; curves: original samples, thinning method 1, thinning method 2]
Thinning tests
[Figure: RMS error vs number of samples, Correlated Normals; curves: original samples, thinning method 1, thinning method 2]
Thinning tests
[Figure: RMS error vs number of samples, Distorted Gaussian; curves: original samples, thinning method 1, thinning method 2]
Thinning tests
[Figure: RMS error vs number of samples, Double well; curves: original samples, thinning method 1, thinning method 2]
Conclusions
it seems the QMC approach is effective, at least in low dimensions,
so maybe a good way to apply QMC to samples generated by MCMC
the thinning also seems effective, particularly in low dimensions,
with its effectiveness depending on the smoothness of the integrand
Webpage: http://people.maths.ox.ac.uk/gilesm/slides.html
Email: mike.giles@maths.ox.ac.uk
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics Opening Workshop, QMC and Thinning for Empirical Datasets - Mike Giles, Sep 1, 2017

for some integer k:

[0, 1]^d −→ {0, 1, . . . , 2^k − 1}^d −→ {0, 1, . . . , 2^{dk} − 1} −→ X

The first step is simply U → ⌊2^k U⌋ applied to each dimension; the other two steps are 1–1, giving a uniform distribution on X.

Question 2: if N is very large, can we thin it down to a subset of size M ≪ N without introducing much error? e.g. for a 3D dataset, maybe generate 2^{3×8} ≈ 16M samples, then thin down to perhaps 2^{3×6} ≈ 260k samples.
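The two discrete steps above can be sketched in a few lines. This is my own illustrative reconstruction, not code from the talk; the function names `interleave` and `point_to_index` are mine:

```python
def interleave(cells, k):
    """Map k-bit subset coordinates (i1, ..., id) to a single index I by
    interleaving their bits, most significant bits first (as on slide 10)."""
    I = 0
    for b in range(k - 1, -1, -1):        # bit positions, high to low
        for i in cells:
            I = (I << 1) | ((i >> b) & 1)
    return I

def point_to_index(u, k):
    """[0,1)^d -> {0,...,2^k-1}^d -> {0,...,2^(dk)-1}:
    apply floor(2^k * u_j) per dimension, then interleave the bits."""
    cells = [min(int(x * 2**k), 2**k - 1) for x in u]
    return interleave(cells, k)
```

With the bit pattern from slide 10, `interleave([0b101, 0b010], 3)` returns `0b100110`.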
Recursive bisection

A simple idea: apply recursive bisection in alternating directions.

Step 1: sort points based on the first coordinate to bisect the set into two halves.
(figure: 2D scatter plot, point set split into two halves)

Step 2: within each half, sort points based on the second coordinate to again bisect.
(figure: 2D scatter plot, each half split into quarters)

Step 3: in 2D, within each quarter, sort points based on the first coordinate.
(figure: 2D scatter plot, quarters split into eighths)

Repeat until there is just one point in each subset.
(figure: 2D scatter plot, one point per cell)
Recursive bisection

In this 2D example, we end up with 2^k × 2^k subsets (with one point each), with the single entry for subset (i1, i2) stored at index I formed by interleaving the bits:

i1 = 1 0 1, i2 = 0 1 0 =⇒ I = 1 0 0 1 1 0

This gives us our mapping {0, 1, . . . , 2^k − 1}^2 −→ {0, 1, . . . , 2^{2k} − 1} for the re-ordered data, but it is convenient to do a further re-ordering so that we can access the same data through the simpler mapping (i1, i2) −→ i1 + 2^k i2.

This all generalises naturally to higher dimensions.
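The recursive bisection re-ordering can be sketched compactly. This is my own reconstruction (the name `bisection_order` is mine), assuming the dataset has exactly N = 2^{dk} points; after the call, entry I of the result is the single point of the subset whose bit-interleaved index is I:

```python
import numpy as np

def bisection_order(X):
    """Re-order the N = 2^(d*k) points in X (shape (N, d)) by recursive
    bisection, alternating the sort coordinate at each level.  The result
    places the single point of subset I at row I, where I is the
    bit-interleaved tuple of subset coordinates."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]

    def recurse(block, level):
        if len(block) == 1:
            return block
        # sort on the current coordinate, then split into two halves
        block = block[np.argsort(block[:, level % d], kind="stable")]
        half = len(block) // 2
        return np.vstack([recurse(block[:half], level + 1),
                          recurse(block[half:], level + 1)])

    return recurse(X, 0)
```

For the 4-point set {(1,1), (0,1), (1,0), (0,0)} this returns the points in the order (0,0), (0,1), (1,0), (1,1), matching the interleaved indices 00, 01, 10, 11.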
QMC tests

We hope that this new approach, applied to a dataset generated from a known distribution, is comparable to applying standard QMC to the same distribution. To test this, we consider the following 4 distributions:

2D independent Gaussians: X1 ∼ N(0, 1), X2 ∼ N(0, 1)
2D multivariate Gaussian: (X1, X2) ∼ N(0, Σ) with Σ = ( 1, 1/√2 ; 1/√2, 1 )
2D distorted Gaussian: X1 = Z1, X2 = Z2 + Z1^2, where Z1 ∼ N(0, 1), Z2 ∼ N(0, 1)
double well distribution: X1 = Z1, X2 = Z2, where U1 ∼ U(0, 1), Z2 ∼ N(0, 1)

and in each case use the test function f(x1, x2) = sin(x1 + x2).
QMC tests

In each case, we:

generate multiple random datasets of 2^20 points each
generate randomised QMC estimates using varying numbers of Sobol points with digital scrambling
compare the overall RMS error from 256 randomisations (of datasets and scramblings) to that given by standard QMC and MC applied to the same distributions
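As a baseline illustration of the "std QMC" comparator, here is a minimal sketch of one randomised-QMC experiment for the independent-Gaussian case, under the assumption that scipy's digitally-scrambled Sobol generator is an acceptable stand-in for the scrambling used in the talk:

```python
import numpy as np
from scipy.stats import norm, qmc

f = lambda x: np.sin(x[:, 0] + x[:, 1])      # test function from slide 11

def rqmc_estimate(n, seed):
    """One randomised-QMC estimate of E[f(X1, X2)] for independent N(0,1)
    inputs: scrambled Sobol points mapped through the normal inverse CDF."""
    u = qmc.Sobol(d=2, scramble=True, seed=seed).random(n)
    return f(norm.ppf(u)).mean()

# RMS error over independent scramblings; the true value is 0 by symmetry,
# since sin is odd and X1 + X2 is symmetric about 0
estimates = [rqmc_estimate(1024, seed=s) for s in range(64)]
rmse = np.sqrt(np.mean(np.square(estimates)))
```

The analogous experiment for the empirical-dataset approach replaces `norm.ppf` by a lookup into the bisection-reordered dataset.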
QMC tests

Results for independent Gaussians:
(figure: RMS error vs number of samples, log–log; curves: std MC, std QMC, new selection)

Results for the multivariate Gaussian:
(figure: RMS error vs number of samples, log–log; curves: std MC, QMC+PCA, new selection)

Results for the distorted Gaussian:
(figure: RMS error vs number of samples, log–log; curves: std MC, std QMC, new selection)

Results for the double well:
(figure: RMS error vs number of samples, log–log; curves: std MC, std QMC, new selection)
Thinning

The thinning idea builds on the same recursive bisection:

start with N = 2^{d k1} samples
recursively bisect into M = 2^{d k2} subsets, with k2 < k1
from each subset S_k, randomly select one point Y^{(k)}

Note that the expectation over all random selections is unbiased, i.e.

E[ (1/M) ∑_k f(Y^{(k)}) ] = (1/M) ∑_k ( (M/N) ∑_{X^{(j)} ∈ S_k} f(X^{(j)}) ) = (1/N) ∑_j f(X^{(j)})

Surprisingly (?), this is better in general (particularly in higher dimensions) than using the mean of each subset, which gives a biased estimator.
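The random-selection thinning can be sketched as follows. This is my own reconstruction (the name `thin` is mine); it assumes the points are already in recursive-bisection order, so that each subset S_k occupies a consecutive block of the array:

```python
import numpy as np

def thin(X_ordered, M, rng):
    """Thin N stored samples down to M by picking one uniformly-chosen
    point from each of the M consecutive blocks of s = N/M points (the
    subsets S_k of the recursive bisection).  Averaging f over the picks
    is an unbiased estimate of the average of f over all N points."""
    N = len(X_ordered)
    s = N // M                                        # points per subset
    picks = s * np.arange(M) + rng.integers(0, s, size=M)
    return np.asarray(X_ordered)[picks]
```

The biased alternative mentioned above would instead return the mean of each block of s points.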
Thinning

We are able to do some error analysis if we make the following assumption:

Assumption: the recursive bisection results in subsets S_k, k = 1, . . . , M, in which

max_k max_{X^{(i)}, X^{(j)} ∈ S_k} ‖X^{(i)} − X^{(j)}‖ = O(M^{−1/d}).

This seems reasonable, at least for distributions on bounded domains with strictly positive densities, since there are O(M^{1/d}) blocks in each direction.
Thinning

Theorem: if the Assumption is satisfied, and the function f(x) is Lipschitz, with |f(x) − f(y)| ≤ ‖x − y‖, then the RMS thinning error is O(M^{−1/2 − 1/d}).

Proof: the error in subset S_k,

e_k ≡ f(Y^{(k)}) − (M/N) ∑_{X^{(j)} ∈ S_k} f(X^{(j)}),

is zero mean, and because of the Assumption and the Lipschitz property we have |e_k| ≤ c M^{−1/d}. Hence, due to independence, the variance of the overall error is

V[ M^{−1} ∑_k e_k ] = M^{−2} ∑_k V[e_k] ≤ M^{−1} c^2 M^{−2/d}.
Thinning

A planar cut through a regular d-dimensional grid of blocks, with M^{1/d} in each direction, cuts at most d M^{(d−1)/d} of the blocks. This inspires the following theorem, which considers functions with discontinuities.

Theorem: if the Assumption is satisfied, and the function f(x) is such that

|f(x) − f(y)| ≤ 1 for pairs (x, y) in O(M^{(d−1)/d}) of the subsets,
|f(x) − f(y)| ≤ ‖x − y‖ for pairs (x, y) in the other subsets,

then the RMS thinning error is O(M^{−1/2 − 1/2d}).

Proof: very similar to the previous one, with the dominant error contribution coming from the first category of subsets.
Thinning

In both cases, the thinning error should be compared to the O(N^{−1/2}) sampling error arising from the original sample set being a collection of iid samples from the true distribution. Equating the two errors gives

M = O(N^{d/(d+2)}) in the Lipschitz case, and M = O(N^{d/(d+1)}) in the discontinuous case.

This indicates the potential for significant savings in the size M when the dimension is low, but not so much in higher dimensions.
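Equating the two error rates gives a concrete rule of thumb for choosing M. A small illustration (the helper name and the worked numbers below are mine, not from the slides):

```python
def thinned_size(N, d, lipschitz=True):
    """Thinned-set size M balancing the thinning error against the
    O(N^{-1/2}) sampling error of the original N iid samples."""
    exponent = d / (d + 2) if lipschitz else d / (d + 1)
    return round(N ** exponent)
```

For d = 3 and N = 2^24 ≈ 16.8M original samples, the Lipschitz case keeps only M ≈ N^{3/5} ≈ 2.2e4 points, while the discontinuous case needs M ≈ N^{3/4} = 2^18 ≈ 260k, consistent with the target mentioned on slide 5.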
Thinning tests

(figures: RMS error vs number of samples, log–log, for the four test distributions — Independent Normals, Correlated Normals, Distorted Gaussian, Double well; curves: original samples, thinning method 1, thinning method 2)
Conclusions

It seems the QMC approach is effective, at least in low dimensions, so it may be a good way to apply QMC to samples generated by MCMC.

The thinning also seems effective, particularly in low dimensions, with its effectiveness depending on the smoothness of the integrand.

Webpage: http://people.maths.ox.ac.uk/gilesm/slides.html
Email: mike.giles@maths.ox.ac.uk