Introduction to Bootstrap and elements of Markov Chains
1. Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 5 - Introduction to Bootstrap and Introduction to
Markov Chains - 23.01.2014
2. Introduction
Let’s assume, for a moment, the CLT:
If random samples of n observations y1 , y2 , ..., yn are drawn from a
population with mean µ and standard deviation σ, then, for n sufficiently
large, the sampling distribution of the sample mean can be approximated by
a normal density with mean µ and variance σ 2 /n
Averages taken from any distribution will have an approximately
normal distribution
The standard deviation of the sample mean decreases as the number of
observations increases
But... nobody tells us exactly how big the sample has to be.
3. Why Bootstrap?
Sometimes we cannot make use of the CLT, because:
1. Nobody tells us exactly how big the sample has to be
2. The sample can be really small.
So we are not encouraged to make any distributional
assumption. We just have the data, and we let the raw data
speak.
The bootstrap method attempts to determine the probability
distribution from the data itself, without recourse to CLT.
N.B. The bootstrap method is not a way of reducing the error! It
only tries to estimate it.
4. Basic Idea of Bootstrap
Use the original sample as the population, and draw N samples
from the original sample (these are the bootstrap samples).
Then define the estimator using the bootstrap samples.
Figure: Real World versus Bootstrap World
5. Structure of Bootstrap
1. Originally, from a list of data (the sample), one computes a
statistic (an estimate)
2. Create an artificial list of data (a new sample) by randomly
drawing elements from the original list, with replacement
3. Compute a new statistic (estimate) from the new sample
4. Repeat points 2) and 3), say, 1000 times, and look at
the distribution of these 1000 statistics.
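The four steps above can be sketched in a few lines of code. The lab's routines are in R, but the procedure is language-agnostic; here is a minimal Python version with a made-up sample of n = 10 values:

```python
import random
import statistics

# Hypothetical original sample (the "list of data")
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 3.9, 5.2]

# Step 1: compute the statistic on the original sample
original_mean = statistics.mean(sample)

# Steps 2-4: 1000 times, draw an artificial sample with replacement,
# recompute the statistic, and collect the results
random.seed(42)
boot_means = []
for _ in range(1000):
    new_sample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(new_sample))

# The distribution of these 1000 statistics approximates the
# sampling distribution of the mean
```

A histogram of boot_means then shows the bootstrap distribution of the statistic.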
6. Sample mean
Suppose we extracted a sample x = (x1 , x2 , ..., xn ) from the
population X. Let's say the sample size is small: n = 10.
We can compute the sample mean x̄ using the values of the
sample x. But, since n is small, the CLT does not hold, so we
cannot say anything about the sample mean distribution.
APPROACH: We extract M samples (or sub-samples) of dimension
n from the sample x (with replacement).
We can define the bootstrap sample means x̄bi , ∀i = 1, ..., M. These
become the new sample, with dimension M.
Bootstrap sample mean:
Mb(X) = Σ_{i=1}^{M} x̄bi / M
Bootstrap sample variance:
Vb(X) = Σ_{i=1}^{M} (x̄bi − Mb(X))² / (M − 1)
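As a sketch of these two formulas in code (Python, with a simulated small sample; the data are hypothetical):

```python
import random
import statistics

random.seed(0)
x = [random.gauss(0, 1) for _ in range(10)]  # small sample, n = 10
M = 5000                                     # number of bootstrap samples

# Bootstrap sample means x̄_bi, i = 1, ..., M
boot_means = [statistics.mean(random.choices(x, k=len(x))) for _ in range(M)]

# Mb(X): mean of the bootstrap sample means
Mb = sum(boot_means) / M
# Vb(X): their variance, with divisor M - 1
Vb = sum((m - Mb) ** 2 for m in boot_means) / (M - 1)
```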
7. Bootstrap Confidence interval with variance
estimation
Let’s take a random sample of size 25 from a normal distribution
with mean 10 and standard deviation 3.
We can consider the sampling distribution of the sample mean.
From that, we estimate the intervals.
The bootstrap estimates the standard error by resampling the data in
our original sample: instead of repeatedly drawing samples of size
25 from the population, we repeatedly draw new samples of
size 25 from our original sample, resampling with
replacement.
We can estimate the standard error of the sample mean using the
standard deviation of the bootstrapped sample means.
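A minimal Python sketch of this procedure (sample of size 25 from a normal with mean 10 and sd 3; the ±2 multiplier is the usual normal approximation):

```python
import random
import statistics

random.seed(1)
# Original sample: size 25 from a normal with mean 10 and sd 3
sample = [random.gauss(10, 3) for _ in range(25)]
xbar = statistics.mean(sample)

# Bootstrap standard error: sd of the bootstrapped sample means
boot_means = [statistics.mean(random.choices(sample, k=25))
              for _ in range(2000)]
se_boot = statistics.stdev(boot_means)

# Approximate 95% confidence interval for the mean
ci = (xbar - 2 * se_boot, xbar + 2 * se_boot)
```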
8. Confidence interval with quantiles
Suppose we have a sample of data from an exponential distribution
with parameter λ:
f (x|λ) = λe^(−λx) (recall the estimate λ̂ = 1/x̄).
An alternative to the use of bootstrap estimated standard
errors (the estimation of the sd of an exponential is not
straightforward) is the use of bootstrap quantiles.
We can obtain M bootstrap estimates λ̂b and define q∗(α) as the α
quantile of the bootstrap distribution.
The bootstrap confidence interval for λ will then be:
[2λ̂ − q∗(1 − α/2); 2λ̂ − q∗(α/2)]
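A sketch of this quantile (basic bootstrap) interval in Python, with a simulated exponential sample (λ = 2 is an arbitrary choice):

```python
import random

random.seed(2)
x = [random.expovariate(2.0) for _ in range(30)]   # simulated data
lam_hat = 1 / (sum(x) / len(x))                    # λ̂ = 1 / x̄

# M bootstrap estimates of λ, sorted
M = 2000
boot = sorted(1 / (sum(s) / len(s))
              for s in (random.choices(x, k=len(x)) for _ in range(M)))

def q(p):
    # empirical p-quantile of the bootstrap distribution
    return boot[min(M - 1, int(p * M))]

alpha = 0.05
ci = (2 * lam_hat - q(1 - alpha / 2), 2 * lam_hat - q(alpha / 2))
```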
9. Regression model coefficient estimate with Bootstrap
Now we will consider the situation where we have data on two
variables. This is the type of data that arises in a linear regression
situation. It doesn't make sense to bootstrap the two variables
separately, so the pairs must remain linked when bootstrapped.
For example, if our original data contains the observations (1,3),
(2,6), (4,3), and (6, 2), we re-sample this original sample in
pairs.
Recall that the linear regression model is: y = β0 + β1 x
We are going to construct a bootstrap interval for the slope
coefficient β1
1. Draw M bootstrap samples
2. Compute the OLS β1 coefficient for each bootstrap sample
3. Define the bootstrap quantiles, and use the 0.025 and the
0.975 to define the confidence interval for β1
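The three steps can be sketched as follows (Python, hypothetical paired data; resamples in which all x values coincide are skipped, since the slope is undefined there):

```python
import random

# Paired data (x, y): pairs stay linked when resampled
data = [(1, 3), (2, 6), (4, 3), (6, 2), (3, 4), (5, 3), (2, 5), (4, 4)]
M = 2000

def ols_slope(pairs):
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    return sxy / sxx if sxx > 0 else None

random.seed(3)
slopes = []
while len(slopes) < M:                       # step 1: M bootstrap samples
    s = ols_slope(random.choices(data, k=len(data)))
    if s is not None:                        # step 2: OLS slope of each
        slopes.append(s)
slopes.sort()

# step 3: 0.025 and 0.975 quantiles give the confidence interval
ci = (slopes[int(0.025 * M)], slopes[int(0.975 * M)])
```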
10. Regression model coefficient estimate with Bootstrap:
sampling the residuals
An alternative solution for the regression coefficient is a two-stage
method in which:
1. You fit the regression on the original sample and keep the
fitted values and the n regression residuals
2. You draw M bootstrap samples of the residuals (with replacement),
add each of them to the fitted values of the dependent
variable, and refit the model to define M bootstrapped β1
The method consists in using the 0.025 and the 0.975 quantiles of
the bootstrapped β1 to define the confidence interval.
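A sketch of this residual bootstrap in Python (made-up data; the model is fit once, then residuals are resampled and added back to the fitted values):

```python
import random

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 5.9, 3.2, 2.4, 4.8, 3.3, 5.0, 4.1]
n = len(x)

def ols(xv, yv):
    mx, my = sum(xv) / n, sum(yv) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(xv, yv))
          / sum((a - mx) ** 2 for a in xv))
    return my - b1 * mx, b1

# Stage 1: fit on the original sample, keep fitted values and residuals
b0, b1 = ols(x, y)
fitted = [b0 + b1 * a for a in x]
resid = [b - f for b, f in zip(y, fitted)]

# Stage 2: resample residuals, rebuild y, refit to get bootstrapped β1
random.seed(4)
M = 2000
slopes = sorted(
    ols(x, [f + e for f, e in zip(fitted, random.choices(resid, k=n))])[1]
    for _ in range(M))
ci = (slopes[int(0.025 * M)], slopes[int(0.975 * M)])
```

Since x is held fixed, every refit uses the same design; only the dependent variable changes.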
11. References
Efron, B., Tibshirani, R. (1993). An Introduction to the
Bootstrap (Vol. 57). CRC Press
Figure: Efron and Tibshirani's foundational book
12. Routines in R
1. boot, by Brian Ripley.
Functions and datasets for bootstrapping from the book
Bootstrap Methods and Their Applications by A. C. Davison
and D. V. Hinkley (1997, CUP).
2. bootstrap, by Rob Tibshirani.
Software (bootstrap, cross-validation, jackknife) and data for
the book An Introduction to the Bootstrap by B. Efron and
R. Tibshirani, 1993, Chapman and Hall
13. Markov Chain
Markov chains are an important concept in probability and in many other
areas of research.
They are used to model the probability of being in a certain state
in a certain period, given the state in the past period.
Weather example: what is the Markov probability that tomorrow
will be sunny, given that today is rainy?
The main properties of Markov chain processes are:
Memory of the process (usually the memory is fixed to 1)
Stationarity of the distribution
14. Chart 1
A picture of a simple example of a Markov chain with 2 possible
states and transition probabilities.
Figure: An example of 2 states markov chain
15. Notation
We define a stochastic process {Xn , n = 0, 1, 2, ...} that takes on a
finite or countable number of possible values.
Let the possible values be non-negative integers (i.e. Xn ∈ Z+). If
Xn = i, then the process is said to be in state i at time n.
The Markov process (in discrete time) is defined as follows:
Pij = P[Xn+1 = j | Xn = i, Xn−1 = in−1 , ..., X0 = i0 ] = P[Xn+1 = j | Xn = i], ∀i, j ∈ Z+
We call Pij a 1-step transition probability because we moved from
time n to time n + 1.
It is a first order Markov Chain (memory = 1) because the
probability of being in state j at time (n + 1) only depends on the
state at time n.
16. Notation - 2
The n-step transition probability:
P^n_ij = P[Xn+k = j | Xk = i], ∀n ≥ 0, i, j ≥ 0
The Chapman-Kolmogorov equations allow us to compute these
n-step transition probabilities. They state that:
P^{n+m}_ij = Σ_k P^n_ik P^m_kj , ∀n, m ≥ 0, ∀i, j ≥ 0
N.B. Basic probability properties:
1. Pij ≥ 0, ∀i, j ≥ 0
2. Σ_{j≥0} Pij = 1, i = 0, 1, 2, ...
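The Chapman-Kolmogorov equations say that multi-step probabilities come from matrix multiplication: the matrix of P^{n+m}_ij is the product of the n-step and m-step matrices. A small Python check with a made-up 2-state matrix:

```python
def matmul(A, B):
    # (i, j) entry: sum over k of A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical 2-state transition matrix (each row sums to 1)
P = [[0.7, 0.3],
     [0.4, 0.6]]

P2 = matmul(P, P)        # 2-step transition probabilities
P3 = matmul(P2, P)       # 3-step: P^(2+1) = P^2 * P^1

# Chapman-Kolmogorov also gives P^3 = P^1 * P^2
P3_alt = matmul(P, P2)
```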
17. Example: conditional probability
Consider two states: 0 = rain and 1 = no rain.
Define two probabilities:
α = P00 = P[Xn+1 = 0 | Xn = 0], the probability it will rain
tomorrow given it rained today
β = P10 = P[Xn+1 = 0 | Xn = 1], the probability it will rain
tomorrow given it did not rain today.
What is the probability it will rain the day after tomorrow given it
rained today?
The transition probability matrix will be:
P = [ α  1−α ]
    [ β  1−β ]
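With this matrix, the answer is the (0, 0) entry of P²: the chain either stays in "rain" twice or passes through "no rain" and comes back. A quick check with hypothetical values α = 0.7 and β = 0.4:

```python
alpha, beta = 0.7, 0.4          # hypothetical transition probabilities
P = [[alpha, 1 - alpha],        # row 0: today rain
     [beta,  1 - beta]]         # row 1: today no rain

# P[rain in two days | rain today] = (P^2)_00
p2_00 = P[0][0] * P[0][0] + P[0][1] * P[1][0]
```

Here p2_00 = 0.7 * 0.7 + 0.3 * 0.4 = 0.61.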
18. Example: unconditional probability
What is the unconditional probability it will rain the day after
tomorrow?
We need to define the unconditional, or marginal, distribution of the
state at time n:
P[Xn = j] = Σ_i P[Xn = j | X0 = i] P[X0 = i] = Σ_i P^n_ij αi ,
where αi = P[X0 = i], ∀i ≥ 0,
and P[Xn = j | X0 = i] is the conditional probability computed just
before.
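In code, the marginal distribution at time n is the initial distribution α multiplied by the transition matrix n times (the numbers below are hypothetical):

```python
# Hypothetical initial distribution and transition matrix
alpha0 = [0.4, 0.6]              # P[X0 = 0], P[X0 = 1]
P = [[0.7, 0.3],
     [0.4, 0.6]]

# P[Xn = j] = Σ_i (P^n)_ij * α_i: multiply the row vector by P, n times
dist = alpha0
for _ in range(2):               # n = 2 (the day after tomorrow)
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
```

With these numbers, dist[0], the unconditional probability of rain in two days, is 0.556.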
19. Stationary distributions
A stationary distribution π is a probability distribution such that,
once the Markov chain reaches it, the chain
remains in that distribution forever.
It means we are asking this question: what is the probability of being
in a particular state in the long run?
Let's define πj as the limiting probability that the process will be in
state j at time n, or
πj = lim_{n→∞} P^n_ij
Using Fubini's theorem, we can define the stationary distribution
as:
πj = Σ_i Pij πi
20. Example: stationary distribution
Back to our example.
We can compute the 2-step, 3-step, ..., n-step transition
distributions, and look at WHEN they reach
convergence.
An alternative method to compute the stationary
distribution consists in using these easy formulas:
π0 = β / (1 − α + β)
π1 = (1 − α) / (1 − α + β)
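A quick Python check of these two-state formulas against brute-force iteration (α = 0.7 and β = 0.4 are made-up values):

```python
alpha, beta = 0.7, 0.4
P = [[alpha, 1 - alpha],
     [beta,  1 - beta]]

# Closed form for the two-state chain
denom = 1 - alpha + beta
pi = [beta / denom, (1 - alpha) / denom]

# Iterate the chain from an arbitrary starting distribution:
# it converges to the same stationary distribution
dist = [1.0, 0.0]
for _ in range(200):
    dist = [dist[0] * P[0][0] + dist[1] * P[1][0],
            dist[0] * P[0][1] + dist[1] * P[1][1]]
```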
21. References
Ross, S. M. (2006). Introduction to probability models. Access
Online via Elsevier.
Figure: Cover of the 10th edition
22. Routines in R
markovchain, by Giorgio Alfredo Spedicato.
A package for easily handling discrete Markov chains.
MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and
Jong Hee Park.
Performs Monte Carlo simulations based on a Markov chain
approach.