Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 5 - Introduction to Bootstrap (and hints on Markov
Chains) - 27.01.2015
Introduction
Let's recall, for a moment, the Central Limit Theorem (CLT):
If a random sample of n observations y1, y2, ..., yn is drawn from a
population with mean µ and finite variance σ², then, for n large enough,
the sampling distribution of the sample mean can be approximated by a
normal density with mean µ and variance σ²/n.
Averages taken from any distribution with finite variance are
approximately normally distributed when n is large.
The standard deviation of the sample mean, σ/√n, decreases as the
number of observations increases.
But nobody tells us exactly how big the sample has to be.
Why Bootstrap?
1. Sometimes we cannot take advantage of the CLT, because:
Nobody tells us exactly how big the sample has to be.
Empirically, in some cases the sample is really small.
So we prefer not to assume any particular distribution. We just
have the data, and we let the raw data speak.
The bootstrap method attempts to determine the probability
distribution from the data itself, without recourse to the CLT.
2. To better estimate the variance of an estimator, and
consequently to obtain more accurate confidence intervals and
hypothesis tests.
Basic Idea of Bootstrap
Use the original sample as if it were the population, and draw M
samples from the original sample (the bootstrap samples). Then
compute the estimator on each of the bootstrap samples.
Figure: Real World versus Bootstrap World
Structure of Bootstrap
1. Starting from a list of data (the sample), one computes a
statistic (an estimate).
2. Then one creates an artificial list of data (a new sample) by
randomly drawing elements from the original list.
3. One computes the statistic again (a new estimate) on the new
sample.
4. One repeats steps 2 and 3, let's say, M = 1000 times, and looks
at the distribution of these 1000 statistics.
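The four steps above can be sketched in a few lines. This is a minimal illustration in Python (standard library only), not the lab's own code: the data values, the choice of the median as the statistic, and the function name bootstrap_statistics are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_statistics(sample, statistic, n_resamples=1000, seed=0):
    """Return n_resamples values of `statistic`, one per bootstrap sample."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Step 2: draw n elements from the original list, with replacement.
        resample = rng.choices(sample, k=n)
        # Step 3: recompute the statistic on the artificial sample.
        replicates.append(statistic(resample))
    return replicates

# Hypothetical data, invented for this sketch.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]

observed = statistics.median(data)                            # step 1
replicates = bootstrap_statistics(data, statistics.median)    # steps 2 to 4

print("observed median:", observed)
print("std. dev. of the bootstrap medians:", statistics.stdev(replicates))
```

Any function that maps a list to a number can be passed as the statistic; the spread of the returned values is what the bootstrap uses in place of a theoretical sampling distribution.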
Type of resampling methods
1. The Monte Carlo algorithm: we resample with replacement, and each
bootstrap sample has the same size n as the original data set.
2. The jackknife algorithm: we resample from the original sample by
deleting one value at a time, so each resample has size n - 1.
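As a small illustration of the second method, the sketch below builds the n leave-one-out samples and uses them to estimate the standard error of the mean. It is written in Python with the standard library only; the data are made up, and the (n - 1)/n scaling is the usual jackknife standard-error formula, which the slides do not state.

```python
import statistics

def jackknife_means(sample):
    """Means of the n leave-one-out samples, each of size n - 1."""
    return [
        statistics.fmean(sample[:i] + sample[i + 1:])
        for i in range(len(sample))
    ]

def jackknife_standard_error(sample):
    """Jackknife standard error of the sample mean."""
    n = len(sample)
    loo_means = jackknife_means(sample)
    center = statistics.fmean(loo_means)
    return ((n - 1) / n * sum((m - center) ** 2 for m in loo_means)) ** 0.5

# Hypothetical data, invented for this sketch.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]

print("jackknife SE of the mean:", jackknife_standard_error(data))
print("s / sqrt(n):             ", statistics.stdev(data) / len(data) ** 0.5)
```

For the sample mean the two printed values coincide, which is a convenient sanity check; for other statistics the jackknife and the textbook formula generally differ.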
Estimation of the sample mean
Suppose we extracted a sample x = (x1, x2, ..., xn) from the
population X. Let's say the sample size is small: n = 10.
We can compute the sample mean $\hat{X}_n$ using the values of the
sample x. But, since n is small, we cannot rely on the CLT, so we
cannot say anything about the distribution of the sample mean.
APPROACH: We extract M samples (or sub-samples) of dimension
n from the sample x (with replacement, MC).
We can define the bootstrap sample means $\hat{x}_{i,b}$, $i = 1, \dots, M$.
These become the new sample, of dimension M.
Bootstrap sample mean:
$M_b(X) = \frac{1}{M} \sum_{i=1}^{M} \hat{x}_{i,b}$
Bootstrap sample variance:
$V_b(X) = \frac{1}{M-1} \sum_{i=1}^{M} \left(\hat{x}_{i,b} - M_b(X)\right)^2$ (Chunk 1)
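The bootstrap mean and variance can be computed directly from their definitions: the average of the M bootstrap sample means, and their sample variance with divisor M - 1. The sketch below is an illustrative Python stand-in (standard library only) and is not the lab's Chunk 1; the simulated sample of size n = 10, the seed, and M = 1000 are assumptions.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical small sample (n = 10), simulated only for illustration.
x = [rng.gauss(10, 3) for _ in range(10)]
M = 1000

# One bootstrap sample mean per resample of size n, drawn with replacement.
boot_means = [statistics.fmean(rng.choices(x, k=len(x))) for _ in range(M)]

# Bootstrap sample mean: average of the M bootstrap means.
M_b = sum(boot_means) / M

# Bootstrap sample variance: divisor M - 1.
V_b = sum((m - M_b) ** 2 for m in boot_means) / (M - 1)

print("sample mean:      ", statistics.fmean(x))
print("bootstrap mean:   ", M_b)
print("bootstrap variance:", V_b)
```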
Bootstrap Confidence interval with variance
estimation
Let’s take a random sample of size n= 25 from a normal
distribution with mean 10 and standard deviation 3.
We can consider the sampling distribution of the sample mean.
From that, we estimate the intervals.
The bootstrap estimates the standard error by resampling the data in
our original sample.
Instead of repeatedly drawing samples of size n = 25 from the
population, we repeatedly draw new samples of size n = 25 from
our original sample, resampling with replacement.
We can estimate the standard error of the sample mean using the
standard deviation of the bootstrapped sample means. (Chunk 2)
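A sketch of this procedure, in Python with the standard library only, is given below. It is not the lab's Chunk 2. The seed and M = 2000 are assumptions, and so is the form of the interval: the normal-approximation interval "sample mean plus or minus z times the bootstrap standard error" is one common choice, used here because the slides' own formula is not available in this transcript.

```python
import random
import statistics

rng = random.Random(42)
n, M = 25, 2000

# Original sample: n = 25 draws from a normal with mean 10 and sd 3.
sample = [rng.gauss(10, 3) for _ in range(n)]
x_bar = statistics.fmean(sample)

# Resample the original sample (not the population), with replacement.
boot_means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(M)]

# Bootstrap standard error: std. dev. of the bootstrapped sample means.
se_boot = statistics.stdev(boot_means)

# Textbook standard error, shown only for comparison.
se_formula = statistics.stdev(sample) / n ** 0.5

# Assumed interval form: normal approximation with the bootstrap SE.
z = statistics.NormalDist().inv_cdf(0.975)
ci = (x_bar - z * se_boot, x_bar + z * se_boot)

print("bootstrap SE:", se_boot, " formula SE:", se_formula)
print("95% interval:", ci)
```

The two standard errors should be close; the bootstrap value carries some Monte Carlo noise that shrinks as M grows.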
Bootstrap confidence intervals: formula
Confidence interval with quantiles
Suppose we have a sample of data from an exponential distribution
with parameter λ:
$f(x \mid \lambda) = \lambda e^{-\lambda x}$ (remember: the estimate of λ is
$\hat{\lambda} = 1/\hat{x}_n$, where $\hat{x}_n$ is the sample mean).
An alternative to using bootstrap-estimated standard errors (since
estimating the standard error for an exponential sample is not
straightforward) is to use bootstrap quantiles.
We can obtain M bootstrap estimates $\hat{\lambda}_b$ and define $q^*(\alpha)$ as the α
quantile of the bootstrap distribution of the M estimates of λ.
The new bootstrap confidence interval for λ will be:
$[\,2\hat{\lambda} - q^*(1 - \alpha/2);\ 2\hat{\lambda} - q^*(\alpha/2)\,]$ (Chunk 3)
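The interval above reflects the bootstrap quantiles around the point estimate, so the upper quantile enters the lower bound and vice versa. The following Python sketch (standard library only) is an illustration rather than the lab's Chunk 3: the simulated data, the rate used to generate them, the seed, and the linear-interpolation quantile helper are all assumptions.

```python
import random
import statistics

def quantile(sorted_values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

rng = random.Random(7)

# Hypothetical data: 50 exponential draws; the rate 2.0 is only used to simulate.
x = [rng.expovariate(2.0) for _ in range(50)]
lam_hat = 1 / statistics.fmean(x)          # estimate of lambda: 1 / sample mean

M, alpha = 2000, 0.05
# M bootstrap estimates of lambda, sorted so quantiles can be read off.
boot = sorted(1 / statistics.fmean(rng.choices(x, k=len(x))) for _ in range(M))

q_low = quantile(boot, alpha / 2)
q_high = quantile(boot, 1 - alpha / 2)

# [2*lam_hat - q*(1 - alpha/2) ; 2*lam_hat - q*(alpha/2)]
ci = (2 * lam_hat - q_high, 2 * lam_hat - q_low)

print("lambda hat:", lam_hat)
print("bootstrap quantile interval:", ci)
```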
Regression model coefficient estimate with Bootstrap
Now we consider the situation where we have data on two variables.
This is the type of data that arises in linear regression models. It does
not make sense to bootstrap the two variables separately: the two values
of each observation must remain linked when bootstrapped.
If our original n=4 sample contains the observations (y1=1,x1=3),
(y2=2,x2=6), (y3=4,x3=3), and (y4=6,x4=2), we resample these
original observations as pairs.
Recall that the linear regression model is: $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$. We are
going to construct a bootstrap interval for the slope coefficient β2:
1. We draw M bootstrap bivariate samples.
2. We compute the OLS estimate $\hat{\beta}_2$ for each bootstrap sample.
3. We compute the bootstrap quantiles, and we use the 0.025 (α/2) and
the 0.975 (1 − α/2) quantiles to define the confidence interval for β2.
(Chunk 4)
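The three steps can be sketched as follows in Python (standard library only). This is an illustrative stand-in, not the lab's Chunk 4. The simulated data set of 30 observations, its coefficients, the seed, and the helper name ols_slope are assumptions; a larger sample than the n = 4 example is used because tiny samples often produce resamples in which x does not vary and the slope is undefined.

```python
import random

def ols_slope(xs, ys):
    """OLS slope of y on x (with intercept); None if x has no variation."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx

rng = random.Random(3)

# Hypothetical data: y = 1 + 2x + noise, simulated only for illustration.
xs = [rng.uniform(0, 10) for _ in range(30)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 2) for x in xs]
pairs = list(zip(xs, ys))

M = 2000
slopes = []
while len(slopes) < M:
    # Step 1: resample whole (x, y) rows so the two variables stay linked.
    resample = rng.choices(pairs, k=len(pairs))
    # Step 2: OLS slope on the bootstrap sample (degenerate resamples skipped).
    b2 = ols_slope([p[0] for p in resample], [p[1] for p in resample])
    if b2 is not None:
        slopes.append(b2)

# Step 3: 0.025 and 0.975 quantiles of the bootstrap slopes.
slopes.sort()
ci = (slopes[int(0.025 * M)], slopes[int(0.975 * M) - 1])

print("OLS slope:", ols_slope(xs, ys))
print("95% bootstrap interval for the slope:", ci)
```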
Regression model coefficient estimate with Bootstrap
(alternative): sampling the residuals
An alternative way to bootstrap the regression coefficient is a
two-stage method in which:
1. You draw M samples. For each one you run a regression and
obtain a residual vector, giving M bootstrap residual vectors
(each of dimension n).
2. You add those residuals to each of the M dependent-variable
vectors.
3. You fit M new regression models using the new
dependent variables, to estimate M bootstrapped values of β2.
The method then uses the (α/2) and (1 − α/2) quantiles of the
bootstrapped β2 to define the confidence interval.
(Chunk 5)
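For comparison, here is a sketch of the textbook residual bootstrap, which may differ in detail from the lab's Chunk 5: the model is fitted once, the residuals are resampled with replacement, added to the fitted values to form a new dependent variable, and the regression is refitted. The code is Python with the standard library only; the simulated data, the seed, and the helper name ols_fit are assumptions.

```python
import random

def ols_fit(xs, ys):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(11)

# Hypothetical data: y = 1 + 2x + noise, simulated only for illustration.
xs = [rng.uniform(0, 10) for _ in range(30)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 2) for x in xs]

# Stage 1: fit once and keep fitted values and residuals.
b1, b2 = ols_fit(xs, ys)
fitted = [b1 + b2 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Stage 2: resample residuals, rebuild y, refit. The x values stay fixed.
M = 2000
boot_slopes = []
for _ in range(M):
    e_star = rng.choices(residuals, k=len(residuals))
    y_star = [f + e for f, e in zip(fitted, e_star)]
    boot_slopes.append(ols_fit(xs, y_star)[1])

boot_slopes.sort()
ci = (boot_slopes[int(0.025 * M)], boot_slopes[int(0.975 * M) - 1])

print("OLS slope:", b2)
print("95% residual-bootstrap interval:", ci)
```

Unlike resampling pairs, this variant keeps the regressors fixed and relies on the errors being exchangeable across observations.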
References
Efron, B., Tibshirani, R. (1993). An introduction to the
bootstrap (Vol. 57). CRC press
Figure: Efron and Tibshirani's foundational book
Routines in R
1. boot, by Brian Ripley.
Functions and datasets for bootstrapping from the book
Bootstrap Methods and Their Application by A. C. Davison
and D. V. Hinkley (1997, CUP).
2. bootstrap, by Rob Tibshirani.
Software (bootstrap, cross-validation, jackknife) and data for
the book An Introduction to the Bootstrap by B. Efron and
R. Tibshirani, 1993, Chapman and Hall
Markov Chain
Markov chains are an important tool in probability and in many
other areas of research.
They are used to model the probability of being in a certain state
in a certain period, given that the state in the previous period is
known.
Weather example: what is the probability that tomorrow will be
sunny, given that today is rainy?
The main properties of Markov chain processes are:
Memory of the process (usually the memory is fixed to 1).
Stationarity of the distribution.
Chart 1
A picture of a simple example of a Markov chain with two possible
states and the corresponding transition probabilities.
Figure: An example of a two-state Markov chain
Notation
We define a stochastic process $\{X_t,\ t = 0, 1, 2, \dots\}$ that takes on a
finite or countable number of possible values.
Let the possible values be non-negative integers (i.e. $X_t \in \mathbb{Z}_+$). If
$X_t = i$, then the process is said to be in state i at time t.
The Markov process (in discrete time) is defined as follows:
$P_{ij} = P[X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0] = P[X_{t+1} = j \mid X_t = i], \quad \forall i, j \in \mathbb{Z}_+$
We call $P_{ij}$ a 1-step transition probability because we move from
time t to time t + 1.
It is a first-order Markov chain (memory = 1) because the
probability of being in state j at time (t + 1) only depends on the
state at time t.
Notation - 2
The t-step transition probability:
$P^t_{ij} = P[X_{t+k} = j \mid X_k = i], \quad \forall t \ge 0,\ i, j \ge 0$
The Chapman-Kolmogorov equations allow us to compute these
t-step transition probabilities. They state that:
$P^{t+m}_{ij} = \sum_{k} P^t_{ik}\, P^m_{kj}, \quad \forall t, m \ge 0,\ \forall i, j \ge 0$
N.B. Basic probability properties:
1. $P_{ij} \ge 0, \quad \forall i, j \ge 0$
2. $\sum_{j \ge 0} P_{ij} = 1, \quad i = 0, 1, 2, \dots$
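In matrix form, the Chapman-Kolmogorov equations say that the (t + m)-step transition matrix is the product of the t-step and m-step matrices, so t-step probabilities are the entries of the t-th power of P. The Python sketch below (standard library only) checks this numerically; the 3-state matrix and the helper names mat_mul and mat_pow are invented for the illustration.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def mat_pow(p, t):
    """t-step transition matrix: p multiplied by itself t times (t >= 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(t):
        result = mat_mul(result, p)
    return result

# Hypothetical 3-state transition matrix; each row sums to 1.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

t, m = 2, 3
lhs = mat_pow(P, t + m)                        # P^(t+m)
rhs = mat_mul(mat_pow(P, t), mat_pow(P, m))    # P^t times P^m

max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print("largest difference between the two sides:", max_gap)
print("row sums of P^(t+m):", [sum(row) for row in lhs])
```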
Example: conditional probability
Consider two states: 0 = rain and 1 = no rain.
Define two probabilities:
$\alpha = P_{00} = P[X_{t+1} = 0 \mid X_t = 0]$, the probability it will rain
tomorrow given it rains today;
$\beta = P_{10} = P[X_{t+1} = 0 \mid X_t = 1]$, the probability it will rain
tomorrow given it does not rain today.
What is the probability it will rain the day after tomorrow given it
rains today, given α = 0.7 and β = 0.4?
The transition probability matrix (rows separated by ";") will be:
P = [P00, P01; P10, P11], or
P = [α = 0.7, 1 − α = 0.3; β = 0.4, 1 − β = 0.6] (Chunk 6)
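The question is answered by the (0, 0) entry of the two-step matrix P², that is, P00·P00 + P01·P10. The short Python sketch below (standard library only, not the lab's Chunk 6) uses the four matrix entries listed above, 0.7, 0.3, 0.4 and 0.6; the helper name mat_mul is an assumption.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Rows: today's state (0 = rain, 1 = no rain); columns: tomorrow's state.
P = [
    [0.7, 0.3],
    [0.4, 0.6],
]

P2 = mat_mul(P, P)   # two-step transition probabilities

# Rain the day after tomorrow given rain today: 0.7*0.7 + 0.3*0.4 = 0.61
print("P(rain in two days | rain today) =", P2[0][0])
```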
Example: unconditional probability
What is the unconditional probability it will rain the day after
tomorrow?
We need to define the unconditional or marginal distribution of the
state at time t:
$P[X_t = j] = \sum_i P[X_t = j \mid X_0 = i]\, P[X_0 = i] = \sum_i P^t_{ij}\, \alpha_i$,
where $\alpha_i = P[X_0 = i], \ \forall i \ge 0$, is the initial distribution (not to
be confused with the α of the rain example),
and $P[X_t = j \mid X_0 = i]$ is the conditional probability computed
before. (Chunk 7)
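The marginal distribution is the initial distribution multiplied (as a row vector) by the t-step transition matrix. The Python sketch below (standard library only, not the lab's Chunk 7) applies this to the rain example with t = 2. The initial distribution, 0.4 for rain today and 0.6 for no rain, is a made-up assumption since the slides do not give one; the helper names are also assumptions.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def marginal(initial, p_t):
    """P[X_t = j] = sum over i of P[X_0 = i] * P^t_ij."""
    return [
        sum(initial[i] * p_t[i][j] for i in range(len(initial)))
        for j in range(len(p_t[0]))
    ]

P = [
    [0.7, 0.3],   # from rain:    rain, no rain
    [0.4, 0.6],   # from no rain: rain, no rain
]

# Hypothetical initial distribution: P[X_0 = rain] = 0.4, P[X_0 = no rain] = 0.6.
initial = [0.4, 0.6]

P2 = mat_mul(P, P)
dist_day2 = marginal(initial, P2)

# 0.4 * 0.61 + 0.6 * 0.52 = 0.556
print("P(rain the day after tomorrow) =", dist_day2[0])
```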
Stationary distributions
A stationary distribution π is a probability distribution such that,
once the Markov chain reaches it, the chain remains in that
distribution forever.
It means we are asking this question: what is the probability of being
in a particular state in the long run?
Let's define πj as the limiting probability that the process will be in
state j, or
$\pi_j = \lim_{n \to \infty} P^n_{ij}$
Using Fubini's theorem
(https://www.youtube.com/watch?v=6-sGhUeOOk8), we can
define the stationary distribution as:
$\pi_j = \sum_i \pi_i P_{ij}$, together with $\sum_j \pi_j = 1$. For a two-state chain this
system has the closed-form solution:
$\pi_0 = \frac{P_{10}}{P_{01} + P_{10}}$
$\pi_1 = \frac{P_{01}}{P_{01} + P_{10}}$
Example: stationary distribution
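Back to our example.
We can compute the 2-step, 3-step, ..., n-step transition
distributions, and look at when they converge.
An alternative method to compute the stationary distribution
consists in using the closed-form solution for a two-state chain. With
P01 = 0.3 and P10 = 0.4 from the transition matrix of the example:
$\pi_0 = \frac{P_{10}}{P_{01} + P_{10}} = \frac{0.4}{0.7} = \frac{4}{7}$
$\pi_1 = \frac{P_{01}}{P_{01} + P_{10}} = \frac{0.3}{0.7} = \frac{3}{7}$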
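Both routes can be compared numerically. The Python sketch below (standard library only) repeatedly applies the transition matrix of the rain example, with entries 0.7, 0.3, 0.4 and 0.6, to a starting distribution until it stops changing, and then evaluates the two-state closed form P10/(P01 + P10). The starting distribution, the tolerance, and the iteration cap are assumptions of the sketch.

```python
def step(dist, p):
    """One transition: new_j = sum over i of dist_i * p_ij."""
    return [
        sum(dist[i] * p[i][j] for i in range(len(p)))
        for j in range(len(p[0]))
    ]

P = [
    [0.7, 0.3],   # from rain:    rain, no rain
    [0.4, 0.6],   # from no rain: rain, no rain
]

# Start from "rain today" with certainty; any starting distribution works here.
dist = [1.0, 0.0]
n_steps = 0
for _ in range(1000):                 # iteration cap as a safety net
    new_dist = step(dist, P)
    n_steps += 1
    change = max(abs(a - b) for a, b in zip(new_dist, dist))
    dist = new_dist
    if change < 1e-12:
        break

# Closed form for a two-state chain.
total = P[0][1] + P[1][0]
closed_form = [P[1][0] / total, P[0][1] / total]

print("iterated distribution after", n_steps, "steps:", dist)
print("closed form:", closed_form)
```

The iteration settles after a few dozen steps because the gap to the stationary distribution shrinks by a constant factor at every step.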
References
Ross, S. M. (2006). Introduction to probability models. Access
Online via Elsevier.
Figure: Cover of the 10th edition
Routines in R
markovchain, by Giorgio Alfredo Spedicato.
A package for easily handling discrete Markov chains.
MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and
Jong Hee Park.
Perform Monte Carlo simulations based on Markov Chain
approach.

Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

Talk 5

  • 1. Statistics Lab Rodolfo Metulini IMT Institute for Advanced Studies, Lucca, Italy Lesson 5 - Introduction to Bootstrap (and hints on Markov Chains) - 27.01.2015
  • 2. Introduction Let’s assume, for a moment, the Central Limit Theorem (CLT): If a random sample of n observations $y_1, y_2, \ldots, y_n$ is drawn from a population with mean $\mu$ and variance $\sigma^2$, then, for n large enough, the sampling distribution of the sample mean can be approximated by a normal density with mean $\mu$ and variance $\sigma^2/n$. Averages taken from any distribution with finite variance are approximately normally distributed. The standard deviation of the sample mean decreases as the number of observations increases. But nobody tells us exactly how big the sample has to be.
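A small simulation can make the $\sigma/\sqrt{n}$ statement concrete. The sketch below is in Python rather than the R used in the lab; the exponential population (for which $\sigma = 1$), the number of replications, and the function name `sd_of_sample_means` are illustrative assumptions, not part of the slides.

```python
import random
import statistics


def sd_of_sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from an Exp(1) population and
    return the standard deviation of their sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Exp(1) has sigma = 1, so the CLT suggests sd(mean) is close to 1 / sqrt(n).
for n in (5, 20, 80):
    print(n, round(sd_of_sample_means(n), 3), round(1 / n ** 0.5, 3))
```

The simulated and theoretical columns should be close for each n, and both shrink as n grows.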
  • 3. Why Bootstrap? 1. Sometimes we cannot take advantage of the CLT, because nobody tells us exactly how big the sample has to be and, empirically, in some cases the sample is really small. So we are not encouraged to make any distributional assumption. We just have the data, and we let the raw data speak. The bootstrap method attempts to determine the probability distribution from the data itself, without recourse to the CLT. 2. To better estimate the variance of a parameter estimate, and consequently obtain more accurate confidence intervals and hypothesis tests.
  • 4. Basic Idea of Bootstrap Use the original sample as the population, and draw M samples from the original sample (the bootstrap samples). Define the estimator using the bootstrap samples. Figure: Real World versus Bootstrap World
  • 5. Structure of Bootstrap 1. Originally, from a list of data (the sample), one computes a statistic (an estimate). 2. Then one creates an artificial list of data (a new sample) by randomly drawing elements from the list. 3. One computes a new statistic (estimate) from the new sample. 4. One repeats steps 2 and 3, say, M = 1000 times and looks at the distribution of these 1000 statistics.
  • 6. Types of resampling methods 1. The Monte Carlo algorithm: sampling with replacement; the size of the bootstrap sample must equal the size of the original data set. 2. The jackknife algorithm: we resample from the original sample by deleting one value at a time, so each resample has size n − 1.
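As a rough illustration of the jackknife, the Python sketch below builds the n leave-one-out means and combines them with the usual jackknife standard-error formula. The data values and the function names `jackknife_means` and `jackknife_se` are hypothetical, and the standard-error formula is the textbook one rather than something stated on the slide.

```python
import statistics


def jackknife_means(sample):
    """Return the n leave-one-out means, each computed on n - 1 values."""
    n = len(sample)
    total = sum(sample)
    return [(total - x) / (n - 1) for x in sample]


def jackknife_se(sample):
    """Jackknife standard error of the sample mean:
    sqrt((n - 1) / n * sum of squared deviations of the leave-one-out means)."""
    n = len(sample)
    loo = jackknife_means(sample)
    centre = sum(loo) / n
    return ((n - 1) / n * sum((m - centre) ** 2 for m in loo)) ** 0.5


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical sample
print(jackknife_means(data))
print(jackknife_se(data), statistics.stdev(data) / len(data) ** 0.5)
```

For the sample mean, the jackknife standard error coincides with the classical s/√n, so the two printed values on the last line should agree up to rounding.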
  • 7. Estimation of the sample mean Suppose we extracted a sample $x = (x_1, x_2, \ldots, x_n)$ from the population X. Let’s say the sample size is small: n = 10. We can compute the sample mean $\hat{X}_n$ using the values of the sample x. But, since n is small, we cannot rely on the CLT, so we cannot say anything about the distribution of the sample mean. APPROACH: We extract M samples (or sub-samples) of size n from the sample x (with replacement, MC). We can define the bootstrap sample means $\hat{x}_{i,b}$, $\forall i = 1, \ldots, M$. These become the new sample, of size M. Bootstrap sample mean: $M_b(X) = \sum_{i=1}^{M} \hat{x}_{i,b} / M$. Bootstrap sample variance: $V_b(X) = \sum_{i=1}^{M} (\hat{x}_{i,b} - M_b(X))^2 / (M - 1)$. (Chunk 1)
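The approach on this slide can be sketched in a few lines of Python. This is not the lab's Chunk 1 (which is not reproduced in the transcript); the data values, the seed, and the names `bootstrap_means`, `m_b` and `v_b` are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_means(sample, m=1000, seed=0):
    """Draw m bootstrap samples (same size as `sample`, with replacement)
    and return the mean of each one."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(m)]


x = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4, 6.2, 5.0]  # hypothetical sample, n = 10
means_b = bootstrap_means(x, m=1000)

m_b = statistics.fmean(means_b)     # bootstrap sample mean M_b(X)
v_b = statistics.variance(means_b)  # bootstrap sample variance V_b(X), divisor M - 1

print(m_b, v_b)
```

`statistics.variance` divides by M − 1, which matches the definition of $V_b(X)$ above.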
  • 8. Bootstrap confidence interval with variance estimation Let’s take a random sample of size n = 25 from a normal distribution with mean 10 and standard deviation 3. We can consider the sampling distribution of the sample mean and, from that, estimate the confidence intervals. The bootstrap estimates the standard error by resampling the data in our original sample. Instead of repeatedly drawing samples of size n = 25 from the population, we repeatedly draw new samples of size n = 25 from our original sample, resampling with replacement. We can estimate the standard error of the sample mean using the standard deviation of the bootstrapped sample means. (Chunk 2)
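A minimal Python sketch of this procedure follows. The slide does not say how the interval is built from the bootstrap standard error, so the normal-quantile interval used here is an assumption, as are the seed, the number of replicates (2000), and the variable names.

```python
import random
import statistics

rng = random.Random(42)

# Original sample: n = 25 draws from N(mean = 10, sd = 3), as on the slide.
sample = [rng.gauss(10, 3) for _ in range(25)]
xbar = statistics.fmean(sample)

# Resample the original sample with replacement and record each mean.
boot_means = [
    statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(2000)
]

# Bootstrap standard error = standard deviation of the bootstrapped means.
se_boot = statistics.stdev(boot_means)

# Assumed interval form: xbar +/- z * se_boot with z the 97.5% normal quantile.
z = statistics.NormalDist().inv_cdf(0.975)
ci = (xbar - z * se_boot, xbar + z * se_boot)

print(se_boot, statistics.stdev(sample) / len(sample) ** 0.5)
print(ci)
```

The first printed line compares the bootstrap standard error with the classical s/√n; the two are typically close for a sample like this.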
  • 10. Confidence interval with quantiles Suppose we have a sample of data from an exponential distribution with parameter λ: $f(x \mid \lambda) = \lambda e^{-\lambda x}$ (remember: the estimate of λ is $\hat{\lambda} = 1/\hat{x}_n$). An alternative to using bootstrap-estimated standard errors (since estimating the standard errors for an exponential is not straightforward) is to use bootstrap quantiles. We can obtain M bootstrap estimates $\hat{\lambda}_b$ and define $q^*(\alpha)$ as the α quantile of the bootstrap distribution of the M estimates of λ. The new bootstrap confidence interval for λ will be: $[2\hat{\lambda} - q^*(1 - \alpha/2);\ 2\hat{\lambda} - q^*(\alpha/2)]$ (Chunk 3)
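The interval above can be sketched in Python as follows. The simulated data (true λ = 2, n = 50), the seed, the number of replicates, and the helper `quantile` (a simple linear-interpolation empirical quantile) are assumptions for illustration; other quantile conventions give slightly different endpoints.

```python
import random
import statistics


def quantile(sorted_values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


rng = random.Random(1)
data = [rng.expovariate(2.0) for _ in range(50)]  # hypothetical sample, true lambda = 2
lam_hat = 1 / statistics.fmean(data)              # estimate of lambda: 1 / sample mean

# M = 2000 bootstrap estimates of lambda, sorted for the quantile lookup.
lam_boot = sorted(
    1 / statistics.fmean(rng.choices(data, k=len(data))) for _ in range(2000)
)

alpha = 0.05
lower = 2 * lam_hat - quantile(lam_boot, 1 - alpha / 2)
upper = 2 * lam_hat - quantile(lam_boot, alpha / 2)
print(lam_hat, (lower, upper))
```

Note that the upper bootstrap quantile enters the lower endpoint and vice versa, which is what the formula on the slide prescribes.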
  • 11. Regression model coefficient estimate with Bootstrap Now we consider the situation where we have data on two variables. This is the type of data that arises in linear regression models. It does not make sense to bootstrap the two variables separately, so the pairs must remain linked when bootstrapped. If our original n = 4 sample contains the observations (y1 = 1, x1 = 3), (y2 = 2, x2 = 6), (y3 = 4, x3 = 3), and (y4 = 6, x4 = 2), we resample these original observations in pairs. Recall that the linear regression model is $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$. We are going to construct a bootstrap interval for the slope coefficient $\beta_2$: 1. We draw M bootstrap bivariate samples. 2. We compute the OLS coefficient $\hat{\beta}_2$ for each bootstrap sample. 3. We compute the bootstrap quantiles, and we use the 0.025 (α/2) and the 0.975 (1 − α/2) quantiles to define the confidence interval for $\beta_2$. (Chunk 4)
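The three steps can be sketched in Python using the four (y, x) pairs from the slide. With n = 4 a resample can contain a single distinct x value, in which case the slope is undefined; redrawing such samples is a choice made here, not something the slide specifies. The function name `ols_slope`, the seed, M = 1000, and the order-statistic quantile rule are likewise assumptions.

```python
import random


def ols_slope(pairs):
    """OLS slope of y on x for a list of (y, x) pairs.
    Returns None when x has no variation (slope undefined)."""
    n = len(pairs)
    y_bar = sum(y for y, _ in pairs) / n
    x_bar = sum(x for _, x in pairs) / n
    sxx = sum((x - x_bar) ** 2 for _, x in pairs)
    if sxx == 0:
        return None
    sxy = sum((x - x_bar) * (y - y_bar) for y, x in pairs)
    return sxy / sxx


pairs = [(1, 3), (2, 6), (4, 3), (6, 2)]  # (y, x) observations from the slide
rng = random.Random(0)

slopes = []
while len(slopes) < 1000:
    resample = rng.choices(pairs, k=len(pairs))  # rows are drawn as linked pairs
    b2 = ols_slope(resample)
    if b2 is not None:                           # skip degenerate resamples
        slopes.append(b2)

slopes.sort()
ci = (slopes[24], slopes[974])  # approximately the 0.025 and 0.975 quantiles
print(ols_slope(pairs), ci)
```

With so few observations the interval is very wide; the example is only meant to show the mechanics of keeping each (y, x) pair together.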
  • 12. Regression model coefficient estimate with Bootstrap (alternative): sampling the residuals An alternative way to bootstrap the regression coefficient is a two-stage method in which: 1. You draw M samples. For each one you run a regression and obtain a bootstrap residual vector (M vectors of dimension n in total). 2. You add those residuals to each of the M dependent-variable vectors. 3. You fit M new regression models using the new dependent variables, to estimate M bootstrapped values of $\beta_2$. The method then uses the (α/2) and (1 − α/2) quantiles of the bootstrapped $\beta_2$ to define the confidence interval. (Chunk 5)
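For comparison, the Python sketch below implements the common textbook form of the residual bootstrap: fit the model once, resample the fitted residuals with replacement, add them to the fitted values, and refit. This may differ in detail from the lab's Chunk 5, which is not shown in the transcript. The data, the seed, and the function name `ols` are hypothetical.

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b2 * x_bar, b2


# Hypothetical data, roughly y = 2x plus noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b1, b2 = ols(xs, ys)
fitted = [b1 + b2 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

rng = random.Random(0)
boot_slopes = []
for _ in range(1000):
    e_star = rng.choices(residuals, k=len(xs))        # resampled residuals
    y_star = [f + e for f, e in zip(fitted, e_star)]  # new dependent variable
    boot_slopes.append(ols(xs, y_star)[1])            # refit, keep the slope

boot_slopes.sort()
ci = (boot_slopes[24], boot_slopes[974])  # approximately the 0.025 and 0.975 quantiles
print(b2, ci)
```

Unlike the pairs bootstrap, the x values stay fixed in every replicate, so no resample can be degenerate.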
  • 13. References Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap (Vol. 57). CRC Press. Figure: Efron and Tibshirani’s foundational book
  • 14. Routines in R 1. boot, by Brian Ripley. Functions and datasets for bootstrapping from the book Bootstrap Methods and Their Application by A. C. Davison and D. V. Hinkley (1997, CUP). 2. bootstrap, by Rob Tibshirani. Software (bootstrap, cross-validation, jackknife) and data for the book An Introduction to the Bootstrap by B. Efron and R. Tibshirani (1993, Chapman and Hall).
  • 15. Markov Chain Markov chains are an important tool in probability and in many other areas of research. They are used to model the probability of being in a certain state in a certain period, given that the state in the previous period is known. Weather example: what is the probability that tomorrow will be sunny, given that today is rainy? The main properties of Markov chain processes are: memory of the process (usually the memory is fixed to 1) and stationarity of the distribution.
  • 16. Chart 1 A picture of a simple example of a Markov chain with two possible states and the corresponding transition probabilities. Figure: An example of a 2-state Markov chain
  • 17. Notation We define a stochastic process $\{X_t, t = 0, 1, 2, \ldots\}$ that takes on a finite or countable number of possible values. Let the possible values be non-negative integers (i.e. $X_t \in \mathbb{Z}_+$). If $X_t = i$, then the process is said to be in state i at time t. The Markov process (in discrete time) is defined as follows: $P_{ij} = P[X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0] = P[X_{t+1} = j \mid X_t = i]$, $\forall i, j \in \mathbb{Z}_+$. We call $P_{ij}$ a 1-step transition probability because we move from time t to time t + 1. It is a first-order Markov chain (memory = 1) because the probability of being in state j at time t + 1 depends only on the state at time t.
  • 18. Notation - 2 The t-step transition probability is $P^t_{ij} = P[X_{t+k} = j \mid X_k = i]$, $\forall t \geq 0$, $i, j \geq 0$. The Chapman-Kolmogorov equations allow us to compute these t-step transition probabilities. They state that $P^{t+m}_{ij} = \sum_k P^t_{ik} P^m_{kj}$, $\forall t, m \geq 0$, $\forall i, j \geq 0$. N.B. Basic probability properties: 1. $P_{ij} \geq 0$, $\forall i, j \geq 0$. 2. $\sum_{j \geq 0} P_{ij} = 1$, $i = 0, 1, 2, \ldots$
  • 19. Example: conditional probability Consider two states: 0 = rain and 1 = no rain. Define two probabilities: $\alpha = P_{00} = P[X_{t+1} = 0 \mid X_t = 0]$, the probability it will rain tomorrow given it rains today, and $\beta = P_{10} = P[X_{t+1} = 0 \mid X_t = 1]$, the probability it will rain tomorrow given it does not rain today. What is the probability it will rain the day after tomorrow given it rains today, given α = 0.7 and β = 0.4? The transition probability matrix, read row by row, is P = [P00, P01; P10, P11] = [α, 1 − α; β, 1 − β] = [0.7, 0.3; 0.4, 0.6]. (Chunk 6)
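The question on this slide amounts to reading entry (0, 0) of the two-step matrix $P^2$. The Python sketch below uses the four matrix entries quoted on the slide (0.7, 0.3, 0.4, 0.6, read row by row) and also checks the multi-step composition rule numerically; the helper names `mat_mul` and `mat_pow` are assumptions, and this is not the lab's Chunk 6.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(p, t):
    """t-step transition matrix: p multiplied by itself t times (identity for t = 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(t):
        result = mat_mul(result, p)
    return result


# State 0 = rain, state 1 = no rain; rows are today's state, columns tomorrow's.
P = [[0.7, 0.3],
     [0.4, 0.6]]

P2 = mat_pow(P, 2)
print(P2[0][0])  # P[rain in two days | rain today] = 0.7 * 0.7 + 0.3 * 0.4

# Composition check: the 5-step matrix equals the 2-step times the 3-step matrix.
lhs = mat_pow(P, 5)
rhs = mat_mul(mat_pow(P, 2), mat_pow(P, 3))
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2)))
```

The two-step probability of rain given rain today works out to 0.49 + 0.12 = 0.61, up to floating-point rounding.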
  • 20. Example: unconditional probability What is the unconditional probability it will rain the day after tomorrow? We need to define the unconditional, or marginal, distribution of the state at time t: $P[X_t = j] = \sum_i P[X_t = j \mid X_0 = i] \, P[X_0 = i] = \sum_i P^t_{ij} \alpha_i$, where $\alpha_i = P[X_0 = i]$, $\forall i \geq 0$, and $P[X_t = j \mid X_0 = i]$ is the conditional probability computed before. (Chunk 7)
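The marginal distribution can be computed by pushing the initial distribution through the transition matrix t times. In the Python sketch below, the initial distribution (0.4 for rain, 0.6 for no rain) is an assumption chosen for illustration, since the slide does not give one; the matrix entries are those of the rain example (0.7, 0.3, 0.4, 0.6), and the function names are hypothetical.

```python
def step(dist, p):
    """One step of the chain: new_j = sum_i dist_i * p[i][j]."""
    return [
        sum(dist[i] * p[i][j] for i in range(len(dist))) for j in range(len(p[0]))
    ]


def marginal(initial, p, t):
    """Unconditional distribution of X_t given the distribution of X_0."""
    dist = list(initial)
    for _ in range(t):
        dist = step(dist, p)
    return dist


P = [[0.7, 0.3],
     [0.4, 0.6]]
initial = [0.4, 0.6]  # assumed P[X_0 = rain], P[X_0 = no rain]

two_days = marginal(initial, P, 2)
print(two_days[0])  # 0.4 * 0.61 + 0.6 * 0.52
```

Under this assumed starting distribution, the unconditional probability of rain in two days is 0.244 + 0.312 = 0.556, up to floating-point rounding.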
  • 21. Stationary distributions A stationary distribution π is a probability distribution such that, once the Markov chain reaches it, the chain remains in that distribution forever. It answers the question: what is the probability of being in a particular state in the long run? Let’s define $\pi_j$ as the limiting probability that the process will be in state j, or $\pi_j = \lim_{n \to \infty} P^n_{ij}$. Using Fubini’s theorem (https://www.youtube.com/watch?v=6-sGhUeOOk8), we can define the stationary distribution as the solution of $\pi_j = \sum_i \pi_i P_{ij}$ with $\sum_j \pi_j = 1$. For the two-state rain example, with $\alpha = P_{00}$ (rain tomorrow given rain today) and $\beta = P_{10}$ (rain tomorrow given no rain today), this gives $\pi_0 = \beta / (1 + \beta - \alpha)$ and $\pi_1 = (1 - \alpha) / (1 + \beta - \alpha)$.
  • 22. Example: stationary distribution Back to our example. We can compute the 2-step, 3-step, ..., n-step transition distributions and look at when they converge. An alternative way to compute the stationary distribution is to use the closed-form solution for the two-state chain, with $\alpha = P_{00}$ and $\beta = P_{10}$: $\pi_0 = \beta / (1 + \beta - \alpha)$, $\pi_1 = (1 - \alpha) / (1 + \beta - \alpha)$.
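Both routes can be compared numerically. The Python sketch below iterates the chain until the distribution stops changing and then evaluates the closed form obtained by solving $\pi = \pi P$ for a two-state chain, taking α = P00 = 0.7 and β = P10 = 0.4 from the rain example's matrix entries. The tolerance, the starting state, and the function name `stationary_by_iteration` are assumptions.

```python
def stationary_by_iteration(p, tol=1e-12, max_steps=10_000):
    """Start in state 0 and apply the transition matrix until the distribution
    changes by less than `tol`. Returns (distribution, number of steps)."""
    size = len(p)
    dist = [1.0] + [0.0] * (size - 1)
    for n in range(1, max_steps + 1):
        new = [sum(dist[i] * p[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new, n
        dist = new
    raise RuntimeError("no convergence within max_steps")


P = [[0.7, 0.3],
     [0.4, 0.6]]
alpha, beta = P[0][0], P[1][0]

pi_iter, steps = stationary_by_iteration(P)

# Closed form for a two-state chain.
pi_closed = [beta / (1 + beta - alpha), (1 - alpha) / (1 + beta - alpha)]

print(pi_iter, steps)
print(pi_closed)  # 4/7 and 3/7 for alpha = 0.7, beta = 0.4
```

For this chain the iteration settles within a few dozen steps, because the distance to the stationary distribution shrinks by a factor of α − β = 0.3 at every step.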
  • 23. References Ross, S. M. (2006). Introduction to probability models. Access Online via Elsevier. Figure: Cover of the 10th edition
  • 24. Routines in R 1. markovchain, by Giorgio Alfredo Spedicato. A package for easily handling discrete Markov chains. 2. MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park. Performs Monte Carlo simulations based on the Markov chain approach.