A short summary of several types of computer-intensive statistical methods, developed from the fantastic book by Malcolm Haddon titled "Modeling and Quantitative Methods in Fisheries"
Authors: Daniele Baker and Stephanie Johnson
2. Playing Card Example
• 52 cards are the population
▫ 25-card sample… what are the mean, median, and SE?
• Randomization
▫ Randomly reallocate cards to groups (i.e. diamonds, spades…)
▫ 4 people divide up the cards
▫ Do this 1000+ times, randomly allocating to the 4 groups
▫ Graph the distribution of the parameter
▫ If your original parameter is outside the 95% CI, then it is significantly different from random
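The steps above can be sketched with the Python standard library; the deck values and group choice here are illustrative (card values 1–13, four of each), not from the book:

```python
import random
import statistics

# Hypothetical deck: card values 1-13 (ace=1 ... king=13), four of each.
deck = [value for value in range(1, 14) for _ in range(4)]

# "Original" group: a deliberately low group, the 13 lowest cards.
observed = statistics.mean(deck[:13])

# Randomization: reshuffle and reallocate the deck to 4 groups of 13,
# 1000 times, recording the mean of the first group each time.
random.seed(1)
means = []
for _ in range(1000):
    random.shuffle(deck)
    means.append(statistics.mean(deck[:13]))

# Empirical 95% interval of the randomized means; a group mean outside
# this interval differs significantly from a random allocation.
means.sort()
lo, hi = means[24], means[974]
```

With the deliberately low group above, `observed` falls below `lo`, so it would be judged significantly different from a random allocation.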
3. Playing Card Example
• Jackknife
▫ Take 24 cards (leave 1 out) and compute the test statistics
▫ Redo for all possible combinations (25×)
• Bootstrapping
▫ Pick out 25 cards, replacing each card after it is drawn
▫ Calculate new parameters
– Redo 1000+ times
▫ If your sample parameter falls within the 95% CI of the distribution, then it isn't statistically different from random
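Both card procedures can be sketched as follows (a minimal sketch; the seed and deck values are assumptions for illustration):

```python
import random
import statistics

random.seed(2)
# Hypothetical 25-card sample drawn from a 52-card deck (values 1-13).
deck = [value for value in range(1, 14) for _ in range(4)]
sample = random.sample(deck, 25)
sample_mean = statistics.mean(sample)

# Jackknife: leave one card out at a time -> 25 subsets of 24 cards.
jack_means = [statistics.mean(sample[:i] + sample[i + 1:])
              for i in range(25)]

# Bootstrap: draw 25 cards WITH replacement from the sample, 1000 times.
boot_means = sorted(statistics.mean(random.choices(sample, k=25))
                    for _ in range(1000))

# Percentile 95% CI of the bootstrap distribution.
ci_low, ci_high = boot_means[24], boot_means[974]
```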
4. Playing Card Example
• Monte Carlo
▫ Find a model that would fit the card trend
– Relative frequency plot; examine the shape
▫ Randomly select values for the model parameters or data (cards picked)
– Complete 1000+ times
– Compare your parameters fitted to the data vs. the randomly generated parameters
http://www.vertex42.com/ExcelArticles/mc/MonteCarloSimulation.html
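A minimal Monte Carlo sketch of the slide's idea, assuming (purely for illustration) that a discrete uniform distribution on 1–13 is the model fitted to the card frequencies; the observed mean is a made-up value:

```python
import random
import statistics

random.seed(3)
# Assumed model: card values uniform on 1..13. Monte Carlo draws data
# from the MODEL (a theoretical distribution), not from the sample.
sim_means = sorted(
    statistics.mean(random.randint(1, 13) for _ in range(25))
    for _ in range(1000)
)

# Compare a (hypothetical) observed 25-card mean with the simulated
# distribution of means under the model.
observed_mean = 6.2
p_low = sum(m <= observed_mean for m in sim_means) / len(sim_means)
```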
5. COMPARISON
                 Randomize         Jackknife          Bootstrap          Monte Carlo
With             No                No                 Yes                Yes
replacement
Exact P-values   Yes               Likely no          Yes                Yes
Resamples        Empirical dist.   Empirical dist.    Empirical dist.    Theoretical PDF
                 (non-parametric)  (non-parametric)   (non-parametric)   (parametric)
Good to…         Deal with an      Detect bias,       Calc. sample size  Flexible, generic;
                 unknown           calc. SE; good     for exp. design;   SE, CI, test
                 distribution      for biased         CI, SE and test    hypot.; good for
                                   parameters         hypot.             sparse data sets
Limitations      Can’t calc. SE    Bad CI             —                  —
                 or CI (weak)
6. 4 Methods
• Randomization
▫ Ho: each group of obs. is a random sample of 1 pop.
▫ Must be characterized by a test statistic
▫ Combine all groups, then reallocate, and compare
– Repeat 1000+ times
▫ Compare the observed test statistic with the empirical distribution of that test statistic given the available data
– A significant difference is when the observed test statistic lies beyond the empirical distribution
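The randomization procedure above can be sketched as a two-group comparison (all data values are hypothetical; the test statistic is the difference in group means):

```python
import random
import statistics

# Two hypothetical groups of observations (all values illustrative).
group_a = [12.1, 10.4, 11.8, 9.9, 10.7, 11.2]
group_b = [13.5, 12.9, 14.1, 12.2, 13.8, 13.0]

def mean_diff(a, b):
    """Test statistic: difference between the group means."""
    return statistics.mean(b) - statistics.mean(a)

observed = mean_diff(group_a, group_b)

# Ho: both groups are random samples of one population, so pool the
# observations and reallocate the group labels 1000+ times.
pooled = group_a + group_b
random.seed(4)
n_iter = 1000
hits = 0
for _ in range(n_iter):
    random.shuffle(pooled)
    if mean_diff(pooled[:6], pooled[6:]) >= observed:
        hits += 1

# One-tailed empirical p-value: how often a random reallocation matches
# or exceeds the observed test statistic.
p_value = hits / n_iter
```

Because the two made-up groups barely overlap, very few reallocations reach the observed difference and the empirical p-value comes out small.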
7. 4 Methods
• Jackknife
▫ The sample could be from one arm of the distribution
▫ Subset the data (all combinations of the data minus 1 point; each subset is of size n-1)
▫ Calculate pseudo-values; the difference between these and the observed value = estimate of bias
▫ Good when estimating something other than the mean
▫ Calculate the jackknife SE and the parameter of interest
– CIs can be fitted, but there are issues with DFs
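A sketch of the jackknife calculations described above, using the population standard deviation as the parameter of interest (something other than the mean); the data values are made up:

```python
import math
import statistics

data = [4.2, 5.1, 6.3, 4.8, 5.9, 7.0, 5.5, 6.1, 4.9, 5.7]
n = len(data)
theta_hat = statistics.pstdev(data)  # estimate from the full sample

# Leave-one-out estimates: n subsets, each of size n - 1.
theta_i = [statistics.pstdev(data[:i] + data[i + 1:]) for i in range(n)]
theta_bar = sum(theta_i) / n

# Jackknife bias estimate and bias-corrected value.
bias = (n - 1) * (theta_bar - theta_hat)
theta_corrected = theta_hat - bias

# Equivalent pseudo-value form: theta*_i = n*theta_hat - (n-1)*theta_i;
# the mean of the pseudo-values equals the bias-corrected estimate.
pseudo = [n * theta_hat - (n - 1) * t for t in theta_i]

# Jackknife standard error of the estimate.
se = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in theta_i))
```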
8. 4 Methods
• Bootstrapping
▫ Random samples of the observations (with replacement)
– Each treated as a separate random sample
▫ Should equal the distribution you would get if you had repeatedly sampled the original population
▫ Provides better CIs than the jackknife
– Can determine SE, CIs, and test hypotheses
▫ Has been used for multiple regression and stratified sampling
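A minimal bootstrap sketch for a non-mean statistic (the median), with made-up data; the SE and percentile CI follow directly from the bootstrap replicates:

```python
import random
import statistics

random.seed(6)
# Hypothetical observations; parameter of interest: the median.
data = [4.2, 5.1, 6.3, 4.8, 5.9, 7.0, 5.5, 6.1, 4.9, 5.7]

# Each bootstrap replicate resamples the observations with replacement
# and is treated as a separate random sample.
reps = sorted(statistics.median(random.choices(data, k=len(data)))
              for _ in range(2000))

# Bootstrap SE and percentile 95% CI for the median.
se = statistics.stdev(reps)
ci_low, ci_high = reps[49], reps[1949]
```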
9. 4 Methods
• Monte Carlo
▫ Mathematical model of the situation + model parameters
▫ Randomly select variable, parameter, or data values, then use them to determine the model output
– Do this 1000+ times and use the results to test hypotheses or determine confidence intervals
▫ *Resampling from a theoretical distribution
▫ Compare observations with data from a model of the system; also used in risk assessment
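As a sketch of "model + randomly drawn parameter values", here is a made-up linear growth model whose parameters are drawn from assumed normal distributions; every number below is an assumption for illustration:

```python
import random

random.seed(7)
# Made-up model of the situation: length L = a + b * age, with the
# parameters a and b drawn from assumed (normal) distributions.
ages = [1, 2, 3, 4, 5]

length_at_5 = []
for _ in range(1000):
    a = random.gauss(10.0, 1.0)   # intercept: assumed N(10, 1)
    b = random.gauss(4.0, 0.5)    # slope: assumed N(4, 0.5)
    length_at_5.append(a + b * ages[-1])  # model output at age 5

# Empirical 95% interval of the model output, usable for hypothesis
# tests or for risk-assessment comparisons against observations.
length_at_5.sort()
interval = (length_at_5[24], length_at_5[974])
```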
10. Chapter 5: Randomization Tests
• ANOVA (e.g.) vs. randomization
▫ Assumptions
• Hypothesis testing
▫ Determination of the likelihood that observations in nature could have arisen by chance
▫ Relativity: group 1 vs. group 2, hypothesis
11. Standard Significance Testing
• Test statistic (e.g. t-test)
• Significant difference
• Statistically, 3 things are needed to test a hypothesis:
1. Formally stated hypotheses (Ho and Ha)
2. Test statistic: t-test, F-ratio, r correlation, etc.
3. A means of generating the PD of the test statistic under the assumption that Ho is true
• idrc.ca
12. • Observed vs. expected values (d.f., etc.)
• Determine how likely the observed values are, assuming Ho to be true
• α-value: 0.05, 0.01, 0.001
[Figure: cumulative probability of the χ² statistic vs. its value; the α = 0.05 cutoff falls at roughly 15]
13. However…
• Test statistics are not valid if their assumptions are false
• We can, therefore, reject a real difference in the data or accept a non-existent difference
• Problem with a theoretically derived PDF:
▫ If the test statistic is not significant, you cannot tell without further analyses whether the test failed because:
1. The samples are not independently distributed (the thing being tested), or
2. The data failed to conform to the assumptions necessary for the validity of the test (e.g. the samples were not normally distributed)
This is where randomization tests come into play…
14. Significance Testing by Randomization
Independent of any predetermined parametric PDF
Generates an empirical PDF for the test statistic
15. Given a null hypothesis, the expected PDF for a
test statistic can be generated by repeatedly
randomizing the data with respect to sample
group membership and recalculating the test
statistic
1. Done many times (min. 1000)
2. Test statistic values tabulated
3. Compared with the original value from the un-randomized data
4. If the original value is unusual relative to the permutations, Ho can be rejected
17. Key Points
• Essentially, the null-hypothesis is that the
groups being compared are random samples
from the same population
• Thus, a test of significance is an attempt to
determine whether observed samples could
have been randomly drawn from the same
population
• Answer = a probability
▫ High probability = possibly from the same pop. (never claim definitively)
▫ Low probability = not likely from the same pop.
18. Ex: Fish length for in-shore fish vs. off-shore fish
• Ha: In-shore fish are smaller on average than off-shore fish
[Figure: fork length (0–300) distributions for off-shore vs. in-shore fish]
19. Ex. 5.2
• Randomization can be used to test mean
difference
• The original mean difference occurred 25 times out of 1000
• What does this tell us about the data?
• Weight of evidence, not a hard significant/non-significant cutoff (e.g. p = 0.05)
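In p-value terms, the count above works out as follows (a one-tailed empirical p-value; the 25-out-of-1000 figure is from the slide):

```python
# The original mean difference was matched or exceeded in 25 of 1000
# randomizations, so the empirical (one-tailed) p-value is:
p_value = 25 / 1000   # = 0.025
# Read as weight of evidence against Ho rather than a hard cutoff.
```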
21. Selection of a Test Statistic
• Chosen based on sensitivity to the hypothetical
property being checked or compared (e.g. mean
vs. median (ex.5.3))
• Precaution should be taken when selecting a
non-standard test statistic
• Multivariate comparisons
• Determine exactly which hypothesis is being
tested by the test statistic
▫ “When in doubt, be conservative”
• Ex 5.3
22. Ideal Test Statistics
• Greatest statistical power
• Significance – the probability of making a Type I error
• Power – the probability of avoiding a Type II error (i.e. of rejecting a false null hypothesis)
• Unbiased – using a test that is more likely to reject a false hypothesis than a true one
                 No difference     Difference exists
                 (null true)       (null false)
Null accepted    OK                TYPE II ERROR
Null rejected    TYPE I ERROR      OK
23. Randomization of Structured Data
• Restricted to comparison tests (cannot be used
for parameter estimation)
• Differences in variation – randomize residuals
instead of data values
• Basic rule: with unbalanced and highly non-normal data, a randomization procedure should be used
• Question: what should be randomized?
▫ Raw data,
▫ A sub-set of the raw data, or
▫ Residuals from a model
24. Take Home Message…
With structured or non-linear data, care needs to
be taken in what components should be
randomized
25. Summary
• Randomization requires fewer assumptions than standard parametric stats
• Significance tests test whether the observed
samples could be from the same pop.
• State hypotheses and determine significance
level
• Test statistics that yield the greatest power
should be utilized
Editor's notes
FOR THIS EXAMPLE PICK OUT 3 cards at a time
Fewer assumptions are needed for randomization tests, giving extra flexibility.