Statr sessions 11 to 12

Learning Objectives
• Determine when to use sampling.
• Determine the pros and cons of various sampling
techniques.
• Be aware of the different types of errors that can
occur in a study.
• Understand the impact of the Central Limit
Theorem on statistical analysis.
• Use the sampling distributions of the sample
mean and sample proportion.

Reasons for Sampling
• Sampling – A means for gathering information
about a population without conducting a
census
– Information is gathered from a sample, and an
inference is made about the population

• Sampling has advantages over a census
– Sampling can save money.
– Sampling can save time.

Random versus non-random Sampling
• Nonrandom sampling - Not every unit of the
population has the same probability of being
included in the sample
• Random sampling - Every unit of the
population has the same probability of being
included in the sample.

Sampling from a Frame
• A sample is taken from a population list, map,
directory, or other source used to represent
the population, which is called a frame.
• Examples of frames include telephone directories,
school lists, trade association lists, and lists sold
by brokers.
• In theory, the target population and the frame
are the same. In practice, a frame may suffer from
over-registration (it lists units that are not in the
target population) or under-registration (it omits
units that are).

Random Sampling Techniques
• Simple Random Sampling – basis for other
random sampling techniques
– Each unit is numbered from 1 to N (the size of the
population)
– A random number generator can be used to select
n items that form the sample
– Easier to perform on small populations. The
process of numbering all members of a population
is cumbersome for large populations
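
The numbering-and-drawing procedure above can be sketched in Python. This is only an illustration: the function name `simple_random_sample` and the values N = 10,000 and n = 50 are assumptions chosen for the example, and the seed is there only to make the run repeatable.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Return n distinct unit numbers drawn from 1..N, each equally likely."""
    rng = random.Random(seed)  # seed only makes the draw reproducible
    # random.sample draws without replacement, so no unit is picked twice
    return rng.sample(range(1, N + 1), n)

sample = simple_random_sample(N=10_000, n=50, seed=1)
print(sorted(sample)[:5])  # the five smallest selected unit numbers
```

Units are drawn without replacement here, which matches selecting n distinct items from a numbered list.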

• Systematic Random Sampling
– Every kth item is selected to produce a sample of
size n from a population of size N
– The value of k is called the sampling cycle
– Define k = N/n. Choose one unit at random from the
first k units, and then select every kth unit after
it
– Used because of convenience and relative ease of
administration
– A knowledgeable person can easily determine
whether a sampling plan has been followed.

Systematic Random Sampling:
Example
• Purchase orders for the previous fiscal year are
serialized 1 to 10,000 (N = 10,000).
• A sample of fifty (n = 50) purchase orders is
needed for an audit.
• k = 10,000/50 = 200

Systematic Sampling: Example
• First sample element randomly selected from the
first 200 purchase orders. Assume the 45th
purchase order was selected.
• Subsequent sample elements: 45, 245, 445, 645, . . .
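
The purchase-order example can be reproduced with a few lines of Python. The function name `systematic_sample` is an assumption for this sketch, and it assumes N is an exact multiple of n, as in the example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select every kth unit from 1..N, beginning at a random unit in 1..k."""
    k = N // n  # sampling cycle; assumes N is a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in the first k units
    return [start + i * k for i in range(n)]

# Fix the start at 45 to match the example above
orders = systematic_sample(N=10_000, n=50, start=45)
print(orders[:4], orders[-1])  # [45, 245, 445, 645] 9845
```

Leaving `start` unset lets the function pick the first element at random, which is what makes the plan a random sampling technique.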

• Systematic Random Sampling: Problems
– Problems can occur if the data are subject to
periodicity and the sampling interval is in step
with it; the resulting sample is then not random
– Example: a list of 150 college students is actually a
merged list of 5 classes with 30 students in each
class, and within each class the names run from
top students first to bottom students last.
Selecting every 30th student picks the same rank
position in every class, so the sample may consist
entirely of top, bottom, or mediocre students,
i.e. the list has a cyclical organization
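
A small Python sketch makes the periodicity problem concrete. The list below is a hypothetical stand-in for the merged class list (5 classes, 30 students each, ordered by rank within each class), and the helper name `every_kth` is an assumption.

```python
# Hypothetical merged list: (class number, rank within class), rank 1 = top student
students = [(cls, rank) for cls in range(1, 6) for rank in range(1, 31)]

def every_kth(items, k, start):
    """Select the items at 1-based positions start, start + k, start + 2k, ..."""
    return items[start - 1::k]

picked = every_kth(students, k=30, start=2)
ranks = {rank for _, rank in picked}
print(picked)  # one student from each class
print(ranks)   # {2}: every selected student holds the same rank
```

Because the sampling cycle (30) equals the length of the cycle in the list, the random start alone decides which rank is sampled in every class.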

• Stratified Random Sampling
– The population is divided into strata, i.e.
homogeneous segments with like characteristics (e.g.
men and women; old, young, and middle-aged
people; or high-, mid-, and low-income groups),
and then simple or systematic random sampling
is done within each stratum.
– Efficient when differences between strata exist
– The technique capitalizes on the known homogeneity
of subpopulations so that only relatively small
samples are required to estimate the characteristic for
each stratum or group
– Proportionate: the % of the sample taken from each
stratum equals the % of the whole population that
the stratum represents
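
Proportionate allocation can be sketched as follows. The strata, their sizes, and the function name `proportionate_stratified_sample` are made up for illustration; the sketch simply rounds each stratum's share, so with awkward proportions the total may differ slightly from n.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to its list of units.
    A simple random sample is drawn inside each stratum, sized in
    proportion to that stratum's share of the whole population."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        size = round(n * len(units) / N)  # rounding may shift the total slightly
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical population: 600 low-, 300 mid-, and 100 high-income households
strata = {
    "low": [f"L{i}" for i in range(600)],
    "mid": [f"M{i}" for i in range(300)],
    "high": [f"H{i}" for i in range(100)],
}
sample = proportionate_stratified_sample(strata, n=50, seed=7)
print({name: len(units) for name, units in sample.items()})
# {'low': 30, 'mid': 15, 'high': 5}
```

The 60/30/10 split of the population is mirrored in the 30/15/5 split of the sample.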

• Cluster (or Area) Sampling
– The population is in pre-determined clusters (students
in classes, colleges, towns, companies, areas of a city,
geographic regions etc.)
– The technique identifies clusters that tend to be
internally heterogeneous
– Each cluster contains a wide variety of elements, and
is a miniature of the population
– A random sample of clusters is chosen and all or some
units within the chosen clusters are used as the sample
– Advantages: convenience and cost. Clusters are
convenient to obtain, and the cost of sampling falls
because the scope of the study is reduced to the
chosen clusters
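
A one-stage version of cluster sampling (every unit in each chosen cluster is kept) might look like this in Python. The towns, household labels, and the function name `cluster_sample` are assumptions for the sketch.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """clusters maps a cluster name to its list of units.
    Randomly choose m clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # the random step happens at cluster level
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 8 towns with 20 households each
towns = {f"town_{t}": [f"t{t}_h{h}" for h in range(20)] for t in range(8)}
sample = cluster_sample(towns, m=2, seed=3)
print(list(sample), sum(len(units) for units in sample.values()))
```

Taking only some units within each chosen cluster would add a second random draw inside the selected clusters.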


Important to remember:
• In stratified random sampling, each stratum is a
homogeneous group within the population
• In cluster sampling, each cluster is a
heterogeneous group within the population

Convenience (NonRandom) Sampling
• Non-Random sampling – sampling techniques
used to select elements from the population by
any mechanism that does not involve a random
selection process
– These techniques are not desirable for making
statistical inferences
– Examples: treating the members of this class as an
accurate representation of all students at our
university; asking the first five people who walk into
a store about their shopping preferences

Non-sampling Errors
• Non-sampling Errors – all errors that exist
other than the variation expected due to
random sampling
– Missing data, data entry, and analysis errors
– Leading questions, poorly conceived
concepts, unclear definitions, and defective
questionnaires
– Response errors occur when people do not
know, will not say, or overstate in their answers

Proper analysis and interpretation of a sample statistic
requires knowledge of its distribution.

Process of Inferential Statistics
[Figure: flow diagram of the inferential process; the only
surviving label is "Select a random sample"]

What is a Sampling Distribution?
• Recall that a statistic has a numerical value that can be
computed (observed) once a sample data set is
available.
• Three points are crucial in this context:
– Because a sample is only a part of the population, the
numerical value of a statistic cannot be expected to
give us the exact value of the parameter
– The observed value of a statistic depends on the
particular sample that happens to be selected
– There will be some variability in the observed values
of a statistic over different occasions of sampling

What is a Sampling Distribution?
• The value of a Statistic varies in repeated sampling.
• In other words, a Statistic is a random variable and
hence has its own probability distribution
• Sampling Distribution is the Probability Distribution
of a Statistic
• The qualifier Sampling indicates that the distribution
is conceived in the context of repeated sampling from
a population
• The qualifier is often dropped to say the distribution
of a statistic

Statistic and Sampling Distribution
• In any given situation, we are often limited to one
sample and the corresponding single observed value
of a statistic
• However, over different samples the statistic varies
according to its sampling distribution
• The sampling distribution of a statistic is determined by
– the probability distribution f(x) that governs
the population
– the sample size n
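
For a very small population the sampling distribution can be listed exhaustively. The sketch below assumes a hypothetical population of four values and samples of size n = 2 drawn without replacement, and tabulates the distribution of the sample mean.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8]  # hypothetical tiny population
n = 2                      # sample size

# Every possible sample of size n, and the statistic (the mean) for each one
sample_means = [mean(s) for s in combinations(population, n)]

# The sampling distribution: each possible value of the statistic and its probability
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value] / len(sample_means))

# The mean of the sampling distribution matches the population mean
print(mean(sample_means), mean(population))
```

Changing either the population values or n changes the table, which is the sense in which f(x) and n determine the sampling distribution.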

Central Limit Theorem
• Consider taking a sample of size n from a population
• The sampling distribution of the sample mean is the
distribution of the means of repeated samples of size
n from a population
• The central limit theorem states that as the sample
size n increases,
– The shape of the sampling distribution of the sample
mean approaches a normal distribution (this
condition is typically considered to be met when
n is at least 30)
– The variance of the sample mean equals the
population variance divided by n, so it shrinks as
n grows
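
A short simulation can illustrate both statements. The setup is an assumption chosen for the sketch: a right-skewed exponential population with mean 1 and variance 1, samples of size n = 30, and 5,000 repeated samples.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30                   # sample size
reps = 5000              # number of repeated samples

# Exponential population with mean 1 and variance 1 (strongly right-skewed)
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

print(round(mean(sample_means), 3))       # should be close to the population mean, 1
print(round(pvariance(sample_means), 4))  # should be close to sigma^2 / n = 1/30
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.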

Sampling from a Normal Population
The distribution of sample means is normal for
any sample size.

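z Formula for Sample Means
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu$ and $\sigma$ are the
population mean and standard deviation, and $n$ is the
sample size.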

Tyre Store Example
Suppose that the mean expenditure per customer at a
tyre store is $85.00, with a standard deviation of
$9.00. If a random sample of 40 customers is taken,
what is the probability that the sample average
expenditure per customer for this sample will be
$87.00 or more?
Solution: Because the sample size is greater than 30,
the central limit theorem can be used to state that the
sample mean is normally distributed and the problem
can proceed using the normal distribution calculations.

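Solution to Tyre Store Example
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{9}{\sqrt{40}} = 1.42$$
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{87 - 85}{9 / \sqrt{40}} = \frac{2}{1.42} = 1.41$$
$$P(\bar{x} \ge 87) = P(z \ge 1.41) = .5000 - .4207 = .0793$$

Graphic Solution to
Tyre Store Example
[Figure: normal curve of the sample mean centered at 85 with
standard error 1.42, and the matching standard normal curve
centered at 0. The area between the mean and 87 (z = 1.41) is
.4207; the equal shaded tail areas of .0793 lie to the right of
87 and of z = 1.41.]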
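
The tyre store calculation can be checked numerically. This sketch uses Python's `statistics.NormalDist` in place of a z table; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40  # population mean, population sd, sample size
x_bar = 87.0                  # sample mean of interest

std_error = sigma / sqrt(n)      # standard deviation of the sample mean
z = (x_bar - mu) / std_error     # z value for the sample mean
p = 1 - NormalDist().cdf(z)      # upper-tail probability P(z >= computed value)

print(round(std_error, 2), round(z, 2))  # 1.42 1.41
print(round(p, 4))
```

The table-based answer of .0793 uses z rounded to 1.41; with the unrounded z the probability comes out at about .080.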

Demonstration Problem 7.1
Suppose that during any hour in a large department
store, the average number of shoppers is 448, with
a standard deviation of 21 shoppers. What is the
probability that a random sample of 49 different
shopping hours will yield a sample mean between
441 and 446 shoppers?

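Graphic Solution for Demonstration Problem 7.1
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{21}{\sqrt{49}} = 3$$
$$z = \frac{441 - 448}{3} = -2.33 \qquad z = \frac{446 - 448}{3} = -.67$$
$$P(441 \le \bar{x} \le 446) = P(-2.33 \le z \le -.67) = .4901 - .2486 = .2415$$
[Figure: normal curve of the sample mean centered at 448 with
standard error 3, and the matching standard normal curve. The
area from 441 to the mean (z = -2.33 to 0) is .4901, the area
from 446 to the mean (z = -.67 to 0) is .2486, and the shaded
area between 441 and 446 is .2415.]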
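
Demonstration Problem 7.1 can also be checked by modelling the sample mean directly as a normal variable with mean 448 and standard error 21/√49. This is a sketch using `statistics.NormalDist`; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448.0, 21.0, 49

# Sampling distribution of the sample mean: normal, mean mu, sd sigma / sqrt(n)
mean_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Probability that the sample mean falls between 441 and 446
p = mean_dist.cdf(446) - mean_dist.cdf(441)
print(round(mean_dist.stdev, 2), round(p, 4))
```

The result is about .243, slightly above the table-based .2415 because the table lookup rounds the z values to -2.33 and -.67.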

Exercise in R:
Normal Distribution
The commands you will learn
• dnorm
• lines
• qqnorm
• qqline
• rnorm
• qqnormsim
• pnorm
• qnorm
Open URL: www.openintro.org
Go to Labs in R and select 3-Distributions

Exercise in R:
Sampling Distribution

Here you will explore the Central Limit Theorem using the
sample() command
Open URL: www.openintro.org
Go to Labs in R and select 4A – Intro to inference

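Z Formula
for Sample Proportions
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p\,q}{n}}}$$
where $\hat{p}$ is the sample proportion, $p$ is the population
proportion, $q = 1 - p$, and $n$ is the sample size.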


If 10% of a population of parts is defective, what is the
probability of randomly selecting 80 parts and finding
that 12 or more parts are defective?

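Solution
$$p = .10, \quad q = .90, \quad n = 80, \quad \hat{p} = \frac{12}{80} = .15$$
$$\sigma_{\hat{p}} = \sqrt{\frac{p\,q}{n}} = \sqrt{\frac{(.10)(.90)}{80}} = 0.0335$$
$$z = \frac{\hat{p} - p}{\sqrt{p\,q/n}} = \frac{0.15 - 0.10}{0.0335} = \frac{0.05}{0.0335} = 1.49$$
$$P(\hat{p} \ge .15) = P(z \ge 1.49) = .5000 - .4319 = .0681$$

Graphic Solution
[Figure: normal curve of the sample proportion centered at 0.10
with standard error 0.0335, and the matching standard normal
curve centered at 0. The area between the mean and 0.15
(z = 1.49) is .4319; the shaded upper tail lies to the right of
0.15 and of z = 1.49.]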
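
The defective-parts calculation can be reproduced in Python with the same normal approximation the z formula relies on. The variable names are assumptions for this sketch.

```python
from math import sqrt
from statistics import NormalDist

p, n, defective = 0.10, 80, 12  # population proportion, sample size, threshold count

p_hat = defective / n                 # sample proportion at the threshold
std_error = sqrt(p * (1 - p) / n)     # standard error of the sample proportion
z = (p_hat - p) / std_error           # z value for the sample proportion
prob = 1 - NormalDist().cdf(z)        # P(sample proportion >= p_hat)

print(round(p_hat, 2), round(std_error, 4), round(z, 2))  # 0.15 0.0335 1.49
print(round(prob, 3))
```

The probability comes out at about .068, in line with .5000 - .4319 = .0681 from the z table.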
