2. Basic Statistical Concepts
Population
The specific group of individuals or individual objects or events to
be studied; the total group to which we make projections and
inferences (hence, the term "inferential statistics")
Sample
A subset of individual elements from a population which is
examined and from which we draw conclusions about the
population as a whole
Random Sample
A sample selected in such a way that every individual member of
the population has an equal chance of being selected
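A minimal sketch of drawing a simple random sample, assuming the population can be listed in full. The population of member IDs, the helper name `simple_random_sample`, and the seed are hypothetical and used only for illustration.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member has an equal chance of selection."""
    rng = random.Random(seed)  # a seed only makes the illustration repeatable
    return rng.sample(list(population), k)


# Hypothetical population: 1,000 member IDs (not from the slides).
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=42)

print(len(sample), sorted(sample)[:5])
```

Because `random.Random.sample` draws without replacement, no member can appear in the sample twice.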
3. Basic Statistical Concepts
Bias
In statistics, bias is bad, nasty, evil. In research, it is simply an
obstacle. Bias is the systematic favoritism that is present in the
data collection process which may result in skewed or misleading
results.
Sources of Bias
Sample selection: Non-random samples may be biased.
Data collection: The way questions are asked, as well as the
processing and handling of data may create bias.
Bias is often referred to as Error
4. Basic Statistical Concepts
Data
The actual measurements obtained through a study or procedure
("data" is plural; the singular is "datum")
Types of Data
Numerical: Measurements whose numbers carry real magnitude, such
as height and weight; they express quantity (hence,
"quantitative data").
Categorical: Observations of categories such as gender or race.
Numbers may be used to label categories, but the numbers are
arbitrary codes with no quantitative meaning (e.g., 1 = male and 2
= female).
5. Basic Statistical Concepts
Statistic
A number that summarizes the data collected from a sample.
Some examples include frequencies, percentages, percentiles, and
averages.
Parameter
A number that summarizes data from an entire population. Statistics
are based on sample data; if the summary number comes from the
entire population, it is a parameter. A study that obtains data from
an entire population, and is therefore summarized with parameters,
is a census.
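The distinction can be illustrated with a small sketch. The population below is synthetic (10,000 made-up ages), and the sample size of 200 is an arbitrary assumption; neither comes from the slides.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic population of 10,000 ages, invented for illustration only.
population = [rng.randint(18, 90) for _ in range(10_000)]

# Census: the summary of the whole population is a parameter.
parameter = statistics.mean(population)

# Sample: the same summary computed on a subset is a statistic.
sample = rng.sample(population, 200)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two numbers will usually be close but not identical; the gap is sampling error.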
6. Basic Statistical Concepts
Mean
The average or middle of a data set obtained by summing all the
values in the data set and dividing by the total number of values.
Also called the arithmetic mean; different calculations are used to
create a geometric mean or a harmonic mean; these are not
generally used in social and market research.
Median
When the data values are lined up in order from smallest to largest,
the median is the middle value, the point where half of the values
are above the median and half are below.
Mode
When data are grouped into categories, the mode is the largest
category, based on the number of individuals in the category.
7. Basic Statistical Concepts
Mean vs Median (vs Mode)
The mean is calculated with the formula \( \bar{x} = \frac{\sum x}{n} \).
The median is the middle value in an ordered distribution.
Consider these data:
Car Prices   N of Models
$15K         38
$20K         16
$25K         11
$30K         6
$35K         3
$40K         2
$45K         1
$65K         9
$70K         14
[Figure: bar chart titled "What's 'average'?" showing the number of models at each price from $15K to $70K; no models fall at $50K, $55K, or $60K.]
The mean is $31,400 (n = 100) and the
median is $20,000. The mode is the
most frequent category, $15,000.
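The three measures can be checked against the car price table with a short sketch. The variable names are illustrative; the counts are the ones listed on the slide.

```python
import statistics

# Frequency table from the slide: price in dollars -> number of models.
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}

# Expand the table into one price per model so the usual formulas apply.
prices = [price for price, count in price_counts.items() for _ in range(count)]

n = len(prices)
mean_price = sum(prices) / n               # sum of values divided by n
median_price = statistics.median(prices)   # middle of the ordered values
mode_price = statistics.mode(prices)       # most frequent value

print(n, mean_price, median_price, mode_price)
```

The mean is pulled upward by the cluster of $65K and $70K models, which is why it sits well above the median.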
8. Basic Statistical Concepts
Variation
Not every score is the same. There are different prices for different
cars. Some people pay different prices for the same car. Prices
change over time. As the previous exhibit shows, measures of
central tendency are not sufficient to describe the distribution (and
variability) of scores.
Standard Deviation
The formula for standard deviation is \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)
The standard deviation tells you whether the scores are tightly
grouped or widely distributed. Two data sets can have the same
mean but have very different distributions. In the previous bi-modal
example, the standard deviation is $21,119, indicating a high degree
of variation. Note that \( s^2 \) is the variance.
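As a worked check, the sample standard deviation of the car price data can be computed step by step. This is a sketch using the counts from the earlier table and the \( n - 1 \) denominator from the formula; the variable names are illustrative.

```python
import math
import statistics

# Car price frequency table from the earlier slide (dollars -> count).
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}
prices = [price for price, count in price_counts.items() for _ in range(count)]

n = len(prices)
mean_price = sum(prices) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean_price) ** 2 for x in prices)

variance = squared_deviations / (n - 1)   # s squared
std_dev = math.sqrt(variance)             # s

print(round(std_dev))
```

The result rounds to the $21,119 quoted above, and `statistics.stdev(prices)` gives the same value because it also divides by \( n - 1 \).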
9. Basic Statistical Concepts
Normal Distribution
The normal distribution is described graphically by the bell-shaped
curve. As the number of values in a distribution grows large, there
is a tendency in many situations for the largest group of individuals
to cluster in the middle of the distribution with successively fewer
individuals as the values move out to the tails or ends of the
distribution. Due to symmetry, the mean and median are equal and
in the middle of the distribution.
10. Basic Statistical Concepts
Normal Distribution (continued)
The normal distribution is the starting point for understanding
variability. With a normal distribution, standard deviation has
special significance. It is the distance from the mean to the inflection
point, the point where the curve changes from concave down (near
the mean) to concave up (toward the tails). About 68% of the values
lie within one standard deviation of the mean, about 95% fall within
two standard deviations, and about 99.7% fall within three standard
deviations (this is known as the empirical rule).
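The empirical rule percentages can be reproduced from the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of values lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The same shares hold for any normal distribution, since shifting the mean or rescaling the standard deviation does not change the proportion within k standard deviations.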
11. Basic Statistical Concepts
Normal Distribution
The difference in variability is clear when two normal distributions with
the same mean are shown on the same scale.
[Figure: two bell-shaped histograms, each with mean 50. One has s = 1.6 and spans 45 to 55 (frequencies 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1); the other has s = 16 and spans 0 to 100. Both are also plotted on a common 0 to 100 axis.]
12. Basic Statistical Concepts
Standard Scores
The formula for a standard score is \( z = \frac{x - \bar{x}}{s} \), where \( x \) is the original
score, \( \bar{x} \) is the mean, and \( s \) is the standard deviation.
Among other things, a standard score allows for comparisons when
means and distributions may be different for the scores being
compared. The standard score gives the relative standing of the
original score taking into account the mean and the variation in the
distribution. Standard scores are used in statements like, "Sales at
the Troy store are +2 standard deviations (above the mean)."
Knowing that a score is above or below the mean, and by how many
standard deviations (2, 3, or more), identifies the score's position
relative to all other scores, both in direction (from the mean) and in
how extreme the score is given how the other scores are distributed.
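A small sketch of the standard score calculation follows. All figures in it are hypothetical (the slides give no sales numbers for the Troy store), and the function name `standard_score` is illustrative.

```python
def standard_score(x, mean, s):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / s


# Hypothetical store sales: chain mean $100,000, standard deviation $10,000.
troy_z = standard_score(x=120_000, mean=100_000, s=10_000)

# Hypothetical comparison of two scores drawn from different distributions.
test_a_z = standard_score(x=80, mean=70, s=5)
test_b_z = standard_score(x=90, mean=85, s=10)

print(troy_z, test_a_z, test_b_z)
```

In the comparison, the raw score of 90 is higher, but the score of 80 stands further above its own mean once the spread of each distribution is taken into account.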
13. Basic Statistical Concepts
Standard Error
Standard error is the same basic concept as standard deviation;
both represent a typical distance from the mean. The difference is
that standard deviation describes how the original population values
deviate from each other due to natural phenomena (different height,
different ideas, different characteristics), whereas standard error
describes how sample means deviate (across multiple samples of the
population).
Sample means vary due to the error that occurs from not doing a
census (hence, "standard error"). According to the central limit
theorem, if the samples are large enough, the distribution of all
possible sample means will be approximately bell-shaped, or normal.
Error above and below the mean cancels out and the
distribution is symmetrical.
\( \sigma / \sqrt{n} \) is the standard error, where \( \sigma \) is the population standard
deviation and \( n \) is the sample size.
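The formula can be compared with a simulation. This sketch makes several assumptions that are not in the slides: the car price table is treated as the whole population, samples of 25 are drawn with replacement, and 2,000 samples are taken with a fixed seed.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


# Assumption: the car price table is treated as the entire population.
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}
population = [price for price, count in price_counts.items() for _ in range(count)]
sigma = statistics.pstdev(population)  # population standard deviation

rng = random.Random(1)
sample_size = 25  # arbitrary choice for illustration

# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.mean(rng.choices(population, k=sample_size))
    for _ in range(2000)
]

observed = statistics.stdev(sample_means)        # spread of the sample means
theoretical = standard_error(sigma, sample_size)

print(f"observed: {observed:.0f}  theoretical: {theoretical:.0f}")
```

Although the underlying prices are bimodal, the spread of the sample means should land close to \( \sigma / \sqrt{n} \), and their average should sit near the population mean of $31,400.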