Descriptive statistics

DESCRIPTIVE STATISTICS
CENTRAL TENDENCY AND VARIABILITY
DESMOND AYIM-ABOAGYE, PHD

IMPORTANCE OF CENTRAL TENDENCY
• The goal of using a measure of central tendency is to identify a
numerical value that is the most representative one within a
distribution or set of data.
• Central tendency and variability are inextricably linked: in
order to make sense of one, you must know the other.

CENTRAL TENDENCY
• Measures of central tendency are descriptive statistics that
identify the central location of a sample of data. The central
tendency of a data set is the best single indicator of the
representative value(s) of a sample.

MEAN
• The mean is by far the single most useful measure of central
tendency available, and it can be used to analyze data from
interval or ratio scales of measurement.
• The psychology literature makes extensive use of comparisons
between means (e.g., ANOVA, Analysis of Variance).

MEAN DEFINITION
• The mean is the arithmetic average of a set of scores. It is calculated by summing the
scores and dividing by N, the number of scores.
• CALCULATING THE MEAN FOR A SAMPLE OF DATA (FORMULA)
• $\bar{X} = \dfrac{\sum X}{N} = \dfrac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
• $\sum$ = sum of (the Greek capital letter sigma)
• $\bar{X}$ = sample mean
• $X$ = individual scores
• $N$ = number of scores in the sample
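The sample-mean formula translates almost directly into code. The Python sketch below is illustrative only: the function name `sample_mean` is a hypothetical helper, and the data are simply the 15 scores used in the median example of these slides.

```python
def sample_mean(scores):
    """Return X-bar = (sum of X) / N for a list of scores."""
    if not scores:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(scores) / len(scores)


# The 15 scores from the median example in these slides (sum = 300, N = 15)
scores = [26, 32, 21, 12, 15, 11, 27, 16, 18, 21, 19, 28, 10, 13, 31]
print(sample_mean(scores))  # 20.0
```

Dividing by `len(scores)` is the code counterpart of dividing by N. The empty-list guard is included because the formula is undefined when N = 0.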

MEAN OF A POPULATION
• CALCULATING THE MEAN FOR A POPULATION (FORMULA)
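• $\mu = \dfrac{\sum X}{N}$, where $\mu$ is the population mean and $N$ is the number of scores in the population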

                           Sample      Population   Unbiased Estimate of
                           Statistic   Parameter    Population Parameter
Mean                       X̄           μ            -
Variance                   s²          σ²           ŝ²
Standard deviation         s           σ            ŝ
Pearson correlation coef.  r           ρ            -
Table 4.1. Some Symbols for Sample Statistics, Population Parameters, and Unbiased
Estimates of Population Parameters.

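Calculating the mean from a frequency distribution (X = score, f = frequency, fX = score × frequency):

X     f     fX
8     3     24
7     5     35
6     6     36
5     8     40
4    10     40
3     4     12
2     2      4
1     5      5
0     1      0
N = 44      ∑fX = 196

$\bar{X} = \dfrac{\sum fX}{N} = \dfrac{196}{44} = 4.4545 \approx 4.45$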
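The frequency-table calculation can be checked with a short Python sketch. The dictionary transcribes the table above, mapping each score X to its frequency f. The function name `mean_from_frequencies` is a hypothetical helper and does not come from the slides.

```python
# Frequency distribution from the table: score X -> frequency f
freq = {8: 3, 7: 5, 6: 6, 5: 8, 4: 10, 3: 4, 2: 2, 1: 5, 0: 1}


def mean_from_frequencies(freq):
    """Return X-bar = (sum of f*X) / N, where N is the sum of the frequencies."""
    n = sum(freq.values())                        # N = 44
    sum_fx = sum(x * f for x, f in freq.items())  # sum of fX = 196
    return sum_fx / n


print(round(mean_from_frequencies(freq), 2))  # 4.45
```

Weighting each score by its frequency gives the same result as writing out all 44 scores and averaging them, but it requires far less arithmetic.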

THE MEDIAN
• The median can be used for calculations involving ordinal,
interval, or ratio scale data.
• The median is a number or score that divides a distribution
of data exactly in half: fifty percent of a distribution’s
observations fall above the median and fifty percent fall
below it.

These are 15 scores to consider:
26 32 21 12 15 11 27 16 18 21 19 28 10 13 31
To calculate the median, arrange the scores from the lowest
to the highest:
10 11 12 13 15 16 18 19 21 21 26 27 28 31 32
The median is 19, because 7 scores appear on either side of it.

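FORMULA FOR AN ODD NUMBER OF SCORES
Median position = $\dfrac{N + 1}{2}$
$\dfrac{15 + 1}{2} = 8$, so the median is the 8th score in the ordered list:
10 11 12 13 15 16 18 19 21 21 26 27 28 31 32
The 8th score is 19.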

When you have an even number of scores, no single score
splits the distribution into two halves. Instead, the median is
the average of the two middle scores:
55 67 78 83 88 92 98 99
(83 + 88)/2 = 171/2 = 85.5
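Both cases can be written as one function. The Python sketch below uses the two example data sets from these slides. The function name `median` is a hypothetical helper, and the standard library's `statistics.median` is included only as a cross-check.

```python
import statistics


def median(scores):
    """Median: middle score for odd N, mean of the two middle scores for even N."""
    if not scores:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(scores)  # step 1: arrange from lowest to highest
    n = len(ordered)
    mid = n // 2              # 0-based index of the upper middle score
    if n % 2 == 1:
        # Odd N: the 1-based position (N + 1) / 2 is the 0-based index N // 2
        return ordered[mid]
    # Even N: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [26, 32, 21, 12, 15, 11, 27, 16, 18, 21, 19, 28, 10, 13, 31]
even_scores = [55, 67, 78, 83, 88, 92, 98, 99]
print(median(odd_scores))   # 19
print(median(even_scores))  # 85.5
# statistics.median follows the same odd/even rule
print(statistics.median(odd_scores), statistics.median(even_scores))  # 19 85.5
```

The position formula (N + 1)/2 counts from 1, while Python indexes from 0. For odd N, both conventions therefore point to `ordered[N // 2]`.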

THE MODE
• The third and final measure of central tendency is the mode.
• The mode is the most frequently occurring observation within a
distribution.
• The mode of a distribution is the score or category that has the
highest frequency.

10 11 12 13 15 16 18 19 21 21 26 27 27 27 27 31 31 45
The mode is 27, because it occurs four times, more often than any other score.
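Finding the mode is a counting problem. The Python sketch below counts the scores from the example with `collections.Counter`. The helper name `modes` is hypothetical. It returns a list because a distribution can have more than one score tied for the highest frequency.

```python
import statistics
from collections import Counter


def modes(scores):
    """Return every score that has the highest frequency (there may be ties)."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(x for x, c in counts.items() if c == highest)


scores = [10, 11, 12, 13, 15, 16, 18, 19, 21, 21, 26, 27, 27, 27, 27, 31, 31, 45]
print(modes(scores))                # [27]
print(statistics.multimode(scores)) # [27], standard-library equivalent (Python 3.8+)
```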

[Figure 4. Relation of Measures of Central Tendency to the Shape of Distributions. Three panels (Normal distribution, Positive skew, Negative skew), each with X and Y axes and marking the mean ($\bar{X}$), the median (mdn), and the mode.]

VARIABILITY
• Think in terms of variability:
• A) Are scores clustered close together?
• B) Right on top of one another?
• C) Spread very far apart?
• Clustering, spread, and dispersion are synonyms for the concept of
variability.

VARIABILITY DEFINITION
• Variability refers to the degree to which sample or population
observations differ or deviate from the distribution’s measures of
central tendency.
• The clustering, spread, or dispersion of data reflects the relative
amount of variability present in a distribution.
• The shape of a distribution, its skewness or kurtosis, often
characterizes its variability.

FACTORS AFFECTING VARIABILITY
• 1. Sample Size
• 2. Selection Process
• 3. Sample Characteristics
• 4. Independent Variables
• 5. Dependent measures
• 6. Passage of time between the presentation of the independent
variable and dependent measure.

THE RANGE
• The most basic index of variability is the range, sometimes called
the crude range or the simple range.
• The range is the difference between the highest and the lowest
score in a distribution.
Formula: $\text{range} = X_{\text{high}} - X_{\text{low}}$

RANGE CALCULATION
• If the lowest observation in a
sample of data were 20 and
the highest was 75, then the
range would be 55:
• $\text{range} = X_{\text{high}} - X_{\text{low}} = 75 - 20 = 55$
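In code, the range needs only the maximum and the minimum. The Python sketch below is illustrative. The function is named `score_range` (a hypothetical name) so that it does not shadow Python's built-in `range`. Only 20 and 75 in the sample data come from the slide; the other two values are made up.

```python
def score_range(scores):
    """Crude (simple) range: X_high - X_low."""
    if not scores:
        raise ValueError("the range of an empty sample is undefined")
    return max(scores) - min(scores)


# Hypothetical sample whose lowest score is 20 and highest is 75
print(score_range([20, 41, 75, 33]))  # 75 - 20 = 55
```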

RANGE AND ERROR CATCHING
• Checking the ranges of variables is a good way to catch errors
in a data set.
• If a given measure, say a 7-point scale, is used and the range is
found to be 10 (e.g., a high score of 11 minus a low score of
1), some error occurred; logically, this range cannot exceed 6
(i.e., 7 − 1).
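This check can be automated. The Python sketch below assumes that responses are stored as integers on a 1-to-7 scale. `check_scale` and the sample responses are hypothetical; the responses mirror the slide's high score of 11 and low score of 1.

```python
def check_scale(responses, low=1, high=7):
    """Return the observed range, the largest range the scale allows
    (high - low), and any responses outside [low, high]."""
    out_of_bounds = [r for r in responses if not low <= r <= high]
    observed_range = max(responses) - min(responses)
    max_possible = high - low
    return observed_range, max_possible, out_of_bounds


observed, allowed, bad = check_scale([3, 5, 1, 11, 6])
print(observed, allowed, bad)  # 10 6 [11]
```

An observed range greater than `high - low` signals an error somewhere. Listing the out-of-bounds values points to where it is.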

VARIANCE AND STANDARD DEVIATION
• Just as the mean is the chief measure of central tendency, the
standard deviation is the preferred measure of dispersion.

VARIANCE
• Variance is equal to the average of the squared deviations from
the mean of a distribution. Symbolically, the sample variance is
$s^2$ and the population variance is $\sigma^2$.
• Variance is thus a numerical index of variability: the larger the
squared deviations from the mean of a data set, the larger the
variance.

DEVIATIONS FROM THE MEAN
$\sum (X - \bar{X}) = 0$ (the deviations from the mean always sum to 0)
When squared and summed, deviation scores yield non-negative numbers
that are useful in a variety of statistical analyses.
• The sum of squares (SS) is the sum of the squared deviations from the mean
of a distribution. The SS is fundamental to descriptive and inferential
statistics.
$SS = \sum (X - \bar{X})^2$
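Both claims on this slide can be checked numerically. The Python sketch below uses exact fractions so that the deviations sum to exactly 0, rather than to a tiny floating-point remainder. The helper names and the six scores are hypothetical.

```python
from fractions import Fraction


def deviations(scores):
    """Deviation scores X - X-bar, computed with exact fractions."""
    mean = Fraction(sum(scores), len(scores))
    return [x - mean for x in scores]


def sum_of_squares(scores):
    """Definitional formula: SS = sum of (X - X-bar)^2."""
    return sum(d * d for d in deviations(scores))


scores = [2, 4, 4, 5, 7, 8]     # hypothetical scores, mean = 5
print(sum(deviations(scores)))  # 0   (-3 - 1 - 1 + 0 + 2 + 3)
print(sum_of_squares(scores))   # 24  (9 + 1 + 1 + 0 + 4 + 9)
```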

SUM OF SQUARES (SS)
$SS = \sum X^2 - \dfrac{(\sum X)^2}{N}$
Computational Formula (algebraically equivalent to $SS = \sum (X - \bar{X})^2$)
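A quick way to convince yourself that the computational formula gives the same SS as the definitional one is to compute both. The Python sketch below does this with exact fractions, using hypothetical helper names and hypothetical scores.

```python
from fractions import Fraction


def ss_definitional(scores):
    """SS = sum of (X - X-bar)^2."""
    mean = Fraction(sum(scores), len(scores))
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum of X^2 - (sum of X)^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - Fraction(sum(scores) ** 2, n)


scores = [2, 4, 4, 5, 7, 8]  # hypothetical: sum X^2 = 174, (sum X)^2 / N = 900 / 6 = 150
print(ss_computational(scores))                            # 24
print(ss_definitional(scores) == ss_computational(scores))  # True
```

The computational form needs only the running totals ΣX and ΣX², so it avoids first computing the mean and every deviation. That is why it suits hand calculation.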

Greater variability means less consistency in behavior
(larger deviations between X and $\bar{X}$), whereas less
variability means greater consistency (smaller deviations
between X and $\bar{X}$).

SUM OF SQUARES
• Influences on the size of the SS
• A) The size of the SS is influenced by the magnitude of the
deviation scores around the mean.
• B) A second factor, the number of observations available, also plays a
role in the SS’s size.
• C) The more scores available in a distribution, the larger the SS will
be, especially if some of the scores diverge greatly from the mean.
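The sample-size effect can be illustrated directly. Adding a score never decreases the SS. A score that falls exactly at the current mean leaves the SS unchanged, while a score far from the mean inflates it. The Python sketch below uses a hypothetical helper and made-up scores.

```python
def sum_of_squares(scores):
    """SS = sum of (X - X-bar)^2 (floating-point version)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


small = [4, 5, 6]                       # hypothetical scores, mean = 5
print(sum_of_squares(small))            # 2.0
print(sum_of_squares(small + [5, 5]))   # 2.0: the added scores sit at the mean
print(sum_of_squares(small + [5, 15]))  # 82.0: one added score lies far from the mean
```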

SAMPLE VARIANCE AND STANDARD DEVIATION
Sample variance: $s^2 = \dfrac{\sum (X - \bar{X})^2}{N}$
Because it is derived from the SS, $s^2$ can also be written as
$s^2 = \dfrac{SS}{N}$
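The Python sketch below computes $s^2 = SS/N$ as defined on this slide. The helper `variance_n` and the scores are hypothetical. Keep in mind that Python's standard library offers two variance functions. `statistics.pvariance` divides by N and matches this slide. `statistics.variance` divides by N − 1, the usual unbiased estimate (presumably what Table 4.1 denotes ŝ²).

```python
import statistics


def variance_n(scores):
    """s^2 = SS / N, as defined on this slide (divides by N, not N - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    return ss / n


scores = [2, 4, 4, 5, 7, 8]  # hypothetical scores: SS = 24, N = 6
print(variance_n(scores))                                 # 4.0
print(variance_n(scores) == statistics.pvariance(scores))  # True
print(statistics.variance(scores))                        # 4.8, i.e. SS / (N - 1) = 24 / 5
```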

STANDARD DEVIATION
The standard deviation is the average deviation between an observed score and
the mean of a distribution. Standard deviation, symbolized s, is determined by
taking the square root of the variance, or √𝑠2
= s
S = √𝑠2
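Taking the square root of $s^2 = SS/N$ gives s. The Python sketch below uses a hypothetical helper name and hypothetical scores. It cross-checks the result against `statistics.pstdev`, which also divides by N.

```python
import math
import statistics


def standard_deviation_n(scores):
    """s = sqrt(s^2), where s^2 = SS / N as defined in these slides."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(ss / n)


scores = [2, 4, 4, 5, 7, 8]          # hypothetical scores: SS = 24, s^2 = 4.0
print(standard_deviation_n(scores))  # 2.0
print(statistics.pstdev(scores))     # same quantity from the standard library
```

Because s is expressed in the original units of the scores (unlike $s^2$, which is in squared units), it is easier to interpret next to the mean.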

STANDARD DEVIATION
The standard deviation is the most
common and, arguably, the best single
index of a distribution’s variability.
Low dispersion = homogeneous
observations; high dispersion =
heterogeneous observations.

[Figure 4.3. Comparing Distributions: Homogeneity and Heterogeneity. Left panel: a homogeneous distribution (Distribution A: mean = 30, SD = 2.0; X axis from 24 to 36). Right panel: a heterogeneous distribution (Distribution B: mean = 72, SD = 5.0; X axis from 62 to 82).]

POPULATION VARIANCE AND STANDARD DEVIATION
$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
Or
$\sigma^2 = \dfrac{SS}{N}$
The population variance is the sum of the squared deviations between every observation (X) in the population and
the population mean ($\mu$), divided by the total number of observations in the population (N).
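The computation for a population has the same shape as the one for a sample. The difference is interpretive: every member of the population is included, so the mean is μ rather than an estimate of it. The Python sketch below uses a hypothetical five-member population and a hypothetical helper name. It takes σ as the square root of σ², mirroring the sample case.

```python
import math
import statistics


def population_variance(population):
    """sigma^2 = sum of (X - mu)^2 / N over every observation in the population."""
    n = len(population)
    mu = sum(population) / n  # population mean
    return sum((x - mu) ** 2 for x in population) / n


# Hypothetical population in which every member's score is known
population = [1, 3, 5, 7, 9]
sigma_sq = population_variance(population)  # SS = 40, N = 5
sigma = math.sqrt(sigma_sq)
print(sigma_sq)                                       # 8.0
print(sigma_sq == statistics.pvariance(population))  # True
print(round(sigma, 2))                               # 2.83
```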
