1. DESCRIBING A DISTRIBUTION
• A good way to describe the distribution of a
quantitative variable is to take the following
three steps:
o Report the center of the distribution. [Measures of
Central Tendency]
o Report any significant deviations from the center.
[Measures of Variation]
o Report the general shape of the distribution.
[Measures of Skewness and Peakedness]
3. AVERAGE
The central tendency is measured by averages.
These describe the point about which the various
observed values cluster.
In mathematics, an average, or central tendency of
a data set refers to a measure of the "middle" or
"expected" value of the data set.
An average is a single value which is considered
the most representative or typical value for a
given set of data.
Objectives of averaging
o To get one single value that describes the characteristics
of the entire data.
o To facilitate comparison
4. CHARACTERISTICS OF A GOOD
AVERAGE
Easy to understand
Simple to compute
Based on all observations
Capable of further algebraic treatment
Should not be unduly affected by the presence of
extreme values
6. MEAN
It is a commonly used measure of central tendency.
The mean is obtained by adding together all
observations and by dividing the total by the number
of observations.
The mean, in most cases, is not an actual data value.
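7. CALCULATION OF MEAN
For ungrouped series
$\bar{X} = \frac{\sum X}{N}$
For grouped series
o Direct Method
$\bar{X} = \frac{\sum fX}{N}$
o Short cut method
$\bar{X} = A + \frac{\sum fd}{N} \times i$
Where,
o $\bar{X}$ = Mean
o X = Observation
o N = Number of Observations (the sum of the frequencies for a grouped series)
o f = Frequency of observation
o A = Assumed mean
o d = Deviation from the assumed mean in class-interval units, $d = (X - A)/i$
o i = Class interval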
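The three mean formulas on this slide can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the small grouped data set are assumptions made for the example, and the short cut method assumes that d is the step deviation (X - A) / i.

```python
def mean_ungrouped(values):
    """Mean = sum(X) / N."""
    return sum(values) / len(values)


def mean_grouped_direct(midpoints, freqs):
    """Direct method: mean = sum(f * X) / N, with N = sum(f)."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(midpoints, freqs)) / n


def mean_grouped_shortcut(midpoints, freqs, assumed_mean, class_interval):
    """Short cut method: mean = A + (sum(f * d) / N) * i, with d = (X - A) / i."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / class_interval
                 for x, f in zip(midpoints, freqs))
    return assumed_mean + sum_fd / n * class_interval


# Illustrative grouped data (an assumption, not from the slides):
# class mid-points and their frequencies.
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

direct = mean_grouped_direct(midpoints, freqs)
shortcut = mean_grouped_shortcut(midpoints, freqs,
                                 assumed_mean=35, class_interval=10)
print(direct, shortcut)  # the two methods should agree
```

Any assumed mean gives the same result; choosing one close to the centre of the data only keeps the hand arithmetic small.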
8. MATHEMATICAL PROPERTIES OF MEAN
The algebraic sum of the deviations of all the
observations from the mean is always zero.
The sum of squared deviations of all the observations
from the mean is a minimum, i.e. less than the sum of
squared deviations of all observations from any other
value.
If we have the means and numbers of observations
of two or more related groups, we can compute the
combined mean:
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + \dots}{N_1 + N_2 + \dots}$
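A quick numerical check of these three properties, using two small made-up groups (the data and helper names are assumptions for illustration only):

```python
def combined_mean(sizes, means):
    """Combined mean = (N1*mean1 + N2*mean2 + ...) / (N1 + N2 + ...)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def sum_sq_dev(values, centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)


# Two hypothetical related groups.
group1 = [2, 4, 6]
group2 = [10, 20]
pooled = group1 + group2

mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)
combined = combined_mean([len(group1), len(group2)], [mean1, mean2])
pooled_mean = sum(pooled) / len(pooled)

# Property 1: deviations from the mean sum to zero (up to rounding).
sum_dev = sum(x - pooled_mean for x in pooled)

# Property 2: squared deviations are smallest when taken from the mean.
at_mean = sum_sq_dev(pooled, pooled_mean)
elsewhere = sum_sq_dev(pooled, pooled_mean + 1)

print(combined, pooled_mean, sum_dev, at_mean, elsewhere)
```

The combined mean computed from the group summaries matches the mean of the pooled observations, which is what makes the formula useful when only the group means and sizes are available.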
9. MERITS AND DEMERITS OF MEAN
Merits
o It possesses the first four out of five characteristics of a
good average.
Demerits
o The mean is unduly affected by the presence of extreme values.
o In continuous series, it is difficult to compute the mean without
assuming that the mid point represents each class.
o Applicable only to quantitative data.
o Sometimes the mean may not be an observation in the data.
10. MEDIAN
Median is the measure of central tendency which
appears in the ‘middle’ of an ordered sequence
(either in ascending or descending order) of values.
It divides the whole data into two equal parts. In other
words, 50% of the observations are smaller than the
median and 50% are larger than it.
11. CALCULATION OF MEDIAN
Individual Series
M = Size of the $\left(\frac{N+1}{2}\right)$th item
Discrete Series
M = Size of the $\left(\frac{N}{2}\right)$th item
Continuous Series
M = Size of the $\left(\frac{N}{2}\right)$th item
$M = L + \frac{\frac{N}{2} - cf}{f} \times i$
Where,
o M = Median
o N = Number of Observations
o L = Lower limit of median class
o cf = Cumulative frequency of the class preceding the median class
o f = Frequency of the median class
o i = Class interval
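The median rules for an individual series and a continuous series could be sketched like this. The function names and the grouped data are illustrative assumptions; for an even number of items the sketch averages the two middle items, which is the usual reading of a fractional (N+1)/2 position.

```python
def median_individual(values):
    """Size of the ((N + 1) / 2)th item of the ordered series."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]
    # Even N: the position falls between two items, so average them.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def median_grouped(lower_limits, freqs, class_interval):
    """M = L + ((N/2 - cf) / f) * i for a continuous series."""
    n = sum(freqs)
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= half:  # this is the median class
            return lower + (half - cf) / f * class_interval
        cf += f
    raise ValueError("frequencies must not be empty")


# Illustrative classes 0-10, 10-20, ..., 40-50 (assumed data).
lower_limits = [0, 10, 20, 30, 40]
freqs = [2, 3, 5, 3, 2]

print(median_individual([72, 76, 80, 80, 81, 83, 84, 85, 85, 89]))
print(median_grouped(lower_limits, freqs, class_interval=10))
```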
12. MERITS AND DEMERITS OF MEDIAN
Merits
o It is especially useful in case of continuous series because
mid point is not used for calculation.
o It is not influenced by presence of extreme values.
o Applicable for quantitative and qualitative data.
Demerits
o Not based on every observation
o Not capable of algebraic treatment
o Tends to be a rather unstable value if the number of
observations is small.
13. MODE
• Mode is defined as the value which occurs the
maximum number of times, i.e. the value with the maximum
frequency.
• A data set can have more than one mode.
• A data set is said to have no mode if all values occur
with equal frequency.
14. CALCULATION OF MODE
Individual Series
Z = The item which is repeated the greatest number of times
Discrete Series
Z = The item which is repeated the greatest number of times, i.e. the
item with the highest frequency
Continuous Series
$Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
Where,
o Z = Mode
o L = Lower limit of modal class
o $f_1$ = Frequency of modal class
o $f_0$ = Frequency of the class preceding the modal class
o $f_2$ = Frequency of the class succeeding the modal class
o i = Class interval
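A small sketch of the mode calculations, under stated assumptions: `statistics.multimode` (Python 3.8+) is used for an individual series, the grouped data are made up for illustration, and the grouped function raises an error when the modal class is the first or last class, since the interpolation formula needs a class on each side.

```python
import statistics


def mode_grouped(lower_limits, freqs, class_interval):
    """Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i for a continuous series."""
    k = freqs.index(max(freqs))  # index of the (first) modal class
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("modal class is at the extreme; f0 or f2 is missing")
    f1, f0, f2 = freqs[k], freqs[k - 1], freqs[k + 1]
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * class_interval


# Individual series: every value that shares the highest count is a mode.
scores = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
modes = statistics.multimode(scores)

# Illustrative classes 0-10, 10-20, ..., 40-50 (assumed data).
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 12, 8, 2]
grouped_mode = mode_grouped(lower_limits, freqs, class_interval=10)

print(modes, grouped_mode)
```

The individual series here has two modes, which illustrates why a bimodal data set does not give a single representative value.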
15. MERITS AND DEMERITS OF MODE
Merits
o Not affected by extreme values
o Applicable for quantitative and qualitative data.
o Can be obtained in continuous series without assuming
the mid point.
Demerits
o Limited utility compared to mean and median
o Mode cannot be determined if the modal class is at the
extreme.
o Difficult to compute in case of a bimodal distribution
o Possibility of a ‘no mode’ distribution
16. GENERAL LIMITATION OF AN AVERAGE
Since an average is a single value representing a
group of values, it must be properly interpreted;
otherwise, there is every possibility of jumping to
a wrong conclusion.
An average may give us a value that does not exist in the
data.
Sometimes an average may give an absurd result.
Measures of central value fail to give us any idea
about the formation of the series. Two or more
series may have the same central value but may
differ widely in composition.
19. TODAY’S QUESTION
Two classes took a recent quiz. There were 10
students in each class, and their scores are as
follows:
Class A 72 76 80 80 81 83 84 85 85 89
Class B 57 65 83 94 95 96 98 93 71 63
Each class had an average score of 81.5.
Since the averages are the same, can we assume
that the students in both classes all did pretty
much the same on the quiz?
The answer is… No.
The average (mean) does not tell us anything about
the distribution or variation in the scores.
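The point can be checked directly with Python's standard `statistics` module. This is only a sketch; the population standard deviation (`pstdev`) is used here on the assumption that the ten scores are the whole class rather than a sample.

```python
import statistics

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    mean = statistics.mean(scores)
    spread = max(scores) - min(scores)   # range
    sd = statistics.pstdev(scores)       # population standard deviation
    print(f"{name}: mean={mean}, range={spread}, sd={sd:.2f}")
```

Both classes have the same mean, but the range and standard deviation of Class B are several times those of Class A, so the scores in Class B are far more scattered.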
21. TODAY’S QUESTION
So, we need some way of measuring not just the
average but also the spread of the distribution
of our data, i.e. its variation or dispersion.
Variation/dispersion describes how spread out
the scores are around the mean.
If many observations are “bunched up” around the
mean, the distribution is narrowly spread;
otherwise it is widely spread.
The more narrowly spread the distribution, the better
your ability to make accurate predictions.
22. MEASURE OF VARIATION
A measure of variation/dispersion is designed to
state the extent to which the individual
observations differ from the mean.
The measure of variation gives the degree of
variation, i.e. the amount of variation.
23. SIGNIFICANCE OF MEASURING
VARIATION
To determine the reliability of an average
To compare two or more series with regard to their
variability
To facilitate the use of other statistical measures
24. HOW CAN WE QUANTIFY DISPERSION?
The mean deviation
The standard deviation
25. COEFFICIENT OF VARIATION (CV)
The other measures of variation quantify the
variation/deviation in the units of the data. The CV expresses
the degree of variation relative to the mean, as a unit-free ratio.
CV is a measure of relative variability used to:
o measure changes that have occurred in a population over
time
o compare the variability of two populations that are expressed
in different units of measurement
Properties of CV
o It is expressed as a fraction (or percentage) rather than in terms of the units of
the particular data.
o For data with a positive mean it is never negative. It has no fixed upper
limit and exceeds 1 (100%) when the standard deviation is larger than the mean.
o If CV is near 0, the degree of variation is low; the larger the CV,
the higher the degree of variation.
26. RANGE
Range is defined as the difference between the value of the largest
observation (L) and the smallest observation (S) in the distribution.
Range = L - S
Coefficient of Range = $\frac{L - S}{L + S}$
Useful for: daily temperature fluctuations or share price
movements
It is considered primitive as it considers only the extreme
values, which may not be useful indicators of the bulk of the
population.
An outlier is an extremely high or an extremely low
data value when compared with the rest of the data
values.
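As a brief sketch, the range and its coefficient can be computed as below. The function names are assumptions for the example, and the two score lists are the quiz scores of Class A and Class B from the earlier slide.

```python
def value_range(values):
    """Range = L - S (largest minus smallest observation)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of Range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

print(value_range(class_a), coefficient_of_range(class_a))
print(value_range(class_b), coefficient_of_range(class_b))
```

Because only the two extreme values enter the calculation, a single outlier changes both results regardless of how the remaining observations are distributed.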
27. MERITS AND DEMERITS OF RANGE
Merits
o Simple to understand and easy to compute
o Less time consuming
Demerits
o Not based on each and every observation of the
distribution
o Cannot be calculated in case of an open-ended distribution
o Fails to reveal the character of the distribution
28. INTERQUARTILE RANGE OR QUARTILE
DEVIATION
Measures the range of the middle 50% of the values
only
Is defined as the difference between the upper and
lower quartiles
Interquartile range = Q3-Q1
Quartile Deviation: $Q.D. = \frac{Q_3 - Q_1}{2}$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
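A minimal sketch of these three quantities using `statistics.quantiles` (Python 3.8+). Note the assumption: textbooks and libraries use slightly different conventions for locating quartiles, so the default method of `statistics.quantiles` may not reproduce a hand calculation exactly. The scores are the Class A quiz scores from the earlier slide.

```python
import statistics


def quartile_measures(values):
    """Return (IQR, quartile deviation, coefficient of Q.D.)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    qd = iqr / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return iqr, qd, coefficient


class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
iqr, qd, coefficient = quartile_measures(class_a)
print(iqr, qd, coefficient)
```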
29. MERITS AND DEMERITS OF QD
Merits
o Superior to the range
o Can be calculated for open end classes also
o Not affected by the presence of extreme values
Demerits
o Considers only 50% of the observations
o Not capable of mathematical manipulation
o Does not show the scatter around an average
30. AVERAGE DEVIATION
Average deviation is obtained by calculating the
absolute deviations of each observation from mean
or median and then averaging these deviations by
taking their arithmetic mean.
Measures the ‘average’ distance of each
observation away from the mean of the data
Gives an equal weight to each observation
Generally more sensitive than the range or
interquartile range, since a change in any value will
affect it
31. CALCULATION OF AVERAGE DEVIATION
For ungrouped series
$AD = \frac{\sum |d|}{N}$
For grouped series
$AD = \frac{\sum f|d|}{N}$
Coefficient of Average Deviation
$\text{Coefficient of AD} = \frac{AD}{\bar{X}}$
Where
o AD = Average Deviation
o $d = X - \bar{X}$
o $\bar{X}$ = Mean
o f = Frequency of observation
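The average deviation about the mean can be sketched as follows. The function names and the grouped data are illustrative assumptions; the ungrouped example reuses the Class A and Class B quiz scores from the earlier slide.

```python
def average_deviation(values):
    """AD = sum(|X - mean|) / N for an ungrouped series."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def average_deviation_grouped(midpoints, freqs):
    """AD = sum(f * |X - mean|) / N for a grouped series, N = sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


def coefficient_of_ad(values):
    """Coefficient of AD = AD / mean."""
    return average_deviation(values) / (sum(values) / len(values))


class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]
print(average_deviation(class_a), average_deviation(class_b))

# Illustrative grouped data (assumed): mid-points and frequencies.
print(average_deviation_grouped([5, 15, 25, 35, 45], [2, 3, 5, 3, 2]))
```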
32. MERITS AND DEMERITS OF AD
Merits
o Relatively simple to calculate.
o Based on each and every observation of the data
o Less affected by the values of extreme observations
o Since deviations are taken from a central value, comparisons
of the formation of different distributions can easily be
made.
Demerits
o Algebraic signs are ignored
o May not give an accurate result
33. STANDARD DEVIATION
Most popular measure of variation.
It was introduced by Karl Pearson in 1893.
It is the square root of the mean of the squared
deviations from the arithmetic mean.
Measures the variation of observations from the
mean
Works with squares of residuals, not absolute
values
If the Standard Deviation is large, it means the
observations are spread out from their mean.
If the Standard Deviation is small, it means the
observations are close to their mean.
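34. CALCULATION OF STANDARD DEVIATION
For ungrouped series
$\sigma = \sqrt{\frac{\sum d^2}{N}}$
For grouped series
o Direct Method
$\sigma = \sqrt{\frac{\sum fd^2}{N}}$
o Short cut method (with d measured from an assumed mean A instead of $\bar{X}$)
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
Coefficient of Variation
$CV = \frac{\sigma}{\bar{X}} \times 100$
Where
o $\sigma$ = Standard Deviation
o $d = X - \bar{X}$
o $\bar{X}$ = Mean
o f = Frequency of observation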
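The standard deviation formulas and the coefficient of variation might be sketched like this. The function names and grouped data are assumptions for illustration, and the short cut method is written with deviations taken from an assumed mean A, which is the assumption under which the second term of its formula is non-zero.

```python
import math


def sd_ungrouped(values):
    """sigma = sqrt(sum(d^2) / N), d = X - mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_grouped_direct(midpoints, freqs):
    """sigma = sqrt(sum(f * d^2) / N), d = X - mean, N = sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2
                         for x, f in zip(midpoints, freqs)) / n)


def sd_grouped_shortcut(midpoints, freqs, assumed_mean):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), d = X - A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(midpoints, freqs))
    sum_fd2 = sum(f * (x - assumed_mean) ** 2 for x, f in zip(midpoints, freqs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def coefficient_of_variation(values):
    """CV = sigma / mean * 100 (a percentage)."""
    return sd_ungrouped(values) / (sum(values) / len(values)) * 100


# Illustrative grouped data (assumed): mid-points and frequencies.
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]
print(sd_grouped_direct(midpoints, freqs),
      sd_grouped_shortcut(midpoints, freqs, assumed_mean=35))

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]
print(coefficient_of_variation(class_a), coefficient_of_variation(class_b))
```

The direct and short cut results agree for any assumed mean, and the CV lets the two classes be compared on a unit-free scale.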
35. MATHEMATICAL PROPERTIES OF STANDARD
DEVIATION
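Combined Standard Deviation
$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$
where $d_1 = \bar{X}_1 - \bar{X}_{12}$ and $d_2 = \bar{X}_2 - \bar{X}_{12}$
Standard Deviation of the first N natural numbers
$\sigma = \sqrt{\frac{1}{12}(N^2 - 1)}$
The sum of the squares of the deviations of all the
observations from their arithmetic mean is minimum.
Standard Deviation is independent of change of origin but
not of scale.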
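These properties can be checked numerically. The sketch below uses two small made-up groups; the function name is an assumption, and d1 and d2 are taken to be the differences between each group mean and the combined mean.

```python
import math
import statistics


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - mean12
    d2 = mean2 - mean12
    total = n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2
    return math.sqrt(total / (n1 + n2))


# Two hypothetical groups.
group1 = [2, 4, 6]
group2 = [10, 20]

from_summaries = combined_sd(
    len(group1), statistics.mean(group1), statistics.pstdev(group1),
    len(group2), statistics.mean(group2), statistics.pstdev(group2),
)
from_pooled = statistics.pstdev(group1 + group2)
print(from_summaries, from_pooled)  # should agree up to rounding

# SD of the first N natural numbers: sqrt((N^2 - 1) / 12).
n = 10
print(statistics.pstdev(range(1, n + 1)), math.sqrt((n ** 2 - 1) / 12))

# Change of origin leaves the SD unchanged; change of scale multiplies it.
shifted = [x + 100 for x in group1]
scaled = [x * 3 for x in group1]
print(statistics.pstdev(group1), statistics.pstdev(shifted), statistics.pstdev(scaled))
```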
36. MERITS AND DEMERITS OF STANDARD
DEVIATION
Merits
o Based on every item of the distribution
o Possible to calculate the combined standard deviation
o For comparing the variability of two or more distributions,
the coefficient of variation, which is based on the standard
deviation, is considered to be the most appropriate
o It is the measure most prominently used in further statistical
work.
Demerits
o Compared to other measures, it is difficult to compute
o It gives more weight to extreme values and less to those
which are near the mean.
39. DISTRIBUTION OF DATA
Data can be "distributed" (spread out) in different ways:
o spread out more on the left
o spread out more on the right
o all jumbled up
o around a central value with no bias left or right
41. CHARACTERISTICS OF THE NORMAL
DISTRIBUTION
• The normal distribution curve is bell-shaped.
• It is symmetrical about the mean: 50% of the observations are on one
side of the center and the other 50% are on the other
side.
• The curve never touches the X-axis
• The height of the normal curve is at its maximum at the
mean.
• The distribution is single peaked, not bimodal or multi-
modal
• Most of the cases will fall in the center portion of the curve
and as values of the variable become more extreme they
become less frequent, with “outliers” at each of the “tails”
of the distribution few in number.
• The Mean, Median, and Mode are the same.
42. NORMAL DISTRIBUTION & OTHER
TOOLS
Symmetrical distribution and Mean/Median/Mode
o Mode = 3 Median - 2 Mean
Symmetrical distribution and standard deviation
o $\bar{X} \pm 1\sigma$ covers 68.27% of the observations
o $\bar{X} \pm 2\sigma$ covers 95.45% of the observations
o $\bar{X} \pm 3\sigma$ covers 99.73% of the observations
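The three coverage percentages follow from the normal curve itself. As a short check, the share of a normal distribution lying within k standard deviations of the mean equals erf(k / sqrt(2)), which Python's `math.erf` can evaluate (the function name `coverage` is just a label for this example):

```python
import math


def coverage(k):
    """Fraction of a normal distribution within mean +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean +/- {k} sigma covers {coverage(k) * 100:.2f}%")
```

These figures hold for a normal distribution specifically; a distribution that is merely symmetrical need not follow them.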
43. SKEWNESS
• The term skewness refers to lack of symmetry or departure
from symmetry. When a distribution is not symmetrical it is
called a skewed distribution.
• In a symmetrical distribution, the values of mean, median and
mode are alike.
• If the value of the mean is greater than the mode, skewness is said
to be positive. A positively skewed distribution contains some
values that are much larger than the majority of observations.
• If the value of the mode is greater than the mean, skewness is said to
be negative. A negatively skewed distribution contains some
values that are much smaller than the majority of
observations.
• It is important to emphasize that the skewness of a distribution
cannot be determined simply by inspection.
• Point to remember: zero skewness does not mean that the
distribution is a normal distribution! [A normal distribution
has a skewness of zero and a peakedness (kurtosis) of 3.]
44. SKEWNESS
If Mean = Mode, the skewness is zero.
If Mean > Mode, the skewness is positive.
If Mean < Mode, the skewness is negative.
47. MEASURES OF SKEWNESS
Karl Pearson’s Coefficient of Skewness
$Sk_p = \frac{\text{Mean} - \text{Mode}}{\sigma}$
Bowley’s Coefficient of Skewness
$Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
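Both coefficients are simple ratios of summary statistics, as the sketch below shows. The function names and the input numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (Mean - Mode) / sigma."""
    return (mean - mode) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical summary values for one distribution.
print(pearson_skewness(mean=50, mode=45, sd=10))   # mean above mode: positive
print(bowley_skewness(q1=20, median=30, q3=50))    # longer upper half: positive
print(bowley_skewness(q1=20, median=35, q3=50))    # median midway: zero
```

Pearson's coefficient needs a well-defined mode, while Bowley's uses only the quartiles, so it can still be applied to open-ended classes.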
48. COEFFICIENT OF SKEWNESS
Coefficient of skewness measures the degree of
skewness. Bowley’s coefficient always lies between -1 and +1;
Karl Pearson’s coefficient usually falls in this range as well,
although it is not strictly bounded by it.
If the answer is 0, it indicates a symmetrical distribution.
If the answer is negative, then the distribution is
negatively skewed.
o If the answer is close to -1 (say -0.90), then the distribution
is highly negatively skewed.
o If the answer is close to 0 (say -0.20), then the distribution
is slightly negatively skewed.
If the answer is positive, then the distribution is
positively skewed.
o If the answer is close to 1 (say 0.90), then the distribution
is highly positively skewed.
o If the answer is close to 0 (say 0.20), then the distribution
is slightly positively skewed.
49. A PROBLEM
The following data relate to the marks scored by three
different sections in statistics.
Compute the Mean, Median, Mode, Standard
deviation and skewness, and interpret the results.
Marks                 0-10  10-20  20-30  30-40  40-50  50-60  60-70
Number of students
  Sec A                  3      5     11     22     11      5      3
  Sec B                  6     15     20     10      5      3      1
  Sec C                  1      3      5     10     20     15      6
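One way to work this problem is sketched below, applying the grouped-data formulas for the mean (direct method), median, mode, standard deviation and Karl Pearson's coefficient of skewness to each section. The function and variable names are assumptions made for the example, and class mid-points are taken to represent each class.

```python
import math

LOWER_LIMITS = [0, 10, 20, 30, 40, 50, 60]
CLASS_INTERVAL = 10
SECTIONS = {
    "A": [3, 5, 11, 22, 11, 5, 3],
    "B": [6, 15, 20, 10, 5, 3, 1],
    "C": [1, 3, 5, 10, 20, 15, 6],
}


def summarize(freqs, lower_limits=LOWER_LIMITS, i=CLASS_INTERVAL):
    """Mean, median, mode, SD and Pearson skewness for a continuous series."""
    n = sum(freqs)
    mids = [lower + i / 2 for lower in lower_limits]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    # Median: M = L + ((N/2 - cf) / f) * i
    cf = 0
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:
            median = lower + (n / 2 - cf) / f * i
            break
        cf += f

    # Mode: Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i
    k = freqs.index(max(freqs))
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("modal class is at the extreme")
    f1, f0, f2 = freqs[k], freqs[k - 1], freqs[k + 1]
    mode = lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * i

    sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
    skewness = (mean - mode) / sd
    return {"mean": mean, "median": median, "mode": mode,
            "sd": sd, "skewness": skewness}


results = {name: summarize(freqs) for name, freqs in SECTIONS.items()}
for name, r in results.items():
    print(name, {key: round(value, 2) for key, value in r.items()})
```

Under these assumptions, Section A is symmetrical (mean, median and mode coincide), Section B has its mean above its mode and is therefore positively skewed, and Section C mirrors Section B with negative skewness. Sections B and C have the same standard deviation, so the sections differ mainly in center and shape rather than in spread.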