6.describing a distribution

DESCRIBING A DISTRIBUTION
• A good way to describe the distribution of a
quantitative variable is to take the following
three steps:
o Report the center of the distribution. [Measures of
Central Tendency]
o Report any significant deviations from the center.
[Measures of Variation]
o Report the general shape of the distribution.
[Measures of Skewness and Peakedness]

MEASURES OF CENTRAL TENDENCY
CENTER OF DISTRIBUTION

AVERAGE
The central tendency is measured by averages.
These describe the point about which the various
observed values cluster.
In mathematics, an average, or central tendency of
a data set refers to a measure of the "middle" or
"expected" value of the data set.
An average is a single value which is considered
the most representative or typical value for a
given set of data.
Objectives of averaging
o To get one single value that describes the characteristics
of the entire data.
o To facilitate comparison

CHARACTERISTICS OF A GOOD
AVERAGE
Easy to understand
Simple to compute
Based on all observations
Capable of further algebraic treatment
Should not be unduly affected by the presence of
extreme values

TOOLS TO COMPUTE THE AVERAGE
 Mean
 Median
 Mode

MEAN
It is a commonly used measure of central tendency.
The mean is obtained by adding together all
observations and by dividing the total by the number
of observations.
The mean, in most cases, is not an actual data value.

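CALCULATION OF MEAN
For ungrouped series
$\bar{X} = \frac{\sum X}{N}$
For grouped series
o Direct Method
$\bar{X} = \frac{\sum fX}{N}$
o Short cut method
$\bar{X} = A + \frac{\sum fd}{N} \times i$
• Where,
o $\bar{X}$ = Mean
o X = Observation
o N = Number of observations
o f = Frequency of observation
o A = Assumed mean
o i = Class interval
o d = (X - A)/i, the step deviation from the assumed mean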
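The three mean formulas can be checked with a short Python sketch. It assumes the Class A quiz scores and the Section A frequency table that appear later in this deck; the function names are illustrative only, and d is taken as the step deviation (X - A)/i.

```python
# Section A from the worked problem: class midpoints X and frequencies f.
midpoints = [5, 15, 25, 35, 45, 55, 65]
freqs = [3, 5, 11, 22, 11, 5, 3]
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

def mean_ungrouped(values):
    """Ungrouped series: sum(X) / N."""
    return sum(values) / len(values)

def mean_direct(x, f):
    """Grouped series, direct method: sum(f*X) / N."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def mean_shortcut(x, f, assumed_mean, width):
    """Grouped series, short cut method: A + (sum(f*d) / N) * i, d = (X - A) / i."""
    n = sum(f)
    fd = sum(fi * (xi - assumed_mean) / width for xi, fi in zip(x, f))
    return assumed_mean + fd / n * width

print(mean_ungrouped(class_a))                  # 81.5
print(mean_direct(midpoints, freqs))            # 35.0
print(mean_shortcut(midpoints, freqs, 25, 10))  # 35.0
```

Any assumed mean gives the same answer; 25 is an arbitrary choice here.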

MATHEMATICAL PROPERTIES OF MEAN
The algebraic sum of the deviations of all the
observations from the mean is always zero.
The sum of squared deviations of all the observations
from the mean is a minimum, i.e. it is less than the sum
of squared deviations from any other value.
If we have the means and numbers of observations
of two or more related groups, the combined mean is
$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + \ldots}{N_1 + N_2 + \ldots}$
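A small Python sketch can illustrate the three properties. It assumes the Class A quiz scores and the Section B and C means (26 and 44, with 60 students each) that appear later in this deck; the helper names are hypothetical.

```python
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
mean_a = sum(class_a) / len(class_a)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean_a for x in class_a)

# Property 2: squared deviations are smallest when taken from the mean.
def sum_squared_dev(values, centre):
    return sum((x - centre) ** 2 for x in values)

# Property 3: combined mean of several groups from their means and sizes.
def combined_mean(means, sizes):
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

print(deviation_sum)                      # 0.0
print(sum_squared_dev(class_a, mean_a))   # 214.5
print(sum_squared_dev(class_a, 80))       # 237
print(combined_mean([26, 44], [60, 60]))  # 35.0
```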

MERITS AND DEMERITS OF MEAN
Merits
o It possesses the first four out of five characteristics of a
good average.
Demerits
o Mean is unduly affected by the presence of extreme values.
o In continuous series, it is difficult to compute mean without
making assumption of mid point of the class.
o Applicable for only quantitative data.
o Sometimes the mean may not be an observation in the data.

MEDIAN
Median is the measure of central tendency which
appears in the ‘middle’ of an ordered sequence
(either in ascending or descending order) of values.
It divides the whole data into two equal parts. In other
words, 50% of the observations are smaller than the
median and 50% are larger than it.

CALCULATION OF MEDIAN
• Individual Series
M = Size of the $\left(\frac{N+1}{2}\right)$th item
• Discrete Series
M = Size of the $\left(\frac{N}{2}\right)$th item
• Continuous Series
M = Size of the $\left(\frac{N}{2}\right)$th item, located within the median class by
$M = L + \frac{\frac{N}{2} - cf}{f} \times C$
Where,
o M = Median
o N = Number of observations
o L = Lower limit of median class
o cf = Cumulative frequency of the class preceding median class
o f = Frequency of median class
o C = Class interval of median class
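The median rules can be sketched in Python as follows. This is a minimal illustration, assuming the Class A scores and the Section B frequency table from later in this deck; for an even number of raw observations the ((N+1)/2)th item is taken as the average of the two middle items.

```python
def median_raw(values):
    """Middle item of the ordered data; averages the two middle items when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def median_grouped(lower_limits, freqs, width):
    """Continuous series: M = L + (N/2 - cf) / f * C."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= half:  # first class whose cumulative frequency reaches N/2
            return lower + (half - cf) / f * width
        cf += f

lower_limits = [0, 10, 20, 30, 40, 50, 60]
section_b = [6, 15, 20, 10, 5, 3, 1]

print(median_raw([72, 76, 80, 80, 81, 83, 84, 85, 85, 89]))  # 82.0
print(round(median_grouped(lower_limits, section_b, 10), 2))  # 24.5
```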

MERITS AND DEMERITS OF MEDIAN
Merits
o It is especially useful in case of continuous series because
mid point is not used for calculation.
o It is not influenced by presence of extreme values.
o Applicable for quantitative and qualitative data.
Demerits
o Not based on every observation
o Not capable of algebraic treatment
o Tends to be a rather unstable value if the number of
observations is small.

MODE
• Mode is defined as that value which occurs the
maximum number of times i.e. having the maximum
frequency
• A data set can have more than one mode.
• A data set is said to have no mode if all values occur
with equal frequency.

CALCULATION OF MODE
Individual Series
Z = The item that is repeated the greatest number of times
Discrete Series
Z = The item that is repeated the greatest number of times, i.e. the one with the
highest frequency
Continuous Series
$Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
Where,
o Z = Mode
o L = Lower limit of modal class
o f1 = Frequency of modal class
o f0 = Frequency of the class preceding the modal class
o f2 = Frequency of the class succeeding the modal class
o i = Class interval
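A possible Python sketch of the mode rules is shown here. It assumes the Class A scores and the Section B frequency table from later in this deck; `mode_grouped` is a hypothetical helper, and it raises an error when the modal class is the first or last class because the formula needs both neighbouring frequencies.

```python
from statistics import multimode

def mode_grouped(lower_limits, freqs, width):
    """Continuous series: Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    k = freqs.index(max(freqs))  # position of the modal class
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("modal class is at the extreme; f0 or f2 is missing")
    f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

lower_limits = [0, 10, 20, 30, 40, 50, 60]
section_b = [6, 15, 20, 10, 5, 3, 1]

# Individual series: every value that shares the highest frequency.
print(multimode([72, 76, 80, 80, 81, 83, 84, 85, 85, 89]))  # [80, 85]
print(round(mode_grouped(lower_limits, section_b, 10), 2))   # 23.33
```

The Class A scores turn out to be bimodal, which is why `multimode` returns a list.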

MERITS AND DEMERITS OF MODE
Merits
o Not affected by extreme values
o Applicable for quantitative and qualitative data.
o Can be obtained in continuous series without assuming
the mid point.
Demerits
o Limited utility compared to mean and median
o Mode can not be determined if modal class is at the
extreme.
o Difficult to compute in the case of a bimodal
distribution
o Possibility of a ‘no mode’ distribution

GENERAL LIMITATION OF AN AVERAGE
Since an average is a single value representing a
group of values, it must be properly interpreted;
otherwise, there is every possibility of jumping to
a wrong conclusion.
An average may give us a value that does not exist in the
data.
Sometimes an average may give an absurd result.
Measures of central value fail to give us any idea
about the formation of the series. Two or more
series may have the same central value but may
differ widely in composition.


MEASURE OF DISPERSION/VARIATION
DEVIATIONS FROM THE CENTER

TODAY’S QUESTION
Two classes took a recent quiz. There were 10
students in each class, and their scores are as
follows:
Class A 72 76 80 80 81 83 84 85 85 89
Class B 57 65 83 94 95 96 98 93 71 63
Each class had an average score of 81.5.
Since the averages are the same, can we assume
that the students in both classes all did pretty
much the same on the quiz?
The answer is… No.
The average (mean) does not tell us anything about
the distribution or variation in the scores.
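The point can be checked numerically. The sketch below uses Python's standard `statistics` module on the two classes' scores; `pstdev` is the population standard deviation (dividing by N), which is introduced later in this deck.

```python
from statistics import mean, pstdev

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    # Same mean, very different spread.
    print(name, mean(scores), round(pstdev(scores), 2))
# Class A 81.5 4.63
# Class B 81.5 15.1
```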

TODAY’S QUESTION
So, we need to come up with some way of
measuring not just the average, but also the
spread of the distribution of our data. i.e.
variation or dispersion
Variation/dispersion means how spread out the
scores are around the mean.
If many observations are “bunched up” around the
mean, the distribution is narrowly spread;
otherwise it is widely spread.
The more narrowly spread the distribution, the better
your ability to make accurate predictions.

MEASURE OF VARIATION
A measure of variation/dispersion is designed to
state the extent to which the individual
observations differ from the mean.
The measure of variation gives the degree of
variation, i.e. the amount of variation.

SIGNIFICANCE OF MEASURING
VARIATION
To determine the reliability of an average
To compare two or more series with regard to their
variability
To facilitate the use of other statistical measures

HOW CAN WE QUANTIFY DISPERSION?
The mean deviation
The standard deviation

COEFFICIENT OF VARIATION (CV)
All the tools of measurement of variation quantify the
variation/deviation in the units of the data. The CV expresses the
degree of variation relative to the mean: CV = standard deviation / mean,
often multiplied by 100 and quoted as a percentage.
CV is a measure of relative variability that is:
o used to measure changes that have occurred in a population over
time
o used to compare variability of two populations that are expressed
in different units of measurement
o expressed as a fraction rather than in terms of the units of
the particular data
o usually between 0 and 1 for positive data, although it exceeds 1
whenever the standard deviation is larger than the mean
o If CV is near 0, the degree of variation is low; the larger the
CV, the higher the degree of variation.
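As a rough illustration, the CV can be computed as the standard deviation divided by the mean. The sketch assumes the two quiz classes from this deck plus one made-up data set (`lopsided`, hypothetical) in which the standard deviation exceeds the mean, so the ratio is not capped at 1.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV as a fraction: population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]
lopsided = [1, 1, 1, 1, 16]  # hypothetical data: mean 4, standard deviation 6

print(round(coefficient_of_variation(class_a), 3))  # 0.057
print(round(coefficient_of_variation(class_b), 3))  # 0.185
print(coefficient_of_variation(lopsided))           # 1.5
```

Multiplying the result by 100 gives the percentage form.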

RANGE
• Range is defined as the difference between the value of the largest
observation (L) and the smallest observation (S) in the distribution.
• Range = L - S
• Coefficient of Range = $\frac{L - S}{L + S}$
• Useful for: daily temperature fluctuations or share price
movement
• Is considered primitive as it considers only the extreme
values, which may not be useful indicators of the bulk of the
population.
• An outlier is an extremely high or an extremely low
data value when compared with the rest of the data
values.
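A minimal Python sketch of the range and its coefficient, assuming the two quiz classes from this deck (function names are illustrative):

```python
def value_range(values):
    """Range = L - S (largest minus smallest observation)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """(L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

print(value_range(class_a), round(coefficient_of_range(class_a), 3))  # 17 0.106
print(value_range(class_b), round(coefficient_of_range(class_b), 3))  # 41 0.265
```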

MERITS AND DEMERITS OF RANGE
Merits
o Simple to understand and easy to compute
o Less time consuming
Demerits
o Not based on each and every observation of the
distribution
o Can not be calculated in case of open end distribution
o Fails to reveal the character of the distribution

INTERQUARTILE RANGE OR QUARTILE
DEVIATION
Measures the range of the middle 50% of the values
only
Is defined as the difference between the upper and
lower quartiles
Interquartile range = $Q_3 - Q_1$
Quartile Deviation: $Q.D. = \frac{Q_3 - Q_1}{2}$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
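These quartile measures can be sketched with `statistics.quantiles`. Quartile conventions differ between textbooks; this example assumes the module's default "exclusive" method, which places Q1 at the (N+1)/4th item, and uses the Class A quiz scores from this deck.

```python
from statistics import quantiles

def quartile_measures(values):
    """Interquartile range, quartile deviation and coefficient of Q.D."""
    q1, _, q3 = quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return {"Q1": q1, "Q3": q3, "IQR": iqr,
            "QD": iqr / 2, "coefficient": iqr / (q3 + q1)}

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
result = quartile_measures(class_a)
print(result["Q1"], result["Q3"], result["IQR"], result["QD"])  # 79.0 85.0 6.0 3.0
print(round(result["coefficient"], 4))                          # 0.0366
```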

MERITS AND DEMERITS OF QD
Merits
o Superior to the range
o Can be calculated for open end classes also
o Not affected by the presence of extreme values
Demerits
o Considers only 50% of the observations
o Not capable of mathematical manipulation
o Does not show the scatter around an average

AVERAGE DEVIATION
Average deviation is obtained by calculating the
absolute deviations of each observation from mean
or median and then averaging these deviations by
taking their arithmetic mean.
Measures the ‘average’ distance of each
observation away from the mean of the data
Gives an equal weight to each observation
Generally more sensitive than the range or
interquartile range, since a change in any value will
affect it

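• For ungrouped series
$AD = \frac{\sum |d|}{N}$
• For grouped series
$AD = \frac{\sum f|d|}{N}$
• Coefficient of Average Deviation
$\text{Coefficient of AD} = \frac{AD}{\bar{X}}$
• Where
o AD = Average Deviation
o $d = X - \bar{X}$
o $\bar{X}$ = Mean
o f = Frequency of observation
o N = Number of observations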
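The average deviation formulas can be sketched in Python as follows, taking deviations from the mean. The example assumes the Class A scores and the Section A frequency table from this deck; the function names are hypothetical.

```python
def average_deviation(values):
    """Ungrouped series: sum(|X - mean|) / N."""
    centre = sum(values) / len(values)
    return sum(abs(x - centre) for x in values) / len(values)

def average_deviation_grouped(midpoints, freqs):
    """Grouped series: sum(f * |X - mean|) / N."""
    n = sum(freqs)
    centre = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - centre) for x, f in zip(midpoints, freqs)) / n

class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
midpoints = [5, 15, 25, 35, 45, 55, 65]
section_a = [3, 5, 11, 22, 11, 5, 3]

ad = average_deviation(class_a)
print(ad)                                           # 3.7
print(round(ad / 81.5, 3))                          # coefficient of AD: 0.045
print(average_deviation_grouped(midpoints, section_a))  # 10.0
```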

MERITS AND DEMERITS OF AD
Merits
o Relatively simple to calculate.
o Based on each and every observation of the data
o Less affected by the values of extreme observations
o Since deviations are taken from central value, comparison
about formation of different distributions can easily be
made.
Demerits
o Algebraic signs are ignored
o May not give accurate result

STANDARD DEVIATION
Most popular tool for measuring variation.
It was introduced by Karl Pearson in 1893.
It is the square root of the mean of the squared
deviations from the arithmetic mean.
Measures the variation of observations from the
mean
Works with squares of residuals, not absolute
values
If the Standard Deviation is large, it means the
observations are spread out from their mean.
If the Standard Deviation is small, it means the
observations are close to their mean.

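• For ungrouped series
$\sigma = \sqrt{\frac{\sum d^2}{N}}$
• For grouped series
o Direct Method
$\sigma = \sqrt{\frac{\sum fd^2}{N}}$
o Short cut method
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
• Coefficient of Variation
$CV = \frac{\sigma}{\bar{X}} \times 100$
• Where
o $\sigma$ = Standard Deviation
o $d = X - \bar{X}$ (in the short cut method, d = X - A, the deviation from an assumed mean A)
o $\bar{X}$ = Mean
o f = Frequency of observation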
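The direct and short cut formulas should agree, which a short Python sketch can confirm. It assumes the Section A frequency table from the worked problem in this deck; the assumed mean of 25 is an arbitrary choice and the function names are illustrative.

```python
from math import sqrt

def sd_direct(x, f):
    """Direct method: sqrt(sum(f*d^2) / N) with d = X - mean."""
    n = sum(f)
    centre = sum(fi * xi for xi, fi in zip(x, f)) / n
    return sqrt(sum(fi * (xi - centre) ** 2 for xi, fi in zip(x, f)) / n)

def sd_shortcut(x, f, assumed_mean):
    """Short cut method: sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2) with d = X - A."""
    n = sum(f)
    fd = sum(fi * (xi - assumed_mean) for xi, fi in zip(x, f))
    fd2 = sum(fi * (xi - assumed_mean) ** 2 for xi, fi in zip(x, f))
    return sqrt(fd2 / n - (fd / n) ** 2)

midpoints = [5, 15, 25, 35, 45, 55, 65]
section_a = [3, 5, 11, 22, 11, 5, 3]

sigma = sd_direct(midpoints, section_a)
print(round(sigma, 2))                                  # 13.9
print(round(sd_shortcut(midpoints, section_a, 25), 2))  # 13.9
print(round(sigma / 35 * 100, 2))                       # CV in percent: 39.73
```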

MATHEMATICAL PROPERTIES OF STANDARD
DEVIATION
• Combined Standard Deviation
$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$
where $d_1 = \bar{X}_1 - \bar{X}_{12}$ and $d_2 = \bar{X}_2 - \bar{X}_{12}$
• Standard Deviation of the first N natural numbers
$\sigma = \sqrt{\frac{1}{12}(N^2 - 1)}$
• The sum of the squares of the deviations of all the
observations from their arithmetic mean is minimum.
• Standard Deviation is independent of change of origin but
not of scale.
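Two of these properties can be checked with a small Python sketch. It assumes the Section B and Section C figures from the worked problem in this deck (60 students each, means 26 and 44, sum of f·d² = 11140 for both); `combined_sd` is a hypothetical helper that generalises the two-group formula to any number of groups.

```python
from math import sqrt
from statistics import pstdev

def combined_sd(sizes, means, sds):
    """sqrt(sum(N_k * (sd_k^2 + d_k^2)) / sum(N_k)), d_k = mean_k - combined mean."""
    n = sum(sizes)
    grand_mean = sum(nk * mk for nk, mk in zip(sizes, means)) / n
    total = sum(nk * (sk ** 2 + (mk - grand_mean) ** 2)
                for nk, mk, sk in zip(sizes, means, sds))
    return sqrt(total / n)

def sd_natural_numbers(n):
    """Standard deviation of 1, 2, ..., N: sqrt((N^2 - 1) / 12)."""
    return sqrt((n ** 2 - 1) / 12)

sd_each = sqrt(11140 / 60)  # Sections B and C share this standard deviation
print(round(combined_sd([60, 60], [26, 44], [sd_each, sd_each]), 2))  # 16.33
print(round(sd_natural_numbers(10), 4))                               # 2.8723
print(round(pstdev(list(range(1, 11))), 4))                           # 2.8723
```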

MERITS AND DEMERITS OF STANDARD
DEVIATION
Merits
o Based on every item of the distribution
o Possible to calculate the combined standard deviation
o For comparing the variability of two or more distributions, the
coefficient of variation is considered to be most
appropriate
o It is the measure most prominently used in further statistical
work.
Demerits
o Compared with other measures, it is difficult to compute
o It gives more weight to extreme values and less to those
which are near the mean.


MEASURES OF SKEWNESS AND PEAKEDNESS
SHAPE OF THE DISTRIBUTION

DISTRIBUTION OF DATA
• Data can be "distributed" (spread out) in different ways:
o spread out more on the left
o spread out more on the right
o all jumbled up
o around a central value with no bias left or right

NORMAL DISTRIBUTION CURVE [BELL SHAPED
CURVE]

CHARACTERISTICS OF THE NORMAL
DISTRIBUTION
• The normal distribution curve is bell-shaped.
• It is symmetrical about the mean: 50% of the observations are on one
side of the center and the other 50% are on the other
side.
• The curve never touches the X-axis
• The height of the normal curve is at its maximum at the
mean.
• The distribution is single peaked, not bimodal or multi-
modal
• Most of the cases will fall in the center portion of the curve
and as values of the variable become more extreme they
become less frequent, with “outliers” at each of the “tails”
of the distribution few in number.
• The Mean, Median, and Mode are the same.

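NORMAL DISTRIBUTION & OTHER
TOOLS
• Symmetrical distribution and Mean/Median/Mode
o Mode = 3 Median - 2 Mean
• Symmetrical distribution and standard deviation
o $\bar{X} \pm 1\sigma$ covers 68.27% of observations
o $\bar{X} \pm 2\sigma$ covers 95.45% of observations
o $\bar{X} \pm 3\sigma$ covers 99.73% of observations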
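For a normal distribution, the share of observations within k standard deviations of the mean equals erf(k/√2). A short Python sketch using only the standard `math` module reproduces the familiar percentages; `normal_coverage` is an illustrative name.

```python
from math import erf, sqrt

def normal_coverage(k):
    """Proportion of a normal distribution lying within mean ± k standard deviations."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```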

SKEWNESS
• The term skewness refers to lack of symmetry or departure
from symmetry. When a distribution is not symmetrical it is
called a skewed distribution.
• In a symmetrical distribution, the values of mean, median and
mode are alike.
• If the value of the mean is greater than the mode, skewness is said
to be positive. A positively skewed distribution contains some
values that are much larger than the majority of observations.
• If the value of the mode is greater than the mean, skewness is said to
be negative. A negatively skewed distribution contains some
values that are much smaller than the majority of
observations.
• It is important to emphasize that skewness of a distribution
cannot be determined simply by inspection.
• Point to remember: zero skewness does not mean that the
distribution is a normal distribution! [A normal distribution
should have skewness of zero and peakedness (kurtosis) of 3.]

SKEWNESS
If Mean = Mode, the skewness is zero.
If Mean > Mode, the skewness is positive.
If Mean < Mode, the skewness is negative.

MEASURES OF SKEWNESS
Karl Pearson’s Coefficient of
Skewness
$Sk_p = \frac{\text{Mean} - \text{Mode}}{\sigma}$
Bowley’s Coefficient of
skewness
$Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
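Both coefficients are simple to compute once the summary figures are known. The sketch below is illustrative only: it applies Karl Pearson's formula to the Section A summary values from the worked problem and Bowley's formula to the Class B quiz scores, with quartiles taken from `statistics.quantiles` under its default "exclusive" method (an assumption, since quartile conventions vary).

```python
from statistics import quantiles

def pearson_skewness(mean, mode, sd):
    """Karl Pearson: (Mean - Mode) / standard deviation."""
    return (mean - mode) / sd

def bowley_skewness(values):
    """Bowley: (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(values, n=4)  # q2 is the median
    return (q3 + q1 - 2 * q2) / (q3 - q1)

class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

print(pearson_skewness(35, 35, 13.9))      # 0.0
print(round(bowley_skewness(class_b), 2))  # -0.53
```

The negative Bowley value reflects Class B's long tail of low scores.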

COEFFICIENT OF SKEWNESS
Coefficient of skewness measures the degree of
skewness and usually lies between -1 and +1 (Bowley’s
coefficient always does).
If the answer is 0, it indicates a symmetrical distribution.
If the answer is negative, then the distribution is
negatively skewed.
o If the answer is close to -1 (say -0.90), then the distribution
is highly negatively skewed.
o If the answer is close to 0 (say -0.20), then the distribution
is slightly negatively skewed.
If the answer is positive, then the distribution is
positively skewed.
o If the answer is close to 1 (say 0.90), then the distribution
is highly positively skewed.
o If the answer is close to 0 (say 0.20), then the distribution
is slightly positively skewed.

A PROBLEM
• The following data relate to the marks scored by three
different sections in statistics.
• Compute the Mean, Median, Mode, Standard
deviation and skewness, and interpret the results.
Number of students
Marks   0-10  10-20  20-30  30-40  40-50  50-60  60-70
Sec A      3      5     11     22     11      5      3
Sec B      6     15     20     10      5      3      1
Sec C      1      3      5     10     20     15      6

SECTION A
Marks  X   f   cf  fX    d    fd    fd2
0-10   5   3   3   15    -30  -90   2700
10-20  15  5   8   75    -20  -100  2000
20-30  25  11  19  275   -10  -110  1100
30-40  35  22  41  770   0    0     0
40-50  45  11  52  495   10   110   1100
50-60  55  5   57  275   20   100   2000
60-70  65  3   60  195   30   90    2700
Total      60      2100  0    0     11600
MEAN 35
MEDIAN 35
MODE 35
SD 13.9
SKEWNESS 0

SECTION B
Marks  X   f   cf  fX    d    fd    fd2
0-10   5   6   6   30    -21  -126  2646
10-20  15  15  21  225   -11  -165  1815
20-30  25  20  41  500   -1   -20   20
30-40  35  10  51  350   9    90    810
40-50  45  5   56  225   19   95    1805
50-60  55  3   59  165   29   87    2523
60-70  65  1   60  65    39   39    1521
Total      60      1560  63   0     11140
MEAN 26.00
MEDIAN 24.50
MODE 23.33
SD 13.63
SKEWNESS 0.20 [Karl Pearson: (Mean - Mode)/SD]

SECTION C
Marks  X   f   cf  fX    d    fd    fd2
0-10   5   1   1   5     -39  -39   1521
10-20  15  3   4   45    -29  -87   2523
20-30  25  5   9   125   -19  -95   1805
30-40  35  10  19  350   -9   -90   810
40-50  45  20  39  900   1    20    20
50-60  55  15  54  825   11   165   1815
60-70  65  6   60  390   21   126   2646
Total      60      2640  -63  0     11140
MEAN 44.00
MEDIAN 45.50
MODE 46.67
SD 13.63
SKEWNESS -0.20 [Karl Pearson: (Mean - Mode)/SD]
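The whole problem can be reworked in a few lines of Python as a cross-check. This is a sketch under stated assumptions: class midpoints stand in for the marks, the standard deviation divides by N, skewness is Karl Pearson's (Mean - Mode)/SD, and the modal class is assumed not to be the first or last class. The `describe` helper is hypothetical.

```python
from math import sqrt

LOWER = [0, 10, 20, 30, 40, 50, 60]  # lower class limits
WIDTH = 10                           # class interval
SECTIONS = {
    "A": [3, 5, 11, 22, 11, 5, 3],
    "B": [6, 15, 20, 10, 5, 3, 1],
    "C": [1, 3, 5, 10, 20, 15, 6],
}

def describe(freqs):
    n = sum(freqs)
    mids = [lo + WIDTH / 2 for lo in LOWER]
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n)
    # Median: M = L + (N/2 - cf) / f * C
    cf = 0
    for lo, f in zip(LOWER, freqs):
        if cf + f >= n / 2:
            median = lo + (n / 2 - cf) / f * WIDTH
            break
        cf += f
    # Mode: Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i (interior modal class assumed)
    m = freqs.index(max(freqs))
    f0, f1, f2 = freqs[m - 1], freqs[m], freqs[m + 1]
    mode = LOWER[m] + (f1 - f0) / (2 * f1 - f0 - f2) * WIDTH
    stats = {"mean": mean, "median": median, "mode": mode,
             "sd": sd, "skewness": (mean - mode) / sd}
    return {name: round(value, 2) for name, value in stats.items()}

for section, freqs in SECTIONS.items():
    print(section, describe(freqs))
```

Section A comes out symmetrical, Section B positively skewed (mean above mode) and Section C negatively skewed, all with similar spread.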

GRAPHS OF SECTION A, B AND C
