Summary statistics

Introduction
Summary Statistics

Definition
Summary statistics are used to summarize a
set of observations in order to
communicate as much information
about the data as possible. They are part of
descriptive statistics and are used to
summarize or describe a set of
observations.
Rupak Roy

Example
The weights of the population are:
45 kg
57 kg
72 kg
52 kg
What we want here is a summary of the
weight of the population: we can say that the
average weight of the population is
(45 + 57 + 72 + 52) / 4 = 56.5 kg, which
describes the population in the simplest possible way.
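As a quick check of the arithmetic, here is a minimal Python sketch (the variable names are illustrative) that computes the mean both by hand and with the standard library's `statistics.mean`:

```python
from statistics import mean

# Weights (kg) of the four people in the example population
weights = [45, 57, 72, 52]

# Arithmetic mean: sum of the observations divided by their count
manual_mean = sum(weights) / len(weights)

print(manual_mean)    # 56.5
print(mean(weights))  # 56.5, same result from the standard library
```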

Types
Summary statistics
Measures of Central Tendency
1. Mean
2. Median
3. Mode
4. Geometric Mean
Measures of Dispersion
1. Standard Deviation
2. Variance
3. Interquartile Range
Others
1. Coefficient
2. Skewness
3. Kurtosis
4. Probability Distributions
5. Distribution plot

Definition
Measures of central tendency: a single value that describes
the central point around which a group of data clusters. In
simple words, it is a way to describe the center of a data
set. And what is the center of the data? A single number that
summarizes the entire dataset, computed with techniques such as
the mean (average) or the median of the dataset.
Measures of dispersion: "dispersion (also
called variability, scatter, or spread) is the extent to which
a distribution of data is stretched or squeezed."
In the graph, we can see that the
distribution of the data (assume it is a population)
is more stretched on the right side,
ranging from 50 to 80.

Measures of Central Tendency
1. Mean: the average of the observations. Most effective
when the data is not heavily skewed.
2. Median: the middle value of the dataset.
Useful for skewed data.
We will talk about skewed data in the upcoming
slides.
3. Mode: the value that occurs the maximum number of times in the data.
4. Geometric mean: the nth root of the product of n numbers.
It is used when we want the average rate of an
event whose rate is determined by multiplication.
For example, the average yearly growth of an account at
ABC Bank is calculated with the geometric mean, since
each year's growth is applied by multiplying the
account balance by the growth percentage.
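To make the bank example concrete, here is a minimal Python sketch. The three yearly growth rates and the opening balance are hypothetical numbers chosen for illustration, not taken from the slides. Because growth is applied by multiplying the balance by (1 + rate) each year, the geometric mean of those growth factors is the constant yearly rate that reproduces the actual final balance, while the arithmetic mean of the rates does not.

```python
import math

# Hypothetical yearly growth rates for an account at "ABC Bank"
# (illustrative numbers, not taken from the slides): +10%, +20%, -5%
rates = [0.10, 0.20, -0.05]

# Growth is multiplicative: each year the balance is multiplied by (1 + rate)
factors = [1 + r for r in rates]
n = len(factors)

# Average yearly rate = geometric mean of the growth factors, minus 1
geo_rate = math.prod(factors) ** (1 / n) - 1

# Arithmetic mean of the rates, shown for comparison
arith_rate = sum(rates) / n

start = 1000.0                              # hypothetical opening balance
actual_end = start * math.prod(factors)     # balance after applying each year's growth
geo_end = start * (1 + geo_rate) ** n       # matches actual_end (up to rounding)
arith_end = start * (1 + arith_rate) ** n   # overstates the final balance here

print(f"average rate (geometric):  {geo_rate:.4%}")
print(f"average rate (arithmetic): {arith_rate:.4%}")
print(f"final balance: actual {actual_end:.2f}, "
      f"geometric {geo_end:.2f}, arithmetic {arith_end:.2f}")
```

Working with growth factors (1 + rate) rather than raw rates keeps every value positive, which the geometric mean requires.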

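Formula for calculating Geometric Mean
$$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$$
Example: what is the geometric mean of 23, 56 and 66?
$$GM = \sqrt[3]{23 \times 56 \times 66} = \sqrt[3]{85008} \approx 43.9696761$$
which means that 43.9696761 multiplied by itself three times
(43.9696761 × 43.9696761 × 43.9696761) is approximately 85008.
Note:
If any observation is zero, the geometric mean becomes zero,
and it does not work with negative numbers such as -1, -4, -5 and so on.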
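The formula and the note above can be sketched in Python as follows. The function name `geometric_mean` here is a hypothetical helper, not a library call. It averages logarithms instead of multiplying directly, which gives the same nth root while avoiding overflow for long lists of large values; zeros and negatives are handled explicitly as the note describes.

```python
import math

def geometric_mean(values):
    """nth root of the product of n numbers (hypothetical helper, a sketch)."""
    if not values:
        raise ValueError("geometric mean of an empty list is undefined")
    if any(v < 0 for v in values):
        # Negative observations: the real nth root is not defined in general
        raise ValueError("geometric mean does not work with negative numbers")
    if any(v == 0 for v in values):
        # A single zero makes the whole product, and so its nth root, zero
        return 0.0
    # exp(mean of logs) equals the nth root of the product
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [23, 56, 66]
print(math.prod(data))             # 85008
print(math.prod(data) ** (1 / 3))  # direct cube root, about 43.9697
print(geometric_mean(data))        # same value via logarithms
```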

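Calculation of Mode
For ungrouped data: Mode = the value that occurs the maximum number of times.
Example: 23, 45, 76, 33, 54, 33, 76, 33. Therefore, Mode = 33 (it occurs three times).
For grouped data:
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$
where $L$ is the lower limit of the modal class (the class with the highest frequency), $i$ is the class width,
$f_1$ is the frequency of the modal class, $f_0$ the frequency of the class before it, $f_2$ the frequency of the class after it,
$\Delta_1 = f_1 - f_0$
and $\Delta_2 = f_1 - f_2$.
Nowadays we don't have to worry about the calculation: statistical software such as R
or Excel computes it automatically, even for large amounts of data.
For more in-depth information, you can visit this website: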
https://www.mathsisfun.com/data/frequency-grouped-mean-median-mode.html
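Here is a minimal Python sketch of both cases. The ungrouped part uses the standard library's `statistics.mode` and `statistics.multimode` on the slide's data. The grouped part applies the standard formula Mode = L + Δ1 / (Δ1 + Δ2) × i with Δ1 = f1 - f0 and Δ2 = f1 - f2; `grouped_mode` is a hypothetical helper, the frequency table is made up for illustration, and it assumes equal-width classes and treats a missing neighbor class (at either end of the table) as frequency 0.

```python
from statistics import mode, multimode

# Ungrouped data from the slide: the mode is the most frequent value
data = [23, 45, 76, 33, 54, 33, 76, 33]
print(mode(data))       # 33 (appears three times)
print(multimode(data))  # [33]; lists every value tied for most frequent

def grouped_mode(lower_limits, freqs, width):
    """Mode = L + delta1 / (delta1 + delta2) * width, for equal-width classes."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # modal class (first if tied)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                  # class before the modal class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0     # class after the modal class
    delta1, delta2 = f1 - f0, f1 - f2
    if delta1 + delta2 == 0:
        raise ValueError("formula undefined: neighboring classes equal the modal class")
    return lower_limits[k] + delta1 / (delta1 + delta2) * width

# Hypothetical frequency table (classes 0-10, 10-20, ..., 40-50), for illustration only
print(grouped_mode([0, 10, 20, 30, 40], [5, 8, 12, 7, 3], 10))  # about 24.44
```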

Measures of dispersion
Standard deviation: a measure of how near to or far from the mean
the observations are, on average.
Variance: the average squared distance of the observations from the mean
(the square of the standard deviation), i.e. how different or divergent the values are.
A value of zero means that there is no variability: all the
values in the data set are the same.
Interquartile range: a measure of variability obtained
by dividing a rank-ordered data set into four parts called quartiles,
where
Q1 is the middle value of the first half of the data set,
Q2 is the median value, and
Q3 is the middle value of the second half of the
rank-ordered data set.
Therefore, interquartile range = Q3 - Q1.
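A minimal Python sketch of these three measures, reusing the weights from the earlier example. `statistics.pvariance`/`pstdev` divide by n (population) while `variance`/`stdev` divide by n - 1 (sample). `iqr` is a hypothetical helper implementing the "middle value of each half" rule above, excluding the median from both halves when n is odd; other software may use different quartile conventions (Excel's QUARTILE.INC, for instance, interpolates), so results can differ slightly.

```python
from statistics import median, pstdev, pvariance, stdev, variance

weights = [45, 57, 72, 52]

# Population versions divide by n; sample versions divide by n - 1
print(pvariance(weights), pstdev(weights))  # 98.25 and its square root
print(variance(weights), stdev(weights))    # 131 and its square root

def iqr(values):
    """Q3 - Q1 using the 'middle value of each half' rule (needs at least 2 values)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]           # first half (median excluded when n is odd)
    upper = s[len(s) - half:]  # second half
    return median(upper) - median(lower)

print(iqr(weights))  # 64.5 - 48.5 = 16.0
```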

Skewness: refers to the lack of symmetry, or imbalance, in a data
distribution.
In a symmetric, single-peaked distribution such as
the normal distribution, the mean,
median and mode are at the same point.
However, real-life data is rarely perfectly
symmetric; data whose distribution is not symmetric is called skewed.
If the left side has the longer tail, the mass
of the distribution is concentrated on the right
side; this is known as negative skew.

If the right side has the longer tail, the
mass of the distribution is
concentrated on the left side; this is
known as positive skew.
The figure below summarizes all the types of skewness.
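The slides describe the direction of skew through the tails. One simple numeric check, which is not from the slides and is named here as a substitute technique, is Pearson's second skewness coefficient, 3 × (mean - median) / standard deviation: a long tail pulls the mean toward it while the median stays put, so the sign suggests the direction of skew. This is a minimal sketch and only a rough indicator; the function names are hypothetical.

```python
from statistics import mean, median, stdev

def pearson_skew(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample std dev."""
    return 3 * (mean(values) - median(values)) / stdev(values)

def skew_direction(values):
    s = pearson_skew(values)
    if s > 0:
        return "positive (longer right tail)"
    if s < 0:
        return "negative (longer left tail)"
    return "approximately symmetric"

print(skew_direction([10, 40, 35, 33, 35]))  # one low value drags the mean below the median
print(skew_direction([1, 1, 1, 10]))         # one high value drags the mean above the median
print(skew_direction([1, 2, 3, 4, 5]))       # mean equals median
```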

Example (skewed data)
Temp (°C)
10
40
35
33
35
Mean = 153/5 = 30.6. Using the mean here is misleading:
four of the five values lie above 30.6, and the single
low value of 10 pulls the mean down.
So we should use the median. For ungrouped
data, the median is the ((n+1)/2)th value of the sorted data.
Here that is the ((5+1)/2)th = 6/2 = 3rd value
of the sorted list 10, 33, 35, 35, 40, i.e. 35.
For grouped data:
$$\text{Median} = L + \frac{n/2 - B}{G} \times w$$
where L is the lower class boundary of the group containing the median,
n is the total number of values,
B is the cumulative frequency of the groups before the median group,
G is the frequency of the median group, and
w is the width of the group.
Again, we don't have to worry about the calculation: statistical software such as R
or Excel computes it automatically, even for large amounts of data.
For more in-depth information, you can visit this website:
https://www.mathsisfun.com/data/frequency-grouped-mean-median-mode.html
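A minimal Python sketch of both median calculations. The ungrouped part uses the temperature data from the example with the standard library's `statistics.mean` and `statistics.median`. The grouped part applies Median = L + (n/2 - B) / G × w; `grouped_median` is a hypothetical helper, it assumes equal-width groups whose lower boundaries are given directly, and the frequency table is made up for illustration.

```python
from statistics import mean, median

temps = [10, 40, 35, 33, 35]
print(mean(temps))    # 30.6, pulled down by the single low value 10
print(median(temps))  # 35, the 3rd value of the sorted list [10, 33, 35, 35, 40]

def grouped_median(lower_bounds, freqs, width):
    """Median = L + (n/2 - B) / G * w for equal-width groups."""
    n = sum(freqs)
    cumulative = 0  # B: total frequency of the groups before the current one
    for lower, g in zip(lower_bounds, freqs):
        if g > 0 and cumulative + g >= n / 2:  # this group contains the median
            return lower + (n / 2 - cumulative) / g * width
        cumulative += g
    raise ValueError("frequency table has no observations")

# Hypothetical frequency table (groups 0-10, 10-20, ..., 40-50), for illustration only
print(grouped_median([0, 10, 20, 30, 40], [5, 8, 12, 7, 3], 10))  # 23.75
```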

Kurtosis: a measure of whether the data are peaked or flat relative to
a normal distribution. The signs below refer to excess kurtosis (kurtosis minus 3),
so the normal distribution scores 0.
(+) Leptokurtic
The distribution is more clustered near the mean (a sharper peak) and has
heavier tails than the normal distribution.
(-) Platykurtic
The distribution is less clustered around the mean (a flatter peak) and has
lighter tails than the normal distribution.
(0) Mesokurtic
Mesokurtic is measured with respect to the normal distribution: it has tails
similar to the normal distribution, i.e. neither heavy nor light, and is
considered the baseline for the other two.
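A minimal Python sketch of the idea, using the moment-based definition of excess kurtosis, m4 / m2² - 3, where m2 and m4 are the second and fourth central moments. Both helper names are hypothetical, the two data lists are illustrative, and `tol` is a hypothetical "close enough to 0" band (a real sample will almost never give exactly 0). Excel's KURT applies an additional small-sample correction, so its values differ somewhat from this one.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("all values are equal; kurtosis is undefined")
    return m4 / m2 ** 2 - 3

def kurtosis_type(k, tol=0.0):
    """Label by the sign of excess kurtosis; tol is a hypothetical 'near 0' band."""
    if k > tol:
        return "leptokurtic (+)"
    if k < -tol:
        return "platykurtic (-)"
    return "mesokurtic (0)"

flat = [1, 2, 3, 4, 5]                      # evenly spread, no tails
peaked = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]  # mostly at the center, two far values

for data in (flat, peaked):
    k = excess_kurtosis(data)
    print(f"{k:+.2f} -> {kurtosis_type(k)}")
```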

Now, how do we check whether the data is skewed in Excel?
=SKEW(select the range of values/numbers)
=SKEW(10.24,9.48……….-0.42,-0.95)
= -0.27, which means the data is negatively skewed.
And to check the kurtosis in Excel:
=KURT(select the range of values/numbers)
=KURT(10.24,9.48……….-0.42,-0.95)
= -1.6, which means it is platykurtic.
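For readers who want to reproduce these numbers outside Excel, here is a minimal Python sketch, assuming the sample-based formulas given in Microsoft's documentation for SKEW and KURT (both use the sample standard deviation s). The function names are hypothetical. Because the slide's input values are elided, the sketch is checked on small illustrative lists instead.

```python
import math

def _mean_and_sample_sd(values):
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample std dev
    if s == 0:
        raise ValueError("all values are equal; Excel would return #DIV/0!")
    return n, m, s

def excel_skew(values):
    """n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3), as documented for SKEW."""
    if len(values) < 3:
        raise ValueError("SKEW needs at least 3 values")
    n, m, s = _mean_and_sample_sd(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def excel_kurt(values):
    """Excess kurtosis with the small-sample correction documented for KURT."""
    if len(values) < 4:
        raise ValueError("KURT needs at least 4 values")
    n, m, s = _mean_and_sample_sd(values)
    total = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(excel_skew([10, 40, 35, 33, 35]))  # negative: long left tail
print(excel_kurt([1, 2, 3, 4, 5]))       # about -1.2: platykurtic
```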

Recap
What have we learned?
Measures of central tendency,
Measures of dispersion,
Measures of risk.
Next, we will see how to apply this theory in
practice and analyze data using
simple everyday tools such as Excel.