2. varsha Varde 2
Course Coverage
• Essential Basics for Business Executives
• Data Classification & Presentation Tools
• Preliminary Analysis & Interpretation of Data
• Correlation Model
• Regression Model
• Time Series Model
• Forecasting
• Uncertainty and Probability
• Sampling Techniques
• Estimation and Testing of Hypothesis
Preliminary Analysis of Data
Central Tendency of the Data at Hand:
• Need to Size Up the Data At A Glance
• Find A Single Number to Summarize the
Huge Mass of Data Meaningfully:
Average
• Tools: Mode
Median
Arithmetic Mean
Weighted Average
Mode, Median, and Mean
• Mode: Most Frequently Occurring Score
• Median: That Value of the Variable Above
Which Exactly Half of the Observations Lie
• Arithmetic Mean: Ratio of Sum of the
Values of A Variable to the Total Number
of Values
• Mode by Mere Observation, Median
needs Counting, Mean requires
Computation
This Group
This Group of Participants:
Mode of age is years
Median is years,
Arithmetic Mean is years
Arithmetic Mean - Example
Product Return on Investment (%)
A 10
B 30
C 5
D 20
Total 65
Arithmetic Mean - Example
• Arithmetic Mean: 65 / 4 = 16.25 %
• Query: But, Are All Products of Equal
Importance to the Company?
• For Instance, What Are the Sales Volumes
of Each Product? Are They Identical?
• If Not, Arithmetic Mean Can Mislead.
Weighted Average - Example
Product   RoI (%)   Sales (Mn Rs)   Weight   RoI x Weight
A         10        400             0.20     2.00
B         30        200             0.10     3.00
C         5         900             0.45     2.25
D         20        500             0.25     5.00
Total     65        2000            1.00     12.25 (Weighted Average)
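The weighted average in the table can be reproduced with a short sketch. The variable names are illustrative; the figures are the RoI and sales values from the table, and each weight is assumed to be the product's share of total sales.

```python
# RoI (%) and sales (Mn Rs) for products A-D, taken from the table.
roi = {"A": 10, "B": 30, "C": 5, "D": 20}
sales = {"A": 400, "B": 200, "C": 900, "D": 500}

total_sales = sum(sales.values())

# Simple arithmetic mean: every product counts equally.
simple_mean = sum(roi.values()) / len(roi)

# Weighted average: each product's weight is its share of total sales.
weights = {p: sales[p] / total_sales for p in sales}
weighted_mean = sum(roi[p] * weights[p] for p in roi)

print(simple_mean)              # 16.25
print(round(weighted_mean, 2))  # 12.25
```

The gap between 16.25 and 12.25 arises because product C, with the lowest RoI, accounts for the largest share of sales.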
A Comparison
• Mode: Easiest, At A Glance, Crude
• Median: Disregards Magnitude of Obs.,
Only Counts Number of Observations
• Arithmetic Mean: Outliers Vitiate It.
• Weighted Av. Useful for Averaging Ratios
• Symmetrical Distribution: Mode = Median = Mean
• Positively Skewed Distribution: Typically Mode < Median < Mean
• Negatively Skewed Distribution: Typically Mode > Median > Mean
Preliminary Analysis of Data
Measure of Dispersion in the Data:
• ‘Average’ is Insufficient to Summarize
Huge Data Spread over a Wide Range
• Need to Obtain another Number to Know
How Widely the Numbers are Spread
• Tools: Range & Mean Deviation
Variance & Standard Deviation
Coefficient of Variation
Range and Mean Deviation
• Range: Difference Between the Smallest
and the Largest Observation
• Mean Deviation: Arithmetic Mean of the
Absolute Deviations of the Observations
from an Average, Usually the Mean.
Computing Mean Deviation
• Select a Measure of Average, say, Mean.
• Compute the Absolute Difference Between
Each Value of the Variable and the Mean.
• Multiply the Difference by the Concerned
Frequency.
• Sum Up the Products.
• Divide by the Sum of All Frequencies.
• Mean Deviation is the Weighted Average.
Mean Deviation
• Sum of the Products: 318.12
• Sum of All Frequencies: 50
• Mean Deviation: 318.12 / 50 = 6.36
• Let Us Compute With a Simpler Example
Machine Downtime Data in Minutes
per Day for 100 Working Days
Frequency Distribution
Downtime in Minutes No. of Days
00 – 10 20
10 – 20 40
20 – 30 20
30 – 40 10
40 – 50 10
Total 100
Machine Downtime Data in Minutes
per Day for 100 Working Days
Frequency Distribution
Downtime Midpoints No. of Days
05 20
15 40
25 20
35 10
45 10
Total 100
Arithmetic Mean
Downtime Midpoint   No. of Days   Product
05                  20            05 x 20 = 100
15                  40            15 x 40 = 600
25                  20            25 x 20 = 500
35                  10            35 x 10 = 350
45                  10            45 x 10 = 450
Total               100           2000
Arithmetic Mean
• Arithmetic Mean is the Average of the
Observed Downtimes.
• Arithmetic Mean = Total Observed
Downtime / Total Number of Days
• Arithmetic Mean = 2000 / 100 = 20 Minutes
• Average Machine Downtime is 20
Minutes.
Mean Deviation
Downtime Midpoint   No. of Days   Deviation from Mean
05                  20            |05 – 20| = 15
15                  40            |15 – 20| = 05
25                  20            |25 – 20| = 05
35                  10            |35 – 20| = 15
45                  10            |45 – 20| = 25
Total               100
Mean Deviation
Downtime Midpoint   No. of Days   Deviation from Mean   Product
05                  20            |05 – 20| = 15        15 x 20 = 300
15                  40            |15 – 20| = 05        05 x 40 = 200
25                  20            |25 – 20| = 05        05 x 20 = 100
35                  10            |35 – 20| = 15        15 x 10 = 150
45                  10            |45 – 20| = 25        25 x 10 = 250
Total               100                                 1000
Mean Deviation
• Definition: Mean Deviation is the Mean of
the Absolute Deviations (Disregarding the
Negative Sign) of the Observed Values from
the Average.
• In this Example, Mean Deviation is the
Weighted Average (with Frequencies as
Weights) of the Absolute Deviations of the
Observed Downtimes from the Average
Downtime.
• Mean Deviation = 1000 / 100 = 10 Minutes
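A minimal sketch of the same computation, assuming the class midpoints stand in for the individual downtimes (as the slides do). Variable names are illustrative.

```python
# Machine downtime data: class midpoints (minutes) and number of days.
midpoints = [5, 15, 25, 35, 45]
days = [20, 40, 20, 10, 10]

n = sum(days)  # total number of days

# Arithmetic mean, weighting each midpoint by its frequency.
mean = sum(x * f for x, f in zip(midpoints, days)) / n

# Mean deviation: frequency-weighted average of absolute deviations.
mean_deviation = sum(abs(x - mean) * f for x, f in zip(midpoints, days)) / n

print(mean)            # 20.0
print(mean_deviation)  # 10.0
```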
Variance
• Definition: Variance is the average of the
Squares of the Deviations of the Observed
Values from the mean.
Standard Deviation
• Definition: Standard Deviation is, Roughly,
the Typical Amount by which the Values
Differ from the Mean, Ignoring the Sign of
Difference.
• Formula: Positive Square Root of the
Variance.
Variance
Downtime Midpoint   No. of Days   Difference from Mean   Square   Product
05                  20            05 – 20 = -15          225      225 x 20 = 4500
15                  40            15 – 20 = -05          25       25 x 40 = 1000
25                  20            25 – 20 = 05           25       25 x 20 = 500
35                  10            35 – 20 = 15           225      225 x 10 = 2250
45                  10            45 – 20 = 25           625      625 x 10 = 6250
Total               100                                           14500
Variance & Standard Deviation
• Variance = 14500 / 100 = 145 Minutes Squared
• Standard Deviation =
Sq. Root of 145 = 12.04 Minutes
• Exercise: This Group of 65: Compute the
Variance & Standard Deviation of age
Simpler Formula for Variance
• Logical Definition: Variance is the
Average of the Squares of the Deviations
of the Observed Values from the Mean.
• Simpler Formula: Variance is the Mean
of the Squares of Values Minus the
Square of the Mean of Values.
Variance (by Simpler Formula)
Downtime Midpoint   No. of Days   Square   Product
05                  20            25       25 x 20 = 500
15                  40            225      225 x 40 = 9000
25                  20            625      625 x 20 = 12500
35                  10            1225     1225 x 10 = 12250
45                  10            2025     2025 x 10 = 20250
Total               100                    54500
Variance (by Simpler Formula)
• Mean of the Squares of Values
= 54500/100 = 545
• Square of the Mean of Values=20x20=400
• Variance = Mean of Squares of Values
Minus Square of Mean of Values
= 545 – 400 = 145
• Standard Deviation = Sq.Root 145 = 12.04
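Both routes to the variance can be checked side by side. This is a small sketch using the machine downtime midpoints and frequencies from the slides; the variable names are illustrative.

```python
import math

midpoints = [5, 15, 25, 35, 45]
days = [20, 40, 20, 10, 10]
n = sum(days)

mean = sum(x * f for x, f in zip(midpoints, days)) / n

# Logical definition: average of the squared deviations from the mean.
variance_by_definition = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, days)) / n

# Simpler formula: mean of the squares minus the square of the mean.
mean_of_squares = sum(f * x * x for x, f in zip(midpoints, days)) / n
variance_by_shortcut = mean_of_squares - mean ** 2

std_dev = math.sqrt(variance_by_definition)

print(variance_by_definition)  # 145.0
print(variance_by_shortcut)    # 145.0
print(round(std_dev, 2))       # 12.04
```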
Significance of Std. Deviation
In a Normal Frequency Distribution
• About 68 % of Values Lie in the Span of Mean
Plus / Minus One Standard Deviation.
• About 95 % of Values Lie in the Span of Mean
Plus / Minus Two Standard Deviations.
• About 99.7 % of Values Lie in the Span of Mean
Plus / Minus Three Standard Deviations.
Roughly Valid for Marginally Skewed Distributions.
Interpretation from Mean & Std Dev
Machine Downtime Data
• Mean = 20 and Standard Deviation = 12
• Span of One Std. Dev. = 20–12 to 20+12
= 8 to 32: 60% Values
• Span of Two Std. Dev. = 20–24 to 20+24
= -4 to 44: 95% Values
• Span of Three Std. Dev. = 20–36 to 20+36
= -16 to 56: 100% Values
Interpretation from Mean & Std Dev
Sales Orders Data
• Mean = 9.82 & Standard Deviation = 6.36
• Round Off To: Mean 10 and Std. Dev 6
• Span of One Std. Dev. = 10–6 to 10+6 = 4
to 16: 31 Values (62%)
• Span of Two Std. Dev. = 10–12 to 10+12
= -2 to 22: 45 Values (90%)
• Span of Three Std. Dev. = 10–18 to 10+18
= -8 to 28: 47 Values (94%)
Bienaymé-Chebyshev Rule
• For any distribution, the percentage of
observations lying within +/- k standard
deviations of the mean is at least
(1 - 1/k^2) x 100, for k > 1
• For k = 2, at least (1 - 1/4) x 100 = 75% of
observations are contained within 2
standard deviations of the mean
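The bound is easy to tabulate. The helper below is a hypothetical illustration (the function name is not from the slides); it simply evaluates (1 - 1/k^2) x 100 for a given k.

```python
def chebyshev_min_percent(k: float) -> float:
    """Minimum percentage of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("the rule is informative only for k > 1")
    return (1 - 1 / k ** 2) * 100

for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_percent(k), 1))
```

For k = 2 this gives 75%, compared with roughly 95% when the distribution is known to be normal; the rule trades sharpness for generality.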
Coefficient of Variation
• Std. Deviation and Dispersion have Units
of Measurement.
• To Compare Dispersion in Many Sets of
Data (Absenteeism, Production, Profit),
We Must Eliminate Unit of Measurement.
• Otherwise it’s Apple vs. Orange vs. Mango
• Coefficient of Variation is the Ratio of
Standard Deviation to Arithmetic Mean.
• CoV is Free of Unit of Measurement.
Coefficient of Variation
• In Our Machine Downtime Example,
Coefficient of Variation is 12.04 / 20 = 0.6
or 60%
• In Our Sales Orders Example, Coefficient
of Variation is 6.36 / 9.82 = 0.65 or 65%
• The series for which CV is greater is said
to be more variable, or less consistent,
less uniform, less stable or less
homogeneous.
Example
• The Mean and SD of dividends on equity stocks of
TOMCO and Tinplate for the past six years are as
follows:
• Tomco: Mean = 15.42%, SD = 4.01%
• Tinplate: Mean = 13.83%, SD = 3.19%
• CV: Tomco = 4.01 / 15.42 = 26.01%, Tinplate = 3.19 / 13.83 = 23.07%
• Since the CV of the dividend of Tinplate is lower, it
implies that the return on stocks of Tinplate is more
stable
• For an investor seeking stable returns, it is better to
invest in scrips of Tinplate
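The comparison can be sketched as follows. The function name is illustrative; the means and standard deviations are the figures quoted for the two stocks.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Ratio of standard deviation to arithmetic mean, as a percentage."""
    return std_dev / mean * 100

cv_tomco = coefficient_of_variation(4.01, 15.42)
cv_tinplate = coefficient_of_variation(3.19, 13.83)

print(round(cv_tomco, 2), round(cv_tinplate, 2))

# The stock with the lower CV has the more stable dividend.
more_stable = "Tinplate" if cv_tinplate < cv_tomco else "Tomco"
print(more_stable)  # Tinplate
```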
Exercise
• List Ratios Commonly used in Cricket.
• Study Individual Scores of Indian Batsmen
at the Last One Day Cricket Match.
• Are they Nominal, Ordinal or Cardinal
Numbers? Discrete or Continuous?
• Find Median & Arithmetic Mean.
• Compute Range, Mean Deviation,
Variance, Standard Deviation & CoV.
Steps in Constructing a
Frequency Distribution
(Histogram)
1. Determine the number of classes
2. Determine the class width
3. Locate class boundaries
4. Use Tally Marks for Obtaining
Frequencies for each class
Rule of thumb
• Not so few classes that information is lost,
and not so many that the pattern is lost
• The number of classes chosen is usually
between 6 and 15.
• Subject to the above, the number of classes
may be equal to the square root of the
number of data points.
• The more data one has, the larger is the
number of classes.
Rule of thumb
• Every item of data should be included in
one and only one class
• Adjacent classes should have no gap
between them
• Classes should not overlap
• Class intervals should be of the same
width to the extent possible
Illustration
Frequency and relative frequency distributions
(Histograms):
Example
Weight Loss Data
20.5 19.5 15.6 24.1 9.9
15.4 12.7 5.4 17.0 28.6
16.9 7.8 23.3 11.8 18.4
13.4 14.3 19.2 9.2 16.8
8.8 22.1 20.8 12.6 15.9
• Objective: Provide a useful summary of the available
information
Illustration
• Method: Construct a statistical graph called a “histogram” (or frequency distribution)
Weight Loss Data
Class    Class boundaries   Frequency, f   Relative frequency, f/n
1        5.0-9.0            3              3/25 (.12)
2        9.0-13.0           5              5/25 (.20)
3        13.0-17.0          7              7/25 (.28)
4        17.0-21.0          6              6/25 (.24)
5        21.0-25.0          3              3/25 (.12)
6        25.0-29.0          1              1/25 (.04)
Totals                      25             1.00
Let
• k = # of classes
• max = largest measurement
• min = smallest measurement
• n = sample size
• w = class width
Formulas
• k = Square Root of n
• w = (max − min) / k
• Square Root of 25 = 5, but we used k = 6
• w = (28.6 − 5.4) / 6 = 3.87, rounded up to
w = 4.0
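The tally for the weight loss data can be reproduced with a short sketch. It assumes the slide's choices (k = 6 classes starting at 5.0 with width 4.0) and that each class includes its lower boundary but not its upper one, which is what places 17.0 in the 17.0-21.0 class.

```python
import math

data = [20.5, 19.5, 15.6, 24.1, 9.9,
        15.4, 12.7, 5.4, 17.0, 28.6,
        16.9, 7.8, 23.3, 11.8, 18.4,
        13.4, 14.3, 19.2, 9.2, 16.8,
        8.8, 22.1, 20.8, 12.6, 15.9]

n = len(data)
k = 6                                                # number of classes used on the slide
width = math.ceil((max(data) - min(data)) / k)       # 3.87 rounded up to 4
lower = 5.0                                          # first class boundary used on the slide

# Tally: class i covers [lower + i*width, lower + (i+1)*width).
counts = [0] * k
for x in data:
    counts[int((x - lower) // width)] += 1

relative = [c / n for c in counts]

for i, (c, r) in enumerate(zip(counts, relative), start=1):
    lo = lower + (i - 1) * width
    print(i, f"{lo:.1f}-{lo + width:.1f}", c, f"{r:.2f}")
```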
Numerical methods
• Measures of Central Tendency
1. Mean (Arithmetic, Geometric, Harmonic)
2. Median
3. Mode
• Measures of Dispersion (Variability)
1. Range
2. Mean Absolute Deviation (MAD)
3. Variance
4. Standard Deviation
Measures of Central Tendency
• Given a sample of measurements (x1, x2, · · ·, xn) where
n = sample size
xi = value of the ith observation in the sample
• 1. Arithmetic Mean
AM of x = (x1 + x2 + ··· + xn) / n = ∑ xi / n
• 2. Geometric Mean
GM of x = (x1 · x2 · ··· · xn)^(1/n)
• 3. Weighted Average
WM of x = (w1·x1 + w2·x2 + ··· + wn·xn) / (w1 + w2 + ··· + wn) = ∑ wi xi / ∑ wi
Example
• Given a sample of 5 test grades
(90, 95, 80, 60, 75)
Then n = 5; x1 = 90, x2 = 95, x3 = 80, x4 = 60, x5 = 75
• AM of x = (90 + 95 + 80 + 60 + 75) / 5 = 400 / 5 = 80
• GM of x = (90 × 95 × 80 × 60 × 75)^(1/5)
= (3078000000)^(1/5) ≈ 79
• Weighted Average with w1 = 1, w2 = 2, w3 = 2, w4 = 3, w5 = 2:
WM of x = (1 × 90 + 2 × 95 + 2 × 80 + 3 × 60 + 2 × 75) / 10
= 770 / 10 = 77
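The three averages for the test grades can be verified with a brief sketch. It uses only the standard library (`math.prod` needs Python 3.8 or later); the variable names are illustrative.

```python
import math

grades = [90, 95, 80, 60, 75]
weights = [1, 2, 2, 3, 2]
n = len(grades)

# Arithmetic mean: sum of values divided by their number.
arithmetic_mean = sum(grades) / n

# Geometric mean: n-th root of the product of the values.
geometric_mean = math.prod(grades) ** (1 / n)

# Weighted average: sum of weight x value divided by the sum of weights.
weighted_mean = sum(w * x for w, x in zip(weights, grades)) / sum(weights)

print(arithmetic_mean)           # 80.0
print(round(geometric_mean, 1))  # 79.0
print(weighted_mean)             # 77.0
```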
Measures of Central Tendency
• Sample Median
• The median of a sample (data set) is the middle number when the measurements are arranged in ascending order.
• Note:
• If n is odd, the median is the middle number
• If n is even, the median is the average of the middle two numbers.
• Example 1: Sample (9, 2, 7, 11, 14), n = 5
• Step 1: arrange in ascending order
• 2, 7, 9, 11, 14
• Step 2: med = 9.
• Example 2: Sample (9, 2, 7, 11, 6, 14), n = 6
• Step 1: 2, 6, 7, 9, 11, 14
• Step 2: med = (7+9)/2=8
Remarks:
• (i) AM of x is sensitive to extreme values
• (ii) the median is insensitive to extreme values (because the median is a measure of location or position).
• 3. Mode
• The mode is the value of x (observation) that occurs with the greatest frequency.
• Example: Sample: (9, 2, 7, 11, 14, 7, 2, 7), mode = 7
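The median and mode examples, and the remark about extreme values, can be checked with Python's standard `statistics` module. The last part, which replaces 14 by 140, is an illustrative assumption added here to show the effect of an outlier.

```python
import statistics

odd_sample = [9, 2, 7, 11, 14]
even_sample = [9, 2, 7, 11, 6, 14]
mode_sample = [9, 2, 7, 11, 14, 7, 2, 7]

median_odd = statistics.median(odd_sample)    # middle value of the sorted data
median_even = statistics.median(even_sample)  # average of the two middle values
mode_value = statistics.mode(mode_sample)     # most frequent value

print(median_odd, median_even, mode_value)    # 9 8.0 7

# Hypothetical outlier: replace 14 by 140 and compare mean and median.
with_outlier = [9, 2, 7, 11, 140]
mean_before = sum(odd_sample) / len(odd_sample)
mean_after = sum(with_outlier) / len(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)    # 8.6 33.8
print(median_odd, median_after)   # 9 9
```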
Choosing Appropriate
Measure of Location
• If data are symmetric, the mean, median,
and mode will be approximately the same.
• If data are multimodal, report the mean,
median and/or mode for each subgroup.
• If data are skewed, report the median.
• The AM is the most commonly used and is
preferred unless precluding circumstances
are present
Measures of Variation
• Sample range
• Sample variance
• Sample standard deviation
• Sample interquartile range
Sample Range
R = largest obs. - smallest obs.
or, equivalently
R = xmax - xmin
Coefficient of Range
CR = (largest obs. - smallest obs.) / (largest obs. + smallest obs.)
or, equivalently
CR = (xmax - xmin) / (xmax + xmin)
What is a standard deviation?
• it is the typical (standard) difference
(deviation) of an observation from the
mean
• think of it as the average distance a data
point is from the mean, although this is not
strictly true
Quartile Deviation
• Q.D. = (third quartile - first quartile) / 2
= (Q3 - Q1) / 2
• (Median - Q.D.) to (Median + Q.D.)
covers exactly 50% of the observations only
for a symmetrical distribution; in practice it
covers around 50%, as economic or business
data are seldom perfectly symmetrical
• Coefficient of Quartile Deviation
= (Q3 - Q1) / (Q3 + Q1)
Measures of Variation -
Some Comments
• Range is the simplest, but is very sensitive
to outliers
• Interquartile range is mainly used with
skewed data (or data with outliers)
• We will use the standard deviation as a
measure of variation often in this course
Measures of Variability
• Given: a sample of size n
• sample: (x1, x2, · · ·, xn)
• 1. Range:
• Range = largest measurement - smallest
measurement
• or Range = max - min
• Example 1: Sample (90, 85, 65, 75, 70, 95)
• Range = max - min = 95-65 = 30
Measures of Variability
• 2. Mean Absolute Deviation
• MAD = AM of absolute deviations
= ∑ |xi − x̄| / n
Example 2: Same sample (x̄ = 480 / 6 = 80)
x        x − x̄    |x − x̄|
90       10       10
85       5        5
65       -15      15
75       -5       5
70       -10      10
95       15       15
Totals   480      0        60
• MAD = 60 / 6 = 10
Remarks:
• (i) MAD is a good measure of variability
• (ii) It is difficult to manipulate mathematically
Measures of Variability
• 3. Standard Deviation
• Example: Same sample as before (AM of x = 80; n = 6)
x        x − x̄    (x − x̄)^2
90       10       100
85       5        25
65       -15      225
75       -5       25
70       -10      100
95       15       225
Totals   480      0        700
• Therefore, dividing by n − 1 for a sample:
• Variance of x = 700 / (6 − 1) = 700 / 5 = 140
• Standard deviation of x = square root of 140 = 11.83
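The range, mean absolute deviation, sample variance and standard deviation of this sample can be computed together. This is a minimal sketch with illustrative variable names; the variance uses the n − 1 divisor shown in the example.

```python
import math

sample = [90, 85, 65, 75, 70, 95]
n = len(sample)
mean = sum(sample) / n

# Range: largest minus smallest observation.
sample_range = max(sample) - min(sample)

# Mean absolute deviation: average of |x - mean| over all n observations.
mad = sum(abs(x - mean) for x in sample) / n

# Sample variance: sum of squared deviations divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in sample)
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

print(sample_range)       # 30
print(mad)
print(variance)           # 140.0
print(round(std_dev, 2))  # 11.83
```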
• Finite Populations
• Let N = population size.
• Data: {x1, x2, · · · , xN}
• Population mean: μ = (x1 + x2 + ··· + xN) / N
• Population variance: σ² = [(x1 − μ)² + (x2 − μ)² + ··· + (xN − μ)²] / N
• Population standard deviation: σ = √σ²
• Population parameters vs sample
statistics.
• Sample statistics: x̄, s², s.
• Population parameters: μ, σ², σ.
• Approximation: s ≈ range / 4
• Coefficient of variation (c.v.) = s / x̄
• 4. Percentiles
• Using percentiles is useful if the data are badly
skewed.
• Let x1, x2, . . . , xn be a set of measurements
arranged in increasing order.
• Definition. Let 0 < p < 100. The pth percentile is
a number x such that p% of all measurements
fall below the pth percentile and (100 − p)% fall
above it.
Sample Mean and Variance
For Grouped Data
• 5. Example: (weight loss data)
• Weight Loss Data
Class    Class boundaries   Mid-pt., x   Freq., f   xf     x²f
1        5.0-9.0            7            3          21     147
2        9.0-13.0           11           5          55     605
3        13.0-17.0          15           7          105    1,575
4        17.0-21.0          19           6          114    2,166
5        21.0-25.0          23           3          69     1,587
6        25.0-29.0          27           1          27     729
Totals                                   25         391    6,809
• Let k = number of classes and n = f1 + f2 + ··· + fk.
• Formulas.
• AM = (x1f1 + x2f2 + ··· + xkfk) / (f1 + f2 + ··· + fk) = 391 / 25 = 15.64
• Variance = (∑ x²f − n × AM²) / (n − 1) = (6809 − 25 × 15.64²) / 24 = (6809 − 6115.24) / 24 = 28.91
• SD = (28.91)^(1/2) = 5.38
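A sketch of the grouped-data computation follows, assuming the sample variance divides the corrected sum of squares by n − 1, consistent with the ungrouped sample variance example earlier in the deck. Variable names are illustrative.

```python
import math

midpoints = [7, 11, 15, 19, 23, 27]
freqs = [3, 5, 7, 6, 3, 1]

n = sum(freqs)                                           # 25
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))    # 391
sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))  # 6809

mean = sum_xf / n                                        # 15.64

# Sample variance for grouped data: (sum of x^2 f  -  n * mean^2) / (n - 1).
variance = (sum_x2f - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

print(mean)
print(round(variance, 2))
print(round(std_dev, 2))
```

Dividing the sum of squares by n − 1 but subtracting the uncorrected square of the mean mixes two formulas and overstates the variance, so the subtraction is done before the division here.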
Mode for Grouped Data
• Mode = Lmo + [(f − f1) / (2f − f1 − f2)] x w
• Lmo = Lower limit of modal class
• f1, f2 = Frequencies of the classes preceding
and succeeding the modal class
• f = Frequency of modal class
• w = Width of class interval
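As an illustration, the formula can be applied to the weight loss frequency distribution, whose modal class is 13.0-17.0. The function name is hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption made here, not something the slides state.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = Lmo + (f - f1) / (2f - f1 - f2) * w for equal-width classes."""
    m = freqs.index(max(freqs))                     # index of the modal class
    f = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0               # assumption: no class before -> 0
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # assumption: no class after -> 0
    return lower_limits[m] + (f - f1) / (2 * f - f1 - f2) * width

lower_limits = [5.0, 9.0, 13.0, 17.0, 21.0, 25.0]
freqs = [3, 5, 7, 6, 3, 1]

mode = grouped_mode(lower_limits, freqs, width=4.0)
print(round(mode, 2))  # 15.67
```

Here f = 7, f1 = 5 and f2 = 6, so the mode is 13 + (2 / 3) x 4. The formula breaks down when the modal class and both neighbours have equal frequencies, since the denominator is then zero.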
Formulas for Quartiles
• Q1 = Lq + {[(N+1)/4 − (F+1)] / fq} x w
Where, Lq = Lower limit of quartile class
N = Total frequency
F = Cumulative frequency of the classes before the quartile class
fq = Frequency of quartile class
w = Width of the class interval
First quartile class is that which includes observation no.
(N+1)/4
Formulas for Quartiles
• Q1 = Lq + {[(N+1)/4 − (F+1)] / fq} x w
Where, Lq = Lower limit of quartile class = 9
N = Total frequency = 25
F = Cumulative frequency of the classes before the quartile class = 3
fq = Frequency of quartile class = 5
w = Width of the class interval = 4
First quartile class is that which includes observation no.
(N+1)/4 = 6.5
Q1 = 9 + [{(6.5 − 4) / 5} x 4] = 9 + 2 = 11
Formulas for Quartiles
• Q3 = Lq + {[3(N+1)/4 − (F+1)] / fq} x w
Where, Lq = Lower limit of quartile class
N = Total frequency
F = Cumulative frequency of the classes before the quartile class
fq = Frequency of quartile class
w = Width of the class interval
Third quartile class is that which includes observation
no. 3(N+1)/4
Formulas for Quartiles
• Q3 = Lq + {[3(N+1)/4 − (F+1)] / fq} x w
Where, Lq = Lower limit of quartile class = 17
N = Total frequency = 25
F = Cumulative frequency of the classes before the quartile class = 15
fq = Frequency of quartile class = 6
w = Width of the class interval = 4
Third quartile class is that which includes observation
no. 3(N+1)/4 = 19.5
Q3 = 17 + [{(19.5 − 16) / 6} x 4] = 17 + 2.33 = 19.33
Formulas for Quartiles
• Q2 = Lq + {[2(N+1)/4 − (F+1)] / fq} x w
Where, Lq = Lower limit of quartile class
N = Total frequency
F = Cumulative frequency of the classes before the quartile class
fq = Frequency of quartile class
w = Width of the class interval
Second quartile class is that which includes observation no.
(N+1)/2
Formulas for Quartiles
• Q2 = Lq + {[2(N+1)/4 − (F+1)] / fq} x w
Where, Lq = Lower limit of quartile class = 13
N = Total frequency = 25
F = Cumulative frequency of the classes before the quartile class = 8
fq = Frequency of quartile class = 7
w = Width of the class interval = 4
Second quartile class is that which includes observation no.
(N+1)/2 = 13
Q2 = 13 + [{(13 − 9) / 7} x 4] = 13 + 2.29 = 15.29
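The quartile formulas share one pattern, so a single helper can evaluate all three. This sketch follows the deck's variant of the interpolation, with (F + 1) subtracted from the observation number; the function name and argument layout are illustrative.

```python
def grouped_quantile(position, lower_limits, freqs, width):
    """Interpolate the value at a given observation number in grouped data.

    position is the observation number, e.g. (N + 1) / 4 for Q1.
    Uses Lq + (position - (F + 1)) / fq * w, where F is the cumulative
    frequency of the classes before the quartile class.
    """
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:   # this class contains the observation
            return lower + (position - (cumulative + 1)) / f * width
        cumulative += f
    raise ValueError("position exceeds total frequency")

lower_limits = [5.0, 9.0, 13.0, 17.0, 21.0, 25.0]
freqs = [3, 5, 7, 6, 3, 1]
N = sum(freqs)

q1 = grouped_quantile((N + 1) / 4, lower_limits, freqs, 4.0)
q2 = grouped_quantile(2 * (N + 1) / 4, lower_limits, freqs, 4.0)
q3 = grouped_quantile(3 * (N + 1) / 4, lower_limits, freqs, 4.0)

print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Each result must fall inside its own quartile class (9-13, 13-17 and 17-21 respectively), which is a quick sanity check on any hand calculation.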
Empirical Mode
• Where the mode is ill-defined, its value may be
ascertained by using the following formula:
• Mode = 3 x Median − 2 x Mean