2. Course Index
S. No.  Reference No.  Particulars  Slide From – To
1. Chapter 1 Introduction to Business Statistics 08 – 21
2. Chapter 2 Descriptive Statistics: Collection, Processing and Presentation of Data 22 – 36
3. Chapter 3 Measures of Central Tendency 37 – 51
4. Chapter 4 Measures of Dispersion 52 – 66
5. Chapter 5 Skewness and Kurtosis 67 – 79
6. Chapter 6 Correlation Analysis 80 – 98
7. Chapter 7 Regression Analysis 99 – 114
8. Chapter 8 Theory of Probability 115 – 134
9. Chapter 9 Probability Distribution 135 – 153
10. Chapter 10 Use of Excel Software for Statistical Analysis 154 – 194
3. Course Introduction
Managerial decision-making can be made efficient and effective by
analyzing available data using appropriate statistical tools. Statistical tools
not only have application in research (marketing research included) but also in other
functional areas like quality management, inventory management, financial
analysis, human resource planning and so on.
4. The word statistics is derived from the Italian word ‘Stato’, which
means ‘state’; ‘Statista’ refers to a person involved with the affairs of
state. Thus, statistics was originally meant for the collection of facts useful for
affairs of the state, such as taxes, land records and population demography.
5. Significant contributions have also been made by Indians in the field of
statistics. Prof. Prasanta Chandra Mahalanobis was the first to pioneer the
study of statistical science in India. He founded the Indian Statistical Institute (ISI)
in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of
all human efforts and also concentrated on sample surveys.
“Statistics are the classified facts representing the conditions of the
people in the state…. specially those facts which can be stated in
number or in table of numbers or in any tabular or classified arrangement”.
– Webster
6. Statistical methods are broadly divided into five categories. These are
Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential
Statistics and Applied Statistics.
Statistics is an indispensable tool of production control and market
research. Statistical tools are extensively used in business for time and
motion study, consumer behaviour study, investment decisions, performance
measurements and compensations, credit ratings, inventory management,
accounting, quality control, distribution channel design, etc.
7. Statistical analysis is a vital component in every aspect of research.
Social surveys, laboratory experiments, clinical trials, marketing research,
human resource planning, inventory management, quality management, etc., require
statistical treatment before arriving at valid conclusions.
Functions of statistics are Condensation, Comparison, Forecast, Testing of
hypotheses, Preciseness, Expectation.
Statistical techniques, because of their flexibility and economy, have
become popular and are used in numerous fields. But statistics is not a
cure-all technique and has limitations. It cannot be applied to all kinds of
situations and cannot be made to answer all queries.
8. Introduction to Business Statistics
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 09 – 09
2. Topic 1 Introduction 10 – 10
3. Topic 2 Development of Statistics 11 – 11
4. Topic 3 Definitions of Statistics 12 – 12
5. Topic 4 Importance of Statistics 13 – 13
6. Topic 5 Classification of Statistics 14 – 14
7. Topic 6 Role of Statistics 15 – 15
8. Topic 7 Functions of Statistics 16 – 16
9. Topic 8 Limitations of Statistics 17 – 17
10. Topic 9 Summary 18 – 21
9. Learning Objectives
After studying this chapter, you should be able to:
Understand the development, importance and role of statistics
Explain the basic concept of statistical studies
Understand the application of statistics in business and management
Learn about functions and limitations of statistics
10. Introduction
Information derived from good statistical analysis is typically precise and useful for decision-making.
One of the primary tasks of a manager is decision-making.
Statistical techniques offer powerful tools in the decision-making process.
These tools have power to interpret quantitative information in a scientific and
an objective manner.
11. Development of Statistics
The word statistics is derived from the Italian word ‘Stato’ which means ‘state’; and
‘Statista’ refers to a person involved with the affairs of state.
Statistics was originally meant for the collection of facts useful for affairs of the
state, such as taxes, land records and population demography.
In ancient times, even before 300 BC, rulers and kings such as Chandragupta
Maurya used statistics to maintain land and revenue records, collect taxes
and register births and deaths.
12. Definitions of Statistics
“Statistics are the classified facts representing the conditions of the people in
the state…. specially those facts which can be stated in number or in table of
numbers or in any tabular or classified arrangement”.
– Webster
“By statistics we mean quantitative data affected to a marked extent by multiplicity of
causes”.
–Yule and Kendall
“Statistics may be defined as the science of collection, presentation, analysis and
interpretation of data”.
– Croxton and Cowden
13. Importance of Statistics
Statistics helps a decision-maker to:
Identify what information or data is worth collecting,
Decide when and how judgments may be made on the basis of partial information,
and
Measure the extent of doubt and risk associated with the use of partial information
and stochastic processes.
18. Summary
Managerial decision-making can be made efficient and effective by analyzing
available data using appropriate statistical tools. Statistical tools not only have
application in research (marketing research included) but also in other functional areas
like quality management, inventory management, financial analysis, human resource
planning and so on.
The word statistics is derived from the Italian word ‘Stato’, which means ‘state’;
‘Statista’ refers to a person involved with the affairs of state. Thus, statistics was originally
meant for the collection of facts useful for affairs of the state, such as taxes, land records
and population demography.
19. Significant contributions have also been made by Indians in the field of statistics. Prof.
Prasanta Chandra Mahalanobis was the first to pioneer the study of statistical science in
India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics
as a tool for increasing the efficiency of all human efforts and also concentrated on sample
surveys.
Statistics are the classified facts representing the conditions of the people in the
state…. specially those facts which can be stated in number or in table of numbers or in
any tabular or classified arrangement.
Statistical methods are broadly divided into five categories. These are Descriptive
Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied
Statistics.
20. Statistics is an indispensable tool of production control and market research.
Statistical tools are extensively used in business for time and motion study,
consumer behaviour study, investment decisions, performance measurements and
compensations, credit ratings, inventory management, accounting, quality control, distribution
channel design, etc.
Statistical analysis is a vital component in every aspect of research. Social
surveys, laboratory experiments, clinical trials, marketing research, human
resource planning, inventory management, quality management, etc., require
statistical treatment before arriving at valid conclusions.
Functions of statistics are Condensation, Comparison, Forecast, Testing of
hypotheses, Preciseness and Expectation.
21. Statistical techniques, because of their flexibility and economy, have become
popular and are used in numerous fields. But statistics is not a cure-all technique and
has limitations. It cannot be applied to all kinds of situations and cannot be made to
answer all queries.
More dangerous than distrust is misuse of statistics to draw convenient conclusions to
satisfy selfish or ulterior motives. Arguments and analysis supported by facts, figures,
charts, graphs, index numbers, etc. are indeed very appealing and convincing. They
can be used to intimidate opposing views. Hence, statistics is open to manipulation.
22. Descriptive Statistics: Collection, Processing and
Presentation of Data
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 23 – 23
2. Topic 1 Introduction 24 – 24
3. Topic 2 Descriptive and Inferential Statistics 25 – 26
4. Topic 3 Collection of Data 27 – 27
5. Topic 4 Editing and Coding of Data 28 – 28
6. Topic 5 Classification of Data 29 – 29
7. Topic 6 Tabulation of Data 30 – 30
8. Topic 7 Diagrammatic and Graphical Presentation of Data 31 – 32
9. Topic 8 Summary 33 – 36
23. Learning Objectives
After studying this chapter, you should be able to:
Describe descriptive and inferential statistics
Explain collection, editing and classification of primary and secondary data
Define tabulation and presentation of data
Understand diagrammatic and graphical presentation
Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and
Ogives
24. Introduction
Success of any statistical investigation depends on the availability of accurate and
reliable data.
These depend on the appropriateness of the method chosen for data collection.
Data collection is a very basic activity in decision-making.
Data may be classified either as primary data or secondary data.
Successful use of the collected data depends to a great extent upon the way it is
arranged, displayed and summarized.
25. Descriptive and Inferential Statistics
Descriptive Statistics
Descriptive statistics is the type of statistics that probably comes to most
people's minds when they hear the word “statistics.”
26. Inferential Statistics
Inferential statistics studies a statistical sample, and from this analysis we are
able to say something about the population from which the sample came.
28. Editing and Coding of Data
Editing Primary Data
Completeness
Consistency
Accuracy
Homogeneity
Editing Secondary Data
Field Editing
Central Editing
Coding of Data
Coding is the process of assigning some symbols, either alphabetical or
numeral or both, to the answers so that the responses can be recorded into
a limited number of classes or categories.
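As a minimal sketch of coding, the snippet below assigns a numeric symbol to each survey answer so that responses fall into a limited number of classes. The code book, the catch-all code 9 and the sample answers are hypothetical and chosen only for illustration.

```python
# Hypothetical code book: each raw answer is mapped to a numeric class code.
CODE_BOOK = {"yes": 1, "no": 2, "undecided": 3}
OTHER = 9  # assumed catch-all code for answers outside the code book


def code_responses(responses):
    """Return the numeric code for each response, ignoring case and spaces."""
    return [CODE_BOOK.get(r.strip().lower(), OTHER) for r in responses]


raw_answers = ["Yes", "no ", "Undecided", "maybe"]
coded = code_responses(raw_answers)
print(coded)
```

Keeping the code book in one place makes it easy to check the coded data for completeness and consistency during editing.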
29. Classification of Data
Classification refers to the
grouping of data into
homogeneous classes and
categories. It is the process of
arranging things in groups or
classes according to their
resemblances and affinities.
Topics covered:
Rules of Classification
Bases of Classification
Frequency Distribution
30. Tabulation of Data
Tabulation is arranging the data in a flat table (two-dimensional array)
format by grouping the observations.
A table is a spreadsheet of rows and columns, with headings and stubs
indicating the class of the data.
Types of Tabulation
One-Way Tabulation
Two-Way Tabulation
Multi-Way Tabulation
Advantages of Tabulation
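One-way and two-way tabulation can be sketched with the standard library as follows. The records (department and gender of six employees) are hypothetical, and `print_two_way` is an illustrative helper rather than a standard function.

```python
from collections import Counter

# Hypothetical observations: (department, gender) for each employee.
records = [
    ("Sales", "F"), ("Sales", "M"), ("HR", "F"),
    ("Sales", "F"), ("HR", "M"), ("HR", "F"),
]

# One-way tabulation: observations grouped by a single characteristic.
one_way = Counter(dept for dept, _ in records)

# Two-way tabulation: observations grouped by two characteristics at once.
two_way = Counter(records)


def print_two_way(table):
    """Print the two-way table with stubs (rows) and headings (columns)."""
    stubs = sorted({row for row, _ in table})
    headings = sorted({col for _, col in table})
    print("Dept".ljust(8) + "".join(h.rjust(4) for h in headings))
    for row in stubs:
        cells = "".join(str(table[(row, h)]).rjust(4) for h in headings)
        print(row.ljust(8) + cells)


print(dict(one_way))
print_two_way(two_way)
```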
31. Diagrammatic and Graphical Presentation of Data
Difference between Diagrams and Graphs
Diagram | Graph
1. Can be drawn on ordinary paper. | 1. Can be drawn on graph paper.
2. Easy to grasp. | 2. Needs some effort to grasp.
3. Not capable of analytical treatment. | 3. Capable of analytical treatment.
4. Can be used only for comparisons. | 4. Can be used to represent a mathematical relation.
5. Data are represented by bars, rectangles, pictures, etc. | 5. Data are represented by lines or curves.
33. Summary
There are two major divisions of the field of statistics, namely descriptive and
inferential statistics. Both the segments of statistics are important, and accomplish
different objectives.
Data can be obtained through a primary source or a secondary source according to
need, situation, convenience, time, resources and availability. The most important method
of primary data collection is the questionnaire. Data must be objective and fact-based
so that it helps a decision-maker to arrive at a better decision.
Statistical data is a set of facts expressed in quantitative form. Data is collected through
various methods. Sometimes our data set consists of the entire population we are
interested in. In other situations, data may constitute a sample from some population.
34. The type of research, its purpose and the conditions under which the data are obtained
determine the method of collecting the data. If relatively few items of information are
required quickly and funds are limited, telephonic interviews are recommended. If
respondents are industrial clients, the Internet could also be used. If depth interviews and probing
techniques are to be used, it is necessary to employ investigators to collect data.
The quality of information collected through the filling of a questionnaire
depends, to a large extent, upon the drafting of its questions. Hence, it is extremely
important that the questions be designed or drafted very carefully and in a tactful manner.
Before any processing of the data, editing and coding of data is necessary to ensure
the correctness of data. In any research studies, the voluminous data can be handled
only after classification. Data can be presented through tables and charts.
35. Classification refers to the grouping of data into homogeneous classes and
categories. It is the process of arranging things in groups or classes according to
their resemblances and affinities.
A frequency distribution is the principal tabular summary of either discrete data or
continuous data. The frequency distribution may show actual, relative or cumulative
frequencies. Actual and relative frequencies may be charted as either a histogram (a type of
bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are
the less-than ogive and the more-than ogive.
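The following sketch builds a frequency distribution with actual, relative and cumulative frequencies. The marks, the class width of 10 and the function name `frequency_distribution` are assumptions made for illustration; the two cumulative columns are the values that a less-than ogive and a more-than ogive would plot.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Group values into equal-width classes [lower, lower + width), ..."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:  # values outside the classes are ignored
            counts[idx] += 1

    total = sum(counts)
    rows, cumulative = [], 0
    for i, freq in enumerate(counts):
        cumulative += freq
        start = lower + i * width
        rows.append({
            "class": (start, start + width),
            "actual": freq,
            "relative": freq / total,
            "less_than": cumulative,                # items below the upper limit
            "more_than": total - cumulative + freq, # items at or above the lower limit
        })
    return rows


marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38]  # hypothetical marks
table = frequency_distribution(marks, lower=10, width=10, n_classes=4)
for row in table:
    print(row)
```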
Once the raw data is collected, it needs to be summarized and presented to the
decision-maker in a form that is easy to comprehend. Tabulation not only
condenses the data, but also makes it easy to understand. Tabulation is the fastest
way to extract information from the mass of data and hence popular even among those
not exposed to the statistical method.
36. Charts help in grasping the data and analyzing it qualitatively. They also help
managers to present the data effectively as a part of reports. Various types of chart are the
bar diagram, multiple bar diagram, component bar diagram, deviation bar diagram, sliding
bar diagram, histogram and pie chart.
A graphic presentation is another way of representing the statistical data in a simple
and intelligible form. There are two types of graphs which we have discussed, line
graphs and ogives.
37. Measures of Central Tendency
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 38 – 38
2. Topic 1 Introduction 39 – 39
3. Topic 2 Characteristics of Central Tendency 40 – 41
4. Topic 3 Arithmetic Mean 42 – 42
5. Topic 4 Median 43 – 43
6. Topic 5 Mode 44 – 44
7. Topic 6 Empirical Relationship between Mean, Median and Mode 45 – 45
8. Topic 7 Limitations of Central Tendency 46 – 46
9. Topic 8 Summary 47 – 51
38. Learning Objectives
After studying this chapter, you should be able to:
Understand the concept and characteristics of central tendency
Describe all the measures of central tendency: mean, median and mode.
Explain merits and demerits of all measures of central tendency.
Discuss partition values or positional measures like quartiles, deciles and
percentiles.
39. Introduction
The concept of central tendency plays a dominant role in the study of statistics.
In many frequency distributions, the tabulated values show a distinct tendency to
cluster or to group around a typical central value.
This behaviour of the data to concentrate the values around a central part of
distribution is called ‘Central Tendency’ of the data.
40. Characteristics of Central Tendency
A good measure of central tendency should possess as far as possible the following
characteristics:
Easy to understand.
Simple to compute.
Based on all observations.
Uniquely defined.
Possibility of further algebraic treatment.
Not unduly affected by extreme values.
42. Arithmetic Mean
The arithmetic mean of a series is the quotient obtained by dividing
the sum of the values by the number of items.
In algebraic language, if X1, X2, X3, ..., Xn are the n values of a
variate X, then:
Mean = (X1 + X2 + X3 + ... + Xn) / n
Properties of Arithmetic Mean
Calculation of Simple Arithmetic Mean
Merits and Demerits of Arithmetic Mean
Weighted Arithmetic Mean
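A short sketch of the simple and the weighted arithmetic mean, following the definition above (sum of the values divided by the number of items). The values and weights are hypothetical.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of items."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


x = [10, 20, 30, 40]   # hypothetical values
w = [1, 2, 3, 4]       # hypothetical weights (relative importance)

simple = arithmetic_mean(x)
weighted = weighted_mean(x, w)
print(simple, weighted)
```

With equal weights the weighted mean reduces to the simple arithmetic mean.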
43. Median
The median is the value which divides the distribution of data, arranged in
ascending or descending order, into two equal parts. Thus, the ‘Median’ is the
value of the middle observation.
Calculation of Median
Merits and Demerits of Median
Partition Values or Positional Measures
Quartiles
Deciles
Percentiles
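The median and the partition values listed above can be computed with Python's standard `statistics` module, as sketched below. The data set is hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which is only one of several conventions for locating quartiles; other textbooks and tools may give slightly different values.

```python
import statistics

data = [14, 2, 10, 6, 12, 4, 8]  # hypothetical observations, unsorted

# Median: the middle value once the data are arranged in order.
median = statistics.median(data)

# Quartiles: three cut points dividing the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Deciles (n=10) and percentiles (n=100) follow the same pattern.
deciles = statistics.quantiles(data, n=10)

print(median, (q1, q2, q3))
print(deciles)
```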
44. Mode
Mode is the value which has the greatest frequency density. Mode is denoted by Z.
Calculation of Mode
Merits and Demerits of Mode
Graphic Location of Mode
45. Empirical Relationship between Mean, Median and Mode
A distribution in which the mean, the median, and the mode coincide is
known as a symmetrical (bell-shaped) distribution. The normal distribution
is one such symmetric distribution, and it is very commonly used.
If the distribution is skewed, the mean, the median and the mode are not equal. In a
moderately skewed distribution distance between the mean and the median is
approximately one third of the distance between the mean and the mode. This can be
expressed as:
Mean – Median = (Mean – Mode) / 3
Mode = 3 * Median – 2 * Mean
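The empirical relationship can be checked numerically with a short sketch. The mean and median used here are hypothetical, and the result is only an approximation that applies to moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


mean, median = 52.0, 50.0          # hypothetical summary values
mode = empirical_mode(mean, median)

# Both sides of Mean - Median = (Mean - Mode) / 3 should agree.
left_side = mean - median
right_side = (mean - mode) / 3
print(mode, left_side, right_side)
```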
46. Limitations of Central Tendency
The arithmetic mean (AM) is not a suitable measure of central tendency:
In case of highly skewed data.
In case of uneven or irregular spread of the data.
In open-end distributions.
When average growth or average speed is required.
When there are extreme values in the data.
Except in these cases, the AM is widely used in practice.
47. Summary
Measures of the central tendency give one of the very important characteristics
of the data. According to the situation, one of the various measures of central
tendency may be chosen as the most representative.
Arithmetic mean is widely used and understood. What characterizes the three measures
of centrality, and what are the relative merits of each in the given situation, is the
question.
Mean summarizes all the information in the data. Mean can be visualized as a
single point where all the mass (the weight) of the observations is concentrated. It is like a
centre of gravity in physics. Mean also has some desirable mathematical properties that
make it useful in the context of statistical inference.
48. To simplify the manual calculation, we may sometimes use shift of origin and
change of scale. Shifting of origin is achieved by adding or subtracting a
constant to all observations. In case of discrete data we add or subtract (usually
subtract) a constant to the individual observations. Whereas for grouped data, we add or
subtract (usually subtract) the constant to the class mark values.
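A minimal sketch of the shift of origin and change of scale described above. The observations, the assumed origin `A` and the scale factor `h` are hypothetical; the point is that the mean of the transformed values, converted back, equals the direct mean.

```python
values = [1010, 1020, 1030, 1040, 1070]  # hypothetical observations

A = 1030   # assumed origin, subtracted from every observation
h = 10     # assumed scale factor, dividing every deviation

# Transformed values are small and easy to handle by hand.
d = [(x - A) / h for x in values]
mean_d = sum(d) / len(d)

# Convert back: Mean = A + h * (mean of the transformed values).
mean_shortcut = A + h * mean_d
mean_direct = sum(values) / len(values)
print(d, mean_shortcut, mean_direct)
```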
There are cases where relative importance of the different items is not the
same. In such a case, we need to compute the weighted arithmetic mean. The
procedure is similar to the grouped data calculations studied earlier, when we
consider frequency as a weight associated with the class-mark.
Median is the middle value when the data is arranged in order. The median is
resistant to the extreme observations. Median is like the geometric centre in physics. In
case we want to guard against the influence of a few outlying observations (called
outliers), we may use the median.
49. Quantiles are related positional measures of central tendency. These are useful and
frequently employed measures. Most familiar quantiles are Quartiles, Deciles, and
Percentiles.
Quartiles are position values similar to the Median. There are three quartiles
denoted by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile. The second
quartile Q2 is nothing but the median. In a distribution, one fourth of the items are less than Q1
and the other three fourths are greater than Q1. Q3 is called the upper quartile or third
quartile; three fourths of the items are less than Q3 and one fourth are greater.
Inter-quartile range is defined as the difference between the first and third
quartile. It is a measure of spread of the data.
D1, D2, D3… and D9 are the nine deciles. They divide a series into 10 equal
parts. One tenth of the items are less than or equal to D1, one tenth of the
items are more than or equal to D9, and one tenth of the items lie between any
successive pair of deciles when all the items are in ascending order.
50.
The Pth percentile of a group of observations is that observation below which lie P%
(P percent) of the observations. The position of the Pth percentile is given by
(n + 1)P / 100, where ‘n’ is the number of data points.
If the value of (n + 1)P / 100 is a fraction, we need to interpolate the value.
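The sketch below locates a percentile by position and interpolates when the position is a fraction. It assumes the position rule (n + 1)P / 100, one common textbook convention; the data set and the function name `percentile` are hypothetical, and other conventions give slightly different values.

```python
def percentile(values, p):
    """Pth percentile using the assumed position rule (n + 1) * P / 100."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100      # 1-based position in the ordered data

    if position <= 1:
        return data[0]
    if position >= n:
        return data[-1]

    k = int(position)                 # whole part: the kth ordered value
    fraction = position - k           # fractional part: how far to the next value
    return data[k - 1] + fraction * (data[k] - data[k - 1])


data = [15, 20, 35, 40, 50]           # hypothetical observations
p25 = percentile(data, 25)            # position 1.5, halfway between 15 and 20
p50 = percentile(data, 50)            # position 3, the middle value
p75 = percentile(data, 75)            # position 4.5, halfway between 40 and 50
print(p25, p50, p75)
```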
The Mode of a data set is the value that occurs most frequently. There are
many situations in which arithmetic mean and median fail to reveal the true
characteristics of a data (most representative figure), for example, most common size of
shoes, most common size of garments etc. In such cases, mode is the best-suited
measure of the central tendency.
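The shoe-size case can be sketched as follows. The sizes are hypothetical; the code returns every value that shares the highest frequency, since a data set may have more than one mode.

```python
from collections import Counter

shoe_sizes = [7, 8, 8, 9, 8, 10, 9, 8, 7]  # hypothetical sizes sold

counts = Counter(shoe_sizes)               # frequency of each size
highest = max(counts.values())             # the greatest frequency

# All sizes that occur with the greatest frequency.
modes = sorted(size for size, freq in counts.items() if freq == highest)
print(modes, highest)
```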
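A distribution in which the mean, the median, and the mode coincide is
known as a symmetrical (bell-shaped) distribution. The normal distribution is one
such symmetric distribution, and it is very commonly used.
51. In a moderately skewed distribution, the distance between the mean and the median is
approximately one third of the distance between the mean and the mode. This can be expressed as:
Mean – Median = (Mean – Mode) / 3
Mode = 3 * Median – 2 * Mean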
No single average can be regarded as the best or most suitable under all circumstances.
Each average has its merits and demerits and its own particular field of importance and
utility. A proper selection of an average depends on the (1) nature of the data and (2)
purpose of enquiry or requirement of the data.
52. Measures of Dispersion
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 53 – 53
2. Topic 1 Introduction 54 – 54
3. Topic 2 Characteristics of Measures of Dispersion 55 – 55
4. Topic 3 Absolute and Relative Measures of Dispersion 56 – 57
5. Topic 4 Range 58 – 59
6. Topic 5 Inter-quartile Range and Deviations 60 – 60
7. Topic 6 Variance and Standard Deviation 61 – 62
8. Topic 7 Summary 63 – 66
53. Learning Objectives
After studying this chapter, you should be able to:
Understand absolute and relative measures of variation
Learn about range and inter-quartile range
Discuss variance, standard deviation, mean deviation and coefficient of variation
Study the empirical relationship between different measures of variation
54. Introduction
A measure of dispersion or variation in any data shows the extent to
which the numerical values tend to spread about an average.
Measures of dispersion are useful:
To compare the current results with the past results.
To compare two or more sets of observations.
To suggest methods to control variation in the data.
55. Characteristics of Measures of Dispersion
It should be easy to compute.
It should be rigidly defined.
It should be based on each individual item of the distribution.
It should be capable of further algebraic treatment.
It should have sampling stability.
It should not be unduly affected by the extreme items.
It should be simple to understand.
56. Absolute and Relative Measures of Dispersion
‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a
measure of absolute dispersion to an appropriate average.
A precise measure of dispersion is one which gives the magnitude of the
variation in a series, i.e. it measures in numerical terms, the extent of the
scatter of the values around the average.
57. Absolute and Relative Measures of Dispersion
Measures of Dispersion: the Range, the Quartile Deviation, the Mean Deviation, the Median Deviation, the Standard Deviation, Graphical Method
Relative Variability: Relative Range, Relative Quartile Deviation, Relative Mean Deviation, Coefficient of Variation
58. Range
The ‘Range’ of the data is the difference between the largest value and the
smallest value of the data.
59. Merits and Demerits of Range
Merits
Range is the simplest method of studying dispersion.
It takes less time to compute the ‘absolute’ and ‘relative’ range.
Demerits
Range does not take into account all the values of a series, i.e. it considers only
the extreme items, and middle items are not given any importance.
Range cannot be computed in the case of an ‘open-end’ distribution, i.e. a distribution
where the lower limit of the first group and the upper limit of the last group are not given.
60. Inter-quartile Range and Deviations
Inter-quartile Range
Inter-quartile range is the difference between the upper quartile (third quartile) and
the lower quartile (first quartile).
Quartile Deviation
Quartile Deviation is half of the difference between the upper quartile and the
lower quartile, i.e. (Q3 – Q1) / 2.
Mean Deviation
Mean deviation is the arithmetic mean of the absolute deviations of the values about
their arithmetic mean or median or mode.
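The three measures on this slide can be sketched as below. The data are hypothetical, quartiles come from `statistics.quantiles` with its default method (one of several conventions), and the quartile deviation is taken as half the inter-quartile range.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]  # hypothetical observations

q1, _, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                     # inter-quartile range
quartile_deviation = iqr / 2      # half the inter-quartile range


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations about the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)


md_about_mean = mean_deviation(data, statistics.mean(data))
md_about_median = mean_deviation(data, statistics.median(data))
print(iqr, quartile_deviation, md_about_mean, md_about_median)
```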
61. Variance and Standard Deviation
Variance is defined as the average of squared
deviation of data points from their mean.
62. Topics covered:
Different Formulae for Calculating Variance
Calculation of Standard Deviation
Properties of Standard Deviation
Merits and Demerits of Standard Deviation
Standard Deviation of Combined Means
Coefficient of Variation
Empirical Relationship Between Different Measures of Variation
63. Summary
Study of distribution is very important for decision-making. Usually, measures of
central tendency and variability are adequate for taking decisions. However, if the data is quite
different from a normal distribution, then measures of skewness and kurtosis need to be
considered. We discussed measures of variability: Range, Variance and Standard
Deviation.
A measure of dispersion gives an idea about the extent of lack of uniformity in the
sizes and qualities of the items in a series. It helps us to know the degree of uniformity and
consistency in the series. If the difference between items is large the dispersion or
variation is large and vice versa.
64. The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute
measures of dispersion are expressed in the same units in which the original data
are expressed. For example, if the series is expressed as Marks of the students in a
particular subject; the absolute dispersion will provide the value in Marks. The only difficulty
is that if two or more series are expressed in different units, the series cannot be compared on
the basis of dispersion.
The ‘Range’ of the data is the difference between the largest value of data and
smallest value of data. This is an absolute measure of variability. However, if we
have to compare two sets of data, ‘Range’ may not give a true picture. In such case,
relative measure of range, called coefficient of range is used.
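A brief sketch of the absolute range and its relative counterpart. The slide does not state the formula for the coefficient of range, so the usual textbook form (L – S) / (L + S), with L the largest and S the smallest value, is assumed here; the two series are hypothetical.

```python
def value_range(values):
    """Absolute range: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range, assuming the form (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


marks = [20, 35, 50, 65, 80]              # hypothetical marks
salaries = [2000, 3500, 5000, 6500, 8000]  # hypothetical salaries

print(value_range(marks), coefficient_of_range(marks))
print(value_range(salaries), coefficient_of_range(salaries))
```

The two series have very different absolute ranges but the same coefficient of range, which is why the relative measure is preferred for comparison.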
Inter-quartile range is the difference between the upper quartile (third quartile) and the lower
quartile (first quartile). Quartile Deviation is half of the difference between the upper
quartile and the lower quartile.
65. Average used for calculating deviation can be the mean, the median or the mode.
However, usually the mean is used. There is also an advantage of taking deviations from
the median, because ‘Mean Deviation’ from median is lowest as compared to any other
‘Mean Deviations’. Since absolute values of deviations ignoring sign are taken for
calculating Mean Deviation, the mean deviation is not amenable to further algebraic
treatment.
The variance is the average squared deviation of the data from their mean. For
sample data, we take the average by dividing by (n – 1), where n is the sample size. This
is to account for the degrees of freedom. For population data, we average by dividing by the
population size N.
The Standard Deviation (SD) of a set of data is the positive square root of the
variance of the set. This is also referred to as the Root Mean Square (RMS) value of the
deviations of the data points. The SD of a sample is the square root of the sample variance.
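The difference between the population and the sample calculation can be sketched as follows. The data set and the function names are hypothetical.

```python
import math


def variance(values, sample=True):
    """Average squared deviation from the mean.

    Divides by (n - 1) for sample data and by n for population data.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1) if sample else squared_deviations / n


def standard_deviation(values, sample=True):
    """Positive square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

population_var = variance(data, sample=False)
population_sd = standard_deviation(data, sample=False)
sample_var = variance(data, sample=True)
print(population_var, population_sd, sample_var)
```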
66. There is no effect of shifting origin on standard deviation or variance.
The measures of deviation are very effective in reports and presentations made by
business executives to present their data to the general public, who may not
understand statistical methods.
Variance analysis also helps in managing budgets by controlling budgeted versus
actual costs. Without the standard deviation, you can’t compare two data sets
effectively.
67. Skewness and Kurtosis
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 68 – 68
2. Topic 1 Introduction 69 – 70
3. Topic 2 Karl Pearson’s Coefficient of Skewness (SKP) 71 – 71
4. Topic 3 Bowley’s Coefficient of Skewness (SKB) 72 – 72
5. Topic 4 Kelly’s Coefficient of Skewness (SKK) 73 – 73
6. Topic 5 Measures of Kurtosis 74 – 74
7. Topic 6 Moments 75 – 75
8. Topic 7 Summary 76 – 79
68. Learning Objectives
After studying this chapter, you should be able to:
Understand the concept and different types of skewness
Discuss various measures of kurtosis
Learn about moments, their properties and coefficients based on moments
69. Introduction
Skewness is a measure that studies the degree and direction of departure from symmetry.
Nature of Skewness
Skewness can be positive or negative or zero.
When the values of mean, median and mode are equal, there is no skewness.
When mean > median > mode, skewness will be positive.
When mean < median < mode, skewness will be negative.
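The rules above can be turned into a small check, sketched here with the standard `statistics` module. The three data sets are hypothetical, and any ordering not covered by the rules is reported as "undetermined" rather than guessed.

```python
import statistics


def skewness_direction(values):
    """Classify skewness by comparing mean, median and mode."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)

    if mean == median == mode:
        return "zero"
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    return "undetermined"


right_tailed = [1, 1, 1, 2, 3, 4, 9]   # mean 3, median 2, mode 1
left_tailed = [1, 6, 7, 8, 9, 9, 9]    # mean 7, median 8, mode 9
symmetric = [1, 2, 3, 3, 3, 4, 5]      # mean, median and mode all 3

print(skewness_direction(right_tailed))
print(skewness_direction(left_tailed))
print(skewness_direction(symmetric))
```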
70. Characteristic of a Good Measure of Skewness
It should be a pure number in the sense that its value should be independent of
the unit of the series and also degree of variation in the series.
It should have zero-value, when the distribution is symmetrical.
It should have a meaningful scale of measurement so that we could easily
interpret the measured value.
Mathematical measures of skewness can be calculated by:
Karl-Pearson’s Method
Bowley’s Method
Kelly’s method
71. Karl Pearson’s Coefficient of Skewness (SKP)
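Karl Pearson suggested two formulae:
Where the mode is well defined, based on the relationship between mean and mode:
SkP = (Mean – Mode) / Standard Deviation
Where the mode is not well defined, based on the relationship between mean and median:
SkP = 3 (Mean – Median) / Standard Deviation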
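A minimal sketch of the two Karl Pearson coefficients, assuming their usual forms: (Mean – Mode) / SD when the mode is well defined, and 3 (Mean – Median) / SD otherwise. The data are hypothetical, and the population standard deviation is used as an assumption.

```python
import statistics


def pearson_skewness_mode(values):
    """Assumed form: (mean - mode) / standard deviation."""
    sd = statistics.pstdev(values)
    return (statistics.mean(values) - statistics.mode(values)) / sd


def pearson_skewness_median(values):
    """Assumed form: 3 * (mean - median) / standard deviation."""
    sd = statistics.pstdev(values)
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd


data = [1, 1, 1, 2, 3, 4, 9]   # hypothetical, right-tailed observations

sk_mode = pearson_skewness_mode(data)
sk_median = pearson_skewness_median(data)
print(sk_mode, sk_median)
```

Both coefficients are pure numbers and both are positive for this right-tailed data set.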
72. Bowley’s Coefficient of Skewness (SKB)
Bowley’s method of skewness is based on the values of median, lower and
upper quartiles. This method suffers from the same limitations which are in the
case of median and quartiles.
Wherever positional measures are given, skewness should be measured by
Bowley’s method. This method is also used in case of ‘open-end series’, where the
importance of extreme values is ignored.
Absolute skewness = Q3 + Q1 – 2 Median
Coefficient of Skewness, (SkB) = (Q3 + Q1 – 2 Median) / (Q3 – Q1)
Where, Q is quartile.
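Bowley's measure can be sketched as below, assuming its usual form: the absolute skewness Q3 + Q1 – 2 Median divided by the inter-quartile range Q3 – Q1. The quartile values are hypothetical.

```python
def bowley_absolute_skewness(q1, median, q3):
    """Absolute skewness: Q3 + Q1 - 2 * Median."""
    return q3 + q1 - 2 * median


def bowley_coefficient(q1, median, q3):
    """Assumed form: absolute skewness divided by (Q3 - Q1)."""
    return bowley_absolute_skewness(q1, median, q3) / (q3 - q1)


# Hypothetical quartiles of a right-tailed distribution.
sk_skewed = bowley_coefficient(q1=20, median=25, q3=40)

# Median exactly midway between the quartiles: no skewness.
sk_symmetric = bowley_coefficient(q1=10, median=20, q3=30)
print(sk_skewed, sk_symmetric)
```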
73. Kelly’s Coefficient of Skewness (SKK)
Kelly’s coefficient of skewness is defined as:
Skk = (P90 + P10 – 2 P50) / (P90 – P10)
Where, P is percentile.
Example: Calculate the Kelly’s coefficient of skewness from the following data:
74. Measures of Kurtosis
Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more
peaked the distribution. Kurtosis is calculated either as an absolute or a
relative value. Absolute kurtosis is always a positive number.
A negative relative kurtosis indicates a distribution flatter than the normal
distribution, which is called platykurtic.
A positive relative kurtosis means a more peaked curve, called leptokurtic.
The peakedness of the normal distribution is called mesokurtic.
75. Moments
The arithmetic mean of the various powers of the deviations of the items from
the mean of a distribution is called the moments of the distribution about the mean.
Properties of Moments
Coefficients Based on Moments
76. Summary
Measures of Skewness and Kurtosis, like measures of central tendency and
dispersion, study the characteristics of a frequency distribution. Averages tell us
about the central value of the distribution and measures of dispersion tell us about the
concentration of the items around a central value.
When two or more symmetrical distributions are compared, the difference between them
is studied with ‘Kurtosis’. On the other hand, when two or more asymmetrical distributions are
compared, they will give different degrees of Skewness. These measures are mutually
exclusive i.e. the presence of skewness implies absence of kurtosis and vice-versa.
77. Bowley’s method of skewness is based on the values of median, lower and upper
quartiles. This method suffers from the same limitations which are in the case of median
and quartiles. Wherever positional measures are given, skewness should be measured by
Bowley’s method. This method is also used in case of ‘open-end series’, where the
importance of extreme values is ignored.
Kelly’s coefficient of skewness is defined as:
SkK = (P90 + P10 – 2 P50) / (P90 – P10)
Where P10, P50 and P90 are the 10th, 50th and 90th percentiles.
78. Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more
peaked the distribution. Kurtosis is calculated either as an absolute or a relative value.
Absolute kurtosis is always a positive number. The absolute kurtosis of a normal distribution
(a symmetric, bell-shaped distribution) is taken as 3. It is taken as the datum to calculate
relative kurtosis as follows:
Absolute kurtosis = μ4 / μ2²
Relative kurtosis = Absolute kurtosis – 3
Where μ2 and μ4 are the second and fourth moments about the mean.
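As a sketch, absolute kurtosis can be computed as the fourth moment about the mean divided by the square of the second, and relative kurtosis by subtracting 3. The function names and the sample are assumptions made for illustration.

```python
def central_moment(data, k):
    """k-th moment about the mean: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def absolute_kurtosis(data):
    """Fourth central moment divided by the squared second central moment."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def relative_kurtosis(data):
    """Absolute kurtosis measured against the normal-distribution datum of 3."""
    return absolute_kurtosis(data) - 3


flat_sample = [1, 2, 3, 4, 5]  # hypothetical, evenly spread values
print(absolute_kurtosis(flat_sample), relative_kurtosis(flat_sample))
```

For this evenly spread sample the relative kurtosis is negative, i.e. the sample is platykurtic.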
79. Moments about the mean are generally used in statistics. We use the Greek letter μ (read
as mu) for these moments. Consider a mass attached at each point proportional to its
frequency and take moments about the mean. The first, second, third and fourth moments can
be used as measures of central tendency, variation (dispersion), asymmetry and peakedness
of the curve, respectively.
81. Learning Objectives
After studying this chapter, you should be able to:
Understand the concept of correlation
Study about different types of correlation
Describe various methods of calculating correlation such as scatter diagram
method
Discuss various types of correlation coefficients viz, Karl Pearson correlation
coefficient, rank correlation and coefficient based on concurrent deviations.
82. Introduction
Croxton and Cowden say, “When the relationship is of a quantitative nature, the appropriate
statistical tool for discovering and measuring the relationship and expressing it in a brief
formula is known as correlation”.
83. The study of correlation helps managers in following ways:
To identify relationship of various factors and decision variables.
To estimate value of one variable for a given value of other if both are correlated. E.g.
estimating sales for a given advertising and promotion expenditure.
To understand economic behaviour and market forces.
To reduce uncertainty in decision-making to a large extent.
84. Types of Correlation
Positive or Negative Correlation
Simple or Multiple Correlations
Partial or Total Correlation
Linear and Non-linear Correlation
85. Methods of Calculating Correlation
Scatter Diagram Method
Karl Pearson’s Coefficient of Correlation
Rank Method
Concurrent Deviation Method
86. Scatter Diagram Method
The pattern of points obtained by plotting the observed data is known as a scatter diagram.
It gives us two types of information:
Whether the variables are related or not.
If so, what kind of relationship, or estimating equation, describes the relationship.
87. Co – Variance Method – The Karl Pearson’s Correlation Coefficient
The correlation coefficient measures the degree of association between two variables X and Y.
Karl Pearson’s formula for the correlation coefficient is given as,
r = Cov (X, Y) / (σX σY) = Σ (X – X̄)(Y – Ȳ) / √[Σ (X – X̄)² Σ (Y – Ȳ)²]
Where r is the ‘Correlation Coefficient’ or ‘Product Moment Correlation Coefficient’
between X and Y.
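A minimal Python sketch of the product moment correlation coefficient, computed as the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The variable names and figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient between paired series xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


advertising = [1, 2, 3, 4, 5]   # hypothetical X values
sales = [2, 4, 5, 4, 5]         # hypothetical Y values
r = pearson_r(advertising, sales)
print(round(r, 4))
```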
88.
Estimation of Probable Error
Interpretation of r
Assumptions Underlying Karl Pearson’s Correlation Coefficient
89. Rank Correlation Method
RANK CORRELATION WHEN RANKS ARE GIVEN
RANK CORRELATION WHEN RANKS ARE NOT GIVEN
RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN
90. Correlation Coefficient using Concurrent Deviation
This is the easiest method to find the correlation between two variables. Although the
method is effective in giving the direction of the correlation as positive or negative, it fails
to give the accurate strength of the correlation. In this method we mark the change in
each data series from one item to the next as increasing (+), decreasing (–) or equal. Then
we count the number of items for which both series increase, decrease or remain equal
concurrently and denote it as c. The correlation coefficient is then calculated as,
rc = ± √[± (2c – n) / n]
Where, n = total number of pairs of deviations compared (one less than the number of
observations),
c = number of concurrent changes.
The sign of (2c – n) is used both inside and outside the square root.
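The concurrent deviation method can be sketched in Python as below, using the standard form rc = ± √[± (2c – n) / n], where n counts the pairs of successive changes and the sign of (2c – n) is applied inside and outside the root. The two series are hypothetical.

```python
import math


def sign(value):
    """Return +1, -1 or 0 for an increase, a decrease or no change."""
    return (value > 0) - (value < 0)


def concurrent_deviation_r(xs, ys):
    """Correlation by concurrent deviations of two equally long series."""
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                    # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a == b)   # concurrent changes
    ratio = (2 * c - n) / n
    return math.copysign(math.sqrt(abs(ratio)), ratio)


x_series = [10, 12, 11, 15, 14, 18]   # hypothetical
y_series = [5, 7, 8, 9, 8, 10]        # hypothetical
r_c = concurrent_deviation_r(x_series, y_series)
print(round(r_c, 4))
```

Here four of the five successive changes move in the same direction, so c = 4 and n = 5.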
91. Example: The data of advertisement expenditure (X) and sales (Y) of a company for the
past 10-year period is given below. Determine the correlation coefficient between these
variables and comment on the correlation.
92. Summary
In this chapter the concept of correlation or the association between two variables has
been discussed. A scatter plot of the variables may suggest that the two variables are
related but the value of the Pearson correlation coefficient r quantifies this
association.
Correlation is the degree of linear association between two random variables. We do
not differentiate these two variables as dependent and independent. It may be the case
that one is the cause and the other an effect, i.e. independent and dependent variables
respectively. On the other hand, both may be dependent on a third variable.
93. In business, correlation analysis often helps manager to take decisions by
estimating the effects of changing the values of the decision variables like
promotion, advertising, price, production processes, on the objective parameters like
costs, sales, market share, consumer satisfaction, competitive price. The decision
becomes more objective by removing subjectivity to certain extent.
The correlation coefficient r may assume values between –1 and 1. The sign
indicates whether the association is direct (+ve) or inverse (-ve). A numerical
value of r equal to unity indicates perfect association while a value of zero
indicates no association.
94. The correlation is said to be positive when the increase (decrease) in the value of one
variable is accompanied by an increase (decrease) in the value of other variable also.
Negative or inverse correlation refers to the movement of the variables in opposite direction.
Correlation is said to be negative, if an increase (decrease) in the value of one variable is
accompanied by a decrease (increase) in the value of other.
In simple correlation the variation is between only two variables under study and
the variation is hardly influenced by any external factor. In other words, if one of the
variables remains same, there won’t be any change in other variable.
95. In case of multiple correlation analysis there are two approaches to study the
correlation. In case of partial correlation, we study the variation of two variables while
excluding the effects of other variables by keeping them under controlled conditions.
When the amount of change in one variable tends to keep a constant ratio to the
amount of change in the other variable, then the correlation is said to be linear. But if the
amount of change in one variable does not bear a constant ratio to the amount of
change in the other variable then the correlation is said to be non-linear.
96. Correlation analysis may also be necessary to eliminate a variable which
shows low or hardly any correlation with the variable of our interest. In statistics, there
are number of measures to describe degree of association between variables. These
are Karl Pearson’s Correlation Coefficient, Spearman’s rank correlation coefficient,
coefficient of determination, Yule’s coefficient of association, coefficient of
colligation, etc.
The correlation coefficient measures the degree of association between two
variables X and Y.
Karl Pearson’s formula for the correlation coefficient is given as,
r = Cov (X, Y) / (σX σY) = Σ (X – X̄)(Y – Ȳ) / √[Σ (X – X̄)² Σ (Y – Ȳ)²]
97. The purpose of computing a correlation coefficient in such situations is to
determine the extent to which the two sets of ranking are in agreement. The
coefficient that is determined from these ranks is known as Spearman’s rank
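coefficient, rs. This is defined by the following formula:
rs = 1 – 6 Σd² / [n (n² – 1)]
Where d is the difference between the two ranks of an item and n is the number of items.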
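A sketch of Spearman's rank coefficient for the case where ranks are given and there are no ties, using rs = 1 – 6 Σd² / [n (n² – 1)]. The two rankings are hypothetical.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's coefficient from two rankings of the same items (no tied ranks)."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


judge_a = [1, 2, 3, 4, 5]   # hypothetical ranks given by one judge
judge_b = [2, 1, 4, 3, 5]   # hypothetical ranks given by another judge
rs = spearman_rank_correlation(judge_a, judge_b)
print(rs)
```

Identical rankings give +1 and exactly reversed rankings give –1; tied ranks need a correction that this sketch does not include.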
98. Although the concurrent deviation method is effective in giving the direction of
the correlation as positive or negative, it fails to give the accurate strength of the correlation.
In this method we mark the change in each data series from one item to the next as
increasing (+), decreasing (–) or equal. Then we count the number of items for which both
series increase, decrease or remain equal concurrently and denote it as c. The correlation
coefficient is then calculated as,
rc = ± √[± (2c – n) / n]
Where, n = total number of pairs of deviations compared (one less than the number of
observations),
c = number of concurrent changes.
100. Learning Objectives
After studying this chapter, you should be able to:
Understand the concept of regression analysis
Discuss the applicability of regression
Describe simple linear regression and nonlinear regression model.
Learn about coefficient of regression and linear regression equations
101. Introduction
In regression analysis we develop an equation, called an estimating equation, that
relates the known and unknown variables.
Correlation analysis is then used to determine the degree of the relationship
between the variables.
In this chapter we will learn how to calculate the regression line mathematically.
102. Regression Analysis
According to Morris Myers Blair, “regression is the measure of the
average relationship between two or more variables in terms of the
original units of the data.”
103. Applicability of Regression Analysis
Regression analysis is a branch of statistical theory which is widely used in all the scientific
disciplines. It is a basic technique for measuring or estimating the relationship among
economic variables that constitute the essence of economic theory and economic life.
104. Simple Linear Regression
The highest power of x is called the order of the model.
This model is used if we have a bivariate distribution, i.e. only two variables are considered
and the ‘best fit’ curve is approximated to a straight line.
105. Simple Linear Regression Model
The linear regression model uses a straight-line relationship. The equation of a
straight line is of the form,
ŷ = a + bx (1)
Where ŷ is the predicted value of Y corresponding to x, and a and b are constants. Now
if we assume the error (deviation) in the Y direction is e, we can write the
relationship of X and Y at the data points as,
y = a + bx + e
Error e is the amount by which an observation falls off the regression line. Error e is due
to random error. ‘a’ and ‘b’ are called the parameters of the linear regression model, whose
values are found from the observed data.
106. Linear Regression Equation
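Suppose the data points are (x1, y1), (x2, y2), ….., (xn, yn). Then we can write from the
regression equation,
yi = a + b xi + ei, for i = 1, 2, ….., n (2)
Thus, the sum of squares of errors is,
SSE = Σ ei² = Σ (yi – a – b xi)²
To have the minimum sum of squares of errors (SSE) we must have the conditions,
∂SSE/∂a = 0 and ∂SSE/∂b = 0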
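Setting the two partial derivatives of SSE to zero gives the usual closed-form estimates b = Σ (x – x̄)(y – ȳ) / Σ (x – x̄)² and a = ȳ – b x̄. The sketch below applies them to hypothetical data; the function names are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """SSE: the sum of squared deviations of y from the line a + b * x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]   # hypothetical X values
ys = [2, 4, 5, 4, 5]   # hypothetical Y values
a, b = fit_line(xs, ys)
print(a, b, sum_squared_errors(xs, ys, a, b))
```

Moving a or b away from the fitted values can only increase SSE, which is what the minimum condition expresses.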
107. Coefficient of Regression
The coefficients of regression are bYX and bXY. They have the following implications:
Slopes of regression lines of Y on X and X on Y, viz. bYX and bXY, must have the
same sign (because r² = bYX · bXY cannot be negative).
The correlation coefficient is the geometric mean of bYX and bXY, i.e. r = ± √(bYX · bXY).
If both slopes bYX and bXY are positive, the correlation coefficient r is positive. If
both bYX and bXY are negative, the correlation coefficient r is negative.
If bYX · bXY = 1, then r = ±1, indicating perfect correlation.
Both regression lines intersect at the point (X̄, Ȳ).
108. Properties of Regression Coefficients
The coefficient of correlation is the geometric mean of the two
regression coefficients.
Both the regression coefficients are either positive or negative. It
means that they always have identical sign i.e., either both have positive
sign or negative sign.
The coefficient of correlation and the regression coefficients will
also have same sign.
Regression coefficients are independent of the change in the origin but not
of the scale.
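These properties can be checked numerically. The sketch below computes bYX and bXY from sums of deviations for hypothetical data, then verifies the geometric-mean relation and the behaviour under a change of origin and of scale.

```python
import math


def regression_coefficients(xs, ys):
    """Return (bYX, bXY): the slopes of the lines of Y on X and of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


xs = [1, 2, 3, 4, 5]   # hypothetical
ys = [2, 4, 5, 4, 5]   # hypothetical
b_yx, b_xy = regression_coefficients(xs, ys)
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # r takes the common sign

shifted, _ = regression_coefficients([x + 100 for x in xs], ys)   # change of origin
scaled, _ = regression_coefficients([x * 10 for x in xs], ys)     # change of scale
print(b_yx, b_xy, round(r, 4), shifted, scaled)
```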
109. Non – Linear Regression Models
Second Degree Model
Other Regression Models
Seasonal Model
Seasonal Model with Trend
Coefficient of Determination
110. Correlation Analysis vs Regression Analysis
Degree and Nature of Relationship
Cause and Effect Relationship
Like in correlation, regression analysis can also be studied as ‘simple and
multiple’, ‘total and partial’, ‘linear and nonlinear’, etc.
In correlation, there is no distinction between independent and dependent
variables.
111. Summary
In this chapter, the concept of regression between dependent and independent
variables has been discussed. Regression provides us with a measure of the relationship and
also makes it possible to predict one variable for a given value of the other variable.
Unlike correlation analysis, in regression analysis, one variable is independent
and other dependent. Please note that this relationship need not be a cause-effect
relationship.
Regression analysis is a branch of statistical theory which is widely used in all
the scientific disciplines. It is a basic technique for measuring or estimating the relationship
among economic variables that constitute the essence of economic theory and
economic life. The uses of regression analysis are not confined to economic and
business activities. Its applications are extended to almost all the natural, physical
and social sciences.
112. Simple linear regression model is used if we have a bivariate distribution, i.e. only
two variables are considered and the ‘best fit’ curve is approximated to a straight line.
This describes the linear relationship between two variables. Although it appears to be
too simplistic, in many business situations it is adequate. At least, the initial study can be
based on this model for any decision-making situation.
We have studied simple linear, non-linear and multiple regression models. For
multiple regression and non-linear regression models, MS Excel or any other computer
package would help in reducing voluminous calculations. We also discussed coefficient
of determination as a measure of the strength of relationship.
113. Least square principle can also be applied to the fitting of a second degree
polynomial which may be useful in business situation if we have some idea that
the relationship between two variables is parabolic. In any case second degree
polynomial fit is more likely to be better approximation of the actual relationship. We
may use second order model (parabolic trend) if we feel that the variation is parabolic.
The least square approximation can be calculated easily for low degree polynomials,
like linear, parabolic, cubic, etc. But for higher degrees (more than three), the system of
normal equations becomes ill conditioned. This causes large errors in values of
coefficients. Then the approximation becomes incorrect. To avoid these problems,
‘orthogonal polynomials’ are used for approximation.
114. Mean Square Error (MSE) is an estimate of the variance of the regression
error. MSE depends on the values of data and its scales. Hence we need a
measure that calculates relative degree of variation so that it can be compared for
the fits obtained from different models and for different data sets. Coefficient of
determination is such a measure.
Coefficient of determination is a measure of the strength of the regression fit. It is
an estimator of the population parameter of correlation and can be obtained directly from a
decomposition of the variation in Y into two components, viz. due to error and due to
regression. Error is the deviation of a data point from the value predicted for it by the
regression line.
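One common way to express this decomposition is R² = 1 – SSE / SST, where SSE is the variation due to error and SST the total variation of Y about its mean. The sketch below assumes a simple linear fit and hypothetical data.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def r_squared(xs, ys):
    """Coefficient of determination: 1 - (variation due to error) / (total variation)."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


xs = [1, 2, 3, 4, 5]   # hypothetical
ys = [2, 4, 5, 4, 5]   # hypothetical
print(round(r_squared(xs, ys), 4))
```

Unlike MSE, the result is a pure ratio, so it does not change when Y is measured in different units.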
115. Theory of Probability
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 116 – 116
2. Topic 1 Introduction 117 – 117
3. Topic 2 Important Terms in Probability 118 – 119
4. Topic 3 Kinds of Probability 120 – 120
5. Topic 4 Simple Propositions of Probability 121 – 125
6. Topic 5 Addition Theorem of Probability 126 – 127
7. Topic 6 Multiplication Theorem of Probability 128 – 128
8. Topic 7 Conditional Probability 129 – 129
9. Topic 8 Law of Total Probability 130 – 131
10. Topic 9 Independence of Events 132 – 132
11. Topic 10 Combinatorial Concept 133 – 133
12. Topic 11 Summary 134 – 134
116. Learning Objectives
After studying this chapter, you should be able to:
Understand the meaning and important terms of probability
Learn about addition theorem and multiplicative theorem of probability
Understand the concept of independence of events, combinatorial concepts like
permutation and combination
Solve problems of conditional probability and Bayes’ Theorem and other
concepts of probability
117. Introduction
A probability is a quantitative measure of risk.
This chapter provides exposure to fundamental concepts, since probability is
inseparable from statistical methods.
118. Important Terms in Probability
Probability and sampling are inseparable parts of statistics.
Random Experiment
Random experiment is an experiment whose outcome is not
predictable in advance.
119. Important Terms in Probability
Sample Space
Event
Event Space
Union of events
Intersection of events
Mutually exclusive events
Collectively exhaustive events
Complement of event
120. Kinds of Probability
Classical Probability
Axiomatic Probability
Subjective Probability
Relative Frequency Probability
121. Simple Propositions of Probability
Proposition 1
P (Eᶜ) = 1 – P (E)
Probability of complement: Let event Eᶜ denote the complement of the event E. By the
definition of complement, Eᶜ has all elements from the sample space S that are not in E. Thus,
E and Eᶜ are mutually exclusive and collectively exhaustive. Therefore, by axioms 2 and 3 we
have,
1 = P (S) = P (E ∪ Eᶜ) = P (E) + P (Eᶜ)
or, P (Eᶜ) = 1 – P (E)
122. Proposition 2
If E ⊂ F, then P (E) ≤ P (F)
If the event E is contained in event F, that is, E ⊂ F, then we can express,
F = E ∪ (Eᶜ ∩ F).
However, as events E and (Eᶜ ∩ F) are mutually exclusive, we get,
P (F) = P (E) + P (Eᶜ ∩ F)
But, by axiom 1, P (Eᶜ ∩ F) ≥ 0. Therefore, we have proved the proposition,
P (E) ≤ P (F)
123. Proposition 3
P (E ∪ F) = P (E) + P (F) – P (E ∩ F)
Probability of unions: Event E ∪ F can be written as the union of the two
disjoint events, namely E and (Eᶜ ∩ F). Thus, from axiom 3,
P (E ∪ F) = P [E ∪ (Eᶜ ∩ F)] = P (E) + P (Eᶜ ∩ F) (1)
Also, F = (E ∩ F) ∪ (Eᶜ ∩ F), hence,
P (F) = P (E ∩ F) + P (Eᶜ ∩ F) (2)
From (1) and (2) we get proposition 3 as,
P (E ∪ F) = P (E) + P (F) – P (E ∩ F)
The extended statement of this proposition for n events is called the inclusion-exclusion
principle. For three events,
P (E ∪ F ∪ G) = P (E) + P (F) + P (G) – P (E ∩ F) – P (F ∩ G) – P (E ∩ G) + P (E ∩ F ∩ G)
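The propositions can be checked on a small sample space with equally likely outcomes. The sketch below assumes one roll of a fair die and three hypothetical events; exact fractions avoid rounding error.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # assumed: one roll of a fair die


def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


E = {2, 4, 6}   # hypothetical event: even number
F = {4, 5, 6}   # hypothetical event: greater than 3
G = {1, 2}      # hypothetical event: less than 3

union_two = prob(E) + prob(F) - prob(E & F)
union_three = (prob(E) + prob(F) + prob(G)
               - prob(E & F) - prob(F & G) - prob(E & G)
               + prob(E & F & G))
print(union_two, union_three)
```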
124. Proposition 4
Mutually exclusive events: When the sets corresponding to two events are
disjoint (have no common elements, or the intersection is null), the two events are
called mutually exclusive.
E ∩ F = ∅. Therefore,
P (E ∩ F) = P (∅) = 0
Also, for mutually exclusive events E and F,
P (E ∪ F) = P (E) + P (F)
125. Proposition 5
P (Eᶜ ∩ F) = P (F) – P (E ∩ F)
From set theory, F can be written as a union of two disjoint events E ∩ F and Eᶜ ∩ F.
Hence, by axiom 3, we have P (F) = P (E ∩ F) + P (Eᶜ ∩ F). By rearranging the
terms we get the result.
126. Addition Theorem of Probability
The addition theorem in probability is the process of determining the probability
that either event ‘A’ or event ‘B’ occurs, or both occur. The addition of two events
‘A’ and ‘B’ is denoted by ‘∪’ and read as ‘union’.
Let A and B be two events defined in a sample space. The union of events
A and B is the collection of all outcomes that belong either to A or to B or
to both A and B, and is denoted by A ∪ B (A or B).
127. The result of the addition theorem is generally written using set notation as
P (A ∪ B) = P (A) + P (B) – P (A ∩ B),
Where, P (A) = probability of occurrence of event ‘A’
P (B) = probability of occurrence of event ‘B’
P (A ∪ B) = probability of occurrence of event ‘A’ or event ‘B’
P (A ∩ B) = probability of occurrence of both event ‘A’ and event ‘B’.
The addition theorem can be proved as follows: Let ‘A’ and ‘B’ be subsets of a finite
non-empty set ‘S’. Then, counting elements,
n (A ∪ B) = n (A) + n (B) – n (A ∩ B),
On dividing both sides by n (S), we get
n (A ∪ B) / n (S) = n (A) / n (S) + n (B) / n (S) – n (A ∩ B) / n (S) (1),
which is P (A ∪ B) = P (A) + P (B) – P (A ∩ B) when all outcomes are equally likely.
128. Multiplication Theorem of Probability
Probability is the branch of mathematics which deals with the chance of occurrence of
events. The basic form of the multiplication theorem of probability for two events
‘x’ and ‘y’ can be stated as,
P (x ∩ y) = P (y) × P (x / y)
Here P (x) and P (y) are the probabilities of occurrence of events ‘x’ and ‘y’
respectively.
P (x / y) is the conditional probability of ‘x’, the condition being that ‘y’ has
occurred.
P (x / y) is always calculated after ‘y’ has occurred. Here, the occurrence of ‘x’
depends on ‘y’: since ‘y’ has already changed the set of possible outcomes, the probability
of ‘x’ also changes.
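A small check of the multiplication theorem in the form P (x ∩ y) = P (y) × P (x / y), where the conditional probability is computed as P (x ∩ y) / P (y). The die and the two events are assumptions chosen for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # assumed: one roll of a fair die


def prob(event):
    """Classical probability over equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def conditional(x, y):
    """P(x / y): probability of x given that y has occurred."""
    return prob(x & y) / prob(y)


x = {2, 4, 6}   # hypothetical event: even number
y = {4, 5, 6}   # hypothetical event: greater than 3
print(prob(x), conditional(x, y), prob(y) * conditional(x, y))
```

Knowing that y has occurred raises the probability of x from 1/2 to 2/3 in this example.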
129. Conditional Probability
Conditional probability is the probability that an event will occur given that another
event has already occurred. If A and B are two events, then the conditional probability of A
given B is written as P(A/B) and read as “the probability of A given that B has already
occurred.”
130. Law of Total Probability
Consider two events, E and F. Whatever the events may be, we can always say that the
probability of E is equal to the probability of the intersection of E and F, plus the
probability of the intersection of E and the complement of F. That is,
P (E) = P (E ∩ F) + P (E ∩ Fᶜ)
131. Bayes’s Formula
Let E and F be events.
E = (E ∩ F) ∪ (E ∩ Fᶜ)
Any element in E must be either in both E and F, or in E but not in F. (E ∩ F) and
(E ∩ Fᶜ) are mutually exclusive, since the former must be in F and the latter must not be
in F. We have, by axiom 3,
P (E) = P (E ∩ F) + P (E ∩ Fᶜ) = P (E/F) × P (F) + P (E/Fᶜ) × P (Fᶜ) = P (E/F) × P (F) + P (E/Fᶜ) × [1 – P (F)]
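The total-probability expression for P (E) is the denominator of Bayes' formula in its standard form, P (F/E) = P (E/F) × P (F) / P (E). A sketch with hypothetical figures: F is "item made on machine A" and E is "item is defective".

```python
from fractions import Fraction


def total_probability(p_f, p_e_given_f, p_e_given_not_f):
    """P(E) = P(E/F) * P(F) + P(E/Fc) * [1 - P(F)]."""
    return p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)


def bayes_posterior(p_f, p_e_given_f, p_e_given_not_f):
    """P(F/E) = P(E/F) * P(F) / P(E)."""
    return p_e_given_f * p_f / total_probability(p_f, p_e_given_f, p_e_given_not_f)


# Hypothetical inputs: 60% of items come from machine A; defect rates 2% and 5%.
p_f = Fraction(3, 5)
p_e_given_f = Fraction(1, 50)
p_e_given_not_f = Fraction(1, 20)

p_e = total_probability(p_f, p_e_given_f, p_e_given_not_f)
p_f_given_e = bayes_posterior(p_f, p_e_given_f, p_e_given_not_f)
print(p_e, p_f_given_e)
```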
134. Summary
In this chapter, we discussed basic idea of probability. We defined probability in
different ways and pointed out serious limitations of each definition.
Then we discussed axioms of probability, which are the backbone of theory of
probability. Then we studied number of useful propositions of probability.
We also defined conditional probability, law of total probability, and Bayes’
Theorem. We also defined mutually exclusive events, and independence of
events.
Lastly, we discussed a few important concepts of combinatorial analysis, which
come in very handy while calculating the probability of an event.
135. Probability Distribution
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 136 – 136
2. Topic 1 Introduction 137 – 137
3. Topic 2 Random Variable 138 – 139
4. Topic 3 Probability Distributions of Standard Random Variables 140 – 140
5. Topic 4 Bernoulli Distribution 141 – 142
6. Topic 5 Binomial Distribution 143 – 145
7. Topic 6 Poisson Distribution 146 – 147
8. Topic 7 Normal Distribution 148 – 149
9. Topic 8 Summary 150 – 153
136. Learning Objectives
After studying this chapter, you should be able to:
Differentiate between discrete and continuous random variables
Discuss probability distributions of standard random variable
Understand discrete probability distribution which include Binomial and Poisson
Distribution
Explain continuous probability distribution which includes Normal distribution
137. Introduction
We will study a few common distributions in this chapter.
Normal distribution has extensive use in statistical tools and therefore readers are
advised to study it in detail.
Knowledge of sequences, series and calculus is expected.
138. Random Variable
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon.
139. Random Variable
Discrete and Continuous Random Variables
Probability Mass Function (P.M.F.)
Probability Density Function
Cumulative Distribution Function
Expectation Value of Random Variables
Expected Value of a Function of a Random Variable
Variance and Standard Deviation of Random Variable
140. Probability Distributions of Standard Random Variables
Bernoulli Distribution
Binomial Distribution
Poisson Distribution
Normal Distribution
141. Bernoulli Distribution
It is the basis of many discrete random variables, as it deals with an individual trial. It is
a building block for other random variables. It is a single-trial distribution.
142. Application of Bernoulli Distribution
A Bernoulli trial is fundamental to many discrete distributions like Binomial,
Poisson, Geometric, etc. Situations where the Bernoulli distribution is commonly
used are:
Sex of a newborn child; Male = 0, Female = 1, say.
Items produced by a machine are defective or non-defective.
During the next flight an engine will fail or remain serviceable.
A student appearing for an examination will pass or fail.
143. Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials of a binomial
experiment. The probability distribution of a binomial random variable is called a binomial
distribution. (The Bernoulli distribution is the special case with a single trial, n = 1.)
144. Applications of Binomial Distribution
Trials are finite (and not very large), performed repeatedly for ‘n’ times.
Each trial (random experiment) should be a Bernoulli trial, the one that results in either
success or failure.
Probability of success in any trial is ‘p’ and is constant for each trial.
All trials are independent.
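Under these conditions the probability of exactly k successes is given by the standard binomial formula C(n, k) p^k (1 – p)^(n – k), which is not quoted in the slides. A sketch with hypothetical values of n and p:

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for n independent Bernoulli trials, each with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.1   # hypothetical: 10 items, each defective with probability 0.1
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probabilities))
print(round(probabilities[0], 4), round(mean, 4))
```

The probabilities over k = 0, 1, ..., n add up to 1, and the mean works out to n × p.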
145. Following are some of the real life examples of applications of binomial distribution.
Number of defective items in a lot of n items produced by a machine.
Number of male births out of n births in a hospital.
Number of correct answers in a multiple-choice test.
Number of seeds germinated in a row of n planted seeds.
Number of re-captured fish in a sample of n fishes.
Number of missiles hitting the targets out of n fired.
146. Poisson Distribution
A random variable X, taking one of the values 0, 1, 2, …, is said to
be a Poisson random variable with parameter λ if, for some λ > 0,
P (X = i) = e^(–λ) λ^i / i!, for i = 0, 1, 2, …
P (X = i) is the probability mass function (p.m.f.) of the Poisson random
variable. Its expected value and variance are,
μ = E [X] = λ
Var [X] = λ
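A numerical sketch of the Poisson p.m.f. in its standard form, P (X = i) = e^(–λ) λ^i / i!, checking that the mean and the variance both equal λ. The value of λ and the truncation of the infinite sum at 60 terms are assumptions.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam > 0."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


lam = 2.0   # hypothetical rate, e.g. average misprints per page
terms = [poisson_pmf(i, lam) for i in range(60)]   # the tail beyond 60 is negligible here
mean = sum(i * p for i, p in enumerate(terms))
variance = sum((i - mean) ** 2 * p for i, p in enumerate(terms))
print(round(sum(terms), 6), round(mean, 6), round(variance, 6))
```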
147. Poisson Distribution
Some of the common examples where Poisson random variable can be used to
define the probability distribution are:
Number of accidents per day on expressway.
Number of earthquakes occurring over fixed time span.
Number of misprints on a page.
Number of arrivals of calls on telephone exchange per minute.
Number of interrupts per second on a server.
148. Normal Distribution
Equation For Normal Probability Curve
Standard Normal Distribution
Properties Of Normal Distribution
Areas Under Standard Normal Probability Curve
Importance Of Normal Distribution
150. Summary
Random variable is a real valued function defined over a sample space with
probability associated with it. The value of the random variable is outcome of an
experiment. Random variables are neither ‘random’ nor ‘variable’.
In this chapter we discussed several important random variables, the associated
formulae, and problem solving using formulae. A discrete random variable is one that takes
at most countably many values. A continuous random variable can take any real value
within an interval.
151. We also discussed probability distributions of random variables. Binomial
distribution is used if an experiment is carried out for finite number of n independent
trials; all trials being Bernoulli trials with constant probability of success p.
Random variable will follow Poisson distribution if it is the number of occurrences of a
rare event during a finite period. Waiting time for a rare event is exponentially distributed.
Negative binomial distribution is used for the number of Bernoulli trials needed to achieve
a desired number of successes.
152. One of the continuous random variables often required is the uniform random
variable. Waiting time for an event that occurs periodically follows the uniform
distribution.
Normal probability distribution is the most important distribution in statistics. We
defined normal distribution with parameters (μ, σ) where μ is mean and σ is standard
deviation.
Further, we defined standard normal distribution, which is a special case of
normal distribution with parameters (0, 1).
153. We also discussed the transformation of a normal random variable X to the standard
normal random variable Z using Z = (X – μ) / σ. The Z distribution is very convenient for
manual calculation as we can use standard normal tables, which are extensively tabulated,
to find probabilities and intervals.
Normal distribution is used as a model in many real world situations, both as a
continuous distribution and as an approximation to discrete distributions like binomial or
Poisson.
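The transformation Z = (X – μ) / σ can be illustrated with Python's `statistics.NormalDist`, which plays the role of the printed standard normal table. The mean, standard deviation and x value are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardise x: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


mu, sigma = 50.0, 10.0   # hypothetical mean and standard deviation
x = 65.0
z = z_score(x, mu, sigma)

p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) from the original distribution
p_standard = NormalDist().cdf(z)          # P(Z <= z) from the standard normal (0, 1)
central_area = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)
print(z, round(p_standard, 4), round(central_area, 4))
```

Both routes give the same probability, which is why a single table for Z is enough for any normal distribution.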
154. Use of Excel Software for Statistical Analysis
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 155 – 155
2. Topic 1 Introduction 156 – 157
3. Topic 2 Introduction to Excel 158 – 168
4. Topic 3 Entering Data in Excel 169 – 169
5. Topic 4 Descriptive Statistics 170 – 172
6. Topic 5 Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min) 173 – 177
7. Topic 6 Statistical Analysis 178 – 182
8. Topic 7 Normal Distribution 183 – 183
9. Topic 8 Brief about SPSS 184 – 189
10. Topic 9 Summary 190 – 194
155. Learning Objectives
After studying this chapter, you should be able to:
Understand the basic concepts of using Microsoft Excel
Discuss how to enter data in Excel and basic built-in functions
Gain knowledge about SPSS
156. Introduction
The most popular software in the MS Office Suite includes the following:
Microsoft Word
Microsoft Excel
Microsoft PowerPoint
Microsoft Access
Microsoft Project Plan
Microsoft Outlook
157. MICROSOFT OFFICE SUITE
Suite Product      Home and Student   Home and Business   Professional
Word 2010          Included           Included            Included
Excel 2010         Included           Included            Included
PowerPoint 2010    Included           Included            Included
OneNote 2010       Included           Included            Included
Outlook 2010       -                  Included            Included
Access 2010        -                  -                   Included
Publisher 2010     -                  -                   Included
158. Introduction to Excel
Opening A Document
Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the
directory area or drive to look for files in other locations.
To create a new workbook, click on File-New-Blank Document.
159. Saving And Closing A Document
To save your document with its current filename, location and file format,
click on File - Save.
When you have finished working on a document you should close it. Go to
the File menu and click on Close.
168. Moving between Cells
While working with any Office productivity tool, the clipboard functions are
invaluable.
The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’.
In the Microsoft Office suite, there are keyboard shortcuts for these functions.
KEYBOARD SHORTCUTS
Cut Ctrl + X
Copy Ctrl + C
Paste Ctrl + V
169. Entering Data in Excel
A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and
the columns are labeled with letters. Each intersection of a row and a column is a cell.
Entering Labels
Entering Values
Rounding Numbers that Meet Specified Criteria
Sorting by Columns
170. Descriptive Statistics
Excel includes elaborate and customisable toolbars, for example the “Standard”
toolbar.
Some of the icons are useful for mathematical computation:
The “AutoSum” icon enters the formula “=SUM()” to add up a range of cells.
The “Function Wizard” icon gives you access to all the functions
available.
171. The “Graph Wizard” icon gives access to all the graph types available.
172. Excel can be used to generate measures of location and variability for a variable. Suppose we
wish to find descriptive statistics for the sample data 2, 4, 6 and 8.
Step 1: Select the Tools pull-down menu. If you see Data Analysis, go to Step 2;
otherwise, click on the Add-Ins option to install the Analysis ToolPak.
Step 2: Click on the Data Analysis option.
Step 3: Choose Descriptive Statistics from the Analysis Tools list.
Step 4: When the dialog box appears, enter A1:A4 in the Input Range box. A1 is the cell in
column A and row 1; in this case it holds the value 2. The remaining values are entered in
the same way, one per cell, down to the last one in A4.
Step 5: Select an output range, in this case B1. Tick Summary Statistics to see the
results, then select OK.
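The summary measures for the sample 2, 4, 6 and 8 can also be cross-checked outside Excel. The following is a minimal sketch in Python, assuming only the standard-library statistics module; the function name describe is illustrative and not an Excel or ToolPak name, and it reports only a subset of the measures a summary report would normally contain.

```python
import statistics


def describe(sample):
    """Return a few common summary measures for a list of numbers."""
    return {
        "count": len(sample),
        "sum": sum(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        # variance() and stdev() use the sample (n - 1) formulas.
        "sample_variance": statistics.variance(sample),
        "sample_std_dev": statistics.stdev(sample),
        "minimum": min(sample),
        "maximum": max(sample),
        "range": max(sample) - min(sample),
    }


summary = describe([2, 4, 6, 8])
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean is 5, the range is 6 and the sample variance is 20/3, which can be compared with the figures Excel reports.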
173. Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)
Manual Equation Entry
175. SUM Function
The SUM function is probably the most commonly used function in Excel. It comes in three
flavours in Excel, namely:
SUM()
SUMIF()
SUMIFS()
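To make the difference between the three flavours concrete, here is a rough Python sketch of the same idea: an unconditional total, a total under one condition, and a total under several conditions. The records and the helper names sum_all, sum_if and sum_ifs are hypothetical and chosen only for illustration; they are not Excel's own syntax.

```python
# Hypothetical sales records, used only for illustration.
rows = [
    {"region": "North", "product": "Pens", "amount": 120},
    {"region": "South", "product": "Pens", "amount": 80},
    {"region": "North", "product": "Ink", "amount": 50},
    {"region": "South", "product": "Ink", "amount": 30},
]


def sum_all(records):
    """Like SUM(): add every amount."""
    return sum(r["amount"] for r in records)


def sum_if(records, field, wanted):
    """Like SUMIF(): add amounts where one field matches."""
    return sum(r["amount"] for r in records if r[field] == wanted)


def sum_ifs(records, **criteria):
    """Like SUMIFS(): add amounts where every given field matches."""
    return sum(
        r["amount"]
        for r in records
        if all(r[field] == wanted for field, wanted in criteria.items())
    )


total = sum_all(rows)
north_total = sum_if(rows, "region", "North")
north_pens = sum_ifs(rows, region="North", product="Pens")
print(total, north_total, north_pens)
```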
177. Statistical Functions
Statistical functions are invaluable in mathematical calculations.
They can provide insights into trends, supply data for detailed analysis, and
help identify gaps that need to be plugged.
Excel provides a wide range of functions that can be used to perform basic
statistical analyses.
178. Statistical Analysis
Creating Charts
Select the data range (numbers only) for which the chart needs to be created.
On the Insert ribbon, in the Charts section, click on the type and category of chart you
want to create. Here the clustered chart has been used.
Select the chart and click on the Select Data button in the Data section of the Design
tab.
In the Select Data Source dialog, select ‘Series 1’ and click on the Edit button.
180. This opens the Edit Series dialog, which allows you to change the range of values in the series and
provide a series name. For the series name, click on the icon to select the column title of Series 1.
Figure: Edit Series dialog
181. Histogram
Now follow the steps given below to draw a histogram.
Select the first two columns, i.e. class interval and frequency, in the Excel sheet.
Click on the ‘Chart Wizard’ icon on the toolbar, or select [Insert → Chart…] from the
Insert drop-down menu. A dialogue box titled ‘Chart Wizard – Step 1 of 4 – Chart
Type’ will appear.
On the ‘Standard Types’ tab, select ‘Column’. Click on the ‘Next’ button.
The next dialogue box, titled ‘Chart Wizard – Step 2 of 4 – Chart Source Data’, will
appear. Since we have already selected the source data, check that ‘Columns’ is selected
under data series and then select ‘Next’.
The next dialogue box, titled ‘Chart Wizard – Step 3 of 4 – Chart Options’, will
appear.
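The class interval and frequency columns used above have to be prepared before the chart is drawn. As a hedged illustration of that preparation, the Python sketch below counts how many observations fall into each class interval and prints a simple text histogram. The marks data and the function name frequency_table are hypothetical, and equal-width intervals that include the lower limit and exclude the upper limit are an assumption.

```python
def frequency_table(values, lower, width, classes):
    """Count values in equal-width class intervals [lower, lower + width), ..."""
    counts = [0] * classes
    for value in values:
        index = int((value - lower) // width)
        if 0 <= index < classes:  # values outside all intervals are ignored
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical marks of ten students, used only for illustration.
marks = [12, 25, 37, 41, 18, 33, 29, 45, 22, 38]
table = frequency_table(marks, lower=10, width=10, classes=4)

for start, end, count in table:
    print(f"{start}-{end}: {'#' * count} ({count})")
```

The two columns of the resulting table (interval and count) correspond to the class interval and frequency columns selected in the first step.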
182. Correlation Plot and Regression Analysis
Using MS Excel for calculating Karl Pearson’s correlation coefficient
Calculating Karl Pearson’s correlation coefficient using MS Excel is very simple. The steps are as follows:
Open an Excel worksheet and enter the data values of the X and Y variables as two
arrays (columns or rows). Keep these contiguous if possible.
Select the cell where you want to store the result r. Enter the formula with the syntax
‘=CORREL(array1, array2)’
where ‘array1’ is a cell range of values and ‘array2’ is a second cell range of values.
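For readers who want to see what the correlation formula computes, the following Python sketch evaluates Karl Pearson’s coefficient directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name pearson_r and the sample data are illustrative assumptions and are not taken from the slides.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("both arrays need the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when an array is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical X and Y values, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

With these values the sums are Σ(x − x̄)(y − ȳ) = 6, Σ(x − x̄)² = 10 and Σ(y − ȳ)² = 6, so r = 6/√60, roughly 0.77.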
183. Normal Distribution
NORMDIST returns the normal distribution for the specified mean and standard deviation. This
function has a very wide range of applications in statistics, including hypothesis testing.
Syntax: NORMDIST(x,mean,standard_dev,cumulative)
X is the value for which you want the distribution.
Mean is the arithmetic mean of the distribution.
Standard_dev is the standard deviation of the distribution.
Cumulative is a logical value: TRUE returns the cumulative distribution function, and FALSE returns the probability density function.
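The four arguments can be mirrored in a short Python sketch, assuming the standard-library statistics.NormalDist class. The wrapper name normdist is a hypothetical helper written to resemble the Excel syntax; it treats the fourth argument as a switch between the cumulative probability and the density.

```python
from statistics import NormalDist


def normdist(x, mean, standard_dev, cumulative):
    """Normal distribution value at x: CDF if cumulative is true, else PDF."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


# Standard normal examples (mean 0, standard deviation 1).
p_below_mean = normdist(0, 0, 1, True)      # probability of a value <= 0
density_at_mean = normdist(0, 0, 1, False)  # height of the curve at 0
p_below_196 = normdist(1.96, 0, 1, True)    # probability of a value <= 1.96
print(p_below_mean, density_at_mean, p_below_196)
```

The cumulative value at 1.96 is close to 0.975, the familiar cut-off used in two-tailed hypothesis tests at the 5% level.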
184. Brief about SPSS
SPSS Statistics is a software package used for statistical analysis.
SPSS Files
SPSS uses several types of files. First, there is the data file, which holds the contents of
the Data View and the Variable View as entered through the SPSS Data Editor window. It is
known as an SPSS system file.
190. Summary
Microsoft Office is one of the most powerful office productivity tools on the market
today. The entire suite is vast and covers a wide range of software solutions catering to various
aspects of modern businesses.
Microsoft Excel is a powerful accounting and calculation solution. It has a
standard tabular layout and supports a wide range of arithmetic, accounting and
statistical functions.
Microsoft Outlook is a mail client that can be set up to download mail from a
mail server as well as to send and receive emails as desired. Being a part of the Microsoft
Office suite, this tool is compatible with the other applications in the suite.
191. One of the most popular and widely used versions of the Microsoft Office suite is MS Office
2003. Later, Microsoft released two other versions of Office, namely Office 2007 and
Office 2010. Although Office 2010 is the latest version, many businesses still continue to
use Office 2003. From Office 2003 to Office 2007, Microsoft radically changed the overall look
and feel of the Office suite.
Excel is built on the concepts of cells, rows, columns, spreadsheets and workbooks. The
entire structure is hierarchical, which makes it scalable and versatile enough to adapt to
the varying needs of users from different specialisations. Understanding these
concepts is useful in developing complex reports and models.
192. As long as you work on soft copies, page layouts are not really important, because you
can scroll a spreadsheet to view the contents. However, when it comes to printouts, it is
important to get the page layout sorted out. Excel 2010 has all the page layout
options under the Page Layout tab.
While working with any Office productivity tool, the clipboard functions are
invaluable. The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’. In the
Microsoft Office suite, there are keyboard shortcuts for these functions. Once you become
conversant with the Excel functions, you will probably prefer the keyboard shortcuts, as
they are faster and easier to use than the mouse.
193. A new worksheet is a grid of rows and columns. The rows are labelled with
numbers, and the columns are labelled with letters. Each intersection of a row and a
column is a cell. Each cell has an address, which consists of the column letter and the row
number. When a worksheet is first opened, cell A1 is highlighted,
indicating that it is the active cell. A cell must be active to enter
information into it.
Excel is a very powerful accounting tool, but before moving on to the more complex
functions, let us see how to use Excel for simple calculations. There are two ways
of using Excel for simple calculations: you can enter the actual arithmetic equations in the
cell or use pre-defined Excel formulas to do the same.
194. Statistical calculations for exponential random variables can be carried out using the
statistical functions available in MS Excel. NORMDIST returns the normal
distribution for the specified mean and standard deviation. This function has a very wide
range of applications in statistics, including hypothesis testing. Syntax:
NORMDIST(x,mean,standard_dev,cumulative)
SPSS Statistics is a software package used for statistical analysis. Long produced by
SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named
IBM SPSS Statistics. Companion products in the same family are used for survey
authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text
analytics, and collaboration and deployment (batch and automated scoring services).