Business Statistics
Course Index
S. No.  Reference No.  Particulars  Slide From – To
1. Chapter 1 Introduction to Business Statistics 08 – 21
2. Chapter 2 Descriptive Statistics: Collection, Processing and Presentation of Data 22 – 36
3. Chapter 3 Measures of Central Tendency 37 – 51
4. Chapter 4 Measures of Dispersion 52 – 66
5. Chapter 5 Skewness and Kurtosis 67 – 79
6. Chapter 6 Correlation Analysis 80 – 98
7. Chapter 7 Regression Analysis 99 – 114
8. Chapter 8 Theory of Probability 115 – 134
9. Chapter 9 Probability Distribution 135 – 153
10. Chapter 10 Use of Excel Software for Statistical Analysis 154 – 194
1– 2
 Managerial decision-making can be made efficient and effective by
analyzing available data using appropriate statistical tools. Statistical tools
not only have application in research (marketing research included) but also in other
functional areas like quality management, inventory management, financial
analysis, human resource planning and so on.
Course Introduction
1– 3
Cont….
 The word statistics is derived from the Italian word ‘Stato’ which
means ‘state’; and ‘Statista’ refers to a person involved with the affairs of
state. Thus, statistics originally meant the collection of facts useful for
affairs of the state, such as taxes, land records, population demography, etc.
1– 4
Cont….
 Significant contribution has also been made by Indians in the field of
statistics. Prof. Prasanta Chandra Mahalanobis pioneered the
study of statistical science in India. He founded the Indian Statistical Institute (ISI)
in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of
all human efforts and also concentrated on sample surveys.
 “Statistics are the classified facts representing the conditions of the
people in the state…. specially those facts which can be stated in
number or in table of numbers or in any tabular or classified arrangement”.
– Webster
1– 5
Cont….
 Statistical methods are broadly divided into five categories. These are
Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential
Statistics and Applied Statistics.
 Statistics is an indispensable tool of production control and market
research. Statistical tools are extensively used in business for time and
motion study, consumer behaviour study, investment decisions, performance
measurements and compensations, credit ratings, inventory management,
accounting, quality control, distribution channel design, etc.
1– 6
Cont….
 Statistical analysis is a vital component in every aspect of research.
Social surveys, laboratory experiments, clinical trials, marketing research,
human resource planning, inventory management, quality management, etc., require
statistical treatment before valid conclusions can be drawn.
 Functions of statistics are Condensation, Comparison, Forecast, Testing of
hypotheses, Preciseness, Expectation.
 Statistical techniques, because of their flexibility and economy, have
become popular and are used in numerous fields. But statistics is not a
cure-all technique and has limitations. It cannot be applied to all kinds of
situations and cannot be made to answer all queries.
1– 7
Introduction to Business Statistics
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 09 – 09
2. Topic 1 Introduction 10 – 10
3. Topic 2 Development of Statistics 11 – 11
4. Topic 3 Definitions of Statistics 12 – 12
5. Topic 4 Importance of Statistics 13 – 13
6. Topic 5 Classification of Statistics 14 – 14
7. Topic 6 Role of Statistics 15 – 15
8. Topic 7 Functions of Statistics 16 – 16
9. Topic 8 Limitations of Statistics 17 – 17
10. Topic 9 Summary 18 – 21
1– 8
Learning Objectives
After studying this chapter, you should be able to:
 Understand the development, importance and role of statistics
 Explain the basic concept of statistical studies
 Understand the application of statistics in business and management
 Learn about functions and limitations of statistics
1– 9
Introduction
 Information derived from good statistical analysis is precise and useful for decision-making.
 One of the primary tasks of a manager is decision-making.
 Statistical techniques offer powerful tools in the decision-making process.
 These tools have power to interpret quantitative information in a scientific and
an objective manner.
1– 10
Development of Statistics
 The word statistics is derived from the Italian word ‘Stato’ which means ‘state’; and
‘Statista’ refers to a person involved with the affairs of state.
 Statistics originally meant the collection of facts useful for the affairs of the
state, such as taxes, land records, population demography, etc.
 In ancient times, even before 300 BC, rulers and kings such as Chandragupta
Maurya used statistics to maintain land and revenue records, collect taxes
and register births and deaths.
1– 11
Definitions of Statistics
 “Statistics are the classified facts representing the conditions of the people in
the state…. specially those facts which can be stated in number or in table of
numbers or in any tabular or classified arrangement”.
– Webster
 “By statistics we mean quantitative data affected to a marked extent by multiplicity of
causes”.
–Yule and Kendall
 “Statistics may be defined as the science of collection, presentation, analysis and
interpretation of data”.
– Croxton and Cowden
1– 12
Importance of Statistics
 Identify what information or data is worth collecting,
 Decide when and how judgments may be made on the basis of partial information,
and
 Measure the extent of doubt and risk associated with the use of partial information
and stochastic processes.
1– 13
Classification of Statistics
1– 14
Role of Statistics
1– 15
Role of
Statistics in
Business
Role of
Statistics
in Decision
Making
Role of
Statistics in
Research
Functions of Statistics
1– 16
Condensation Comparison
Forecast
Testing of
Hypotheses
Preciseness Expectation
Laws of Statistics
Limitations of Statistics
1– 17
COMMON STATISTICAL ISSUES
DISTRUST OF STATISTICS
MISUSE OF STATISTICS
Summary
 Managerial decision-making can be made efficient and effective by analyzing
available data using appropriate statistical tools. Statistical tools not only have
application in research (marketing research included) but also in other functional areas
like quality management, inventory management, financial analysis, human resource
planning and so on.
 The word statistics is derived from the Italian word ‘Stato’ which means ‘state’; and
‘Statista’ refers to a person involved with the affairs of state. Thus, statistics originally
meant the collection of facts useful for affairs of the state, such as taxes, land records,
population demography, etc.
1– 18
Cont….
 Significant contribution has also been made by Indians in the field of statistics. Prof.
Prasanta Chandra Mahalanobis pioneered the study of statistical science in
India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics
as a tool for increasing the efficiency of all human efforts and also concentrated on sample
surveys.
 Statistics are the classified facts representing the conditions of the people in the
state…. specially those facts which can be stated in number or in table of numbers or in
any tabular or classified arrangement.
 Statistical methods are broadly divided into five categories. These are Descriptive
Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied
Statistics.
1– 19
Cont….
 Statistics is an indispensable tool of production control and market research.
Statistical tools are extensively used in business for time and motion study,
consumer behaviour study, investment decisions, performance measurements and
compensations, credit ratings, inventory management, accounting, quality control, distribution
channel design, etc.
 Statistical analysis is a vital component in every aspect of research. Social
surveys, laboratory experiments, clinical trials, marketing research, human
resource planning, inventory management, quality management, etc., require
statistical treatment before valid conclusions can be drawn.
 Functions of statistics are Condensation, Comparison, Forecast, Testing of
hypotheses, Preciseness and Expectation.
1– 20
Cont….
 Statistical techniques, because of their flexibility and economy, have become
popular and are used in numerous fields. But statistics is not a cure-all technique and
has limitations. It cannot be applied to all kinds of situations and cannot be made to
answer all queries.
 More dangerous than distrust is misuse of statistics to draw convenient conclusions to
satisfy selfish or ulterior motives. Arguments and analysis supported by facts, figures,
charts, graphs, index numbers, etc. are indeed very appealing and convincing. They
can be used to intimidate opposing views. Hence, statistics is open to manipulation.
1– 21
Descriptive Statistics: Collection, Processing and
Presentation of Data
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 23 – 23
2. Topic 1 Introduction 24 – 24
3. Topic 2 Descriptive and Inferential Statistics 25 – 26
4. Topic 3 Collection of Data 27 – 27
5. Topic 4 Editing and Coding of Data 28 – 28
6. Topic 5 Classification of Data 29 – 29
7. Topic 6 Tabulation of Data 30 – 30
8. Topic 7 Diagrammatic and Graphical Presentation of Data 31 – 32
9. Topic 8 Summary 33 – 36
1– 22
Learning Objectives
After studying this chapter, you should be able to:
 Describe descriptive and inferential statistics
 Explain collection, editing and classification of primary and secondary data
 Define tabulation and presentation of data
 Understand diagrammatic and graphical presentation
 Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and
Ogives
1– 23
Introduction
 Success of any statistical investigation depends on the availability of accurate and
reliable data.
 These depend on the appropriateness of the method chosen for data collection.
 Data collection is a very basic activity in decision-making.
 Data may be classified either as primary data or secondary data.
 Successful use of the collected data depends to a great extent upon the way it is
arranged, displayed and summarized.
1– 24
Descriptive and Inferential Statistics
1– 25
Descriptive Statistics
 Descriptive statistics is
the type of statistics that
probably comes to most
people's minds
when they hear the word
“statistics.”
Cont….
1– 26
Inferential Statistics
 Inferential statistics studies a
statistical sample, and
from this analysis we are able
to say something about the
population from which the
sample came.
Collection of Data
1– 27
Editing and Coding of Data
Editing Primary Data
 Completeness
 Consistency
 Accuracy
 Homogeneity
Editing Secondary Data
 Field Editing
 Central Editing
1– 28
Coding of Data
 Coding is the process of
assigning symbols, either
alphabetical or numerical or both, to
the answers so that the
responses can be recorded into
a limited number of classes or
categories.
Classification of Data
Classification refers to the
grouping of data into
homogeneous classes and
categories. It is the process of
arranging things in groups or
classes according to their
resemblances and affinities.
1– 29
Rules of
Classification
Frequency
Distribution
Bases of
Classification
Types of
Tabulation
Advantages
of
Tabulation
Multi – Way
Tabulation
Two – Way
Tabulation
One – Way
Tabulation
Tabulation of Data
 Tabulation is arranging
the data in a flat table
(two-dimensional array)
format by grouping the
observations.
 A table is a spreadsheet
with rows and columns,
with headings and stubs
indicating the class of the
data.
1– 30
Diagrammatic and Graphical Presentation of Data
1– 31
Cont….
Difference between Diagrams and Graphs
S. No.  Diagram                                                        Graph
1.      Can be drawn on ordinary paper.                                Can be drawn on graph paper.
2.      Easy to grasp.                                                 Needs some effort to grasp.
3.      Not capable of analytical treatment.                           Capable of analytical treatment.
4.      Can be used only for comparisons.                              Can be used to represent a mathematical relation.
5.      Data are represented by bars, rectangles, pictures, etc.       Data are represented by lines and curves.
1– 32
TYPES OF DIAGRAMS
BAR DIAGRAM
HISTOGRAM
PIE DIAGRAM
FREQUENCY POLYGON
OGIVES
Summary
 There are two major divisions of the field of statistics, namely descriptive and
inferential statistics. Both the segments of statistics are important, and accomplish
different objectives.
 Data can be obtained through primary source or secondary source according to
need, situation, convenience, time, resources and availability. The most important method
for primary data collection is through questionnaire. Data must be objective and fact-
based so that it helps a decision-maker to arrive at a better decision.
 Statistical data is a set of facts expressed in quantitative form. Data is collected through
various methods. Sometimes our data set consists of the entire population we are
interested in. In other situations, data may constitute a sample from some population.
1– 33
Cont….
 The type of research, its purpose and the conditions under which the data are obtained will
determine the method of collecting the data. If relatively few items of information are
required quickly and funds are limited, telephonic interviews are recommended. If
respondents are industrial clients, the Internet could also be used. If depth interviews and probing
techniques are to be used, it is necessary to employ investigators to collect data.
 The quality of information collected through the filling of a questionnaire
depends, to a large extent, upon the drafting of its questions. Hence, it is extremely
important that the questions be designed or drafted very carefully and in a tactful manner.
 Before any processing of the data, editing and coding of data are necessary to ensure
the correctness of data. In any research study, voluminous data can be handled
only after classification. Data can be presented through tables and charts.
1– 34
Cont….
 Classification refers to the grouping of data into homogeneous classes and
categories. It is the process of arranging things in groups or classes according to
their resemblances and affinities.
 A frequency distribution is the principal tabular summary of either discrete data or
continuous data. The frequency distribution may show actual, relative or cumulative
frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart)
or a frequency polygon. Two commonly used graphs of cumulative frequencies are
the less than ogive and the more than ogive.
 Once the raw data is collected, it needs to be summarized and presented to the
decision-maker in a form that is easy to comprehend. Tabulation not only
condenses the data, but also makes it easy to understand. Tabulation is the fastest
way to extract information from the mass of data and hence popular even among those
not exposed to the statistical method.
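The frequency distribution described above can be sketched in a few lines. This is a minimal illustration, assuming ungrouped discrete data; the observations and variable names are hypothetical. It tabulates actual, relative and cumulative frequencies, where the two cumulative columns are the values that a less than ogive and a more than ogive would plot.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical discrete observations, for illustration only
observations = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]

counts = Counter(observations)
values = sorted(counts)                      # distinct values in ascending order
actual = [counts[v] for v in values]         # actual frequencies
total = sum(actual)
relative = [f / total for f in actual]       # relative frequencies
less_than = list(accumulate(actual))         # items <= value (less than ogive)
# items >= value (more than ogive)
more_than = [total - c + f for c, f in zip(less_than, actual)]

for row in zip(values, actual, relative, less_than, more_than):
    print(row)
```

Each printed row is one line of the table: value, actual frequency, relative frequency, and the two cumulative frequencies.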
1– 35
Cont….
 Charts help in grasping the data and analyzing it qualitatively. They also help
managers to present the data effectively as a part of reports. Various types of chart are
bar diagram, multiple bar diagram, component bar diagram, deviation bar diagram, sliding
bar diagram, histogram and pie chart.
 A graphic presentation is another way of representing the statistical data in a simple
and intelligible form. There are two types of graphs which we have discussed, line
graphs and ogives.
1– 36
Measures of Central Tendency
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 38 – 38
2. Topic 1 Introduction 39 – 39
3. Topic 2 Characteristics of Central Tendency 40 – 41
4. Topic 3 Arithmetic Mean 42 – 42
5. Topic 4 Median 43 – 43
6. Topic 5 Mode 44 – 44
7. Topic 6 Empirical Relationship between Mean, Median and Mode 45 – 45
8. Topic 7 Limitations of Central Tendency 46 – 46
9. Topic 8 Summary 47 – 51
1– 37
Learning Objectives
After studying this chapter, you should be able to:
 Understand the concept and characteristics of central tendency
 Describe all the measures of central tendency: mean, median and mode.
 Explain merits and demerits of all measures of central tendency.
 Discuss partition values or positional measures like quartiles, deciles and
percentiles.
1– 38
Introduction
 The concept of central tendency plays a dominant role in the study of statistics.
 In many frequency distributions, the tabulated values show a distinct tendency to
cluster or to group around a typical central value.
 This behaviour of the data to concentrate the values around a central part of
distribution is called ‘Central Tendency’ of the data.
1– 39
Characteristics of Central Tendency
A good measure of central tendency should possess as far as possible the following
characteristics:
 Easy to understand.
 Simple to compute.
 Based on all observations.
 Uniquely defined.
 Possibility of further algebraic treatment.
 Not unduly affected by extreme values.
1– 40
Cont….
1– 41
Common Measures of Central Tendency
Mean
Median Mode
Arithmetic Mean
 The arithmetic mean of
a series is the quotient
obtained by dividing
the sum of the values
by the number of items.
In algebraic language,
if X1, X2, X3, ..., Xn
are the n values of a
variate X, then the arithmetic mean is
(X1 + X2 + X3 + ... + Xn) / n.
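As a minimal sketch of this definition, the snippet below divides the sum of the values by the number of items. The sample values and the function name are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of items."""
    return sum(values) / len(values)

# Hypothetical observations X1 ... Xn, for illustration only
marks = [40, 50, 55, 65, 90]
am = arithmetic_mean(marks)
print(am)  # 60.0
```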
1– 42
Properties of Arithmetic Mean
Calculation of Simple Arithmetic Mean
Merits and Demerits of Arithmetic Mean
Weighted Arithmetic Mean
1– 43
Median
Median is the value which divides the distribution of data, arranged in
ascending or descending order, into two equal parts. Thus, the ‘Median’ is the
value of the middle observation.
 Calculation of Median
 Merits and Demerits of Median
 Partition Values or Positional Measures
 Quartiles
 Deciles
 Percentiles
1– 44
Mode
 Mode is the value which
has the greatest frequency
density. Mode is denoted by Z.
 Calculation of Mode
 Merits and Demerits of Mode
 Graphic Location of Mode
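For ungrouped data, the mode is simply the most frequent value. The sketch below assumes ungrouped data (not the grouped, frequency-density case) and uses `multimode` from Python's standard `statistics` module, which returns every value tied for the highest frequency. The shoe sizes are hypothetical.

```python
from statistics import multimode

# Hypothetical shoe sizes sold in a day, for illustration only
sizes = [7, 8, 8, 9, 8, 10, 7, 8, 9]

# multimode returns a list, so a tie between values is not hidden
modes = multimode(sizes)
print(modes)  # [8]
```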
Empirical Relationship between Mean, Median and Mode
 A distribution in which the mean, the median and the mode coincide is
known as a symmetrical (bell-shaped) distribution. The normal distribution
is one such symmetric distribution, and it is very commonly used.
 If the distribution is skewed, the mean, the median and the mode are not equal. In a
moderately skewed distribution, the distance between the mean and the median is
approximately one third of the distance between the mean and the mode. This can be
expressed as:
Mean – Median = (Mean – Mode) / 3
Mode = 3 * Median – 2 * Mean
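The two expressions above are the same relation rearranged. A minimal sketch, using hypothetical summary figures, estimates the mode from the mean and the median and then checks the one-third relation numerically.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

# Hypothetical summary figures of a moderately skewed distribution
mean_value, median_value = 52.0, 50.0
mode_estimate = empirical_mode(mean_value, median_value)
print(mode_estimate)  # 46.0

# Check: Mean - Median equals one third of Mean - Mode
lhs = mean_value - median_value
rhs = (mean_value - mode_estimate) / 3
print(lhs, rhs)  # 2.0 2.0
```

The relation is approximate for real data; here it holds exactly only because the mode was derived from it.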
1– 45
Limitations of Central Tendency
The arithmetic mean (AM) is not a suitable measure of central tendency:
 In case of highly skewed data.
 In case of uneven or irregular spread of the data.
 In open-end distributions.
 When average growth or average speed is required.
 When there are extreme values in the data.
Except in these cases, AM is widely used in practice.
1– 46
Summary
 Measures of the central tendency give one of the very important characteristics
of the data. According to the situation, one of the various measures of central
tendency may be chosen as the most representative.
 Arithmetic mean is widely used and understood. What characterizes the three measures
of centrality, and what are the relative merits of each in the given situation, is the
question.
 Mean summarizes all the information in the data. Mean can be visualized as a
single point where all the mass (the weight) of the observations is concentrated. It is like a
centre of gravity in physics. Mean also has some desirable mathematical properties that
make it useful in the context of statistical inference.
1– 47
Cont….
 To simplify the manual calculation, we may sometimes use shift of origin and
change of scale. Shifting of origin is achieved by adding or subtracting a
constant to all observations. In case of discrete data we add or subtract (usually
subtract) a constant to the individual observations. Whereas for grouped data, we add or
subtract (usually subtract) the constant to the class mark values.
 There are cases where relative importance of the different items is not the
same. In such a case, we need to compute the weighted arithmetic mean. The
procedure is similar to the grouped data calculations studied earlier, when we
consider frequency as a weight associated with the class-mark.
 Median is the middle value when the data is arranged in order. The median is
resistant to the extreme observations. Median is like the geometric centre in physics. In
case we want to guard against the influence of a few outlying observations (called
outliers), we may use the median.
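The weighted arithmetic mean and the shift of origin described above can be sketched together. This is an illustration under assumed data: the class marks, frequencies and the constant `A` are hypothetical, and the frequencies play the role of weights.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical class marks and frequencies (frequencies act as weights)
class_marks = [10, 20, 30, 40]
frequencies = [1, 2, 4, 3]
wm = weighted_mean(class_marks, frequencies)
print(wm)  # 29.0

# Shift of origin: subtract a constant A, average, then add A back
A = 30
shifted = [x - A for x in class_marks]
wm_shifted = weighted_mean(shifted, frequencies) + A
print(wm_shifted)  # 29.0
```

Both routes give the same mean; the shifted values are only smaller numbers to work with by hand.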
1– 48
Cont….
 Quantiles are related positional measures of central tendency. These are useful and
frequently employed measures. Most familiar quantiles are Quartiles, Deciles, and
Percentiles.
 Quartiles are position values similar to the median. There are three quartiles,
denoted by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile. The second
quartile Q2 is nothing but the median. In a distribution, one fourth of the items are less than Q1
and the other three fourths are greater than Q1. Q3 is called the upper quartile or third
quartile; three fourths of the items are less than Q3.
 Inter-quartile range is defined as the difference between the third and first
quartiles (Q3 – Q1). It is a measure of spread of the data.
 D1, D2, D3… and D9 are the nine deciles. They divide a series into 10 equal
parts. One tenth of the items are less than or equal to D1, one tenth of the
items are more than or equal to D9, and one tenth of the items lie between any
successive pair of deciles when all the items are in ascending order.
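Quartiles, the inter-quartile range and deciles can be computed with `quantiles` from Python's standard `statistics` module. The sketch below uses hypothetical observations. Note that several conventions exist for locating a quantile; the module's default places the k-th quartile at position k(n + 1)/4 in the ordered data, which may differ slightly from the method used in a given textbook.

```python
from statistics import median, quantiles

# Hypothetical observations, for illustration only
data = [12, 15, 11, 20, 18, 25, 30, 22, 17, 14, 28]

# n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                      # inter-quartile range
print(q1, q2, q3, iqr)  # 14.0 18.0 25.0 11.0

# Q2 is the median
print(median(data))  # 18

# n=10 returns the nine deciles D1 ... D9
deciles = quantiles(data, n=10)
```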
1– 49
Cont….
1– 50
Cont….
 Pth percentile of a group of observations is that observation below which lie P%
(P percent) observations. The position of Pth percentile is given by
, where ‘n’ is the number of data points.
 If the computed position is a fraction, we need to interpolate the value.
 The Mode of a data set is the value that occurs most frequently. There are
many situations in which arithmetic mean and median fail to reveal the true
characteristics of a data (most representative figure), for example, most common size of
shoes, most common size of garments etc. In such cases, mode is the best-suited
measure of the central tendency.
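 A distribution in which the mean, the median and the mode coincide is
known as a symmetrical (bell-shaped) distribution. The normal distribution is one
such symmetric distribution, and it is very commonly used. In a moderately skewed
distribution, the distance between the mean and the median is approximately one third
of the distance between the mean and the mode.
This can be expressed as:
 Mean – Median = (Mean – Mode) / 3
 Mode = 3 * Median – 2 * Mean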
 No single average can be regarded as the best or most suitable under all circumstances.
Each average has its merits and demerits and its own particular field of importance and
utility. A proper selection of an average depends on the (1) nature of the data and (2)
purpose of enquiry or requirement of the data.
1– 51
Measures of Dispersion
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 53 – 53
2. Topic 1 Introduction 54 – 54
3. Topic 2 Characteristics of Measures of Dispersion 55 – 55
4. Topic 3 Absolute and Relative Measures of Dispersion 56 – 57
5. Topic 4 Range 58 – 59
6. Topic 5 Inter-quartile Range and Deviations 60 – 60
7. Topic 6 Variance and Standard Deviation 61 – 62
8. Topic 7 Summary 63 – 66
1– 52
Learning Objectives
After studying this chapter, you should be able to:
 Understand absolute and relative measures of variation
 Learn about range and inter-quartile range
 Discuss variance, standard deviation, mean deviation and coefficient of variation
 Study the empirical relationship between different measures of variation
1– 53
Introduction
 A measure of dispersion
or variation in any data
shows the extent to
which the numerical
values tend to spread
about an average.
1– 54
A measure of dispersion is useful:
 To compare the current results
with the past results.
 To compare two or more sets
of observations.
 To suggest methods to control
variation in the data.
Characteristics of Measures of Dispersion
1– 55
It should be easy to compute.
It should be rigidly defined.
It should be based on each individual item of the distribution.
It should be capable of further algebraic treatment.
It should have sampling stability.
It should not be unduly affected by the extreme items.
It should be simple to understand.
Absolute and Relative Measures of Dispersion
 ‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a
measure of absolute dispersion to an appropriate average.
 A precise measure of dispersion is one which gives the magnitude of the
variation in a series, i.e. it measures in numerical terms, the extent of the
scatter of the values around the average.
1– 56
Cont….
1– 57
ABSOLUTE AND RELATIVE MEASURES OF DISPERSION
Measures of Dispersion: The Range, The Quartile Deviation, The Mean Deviation, The Median Deviation, The Standard Deviation, Graphical Method
Relative Variability: Relative Range, Relative Quartile Deviation, Relative Mean Deviation, Coefficient of Variation
1– 58
Cont….
Range
The ‘Range’ of the data is the difference
between the largest value of data and smallest
value of data.
Merits and Demerits of Range
Merits
 Range is the simplest method of studying dispersion.
 It takes less time to compute the ‘absolute’ and ‘relative’ range.
Demerits
 Range does not take into account all the values of a series, i.e. it considers only
the extreme items and middle items are not given any importance.
 Range cannot be computed in the case of an ‘open-end’ distribution, i.e., a distribution
where the lower limit of the first group and the upper limit of the last group are not given.
1– 59
Inter – Quartile Range and Deviations
Inter-quartile Range
 Inter-quartile range is the difference between the upper quartile (third quartile) and
the lower quartile (first quartile).
Quartile Deviation
 Quartile Deviation is half the difference between the upper quartile and the
lower quartile, i.e. (Q3 – Q1) / 2.
Mean Deviation
 Mean deviation is the arithmetic mean of the absolute deviations of the values about
their arithmetic mean or median or mode.
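A minimal sketch of the mean deviation follows, using hypothetical observations and a hypothetical helper name. It averages the absolute deviations about a chosen centre and compares the result about the mean with the result about the median.

```python
from statistics import mean, median

def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations from `about`."""
    return sum(abs(x - about) for x in values) / len(values)

# Hypothetical observations, for illustration only
data = [2, 4, 6, 8, 30]

md_mean = mean_deviation(data, mean(data))      # about the mean, 10
md_median = mean_deviation(data, median(data))  # about the median, 6
print(md_mean, md_median)  # 8.0 6.4
```

The deviation about the median is the smaller of the two here, which is the general behaviour of the mean deviation.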
1– 60
Variance and Standard Deviation
1– 61
Cont….
Variance is defined as the average of squared
deviation of data points from their mean.
1– 62
Different Formulae
for Calculating
Variance
Calculation
of Standard
Deviation
Properties
of Standard
Deviation
Merits and
Demerits of
Standard Deviation
Standard
Deviation of
Combined Means
Coefficient
of Variation
Empirical
Relationship Between
Different Measures of
Variation
Summary
 Study of distribution is very important for decision-making. Usually, measures of
central tendency and variability are adequate for taking decisions. However, if the data is quite
different from a normal distribution, then measures of skewness and kurtosis need to be
considered. We discussed measures of variability: Range, Variance and Standard
Deviation.
 A measure of dispersion gives an idea about the extent of lack of uniformity in the
sizes and qualities of the items in a series. It helps us to know the degree of uniformity and
consistency in the series. If the difference between items is large the dispersion or
variation is large and vice versa.
1– 63
Cont….
 The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute
measures of dispersion are expressed in the same units in which the original data
are expressed. For example, if the series is expressed as Marks of the students in a
particular subject; the absolute dispersion will provide the value in Marks. The only difficulty
is that if two or more series are expressed in different units, the series cannot be compared on
the basis of dispersion.
 The ‘Range’ of the data is the difference between the largest value of data and
smallest value of data. This is an absolute measure of variability. However, if we
have to compare two sets of data, ‘Range’ may not give a true picture. In such case,
relative measure of range, called coefficient of range is used.
 Inter-quartile range is the difference between the upper quartile (third quartile) and the lower
quartile (first quartile). Quartile Deviation is half the difference between the upper
quartile and the lower quartile.
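The absolute and relative measures summarised above can be sketched as follows. The data are hypothetical. The coefficient of range is taken here in its commonly used form, (largest – smallest) / (largest + smallest), which is an assumption since the slides do not state the formula; the quartiles come from Python's `statistics.quantiles` with its default method.

```python
from statistics import quantiles

# Hypothetical observations, for illustration only
data = [12, 15, 11, 20, 18, 25, 30, 22, 17, 14, 28]

largest, smallest = max(data), min(data)
value_range = largest - smallest                     # absolute measure, in data units
coeff_of_range = value_range / (largest + smallest)  # relative measure, unit-free

q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1                 # inter-quartile range
quartile_deviation = iqr / 2  # half the inter-quartile range

print(value_range, iqr, quartile_deviation)  # 19 11.0 5.5
```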
1– 64
Cont….
 Average used for calculating deviation can be the mean, the median or the mode.
However, usually the mean is used. There is also an advantage of taking deviations from
the median, because ‘Mean Deviation’ from median is lowest as compared to any other
‘Mean Deviations’. Since absolute values of deviations ignoring sign are taken for
calculating Mean Deviation, the mean deviation is not amenable to further algebraic
treatment.
 The variance is the average squared deviation of the data from their mean. For
sample data, we take the average by dividing by (n – 1), where n is the sample size. This
accounts for the degrees of freedom. For population data, we average by dividing by the
population size N.
 The Standard Deviation (SD) of a set of data is the positive square root of the
variance of the set. This is also referred to as the Root Mean Square (RMS) value of the
deviations of the data points. The SD of a sample is the square root of the sample variance.
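The difference between dividing by (n – 1) and by N can be seen in a short sketch. The data are hypothetical; the hand computation is cross-checked against `variance`, `pvariance` and `stdev` from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import pvariance, stdev, variance

# Hypothetical observations, for illustration only
data = [4, 8, 6, 5, 7]

m = sum(data) / len(data)                    # mean = 6.0
sum_sq = sum((x - m) ** 2 for x in data)     # sum of squared deviations = 10.0

sample_var = sum_sq / (len(data) - 1)        # divide by n - 1
population_var = sum_sq / len(data)          # divide by N
sample_sd = sqrt(sample_var)                 # positive square root

print(sample_var, population_var)  # 2.5 2.0
```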
1– 65
Cont….
 There is no effect of shifting origin on standard deviation or variance.
 The measures of deviation are very effective in reports and presentations that
business executives make to present their data to the general public, who may not
understand statistical methods.
 Variance analysis also helps in managing budgets by controlling budgeted versus
actual costs. Without the standard deviation, you can’t compare two data sets
effectively.
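The first point above, that shifting the origin has no effect on the standard deviation, can be checked numerically. The sketch uses hypothetical data and `pstdev` from Python's standard `statistics` module. The change-of-scale line is included only for contrast: multiplying every value by a constant multiplies the standard deviation by the absolute value of that constant.

```python
from statistics import pstdev

# Hypothetical observations, for illustration only
data = [4, 8, 6, 5, 7]

shifted = [x + 100 for x in data]  # shift of origin
scaled = [3 * x for x in data]     # change of scale

sd = pstdev(data)
sd_shifted = pstdev(shifted)       # same as sd
sd_scaled = pstdev(scaled)         # three times sd

print(round(sd, 4), round(sd_shifted, 4), round(sd_scaled, 4))
```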
1– 66
Skewness and Kurtosis
S. No.  Reference No.  Particulars  Slide From – To
1. Learning Objectives 68 – 68
2. Topic 1 Introduction 69 – 70
3. Topic 2 Karl Pearson’s Coefficient of Skewness (SKP) 71 – 71
4. Topic 3 Bowley’s Coefficient of Skewness (SKB) 72 – 72
5. Topic 4 Kelly’s Coefficient of Skewness (SKK) 73 – 73
6. Topic 5 Measures of Kurtosis 74 – 74
7. Topic 6 Moments 75 – 75
8. Topic 7 Summary 76 – 79
1– 67
Learning Objectives
After studying this chapter, you should be able to:
 Understand the concept and different types of skewness
 Discuss various measures of kurtosis
 Learn about moments, its properties and coefficients based on moments
1– 68
Introduction
Skewness is a measure that studies the degree and direction of departure from symmetry.
Nature of Skewness
Skewness can be positive or negative or zero.
When the values of mean, median and mode are equal, there is no skewness.
 When mean > median > mode, skewness will be positive.
 When mean < median < mode, skewness will be negative.
1– 69
Cont….
Characteristic of a Good Measure of Skewness
 It should be a pure number in the sense that its value should be independent of
the unit of the series and also degree of variation in the series.
 It should have zero-value, when the distribution is symmetrical.
 It should have a meaningful scale of measurement so that we could easily
interpret the measured value.
Mathematical measures of skewness can be calculated by:
 Karl-Pearson’s Method
 Bowley’s Method
 Kelly’s method
1– 70
Karl Pearson’s Coefficient of Skewness (SKP)
1– 71
Karl Pearson has suggested two formulae:
 Where the relationship of mean and mode is
established;
 Where the relationship between mean and
mode is not established.
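The slide does not show the two formulae, so the sketch below assumes their usual textbook forms: (Mean – Mode) / SD when the mode is well defined, and 3 (Mean – Median) / SD otherwise. The function names and the summary figures are hypothetical.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Assumed first form: (Mean - Mode) / Standard Deviation."""
    return (mean - mode) / sd

def pearson_skewness_median(mean, median, sd):
    """Assumed second form: 3 * (Mean - Median) / Standard Deviation."""
    return 3 * (mean - median) / sd

# Hypothetical summary figures, for illustration only
sk_from_mode = pearson_skewness_mode(52.0, 46.0, 12.0)
sk_from_median = pearson_skewness_median(52.0, 50.0, 12.0)
print(sk_from_mode, sk_from_median)  # 0.5 0.5
```

The two results agree here because the hypothetical figures satisfy Mode = 3 * Median – 2 * Mean; a positive value indicates positive skewness.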
Bowley’s Coefficient of Skewness (SKB)
 Bowley’s method of skewness is based on the values of median, lower and
upper quartiles. This method suffers from the same limitations which are in the
case of median and quartiles.
 Wherever positional measures are given, skewness should be measured by
Bowley’s method. This method is also used in case of ‘open-end series’, where the
importance of extreme values is ignored.
Absolute skewness = Q3 + Q1 – 2 Median
Coefficient of Skewness, (SkB) = (Q3 + Q1 – 2 Median) / (Q3 – Q1)
Where, Q is quartile.
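A minimal sketch of Bowley's coefficient, assuming its usual form of absolute skewness divided by the inter-quartile range (Q3 – Q1). The quartile values are hypothetical.

```python
def bowley_skewness(q1, median, q3):
    """(Q3 + Q1 - 2 * Median) / (Q3 - Q1); assumes Q3 > Q1."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Hypothetical quartiles, for illustration only
sk_b = bowley_skewness(q1=14.0, median=18.0, q3=25.0)
print(round(sk_b, 4))  # 0.2727
```

The value is positive because the median lies closer to Q1 than to Q3.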
1– 72
Kelly’s Coefficient of Skewness (SKK)
Kelly’s coefficient of skewness is defined as:
Skk = (P90 + P10 – 2 P50) / (P90 – P10)
Where, P is percentile.
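A minimal sketch of Kelly's coefficient, assuming its usual percentile form built from the 10th, 50th and 90th percentiles. The percentile values below are hypothetical and are not the data of the slide's example.

```python
def kelly_skewness(p10, p50, p90):
    """(P90 + P10 - 2 * P50) / (P90 - P10); assumes P90 > P10."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)

# Hypothetical percentiles, for illustration only
sk_k = kelly_skewness(p10=10.0, p50=30.0, p90=70.0)
print(round(sk_k, 4))  # 0.3333
```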
Example: Calculate the Kelly’s coefficient of skewness from the following data:
1– 73
Measures of Kurtosis
 Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. Kurtosis is calculated either as an absolute or a
relative value. Absolute kurtosis is always a positive number.
1– 74
Negative relative kurtosis indicates a flatter distribution than the normal
distribution, and is called platykurtic.
Positive relative kurtosis means a more peaked curve, called leptokurtic.
The peakedness of the normal distribution is called mesokurtic.
1– 75
Moments
 The arithmetic means of the various powers of the deviations from the mean in any distribution are called the moments of the distribution about the mean.
PROPERTIES OF MOMENTS
COEFFICIENTS BASED ON MOMENTS
Summary
 Measures of Skewness and Kurtosis, like measures of central tendency and
dispersion, study the characteristics of a frequency distribution. Averages tell us
about the central value of the distribution and measures of dispersion tell us about the
concentration of the items around a central value.
 When two or more symmetrical distributions are compared, the difference in them
is studied with ‘Kurtosis’. On the other hand, when two or more asymmetrical distributions are
compared, they will give different degrees of Skewness. The two measures describe different
aspects of shape: skewness captures the lack of symmetry, while kurtosis captures the peakedness of the distribution.
1– 76
Cont….
 Bowley’s method of skewness is based on the values of median, lower and upper
quartiles. This method suffers from the same limitations which are in the case of median
and quartiles. Wherever positional measures are given, skewness should be measured by
Bowley’s method. This method is also used in case of ‘open-end series’, where the
importance of extreme values is ignored.
 Kelly’s coefficient of skewness is defined as:
Skk = (P90 + P10 – 2 P50) / (P90 – P10)
Where, P is percentile.
1– 77
Cont….
 Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. Kurtosis is calculated either as an absolute or a
relative value. Absolute kurtosis is always a positive number. Absolute kurtosis of a
normal distribution (symmetric bell shaped distribution) is taken as 3. It is taken as
the datum to calculate relative kurtosis as follows:
Absolute kurtosis (β2) = μ4 / μ2², where μ2 and μ4 are the second and fourth moments about the mean
Relative kurtosis = Absolute kurtosis – 3
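A small sketch of these two quantities, assuming the standard moment-based definition of absolute kurtosis, μ4 / μ2², with moments taken about the mean. Function names and data are made up for illustration.

```python
def central_moment(data, k):
    """k-th moment about the mean: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def absolute_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def relative_kurtosis(data):
    # The normal distribution (absolute kurtosis 3) is the datum
    return absolute_kurtosis(data) - 3

sample = [1, 2, 3, 4, 5]  # made-up, evenly spread data
print(absolute_kurtosis(sample), relative_kurtosis(sample))
```

For this evenly spread sample the relative kurtosis is negative, so the distribution would be classed as platykurtic.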
1– 78
Cont….
 Moments about the mean are generally used in statistics. We use the Greek letter μ (read as
mu) for these moments. Consider a mass attached at each point proportional to its
frequency and take moments about the mean. The first, second, third and fourth moments can
be used as measures of central tendency, variation (dispersion), asymmetry and peakedness
of the curve respectively.
1– 79
Correlation Analysis
S. No. Reference No. Particulars Slide From – To
1. Learning Objectives 81 – 81
2. Topic 1 Introduction 82 – 83
3. Topic 2 Types of Correlation 84 – 84
4. Topic 3 Methods of Calculating Correlation 85 – 85
5. Topic 4 Scatter Diagram Method 86 – 86
6. Topic 5 Co-variance Method – The Karl Pearson’s Correlation Coefficient 87 – 88
7. Topic 6 Rank Correlation Method 89 – 89
8. Topic 7 Correlation Coefficient using Concurrent Deviation 90 – 91
9. Topic 8 Summary 92 – 98
1– 80
Learning Objectives
After studying this chapter, you should be able to:
 Understand the concept of correlation
 Study about different types of correlation
 Describe various methods of calculating correlation such as scatter diagram
method
 Discuss various types of correlation coefficients viz, Karl Pearson correlation
coefficient, rank correlation and coefficient based on concurrent deviations.
1– 81
1– 82
Cont….
Croxton and Cowden say, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation”.
Introduction
The study of correlation helps managers in the following ways:
 To identify relationship of various factors and decision variables.
 To estimate value of one variable for a given value of other if both are correlated. E.g.
estimating sales for a given advertising and promotion expenditure.
 To understand economic behaviour and market forces.
 To reduce uncertainty in decision-making to a large extent.
1– 83
Types of Correlation
1– 84
Positive or Negative Correlation
Simple or Multiple Correlations
Partial or Total Correlation
Linear and Non-linear Correlation
Methods of Calculating Correlation
1– 85
 Scatter Diagram Method
 Karl Pearson’s Coefficient of Correlation
 Rank Method
 Concurrent Deviation Method
Scatter Diagram Method
1– 86
The pattern of points obtained by plotting the observed data is known as a scatter diagram.
It gives us two types of information:
 Whether the variables are related or not.
 If so, what kind of relationship or estimating equation describes the relationship.
Co – Variance Method – The Karl Pearson’s Correlation Coefficient
The correlation coefficient measures the degree of association between two variables X and Y.
Karl Pearson’s formula for correlation coefficient is given as,
r = Cov (X, Y) / (σX σY) = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² × Σ(Y – Ȳ)²]
Where r is the ‘Correlation Coefficient’ or ‘Product Moment Correlation Coefficient’
between X and Y.
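A minimal sketch of the product moment correlation coefficient, computed from deviations about the means. The function name and the advertising and sales figures are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

advertising = [1, 2, 3, 4, 5]   # made-up figures
sales = [2, 4, 6, 8, 10]        # exactly twice the advertising figures
print(pearson_r(advertising, sales))
```

Because sales here are an exact multiple of advertising, the coefficient is 1 up to rounding; reversing one of the series gives –1.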
1– 87
Cont….
1– 88
Estimation of Probable Error
Interpretation of R
Assumptions Underlying Karl Pearson’s Correlation Coefficient
Rank Correlation Method
1– 89
RANK CORRELATION WHEN RANKS ARE GIVEN
RANK CORRELATION WHEN RANKS ARE NOT GIVEN
RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN
Correlation Coefficient using Concurrent Deviation
 This is the easiest method to find the correlation between two variables. Although the
method is effective in giving the direction of the correlation as positive or negative, it fails
to give the accurate strength of the correlation. In this method we check the fluctuation in
each data series as increasing (+), decreasing (–) or equal. Then we count the
number of items that increase, decrease or remain equal concurrently and denote it as c. The
correlation coefficient is then calculated as,
rc = ± √[± (2c – n) / n], taking the sign of (2c – n) both inside and outside the square root
Where, n = total number of pairs of deviations (one less than the number of observations).
c = Number of concurrent changes
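The counting procedure above can be sketched as follows, assuming the usual textbook formula rc = ± √[± (2c – n) / n], where n is the number of pairs of deviations and the sign of (2c – n) is applied inside and outside the root. Function names and data are hypothetical.

```python
import math

def sign(v):
    """Return +1, -1 or 0 for an increase, a decrease or no change."""
    return (v > 0) - (v < 0)

def concurrent_deviation_r(xs, ys):
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]   # direction of change in X
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]   # direction of change in Y
    n = len(dx)                                      # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a == b)     # concurrent changes
    ratio = (2 * c - n) / n
    return math.copysign(math.sqrt(abs(ratio)), ratio)

x = [10, 12, 11, 15, 18]   # made-up series
y = [5, 7, 8, 9, 12]
print(concurrent_deviation_r(x, y))
```

Only the directions of change enter the calculation, which is why the method gives the direction of the correlation reliably but not its exact strength.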
1– 90
Cont….
1– 91
Example: The data of advertisement expenditure (X) and sales (Y) of a company for the past
10-year period is given below. Determine the correlation coefficient between these
variables and comment on the correlation.
Summary
 In this chapter the concept of correlation or the association between two variables has
been discussed. A scatter plot of the variables may suggest that the two variables are
related but the value of the Pearson correlation coefficient r quantifies this
association.
 Correlation is a degree of linear association between two random variables. In
these two variables, we do not differentiate them as dependent and independent variables. It
may be the case that one is the cause and other is an effect i.e. independent and
dependent variables respectively. On the other hand, both may be dependent
variables on a third variable.
1– 92
Cont….
 In business, correlation analysis often helps managers to take decisions by
estimating the effects of changing the values of decision variables like
promotion, advertising, price and production processes on objective parameters like
costs, sales, market share, consumer satisfaction and competitive price. The decision
becomes more objective by removing subjectivity to a certain extent.
 The correlation coefficient r may assume values between –1 and 1. The sign
indicates whether the association is direct (+ve) or inverse (-ve). A numerical
value of r equal to unity indicates perfect association while a value of zero
indicates no association.
1– 93
Cont….
 The correlation is said to be positive when the increase (decrease) in the value of one
variable is accompanied by an increase (decrease) in the value of other variable also.
Negative or inverse correlation refers to the movement of the variables in opposite direction.
Correlation is said to be negative, if an increase (decrease) in the value of one variable is
accompanied by a decrease (increase) in the value of other.
 In simple correlation the variation is between only the two variables under study and
the variation is hardly influenced by any external factor. In other words, if one of the
variables remains the same, there won’t be any change in the other variable.
1– 94
Cont….
 In case of multiple correlation analysis there are two approaches to studying the
correlation: partial and total. In case of partial correlation, we study the variation of two variables while
excluding the effects of other variables by keeping them under controlled conditions.
 When the amount of change in one variable tends to keep a constant ratio to the
amount of change in the other variable, then the correlation is said to be linear. But if the
amount of change in one variable does not bear a constant ratio to the amount of
change in the other variable then the correlation is said to be non-linear.
1– 95
Cont….
 Correlation analysis may also be necessary to eliminate a variable which
shows low or hardly any correlation with the variable of our interest. In statistics, there
are a number of measures to describe the degree of association between variables. These
are Karl Pearson’s correlation coefficient, Spearman’s rank correlation coefficient,
coefficient of determination, Yule’s coefficient of association, coefficient of
colligation, etc.
 The correlation coefficient measures the degree of association between two
variables X and Y.
 Karl Pearson’s formula for correlation coefficient is given as, r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² × Σ(Y – Ȳ)²]
1– 96
Cont….
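 When the observations are available only as ranks, the purpose of computing a correlation coefficient is to
determine the extent to which the two sets of ranking are in agreement. The
coefficient that is determined from these ranks is known as Spearman’s rank
coefficient, rs. This is defined by the following formula:
rs = 1 – 6 Σd² / [n (n² – 1)]
where d is the difference between the two ranks of an item and n is the number of items ranked.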
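A short sketch of Spearman's rank coefficient, assuming the standard formula rs = 1 – 6 Σd² / [n (n² – 1)] for ranks without ties, where d is the difference between the two ranks of an item. The function name and the rankings are made up.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank coefficient for two rankings without tied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge_1 = [1, 2, 3, 4, 5]   # made-up rankings by two judges
judge_2 = [2, 1, 3, 5, 4]
print(spearman_rs(judge_1, judge_2))
```

Identical rankings give +1 and exactly reversed rankings give –1. When equal ranks occur, a correction term is normally added to Σd², which this sketch leaves out.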
1– 97
Cont….
 Although the concurrent deviation method is effective in giving the direction of
the correlation as positive or negative, it fails to give the accurate strength of the correlation.
In this method we check the fluctuation in each data series as increasing (+),
decreasing (–) or equal. Then we count the number of items that increase,
decrease or remain equal concurrently and denote it as c. The correlation coefficient is
then calculated as,
rc = ± √[± (2c – n) / n], taking the sign of (2c – n) both inside and outside the square root
Where, n = total number of pairs of deviations (one less than the number of observations).
c = Number of concurrent changes
1– 98
Regression Analysis
S. No. Reference No. Particulars Slide From – To
1. Learning Objectives 100 – 100
2. Topic 1 Introduction 101 – 101
3. Topic 2 Regression Analysis 102 – 103
4. Topic 3 Simple Linear Regression 104 – 106
5. Topic 4 Coefficient of Regression 107 – 108
6. Topic 5 Non-linear Regression Models 109 – 109
7. Topic 6 Correlation Analysis vs Regression Analysis 110 – 110
8. Topic 7 Summary 111 – 114
1– 99
Learning Objectives
After studying this chapter, you should be able to:
 Understand the concept of regression analysis
 Discuss the applicability of regression
 Describe simple linear regression and nonlinear regression model.
 Learn about coefficient of regression and linear regression equations
1– 100
Introduction
 In regression analysis we develop an equation, called an estimating equation, that
relates the known and unknown variables.
 Then correlation analysis is used to determine the degree of the relationship
between the variables.
 In this chapter we will learn how to calculate the regression line
mathematically.
1– 101
Regression Analysis
1– 102
Cont….
According to Morris Myers Blair, “regression is the measure of the
average relationship between two or more variables in terms of the
original units of the data.”
1– 103
Applicability of Regression Analysis
 Regression analysis is a branch of statistical
theory which is widely used in all the scientific
disciplines. It is a basic technique for measuring or
estimating the relationship among economic
variables that constitute the essence of economic
theory and economic life.
Simple Linear Regression
1– 104
Cont….
The highest power of x is called the order of the model.
 This model is used if we have
bivariate distribution i.e. only two
variables are considered and the
‘best fit’ curve is approximated to a
straight line.
Simple Linear Regression Model
 The linear regression model uses a straight-line relationship. The equation of a
straight line is of the form,
ŷ = a + bx (1)
 Where ŷ is the predicted value of Y corresponding to x, and a and b are constants. Now
if we assume the error (deviation) in the Y direction is e, we can write the
relationship of X and Y in data points as,
y = a + bx + e
 Error e is the amount by which an observation falls off the regression line. Error e is due
to random error. ‘a’ and ‘b’ are called the parameters of the linear regression model, whose values
are found from the observed data.
1– 105 Cont….
Linear Regression Equation
 Suppose the data points are (x1, y1), (x2, y2), …, (xn, yn). Then we can write from the
regression equation,
yi = a + bxi + ei, i = 1, 2, …, n (2)
Thus, the sum of squares of errors is,
SSE = Σ ei² = Σ (yi – a – bxi)²
 To have the minimum sum of squares of errors (SSE) we must have the condition,
∂SSE/∂a = 0 and ∂SSE/∂b = 0
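Setting the partial derivatives of SSE with respect to ‘a’ and ‘b’ to zero gives the standard least-squares estimates b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and a = ȳ – b x̄. A minimal sketch, with a hypothetical function name and made-up data:

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    return a, b

xs = [1, 2, 3, 4]   # made-up data lying exactly on y = 1 + 2x
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)
```

With data that lie exactly on a line, every error is zero and the estimates recover the intercept and slope of that line.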
1– 106
1– 107
Cont….
Coefficient of Regression
The coefficients of regression are bYX and bXY. They have following implications:
 Slopes of regression lines of Y on X and X on Y viz. bYX and bXY must have
same signs (because r² cannot be negative).
 Correlation coefficient is geometric mean of bYX and bXY.
 If both slopes bYX and bXY are positive correlation coefficient r is positive. If
both bYX and bXY are negative the correlation coefficient r is negative.
 If bYX × bXY = 1, then r = ±1, indicating perfect correlation.
 Both regression lines intersect at the point (X̄, Ȳ).
1– 108
Properties of Regression Coefficients
 The coefficient of correlation is the geometric mean of the two
regression coefficients.
 Both the regression coefficients are either positive or negative. It
means that they always have identical sign i.e., either both have positive
sign or negative sign.
 The coefficient of correlation and the regression coefficients will
also have same sign.
 Regression coefficients are independent of the change in the origin but not
of the scale.
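The first three properties can be checked numerically. The sketch below assumes the usual definitions bYX = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and bXY = Σ(x – x̄)(y – ȳ) / Σ(y – ȳ)². The function name and data are made up.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r), with r taken as the signed geometric mean."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx    # slope of the regression of Y on X
    b_xy = sxy / syy    # slope of the regression of X on Y
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # r shares their sign
    return b_yx, b_xy, r

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
b_yx, b_xy, r = regression_coefficients(xs, ys)
print(b_yx, b_xy, r)
```

Both coefficients share the sign of Σ(x – x̄)(y – ȳ), so their product is never negative and its square root, with that sign attached, is the correlation coefficient.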
Non – Linear Regression Models
1– 109
Second Degree Model
Other Regression Models
Seasonal Model
Seasonal Model with Trend
Coefficient of Determination
Correlation Analysis vs Regression Analysis
 Degree and Nature of Relationship
 Cause and Effect Relationship
 Like in correlation, regression analysis can also be studied as ‘simple and
multiple’, ‘total and partial’, ‘linear and nonlinear’, etc.
 In correlation, there is no distinction between independent and dependent
variables.
1– 110
Summary
 In this chapter, the concept of regression between dependent and independent
variables has been discussed. Regression provides us a measure of the relationship and also
facilitates to predict one variable for a value of other variable.
 Unlike correlation analysis, in regression analysis, one variable is independent
and other dependent. Please note that this relationship need not be a cause-effect
relationship.
 Regression analysis is a branch of statistical theory which is widely used in all
the scientific disciplines. It is a basic technique for measuring or estimating the relationship
among economic variables that constitute the essence of economic theory and
economic life. The uses of regression analysis are not confined to economic and
business activities. Its applications are extended to almost all the natural, physical
and social sciences.
1– 111
Cont….
 Simple linear regression model is used if we have a bivariate distribution i.e. only
two variables are considered and the ‘best fit’ curve is approximated to a straight line.
This describes the linear relationship between two variables. Although it appears to be
too simplistic, in many business situations it is adequate. At least, an initial study can be
based on this model for any decision-making situation.
 We have studied simple linear, non-linear and multiple regression models. For
multiple regression and non-linear regression models, MS Excel or any other computer
package would help in reducing voluminous calculations. We also discussed coefficient
of determination as a measure of the strength of relationship.
1– 112
Cont….
 Least square principle can also be applied to the fitting of a second degree
polynomial which may be useful in business situation if we have some idea that
the relationship between two variables is parabolic. In any case second degree
polynomial fit is more likely to be better approximation of the actual relationship. We
may use second order model (parabolic trend) if we feel that the variation is parabolic.
 The least square approximation can be calculated easily for low degree polynomials,
like linear, parabolic, cubic, etc. But for higher degrees (more than three), the system of
normal equations becomes ill conditioned. This causes large errors in values of
coefficients. Then the approximation becomes incorrect. To avoid these problems,
‘orthogonal polynomials’ are used for approximation.
1– 113
Cont….
 Mean Square Error (MSE) is an estimate of the variance of the regression
error. MSE depends on the values of data and its scales. Hence we need a
measure that calculates relative degree of variation so that it can be compared for
the fits obtained from different models and for different data sets. Coefficient of
determination is such a measure.
 Coefficient of determination is a measure of the strength of the regression fit. It is
an estimator of the population parameter of correlation and can be obtained directly from a
decomposition of the variation in Y into two components, viz. due to error and due to
regression. Here, error is the deviation of a data point from its predicted value
on the regression line.
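The decomposition can be sketched as follows, assuming the usual definition R² = 1 – SSE / SST, where SST is the total variation of Y about its mean and SSE is the part due to error. The function name, the data and the fitted line are made up for illustration.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: share of the variation in Y explained by the fit."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # variation due to error
    sst = sum((y - mean_y) ** 2 for y in ys)                 # total variation in Y
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]                     # made-up data
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]  # least-squares line for this data
print(r_squared(ys, predicted))
```

Unlike the mean square error, the result does not depend on the units or scale of Y, which is what makes it comparable across models and data sets.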
1– 114
Theory of Probability
1– 115
S. No. Reference No. Particulars Slide From – To
1. Learning Objectives 116 – 116
2. Topic 1 Introduction 117 – 117
3. Topic 2 Important Terms in Probability 118 – 119
4. Topic 3 Kinds of Probability 120 – 120
5. Topic 4 Simple Propositions of Probability 121 – 125
6. Topic 5 Addition Theorem of Probability 126 – 127
7. Topic 6 Multiplication Theorem of Probability 128 – 128
8. Topic 7 Conditional Probability 129 – 129
9. Topic 8 Law of Total Probability 130 – 131
10. Topic 9 Independence of Events 132 – 132
11. Topic 10 Combinatorial Concept 133 – 133
12. Topic 11 Summary 134 – 134
Learning Objectives
After studying this chapter, you should be able to:
 Understand the meaning and important terms of probability
 Learn about addition theorem and multiplicative theorem of probability
 Understand the concept of independence of events, combinatorial concepts like
permutation and combination
 Solve problems of conditional probability and Bayes’ Theorem and other
concepts of probability
1– 116
Introduction
 A probability is a quantitative measure of risk.
 This chapter provides exposure to fundamental concepts, since probability is
inseparable from statistical methods.
1– 117
Important Terms in Probability
Probability and sampling are inseparable parts of statistics.
1– 118
Cont….
Random Experiment
Random experiment is an experiment whose outcome is not
predictable in advance.
1– 119
Sample Space
 Event
 Event Space
 Union of events
 Intersection of events
 Mutually exclusive events
 Collectively exhaustive events
 Complement of event
Kinds of Probability
1– 120
Classical
Probability
Axiomatic
Probability
Subjective
Probability
Relative
Frequency
Probability
Simple Propositions of Probability
Proposition 1
P (Eᶜ) = 1 – P (E)
Probability of complement: Let event Eᶜ denote the complement of the event E. By the
definition of complement, Eᶜ has all elements from the sample space S that are not in E. Thus,
E and Eᶜ are mutually exclusive and collectively exhaustive. Therefore, by axioms 2 and 3 we
have,
1 = P (S) = P (E ∪ Eᶜ) = P (E) + P (Eᶜ)
or, P (Eᶜ) = 1 – P (E)
1– 121
Cont….
Proposition 2
If E ⊂ F, then P (E) ≤ P (F)
If the event E is contained in event F, that is, E ⊂ F, then we can express,
F = E ∪ (Eᶜ ∩ F).
However, as events E and (Eᶜ ∩ F) are mutually exclusive, we get,
P (F) = P (E) + P (Eᶜ ∩ F)
But, by axiom 1, P (Eᶜ ∩ F) ≥ 0. Therefore, we have proved the proposition,
P (E) ≤ P (F)
1– 122
Cont….
Proposition 3
P (E ∪ F) = P (E) + P (F) – P (E ∩ F)
Probability of unions: Event E ∪ F can be written as the union of the two
disjoint events E and (Eᶜ ∩ F). Thus, from axiom 3,
P (E ∪ F) = P [E ∪ (Eᶜ ∩ F)] = P (E) + P (Eᶜ ∩ F) (1)
Also, F = (E ∩ F) ∪ (Eᶜ ∩ F), hence,
P (F) = P (E ∩ F) + P (Eᶜ ∩ F) (2)
From (1) and (2) we get proposition 3 as,
P (E ∪ F) = P (E) + P (F) – P (E ∩ F)
The extended statement of this proposition for n events is called the inclusion-exclusion principle. For three events,
P (E ∪ F ∪ G) = P (E) + P (F) + P (G) – P (E ∩ F) – P (F ∩ G) – P (E ∩ G) + P (E ∩ F ∩ G)
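Propositions 1 and 3 and the three-event extension can be checked on a small sample space. The sketch below assumes a fair six-sided die with equally likely outcomes; the events E, F and G and the helper name are made up for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space of a fair die (assumption)

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

E = {2, 4, 6}                     # even number
F = {4, 5, 6}                     # number greater than 3
G = {2, 3, 5}                     # prime number

p_union_two = prob(E) + prob(F) - prob(E & F)
p_union_three = (prob(E) + prob(F) + prob(G)
                 - prob(E & F) - prob(F & G) - prob(E & G)
                 + prob(E & F & G))
print(p_union_two, p_union_three)
```

Exact fractions are used so that both sides of each identity can be compared without rounding error.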
1– 123
Cont….
Proposition 4
Mutually exclusive events: When the sets corresponding to two events are
disjoint (have no common elements, or the intersection is null), the two events are
called mutually exclusive.
E ∩ F = Φ Therefore,
P (E ∩ F) = P (Φ) = 0
Also, for mutually exclusive events E and F,
P (E ∪ F) = P (E) + P (F)
1– 124
Cont….
Proposition 5
P (Eᶜ ∩ F) = P (F) – P (E ∩ F)
From set theory, F can be written as a union of two disjoint events E ∩ F and Eᶜ ∩ F.
Hence, by axiom 3, we have P (F) = P (E ∩ F) + P (Eᶜ ∩ F). By rearranging the
terms we get the result.
1– 125
Addition Theorem of Probability
 The addition theorem of probability determines the probability that either
event ‘A’ or event ‘B’ occurs, or both occur. The ‘or’ of two events ‘A’ and ‘B’
is denoted by ‘∪’ and read as union.
1– 126
Cont….
Let A and B be two events defined in a sample space. The union of events
A and B is the collection of all outcomes that belong either to A or to B or
to both A and B, and is denoted by A ∪ B.
The result of this addition theorem is generally written using set notation, P (A ∪ B) = P (A) + P
(B) – P (A ∩ B),
Where, P (A) = probability of occurrence of event ‘A’
P (B) = probability of occurrence of event ‘B’
P (A ∪ B) = probability of occurrence of event ‘A’ or event ‘B’.
P (A ∩ B) = probability of occurrence of both event ‘A’ and event ‘B’.
The addition theorem can be proved as follows: Let ‘A’ and ‘B’ be subsets of a finite non-empty set ‘S’
of equally likely outcomes, and let n(·) denote the number of outcomes in a set. Then, by counting,
n (A ∪ B) = n (A) + n (B) – n (A ∩ B),
On dividing both sides by n(S), we get
n (A ∪ B) / n(S) = n (A) / n(S) + n (B) / n(S) – n (A ∩ B) / n(S) (1).
Each ratio is the probability of the corresponding event, which gives the addition theorem.
1– 127
Multiplication Theorem of Probability
 Probability is the branch of mathematics which deals with the likelihood of
occurrence of events. The basic form of the multiplication theorem of probability for two events
‘x’ and ‘y’ can be stated as,
P (x ∩ y) = P (y) × P (x / y)
 Here P (x) and P (y) are the probabilities of occurrence of events ‘x’ and ‘y’
respectively.
P (x / y) is the conditional probability of ‘x’, the condition being that ‘y’ has
already occurred.
P (x / y) is always calculated after ‘y’ has occurred. Here, the occurrence of ‘x’
depends on ‘y’: the occurrence of ‘y’ has already changed the set of possible outcomes, so the
probability of ‘x’ changes too.
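As an illustration of the multiplication theorem, consider drawing two cards without replacement from a standard 52-card deck and asking for the probability that both are aces. The example and variable names are made up; the product of probabilities is cross-checked by brute-force counting.

```python
from fractions import Fraction
from itertools import permutations

# y: first card is an ace; x: second card is an ace
p_y = Fraction(4, 52)
p_x_given_y = Fraction(3, 51)          # one ace is already gone
p_both_aces = p_y * p_x_given_y        # P(x and y) = P(y) * P(x / y)

# Cross-check by enumerating every ordered pair of distinct cards;
# cards numbered 0 to 3 stand for the four aces
pairs = list(permutations(range(52), 2))
favourable = sum(1 for first, second in pairs if first < 4 and second < 4)
p_by_counting = Fraction(favourable, len(pairs))
print(p_both_aces, p_by_counting)
```

The conditional probability 3/51 differs from the unconditional 4/52 because the first draw has already changed the deck.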
1– 128
Conditional Probability
1– 129
 Conditional probability is the probability
that an event will occur given that another
event has already occurred. If A and B are two
events, then the conditional probability of A
given B is written as P(A/B) and read as “the
probability of A given that B has already
occurred.”
1– 130
 Consider two events, E and F. Whatsoever be the events, we can
always say that the probability of E is equal to the probability of
intersection of E and F, plus, the probability of the intersection
of E and complement of F. That is,
P (E) = P (E ∩ F) + P (E ∩ Fᶜ)
Law of Total Probability
Bayes’s Formula
Let E and F be events.
E = (E ∩ F) ∪ (E ∩ Fᶜ)
Any element of E must be either in both E and F, or in E but not in F. (E ∩ F) and (E ∩
Fᶜ) are mutually exclusive, since the former must be in F and the latter must not be in F. Hence, by
axiom 3,
P (E) = P (E ∩ F) + P (E ∩ Fᶜ) = P (E/F) × P (F) + P (E/Fᶜ) × P (Fᶜ) = P (E/F) × P (F) + P (E/Fᶜ) [1 – P (F)]
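A small numerical sketch of the law of total probability, P (E) = P (E/F) × P (F) + P (E/Fᶜ) [1 – P (F)], followed by Bayes' formula in its standard form P (F/E) = P (E/F) × P (F) / P (E). The function names and all probabilities are made up for illustration.

```python
from fractions import Fraction

def total_probability(p_f, p_e_given_f, p_e_given_not_f):
    """P(E) by conditioning on whether F occurs."""
    return p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)

def bayes(p_f, p_e_given_f, p_e_given_not_f):
    """P(F / E): probability of F once E is known to have occurred."""
    return p_e_given_f * p_f / total_probability(p_f, p_e_given_f, p_e_given_not_f)

p_f = Fraction(3, 10)               # hypothetical P(F)
p_e_given_f = Fraction(1, 2)        # hypothetical P(E / F)
p_e_given_not_f = Fraction(1, 5)    # hypothetical P(E / F complement)

p_e = total_probability(p_f, p_e_given_f, p_e_given_not_f)
p_f_given_e = bayes(p_f, p_e_given_f, p_e_given_not_f)
print(p_e, p_f_given_e)
```

Observing E raises the probability of F above its prior value here, because E is more likely when F has occurred than when it has not.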
1– 131
Independence of Events
1– 132
Combinatorial Concept
1– 133
 Product Rule of Counting
 Sum Rule of Counting
 Permutation
 Combination
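The four counting ideas can be illustrated with Python's standard library (`math.perm` and `math.comb` need Python 3.8 or later). The wardrobe and committee figures are made up for illustration.

```python
import math

shirts, trousers = 3, 4

# Product rule: choose a shirt AND a pair of trousers
outfits = shirts * trousers

# Sum rule: choose ONE item from either of two non-overlapping groups
single_items = shirts + trousers

# Permutation: ordered selections of 2 people out of 5
ordered_pairs = math.perm(5, 2)

# Combination: unordered selections of 2 people out of 5
unordered_pairs = math.comb(5, 2)

print(outfits, single_items, ordered_pairs, unordered_pairs)
```

Each unordered pair can be arranged in 2! ways, which is why the number of permutations is 2! times the number of combinations.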
Summary
 In this chapter, we discussed basic idea of probability. We defined probability in
different ways and pointed out serious limitations of each definition.
 Then we discussed axioms of probability, which are the backbone of theory of
probability. Then we studied number of useful propositions of probability.
 We also defined conditional probability, law of total probability, and Bayes’
Theorem. We also defined mutually exclusive events, and independence of
events.
 Lastly, we discussed a few important concepts of combinatorial analysis, which
come in very handy while calculating the probability of an event.
1– 134
Probability Distribution
1– 135
S. No. Reference No. Particulars Slide From – To
1. Learning Objectives 136 – 136
2. Topic 1 Introduction 137 – 137
3. Topic 2 Random Variable 138 – 139
4. Topic 3 Probability Distributions of Standard Random Variables 140 – 140
5. Topic 4 Bernoulli Distribution 141 – 142
6. Topic 5 Binomial Distribution 143 – 145
7. Topic 6 Poisson Distribution 146 – 147
8. Topic 7 Normal Distribution 148 – 149
9. Topic 8 Summary 150 – 153
Learning Objectives
After studying this chapter, you should be able to:
 Differentiate between discrete and continuous random variables
 Discuss probability distributions of standard random variable
 Understand discrete probability distribution which include Binomial and Poisson
Distribution
 Explain continuous probability distribution which includes Normal distribution
1– 136
Introduction
 We will study a few common distributions in this chapter.
 Normal distribution has extensive use in statistical tools and therefore readers are
advised to study it in detail.
 Knowledge of sequences, series and calculus is expected.
1– 137
Random Variable
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon.
1– 138
Cont….
1– 139
Discrete and Continuous Random Variables
Probability Mass Function (P.M.F.)
Probability Density Function
Cumulative Distribution Function
Expectation Value of Random Variables
Expected Value of a Function of a Random Variable
Variance and Standard Deviation of Random Variable
Probability Distributions of Standard Random Variables
1– 140
 Bernoulli Distribution
 Binomial Distribution
 Poisson Distribution
 Normal Distribution
Bernoulli Distribution
1– 141
Cont….
 It is a basis of many discrete
random variables, as it deals
with individual trial. It is a
building block for other
random variables. It is a
single trial distribution.
1– 142
Bernoulli trial is fundamental to many discrete distributions like Binomial,
Poisson, Geometric, etc. Situations where Bernoulli distribution is commonly
used are:
 Sex of newborn child; Male = 0, Female = 1 say.
 Items produced by a machine are Defective or Non-defective.
 During next flight an engine will fail or remain serviceable.
 Student appearing for examination will pass or fail.
Application of Bernoulli Distribution
Binomial Distribution
1– 143
Cont….
A binomial random variable is the number of successes x in n repeated trials of a binomial
experiment. The probability distribution of a binomial random variable is called a binomial
distribution; the Bernoulli distribution is the special case with a single trial (n = 1).
Applications of Binomial Distribution
 Trials are finite (and not very large), performed repeatedly for ‘n’ times.
 Each trial (random experiment) should be a Bernoulli trial, the one that results in either
success or failure.
 Probability of success in any trial is ‘p’ and is constant for each trial.
 All trials are independent.
1– 144
Cont….
Following are some of the real life examples of applications of binomial distribution.
 Number of defective items in a lot of n items produced by a machine.
 Number of male births out of n births in a hospital.
 Number of correct answers in a multiple-choice test.
 Number of seeds germinated in a row of n planted seeds.
 Number of re-captured fish in a sample of n fishes.
 Number of missiles hitting the targets out of n fired.
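Under the four conditions listed earlier, the probability of exactly x successes in n trials is given by the standard binomial formula C(n, x) pˣ (1 – p)ⁿ⁻ˣ. A minimal sketch with a hypothetical function name and made-up figures:

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.5   # e.g. number of male births out of 4, assuming p = 0.5
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * prob for x, prob in enumerate(distribution))
print(distribution, mean)
```

The probabilities over x = 0, 1, …, n add up to 1, and the mean of the distribution equals n × p.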
1– 145
Poisson Distribution
1– 146
Cont….
A random variable X, taking one of the values 0, 1, 2, … is said to
be a Poisson random variable with parameter λ, if for some λ > 0,
P (X = i) = e^(–λ) λ^i / i!, i = 0, 1, 2, …
P (X = i) is the probability mass function (p.m.f.) of the Poisson random
variable. Its expected value and variance are,
μ = E [X] = λ
Var [X] = λ
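A short numerical check that the mean and variance of a Poisson random variable both equal its parameter, assuming the standard p.m.f. e^(–λ) λ^i / i!. The function name and the value of λ are made up, and the infinite sum is truncated at 60 terms, which is more than enough for a small λ.

```python
import math

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

lam = 3.0   # e.g. an assumed average of 3 calls per minute
probs = [poisson_pmf(i, lam) for i in range(60)]   # truncated support
mean = sum(i * p for i, p in enumerate(probs))
variance = sum((i - mean) ** 2 * p for i, p in enumerate(probs))
print(mean, variance)
```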
1– 147
Some of the common examples where Poisson random variable can be used to
define the probability distribution are:
 Number of accidents per day on expressway.
 Number of earthquakes occurring over fixed time span.
 Number of misprints on a page.
 Number of arrivals of calls on telephone exchange per minute.
 Number of interrupts per second on a server.
Normal Distribution
1– 148
Cont….
Equation For Normal Probability Curve
Standard Normal Distribution
Properties Of Normal Distribution
Areas Under Standard Normal Probability
Curve
Importance Of Normal Distribution
1– 149
Area under the Normal Curve
Summary
 Random variable is a real valued function defined over a sample space with
probability associated with it. The value of the random variable is outcome of an
experiment. Random variables are neither ‘random’ nor ‘variable’.
 In this chapter we discussed several important random variables, the associated
formulae, and problem solving using formulae. A discrete random variable is one that takes
at most countably many values. A continuous random variable can take any real value within an interval.
1– 150
Cont….
 We also discussed probability distributions of random variables. Binomial
distribution is used if an experiment is carried out for finite number of n independent
trials; all trials being Bernoulli trials with constant probability of success p.
 A random variable will follow the Poisson distribution if it is the number of occurrences of a
rare event during a finite period. The waiting time for a rare event is exponentially distributed.
The negative binomial distribution is used for the number of Bernoulli trials needed to achieve
a desired number of successes.
1– 151
Cont….
 One of the continuous random variables often required is the uniform random
variable. The waiting time for an event that occurs periodically follows a uniform
distribution.
 Normal probability distribution is the most important distribution in statistics. We
defined normal distribution with parameters (μ, σ) where μ is mean and σ is standard
deviation.
 Further, we defined standard normal distribution, which is a special case of
normal distribution with parameters (0, 1).
1– 152
Cont….
 We also discussed transformation of a normal random variable X to the standard normal
random variable Z using z = (x – μ) / σ. The Z distribution is very convenient for manual
calculation, as we can use standard normal tables, which are extensively tabulated, to find
probabilities and intervals.
 Normal distribution is used as a model in many real world situations, both as a
continuous distribution or an approximation to discrete distributions like binomial or
Poisson.
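The transformation to the standard normal variable can be sketched with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The mean, standard deviation and the value of x are made up for illustration.

```python
from statistics import NormalDist

mu, sigma = 50, 10          # hypothetical mean and standard deviation
x = 65

z = (x - mu) / sigma        # standardise: z = (x - mu) / sigma

p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) from the original distribution
p_via_z = NormalDist().cdf(z)             # P(Z <= z), as read from a standard normal table
print(z, p_direct, p_via_z)
```

Both routes give the same probability, which is why a single table for the standard normal distribution is enough for manual work with any normal distribution.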
1– 153
Use of Excel Software for Statistical Analysis
1– 154
S. No. Reference No. Particulars Slide From – To
1. Learning Objectives 155 – 155
2. Topic 1 Introduction 156 – 157
3. Topic 2 Introduction to Excel 158 – 168
4. Topic 3 Entering Data in Excel 169 – 169
5. Topic 4 Descriptive Statistics 170 – 172
6. Topic 5 Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min) 173 – 177
7. Topic 6 Statistical Analysis 178 – 182
8. Topic 7 Normal Distribution 183 – 183
9. Topic 8 Brief about SPSS 184 – 189
10. Topic 9 Summary 190 – 194
Learning Objectives
After studying this chapter, you should be able to:
 Understand the basic concepts of using Microsoft Excel
 Discuss how to enter data in Excel and basic built-in functions
 Gain knowledge about SPSS
1– 155
Introduction
The most popular software in the MS Office Suite includes the following:
 Microsoft Word
 Microsoft Excel
 Microsoft PowerPoint
 Microsoft Access
 Microsoft Project Plan
 Microsoft Outlook
1– 156
Cont….
1– 157
MICROSOFT OFFICE SUITE
Suite Product     Home and Student   Home and Business   Professional
Word 2010         Included           Included            Included
Excel 2010        Included           Included            Included
PowerPoint 2010   Included           Included            Included
OneNote 2010      Included           Included            Included
Outlook 2010      -                  Included            Included
Access 2010       -                  -                   Included
Publisher 2010    -                  -                   Included
Introduction to Excel
Opening A Document
 Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the
directory area or drive to look for files in other locations.
 To create a new workbook, click on File-New-Blank Workbook.
1– 158
Cont….
Saving And Closing A Document
 To save your document with its current filename, location and file format,
click on File - Save.
 When you have finished working on a document you should close it. Go to
the File menu and click on Close.
1– 159
Cont….
Excel Screen
1– 160
Cont….
Menu Bar in Excel
1– 161
Cont….
Excel Screen
Workbooks and Worksheets
1– 162
Cont….
Cell
Row
Column
Spreadsheet
Workbook
1– 163
Cont….
Cell Name Box
Spreadsheet Tabs in Excel
Moving Around the Worksheet
1– 164
Cont….
Margins
Orientation
Paper Size
Print Area
1– 165
Cont….
Margin Options in Excel
1– 166
Cont….
Orientation Options in Excel
1– 167
Cont….
Print Area Selection
Moving between Cells
 While working with any Office productivity tool, the clipboard functions are
invaluable.
 The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’.
 In the Microsoft Office suite, there are keyboard shortcuts for these functions.
1– 168
KEYBOARD SHORTCUTS
Cut Ctrl + X
Copy Ctrl + C
Paste Ctrl + V
Entering Data in Excel
 A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and
the columns are labeled with letters. Each intersection of a row and a column is a cell.
1– 169
Entering Labels
Entering Values
Rounding Numbers that Meet Specified Criteria
Sorting by Columns
1– 170
Cont….
Descriptive Statistics
 Excel includes elaborate and customisable toolbars, for example the “standard”
toolbar.
 Some of the icons are useful for mathematical computation. The “AutoSum” icon (Σ)
enters the formula “=SUM()” to add up a range of cells.
 The “Function Wizard” icon (fx) gives you access to all the functions
available.
 The “Graph Wizard” icon gives access to all the graph types available.
1– 171
Cont….
Excel can be used to generate measures of location and variability for a variable. Suppose we
wish to find descriptive statistics for a sample data: 2, 4, 6, and 8.
 Step 1: Open the Tools pull-down menu. If you see Data Analysis, click on this option;
otherwise, click on the Add-Ins... option to install the Analysis ToolPak.
 Step 2: Click on the Data Analysis option.
 Step 3: Choose Descriptive Statistics from the Analysis Tools list.
 Step 4: When the dialog box appears, enter A1:A4 in the Input Range box. A1 refers to the
cell in column A, row 1, which in this case holds the value 2. The remaining values are entered
in the same way, down to the last one in A4.
 Step 5: Select an output range, in this case B1. Tick Summary statistics to see the
results, then select OK.
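For comparison, the same measures can be computed outside Excel. The sketch below uses Python's standard `statistics` module on the sample 2, 4, 6 and 8. It assumes that the variance and standard deviation wanted are the sample (n − 1) versions; the dictionary keys are illustrative names, not Excel's exact output labels.

```python
import statistics

data = [2, 4, 6, 8]  # the sample from the worked example

summary = {
    "mean": statistics.mean(data),                 # 5
    "median": statistics.median(data),             # average of 4 and 6
    "sample_variance": statistics.variance(data),  # divides by n - 1
    "sample_std_dev": statistics.stdev(data),      # square root of the above
    "range": max(data) - min(data),
    "minimum": min(data),
    "maximum": max(data),
    "sum": sum(data),
    "count": len(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The squared deviations from the mean are 9, 1, 1 and 9, so the sample variance is 20/3 ≈ 6.67 and the sample standard deviation is about 2.58.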
1– 172
Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)
Manual Equation Entry
1– 173
Cont….
Arithmetic Functions in Excel
1– 174
Cont….
Function, Syntax and Description
SUM Function
The SUM function is probably the most commonly used function in Excel. It comes in three
flavours in Excel, namely:
1– 175
Cont….
SUM()
SUMIF()
SUMIFS()
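The difference between the three flavours is easiest to see side by side. In Excel the general forms are =SUM(range), =SUMIF(range, criteria, [sum_range]) and =SUMIFS(sum_range, criteria_range1, criteria1, ...). The Python sketch below mirrors each one on a small, made-up sales table; the data and the cell references in the comments are hypothetical.

```python
# Hypothetical worksheet rows: column A = region, B = product, C = amount
rows = [
    ("North", "Pens", 120),
    ("South", "Pens", 80),
    ("North", "Books", 200),
    ("South", "Books", 150),
]

# Like =SUM(C1:C4): add every amount.
total = sum(amount for _, _, amount in rows)

# Like =SUMIF(A1:A4, "North", C1:C4): add amounts that meet one condition.
north_total = sum(amount for region, _, amount in rows if region == "North")

# Like =SUMIFS(C1:C4, A1:A4, "North", B1:B4, "Pens"): several conditions at once.
north_pens = sum(
    amount
    for region, product, amount in rows
    if region == "North" and product == "Pens"
)

print(total, north_total, north_pens)  # 550 320 120
```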
1– 176
Cont….
Logical Functions
TRUE
AND ()
IF ()
OR ()
NOT
IFERROR ()
FALSE
Statistical Functions
 Statistical functions are invaluable in mathematical calculations.
 They can provide insights into trends, provide data for detailed analysis, and
help identify gaps that need to be plugged.
 Excel provides a wide range of functions that can be used to perform basic
statistical analyses.
1– 177
Statistical Analysis
Creating Charts
 Select the data range (only numbers) for which the chart needs to be created.
 On the Insert ribbon, in the Charts group, click on the type and category of chart you want
to create. Here the clustered chart has been used.
 Select the chart and click on the Select Data button in the Data group of the Design
tab.
 In the Select Data Source dialog, select ‘Series 1’ and click on Edit button.
1– 178
Cont….
1– 179
Cont….
Select Data Source
This opens the Edit Series dialog, which allows you to change the range of values in the series and
provide a series name. For the series name, click on the icon to select the column title of Series
1.
1– 180
Cont….
Edit Series
Histogram
Now follow the steps given below to draw a histogram.
 Select the first two columns, i.e. class interval and frequency, in the Excel sheet.
 Click on the ‘Chart Wizard’ icon on the toolbar, or select [Insert → Chart…] from the
Insert drop-down menu. A dialogue box titled ‘Chart Wizard – Step 1 of 4 – Chart
Type’ will appear.
 On the ‘Standard Types’ tab, select ‘Column’. Click on the ‘Next’ button.
 The next dialogue box, titled ‘Chart Wizard – Step 2 of 4 – Chart Source Data’, will
appear. Since we have already selected the source data, select ‘Next’. Don’t forget to check
that ‘Columns’ is selected for the data series.
 The next dialogue box, titled ‘Chart Wizard – Step 3 of 4 – Chart Options’, will
appear.
1– 181
Cont….
Correlation Plot and Regression Analysis
Using MS Excel for calculating Karl Pearson’s correlation coefficient
Calculating Karl Pearson’s correlation coefficient using MS Excel is very simple. The steps
are as follows:
 Open an Excel worksheet and enter the data values of the X and Y variables as two
arrays (columns or rows). Keep these contiguous if possible.
 Select the cell where you want to store the result r. Enter the formula with the syntax
=CORREL(array1, array2)
where ‘array1’ is a cell range of values and ‘array2’ is a second cell range of values.
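To see what the formula computes, here is a minimal Python sketch of Karl Pearson's coefficient written out from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name `pearson_r` and the five data pairs are illustrative assumptions, not data from the course.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]   # hypothetical X values (e.g. cells A1:A5)
y = [2, 4, 5, 4, 5]   # hypothetical Y values (e.g. cells B1:B5)

r = pearson_r(x, y)
print(round(r, 4))    # 0.7746
```

Here Σ(x − x̄)(y − ȳ) = 6, Σ(x − x̄)² = 10 and Σ(y − ȳ)² = 6, so r = 6/√60 ≈ 0.7746.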
1– 182
Normal Distribution
NORMDIST returns the normal distribution for the specified mean and standard deviation. This
function has a very wide range of applications in statistics, including hypothesis testing.
Syntax: NORMDIST(x,mean,standard_dev,cumulative)
 X is the value for which you want the distribution.
 Mean is the arithmetic mean of the distribution.
 Standard_dev is the standard deviation of the distribution.
 Cumulative is a logical value: TRUE returns the cumulative distribution function, and
FALSE returns the probability density function.
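As a rough Python counterpart, the sketch below wraps `statistics.NormalDist` in a hypothetical helper named `normdist` that takes the same four arguments. The fourth argument chooses between the cumulative probability P(X ≤ x) when true and the height of the density curve at x when false. The values 42, 40 and 1.5 are chosen only for illustration.

```python
from statistics import NormalDist

def normdist(x, mean, standard_dev, cumulative):
    """Sketch of NORMDIST(x, mean, standard_dev, cumulative) in Python."""
    dist = NormalDist(mean, standard_dev)
    # cumulative=True  -> P(X <= x); cumulative=False -> density at x
    return dist.cdf(x) if cumulative else dist.pdf(x)

print(round(normdist(42, 40, 1.5, True), 4))    # 0.9088
print(round(normdist(42, 40, 1.5, False), 4))   # 0.1093
```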
1– 183
Brief about SPSS
SPSS Statistics is a software package used for statistical analysis.
SPSS Files
 SPSS uses several types of files. The first is the data file, which contains the Data View and
the Variable View. Its contents are entered using the SPSS Data Editor window. It is known as an
SPSS system file.
1– 184
Cont….
1– 185
Cont….
SPSS Data Editor Window – Data View
1– 186
Cont….
Data Editor Window – Variable View
Define Variable Dialog Box
1– 187
Cont….
Student Motivation
Not willing
Undecided
Willing
1– 188
Cont….
Value Labels – Dialog Box
Value Labels Coded with Value and Value Label
1– 189
SPSS Data Editor Window with All Records Entered
Summary
 Microsoft Office is one of the most powerful office productivity tools in the market
today. The entire suite is vast and covers a wide range of software solutions catering to various
aspects of modern businesses.
 Microsoft Excel is a powerful accounting and calculation solution. It has a
standard tabular layout and it supports a wide range of arithmetic, accounting and
statistical functions.
 Microsoft Outlook is a mail client that can be set up to download mail from a
mail server as well as send and receive emails as desired. Being a part of the Microsoft
Office suite, this tool is compatible with other applications in the suite.
1– 190
Cont….
 One of the most popular and widely used Microsoft Office suites is MS Office
2003. Later Microsoft released two other versions of Office, namely Office 2007 and
Office 2010. Although Office 2010 is the latest version, many businesses still continue to
use Office 2003. From Office 2003 to Office 2007, Microsoft radically changed the overall look
and feel of the Office suite.
 Excel is built on the concepts of cells, rows, columns, spreadsheets and workbooks. The
entire structure is hierarchical, and this allows it to be scalable and versatile enough to adapt to
the varying needs of users from different specialisations. Understanding these
concepts is useful in developing complex reports and models.
1– 191
Cont….
 As long as you work on the soft copies, page layouts are not really important – you
can scroll a spreadsheet to view the contents. However, when it comes to printouts it is
important that one gets the page layouts sorted out. Excel 2010 has all the page layout
options under Page Layout menu item.
 While working with any Office productivity tool, the clipboard functions are
invaluable. The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’. In the
Microsoft Office suite, there are keyboard shortcuts for these functions. Once you become
conversant with the Excel functions, you would prefer to use the keyboard shortcuts as
they are faster and easier to use than the mouse.
1– 192
Cont….
 A new worksheet is a grid of rows and columns. The rows are labelled with
numbers, and the columns are labelled with letters. Each intersection of a row and a
column is a cell. Each cell has an address, which is the column letter followed by the row
number. For example, cell A1 lies in column A and row 1; when it is
highlighted, it is the active cell. A cell must be active to enter
information into it.
 Excel is a very powerful accounting tool, but before going to the more complex
functions, let us see how to use Excel for simple calculations. There are two ways
of using Excel for simple calculations: you can enter the actual arithmetic equations in the
cell or use pre-defined Excel formulas to do the same.
1– 193
Cont….
 Statistical calculations for random variables can be carried out using the
statistical functions available in MS Excel. NORMDIST returns the normal
distribution for the specified mean and standard deviation. This function has a very wide
range of applications in statistics, including hypothesis testing. Syntax:
NORMDIST(x,mean,standard_dev,cumulative)
 SPSS Statistics is a software package used for statistical analysis. Long produced by
SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named
IBM SPSS Statistics. Companion products in the same family are used for survey
authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text
analytics, and collaboration and deployment (batch and automated scoring services).
1– 194
1– 195

Business statistics q_tts9fr8xc

  • 2. S. No. Reference No. Particulars Slide From – To 1. Chapter 1 Introduction to Business Statistics 08 – 21 2. Chapter 2 Descriptive Statistics: Collection, Processing and Presentation of Data 22 – 36 3. Chapter 3 Measures of Central Tendency 37 – 51 4. Chapter 4 Measures of Dispersion 52 – 66 5. Chapter 5 Skewness and Kurtosis 67 – 79 6. Chapter 6 Correlation Analysis 80 – 98 7. Chapter 7 Regression Analysis 99 – 114 8. Chapter 8 Theory of Probability 115 – 134 9. Chapter 9 Probability Distribution 135 – 153 10. Chapter 10 Use of Excel Software for Statistical Analysis 154 – 194 Course Index 1– 2
  • 3.  Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools not only have application in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on. Course Introduction 1– 3 Cont….
  • 4.  The word statistics is derived from the Italian word ‘Stato’ which means ‘state’; and ‘Statista’ refers to a person involved with the affairs of state. Thus, statistics originally was meant for collection of facts useful for affaires of the state, like taxes, land records, population demography, etc. 1– 4 Cont….
  • 5.  Significant contribution has also been made by Indians in the field of statistics. Prof Prasant Chandra Mahalanobis, is the first to pioneer the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in1931. Mahalanobis viewed statistics as a tool in increasing the efficiency of all human efforts and also concentrated on sample surveys.  Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement”. – Webster 1– 5 Cont….
  • 6.  Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics, Applied Statistics  Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc. 1– 6 Cont….
  • 7.  Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiment, clinical trials, marketing research, human resource planning, inventory management, quality management etc., require statistical treatment before arriving at valid conclusions.  Functions of statistics are Condensation, Comparison, Forecast, Testing of hypotheses, Preciseness, Expectation.  Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries. 1– 7
  • 8. Introduction to Business Statistics S. No. Reference No. Particulars Slide From – To 1. Learning Objectives 09 – 09 2. Topic 1 Introduction 10 – 10 3. Topic 2 Development of Statistics 11 – 11 4. Topic 3 Definitions of Statistics 12 – 12 5. Topic 4 Importance of Statistics 13 – 13 6. Topic 5 Classification of Statistics 14 – 14 7. Topic 6 Role of Statistics 15 – 15 8. Topic 7 Functions of Statistics 16 – 16 9. Topic 8 Limitations of Statistics 17 – 17 10. Topic 9 Summary 18 – 21 1– 8
  • 9. Learning Objectives After studying this chapter, you should be able to:  Understand the development, importance and role of statistics  Explain the basic concept of statistical studies  Understand the application of statistics in business and management  Learn about functions and limitations of statistics 1– 9
  • 10. Introduction  Information derived from good statistical analysis is always precise and never useless.  One of the primary tasks of a manager is decision-making.  Statistical techniques offer powerful tools in the decision-making process.  These tools have power to interpret quantitative information in a scientific and an objective manner. 1– 10
  • 11. Development of Statistics  The word statistics is derived from the Italian word ‘Stato’ which means ‘state’; and ‘Statista’ refers to a person involved with the affairs of state.  Statistics originally was meant for collection of facts useful for affaires of the state, like taxes, land records, population demography, etc.  During ancients times even before 300BC, the rulers and kings, like Chandragupta Maurya used statistics to maintain the land and revenue records, collection of taxes and registration of births and deaths. 1– 11
  • 12. Definitions of Statistics  “Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement”. – Webster  “By statistics we mean quantitative data affected to a marked extent by multiplicity of causes”. –Yule and Kendall  “Statistics may be defined as the science of collection, presentation, analysis and interpretation of data”. – Croxton and Cowden 1– 12
  • 13. Importance of Statistics  Identify what information or data is worth collecting,  Decide when and how judgments may be made on the basis of partial information, and  Measure the extent of doubt and risk associated with the use of partial information and stochastic processes. 1– 13
  • 15. Role of Statistics 1– 15 Role of Statistics in Business Role of Statistics in Decision Making Role of Statistics in Research
  • 16. Functions of Statistics 1– 16 Condensation Comparison Forecast Testing of Hypotheses Preciseness Expectation Laws of Statistics
  • 17. Limitations of Statistics 1– 17 COMMON STATISTICAL ISSUES DISTRUST OF STATISTICS MISUSE OF STATISTICS
  • 18. Summary  Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools have application not only in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.  The word statistics is derived from the Italian word ‘Stato’ which means ‘state’; and ‘Statista’ refers to a person involved with the affairs of state. Thus, statistics was originally meant for the collection of facts useful for the affairs of the state, like taxes, land records, population demography, etc. 1– 18 Cont….
  • 19.  Indians have also made significant contributions to the field of statistics. Prof. Prasanta Chandra Mahalanobis pioneered the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.  Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement.  Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics. 1– 19 Cont….
  • 20.  Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.  Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiment, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions.  Functions of statistics are Condensation, Comparison, Forecast, Testing of hypotheses, Preciseness and Expectation. 1– 20 Cont….
  • 21.  Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.  More dangerous than distrust is misuse of statistics to draw convenient conclusions to satisfy selfish or ulterior motives. Arguments and analysis supported by facts, figures, charts, graphs, index numbers, etc. are indeed very appealing and convincing. They can be used to intimidate opposing views. Hence, statistics is open to manipulation. 1– 21
  • 22. Descriptive Statistics: Collection, Processing and Presentation of Data S. No. Reference No. Particulars Slide From – To 1. Learning Objectives 23 – 23 2. Topic 1 Introduction 24 – 24 3. Topic 2 Descriptive and Inferential Statistics 25 – 26 4. Topic 3 Collection of Data 27 – 27 5. Topic 4 Editing and Coding of Data 28 – 28 6. Topic 5 Classification of Data 29 – 29 7. Topic 6 Tabulation of Data 30 – 30 8. Topic 7 Diagrammatic and Graphical Presentation of Data 31 – 32 9. Topic 8 Summary 33 – 36 1– 22
  • 23. Learning Objectives After studying this chapter, you should be able to:  Describe descriptive and inferential statistics  Explain collection, editing and classification of primary and secondary data  Define tabulation and presentation of data  Understand diagrammatic and graphical presentation  Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and Ogives 1– 23
  • 24. Introduction  Success of any statistical investigation depends on the availability of accurate and reliable data.  These depend on the appropriateness of the method chosen for data collection.  Data collection is a very basic activity in decision-making.  Data may be classified either as primary data or secondary data.  Successful use of the collected data depends to a great extent upon the way it is arranged, displayed and summarized. 1– 24
  • 25. Descriptive and Inferential Statistics 1– 25 Descriptive Statistics  Descriptive statistics is the type of statistics that probably comes to most people's minds when they hear the word “statistics.” Cont….
  • 26. 1– 26 Inferential Statistics  Inferential statistics studies a statistical sample, and from this analysis we are able to say something about the population from which the sample came.
  • 28. Editing and Coding of Data Editing Primary Data  Completeness  Consistency  Accuracy  Homogeneity Editing Secondary Data  Field Editing  Central Editing Coding of Data  Coding is the process of assigning symbols (alphabetical, numerical or both) to the answers so that the responses can be recorded into a limited number of classes or categories. 1– 28
  • 29. Classification of Data Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities. 1– 29 Rules of Classification Frequency Distribution Bases of Classification 1 2 3
  • 30. Tabulation of Data  Tabulation is arranging the data in a flat table (two-dimensional array) format by grouping the observations.  A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data. Types of Tabulation: One-Way Tabulation, Two-Way Tabulation, Multi-Way Tabulation. Advantages of Tabulation 1– 30
  • 31. Diagrammatic and Graphical Presentation of Data 1– 31 Cont…. Difference between Diagrams and Graphs
      1. Diagram: can be drawn on ordinary paper. Graph: is drawn on graph paper.
      2. Diagram: easy to grasp. Graph: needs some effort to grasp.
      3. Diagram: not capable of analytical treatment. Graph: capable of analytical treatment.
      4. Diagram: can be used only for comparisons. Graph: can be used to represent a mathematical relation.
      5. Diagram: data are represented by bars, rectangles, pictures, etc. Graph: data are represented by lines or curves.
  • 32. 1– 32 OGIVES FREQUENCY POLYGON PIE DIAGRAM HISTOGRAM BAR DIAGRAM TYPES OF DIAGRAMS
  • 33. Summary  There are two major divisions of the field of statistics, namely descriptive and inferential statistics. Both the segments of statistics are important, and accomplish different objectives.  Data can be obtained through primary source or secondary source according to need, situation, convenience, time, resources and availability. The most important method for primary data collection is through questionnaire. Data must be objective and fact-based so that it helps a decision-maker to arrive at a better decision.  Statistical data is a set of facts expressed in quantitative form. Data is collected through various methods. Sometimes our data set consists of the entire population we are interested in. In other situations, data may constitute a sample from some population. 1– 33 Cont….
  • 34.  Type of research, its purpose, conditions under which the data are obtained will determine the method of collecting the data. If relatively few items of information are required quickly, and funds are limited telephonic interviews are recommended. If respondents are industrial clients Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect data.  The quality of information collected through the filling of a questionnaire depends, to a large extent, upon the drafting of its questions. Hence, it is extremely important that the questions be designed or drafted very carefully and in a tactful manner.  Before any processing of the data, editing and coding of data is necessary to ensure the correctness of data. In any research studies, the voluminous data can be handled only after classification. Data can be presented through tables and charts. 1– 34 Cont….
  • 35.  Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.  A frequency distribution is the principal tabular summary of either discrete data or continuous data. The frequency distribution may show actual, relative or cumulative frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are the less-than ogive and the more-than ogive.  Once the raw data is collected, it needs to be summarized and presented to the decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data, but also makes it easy to understand. Tabulation is the fastest way to extract information from the mass of data and hence popular even among those not exposed to the statistical method. 1– 35 Cont….
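The actual, relative and cumulative frequencies mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: the data values are hypothetical, and the names `cum_up_to` and `cum_from` are chosen here for the two cumulative counts that would be plotted as less-than and more-than ogives.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical discrete data: defects found in 10 inspected lots.
data = [2, 0, 1, 2, 3, 1, 2, 0, 1, 2]

counts = Counter(data)
values = sorted(counts)                       # distinct values, ascending
actual = [counts[v] for v in values]          # actual frequencies
relative = [f / len(data) for f in actual]    # relative frequencies (sum to 1)
cum_up_to = list(accumulate(actual))          # observations <= each value
# observations >= each value: total minus those strictly below it
cum_from = [len(data) - c + f for c, f in zip(cum_up_to, actual)]

for row in zip(values, actual, relative, cum_up_to, cum_from):
    print(row)
```

For grouped (continuous) data the same running totals would be taken over class frequencies and plotted against the class limits.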
  • 36.  Charts help in grasping the data and analyzing it qualitatively. They also help managers to present the data effectively as a part of reports. Various types of chart are bar diagrams, multiple bar diagrams, component bar diagrams, deviation bar diagrams, sliding bar diagrams, histograms and pie charts.  A graphic presentation is another way of representing the statistical data in a simple and intelligible form. There are two types of graphs which we have discussed, line graphs and ogives. 1– 36
  • 37. Measures of Central Tendency S. No. Reference No. Particulars Slide From – To 1. Learning Objectives 38 – 38 2. Topic 1 Introduction 39 – 39 3. Topic 2 Characteristics of Central Tendency 40 – 41 4. Topic 3 Arithmetic Mean 42 – 42 5. Topic 4 Median 43 – 43 6. Topic 5 Mode 44 – 44 7. Topic 6 Empirical Relationship between Mean, Median and Mode 45 – 45 8. Topic 7 Limitations of Central Tendency 46 – 46 9. Topic 8 Summary 47 – 51 1– 37
  • 38. Learning Objectives After studying this chapter, you should be able to:  Understand the concept and characteristics of central tendency  Describe all the measures of central tendency: mean, median and mode.  Explain merits and demerits of all measures of central tendency.  Discuss partition values or positional measures like quartiles, deciles and percentiles. 1– 38
  • 39. Introduction  The concept of central tendency plays a dominant role in the study of statistics.  In many frequency distributions, the tabulated values show a distinct tendency to cluster or to group around a typical central value.  This behaviour of the data to concentrate the values around a central part of distribution is called ‘Central Tendency’ of the data. 1– 39
  • 40. Characteristics of Central Tendency A good measure of central tendency should possess as far as possible the following characteristics:  Easy to understand.  Simple to compute.  Based on all observations.  Uniquely defined.  Possibility of further algebraic treatment.  Not unduly affected by extreme values. 1– 40 Cont….
  • 41. 1– 41 Common Measures of Central Tendency Mean Median Mode
  • 42. Arithmetic Mean  The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the number of items. In algebraic language, if X1, X2, X3, ..., Xn are the n values of a variate X, then Mean = (X1 + X2 + X3 + ... + Xn) / n. 1– 42 Properties of Arithmetic Mean Calculation of Simple Arithmetic Mean Merits and Demerits of Arithmetic Mean Weighted Arithmetic Mean
  • 43. 1– 43 Median Median is the value, which divides the distribution of data, arranged in ascending or descending order, into two equal parts. Thus, the ‘Median’ is a value of the middle observation.  Calculation of Median  Merits and Demerits of Median  Partition Values or Positional Measures  Quartiles  Deciles  Percentiles
  • 44. 1– 44  Mode is the value which has the greatest frequency density. Mode is denoted by Z.  Calculation of Mode  Merits and Demerits of Mode  Graphic Location of Mode Mode
  • 45. Empirical Relationship between Mean, Median and Mode  A distribution in which the mean, the median, and the mode coincide is known as a symmetrical (bell shaped) distribution. The normal distribution is one such symmetric distribution, which is very commonly used.  If the distribution is skewed, the mean, the median and the mode are not equal. In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as: Mean – Median = (Mean – Mode) / 3 Mode = 3 * Median – 2 * Mean 1– 45
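As a small worked illustration of the empirical relationship, the sketch below estimates the mode from a given mean and median and checks that both forms of the relation agree. The figures are hypothetical, and the relation itself is only an approximation for moderately skewed distributions.

```python
def mode_from_mean_median(mean, median):
    """Empirical estimate: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

# Hypothetical summary figures for a moderately skewed distribution.
mean, median = 52.0, 50.0
mode_estimate = mode_from_mean_median(mean, median)

# The two forms on the slide are algebraically equivalent:
# Mean - Median = (Mean - Mode) / 3
lhs = mean - median
rhs = (mean - mode_estimate) / 3
print(mode_estimate, lhs, rhs)
```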
  • 46. Limitations of Central Tendency The arithmetic mean (AM) is not a suitable measure of central tendency:  In case of highly skewed data.  In case of uneven or irregular spread of the data.  In open-end distributions.  When average growth or average speed is required.  When there are extreme values in the data.  Except in these cases, the AM is widely used in practice. 1– 46
  • 47. Summary  Measures of the central tendency give one of the very important characteristics of the data. According to the situation, one of the various measures of central tendency may be chosen as the most representative.  Arithmetic mean is widely used and understood. What characterizes the three measures of centrality, and what are the relative merits of each in the given situation, is the question.  Mean summarizes all the information in the data. Mean can be visualized as a single point where all the mass (the weight) of the observations is concentrated. It is like a centre of gravity in physics. Mean also has some desirable mathematical properties that make it useful in the context of statistical inference. 1– 47 Cont….
  • 48.  To simplify the manual calculation, we may sometimes use shift of origin and change of scale. Shifting of origin is achieved by adding or subtracting a constant to all observations. In case of discrete data we add or subtract (usually subtract) a constant to the individual observations. Whereas for grouped data, we add or subtract (usually subtract) the constant to the class mark values.  There are cases where relative importance of the different items is not the same. In such a case, we need to compute the weighted arithmetic mean. The procedure is similar to the grouped data calculations studied earlier, when we consider frequency as a weight associated with the class-mark.  Median is the middle value when the data is arranged in order. The median is resistant to the extreme observations. Median is like the geometric centre in physics. In case we want to guard against the influence of a few outlying observations (called outliers), we may use the median. 1– 48 Cont….
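The shift of origin and change of scale described above can be checked numerically. The sketch below uses hypothetical class marks and frequencies; the origin `A` and scale `h` are arbitrary choices made for the illustration. The direct calculation is also the weighted arithmetic mean, with the frequencies acting as weights.

```python
# Hypothetical grouped data: class marks and their frequencies.
class_marks = [105, 115, 125, 135, 145]
frequencies = [4, 6, 10, 7, 3]
n = sum(frequencies)

# Direct (weighted) mean: frequencies act as weights on the class marks.
direct_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Shortcut: shift the origin to A and change the scale by h.
A, h = 125, 10                      # assumed origin and class width
u = [(x - A) / h for x in class_marks]
shortcut_mean = A + h * sum(f * ui for f, ui in zip(frequencies, u)) / n

print(direct_mean, shortcut_mean)   # the two results agree
```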
  • 49.  Quantiles are related positional measures of central tendency. These are useful and frequently employed measures. The most familiar quantiles are Quartiles, Deciles, and Percentiles.  Quartiles are position values similar to the Median. There are three quartiles denoted by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile: in a distribution, one fourth of the items are less than Q1 and the other three fourths are greater than Q1. The second quartile Q2 is nothing but the median. Q3 is called the upper quartile or third quartile.  Inter-quartile range is defined as the difference between the first and third quartile. It is a measure of spread of the data.  D1, D2, D3… and D9 are the nine deciles. They divide a series into 10 equal parts. One tenth of the items are less than or equal to D1, one tenth of the items are more than or equal to D9, and one tenth of the items lie between any successive pair of deciles when all the items are in ascending order. 1– 49 Cont….
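A minimal sketch of quartiles, inter-quartile range and deciles, using Python's standard `statistics.quantiles`. The data are hypothetical, and the `"exclusive"` method is an assumption made here; other interpolation conventions give slightly different cut points.

```python
import statistics

# Hypothetical ordered observations.
data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45]

# Three quartiles split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                                   # inter-quartile range

# Nine deciles split the data into ten parts.
deciles = statistics.quantiles(data, n=10, method="exclusive")

print(q1, q2, q3, iqr)
print(deciles)
```

Here Q2 coincides with `statistics.median(data)`, as the slide states.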
  • 50. 1– 50 Cont….  The Pth percentile of a group of observations is that observation below which lie P% (P percent) of the observations. The position of the Pth percentile is given by (n + 1)P / 100, where ‘n’ is the number of data points.  If the value of (n + 1)P / 100 is a fraction, we need to interpolate the value.  The Mode of a data set is the value that occurs most frequently. There are many situations in which the arithmetic mean and median fail to reveal the true characteristics of the data (the most representative figure), for example, most common size of shoes, most common size of garments etc. In such cases, mode is the best-suited measure of the central tendency.  A distribution in which the mean, the median, and the mode coincide is known as a symmetrical (bell shaped) distribution. The normal distribution is one such symmetric distribution, which is very commonly used.
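The interpolation step for a fractional percentile position can be sketched as follows. This assumes the position convention (n + 1)P / 100, which is one common textbook convention and not the only one; the function name `percentile` and the data are hypothetical.

```python
def percentile(data, p):
    """P-th percentile at position (n + 1) * P / 100, linearly interpolated."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100          # 1-based position in the ordered data
    if pos <= 1:
        return xs[0]                 # position falls before the first item
    if pos >= n:
        return xs[-1]                # position falls after the last item
    k = int(pos)                     # whole part of the position
    frac = pos - k                   # fractional part to interpolate over
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40]   # hypothetical, n = 10
print(percentile(data, 50))   # position 5.5: halfway between 22 and 25
print(percentile(data, 25))   # position 2.75: three quarters from 15 to 18
```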
  • 51.  In a moderately skewed distribution, the empirical relationship between the mean, the median and the mode can be expressed as:  Mean – Median = (Mean – Mode) / 3  Mode = 3 * Median – 2 * Mean  No single average can be regarded as the best or most suitable under all circumstances. Each average has its merits and demerits and its own particular field of importance and utility. A proper selection of an average depends on the (1) nature of the data and (2) purpose of enquiry or requirement of the data. 1– 51
  • 52. Measures of Dispersion S. No. Reference No. Particulars Slide From – To 1. Learning Objectives 53 – 53 2. Topic 1 Introduction 54 – 54 3. Topic 2 Characteristics of Measures of Dispersion 55 – 55 4. Topic 3 Absolute and Relative Measures of Dispersion 56 – 57 5. Topic 4 Range 58 – 59 6. Topic 5 Inter-quartile Range and Deviations 60 – 60 7. Topic 6 Variance and Standard Deviation 61 – 62 8. Topic 7 Summary 63 – 66 1– 52
  • 53. Learning Objectives After studying this chapter, you should be able to:  Understand absolute and relative measures of variation  Learn about range and inter-quartile range  Discuss variance, standard deviation, mean deviation and coefficient of variation  Study the empirical relationship between different measures of variation 1– 53
  • 54. Introduction  A measure of dispersion or variation in any data shows the extent to which the numerical values tend to spread about an average. 1– 54 A measure of dispersion is useful:  To compare the current results with the past results.  To compare two or more sets of observations.  To suggest methods to control variation in the data.
  • 55. Characteristics of Measures of Dispersion 1– 55 It should be easy to compute. It should be rigidly defined. It should be based on each individual item of the distribution. It should be capable of further algebraic treatment. It should have sampling stability. It should not be unduly affected by the extreme items. It should be simple to understand.
  • 56. Absolute and Relative Measures of Dispersion  ‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a measure of absolute dispersion to an appropriate average.  A precise measure of dispersion is one which gives the magnitude of the variation in a series, i.e. it measures in numerical terms, the extent of the scatter of the values around the average. 1– 56 Cont….
  • 57. 1– 57 ABSOLUTE AND RELATIVE MEASURES OF DISPERSION Measures of Dispersion Relative Variability The range Relative range The Quartile Deviation Relative Quartile Deviation The Mean Deviation Relative Mean deviation The Median Deviation Coefficient of Variation The Standard Deviation Graphical Method
  • 58. 1– 58 Cont…. Range The ‘Range’ of the data is the difference between the largest value of data and smallest value of data.
  • 59. Merits and Demerits of Range Merits  Range is the simplest method of studying dispersion.  It takes less time to compute the ‘absolute’ and ‘relative’ range. Demerits  Range does not take into account all the values of a series, i.e. it considers only the extreme items, and the middle items are not given any importance.  Range cannot be computed in the case of an ‘open-end’ distribution, i.e. a distribution where the lower limit of the first group and the upper limit of the last group are not given. 1– 59
  • 60. Inter-Quartile Range and Deviations Inter-quartile Range  Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile). Quartile Deviation  Quartile Deviation is half of the difference between the upper quartile and the lower quartile, i.e. (Q3 – Q1) / 2. Mean Deviation  Mean deviation is the arithmetic mean of the absolute deviations of the values about their arithmetic mean or median or mode. 1– 60
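The three measures on this slide can be computed side by side. The sketch below is illustrative only: the data are hypothetical, and the quartiles use the `"exclusive"` method of `statistics.quantiles`, which is an assumption about the interpolation convention.

```python
import statistics

data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45]   # hypothetical

q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1              # inter-quartile range
qd = iqr / 2               # quartile deviation (semi inter-quartile range)

def mean_deviation(values, centre):
    """Arithmetic mean of absolute deviations about a chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

md_about_mean = mean_deviation(data, statistics.mean(data))
md_about_median = mean_deviation(data, statistics.median(data))

print(iqr, qd, md_about_mean, md_about_median)
```

In this example the mean deviation about the median is the smaller of the two, in line with the property that it is lowest when taken from the median.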
  • 61. Variance and Standard Deviation 1– 61 Cont…. Variance is defined as the average of squared deviation of data points from their mean.
  • 62. 1– 62 Different Formulae for Calculating Variance Calculation of Standard Deviation Properties of Standard Deviation Merits and Demerits of Standard Deviation Standard Deviation of Combined Means Coefficient of Variation Empirical Relationship Between Different Measures of Variation
  • 63. Summary  Study of distribution is very important for decision-making. Usually, measures of central tendency and variability are adequate for taking decisions. However, if the data is quite different from a normal distribution, then measures of skewness and kurtosis need to be considered. We discussed measures of variability: Range, Variance and Standard Deviation.  A measure of dispersion gives an idea about the extent of lack of uniformity in the sizes and qualities of the items in a series. It helps us to know the degree of uniformity and consistency in the series. If the difference between items is large, the dispersion or variation is large, and vice versa. 1– 63 Cont….
  • 64.  The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as marks of the students in a particular subject, the absolute dispersion will provide the value in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of absolute dispersion.  The ‘Range’ of the data is the difference between the largest value of data and smallest value of data. This is an absolute measure of variability. However, if we have to compare two sets of data, ‘Range’ may not give a true picture. In such a case, a relative measure of range, called the coefficient of range, is used.  Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile). Quartile Deviation is half of the difference between the upper quartile and the lower quartile. 1– 64 Cont….
  • 65.  The average used for calculating deviation can be the mean, the median or the mode. However, usually the mean is used. There is also an advantage of taking deviations from the median, because the ‘Mean Deviation’ from the median is lowest as compared to any other ‘Mean Deviation’. Since absolute values of deviations ignoring sign are taken for calculating Mean Deviation, the mean deviation is not amenable to further algebraic treatment.  The variance is the average squared deviation of the data from their mean. For sample data, we take the average by dividing by (n – 1), where n is the sample size. This accounts for the degrees of freedom. For population data, we average by dividing by the population size N.  The Standard Deviation (SD) of a set of data is the positive square root of the variance of the set. This is also referred to as the Root Mean Square (RMS) value of the deviations of the data points from their mean. The SD of a sample is the square root of the sample variance. 1– 65 Cont….
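The difference between dividing by N and by (n – 1) can be seen in a short sketch using Python's standard `statistics` module. The data are hypothetical; the last lines also check that adding a constant to every observation (a shift of origin) leaves the standard deviation unchanged.

```python
import math
import statistics

data = [4.0, 8.0, 6.0, 5.0, 7.0]          # hypothetical observations, mean 6

mean = sum(data) / len(data)
squared_dev = sum((x - mean) ** 2 for x in data)    # sum of squared deviations

population_variance = squared_dev / len(data)       # divide by N
sample_variance = squared_dev / (len(data) - 1)     # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Shifting the origin does not change the spread.
shifted = [x + 100 for x in data]
shifted_sd = statistics.pstdev(shifted)

print(population_variance, sample_variance, population_sd, shifted_sd)
```

The hand calculation agrees with `statistics.pvariance` and `statistics.variance` for the same data.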
  • 66.  There is no effect of shifting origin on standard deviation or variance.  The measures of deviation are very effective in reports and presentations by business executives, who must present their data to a general public that does not understand statistical methods.  Variance analysis also helps in managing budgets by controlling budgeted versus actual costs. Without the standard deviation, you can’t compare two data sets effectively. 1– 66
  • 67. Skewness and Kurtosis S. No. Reference No. Particulars Slide From – To 1. Learning Objectives 68 – 68 2. Topic 1 Introduction 69 – 70 3. Topic 2 Karl Pearson’s Coefficient of Skewness (SKP) 71 – 71 4. Topic 3 Bowley’s Coefficient of Skewness (SKB) 72 – 72 5. Topic 4 Kelly’s Coefficient of Skewness (SKK) 73 – 73 6. Topic 5 Measures of Kurtosis 74 – 74 7. Topic 6 Moments 75 – 75 8. Topic 7 Summary 76 – 79 1– 67
  • 68. Learning Objectives After studying this chapter, you should be able to:  Understand the concept and different types of skewness  Discuss various measures of kurtosis  Learn about moments, its properties and coefficients based on moments 1– 68
  • 69. Introduction Skewness is a measure that studies the degree and direction of departure from symmetry. Nature of Skewness Skewness can be positive or negative or zero. When the values of mean, median and mode are equal, there is no skewness.  When mean > median > mode, skewness will be positive.  When mean < median < mode, skewness will be negative. 1– 69 Cont….
  • 70. Characteristic of a Good Measure of Skewness  It should be a pure number in the sense that its value should be independent of the unit of the series and also degree of variation in the series.  It should have zero-value, when the distribution is symmetrical.  It should have a meaningful scale of measurement so that we could easily interpret the measured value. Mathematical measures of skewness can be calculated by:  Karl-Pearson’s Method  Bowley’s Method  Kelly’s method 1– 70
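  • 71. Karl Pearson’s Coefficient of Skewness (SKP) 1– 71 Karl Pearson has suggested two formulae:  Where the relationship between mean and mode is established (the mode is well defined): SKP = (Mean – Mode) / Standard Deviation;  Where the relationship between mean and mode is not established (the mode is ill-defined): SKP = 3 (Mean – Median) / Standard Deviation.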
  • 72. Bowley’s Coefficient of Skewness (SKB)  Bowley’s method of skewness is based on the values of median, lower and upper quartiles. This method suffers from the same limitations as the median and quartiles.  Wherever positional measures are given, skewness should be measured by Bowley’s method. This method is also used in case of ‘open-end series’, where the importance of extreme values is ignored. Absolute skewness = Q3 + Q1 – 2 Median Coefficient of Skewness, (SkB) = (Q3 + Q1 – 2 Median) / (Q3 – Q1) Where, Q is quartile. 1– 72
  • 73. Kelly’s Coefficient of Skewness (SKK) Kelly’s coefficient of skewness is defined as: Skk = (P90 + P10 – 2 P50) / (P90 – P10) Where, P is percentile. Example: Calculate the Kelly’s coefficient of skewness from the following data: 1– 73
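Bowley's and Kelly's coefficients can both be sketched from positional measures. The functions below are an illustration, not the deck's worked example: the data are hypothetical, the standard forms (Q3 + Q1 – 2 Median) / (Q3 – Q1) and (P90 + P10 – 2 P50) / (P90 – P10) are assumed, and quartiles and percentiles are taken with the `"exclusive"` method of `statistics.quantiles`.

```python
import statistics

def bowley_skewness(data):
    """(Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def kelly_skewness(data):
    """(P90 + P10 - 2 * P50) / (P90 - P10)."""
    pct = statistics.quantiles(data, n=100, method="exclusive")  # 99 cut points
    p10, p50, p90 = pct[9], pct[49], pct[89]
    return (p90 + p10 - 2 * p50) / (p90 - p10)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # hypothetical, symmetric
right_skewed = [1, 2, 2, 3, 4, 6, 9, 14, 20]   # hypothetical, long right tail

print(bowley_skewness(symmetric), kelly_skewness(symmetric))
print(bowley_skewness(right_skewed), kelly_skewness(right_skewed))
```

Both coefficients are zero for the symmetric series and positive for the series with the long right tail.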
  • 74. Measures of Kurtosis  Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. The kurtosis is calculated either as an absolute or a relative value. Absolute kurtosis is always a positive number. 1– 74 Negative (relative) kurtosis indicates a flatter distribution than the normal distribution, and is called platykurtic. Positive (relative) kurtosis means a more peaked curve, called leptokurtic. The peakedness of the normal distribution is called mesokurtic.
  • 75. 1– 75 Moments  The arithmetic mean of various powers of these deviations in any distribution is called the moments of the distribution about mean. PROPERTIES OF MOMENTS COEFFICIENTS BASED ON MOMENTS
  • 76. Summary  Measures of Skewness and Kurtosis, like measures of central tendency and dispersion, study the characteristics of a frequency distribution. Averages tell us about the central value of the distribution and measures of dispersion tell us about the concentration of the items around a central value.  When two or more symmetrical distributions are compared, the difference in them is studied with ‘Kurtosis’. On the other hand, when two or more asymmetrical distributions are compared, they will give different degrees of Skewness. These measures are mutually exclusive i.e. the presence of skewness implies absence of kurtosis and vice-versa. 1– 76 Cont….
  • 77.  Bowley’s method of skewness is based on the values of median, lower and upper quartiles. This method suffers from the same limitations as the median and quartiles. Wherever positional measures are given, skewness should be measured by Bowley’s method. This method is also used in case of ‘open-end series’, where the importance of extreme values is ignored.  Kelly’s coefficient of skewness is defined as: Skk = (P90 + P10 – 2 P50) / (P90 – P10) Where, P is percentile. 1– 77 Cont….
  • 78.  Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. The kurtosis is calculated either as an absolute or a relative value. Absolute kurtosis is always a positive number. Absolute kurtosis of a normal distribution (symmetric bell shaped distribution) is taken as 3. It is taken as the datum to calculate relative kurtosis as follows: Absolute kurtosis = μ4 / (μ2)², where μ2 and μ4 are the second and fourth moments about the mean. Relative kurtosis = Absolute kurtosis – 3 1– 78 Cont….
  • 79.  Moments about the mean are generally used in statistics. We use the Greek letter μ (read as mu) for these moments. Consider a mass attached at each point proportional to its frequency and take moments about the mean. The first, second, third and fourth moments can be used as measures of central tendency, variation (dispersion), asymmetry and peakedness of the curve respectively. 1– 79
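Moments about the mean, and the kurtosis built from them, can be sketched as below. This assumes the usual moment-based definition, absolute kurtosis = μ4 / μ2², with relative kurtosis obtained by subtracting 3; the function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th moment about the mean: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def kurtosis(data):
    """Return (absolute, relative) kurtosis based on moments."""
    mu2 = central_moment(data, 2)      # variance (population form)
    mu4 = central_moment(data, 4)
    absolute = mu4 / mu2 ** 2
    return absolute, absolute - 3      # 3 is the normal-distribution datum

data = [1, 2, 3, 4, 5]                 # hypothetical, evenly spread values
absolute, relative = kurtosis(data)
print(central_moment(data, 1))         # first moment about the mean is zero
print(absolute, relative)              # relative < 0: flatter than normal
```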
  • 80. Correlation Analysis S. No. Reference No. Particulars Slide From – To 1. Learning Objectives 81 – 81 2. Topic 1 Introduction 82 – 83 3. Topic 2 Types of Correlation 84 – 84 4. Topic 3 Methods of Calculating Correlation 85 – 85 5. Topic 4 Scatter Diagram Method 86 – 86 6. Topic 5 Co-variance Method – The Karl Pearson’s Correlation Coefficient 87 – 88 7. Topic 6 Rank Correlation Method 89 – 89 8. Topic 7 Correlation Coefficient using Concurrent Deviation 90 – 91 9. Topic 8 Summary 92 – 98 1– 80
  • 81. Learning Objectives After studying this chapter, you should be able to:  Understand the concept of correlation  Study about different types of correlation  Describe various methods of calculating correlation such as scatter diagram method  Discuss various types of correlation coefficients viz, Karl Pearson correlation coefficient, rank correlation and coefficient based on concurrent deviations. 1– 81
  • 82. 1– 82 Cont…. Croxton and Cowden say, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation”. Introduction
  • 83. The study of correlation helps managers in following ways:  To identify relationship of various factors and decision variables.  To estimate value of one variable for a given value of other if both are correlated. E.g. estimating sales for a given advertising and promotion expenditure.  To understand economic behaviour and market forces.  To reduce uncertainty in decision-making to a large extent. 1– 83
  • 84. Types of Correlation 1– 84 Positive or Negative Correlation Simple or Multiple Correlations Partial or Total Correlation Linear and Non-linear Correlation
  • 85. Methods of Calculating Correlation 1– 85 Scatter Diagram Method Karl Pearson’s Coefficient of Correlation Rank Method Concurrent Deviation Method
  • 86. Scatter Diagram Method 1– 86 The pattern of points obtained by plotting the observed values is known as a scatter diagram. It gives us two types of information.  Whether the variables are related or not.  If so, what kind of relationship or estimating equation describes the relationship.
  • 87. Co-variance Method – The Karl Pearson’s Correlation Coefficient The correlation coefficient measures the degree of association between two variables X and Y. Karl Pearson’s formula for the correlation coefficient is given as, r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² × Σ(Y – Ȳ)²] Where r is the ‘Correlation Coefficient’ or ‘Product Moment Correlation Coefficient’ between X and Y, and X̄ and Ȳ are the means of X and Y. 1– 87 Cont….
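A minimal sketch of the product moment correlation coefficient, written in the standard deviation-from-mean form r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² × Σ(Y – Ȳ)²]. The function name and the paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient between paired series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                    # hypothetical paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_perfect = pearson_r(x, [2 * v + 1 for v in x])    # exact linear relation
r_inverse = pearson_r(x, [10 - v for v in x])       # exact inverse relation
print(r, r_perfect, r_inverse)
```

An exact increasing linear relation gives r = 1 and an exact decreasing one gives r = –1, which are the limits of the coefficient.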
  • 88. 1– 88 Estimation of Probable Error Interpretation of R Assumptions Underlying Karl Pearson’s Correlation Coefficient
  • 89. Rank Correlation Method 1– 89 RANK CORRELATION WHEN RANKS ARE GIVEN RANK CORRELATION WHEN RANKS ARE NOT GIVEN RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN
  • 90. Correlation Coefficient using Concurrent Deviation  This is the easiest method to find the correlation between two variables. The method is effective in giving the direction of the correlation as positive or negative, but it fails to give the accurate strength of the correlation. In this method we mark the fluctuation in each data series as increasing (+), decreasing (–) or equal. Then we count the number of items that increase, decrease or remain equal concurrently and denote it as c. The correlation coefficient is then calculated as, r = ± √[± (2c – n) / n], where both signs are taken to be the sign of (2c – n). Where, n = total number of pairs of deviations compared. c = Number of concurrent changes 1– 90 Cont….
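The counting procedure can be sketched as follows. Several points are assumptions made for the illustration: the formula is taken in its usual form r = ± √[± (2c – n) / n], n is taken as the number of deviation pairs (one fewer than the number of observations), two "equal" movements are counted as concurrent, and the data are hypothetical.

```python
import math

def directions(series):
    """Direction of change between consecutive values: +1, -1 or 0."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

def concurrent_deviation_r(xs, ys):
    dx, dy = directions(xs), directions(ys)
    n = len(dx)                                    # pairs of deviations (assumed meaning of n)
    c = sum(1 for a, b in zip(dx, dy) if a == b)   # concurrent changes
    inner = (2 * c - n) / n
    sign = 1 if inner >= 0 else -1                 # same sign inside and outside the root
    return sign * math.sqrt(abs(inner))

x = [10, 12, 11, 14, 15, 15, 17]       # hypothetical series
y = [20, 22, 21, 23, 22, 24, 26]
r_c = concurrent_deviation_r(x, y)     # c = 4 of n = 6 pairs move together
r_opposite = concurrent_deviation_r([1, 2, 3, 4], [9, 7, 5, 3])
print(r_c, r_opposite)
```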
  • 91. Example: The data on advertisement expenditure (X) and sales (Y) of a company for the past 10-year period is given below. Determine the correlation coefficient between these variables and comment on the correlation.
  • 92. Summary  In this chapter the concept of correlation, or the association between two variables, has been discussed. A scatter plot of the variables may suggest that the two variables are related, but the value of the Pearson correlation coefficient r quantifies this association.  Correlation is the degree of linear association between two random variables. We do not distinguish between the two variables as dependent and independent. It may be that one is the cause and the other the effect, i.e. the independent and dependent variables respectively. On the other hand, both may depend on a third variable.
  • 93.  In business, correlation analysis often helps managers take decisions by estimating the effects of changing the values of decision variables like promotion, advertising, price and production processes on objective parameters like costs, sales, market share, consumer satisfaction and competitive price. The decision becomes more objective because subjectivity is removed to a certain extent.  The correlation coefficient r may assume values between –1 and 1. The sign indicates whether the association is direct (+ve) or inverse (–ve). A numerical value of r equal to unity indicates perfect association while a value of zero indicates no linear association.
  • 94.  The correlation is said to be positive when an increase (decrease) in the value of one variable is accompanied by an increase (decrease) in the value of the other variable. Negative or inverse correlation refers to the movement of the variables in opposite directions: an increase (decrease) in the value of one variable is accompanied by a decrease (increase) in the value of the other.  In simple correlation the variation is between only the two variables under study and is hardly influenced by any external factor. In other words, if one of the variables remains the same, there won’t be any change in the other variable.
  • 95.  In multiple correlation analysis there are two approaches to studying the correlation: total and partial. In partial correlation, we study the variation of two variables while excluding the effects of the other variables by keeping them under controlled conditions.  When the amount of change in one variable tends to keep a constant ratio to the amount of change in the other variable, the correlation is said to be linear. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear.
  • 96.  Correlation analysis may also be needed to eliminate a variable which shows low or hardly any correlation with the variable of interest. In statistics, there are a number of measures to describe the degree of association between variables. These include Karl Pearson’s correlation coefficient, Spearman’s rank correlation coefficient, the coefficient of determination, Yule’s coefficient of association, the coefficient of colligation, etc.  The correlation coefficient measures the degree of association between two variables X and Y.  Karl Pearson’s formula for the correlation coefficient is given as, $r = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$
  • 97.  When the data are available only as ranks, the purpose of computing a correlation coefficient is to determine the extent to which the two sets of rankings are in agreement. The coefficient that is determined from these ranks is known as Spearman’s rank coefficient, rs. This is defined by the following formula: $r_s = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$ where D is the difference between the two ranks of an item and n is the number of ranked pairs.
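For untied ranks, Spearman's coefficient is r_s = 1 – 6ΣD²/(n(n² – 1)), where D is the difference between the two ranks of an item. A minimal sketch follows; the function name `spearman_rs` and the two judges' rankings are hypothetical.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank coefficient for untied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rankings with at least 2 items")
    # D is the difference between the two ranks given to the same item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical ranks given by two judges to five candidates.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(round(spearman_rs(judge_1, judge_2), 4))
```

This form assumes no tied ranks. When equal ranks occur, the usual practice is to assign average ranks and apply a correction, which the sketch does not attempt.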
  • 98.  Although the concurrent deviation method is effective in giving the direction of the correlation as positive or negative, it fails to give the accurate strength of the correlation. In this method we mark the fluctuation in each data series as increasing (+), decreasing (–) or equal. Then we count the number of items for which both series increase, decrease or remain equal concurrently and denote it as c. The correlation coefficient is then calculated as, $r_c = \pm\sqrt{\pm\dfrac{2c - n}{n}}$ where n = total number of pairs of deviations compared and c = number of concurrent changes.
  • 99. Regression Analysis. Contents (Particulars, Slide From – To): Learning Objectives, 100 – 100; Topic 1 Introduction, 101 – 101; Topic 2 Regression Analysis, 102 – 103; Topic 3 Simple Linear Regression, 104 – 106; Topic 4 Coefficient of Regression, 107 – 108; Topic 5 Non-linear Regression Models, 109 – 109; Topic 6 Correlation Analysis vs Regression Analysis, 110 – 110; Topic 7 Summary, 111 – 114
  • 100. Learning Objectives After studying this chapter, you should be able to:  Understand the concept of regression analysis  Discuss the applicability of regression  Describe simple linear regression and nonlinear regression model.  Learn about coefficient of regression and linear regression equations 1– 100
  • 101. Introduction  In regression analysis we develop an equation, called an estimating equation, that relates known and unknown variables.  Correlation analysis is then used to determine the degree of the relationship between the variables.  In this chapter we will learn how to calculate the regression line mathematically.
  • 102. Regression Analysis According to Morris Myers Blair, “regression is the measure of the average relationship between two or more variables in terms of the original units of the data.”
  • 103. Applicability of Regression Analysis  Regression analysis is a branch of statistical theory which is widely used in all the scientific disciplines. It is a basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life.
  • 104. Simple Linear Regression The highest power of x is called the order of the model.  This model is used if we have a bivariate distribution, i.e. only two variables are considered, and the ‘best fit’ curve is approximated by a straight line.
  • 105. Simple Linear Regression Model  The linear regression model uses a straight-line relationship. The equation of a straight line is of the form, $\hat{y} = a + bx$ (1)  where ŷ is the predicted value of Y corresponding to x, and a and b are constants. Now if we assume the error (deviation) in the Y direction is e, we can write the relationship of X and Y at the data points as, $y = a + bx + e$  Error e is the amount by which an observation falls off the regression line, and is due to random error. ‘a’ and ‘b’ are called the parameters of the linear regression model; their values are found from the observed data.
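  • 106. Linear Regression Equation  Suppose the data points are (x1, y1), (x2, y2), …, (xn, yn). Then we can write from the regression equation, $y_i = a + b x_i + e_i,\ i = 1, \dots, n$ (2) Thus, the sum of squares of errors is, $\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$  To minimise the sum of squares of errors (SSE) we must have the conditions, $\dfrac{\partial\,\mathrm{SSE}}{\partial a} = 0$ and $\dfrac{\partial\,\mathrm{SSE}}{\partial b} = 0$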
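Setting both partial derivatives of SSE to zero gives the normal equations Σy = na + bΣx and Σxy = aΣx + bΣx². Their solution is b = Σ(x – x̄)(y – ȳ)/Σ(x – x̄)² and a = ȳ – b·x̄. The sketch below applies this solution; the names `fit_line` and `sse`, and the data, are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sxy / sxx            # slope of the regression line of Y on X
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b*x on the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: advertising spend (X) and sales (Y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(round(a, 4), round(b, 4), round(sse(x, y, a, b), 4))
```

Any other choice of a and b gives a larger SSE on the same data, which is what "best fit" means under the least-squares principle.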
  • 107. Coefficient of Regression The coefficients of regression are bYX and bXY. They have the following implications:  The slopes of the regression lines of Y on X and X on Y, viz. bYX and bXY, must have the same sign (because r² = bYX · bXY cannot be negative).  The correlation coefficient is the geometric mean of bYX and bXY.  If both slopes bYX and bXY are positive, the correlation coefficient r is positive. If both are negative, r is negative.  If bYX · bXY = 1, then r = ±1, indicating perfect correlation.  Both regression lines intersect at the point (x̄, ȳ), the means of X and Y.
  • 108. Properties of Regression Coefficients  The coefficient of correlation is the geometric mean of the two regression coefficients.  Both the regression coefficients are either positive or negative. It means that they always have identical sign i.e., either both have positive sign or negative sign.  The coefficient of correlation and the regression coefficients will also have same sign.  Regression coefficients are independent of the change in the origin but not of the scale.
  • 109. Non-Linear Regression Models: Second Degree Model; Other Regression Models; Seasonal Model; Seasonal Model with Trend; Coefficient of Determination
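The geometric-mean property can be checked numerically: with bYX = Sxy/Sxx and bXY = Sxy/Syy, the product bYX · bXY equals r², and r takes the common sign of the two slopes. The sketch below uses a hypothetical function name and hypothetical data.

```python
from math import copysign, sqrt

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): slopes of the lines of Y on X and of X on Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)
# Both slopes share the sign of Sxy, so their product is never negative.
r = copysign(sqrt(b_yx * b_xy), b_yx)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
```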
  • 110. Correlation Analysis vs Regression Analysis  Degree and Nature of Relationship  Cause and Effect Relationship  Like in correlation, regression analysis can also be studied as ‘simple and multiple’, ‘total and partial’, ‘linear and nonlinear’, etc.  In correlation, there is no distinction between independent and dependent variables. 1– 110
  • 111. Summary  In this chapter, the concept of regression between dependent and independent variables has been discussed. Regression provides us with a measure of the relationship and also helps to predict one variable for a given value of the other variable.  Unlike correlation analysis, in regression analysis one variable is independent and the other dependent. Please note that this relationship need not be a cause-effect relationship.  Regression analysis is a branch of statistical theory which is widely used in all the scientific disciplines. It is a basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life. The uses of regression analysis are not confined to economic and business activities. Its applications extend to almost all the natural, physical and social sciences.
  • 112.  The simple linear regression model is used if we have a bivariate distribution, i.e. only two variables are considered, and the ‘best fit’ curve is approximated by a straight line. This describes the linear relationship between two variables. Although it appears to be too simplistic, in many business situations it is adequate. At least the initial study can be based on this model for any decision-making situation.  We have studied simple linear, non-linear and multiple regression models. For multiple regression and non-linear regression models, MS Excel or any other computer package would help in reducing voluminous calculations. We also discussed the coefficient of determination as a measure of the strength of the relationship.
  • 113.  The least squares principle can also be applied to the fitting of a second degree polynomial, which may be useful in a business situation if we have some idea that the relationship between two variables is parabolic. In any case, a second degree polynomial fit is more likely to be a better approximation of the actual relationship. We may use a second order model (parabolic trend) if we feel that the variation is parabolic.  The least squares approximation can be calculated easily for low degree polynomials, like linear, parabolic, cubic, etc. But for higher degrees (more than three), the system of normal equations becomes ill-conditioned. This causes large errors in the values of the coefficients, and the approximation becomes incorrect. To avoid these problems, ‘orthogonal polynomials’ are used for approximation.
  • 114.  Mean Square Error (MSE) is an estimate of the variance of the regression error. MSE depends on the values of the data and their scales. Hence we need a measure of the relative degree of variation, so that it can be compared for the fits obtained from different models and for different data sets. The coefficient of determination is such a measure.  The coefficient of determination is a measure of the strength of the regression fit. It is an estimator of the population parameter of correlation and can be obtained directly from a decomposition of the variation in Y into two components, viz. due to error and due to regression. Error is the deviation of a data point from its respective group mean, that is, from the value predicted by the regression line.
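The decomposition of variation in Y can be written as SST = SSR + SSE, which gives the coefficient of determination R² = 1 – SSE/SST. A minimal sketch for the simple linear model follows; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line of Y on X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation due to error: squared deviations from the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations from the mean of Y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 4))
```

Because R² is a ratio of sums of squares, it does not depend on the units of Y, which is why it can be compared across models and data sets where MSE cannot.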
  • 115. Theory of Probability. Contents (Particulars, Slide From – To): Learning Objectives, 116 – 116; Topic 1 Introduction, 117 – 117; Topic 2 Important Terms in Probability, 118 – 119; Topic 3 Kinds of Probability, 120 – 120; Topic 4 Simple Propositions of Probability, 121 – 125; Topic 5 Addition Theorem of Probability, 126 – 127; Topic 6 Multiplication Theorem of Probability, 128 – 128; Topic 7 Conditional Probability, 129 – 129; Topic 8 Law of Total Probability, 130 – 131; Topic 9 Independence of Events, 132 – 132; Topic 10 Combinatorial Concept, 133 – 133; Topic 11 Summary, 134 – 134
  • 116. Learning Objectives After studying this chapter, you should be able to:  Understand the meaning and important terms of probability  Learn about the addition theorem and multiplication theorem of probability  Understand the concept of independence of events, and combinatorial concepts like permutation and combination  Solve problems of conditional probability, Bayes’ Theorem and other concepts of probability
  • 117. Introduction  A probability is a quantitative measure of risk.  This chapter provides exposure to fundamental concepts, since probability is inseparable from statistical methods. 1– 117
  • 118. Important Terms in Probability Probability and sampling are inseparable parts of statistics. Random Experiment: A random experiment is an experiment whose outcome is not predictable in advance.
  • 119.  Sample Space  Event  Event Space  Union of events  Intersection of events  Mutually exclusive events  Collectively exhaustive events  Complement of event
  • 120. Kinds of Probability: Classical Probability; Axiomatic Probability; Subjective Probability; Relative Frequency Probability
  • 121. Simple Propositions of Probability Proposition 1: P (Eᶜ) = 1 – P (E) Probability of complement: Let event Eᶜ denote the complement of the event E. By the definition of complement, Eᶜ has all the elements of the sample space S that are not in E. Thus, E and Eᶜ are mutually exclusive and collectively exhaustive. Therefore, by axioms 2 and 3 we have, 1 = P (S) = P (E ∪ Eᶜ) = P (E) + P (Eᶜ) or, P (Eᶜ) = 1 – P (E)
  • 122. Proposition 2: If E ⊂ F, then P (E) ≤ P (F) If the event E is contained in the event F, that is E ⊂ F, then we can express F = E ∪ (Eᶜ ∩ F). As the events E and (Eᶜ ∩ F) are mutually exclusive, we get P (F) = P (E) + P (Eᶜ ∩ F). But, by axiom 1, P (Eᶜ ∩ F) ≥ 0. Therefore, we have proved the proposition, P (E) ≤ P (F)
  • 123. Proposition 3: P (E ∪ F) = P (E) + P (F) – P (E ∩ F) Probability of unions: Event E ∪ F can be written as the union of the two disjoint events E and (Eᶜ ∩ F). Thus, from axiom 3, P (E ∪ F) = P [E ∪ (Eᶜ ∩ F)] = P (E) + P (Eᶜ ∩ F) (1) Also, F = (E ∩ F) ∪ (Eᶜ ∩ F), hence, P (F) = P (E ∩ F) + P (Eᶜ ∩ F) (2) From (1) and (2) we get proposition 3 as, P (E ∪ F) = P (E) + P (F) – P (E ∩ F) The extension of this proposition to n events is called the inclusion-exclusion principle. For three events, P (E ∪ F ∪ G) = P (E) + P (F) + P (G) – P (E ∩ F) – P (F ∩ G) – P (E ∩ G) + P (E ∩ F ∩ G)
  • 124. Proposition 4 Mutually exclusive events: When the sets corresponding to two events are disjoint (have no common elements, i.e. the intersection is null), the two events are called mutually exclusive. E ∩ F = ∅ Therefore, P (E ∩ F) = P (∅) = 0 Also, for mutually exclusive events E and F, P (E ∪ F) = P (E) + P (F)
  • 125. Proposition 5: P (Eᶜ ∩ F) = P (F) – P (E ∩ F) From set theory, F can be written as a union of the two disjoint events E ∩ F and Eᶜ ∩ F. Hence, by axiom 3, we have, P (F) = P (E ∩ F) + P (Eᶜ ∩ F). By rearranging the terms we get the result.
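Propositions 1, 3 and 5 can be checked on a small classical sample space. The sketch below assumes one roll of a fair die with equally likely outcomes; the events E and F and the helper `prob` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

E = {2, 4, 6}  # the roll is even
F = {4, 5, 6}  # the roll is greater than 3
E_c = sample_space - E  # complement of E

# Proposition 1: P(E complement) = 1 - P(E)
print(prob(E_c), 1 - prob(E))
# Proposition 3: P(E or F) = P(E) + P(F) - P(E and F)
print(prob(E | F), prob(E) + prob(F) - prob(E & F))
# Proposition 5: P(E complement and F) = P(F) - P(E and F)
print(prob(E_c & F), prob(F) - prob(E & F))
```

Exact fractions are used so that each pair of printed values can be compared without rounding error.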
  • 126. Addition Theorem of Probability  The addition theorem gives the probability that either event ‘A’ or event ‘B’ occurs, or both occur. The union of two events ‘A’ and ‘B’ is denoted by ‘∪’. Let A and B be two events defined in a sample space. The union of events A and B is the collection of all outcomes that belong either to A or to B or to both A and B, and is denoted by A ∪ B (read ‘A or B’).
  • 127. The result of the addition theorem is generally written using set notation, P (A ∪ B) = P (A) + P (B) – P (A ∩ B), where P (A) = probability of occurrence of event ‘A’, P (B) = probability of occurrence of event ‘B’, P (A ∪ B) = probability of occurrence of event ‘A’ or event ‘B’, and P (A ∩ B) = probability of occurrence of both event ‘A’ and event ‘B’. The addition theorem can be proved as follows: Let ‘A’ and ‘B’ be subsets of a finite non-empty set ‘S’ of equally likely outcomes. Then, by the counting rule for sets, n (A ∪ B) = n (A) + n (B) – n (A ∩ B). On dividing both sides by n (S), we get n (A ∪ B) / n (S) = n (A) / n (S) + n (B) / n (S) – n (A ∩ B) / n (S) (1), which is the addition theorem, since P (A) = n (A) / n (S) for each event.
  • 128. Multiplication Theorem of Probability  Probability is the branch of mathematics which deals with the occurrence of random events. The basic form of the multiplication theorem of probability for two events ‘x’ and ‘y’ can be stated as, P (x ∩ y) = P (y) · P (x / y)  Here P (x) and P (y) are the probabilities of occurrence of events ‘x’ and ‘y’ respectively. P (x / y) is the conditional probability of ‘x’ given that ‘y’ has occurred. Here the occurrence of ‘x’ depends on ‘y’: because ‘y’ has already occurred, the probability of ‘x’ may change.
  • 129. Conditional Probability  Conditional probability is the probability that an event will occur given that another event has already occurred. If A and B are two events, then the conditional probability of A given B is written as P(A/B) and read as “the probability of A given that B has already occurred.”
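For P(B) > 0 the conditional probability is P(A/B) = P(A ∩ B) / P(B), which rearranges to the multiplication theorem P(A ∩ B) = P(B) · P(A/B). The sketch below assumes one roll of a fair die; the events and helper names are hypothetical.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die

def prob(event):
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(a, b):
    """P(A/B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has zero probability")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

print(conditional(A, B))              # probability of an even roll given it exceeds 3
print(prob(B) * conditional(A, B))    # multiplication theorem: equals P(A and B)
```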
  • 130. Law of Total Probability  Consider two events, E and F. Whatever the events are, we can always say that the probability of E is equal to the probability of the intersection of E and F, plus the probability of the intersection of E and the complement of F. That is, P (E) = P (E ∩ F) + P (E ∩ Fᶜ)
  • 131. Bayes’ Formula Let E and F be events. E = (E ∩ F) ∪ (E ∩ Fᶜ), because any element of E must be either in both E and F or in E but not in F. (E ∩ F) and (E ∩ Fᶜ) are mutually exclusive, since the former must be in F and the latter must not be in F. Hence, by axiom 3, P (E) = P (E ∩ F) + P (E ∩ Fᶜ) = P (E/F) × P (F) + P (E/Fᶜ) × P (Fᶜ) = P (E/F) × P (F) + P (E/Fᶜ) × [1 – P (F)]
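The law of total probability gives P(E) = P(E/F)·P(F) + P(E/Fᶜ)·[1 – P(F)]. Dividing P(E/F)·P(F) by this total gives Bayes' formula for P(F/E). The sketch below uses hypothetical figures: F is "the item came from machine F" with P(F) = 3/10, and E is "the item is defective" with P(E/F) = 1/50 and P(E/Fᶜ) = 1/20.

```python
from fractions import Fraction

def total_probability(p_e_given_f, p_e_given_not_f, p_f):
    """P(E) = P(E/F) P(F) + P(E/F complement) [1 - P(F)]."""
    return p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)

def bayes(p_e_given_f, p_e_given_not_f, p_f):
    """P(F/E) = P(E/F) P(F) / P(E)."""
    return p_e_given_f * p_f / total_probability(p_e_given_f, p_e_given_not_f, p_f)

# Hypothetical figures, for illustration only.
p_f = Fraction(3, 10)
p_e_given_f = Fraction(1, 50)
p_e_given_not_f = Fraction(1, 20)

print(total_probability(p_e_given_f, p_e_given_not_f, p_f))  # overall defect rate
print(bayes(p_e_given_f, p_e_given_not_f, p_f))              # share of defectives from F
```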
  • 133. Combinatorial Concept: Sum Rule of Counting; Product Rule of Counting; Permutation; Combination
  • 134. Summary  In this chapter, we discussed the basic idea of probability. We defined probability in different ways and pointed out serious limitations of each definition.  Then we discussed the axioms of probability, which are the backbone of the theory of probability. Then we studied a number of useful propositions of probability.  We also defined conditional probability, the law of total probability, and Bayes’ Theorem. We also defined mutually exclusive events and independence of events.  Lastly, we discussed a few important concepts of combinatorial analysis, which come in very handy while calculating the probability of an event.
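The four counting ideas can be illustrated with Python's standard `math` module (`math.perm` and `math.comb`, available from Python 3.8). The wardrobe and committee figures below are hypothetical.

```python
from math import comb, perm

shirts, trousers = 3, 4

# Sum rule: pick ONE garment, either a shirt or a pair of trousers.
one_garment = shirts + trousers
# Product rule: pick an outfit of one shirt AND one pair of trousers.
outfits = shirts * trousers

# Permutation: ordered arrangements of 3 people chosen from 5.
ordered = perm(5, 3)
# Combination: unordered committees of 3 people chosen from 5.
unordered = comb(5, 3)

print(one_garment, outfits, ordered, unordered)
```

Each committee of 3 can be arranged in 3! = 6 orders, which is why the permutation count is six times the combination count.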
  • 135. Probability Distribution. Contents (Particulars, Slide From – To): Learning Objectives, 136 – 136; Topic 1 Introduction, 137 – 137; Topic 2 Random Variable, 138 – 139; Topic 3 Probability Distributions of Standard Random Variables, 140 – 140; Topic 4 Bernoulli Distribution, 141 – 142; Topic 5 Binomial Distribution, 143 – 145; Topic 6 Poisson Distribution, 146 – 147; Topic 7 Normal Distribution, 148 – 149; Topic 8 Summary, 150 – 153
  • 136. Learning Objectives After studying this chapter, you should be able to:  Differentiate between discrete and continuous random variables  Discuss probability distributions of standard random variable  Understand discrete probability distribution which include Binomial and Poisson Distribution  Explain continuous probability distribution which includes Normal distribution 1– 136
  • 137. Introduction  We will study a few common distributions in this chapter.  Normal distribution has extensive use in statistical tools and therefore readers are advised to study it in detail.  Knowledge of sequences, series and calculus is expected. 1– 137
  • 138. Random Variable A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.
  • 139. Random Variable (continued): Discrete and Continuous Random Variables; Probability Mass Function (P.M.F.); Probability Density Function; Cumulative Distribution Function; Expectation Value of Random Variables; Expected Value of a Function of a Random Variable; Variance and Standard Deviation of Random Variable
  • 140. Probability Distributions of Standard Random Variables: Bernoulli Distribution; Binomial Distribution; Poisson Distribution; Normal Distribution
  • 141. Bernoulli Distribution  It is the basis of many discrete random variables, as it deals with an individual trial. It is a building block for other random variables. It is a single-trial distribution.
  • 142. Application of Bernoulli Distribution The Bernoulli trial is fundamental to many discrete distributions like Binomial, Poisson, Geometric, etc. Situations where the Bernoulli distribution is commonly used are:  Sex of a newborn child; Male = 0, Female = 1, say.  Items produced by a machine are defective or non-defective.  During the next flight an engine will fail or remain serviceable.  A student appearing for an examination will pass or fail.
  • 143. Binomial Distribution A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. The Bernoulli distribution is the special case with a single trial (n = 1).
  • 144. Applications of Binomial Distribution  Trials are finite (and not very large), performed repeatedly for ‘n’ times.  Each trial (random experiment) should be a Bernoulli trial, the one that results in either success or failure.  Probability of success in any trial is ‘p’ and is constant for each trial.  All trials are independent. 1– 144 Cont….
  • 145. Following are some of the real life examples of applications of binomial distribution.  Number of defective items in a lot of n items produced by a machine.  Number of male births out of n births in a hospital.  Number of correct answers in a multiple-choice test.  Number of seeds germinated in a row of n planted seeds.  Number of re-captured fish in a sample of n fishes.  Number of missiles hitting the targets out of n fired. 1– 145
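Under the conditions listed for the binomial distribution, the probability of exactly x successes in n trials is the standard formula P(X = x) = C(n, x) · pˣ · (1 – p)ⁿ⁻ˣ. This formula is not reproduced on the slides. The sketch below applies it to a hypothetical lot of 10 items with a 10% defect rate.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent Bernoulli trials with success probability p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: a lot of n = 10 items, each defective with p = 0.1.
n, p = 10, 0.1
print(round(binomial_pmf(2, n, p), 5))  # exactly 2 defective items
print(round(sum(binomial_pmf(x, n, p) for x in range(n + 1)), 10))  # total probability
```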
  • 146. Poisson Distribution A random variable X, taking one of the values 0, 1, 2, … is said to be a Poisson random variable with parameter λ if, for some λ > 0, $P(X = i) = e^{-\lambda} \dfrac{\lambda^i}{i!},\ i = 0, 1, 2, \dots$ is the probability mass function (p.m.f.) of the Poisson random variable. Its expected value and variance are μ = E [X] = λ and Var [X] = λ.
  • 147. Some of the common examples where a Poisson random variable can be used to define the probability distribution are:  Number of accidents per day on an expressway.  Number of earthquakes occurring over a fixed time span.  Number of misprints on a page.  Number of arrivals of calls at a telephone exchange per minute.  Number of interrupts per second on a server.
  • 148. Normal Distribution: Equation for Normal Probability Curve; Standard Normal Distribution; Properties of Normal Distribution; Areas under Standard Normal Probability Curve; Importance of Normal Distribution
  • 149. Area under the Normal Curve
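The Poisson p.m.f. P(X = i) = e^(–λ) λⁱ / i! can be evaluated directly. The sketch below assumes a hypothetical average of λ = 2 misprints per page, and checks numerically that the mean of the distribution is λ.

```python
from math import exp, factorial

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam > 0."""
    if i < 0:
        return 0.0
    return exp(-lam) * lam ** i / factorial(i)

lam = 2  # hypothetical average number of misprints per page

print(round(poisson_pmf(0, lam), 6))  # a page with no misprints
print(round(poisson_pmf(3, lam), 6))  # a page with exactly 3 misprints
# The expected value, summed over enough terms, is close to lam.
print(round(sum(i * poisson_pmf(i, lam) for i in range(60)), 6))
```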
  • 150. Summary  A random variable is a real-valued function defined over a sample space, with a probability associated with it. The value of the random variable is the outcome of an experiment. Random variables are neither ‘random’ nor ‘variable’.  In this chapter we discussed several important random variables, the associated formulae, and problem solving using the formulae. A discrete random variable is one that takes at most countably many values. A continuous random variable can take any real value in an interval.
  • 151.  We also discussed probability distributions of random variables. The binomial distribution is used if an experiment is carried out for a finite number n of independent trials, all trials being Bernoulli trials with constant probability of success p.  A random variable will follow the Poisson distribution if it is the number of occurrences of a rare event during a finite period. The waiting time for a rare event is exponentially distributed. The negative binomial distribution is used for the number of Bernoulli trials needed to achieve a desired number of successes.
  • 152.  One of the continuous random variables often required is the uniform random variable. The waiting time for an event that occurs periodically follows the uniform distribution.  The normal probability distribution is the most important distribution in statistics. We defined the normal distribution with parameters (μ, σ), where μ is the mean and σ is the standard deviation.  Further, we defined the standard normal distribution, which is a special case of the normal distribution with parameters (0, 1).
  • 153.  We also discussed the transformation of a normal random variable X to the standard normal random variable Z using $z = \dfrac{x - \mu}{\sigma}$. The Z distribution is very convenient for manual calculation, as we can use standard normal tables, which are extensively tabulated, to find probabilities and intervals.  The normal distribution is used as a model in many real-world situations, both as a continuous distribution and as an approximation to discrete distributions like the binomial or Poisson.
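The transformation z = (x – μ)/σ lets any normal probability be read from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of printed tables; the mean of 50 and standard deviation of 10 are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise x from a normal distribution with mean mu and s.d. sigma."""
    return (x - mu) / sigma

standard_normal = NormalDist(0, 1)

# Hypothetical variable: X is normal with mean 50 and standard deviation 10.
mu, sigma = 50, 10
z = z_score(65, mu, sigma)
p_via_z = standard_normal.cdf(z)            # area to the left of z under the Z curve
p_direct = NormalDist(mu, sigma).cdf(65)    # same probability without standardising

print(z, round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same probability, which is the point of the transformation: one table for Z serves every normal distribution.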
  • 154. Use of Excel Software for Statistical Analysis. Contents (Particulars, Slide From – To): Learning Objectives, 155 – 155; Topic 1 Introduction, 156 – 157; Topic 2 Introduction to Excel, 158 – 168; Topic 3 Entering Data in Excel, 169 – 169; Topic 4 Descriptive Statistics, 170 – 172; Topic 5 Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min), 173 – 177; Topic 6 Statistical Analysis, 178 – 182; Topic 7 Normal Distribution, 183 – 183; Topic 8 Brief about SPSS, 184 – 189; Topic 9 Summary, 190 – 194
  • 155. Learning Objectives After studying this chapter, you should be able to:  Understand the basic concepts of using Microsoft Excel  Discuss how to enter data in excel and basic built-in functions  Gain knowledge about SPSS 1– 155
  • 156. Introduction The most popular software in the MS Office Suite includes the following:  Microsoft Word  Microsoft Excel  Microsoft PowerPoint  Microsoft Access  Microsoft Project Plan  Microsoft Outlook 1– 156 Cont….
  • 157. MICROSOFT OFFICE SUITE
    Suite Product | Home and Student | Home and Business | Professional
    Word 2010 | Included | Included | Included
    Excel 2010 | Included | Included | Included
    PowerPoint 2010 | Included | Included | Included
    OneNote 2010 | Included | Included | Included
    Outlook 2010 | - | Included | Included
    Access 2010 | - | - | Included
    Publisher 2010 | - | - | Included
  • 158. Introduction to Excel Opening A Document  Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the directory area or drive to look for files in other locations.  To create a new workbook, click on File-New-Blank Document. 1– 158 Cont….
  • 159. Saving And Closing A Document  To save your document with its current filename, location and file format, click on File - Save.  When you have finished working on a document you should close it. Go to the File menu and click on Close.
  • 162. Workbooks and Worksheets: Cell; Row; Column; Spreadsheet; Workbook
  • 163. Cell Name Box; Spreadsheet Tabs in Excel
  • 164. Moving Around the Worksheet: Margins; Orientation; Paper Size; Print Area
  • 168. Moving between Cells  While working with any Office productivity tool, the clipboard functions are invaluable.  The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’.  In the Microsoft Office suite, there are keyboard shortcuts for these functions. Keyboard shortcuts: Cut, Ctrl + X; Copy, Ctrl + C; Paste, Ctrl + V
  • 169. Entering Data in Excel  A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and the columns are labeled with letters. Each intersection of a row and a column is a cell. Topics: Entering Labels; Entering Values; Rounding Numbers that Meet Specified Criteria; Sorting by Columns
  • 170. Descriptive Statistics  Excel includes elaborate and customisable toolbars, for example the “standard” toolbar.  Some of the icons are useful for mathematical computation: the “Autosum” icon enters the formula “=SUM()” to add up a range of cells.  The “Function Wizard” icon gives you access to all the functions available.
  • 171.  The “Graph Wizard” icon gives access to all the graph types available.
  • 172. Excel can be used to generate measures of location and variability for a variable. Suppose we wish to find descriptive statistics for the sample data 2, 4, 6 and 8, entered in cells A1 to A4.  Step 1: Open the Tools pull-down menu. If you see Data Analysis, click on this option; otherwise, click on the Add-Ins option to install the Analysis ToolPak.  Step 2: Click on the Data Analysis option.  Step 3: Choose Descriptive Statistics from the Analysis Tools list.  Step 4: When the dialog box appears, enter A1:A4 in the Input Range box. A1 is the cell in column A and row 1; in this case its value is 2. The remaining values are entered in the same way down to the last one, in A4.  Step 5: Select an output range, in this case B1. Click on Summary Statistics to see the results. Select OK.
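As a cross-check on the Excel output, the same summary measures for the sample 2, 4, 6 and 8 can be computed with Python's standard `statistics` module. This sketch assumes the sample (n – 1) definitions of variance and standard deviation; the dictionary labels are illustrative and are not Excel's exact output headings.

```python
import statistics

data = [2, 4, 6, 8]  # the sample used in the steps above

summary = {
    "count": len(data),
    "sum": sum(data),
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "sample variance": statistics.variance(data),          # divides by n - 1
    "sample standard deviation": statistics.stdev(data),   # square root of the above
    "minimum": min(data),
    "maximum": max(data),
    "range": max(data) - min(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```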
  • 173. Basic Built – in Functions (Average, Mean, Mode, Count, Max and Min) Manual Equation Entry 1– 173 Cont….
  • 175. SUM Function The SUM function is probably the most commonly used function in Excel. It comes in three flavours in Excel, namely: SUM(), SUMIF() and SUMIFS().
  • 176. Logical Functions: TRUE; FALSE; AND (); OR (); NOT; IF (); IFERROR ()
  • 177. Statistical Functions  Statistical functions are invaluable in any mathematical calculations.  They can provide insights into trends and data for detailed analysis, as well as help identify gaps that need to be plugged.  Excel provides a wide range of functions that can be used to perform basic statistical analyses.
  • 178. Statistical Analysis Creating Charts  Select the data range (only numbers) for which the chart needs to be created.  Under the Insert Ribbon, in the Chart section, click on the type of chart you want to create and the category. Here the clustered chart has been used.  Select the chart and click on Select Data button in Data section of the Design Layout.  In the Select Data Source dialog, select ‘Series 1’ and click on Edit button. 1– 178 Cont….
  • 180. This opens the Edit Series dialog that allows you to change the range of values in series and provide a Series name. For the series name, click on icon to select the column title of Series 1. 1– 180 Cont…. Edit Series
  • 181. Histogram Now follow the steps given below to draw histogram.  Select the first two columns i.e. class interval and frequency in the Excel sheet.  Click on ‘Chart Wizard’ icon on tool bar or select from menu [Insert → Chart…..] From insert drop down menu. A dialogue box with title ‘Chart Wizard – Step 1 to 4 – Chart type’ will appear.  In the menu ‘Standard Type’, select ‘Column’. Click on ‘Next’ button.  Now the next menu with title ‘Chart Wizard – Step 2 to 4 – Chart Source Data’ will appear. Since we have already selected the source data, select ‘Next’. Don’t forget to check that column is selected in data series.  Now the next menu with title ‘Chart Wizard – Step 3 to 4 – Chart Options’ will appear. 1– 181 Cont….
  • 182. Correlation Plot and Regression Analysis Using MS Excel for calculating Karl Pearson’s correlation coefficient Calculating Karl Pearson’s correlation coefficient using MS Excel is very simple. The steps are as follows:  Open an Excel worksheet and enter the data values of X and Y variables as two arrays (columns or rows). Keep these contiguous if possible.  Select the cell where you want to store the result r. Enter the formula with syntax as, ‘=CORREL (array1, array2)’ ‘array1’ is a cell range of values and ‘array2’ is a second cell range of values. 1– 182
  • 183. Normal Distribution: NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing. Syntax: NORMDIST(x, mean, standard_dev, cumulative)
    - x is the value for which you want the distribution.
    - mean is the arithmetic mean of the distribution.
    - standard_dev is the standard deviation of the distribution.
    - cumulative is a logical value: TRUE returns the cumulative distribution function and FALSE returns the probability density function.
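The quantities behind the NORMDIST syntax can be reproduced from the standard normal-distribution formulas. The following is a hedged Python sketch, not Excel's own code: the function `normdist` is a hypothetical stand-in that mirrors the four arguments of the syntax above and uses the error function from Python's standard `math` module.

```python
import math


def normdist(x, mean, standard_dev, cumulative):
    """Normal distribution at x: CDF if cumulative is True, otherwise PDF."""
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    z = (x - mean) / standard_dev
    if cumulative:
        # Cumulative distribution function, expressed via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Probability density function.
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))


# Probability that a standard normal variable is at most 1.96.
p = normdist(1.96, 0.0, 1.0, True)
print(round(p, 4))
```

With mean 0 and standard deviation 1, the cumulative value at 1.96 is about 0.975, the figure commonly used for two-sided 5% hypothesis tests.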
  • 184. Brief about SPSS: SPSS Statistics is a software package used for statistical analysis. SPSS Files: SPSS uses several types of files. The first is the file that contains the data view and the variable view, whose contents are entered through the SPSS Data Editor window. It is known as an SPSS system file.
  • 185. [Figure: SPSS Data Editor Window – Data View]
  • 186. [Figure: Data Editor Window – Variable View]
  • 187. [Figure: Define Variable Dialog Box. Example variable: Student Motivation, with values Not willing, Undecided, Willing]
  • 188. [Figure: Value Labels Dialog Box. Value labels coded with value and value label]
  • 189. [Figure: SPSS Data Editor Window with All Records Entered]
  • 190. Summary
    - Microsoft Office is one of the most powerful office productivity tools on the market today. The suite is vast and covers a wide range of software solutions catering to various aspects of modern business.
    - Microsoft Excel is a powerful accounting and calculation solution. It has a standard tabular layout and supports a wide range of arithmetic, accounting and statistical functions.
    - Microsoft Outlook is a mail client that can be set up to download mail from a mail server and to send and receive email as desired. Being part of the Microsoft Office suite, it is compatible with the other applications in the suite.
  • 191.
    - One of the most popular and widely used versions of the Microsoft Office suite is MS Office 2003. Microsoft later released two further versions, Office 2007 and Office 2010. Although Office 2010 is the latest of these versions, many businesses continue to use Office 2003. From Office 2003 to Office 2007, Microsoft radically changed the overall look and feel of the suite.
    - Excel is built on the concepts of cells, rows, columns, spreadsheets and workbooks. The structure is hierarchical, which makes it scalable and versatile enough to adapt to the varying needs of users from different specialisations. Understanding these concepts is useful in developing complex reports and models.
  • 192.
    - As long as you work on soft copies, page layouts are not really important, because you can scroll a spreadsheet to view its contents. For printouts, however, it is important to get the page layout right. Excel 2010 has all the page layout options under the Page Layout menu item.
    - While working with any Office productivity tool, the clipboard functions are invaluable. The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’. In the Microsoft Office suite there are keyboard shortcuts for these functions (Ctrl+X, Ctrl+C and Ctrl+V). Once you become conversant with the Excel functions, you will probably prefer the keyboard shortcuts, as they are faster and easier to use than the mouse.
  • 193.
    - A new worksheet is a grid of rows and columns. The rows are labelled with numbers, and the columns are labelled with letters. Each intersection of a row and a column is a cell. Each cell has an address, which consists of the column letter and the row number. The arrow on the worksheet to the right points to cell A1, which is currently highlighted, indicating that it is the active cell. A cell must be active before information can be entered into it.
    - Excel is a very powerful accounting tool, but before moving on to the more complex functions, let us see how to use Excel for simple calculations. There are two ways of doing so: you can enter the actual arithmetic expressions in the cell, or use pre-defined Excel formulas to do the same.
  • 194.
    - Statistical calculations for exponential random variables can be performed using the statistical functions available in MS Excel. NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing. Syntax: NORMDIST(x, mean, standard_dev, cumulative)
    - SPSS Statistics is a software package used for statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named IBM SPSS Statistics. Companion products in the same family are used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, and collaboration and deployment (batch and automated scoring services).