Introduction to regression analysis 2

Session 03
Sibashis Chakraborty

MEASUREMENT SCALE OF VARIABLES
The variables that we will generally encounter fall into one of these four broad categories, namely:
• Ratio Scale.
• Interval Scale.
• Ordinal Scale.
• Nominal Scale.

RATIO SCALE
For a variable \(X\) taking two values, \(X_1\) and \(X_2\), a ratio scale has three properties:
1. The ratio \(X_1 / X_2\) is a meaningful quantity.
2. The distance \((X_2 - X_1)\) is a meaningful quantity.
3. There is a natural ordering (ascending or descending) of the values along the scale, which makes comparisons such as \(X_2 \ge X_1\) or \(X_1 \ge X_2\) meaningful.

INTERVAL SCALE
An interval scale variable satisfies the last two properties of the ratio scale variable (distance and ordering) but not the first (ratio). Thus, the distance between two time periods, say (2000–1995), is meaningful, but the ratio of two time periods (2000/1995) is not.

ORDINAL SCALE
A variable belongs to this category only if it satisfies the third property of the ratio scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class (upper, middle, lower). For these variables the ordering exists, but the distances between the categories cannot be quantified.

NOMINAL SCALE
Variables in this category have none of the features of ratio scale variables. Variables such as gender (male, female) and marital status (married, unmarried, divorced, separated) simply denote categories. Such variables are also called categorical variables.
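One way to summarise the four scales is to record which of the three ratio-scale properties (ratio, distance, ordering) each one keeps. The Python sketch below encodes that idea. The names `SCALE_PROPERTIES` and `meaningful_operations` are hypothetical and do not come from the source.

```python
# Hypothetical encoding of the four measurement scales described above.
# Each scale maps to the ratio-scale properties it retains:
# "ratio" (X1/X2), "distance" (X2 - X1), "ordering" (X2 >= X1).
SCALE_PROPERTIES = {
    "ratio": {"ratio", "distance", "ordering"},
    "interval": {"distance", "ordering"},  # e.g. calendar years
    "ordinal": {"ordering"},               # e.g. grades A, B, C
    "nominal": set(),                      # e.g. marital status
}


def meaningful_operations(scale: str) -> list[str]:
    """Return, sorted, the comparisons that are meaningful for a given scale."""
    return sorted(SCALE_PROPERTIES[scale])


# Interval scale: 2000 - 1995 is meaningful, 2000 / 1995 is not.
print(meaningful_operations("interval"))  # ['distance', 'ordering']
print(meaningful_operations("nominal"))   # []
```

Using sets keeps the nesting visible. Each scale's property set is a subset of the set for the scale above it.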

SOME BASIC IDEAS ON SIMPLE REGRESSION ANALYSIS

CONDITIONAL EXPECTED VALUES VERSUS EXPECTED VALUES
As pointed out before, regression analysis is largely concerned with estimating and/or predicting the population mean value of the dependent variable on the basis of the known (fixed) values of the explanatory variable(s).
To understand the concept of conditional means (conditional expected values) versus unconditional means (expected values), we take the example of a dataset of 60 families in a community, with their weekly income (X) and weekly consumption expenditure (Y).

OBSERVATIONS
There is considerable variation in consumption expenditure within each income group.
However, on average, weekly consumption expenditure increases as income rises. That is to say, the mean weekly consumption expenditure corresponding to each of the 10 income levels increases as we move along the table from left to right.
We have 10 different income levels, and for each of them a mean consumption expenditure. We call these conditional expected values, as they depend on the given values of X (income level).
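To make the idea of a conditional mean concrete, the sketch below groups observations by income level and averages consumption within each group. The data are a small hypothetical stand-in: nine made-up families at three income levels, not the 60-family table. `conditional_means` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical toy data: (weekly income X, weekly consumption Y) in $.
# These numbers are made up for illustration; they are NOT the 60-family table.
families = [
    (80, 55), (80, 60), (80, 65),
    (100, 65), (100, 70), (100, 75),
    (140, 95), (140, 101), (140, 107),
]


def conditional_means(data):
    """Compute E(Y | X = x) as the average of Y within each income level x."""
    groups = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}


print(conditional_means(families))  # {80: 60.0, 100: 70.0, 140: 101.0}
```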

CONTINUED…
Symbolically, conditional expected values are denoted by \(E(Y \mid X)\), read as the expected value of Y given the value of X.
The unconditional expected value of consumption, \(E(Y)\), is the sum of all consumption expenditures divided by the total number of families, i.e., $121.20. It is unconditional in the sense that we disregard the income levels of the families.
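The unconditional and conditional means are linked. E(Y) equals the average of the conditional means E(Y | X = x), with each one weighted by the share of families at that income level. This is the law of total expectation. Below is a minimal check on hypothetical toy data, not the 60-family table.

```python
from collections import defaultdict

# Hypothetical toy data: (weekly income X, weekly consumption Y) in $.
# Made-up numbers for illustration, not the 60-family table.
families = [
    (80, 55), (80, 60), (80, 65),
    (100, 65), (100, 70), (100, 75),
    (140, 95), (140, 101), (140, 107),
]

# Unconditional mean E(Y): sum of all Y over the number of families, ignoring X.
ys = [y for _, y in families]
unconditional_mean = sum(ys) / len(ys)

# Conditional means E(Y | X = x), one per income level.
groups = defaultdict(list)
for x, y in families:
    groups[x].append(y)
cond_means = {x: sum(v) / len(v) for x, v in groups.items()}

# Law of total expectation: weight each conditional mean by its group size.
weighted_mean = sum(cond_means[x] * len(v) for x, v in groups.items()) / len(ys)

print(unconditional_mean)  # 77.0
print(weighted_mean)       # 77.0
```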
The question is: why is the conditional mean, or equivalently knowledge of the income levels, important when computing mean values?

CONTINUED…
Suppose we are asked to predict the weekly consumption expenditure of families with a weekly income of $140.
Which of the following would serve as the better prediction?
1. \(E(Y) = \$121.20\)
2. \(E(Y \mid X = 140) = \$101\)
Guess and fix your answer!

CONTINUED…
The correct answer is $101, the conditional mean of consumption expenditure (Y) when the level of income (X) is given as $140.
‘Thus the knowledge of the income level may enable us
to better predict the mean value of consumption
expenditure than if we do not have that knowledge. This
probably is the essence of regression analysis.’
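One quick way to see why the conditional mean is the better predictor is to compare average squared prediction errors for the families at a single income level. The sketch uses hypothetical toy data (not the 60-family table) and takes mean squared error as the yardstick. Both choices are ours and serve only as an illustration.

```python
# Hypothetical toy data: (weekly income X, weekly consumption Y) in $.
# Made-up numbers for illustration, not the 60-family table.
families = [
    (80, 55), (80, 60), (80, 65),
    (100, 65), (100, 70), (100, 75),
    (140, 95), (140, 101), (140, 107),
]

ys = [y for _, y in families]
e_y = sum(ys) / len(ys)                         # unconditional mean E(Y)

y_at_140 = [y for x, y in families if x == 140]
e_y_given_140 = sum(y_at_140) / len(y_at_140)   # conditional mean E(Y | X = 140)


def mean_squared_error(prediction, actual):
    """Average squared gap between one prediction and each observed value."""
    return sum((a - prediction) ** 2 for a in actual) / len(actual)


print(mean_squared_error(e_y, y_at_140))            # 600.0
print(mean_squared_error(e_y_given_140, y_at_140))  # 24.0
```

Ignoring income raises the average squared error for these families from 24 to 600 (in squared dollars).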

POPULATION REGRESSION LINE (PRL)
The dark circled points show the conditional mean values of Y against the various X values. Joining these conditional mean values, we obtain the population regression line (curve), or, more simply, the regression line of Y on X.

CONTINUED…
‘Geometrically, then, a population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s).’

POPULATION REGRESSION FUNCTION (PRF)
From our previous discussion it is evident that \(E(Y \mid X_i)\) is a function of \(X_i\), where \(X_i\) is a given value of \(X\).
Symbolically,
\[ E(Y \mid X_i) = f(X_i) \]
This is known as the conditional expectation function (CEF) or the population regression function (PRF). It states merely that the expected value of the distribution of \(Y\) given \(X_i\) is functionally related to \(X_i\).

MORE ON PRF…
The functional form of the PRF is an empirical question, since in real-world situations we do not have the entire population available for examination.
However, in certain cases theory can guide the choice. For example, economic theory may suggest that consumption expenditure is linearly related to income. In such cases we can assume, as a working hypothesis, a PRF of the following form:
\[ E(Y \mid X_i) = \beta_1 + \beta_2 X_i \]
(linear PRF, or linear regression model, LRM)
where \(\beta_1\) and \(\beta_2\) are unknown but fixed parameters known as the regression coefficients.
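A linear PRF is simply a straight-line function of \(X_i\). The parameter values below (\(\beta_1 = 17\), \(\beta_2 = 0.6\)) are hypothetical. They were picked only so that E(Y | X = 140) comes out at the $101 quoted earlier. In practice \(\beta_1\) and \(\beta_2\) are unknown.

```python
def linear_prf(x, beta1, beta2):
    """Linear PRF: E(Y | X = x) = beta1 + beta2 * x."""
    return beta1 + beta2 * x


# Hypothetical parameter values, for illustration only (the true beta1 and
# beta2 are unknown population parameters).
BETA1, BETA2 = 17.0, 0.6

for income in (80, 140, 200):
    expected_consumption = round(linear_prf(income, BETA1, BETA2), 2)
    print(f"E(Y | X = {income}) = {expected_consumption}")
# E(Y | X = 80) = 65.0
# E(Y | X = 140) = 101.0
# E(Y | X = 200) = 137.0
```

Because the function is linear in X, each extra dollar of income raises expected consumption by \(\beta_2\) (0.6 dollars under these assumed values).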

SIGNIFICANCE OF THE TERM ‘LINEARITY’
‘Linearity’ can be interpreted in two ways:
I. Linearity in variables: A function \(Y = f(X)\) is said to be linear in \(X\) if \(X\) appears with a power or index of 1 only (that is, terms such as \(X^2\), \(\sqrt{X}\), and so on, are excluded) and is not multiplied or divided by any other variable (for example, \(X \cdot Z\) or \(X / Z\), where \(Z\) is another variable).
II. Linearity in parameters: A function is said to be linear in a parameter, say \(\beta_1\), if \(\beta_1\) appears with a power of 1 only and is not multiplied or divided by any other parameter (for example, \(\beta_1 \beta_2\), \(\beta_2 / \beta_1\), and so on).

LINEARITY IN LINEAR REGRESSION
“Linear” regression will always mean a regression that is linear in the parameters, the \(\beta\)'s. It may or may not be linear in the explanatory variables, the \(X\)'s.
Some examples of functions that are linear in the parameters but not in the variables are presented in the following slides.
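Under the definitions above, a function that is linear in some quantity (an X or a β) is affine in it: a constant plus that quantity times a coefficient. Such a function satisfies f(wa + (1 − w)b) = wf(a) + (1 − w)f(b) for every weight w. The sketch below spot-checks this identity numerically for two hypothetical models. The first, E(Y|X) = β1 + β2X², is linear in the parameters but not in X. The second, E(Y|X) = β1 + β2²X, is linear in X but not in the parameters. The helper `is_affine` and both models are illustrative assumptions. Passing the check is evidence, not proof.

```python
import math


def is_affine(f, a, b, weights=(0.25, 0.5, 0.75)):
    """Spot-check f(w*a + (1-w)*b) == w*f(a) + (1-w)*f(b) for a few weights w.

    Any function that is affine in its argument passes; a single failure
    proves it is not. Passing is evidence, not a proof.
    """
    for w in weights:
        mix = [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]
        target = w * f(a) + (1 - w) * f(b)
        if not math.isclose(f(mix), target, rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True


# Hypothetical example models; beta is a list [beta1, beta2].
def model_a(beta, x):
    return beta[0] + beta[1] * x ** 2    # linear in the betas, not in X


def model_b(beta, x):
    return beta[0] + beta[1] ** 2 * x    # linear in X, not in the betas


x0 = 2.0                                 # X held fixed
beta_a, beta_b = [1.0, 2.0], [3.0, 5.0]  # two parameter vectors
x_a, x_b = [1.0], [4.0]                  # two values of X (as 1-element lists)

print(is_affine(lambda beta: model_a(beta, x0), beta_a, beta_b))  # True
print(is_affine(lambda beta: model_b(beta, x0), beta_a, beta_b))  # False
print(is_affine(lambda x: model_a(beta_a, x[0]), x_a, x_b))       # False
print(is_affine(lambda x: model_b(beta_a, x[0]), x_a, x_b))       # True
```

In the sense used above, only the first model counts as a "linear" regression model, even though it is curved in X.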

STOCHASTIC SPECIFICATION OF PRF
We have seen that the average consumption expenditure of families rises with the level of income.
Going one step further, we look at how an individual family's consumption expenditure is related to its level of income.
Does it show the same trend as the average consumption expenditure?
The answer is: not necessarily!

CONTINUED…
It is seen that the consumption expenditure of an individual family is clustered around the conditional mean of expenditure for a given level of income.

CONTINUED…
The deviation of an individual family's consumption expenditure from its conditional mean is given by
\[ u_i = Y_i - E(Y \mid X_i) \]
so that
\[ Y_i = E(Y \mid X_i) + u_i \]
where \(u_i\) is an unobservable random error term that can take positive or negative values.
We interpret the second equation as follows. The expenditure of an individual family has two components:
i. the mean consumption expenditure at the given level of income, which is deterministic (systematic) by nature, and
ii. the random error term, which reflects chance and all other factors affecting consumption expenditure that are not taken into consideration.
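The decomposition \(Y_i = E(Y \mid X_i) + u_i\) can be checked directly on data. The sketch below treats hypothetical toy data as the whole population. It computes each family's deviation \(u_i\) from its group's conditional mean and confirms that \(Y_i\) is recovered exactly. Note that \(u_i\) takes both signs and, by construction, sums to zero within each income level.

```python
from collections import defaultdict

# Hypothetical toy data treated as the whole population:
# (weekly income X, weekly consumption Y) in $. Made-up numbers.
families = [
    (80, 55), (80, 60), (80, 65),
    (100, 65), (100, 70), (100, 75),
    (140, 95), (140, 101), (140, 107),
]

# Conditional mean E(Y | X = x) for each income level.
groups = defaultdict(list)
for x, y in families:
    groups[x].append(y)
cond_mean = {x: sum(ys) / len(ys) for x, ys in groups.items()}

# u_i = Y_i - E(Y | X_i): each family's deviation from its conditional mean.
errors = [(x, y, y - cond_mean[x]) for x, y in families]

for x, y, u in errors:
    # Y_i = E(Y | X_i) + u_i: systematic part plus random part.
    assert y == cond_mean[x] + u
    print(f"X={x:>3}  Y={y:>3}  E(Y|X)={cond_mean[x]:6.1f}  u={u:+5.1f}")
```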

THE SAMPLE REGRESSION FUNCTION (SRF)
In most cases the entire population is not at our disposal, and that is where samples come into play. In practice we have a set of observed values of Y (a sample) corresponding to some fixed X's, drawn from the population.
The task is to estimate the PRF from such a sample.
Because a sample does not include all the individual units of the population, it yields only an approximate estimate of the PRF. Moreover, different samples drawn from the same population will give different estimates because of sampling fluctuations.
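The rule for fitting a sample regression line is deferred to the next presentation. Sampling fluctuation can still be shown at the level of individual PRF points. The sketch estimates E(Y | X = x) by the sample mean at each x, using two samples from the same hypothetical population. The population values and the helpers `mean` and `sample_estimates` are assumptions made for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical population: consumption Y ($) of every family at two income
# levels X ($). Made-up numbers for illustration.
population = {
    80: [52, 58, 60, 62, 68],
    140: [92, 96, 100, 102, 104, 106, 107],
}
pop_means = {x: mean(ys) for x, ys in population.items()}  # true E(Y | X = x)

# Two samples of three families per income level. Real samples would be drawn
# at random; fixed slices keep this sketch reproducible.
sample_a = {x: ys[:3] for x, ys in population.items()}
sample_b = {x: ys[-3:] for x, ys in population.items()}


def sample_estimates(sample):
    """Estimate E(Y | X = x) by the sample mean at each x (rounded to cents)."""
    return {x: round(mean(ys), 2) for x, ys in sample.items()}


est_a = sample_estimates(sample_a)
est_b = sample_estimates(sample_b)
print("population", pop_means)
print("sample A", est_a)
print("sample B", est_b)
# population {80: 60.0, 140: 101.0}
# sample A {80: 56.67, 140: 96.0}
# sample B {80: 63.33, 140: 105.67}
```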

SRF
The two sample regression lines are each drawn to fit reasonably well the scatter of a different sample. They show that we may obtain N different regression lines from N different samples drawn from the same population.
Since each of them serves as an estimate of the PRF, it is not straightforward to judge which one is the better estimate.

SRF
Analogous to the PRF, we develop the concept of the SRF to represent the sample regression line:
\[ \hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i \]
where
\(\hat{Y}_i\) = estimator of \(E(Y \mid X_i)\)
\(\hat{\beta}_1\) = estimator of \(\beta_1\)
\(\hat{\beta}_2\) = estimator of \(\beta_2\)
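The SRF is an ordinary straight-line function of \(X_i\) with estimated coefficients. The coefficient values below are hypothetical and stand in for estimates from two different samples. `srf` is an illustrative helper. The point is only that different samples give different SRFs and therefore different estimates of \(E(Y \mid X_i)\).

```python
def srf(x, beta1_hat, beta2_hat):
    """Sample regression function: Y_hat = beta1_hat + beta2_hat * x."""
    return beta1_hat + beta2_hat * x


# Hypothetical coefficient estimates from two different samples. The numbers
# are made up; how to compute them is left to the next presentation.
estimates = {"SRF 1": (20.0, 0.55), "SRF 2": (10.0, 0.65)}

fitted = {
    name: {x: round(srf(x, b1, b2), 2) for x in (80, 140)}
    for name, (b1, b2) in estimates.items()
}
for name, values in fitted.items():
    print(name, values)
# SRF 1 {80: 64.0, 140: 97.0}
# SRF 2 {80: 62.0, 140: 101.0}
```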

FURTHER DISCUSSION
Since the SRF is only an approximation of the PRF, the question that remains is: can we devise a rule that makes this approximation as close as possible?
We shall discuss this in the next presentation.

REFERENCES
Gujarati, Damodar N., Dawn C. Porter and Sangeetha Gunasekar. Basic Econometrics (Fifth Edition).
Bhaumik, Sankar K. Principles of Econometrics: A Modern Approach Using EViews (First Edition).
