Econometrics: ECON2300 – Lecture 1
The Econometric Model:
Econometrics is about how we can use theory and data from economics, business and the social
sciences, along with tools from statistics, to answer “how much” type questions.
In economics we express our ideas about relationships between economic variables using the
mathematical concept of a function. An example of this is expressing the price of a house in
terms of its size.
Price = f(size)
Hedonic Model: A model that decomposes the item being researched into its constituent
characteristics, and obtains estimates of the contributory value of each characteristic
An example of a hedonic model for house price might be expressed as:
Price = f(size, bedrooms, bathrooms, stories, age, pool, airconditioning)
Economic theory does not claim to be able to predict the specific behaviour of any individual or firm,
but rather it describes the average or systematic behaviour of many individuals or firms.
Economic models = Generalisation
In fact we realise that there will be a random and unpredictable component e that we will call random
error. Hence the econometric model for price would be
Price = f(size, bedrooms, bathrooms, stories, age, pool, airconditioning) + e
The random error e accounts for the many factors that affect price that we have omitted from this
simplistic model, and it also reflects the intrinsic uncertainty in economic activity.
Take for example the demand relation:
qd = f(p, ps, pc, i) = β1 + β2·p + β3·ps + β4·pc + β5·i
The corresponding econometric model is:
qd = f(p, ps, pc, i) + e = β1 + β2·p + β3·ps + β4·pc + β5·i + e
Econometric Models include the error term, e
In every model there are two parts:
1. A systematic portion – part we obtain from economic theory, includes assumptions about the
functional form.
2. An unobservable random component – “noise” component which obscures our understanding
of the relationship among variables: e.
How Do we Obtain Data?
In an ideal world:
1. We would design an experiment to obtain economic observations or sample information
2. Repeating the experiment N times would create a sample of N sample observations
In the real world:
Economists work in a complex world in which data on variables are "observed" and rarely obtained
from a controlled experiment. It is often not feasible to conduct an experiment to obtain data. Thus we
use non-experimental data generated by an uncontrolled "experiment".
Experimental data: Variables can be fixed at specific values in repeated trials of the
experiment
Non-experimental data: Values are neither fixed nor repeatable
Most economic, financial or accounting data are collected for administrative rather than research
purposes, often by government agencies or industry. The data may be:
 Time-series form – data collected over discrete intervals of time (stock market index, CPI,
GDP, interest rates, the annual price of wheat in Australia from 1880 to 2009)
 Cross-sectional form – data collected over sample units in a particular time period (income in
suburbs in Brisbane during 2009, or household census)
 Panel data form – data that follow individual microunits over time (data for 30 countries for
the period 1980-2005, monthly value of 3 stock market indices over the last 5 years)
Data may be collected at various levels of aggregation:
 Micro – data collected on individual economic decision-making units such as
individuals, households, or firms
 Macro – data resulting from a pooling or aggregating over individuals, households, or firms
at the local, state, or national levels
Data collected may also represent flow or a stock:
 Flow – outcome measures over a period of time, such as the consumption of petrol during the
last quarter of 2005
 Stock – outcome measured at a particular point in time, such as the quantity of crude oil held
by BHP in its Australian storage tanks on April 1, 2002, or the asset value of Macquarie Bank
on 5th July 2009.
Data collected may be quantitative or qualitative:
 Quantitative – numerical data, data that can be expressed as numbers or some transformation
of them, such as real prices or per capita income
 Qualitative – outcomes of an "either-or" situation, that is, whether an attribute is present
or not. E.g. colour, or whether a consumer purchased a certain good or not (dummy
variables)
Statistical Inference:
The aim of statistics is to “infer” or learn something about the real world by analysing a sample of
data. The ways which statistical inference are carried out include:
 Estimating economic parameters, such as elasticities
 Predicting economic outcomes, such as the enrolments in bachelor degree programs in
Australia for the next 5 years.
 Testing economic hypotheses, such as: Is newspaper advertising better than "email"
advertising for increasing sales?
Econometrics includes all of these aspects of statistical inference. There are two types of inference:
1. Deductive: go from a general case → a specific case: this is used in mathematical
proofs
2. Inferential: go from a specific case → a general case: this is used in statistics
Review of Statistical Concepts:
Random variables: Discrete and Continuous
Random variable: A random variable is a variable whose value is unknown until it is observed; it is
not perfectly predictable. The value of the random variable results from an experiment (controlled or
uncontrolled). Uppercase letters (e.g. X) are usually used to denote random variables. Lowercase
letters (e.g. x) are usually used to denote values of random variables.
Discrete random variable:
A discrete random variable can take only a finite number of values that can be counted by using the
positive integers
 E.g. The number of cars you own, your age in whole years, etc.
 Dummy variables:
D = 1 if person is female
D = 0 if person is not female
Probability distribution of a discrete random variable:
A discrete random variable has a probability density function which summarises all the possible
values of a discrete random variable together with their associated probabilities. It can be in the form
of a table, formula or graph.
Two key features of a probability distribution are its centre (location) and width (dispersion); the
mean, μ, and variance, σ², respectively. For a discrete random variable X:

Mean: μ = E(X) = Σ x·P(X = x)

Variance: σ² = Var(X) = E[(X − μ)²] = Σ (x − μ)²·P(X = x)
It can be seen in the graph above that there are only distinct values that the variable x can take which
is what a discrete variable is – the probability density function is NOT continuous.
Discrete probability distributions are:
1. Mutually exclusive – no overlap between values
2. Collectively exhaustive – full sample space covered, includes every possibility
Example: A 5-sided die is biased; the sides show 0, 1, 2, 3 and 4 respectively. The following table
shows the probability distribution.
a) Calculate the mean and variance of X
b) Sketch the probability distribution of X
c) Find P(X ≤ 2)

X     0     1     2     3     4
P(X)  0.10  0.45  0.30  0.10  0.05
Solution:
a) i) Mean:
E(X) = Σ x·P(X = x) = 0(0.10) + 1(0.45) + 2(0.30) + 3(0.10) + 4(0.05) = 1.55
ii) Variance:
Var(X) = Σ (x − μ)²·P(X = x)
= (0 − 1.55)²(0.10) + (1 − 1.55)²(0.45) + (2 − 1.55)²(0.30) + (3 − 1.55)²(0.10) + (4 − 1.55)²(0.05)
= 0.9475
b) [Sketch: a bar chart of P(X = x) against x = 0, 1, 2, 3, 4, with bar heights 0.10, 0.45, 0.30, 0.10 and 0.05.]

c) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.10 + 0.45 + 0.30 = 0.85
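A minimal Python sketch (my addition, not part of the original notes) that checks these calculations numerically:

import numpy as np

x = np.array([0, 1, 2, 3, 4])                  # values of X
p = np.array([0.10, 0.45, 0.30, 0.10, 0.05])   # P(X = x), sums to 1

mean = np.sum(x * p)                # E(X) = sum over x of x * P(X = x)
var = np.sum((x - mean) ** 2 * p)   # Var(X) = sum over x of (x - mu)^2 * P(X = x)
prob = np.sum(p[x <= 2])            # P(X <= 2)

print(mean, var, prob)              # 1.55 0.9475 0.85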
Continuous random variable:
A continuous random variable can take any real value (not just whole numbers); it is generally
something measurable.
 E.g. your height, the temperature, etc.
An easy way to decide is to pick an arbitrary number, e.g. 3.4314135315, and ask whether the variable can take
that value. If yes, it is continuous; if no, it is discrete.
Probability distribution of a continuous random variable:
A continuous random variable has a probability density function which is a smooth non-negative
function representing likely and unlikely values of the random variable.
Two key features of a probability distribution are its centre (location) and width (dispersion); the
mean, μ, and variance, σ², respectively. Let f(x) denote the pdf for a continuous random variable X.
Mean: μ = E(X) = ∫ x·f(x) dx

Variance: σ² = Var(X) = E[(X − μ)²] = ∫ (x − μ)²·f(x) dx
There are an infinite number of points in an interval of a continuous random variable, so a positive
probability cannot be assigned to each point – the area of a line = 0. Therefore, for a continuous
random variable, P(X= x) = 0.
We can only assign probabilities to a range of values or to put it another way, we can only assign a
probability that X will lie within a certain range of variables.
P(x1 ≤ X ≤ x2) = ∫ from x1 to x2 of f(x) dx
Note that it does not matter whether "greater than" or "greater than or equal to" symbols are used, as the
difference is negligible (the probability of a single value is 0).
The Normal Distribution:
The most useful continuous distribution is the normal distribution. The normal distribution has a
probability density function (pdf) of:
f(x) = [1 / √(2πσ²)] · exp( −(x − μ)² / (2σ²) ),   −∞ < x < ∞
Important Parameters of the normal distribution:
1. μ = mean: the centre of the distribution.
2. σ² = variance: the level of dispersion
Properties of the normal distribution:
 Symmetric about the mean
 Bell shaped
 The mean, median and mode are all the same
 Used to find the probabilities of a range of values
 Probabilities of a single value = 0, e.g. P(X = 3) = 0
 There is an infinite number of normal distributions, one for each pair of values of μ and σ
 The area under the probability density function is equal to 1
o As the distribution is symmetric, each side has 0.5 area
 Probability is measured by the area under the curve – the cumulative distribution function
The Standardised Normal Distribution:
 Variance and Standard Deviation of 1
 Mean of 0
 Values greater than the mean have positive Z-Values
 Values less than the mean have negative Z-Values
The most useful element of the normal distribution is that we can “standardise” it to the standard
normal distribution of which we have tables to determine probabilities (Z values)
Z = (X − μ) / σ
Example: In a given population, heights of people are normally distributed with a mean of 160cm
and standard deviation of 10cm.
a) What is the probability that a person is more than 163.5cm tall?
b) What proportion of people have heights between 155cm and 163.5cm?
Solution:
a)
P(X > 163.5) = P( (X − 160)/10 > (163.5 − 160)/10 ) = P(Z > 0.35)
= 0.5 − P(0 ≤ Z ≤ 0.35) = 0.5 − 0.1368 = 0.3632
b)
P(155 ≤ X ≤ 163.5) = P( (155 − 160)/10 ≤ Z ≤ (163.5 − 160)/10 ) = P(−0.5 ≤ Z ≤ 0.35)
= P(−0.5 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.35) = 0.1915 + 0.1368 = 0.3283
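A minimal sketch (my addition; scipy assumed available) verifying both probabilities without Z tables:

from scipy.stats import norm

mu, sigma = 160, 10
# a) P(X > 163.5)
print(1 - norm.cdf(163.5, loc=mu, scale=sigma))                               # ~ 0.3632
# b) P(155 <= X <= 163.5)
print(norm.cdf(163.5, loc=mu, scale=sigma) - norm.cdf(155, loc=mu, scale=sigma))  # ~ 0.3283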
The Chi-Square Distribution:
The Chi-square random variables arise when standard normal random variables are squared. If Z1,
Z2, ..., Zm denote m independent N(0,1) random variables, then
V = Z1² + Z2² + … + Zm² = Σ (i = 1 to m) Zi² ~ χ²(m)
The notation V ~ χ²(m) is read as: the random variable V has a chi-square distribution with m
degrees of freedom.
The degrees of freedom parameter m indicates the number of independent N(0,1) random variables
that are squared and summed to form V. The value of m determines the entire shape of the chi-
square distribution – including its mean and variance.
E(V) = m,   var(V) = E[V − E(V)]² = 2m
The values of V must be non-negative, v ≥ 0, because V is formed by squaring and summing m
standardised normal N(0,1) random variables. The distribution has a long tail, or is
skewed to the right (long tail to the right). As the degrees of freedom m increase, the
distribution becomes more symmetric and "bell-shaped"; as m gets larger still, the chi-square
distribution converges to, and essentially becomes, the normal distribution.
The student ‘t’ Distribution:
A 't' random variable is formed by dividing a standard normal random variable Z ~ N(0,1) by the
square root of an independent chi-square random variable V ~ χ²(m) that has been divided by its
degrees of freedom:

t = Z / √(V/m) ~ t(m)
The t-distribution's shape is completely determined by the degrees of freedom parameter m, and the
distribution is symbolised by t(m).
Note that the t distribution is more spread out than the standard normal distribution and less peaked.
With mean and variance:

E(t) = 0   and   var(t) = m / (m − 2), for m > 2

As the number of degrees of freedom approaches infinity, the t distribution approaches the standard
normal, N(0,1).
The F distribution:
An F random variable is formed by the ratio of two independent chi-square random variables that
have been divided by their degrees of freedom. If V1 ~ χ²(m1) and V2 ~ χ²(m2), and if V1 and V2 are
independent, then:

F = (V1/m1) / (V2/m2) ~ F(m1, m2)
The F-distribution is said to have m1 numerator degrees of freedom and m2 denominator degrees of
freedom. The values of m1 and m2 determine the shape of the distribution, which in general looks
like the figure below.
The graph below shows the range of shapes the distribution can take for different degrees of
freedom.
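A minimal simulation sketch (my addition, not in the notes) illustrating how chi-square, t and F random variables are built from standard normals:

import numpy as np

rng = np.random.default_rng(0)
n, m1, m2 = 100_000, 4, 10

Z = rng.standard_normal((n, m1))
V1 = (Z ** 2).sum(axis=1)                      # chi-square with m1 df
print(V1.mean(), V1.var())                     # ~ m1 = 4 and ~ 2*m1 = 8

t = rng.standard_normal(n) / np.sqrt(V1 / m1)  # t with m1 df
print(t.mean(), t.var())                       # ~ 0 and ~ m1/(m1 - 2) = 2

V2 = (rng.standard_normal((n, m2)) ** 2).sum(axis=1)
F = (V1 / m1) / (V2 / m2)                      # F with (m1, m2) df
print(F.mean())                                # ~ m2/(m2 - 2) = 1.25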
Laws of Expectation and Variation:
E[b] = b   and   Var[b] = 0
E[aX] = a·E[X]   and   Var[aX] = a²·Var[X]
E[aX + b] = a·E[X] + b   and   Var[aX + b] = a²·Var[X]
E[X + Y] = E[X] + E[Y]
Var[X + Y] = Var[X] + Var[Y]   (when X and Y are uncorrelated)

Where a and b are constants, and X and Y are random variables.
The Error Term:
The error term in a regression model is a random variable. Like other random variables it is
characterised by:
a) A mean (or expected value)
b) A variance
c) A distribution (i.e. probability density function)
We usually assume the random error term of an econometric model to:
a) Have expected value of zero
b) Have a variance which we will call σ2
The smaller the variance of the error term, the more efficient the model.
Sampling Distributions:
We can usually draw many samples of size n from a population. Each sample can be used to
compute a sample statistic (e.g. a sample mean); these statistics will vary from sample to sample. If
we take infinitely many samples of a normally distributed random variable X in the population, the
sample statistic X̄ will also be normally distributed.
The probability distribution that gives all possible values of a statistic and associated probabilities is
known as a sampling distribution.
If Xi ~ N(μ, σ²), then X̄ ~ N(μ, σ²/N).
If the distribution of X is non-normal but n is large, then X̄ is approximately normally distributed.
The approximation is good when n ≥ 30 – this is known as the central limit theorem.
Central limit Theorem:
If Y1, ..., YN are independent and identically distributed random variables with mean μ and variance
σ², and Ȳ = Σ Yi / N, then

ZN = (Ȳ − μ) / (σ / √N)

has a probability distribution that converges to the standard normal, N(0,1), as N → ∞.
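A minimal simulation sketch (my addition) of the central limit theorem using a skewed population:

import numpy as np

rng = np.random.default_rng(1)
N, reps = 50, 20_000
mu, sigma = 1.0, 1.0                     # mean and sd of an exponential(1) population

Y = rng.exponential(scale=1.0, size=(reps, N))
Z = (Y.mean(axis=1) - mu) / (sigma / np.sqrt(N))   # Z_N for each sample
print(Z.mean(), Z.std())                 # ~ 0 and ~ 1: approximately N(0,1)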
Estimators & Estimates:
A point estimator is a rule or formula which tells us how to use a set of sample observations to
estimate the value of a parameter of interest. A point estimate is the value obtained after the
observations have been substituted into the formula.
Desirable properties of point estimators include:
 Unbiased – an estimator θ̂ is an unbiased estimator of the population parameter θ if E(θ̂) = θ
 Efficiency – θ̂1 is more efficient than θ̂2 if var(θ̂1) < var(θ̂2)
 Consistency- the distribution of the estimator becomes more concentrated about the
population parameter as the sample size becomes larger
Note that both bias and variance approach 0 as n approaches infinity.
Estimate: a particular value for a parameter
Estimator: a formula used to obtain estimates
Examples:
 X̄ = Σ Xi / N is the best linear unbiased estimator of μ = E(X)
 σ̂² = Σ (Xi − X̄)² / N is a biased but consistent estimator of σ² = E(X − μ)²
 σ̂² = Σ (Xi − X̄)² / (N − 1) is an unbiased and consistent estimator of σ² = E(X − μ)²
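A minimal sketch (my addition) contrasting the biased (divide by N) and unbiased (divide by N − 1) variance estimators:

import numpy as np

rng = np.random.default_rng(2)
N, reps, sigma2 = 10, 50_000, 4.0

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))
print(X.var(axis=1, ddof=0).mean())   # divide by N: ~ 3.6, biased downward
print(X.var(axis=1, ddof=1).mean())   # divide by N - 1: ~ 4.0, unbiased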
Confidence Intervals:
A confidence interval or interval estimate, is a range of values which contains information not only
about the location of the population mean, but about the precision with which we estimate it.
We can generally use the sampling distribution of an estimator to derive a confidence interval for the
population parameter.
In general, a 100(1-α)% confidence interval for the population mean is given by:

CI = x̄ ± z(α/2) · σ/√n
Where α is the significance level (100(1-α)% is the level of confidence).
Prior to selecting a random sample, the probability that a CI will contain the population parameter is
100(1-α)%. E.g. if we took many samples of size n and calculated the many corresponding random
intervals x̄ ± z(α/2)·σ/√n, then 100(1-α)% of them would contain μ.
After we construct a confidence interval, either it does or it does not contain the population
parameter, with probability 1 or 0 (so we can only say we are 100(1-α)% confident that a
particular confidence interval contains the parameter).
General conclusion: “We can say with 100(1-α)% confidence that the population parameter is
between lower bound and upper bound.”
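A minimal sketch (my addition; the numbers are assumed, not from the notes) of a 100(1 − α)% confidence interval for the mean:

import numpy as np
from scipy.stats import norm

xbar, sigma, n, alpha = 160.0, 10.0, 40, 0.05   # assumed sample mean, known sigma
z = norm.ppf(1 - alpha / 2)                     # ~ 1.96
half = z * sigma / np.sqrt(n)
print(xbar - half, xbar + half)                 # the 95% confidence interval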
Hypothesis Testing:
An hypothesis is a statement or claim about the value(s) of one or more population parameters. To
test a hypothesis we
1. Identify a test statistic and find its sampling distribution when the hypothesis is true
2. Reject the hypothesis if the test statistic takes a value that is deemed unlikely
5 steps:
1. State H0 and H1 – H0 must contain an equality (=, ≤ or ≥)
2. State a decision rule – Reject H0 if...
3. Calculate test statistic
4. Compare, and make decision
5. Write conclusion
Note:
o One-tail or two tail tests can be used
o Can use critical values or p-value method
Econometrics: ECON2300 – Lecture 2
An Econometric Model:
For a given set of data the aim of an econometric model is to fit a regression line and then check how
well it fits.
In order to investigate this relationship between expenditure and income we must build an economic
model and then a corresponding econometric model that forms the basis for a quantitative or
empirical economic analysis.
We must express mathematically which variables are dependent and independent. (In this case we
can say that the weekly expenditure depends on income – y depends on x)
We represent our economic model mathematically by the conditional mean:

E(y|x) = μ(y|x) = β1 + β2·x
The conditional mean E(y|x) is called a simple regression function, as there is only one
explanatory variable. The unknown regression parameters β1 and β2 are the intercept and slope
respectively.
β2 = ΔE(y|x)/Δx = dE(y|x)/dx
For each value of x there is potentially a range of values of y – in fact each has a probability
distribution.
The figure above shows that the regression line passes through the mean of each distribution of
expenditure at each level of income.
The difference between the actual value of y and the expected value is known as the random error
term.
e = y − E(y) = y − (β1 + β2·x)
If we rearrange:

y = β1 + β2·x + e
Assumptions of the Simple Linear Regression (SLR) Model:
1. The population can be represented by:
y = β1 + β2·x + e
2. The mean value of y, for each value of x is given by the linear regression function
E(y|x) = β1 + β2·x

Error term: This means that the mean error term is 0: E(e) = 0.
3. For each value of x, the values of y are distributed about their mean value, following
probability distributions that all have the same variance
var(y|x) = σ²

Error term: This means that the error terms are homoskedastic (constant variance):
var(e) = var(y) = σ². Violation of this assumption is called heteroskedasticity.
4. The sample values of y are all uncorrelated and have zero covariance, implying there is no
linear association among them:

cov(yi, yj) = 0

Error term: There is no serial correlation. Note that this assumption can be made stronger
by assuming that the random errors e are all statistically independent, in which case the values
of y are also statistically independent.
5. The variable x is not random and must take at least two different values.
6. (optional) The values of y are normally distributed about their mean for each value of x.
y ~ N(β1 + β2·x, σ²)
Error term: The values of e are normally distributed about their mean:

e ~ N(0, σ²)

This holds if the values of y are normally distributed, and vice versa.
The Error term:
If the regression parameters β1 and β2 were known, then for any value of y we could calculate:

e = y − E(y) = y − (β1 + β2·x)

However, the values of β1 and β2 are never known for certain, and therefore it is impossible to
calculate e.
The random error e represents all factors affecting y other than x. These factors cause individual
observations y to differ from the mean value E(y) = β1 + β2·x.
Estimating the Parameters of the Simple Linear Regression:
Our problem is to estimate the location of E(y) = β1 + β2·x that best represents our data. We would
expect this line to be somewhere in the middle of all the data points since it represents mean, or
average, behaviour. To estimate β1 and β2 we could simply draw a line through the middle of the
data and then measure the slope and intercept with a ruler. The problem with this method is that
different people would draw different lines – in fact there would be an infinite set of possibilities –
and that it would not be accurate.
The estimated regression line is given by:
ŷi = b1 + b2·xi
The least squares principle:
The least squares method involves finding estimators b1 and b2 that provide the smallest sum of
squared residuals:

min Σ êi² = min Σ (yi − ŷi)²

The solutions are:

b2 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
b1 = ȳ − b2·x̄
We usually use a computer to calculate these values as the process would take too long and be too
tedious to do by hand.
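A minimal sketch (my addition; the data are simulated, not the notes' food-expenditure sample) of the least squares formulas:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 30, 40)
y = 83.4 + 10.21 * x + rng.normal(0, 20, 40)   # hypothetical sample

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
print(b1, b2)                                  # estimates of the intercept and slope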
Interpreting the estimates:
 The value of b2 is an estimate of β2, the amount by which y increases per unit increase in x
 The value of b1 is an estimate of β1, what y would be when x = 0
Because the least squares estimate is generated using sample data, different samples will lead to
different values of b1 and b2. Therefore b1 and b2 are random variables.
In this context we call b1 and b2 the least squares estimators, but when actual sample values are
substituted then we obtain values of random variables which are estimates.
Estimators: Formulas for estimates
Estimates: Actual values given by the estimators
The variances and Covariance of b1 and b2:
var(b1) = σ² · [ Σ xi² / (N · Σ (xi − x̄)²) ]

var(b2) = σ² / Σ (xi − x̄)²
The square roots of the estimated variances are known as standard errors.
cov(b1, b2) = σ² · [ −x̄ / Σ (xi − x̄)² ]
Summary: the variances and covariances of b1 and b2
 The larger the variance of the error term, σ², the greater the uncertainty there is in the
statistical model, and the larger the variances and covariance of the least squares estimators.
 The larger the sum of squares, Σ (xi − x̄)², the smaller the variances of the least squares
estimators and the more precisely we can estimate the unknown parameters.
In (a) the data are bunched, so Σ (xi − x̄)² is smaller and we cannot estimate the line very
accurately. In (b) the data are more spread out, so Σ (xi − x̄)² is larger and we can estimate the
unknown parameters more precisely.
 The larger the sample size N, the smaller the variances and covariances of the least squares
estimators.
 The larger the term Σ xi², the larger the variance of the least squares estimator b1.
The further our data are from x = 0, the more difficult it is to interpret β1.
 The absolute magnitude of the covariance increases the larger in magnitude is the sample
mean x̄, and the covariance has a sign opposite that of x̄.
The probability distribution of the Least Squares Estimators:
 If the normality assumption about the error terms is correct, then the least squares estimators
are normally distributed.
 If assumptions 1 – 5 hold, and if the sample size is sufficiently large (n ≥ 30), then by the
central limit theorem the least squares estimators have a distribution that approximates the
normal distribution.
The Gauss-Markov Theorem:
Under the assumptions SR1-SR5 of the linear regression model, the estimators b1 and b2 have the
smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased
Estimators (BLUE) of β1 and β2.
To clarify what the Gauss-Markov theorem does, and does not, say:
1. The estimators b1 and b2 are “best” when compared to similar estimators, those that are linear
and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators.
2. They are the “best” within their class because they have the minimum variance. When
comparing two linear and unbiased estimators we always want to use the one with the
smallest variance.
3. In order for the Gauss-Markov theorem to hold, assumptions SR1-SR5 must be true. If any
of these assumptions are not true, then b1 and b2 are not the best linear unbiased estimators of
β1 and β2.
4. The Gauss-Markov theorem does not depend on the assumption of normality.
5. In simple linear regression these are the estimators to use.
6. The theorem applies to the least squares estimators. It does not apply to the least squares
estimates from a single sample.
Estimating the variance of the Error term:
The variance of the random error ei is:
    )
(
)
0
(
)
(
)
var(
2
2
2
2
i
i
i
i
i e
E
e
E
e
E
e
E
e 




 
Assuming that the mean error = 0 assumption is correct.
The unbiased estimator of variance is:
σ̂² = Σ êi² / (N − 2),   with E(σ̂²) = σ²
Interval Estimation:
Confidence interval:
CI = bk ± tcrit · se(bk)

Where:
bk = b1 or b2
tcrit = the critical value t(1 − α/2, N − 2), where N − 2 is the degrees of freedom
se(bk) = the standard error given by the regression estimation
Before sampling, we can make the probability statement there is a 100(1-α)% chance that the real
value lies within the interval.
After sampling, we can only make a confidence interval – we are 100(1-α)% confident that the real
value lies within the interval.
Example:
Construct a 95% confidence interval for B2 for the following equation when there are 40
observations.
ŷ = 83.4 + 10.21x
(se)  (43.4)  (2.09)
Solution:

CI = b2 ± t(1 − 0.05/2, 40 − 2) · se(b2) = 10.21 ± t(0.975, 38) · (2.09)
   = 10.21 ± 2.024(2.09) = 10.21 ± 4.23
We can say with 95% confidence that the true value of β2 lies within the interval 5.98 to 14.44.
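A minimal sketch (my addition) reproducing this interval with scipy:

from scipy.stats import t

b2, se_b2, N = 10.21, 2.09, 40
tcrit = t.ppf(0.975, N - 2)                    # ~ 2.024
print(b2 - tcrit * se_b2, b2 + tcrit * se_b2)  # ~ (5.98, 14.44)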
Hypothesis Testing:
We can conduct a hypothesis test on the slope of the regression line.
Step 1: State Hypothesis:
H0: βk = c (or βk ≤ c, or βk ≥ c)
H1: βk ≠ c (or βk > c, or βk < c)
Step 2: Decision rule:
Reject H0 if .....
Step 3: Calculate test statistic
Step 4: Compare and decision
Step 5: Conclusion
Example:
Using 40 observations on food expenditure.
ŷ = 83.4 + 10.21x
(se)  (43.4)  (2.09)
Test whether β2 is less than or equal to 0 at the 5% level of significance.
Step 1: State Hypothesis
H0: β2 ≤ 0
H1: β2 > 0
Step 2: Decision Rule
Reject H0 if tcalc > tcrit, where tcrit = t(1 − 0.05/2, 40 − 2) = t(0.975, 38) = 2.024
Step 3: Calculate test statistic

tcalc = (b2 − 0) / se(b2) = (10.21 − 0) / 2.09 = 4.88
Step 4: Compare and decision
4.88 > 2.024 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that β2, the
increase in expenditure for a one-unit increase in income, is greater than 0.
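A minimal sketch (my addition) of the same one-tail test using a p-value instead of a critical value:

from scipy.stats import t

tcalc = (10.21 - 0) / 2.09          # ~ 4.885
pval = 1 - t.cdf(tcalc, 40 - 2)     # one-tail p-value
print(tcalc, pval)                  # pval << 0.05, so reject H0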
Types of errors:
                    H0 true             H0 false
Reject H0           Type 1 error = α    No error
Do not reject H0    No error            Type 2 error
[Figure: the rejection region is the right tail of the t(38) distribution beyond tcrit = t(0.975, 38) = 2.024.]
Econometrics: ECON2300 – Lecture 3
The least Squares Predictor:
The linear regression model provides a way to predict y given any value of x. This is extremely
important for forecasters; be it in politics, finance or business. Accurate predictions provide a basis
for better decision making.
Our first SR assumption is that our model is linear: for a given value of the explanatory variable, x0,
the value of the dependent variable y0 is given by the econometric model:

y0 = β1 + β2·x0 + e0
Where e0 is a random error. This random error has:
1. Mean: E(e0)= 0
2. Variance: var(e0) = σ2
3. Covariance: cov(e0,e1) = 0
The least squares predictor (or estimator) of y0 (given x0) is:

ŷ0 = b1 + b2·x0
To evaluate how well this predictor or estimator performs we define the forecast error, which is
analogous to the least squares residual:

f = ŷ0 − y0 = (b1 + b2·x0) − (β1 + β2·x0 + e0) = (b1 − β1) + (b2 − β2)·x0 − e0
Now: if we apply the assumptions SR1 to SR5:
E(f) = E(ŷ0 − y0) = [E(b1) − β1] + [E(b2) − β2]·x0 − E(e0) = 0

As: E(b1) = β1, E(b2) = β2 and E(e0) = 0.
var(f) = var(ŷ0 − y0) = σ² · [ 1 + 1/N + (x0 − x̄)² / Σ (xi − x̄)² ]
If SR6 holds, or the sample size is large enough, then the prediction error is normally distributed.
Note that, the further x0 is from the sample mean, the larger the variance of the prediction error.
 This means that as you extrapolate more and more your predictions will be less accurate.
Note the variance of the forecast error is smaller when:
i) The overall uncertainty in the model is smaller, as measured by the variance of the
random errors σ2
ii) The sample size N is larger
iii) The variation in the explanatory variable is larger
iv) The distance of x0 from x̄ is smaller
The forecast error variance is estimated by replacing σ² with its estimator σ̂²:

var̂(f) = σ̂² · [ 1 + 1/N + (x0 − x̄)² / Σ (xi − x̄)² ]
        = σ̂² + σ̂²/N + (x0 − x̄)² · [ σ̂² / Σ (xi − x̄)² ]
        = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)
[Figure: fitted line ŷi = b1 + b2·xi, with x1 close to the sample mean x̄ and x2 far from it.]
Obviously:
The estimate that the estimator or predictor
gives at x1 will be close to the actual value as
there are lots of data points that the regression
is based on round x1 – it is close to the sample
mean.
At x2, there are no points very close that the
regression was based on, so the prediction will
be less accurate aka will have a larger variance.
i.e. We can do a better job of predicting in the
region where we have more sample
information.
The standard error of the forecast:

se(f) = √ var̂(f)

Hence, we can construct a (1 − α) × 100% prediction interval for y0:

ŷ0 ± tcrit · se(f)
Example:
Calculate a 95% confidence interval for y when x0 = 20:
ŷ0 ± tcrit · se(f)
Step 1: Linear equation
From the output above we can determine the linear regression:

ŷ = b1 + b2·x = 83.416 + 10.21x
(se)          (43.41)   (2.093)

Therefore, when x0 = 20:

ŷ0 = 83.416 + 10.21(20) = 287.616
Step 2: Determine se(f)

var̂(f) = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)
        = (89.517)² + (89.517)²/40 + (20 − 19.605)² · (2.0932)²
        = 8214.34

se(f) = √ var̂(f) = √8214.34 ≈ 90.63

Here 89.517 is the S.E. of the regression, N = 40 is the sample size, x0 = 20 is the x-value,
x̄ = 19.605 is the mean of x, and 2.0932 = se(b2); note that var̂(b2) = se(b2)².
Step 3: Confidence interval

ŷ0 ± tcrit · se(f) = 287.616 ± t(1 − α/2, N − 2) · √ var̂(f) = 287.616 ± t(0.975, 38) · √8214.34
                  = 287.616 ± 2.024 × 90.63

104.17 ≤ y0 ≤ 471.06
Therefore we can say with 95% confidence that the true expenditure on food will be between
$104.17 and $471.06.
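A minimal sketch (my addition) of the whole calculation using the quantities quoted above:

import numpy as np
from scipy.stats import t

b1, b2 = 83.416, 10.21
sigma_hat, N, x0, xbar, se_b2 = 89.517, 40, 20, 19.605, 2.0932

y0_hat = b1 + b2 * x0                                               # point prediction
var_f = sigma_hat**2 + sigma_hat**2 / N + (x0 - xbar)**2 * se_b2**2
half = t.ppf(0.975, N - 2) * np.sqrt(var_f)
print(y0_hat - half, y0_hat + half)                                 # ~ (104.2, 471.0)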
Transforming x to obtain se(f):
A simple way to obtain the prediction and prediction interval estimates with EViews ( or any other
econometrics package, including Excel) is as follows:
1. Transform the independent variable x by subtracting x0 from each of the values.
Generate a new variable:
Genr  x2 = x – x0
2. Then estimate the regression model by running a regression analysis of y on the transformed variable
3. The estimated standard error of the forecast is given by:
se(f) = √( var̂(b1) + σ̂² )

where b1 is the intercept of the transformed regression.
Example:
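The EViews output for this example is not reproduced here; as an illustration only, a minimal statsmodels sketch (my addition, with simulated data) of the same trick:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 30, 40)
y = 83.4 + 10.21 * x + rng.normal(0, 90, 40)      # hypothetical sample
x0 = 20

res = sm.OLS(y, sm.add_constant(x - x0)).fit()    # regress y on (x - x0)
se_f = np.sqrt(res.bse[0] ** 2 + res.mse_resid)   # se(f) = sqrt(var(b1) + sigma_hat^2)
print(res.params[0], se_f)                        # prediction at x0 and its standard error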
The transformation has the following effect:
Measuring Goodness-of-Fit:
Two major reasons for analysing the model
y = β1 + β2·x + e

are:
1. To explain how the dependent variable (yi) changes as the independent variable (xi) changes
2. To predict y0 given an x0
These two objectives come under the broad headings of estimation and prediction. Closely allied
with the prediction problem discussed in the previous section is the desire to use xi to explain as
much of the variation in the dependent variable yi as possible.
ŷi = b1 + b2·xi
SST = total sum of squares – measure of total variation in the dependent variable about its sample
mean
SSR = regression sum of squares – the part that is explained by the regression
SSE = sum of squared errors – that part of the total variation that is unexplained
Coefficient of determination: R2
The coefficient of determination measures the proportion of the variation in the dependent variable
that is explained by the regression model:
R² = SSR/SST = 1 − SSE/SST,   0 ≤ R² ≤ 1
If R² = 1, the data fall exactly on the fitted least squares regression line and we have a perfect fit. If the
sample data for y and x are uncorrelated and show no linear association, then the least squares fitted
line is "horizontal", so SSR = 0 and R² = 0.
For a simple regression model, R² can also be computed as the square of the correlation coefficient
between yi and ŷi.
 R² = 1: all the sample data fall exactly on the fitted least squares line, SSE = 0
 R² = 0: the sample data for y and x are uncorrelated; the least squares fitted line is horizontal
and equal to the mean of y, so that SSR = 0
Note:
1. R² is a descriptive measure.
2. By itself, it does NOT measure the quality of the regression model.
3. It is NOT the objective of regression analysis to find the model with the highest R².
4. By adding more variables, R² will automatically increase even if the variables have no
economic justification. This is why we use the adjusted R² in multiple regression analysis
(we will expand on this when we study multiple regression):
[Figure: for each observation, the deviation from the sample mean decomposes as
yi − ȳ (total, SST) = (ŷi − ȳ) (explained, SSR) + (yi − ŷi = êi) (unexplained, SSE).]
R̄² = 1 − [ SSE/(N − K) ] / [ SST/(N − 1) ]
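A minimal sketch (my addition) computing R² and the adjusted R̄² from a fitted model's predictions:

import numpy as np

def r_squared(y, y_hat, K):
    # R-squared and adjusted R-squared for a model with K parameters
    sse = np.sum((y - y_hat) ** 2)          # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)       # total variation
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (len(y) - K)) / (sst / (len(y) - 1))
    return r2, r2_adj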
Example:
For the same data as before:
The Effects of Scaling the Data:
Data we obtain is not always in a convenient form for presentation in a table or use in a regression
analysis. When the scale of the data is not convenient, it can be altered without changing any of the
real underlying relationships between variables.
If we scale x by 1/c:

y = β1 + β2·x + e   becomes   y = β1 + (c·β2)(x/c) + e
If we scale y by 1/c:

y = β1 + β2·x + e   becomes   y/c = β1/c + (β2/c)·x + e/c
Example: if we now report income in $100 units: because b2 = 10.21 and x = 200 before scaling,
after scaling b2 = 0.1021 and x = 2. This makes no change to the underlying model.

When the scale of x is altered, the standard error of the regression coefficient changes by the same
multiplicative factor as the coefficient, so that their ratio, the t-statistic, is unaffected. All other
regression statistics are unchanged.

When y is scaled, the error term is scaled in the process, so the least squares residuals will also be
scaled. This will affect the standard errors of the regression coefficients, but will not affect
t-statistics or R².
Choosing a Functional Form:
So far we have assumed that the mean household food expenditure is a linear function of household
income. That is, we assumed the underlying economic relationship to be E(y) = β1 + β2·x, which
implies that there is a linear, straight-line relationship between E(y) and x.
In the real world this might not be the case, and this was only assumed to make the analysis easier.
The starting point in all econometric analysis is economic theory. What does economics really say
about the relation between food expenditure and income, holding all else constant? We expect there
to be a positive relationship between these variables because food is a normal good. But nothing says
the relationship must be a straight line. In fact we do not expect that as household income rises that
food expenditures will continue to rise indefinitely at the same constant rate. Instead, as income rises,
we expect food expenditures to rise, but at a decreasing rate – the law of diminishing returns.
The term "linear" in "linear regression model":
1. Does not mean a linear relationship between the economic variables.
2. Does mean that the model is "linear in the parameters" (e.g. the βk values must not be raised to
powers or multiplied by other parameters), but not necessarily "linear in the variables"
(e.g. x can appear as x², x³, etc.).
Linear in parameters: the parameters are not multiplied together, divided, squared, cubed etc.
f(x) = β0 + β1·x1 + … + βk·xk
1. each explanatory variable in the function is multiplied by an unknown parameter,
2. there is at most one unknown parameter with no corresponding explanatory variable, and
3. all of the individual terms are summed to produce the final function value.
An example of a non-linear in parameter model is:
f(x) = β0 + β0·β1·x   or   f(x) = β0·x^β1
This is non-linear because the slope of this line is expressed as a product of two parameters.
As a result, nonlinear least squares regression must be used to fit this model, but linear least
squares cannot be used.
Because of this fact, the simple linear regression model is much more flexible than it appears at first
glance. By transforming the variables y and x, we can represent many curved, nonlinear
relationships and still use the linear regression model. Choosing an algebraic form for the
relationship means choosing transformations of the original variables.
The slopes of which can be determined by taking the derivatives of the function.
Note: the most important implication of transforming variables is that the regression result
interpretations change. Both the slope and elasticity change from the linear relationship case.
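As a concrete illustration (my addition, with simulated data), a log-linear model ln(y) = β1 + β2·x is still linear in the parameters, so ordinary least squares applies after transforming y:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 50)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 50))   # hypothetical data

res = sm.OLS(np.log(y), sm.add_constant(x)).fit()    # linear in the parameters
print(res.params)                                    # ~ [0.5, 0.2]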
Some common function types are:
A Practical Approach:
1. Plotting the data and choosing economically-plausible models
2. Testing hypotheses concerning the parameters
3. Performing residual analysis
4. Assessing forecasting performance
5. Measuring goodness-of-fit (R²)
6. Using the principle of parsimony – simplest model
Example on Food Expenditure:
1. Plotting data
2. Testing hypotheses:
All slope coefficients are significantly different from zero at the 5% level of significance.
3. Performing residual analysis: Testing for normally distributed Errors
The k-th moment (from physics) of the random variable e is:
μk = E[(e − μ)^k]
Where μ denotes the mean of e. Measures of spread, symmetry and “peakedness” are:
Variance: σ² = μ2
Skewness: S = μ3 / σ³
Kurtosis: K = μ4 / σ⁴ – whether the tails are thicker or thinner than expected
If e is normally distributed then S = 0 and K = 3. Formalising this is the Jarque-Bera test:
the Jarque-Bera test is a test of how far measures of residual skewness and kurtosis are from 0 and
3 (normality).
To test the null hypothesis of normality of the errors, we use the test statistic:
JB = (N/6) · [ S² + (K − 3)²/4 ]
Where:
N = sample size
S = skewness
K = Kurtosis
When the null hypothesis is true, the Jarque-Bera statistic JB has a χ² distribution with 2 degrees of freedom.
Step 1: State the hypothesis:
H0: the errors are normally distributed
H1:the errors are not normally distributed
Step 2: Decision rule:
Reject H0 if JB > χ²(0.95, 2) = 5.991
Step 3: Calculate test statistic:
JB = (40/6) · [ (0.097)² + (2.99 − 3)²/4 ] = 0.063
Step 4: Compare and decision
0.063 < 5.991 therefore do not reject H0.
Step 5: conclusion
There is insufficient evidence to conclude that the errors are not normally distributed at the
5% level of significance.
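A minimal sketch (my addition) of the same test with scipy:

import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(6)
resid = rng.normal(0, 1, 40)      # hypothetical residuals
stat, pval = jarque_bera(resid)
print(stat, pval)                 # do not reject normality if pval > 0.05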
4. Assessing forecasting performance
5. Measuring goodness-of-fit: With different dependent variables:
Goodness of fit with different dependent variables:
The R2 from a linear model, measures how well the linear model explains the variation in y, while
the R2 from a log-linear model measures how well that model explains the variation in ln(y). The
two measures should NOT be compared.
To compare goodness-of-fit in models with different dependent variables, we can compute the
generalised R²:

R²g = [corr(y, ŷ)]² = r²(y, ŷ)

(We cannot compare the ordinary R² values directly, as each model has a different dependent variable.)
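A minimal sketch (my addition): the generalised R² is just the squared correlation between y and its prediction in the original units.

import numpy as np

def generalised_r2(y, y_hat):
    # square of the correlation coefficient between y and y-hat
    return np.corrcoef(y, y_hat)[0, 1] ** 2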
6. Using the principle of parsimony – Use the simplest model
The principle of parsimony states that you should use the simplest model if two models
appear to be of equal forecasting ability.
Econometrics: ECON2300 – Lecture 4
Multiple Regression A:
The simple regression model we have studied so far relates the dependent variable y to only ONE
explanatory variable x.
When we turn an economic model with more than one explanatory variable into its corresponding statistical
model, we refer to it as a multiple regression model.
Changes and Extensions from the simple regression model:
1. Interpretation of the β parameters:
The population regression line is:
E(yi | xi2, …, xiK) = β1 + β2·xi2 + … + βK·xiK
The k-th slope coefficient measures the effect of a change in the variable xk, upon the expected value
of y, all other variables held constant. Mathematically:
βk = ∂E(yi | xi2, …, xiK) / ∂xik,   holding all other x's constant
Note: the x’s start at 2 as 1 refers to the intercept term (which has no slope).
2. The assumption concerning the characteristics of the explanatory (x) variables
The assumptions of the multiple regression model are:
MR1: yi = β1 + β2·xi2 + … + βK·xiK + ei, where i = 1, …, N
- The model is linear in parameters but may be non-linear in the variables
MR2: E(yi) = β1 + β2·xi2 + … + βK·xiK, which is synonymous with E(ei) = 0
- The expected (average) value of yi depends on the values of the explanatory variables and
the unknown parameters.
MR3: var(yi) = var(ei) = σ² – the error terms are homoskedastic (have constant variance)
MR4: cov(yi,yj) = cov(ei,ej) = 0 – There is no serial correlation
MR5: The values of each xik are not random and are not exact linear functions of the other
explanatory variables
MR6: (optional) yi ~ N[(β1 + β2·xi2 + … + βK·xiK), σ²], which is equivalent to ei ~ N(0, σ²)
3. The degrees of freedom for the t-distribution
We will go into further detail on this later in the summary.
Least Squares Estimation:
The fitted regression line for the multiple regression model is:
ŷi = b1 + b2·xi2 + … + bK·xiK
The least squares residual is:
êi = yi − ŷi = yi − b1 − b2·xi2 − … − bK·xiK
Similarly to the simple linear regression, the unknown parameters β1,...,βK are obtained by
minimising the residual sum of squares:
Σ (i = 1 to N) êi² = Σ (yi − ŷi)² = Σ (yi − b1 − b2·xi2 − … − bK·xiK)²
Solving the first-order conditions for a minimum yields messy expressions for the ordinary least
squares estimators, even when K is small.
For example, when K = 3 the OLS method gives closed-form but messy expressions for b1, b2 and b3.
In practice we use matrix algebra to solve these systems:
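A minimal matrix-algebra sketch (my addition, with simulated data) of the least squares solution b = (X'X)⁻¹X'y:

import numpy as np

rng = np.random.default_rng(7)
N = 75
X = np.column_stack([np.ones(N),               # intercept column
                     rng.uniform(4, 7, N),     # hypothetical price
                     rng.uniform(0, 3, N)])    # hypothetical advertising
y = X @ np.array([118.9, -7.9, 1.86]) + rng.normal(0, 5, N)

b = np.linalg.solve(X.T @ X, X.T @ y)          # solves (X'X) b = X'y
print(b)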
To understand graphically what a multiple regression model embodies look at the image below:
The equation forms a surface or plane which describes the position of the variable.
Example:
The model is given by:

Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)   (6.352)       (1.096)          (0.683)
Interpretation of the coefficients:
b2: Sales are expected to fall by $7908 when the price increases by $1, holding
the amount of advertising constant.
b3: Sales are expected to increase by $1863 when advertising increases by $1,
holding the price constant.
Properties Of The OLS Estimators: (OLS = Ordinary Least Squares)
The Gauss-Markov Theorem says that:
If MR1 to MR5 are correct, the OLS estimators b1,...,bK have the smallest variance of all linear and
unbiased estimators of β1, ...,βK – they are the Best Linear and Unbiased Estimators (BLUE).
Remember that the Gauss-Markov theorem does not depend on the assumption of normality (MR6).
However, if MR6 does hold, then the OLS estimators are also normally distributed.
Again, with larger values of K the formulas for the variances of the OLS estimators are messy. For
example, when K = 3, we can show that the variance of b2 involves r23, the sample correlation
coefficient between x2 and x3 (−1 < r < 1).
The variances and covariances are often presented in the form of a covariance matrix. For K = 3, this matrix
takes the form:
In practice, however, σ², the population variance, is unknown. So instead we use an unbiased estimator of the
error variance:
σ̂² = Σ (i = 1 to N) êi² / (N − K) = Σ (yi − ŷi)² / (N − K)
The estimated variances and covariances of the OLS estimators are obtained by replacing σ² with σ̂² within the
appropriate formulas. The square roots of the estimated variances are still known as standard errors.
It is important to understand the factors affecting the variance of bi (i = 2,...,K):
1. The larger σ², the larger the variance of the least squares estimators.
2. The larger the sample size, the smaller the variances.
3. More variation in an explanatory variable around its mean leads to a smaller variance of the
least squares estimator.
4. The larger the correlation between the explanatory variables, the larger is the variance of the
least squares estimators. "Independent" variables ideally exhibit variation that is "independent"
of the variation in other explanatory variables.
5. Variation in one explanatory variable that is connected to variation in another explanatory variable is
known as multicollinearity (see next week). E.g. a larger correlation between x2 and x3 leads to a
larger variance of b2.

Inferences in the Multiple Regression Model:
If the assumptions MR1 – MR6 hold, we can:
1. Construct confidence intervals for each of the K parameters
2. Conduct a significance test for each of the K parameters
3. Conduct a hypothesis test on any of the parameters or combinations of parameters
The approach is the same as that followed in weeks 2 and 3 for the parameters of the
simple regression model.
1. Confidence interval:
A 100(1-α)% confidence interval for βk is given by:
bk ± tcrit · se(bk),   for k = 1, …, K

Where:
K = the number of β parameters, e.g. for ŷi = b1 + b2·xi2 + b3·xi3, K = 3
tcrit = t(1 − α/2, N − K)
se(bk) = the standard error of bk given in the regression output
Example: construct a 95% confidence interval for the coefficient of advertising for the following
model which was based on N = 75 observations on hamburger sales.
Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)   (6.352)       (1.096)          (0.683)
Solution:

b3 ± t(1 − α/2, N − K) · se(b3) = b3 ± t(1 − 0.05/2, 75 − 3) · se(b3) = 1.863 ± t(0.975, 72)(0.683)
= 1.863 ± 1.993(0.683)

0.502 ≤ β3 ≤ 3.224

We can say with 95% confidence that the true change in sales for a one dollar increase in
advertising is between $502 and $3224.
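A minimal sketch (my addition) of this interval with scipy:

from scipy.stats import t

b3, se_b3, N, K = 1.863, 0.683, 75, 3
tcrit = t.ppf(0.975, N - K)                    # ~ 1.993
print(b3 - tcrit * se_b3, b3 + tcrit * se_b3)  # ~ (0.50, 3.22)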
2. Hypotheses Testing
2.1.A simple null hypothesis is a null hypothesis with a single restriction on one or more parameters.
Under MR1 to MR6, we can test the null hypothesis H0: βk = c using the t-statistic:
t = (bk − c) / se(bk) ~ t(N − K)
Even if MR6 doesn’t hold, the test is still valid provided the sample size is large.
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)   (6.352)       (1.096)          (0.683)
Solution:
Step 1: State Hypotheses
H0: β2 = 0
H1: β2 ≠ 0
Step 2: Decision Rule
Reject H0 if |tcalc| > tcrit, where tcrit = t(1 − α/2, N − K) = t(0.975, 72) = 1.993
Step 3: Calculate Test Statistic
tcalc = (b2 − c) / se(b2) = (−7.908 − 0) / 1.096 = −7.215
Step 4: Compare and Decision
|-7.215| > 1.993 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has no
effect on revenue; i.e. we can conclude at the 5% level of significance that price has an
effect on revenue.
2.2.Testing of a null hypothesis consisting of two or more hypotheses about the parameters in the
multiple regression model.
F- Tests
Used in:
1. Overall significance of the Model
2. Testing economic hypotheses involving more than one parameter in the model
3. Misspecification Tests
4. Testing for Heteroskedasticity
5. Testing for Serial correlation
Note: We adopt assumptions MR1-MR6 (i.e. including normality). If the errors are not normal, then
the results presented will hold approximately if the sample is large.
A Familiar Form of the F-test:
From ECON1320 we saw that we could express F as:
F = [ SSR/(K − 1) ] / [ SSE/(N − K) ] = [ (SST − SSE)/(K − 1) ] / [ SSE/(N − K) ]
However, this is just a particular example of a more general F-statistic that can be used to test
sets of joint hypotheses.
The general F-test:
A joint null hypothesis is a null hypothesis with two or more restrictions on two or more parameters.
Under MR1 to MR6, we can test a joint null hypothesis concerning the parameters using the F statistic:

F = [ (SSE_R − SSE_U)/J ] / [ SSE_U/(N − K) ] ~ F(J, N − K)
Where:
J = the number of restrictions in H0
SSEU = The unrestricted sum of squared errors from the original, unrestricted multiple regression
Model.
SSER = The restricted sum of squared errors from a regression model in which the null hypothesis
is assumed to be true
Note: Even if MR6 doesn’t hold, the test is still valid provided the sample size is large (by the central
limit theorem)
The General F-test can be used to test 3 types of hypotheses:
1. When used to test H0: βk = 0 against H1: βk ≠ 0, the F-test is equivalent to a t-test
(J = 1)
2. When used to test H0: β2 = β3 = … = βK = 0 against H1: at least one βk ≠ 0
(J = K − 1)
3. The F-test can also be used to test whether some combination of parameters is
collectively significant to the model (1 ≤ J < K)
Restrictions:
When we have a restriction, we assume that the null hypothesis is true; for example, if the null hypothesis is
βk = 0, then we set βk to 0 in the regression equation. Instead of using
the least squares estimates that minimise the sum of squared errors, we find estimates that minimise the sum
of squared errors subject to parameter constraints – restrictions. This means that the sum of squared errors
will increase; a constrained minimum is larger than an unconstrained minimum.
The theory behind the F-test, is that if the Errors are significantly different, then the assumption that the
parameter is the value assumed in the null hypothesis has significantly reduced the ability of the model to fit
the data, and thus the data do not support the null hypothesis. On the other hand, if the null hypothesis is
true, we expect that the data are compatible with the conditions placed on the parameters – we would expect
little change in the sum of squared errors when the null hypothesis is true.
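A minimal sketch of the mechanics, assuming only the restricted and unrestricted SSEs are available (the numbers come from the single-restriction example that follows):

```python
# General F-test from restricted and unrestricted sums of squared errors.
from scipy import stats

SSE_R, SSE_U = 2961.827, 1718.943
J, N, K = 1, 75, 3           # restrictions, sample size, coefficients

F_calc = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))   # about 52.06
F_crit = stats.f.ppf(0.95, J, N - K)                  # about 3.97
print(F_calc, F_crit, F_calc > F_crit)                # True -> reject H0
```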
1. Testing with 1 restriction (J=1)
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
$$\hat{S} = 118.9136 - 7.907854\,PRICE + 1.862583\,ADVERT$$
$$(se)\qquad (6.352)\qquad\quad (1.096)\qquad\qquad\ (0.683)$$
Solution:
Step 1: State Hypotheses & apply restriction
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$
Now, impose the restriction assuming the null is correct, i.e. price is not significant and β2 = 0, and then estimate the restricted regression equation.
$$\hat{S} = 74.180 + 1.733\,ADVERT$$
$$(se)\quad\ (1.80)\qquad (0.890)$$
Step 2: Decision Rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate Test Statistic
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(2961.827 - 1718.943)/1}{1718.943/(75-3)} = 52.06$$
Step 4: Compare and Decision
52.06 > 3.97 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has no effect on revenue; i.e. we can conclude at the 5% level of significance that price has an effect on revenue.
The t-test and F-test - a relationship:
When conducting a two-tail test for a single parameter, either a t-test or an F-test can be used and the
outcomes will be identical.
In fact, the square of a t random variable with df degrees of freedom is an F random variable with
distribution F(1,df)
F-statistic = (t-statistic)2
F-crit = (t-crit)2
52.06 = (-7.215)2
3.97 = (1.993)2
(Here $F_{crit} = F_{(1-\alpha,\,J,\,N-K)} = F_{(0.95,\,1,\,72)} = 3.97$, and we reject $H_0$ if $F_{calc} > F_{crit}$.)
2. Testing with J = K-1 restrictions: the overall significance of the model
An important application of the F-test is for what is called “Testing the overall significance of a model”.
Consider the general multiple regression model with (K - 1) explanatory variables and K unknown
coefficients.
Unrestricted model: $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i$
To examine whether we have a viable explanatory model, we set up the following null and alternative
hypotheses.
Restricted model: $y_i = \beta_1 + e_i$
Therefore $SSE_R = SST_U$ (the restricted model explains none of the variation, so its SSE equals the unrestricted model's total sum of squares), while $SSE_U$ is the usual unrestricted sum of squared errors.
Step 1: State Hypotheses and calculate restricted model
$$H_0: \beta_2 = 0,\ \beta_3 = 0,\ \dots,\ \beta_K = 0$$
$$H_1: \text{at least one of the } \beta_k \text{ is nonzero}$$
Estimate restricted model:
$$\hat{S} = 77.375$$
$$(se)\ \ (0.749)$$
$$SSE_R = 3115.482\ (= SST_U)$$
Step 2: Decision rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate test statistic
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(3115.482 - 1718.943)/2}{1718.943/(75-3)} = 29.248$$
Step 4: Compare and decision
29.248 > 3.12 Therefore reject H0.
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that at least one of the explanatory variables has an effect on sales.
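A hedged sketch of the overall-significance test in Python; the data here are simulated stand-ins for the sales example, so the numbers will not match the notes exactly:

```python
# Overall-significance F test, both via statsmodels and by hand.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
N = 75
price = rng.uniform(4, 7, N)
advert = rng.uniform(0.5, 3, N)
sales = 119 - 7.9 * price + 1.86 * advert + rng.normal(0, 4.9, N)  # simulated

X = sm.add_constant(np.column_stack([price, advert]))
res = sm.OLS(sales, X).fit()
print(res.fvalue, res.f_pvalue)        # statsmodels reports the test directly

# Equivalently, from the restricted (intercept-only) and unrestricted SSEs:
SSE_U = res.ssr
SSE_R = np.sum((sales - sales.mean()) ** 2)   # = SST of the unrestricted model
J, K = 2, 3
F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))
print(F, stats.f.ppf(0.95, J, N - K))
```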
3. Testing a Group of parameters (1 ≤ J < K)
Consider the model:
(Note: the null has K − 1 hypotheses, so it is referred to as a joint hypothesis.)
(Here $F_{crit} = F_{(1-\alpha,\,J,\,N-K)} = F_{(0.95,\,3-1,\,75-3)} = 3.12$, and we reject $H_0$ if $F_{calc} > F_{crit}$.)
Does advertising have an effect on sales?
Step 1: State Hypotheses
$$H_0: \beta_3 = 0,\ \beta_4 = 0$$
$$H_1: \beta_3 \neq 0 \text{ or } \beta_4 \neq 0 \text{ or both are nonzero}$$
Step 2: Decision rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate test statistic
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(1896.391 - 1532.084)/2}{1532.084/(75-4)} = 8.44$$
Step 4: Compare and decision
8.44 > 3.126, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that advertising has a
statistically significant effect on sales.
(Here $F_{crit} = F_{(1-\alpha,\,J,\,N-K)} = F_{(0.95,\,2,\,75-4)} = 3.126$, and we reject $H_0$ if $F_{calc} > F_{crit}$.)
Prediction:
We often want to predict the value of y, say $y_0$, when the explanatory variables take particular values $x_0$.
The prediction error (or forecast error) is $f = y_0 - \hat{y}_0$. The prediction error is a random variable with a mean and a variance. If assumptions MR1 to MR5 hold then

$$E(f) = E(y_0 - \hat{y}_0) = 0 \qquad \text{and} \qquad \mathrm{var}(f) = \mathrm{var}(y_0 - \hat{y}_0),$$

an expression with many terms, each involving $\sigma^2$. The prediction error variance is estimated by replacing $\sigma^2$ with $\hat{\sigma}^2$. The square root of the estimated forecast error variance is still called the standard error of the forecast. If assumption MR6 (normality) is correct, or the sample size is large, then a 100(1−α)% confidence interval or prediction interval for $y_0$ is:

$$\hat{y}_0 - t_c\,se(f) \le y_0 \le \hat{y}_0 + t_c\,se(f), \qquad t_c = t_{(1-\alpha/2,\,N-K)}$$
Example: Construct a 95% confidence interval for the prediction of y0 when P = 5.50 and A = 1200
Solution:
$$t_c = t_{(1-0.05/2,\,75-3)} = 1.993$$

With advertising measured in thousands of dollars, A = 1200 enters the equation as 1.2, so the point prediction is

$$\hat{y}_0 = 118.91 - 7.91(5.50) + 1.863(1.2) = 77.66$$

and the forecast variance is estimated as

$$\widehat{\mathrm{var}}(f) = \widehat{\mathrm{var}}(b_1^*) + \hat{\sigma}^2$$

where $b_1^*$ is the intercept of the model re-estimated in terms of the re-centred variables below.
Therefore create two new variables: P* = (P − 5.50) and A* = (A − 1200). Re-estimating the model with P* and A* makes the intercept $b_1^*$ equal to $\hat{y}_0$, with estimated variance $\widehat{\mathrm{var}}(b_1^*)$.
$$\hat{y}_0 \pm t_c\,se(f) = 77.66 \pm 1.993(4.9429)$$
$$67.809 \le y_0 \le 87.5112$$
We can therefore say with 95% confidence that, when the price is $5.50 and advertising expenditure is $1200, the true value of sales lies between 67.8 thousand and 87.5 thousand dollars.
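A small sketch of the interval computation, using the reported point forecast and forecast standard error (advertising in $1000s):

```python
# 95% prediction interval from the reported forecast and its standard error.
from scipy import stats

y0_hat = 118.91 - 7.91 * 5.50 + 1.863 * 1.2   # about 77.66
se_f = 4.9429                                  # standard error of forecast
t_c = stats.t.ppf(0.975, 75 - 3)               # about 1.993

print(y0_hat - t_c * se_f, y0_hat + t_c * se_f)   # roughly 67.81 to 87.51
```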
A reminder:
Estimated regression models describe the relationship between the economic variables for values similar to
those found in the sample data. Extrapolating the results to extreme values is generally not a good idea.
Predicting the value of dependent variables for values of the explanatory variables far from the sample
values invites disaster.
Goodness of Fit:
If the regression model contains an intercept, we can still decompose the variation in the dependent variable (SST) into its explainable and unexplainable components (SSR and SSE). Then the coefficient of determination still measures the proportion of the variation in the dependent variable that is explained by the regression model:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
The interpretation of R2 is identical to its interpretation in the simple regression model: R² × 100% of the variation in the dependent variable is explained by the estimated equation (R² = 1 implies a perfect fit).
Adjusted R2:
A problem with R2
is that it can be made large by adding more and more variables to the model, even when
they have no economic justification. The adjusted R-squared imposes a penalty for adding more variables:
$$\bar{R}^2 = 1 - \frac{SSE/(N-K)}{SST/(N-1)}$$
Adjusted R-squared does not give the proportion of variation in the dependent variable that is explained by
the model. It should not be used as a criterion for adding or deleting variables (if we add a variable, adjusted
R-Squared will increase if the t-statistic on the new variable is greater than 1 in absolute value!)
SST = (N − 1) × (s.d. of dependent variable)²
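The fit measures above are easy to compute by hand; a sketch using the SSE and SST from the sales example:

```python
# R^2 and adjusted R^2, following the formulas above.
def fit_measures(sse, sst, N, K):
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (N - K)) / (sst / (N - 1))
    return r2, adj_r2

# SSE = 1718.943, SST = 3115.482, N = 75, K = 3
print(fit_measures(1718.943, 3115.482, 75, 3))  # about (0.448, 0.433)
```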
Econometrics: ECON2300 – Lecture 5
Multiple Regression B:
Non-sample information:
In many estimation problems, economic theory and experience provides us with information on the
parameters that is over and above the information contained in the sample data. If this non-sample
information is correct, and if we can combine it with the sample information, then we can estimate the
parameters with greater precision.
Some non-sample information can be written in the form of linear equality restrictions on the unknown
parameters. (e.g. several parameters sum to one). We can incorporate this information into the estimation
process by simply substituting the restrictions into the model.
One example is a firm with constant returns to scale – take the Cobb-Douglas function, whose parameters α and β must sum to 1 under constant returns to scale:
$$y_t = A K_t^{\alpha} L_t^{\beta}$$
We can show that when K and L both increase by the proportion λ, y also increases by the proportion λ under constant returns to scale:
$$A(\lambda K_t)^{\alpha}(\lambda L_t)^{\beta} = \lambda^{\alpha+\beta} A K_t^{\alpha} L_t^{\beta} = \lambda\, y_t \quad \text{when } \alpha + \beta = 1$$
In order to incorporate the non-sample information, and impose constant returns to scale we should then
estimate the following model:
$$y_t = A K_t^{\alpha} L_t^{1-\alpha}$$
The model is now a function of a single unknown parameter, α.
A technique to obtain an estimate of α in this case is known as restricted least squares - we “force” β = 1 – α
To estimate the above model in practice we can use the least squares method, as the model is linear in its parameters once converted to a log-log form:
$$\ln(y_t) = \ln(A) + \alpha \ln(K_t) + (1-\alpha)\ln(L_t) + e_t$$
To ensure the restriction holds we re-arrange and collect terms:
$$\ln(y_t/L_t) = \ln(A) + \alpha \ln(K_t/L_t) + e_t$$
The restricted Least Squares Estimator:
The least squares estimates we obtain after imposing the restrictions are known as restricted least squares
(RLS) estimates.
The RLS estimator:
 Is biased unless the restrictions are EXACTLY true
 Has a smaller variance than the OLS (ordinary least squares) estimator, whether or not the
restrictions are true
By incorporating the additional information with the data, we usually give up unbiasedness in return for
reduced variances. Evidence on whether the restrictions are true can, of course, be obtained using an F-test
(Wald test).
Model Specification:
There are several key questions you should ask yourself when specifying a model:
Q1. What are the important considerations when choosing a model?
A1. The problem, the economic model
Q2. What are the consequences of choosing the wrong model?
A2. If the wrong model is used, there can be omitted and irrelevant variables in the model
Q3. Are there ways of assessing whether a model is adequate?
A3. Yes you can use model Diagnostics – A test of adequate functional form
In examining these model specifications we will look at the following example:
Omitted variables:
It is possible that a chosen model may have important variables omitted. Our economic principles may have
overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic
theory.
We will consider a sample of married couples where both husbands and wives work. This sample was used
by labour economist Tom Mroz in a classic paper on female labour force participation. The variables from
this sample are in edu_inc.dat.
We are interested in the impact of level of education, both the husband’s education and the wife’s education,
on family income. Summary statistics for the data appear in table 6.2. The estimated relationship is:
We estimate that an additional year of education for the husband will increase annual income by $3132, and
an additional year of education for the wife will increase income by $4523. If we now incorrectly omit
wife’s education from the equation:
FAMINC = the combined income of husband and wife
If we omit a relevant variable, then the least squares estimator will generally be biased, although it will
have lower variance.
Including irrelevant variables does not cause least squares method to be biased – however variance and
therefore standard errors will be greater.
When we omit WEDU it leads us to overstate the effect of an extra year of education for the husband by about $2000. This change in the magnitude of a coefficient is typical of the effect of incorrectly omitting a relevant variable.
To write a general expression for this bias for the case where one explanatory variable is omitted from a model with two explanatory variables, we write the underlying model as:
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$
Omitting x3 from the equation is equivalent to imposing the restriction β3 = 0. It can be viewed as imposing an incorrect constraint on the parameters. This has the implication of a reduced variance, but causes biased coefficient estimators. We can show (in appendix 6B) that the bias of the new estimator b2* of β2 is:
$$\mathrm{bias}(b_2^*) = E(b_2^*) - \beta_2 = \beta_3\,\frac{\widehat{\mathrm{cov}}(x_2, x_3)}{\widehat{\mathrm{var}}(x_2)}$$
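The bias formula is easy to verify by simulation; a sketch (simulated data, not the Mroz sample):

```python
# Omitted-variable bias: dropping x3 shifts the slope on x2 by
# beta3 * cov(x2, x3) / var(x2).
import numpy as np

rng = np.random.default_rng(2)
N, beta1, beta2, beta3 = 100_000, 1.0, 2.0, 3.0
x2 = rng.normal(size=N)
x3 = 0.5 * x2 + rng.normal(size=N)            # cov(x2, x3) = 0.5
y = beta1 + beta2 * x2 + beta3 * x3 + rng.normal(size=N)

b2_star = np.cov(x2, y)[0, 1] / np.var(x2, ddof=1)  # slope of y on x2 alone
print(b2_star)   # about beta2 + beta3 * 0.5 = 3.5, as the formula predicts
```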
We can include further variables for instance, KL6 – the number of children under the age of 6. The larger
the number of young children, the fewer the number of hours likely to be worked and hence a lower family
income would be expected.
$$\widehat{FAMINC} = 7755 + 3211\,HEDU + 4777\,WEDU - 14311\,KL6$$
$$(se)\qquad\quad (11163)\ \ (796)\qquad\ \ (1061)\qquad\ (5004)$$
$$(p\text{-value})\ \ (0.488)\ \ (0.000)\qquad (0.000)\qquad (0.004)$$
Notice that, compared to the original estimated equation, the coefficients for HEDU and WEDU haven't changed considerably.
This outcome occurs because KL6 is not highly correlated with the education variables. From a general
modelling perspective, it means that useful results can still be obtained when a relevant variable is
omitted if that variable is uncorrelated with the included variables and our interest is on the coefficients
of the included variables.
Omission of a relevant variable leads to omitted variable bias. The bias increases with the correlation between the included and omitted relevant variable. (Note: if cov(x2, x3) = 0 or if β3 = 0, the bias will be 0, i.e. b2* will be unbiased.) Here Corr(KL6, HEDU) = 0.105 and Corr(KL6, WEDU) = 0.129.
Irrelevant Variables:
The consequences of omitting relevant variables may lead you to think that a good strategy is to include as many variables as possible in your model. However, this will:
1. Complicate your model
2. Inflate the variances of your estimates
To examine this, we will add two artificially generated variables X5 and X6. These variables were
constructed so that they are correlated with HEDU and WEDU, but are not expected to influence family
income.
$$\widehat{FAMINC} = 7759 + 3340\,HEDU + 5869\,WEDU - 14200\,KL6 + 889\,X_5 - 1067\,X_6$$
$$(se)\qquad\quad (11195)\ (1250)\qquad (2278)\qquad\ (5044)\qquad (2242)\quad (1982)$$
$$(p\text{-value})\ \ (0.488)\ (0.000)\qquad (0.000)\qquad (0.004)\qquad (0.692)\quad (0.591)$$
The first thing that we notice is that the p-values for the two new coefficients are much greater than 0.05.
They do indeed appear to be irrelevant variables. Also, the standard errors of the coefficients for all other
variables have increased, with p-values increasing correspondingly. The inclusion of these irrelevant
variables has reduced the precision of the estimated coefficients for other variables in the equation.
The result follows because, by the Gauss-Markov theorem, the least squares estimator of the correct model
is the minimum variance linear unbiased estimator.
A Practical Approach:
We should choose a functional form that:
1. Is consistent with what economic theory tells us about the relationship between the variables
2. Is compatible with assumptions MR1 to MR5
3. Is flexible enough to fit the data
In a multiple regression context, this mainly involves:
1. Hypothesis testing
2. Performing residual analysis
3. Assessing forecasting performance
4. Comparing information criteria
5. Using the principle of parsimony
Hypothesis Testing:
The usual t- and F-tests are available for testing simple and joint hypotheses concerning the coefficients.
As usual, failure to reject a null hypothesis can occur because the data are not sufficiently rich to disprove
the hypothesis. If a variable has an insignificant coefficient, it can either be (a) discarded because it is
irrelevant, or (b) retained because there are strong theoretical reasons for including it.
The adequacy of a model can also be tested using a general specification test known as RESET.
Testing for Model Misspecification: RESET
RESET (Regression Specification Error Test) is designed to detect omitted variables and incorrect
functional form.
Intuition:
Hypotheses:
H0: The functional form is correct and no variables are omitted (the extra terms are not statistically significant)
H1: The functional form is incorrect and/or there are omitted variables (the extra terms are statistically significant)
Suppose that we have specified and estimated the regression model:
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$
The predicted or "fitted" values of yi are:
$$\hat{y}_i = b_1 + b_2 x_{i2} + b_3 x_{i3}$$
There are two alternative forms for the test:
Artificial Model 1: $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 \hat{y}_i^2 + e_i$
Artificial Model 2: $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + e_i$
Example: FAMINC model:
Step 1: State hypothesis
H0: γ1 = 0 (and, for Model 2, γ2 = 0)
H1: at least one of the γ coefficients is nonzero
Step 2: Decision Rule
Reject H0 if p-value < α = 0.05
Step 3: Calculate test statistic
Ramsey RESET Test:
If the chosen model and algebraic form are correct, then squared and cubed terms of the “fitted or
predicted” values should not contain any explanatory power.
If we can significantly improve the model by artificially including powers of the predictions of the model,
then the original model must have been inadequate
p-value = 0.0440
Step 4: Compare
0.0440 < 0.05 Therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that there are omitted
variables or the functional form is incorrect.
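A hedged sketch of RESET in Python with simulated data (the true relation is quadratic, so a linear fit should fail the test):

```python
# RESET: augment the fitted model with powers of y-hat and F-test them.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
N = 200
x = rng.uniform(1, 10, N)
y = 5 + 0.8 * x**2 + rng.normal(0, 2, N)        # quadratic truth

res1 = sm.OLS(y, sm.add_constant(x)).fit()      # misspecified linear model
yhat = res1.fittedvalues

X2 = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
res2 = sm.OLS(y, X2).fit()

J = 2
F = ((res1.ssr - res2.ssr) / J) / (res2.ssr / res2.df_resid)
print(F, stats.f.sf(F, J, res2.df_resid))       # small p-value -> reject H0
```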
Selection of Models – Information Criteria
Akaike Information Criterion (AIC):
 Is often used in model selection for non-nested alternatives – smaller values of the AIC are preferred
$$AIC = \ln\!\left(\frac{SSE}{N}\right) + \frac{2K}{N}$$
The Schwarz Criterion (SC):
 Is an alternative to the AIC that imposes a larger penalty for additional coefficients
$$SC = \ln\!\left(\frac{SSE}{N}\right) + \frac{K\ln(N)}{N}$$
Adjusted R²:
 Penalizes the addition of regressors which do not contribute to the explanatory power of the model. It is sometimes used to select regressors, although the AIC and SC are superior. It does not have the interpretation of R².
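A small sketch of the two criteria as defined above (note that statsmodels' .aic and .bic are log-likelihood versions that differ by constants, so compute these directly):

```python
# AIC and SC as defined in the notes; smaller values are preferred.
import numpy as np

def aic(sse, N, K):
    return np.log(sse / N) + 2 * K / N

def sc(sse, N, K):
    return np.log(sse / N) + K * np.log(N) / N

print(aic(1718.943, 75, 3), sc(1718.943, 75, 3))
```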
Collinear Economic Variables:
When data are the result of an uncontrolled experiment many of the economic variables may move together
in systematic ways.
Such variables are said to be collinear, and the problem is labelled collinearity, or multicollinearity when
several variables are involved.
Co-linearity: Moving together in a linear way
When there is collinearity, there is no guarantee that the data will be "rich in information", nor that it will be possible to isolate the economic relationships or parameters of interest.
Consequences of collinearity:
1. One or more exact linear relationships among the explanatory variables: exact collinearity, or exact multicollinearity. The least squares estimator is not defined.
Multicollinearity calculation:

$$b = (X^{T}X)^{-1}X^{T}y$$

From linear algebra, we know that a matrix whose rows and columns are not linearly independent does not have an inverse; under exact collinearity $X^{T}X$ has no inverse, so b cannot be calculated.
2. Nearly exact linear dependencies among the explanatory variables: some of the variances, standard
errors and covariances of the least squares estimators may be large.
$$\mathrm{var}(b_2) = \frac{\sigma^2}{(1 - r_{23}^2)\sum_i (x_{i2} - \bar{x}_2)^2}$$

For perfect collinearity: $r_{23} = -1$ or $1$, therefore $(1 - r_{23}^2) = 0$ and the variance is undefined.
3. Large standard errors make the usual t-values small and lead to the conclusion that parameter
estimates are not significantly different from 0, ALTHOUGH high R2 or F-values indicate
“significant” explanatory power of the model as a whole.
$$t_{calc} = \frac{b_i}{se(b_i)} = \text{a small value}$$

In general we reject $H_0: \beta_i = 0$ only if $|t_{calc}| > |t_{crit}|$; with inflated standard errors $|t_{calc}|$ tends to be small, so we fail to reject and conclude that $\beta_i$ is 0 even when the variable matters.
4. Estimates may be very sensitive to the addition or deletion of a few observations, or the deletion of
an apparently insignificant variable.
5. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate
forecasts may still be possible.
For near-perfect collinearity: $r_{23} \approx -1$ or $1$, therefore $(1 - r_{23}^2) \approx 0$ and the variances blow up.
Example – Chinese Coal Production
We can detect multicollinearity by:
 Computing sample correlation coefficients between variables. A common rule of thumb is that multicollinearity is a problem if the sample correlation between any pair of variables is greater than 0.8 or 0.9. (This only looks at pairs.)
 Estimating auxiliary regressions, i.e. regressing each explanatory variable on all the others. Multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater than about 0.8. (This looks at combinations of variables, e.g. x2 = 2x3 + 5x4.)
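A hedged sketch of both detection devices on simulated, deliberately collinear data:

```python
# Detecting multicollinearity: pairwise correlations and auxiliary R^2s.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
N = 100
x2 = rng.normal(size=N)
x3 = 2 * x2 + rng.normal(scale=0.1, size=N)   # nearly collinear with x2
X = np.column_stack([x2, x3])

print(np.corrcoef(X, rowvar=False))            # look for |r| > 0.8-0.9

for j in range(X.shape[1]):                    # auxiliary regressions
    others = np.delete(X, j, axis=1)
    r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    print(j, r2)                               # a problem if R^2 > ~0.8
```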
Pair-wise Correlations:
Conclusion:
The pairwise correlation between some of the inputs is extremely high, such as between ln(x2) and ln(x3).
Auxiliary regression on ln(x3):
Solution:
A possible solution in this case is to use non-sample information:
1. Constant returns to scale
2. Variables 4,5 & 6 all are statistically insignificant (=0)
Conduct a Wald Test:
$$H_0:\ \sum_{i=2}^{7}\beta_i = 1,\quad \beta_4 = 0,\quad \beta_5 = 0,\quad \beta_6 = 0$$
Mitigating the Effects of Multicollinearity:
The collinearity problem occurs because the data do not contain enough information about the effects of the
individual explanatory variables. We can include more information into the estimation process by:
 Obtaining more, and better data – not always possible in non-experimental contexts
 Introducing non-sample information into the estimation process in the form of restrictions on the
parameters.
Nonlinear Relationships:
Relationships between economic variables cannot always be adequately represented by straight lines. We
saw in Week 4 that we can add more flexibility to a regression model by considering logarithmic, reciprocal,
polynomial and various other nonlinear-in-the-variables functional forms.
 Linear in parameters, non-linear in variables
We can also use these types of functional forms in multiple regression models. In multiple regression
models, we also use models that involve interaction terms. When using these types of models some changes
in model interpretation are required.
Introductory Econometrics: ECON2300 – Dummy Variable Models
The Use of Dummy Variables in Econometric Models:
Assumption MR1 in the multiple regression model is:
$$y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + e_i \quad \text{for } i = 1, \dots, N$$
1. The statistical model we assume is appropriate for all N observations in our sample
2. The parameters of the model, βk, are the same for each and every observation
3. If this assumption does not hold, and if the parameters are not the same for all the observations, then
the meaning of the least squares estimates of the parameters is not clear
There are some economic problems or questions where we might expect the parameters to be different for
different observations:
1. Everything else the same, is there a difference between male and female earnings?
2. Does studying econometrics make a difference in starting salaries of graduates?
3. Does having a pool make a difference in a house’s sale price in the Brisbane market?
4. Is there a difference in the demand for illicit drugs across race groups?
Dummy variables:
1. The simplest procedures for extending the multiple regression model to situations in which the
regression parameters are different for some or all of the observations in a sample
2. Dummy variables are explanatory variables that take only two values, usually 0 and 1
3. These simple variables are a very powerful tool for capturing qualitative characteristics of
individuals, such as gender, race and geographic region of residence.
There are two main types of dummy variables:
1. Intercept Dummy Variables: parameter (coefficients) denoted - δ
2. Slope Dummy variables: parameter (coefficients) denoted – γ
Intercept Dummy Variables:
Intercept dummy variables allow the intercept to change for a subset of observations in the sample. Models
with intercept dummy variables take the form:
$$y_i = \beta_1 + \delta D_i + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + e_i$$
where Di = 1 if the i-th observation has a certain characteristic and Di = 0 otherwise:
$$E(y_i) = \begin{cases} (\beta_1 + \delta) + \beta_2 x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 1 \quad (\text{intercept: } \beta_1 + \delta) \\ \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 0 \quad (\text{intercept: } \beta_1) \end{cases}$$
Note that the least squares estimator properties are not affected by the fact that one of the explanatory
variables consists only of zeros and ones – D is treated as any other explanatory variable. We can construct
an interval estimate for δ, or we can test the significance of its least squares estimate. Such a test is a
statistical test of whether the effect is “statistically significant”. If δ = 0, the variable has no effect on the
variable in question.
Example: House prices
A model that allows the intercept to vary with the presence or absence of a particular characteristic
Estimated equation:
$$\widehat{Price} = 29.68 + 5.69\,Pool + 8.60\,Sqft$$
In this model the value of Pool = 0 defines the reference group (homes with no pool). Two equivalent models would be:
Log-Linear Models:

$$\ln(PRICE) = \beta_1 + \delta\,POOL + \beta_2\,SQFT + e$$

If the house has a pool: $\ln(PRICE_{pool}) = \beta_1 + \delta + \beta_2\,SQFT + e$
If it does not: $\ln(PRICE_{nopool}) = \beta_1 + \beta_2\,SQFT + e$

Then:

$$\delta = \ln(PRICE_{pool}) - \ln(PRICE_{nopool}) = \ln\!\left(\frac{PRICE_{pool}}{PRICE_{nopool}}\right)$$

$$e^{\delta} = \frac{PRICE_{pool}}{PRICE_{nopool}}$$

And:

$$e^{\delta} - 1 = \frac{PRICE_{pool} - PRICE_{nopool}}{PRICE_{nopool}}$$
Thus, houses with pools are 100(e^δ − 1)% more expensive than houses without pools, all other things being equal.
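For instance (illustrative value of δ, not an estimate from the notes):

```python
# Exact percentage interpretation of a dummy coefficient in a log model.
import numpy as np

delta = 0.10
print(100 * (np.exp(delta) - 1))   # about 10.5%, not exactly 10%
```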
Slope Dummy variables:
Slope dummy variables allow the slope to change for a subset of observations in the sample. A model that
allows β2 to vary across observations takes the form:
$$y_i = \beta_1 + \beta_2 x_{i2} + \gamma D_i x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i$$
$$E(y_i) = \begin{cases} \beta_1 + (\beta_2 + \gamma) x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 1 \quad (\text{slope of } x_2\text{: } \beta_2 + \gamma) \\ \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 0 \quad (\text{slope of } x_2\text{: } \beta_2) \end{cases}$$
Slope and Intercept Dummy Variables Combined:
Testing for Qualitative Effects:
Dummy variables are frequently used to measure:
1. Interactions between qualitative factors (e.g. race and gender)
2. The effects of qualitative factors having more than two categories (eg. level of schooling)
Example: WAGES
Explaining wages as a function of individual characteristics using white males as the reference group:
$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + e$$

(The interaction coefficient γ operates only when the worker is both black and female.)
To test the null hypothesis that neither race nor gender affect wages at the 1% Level of significance:
Now: Explaining wages as a function of location using workers in the northeast as the reference
group:
$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 SOUTH + \delta_2 MIDWEST + \delta_3 WEST + e$$
(The regional dummy coefficients are not significant at the 5% level of significance.)
Testing the Equivalence of Two regressions:
By including an intercept dummy variable and an interaction term for every variable in a regression model,
we are allowing every coefficient in the model to differ based on the qualitative factor – we are specifying
two regressions.
A test of the equivalence of the two regressions is a test of the joint null hypothesis that all the dummy
variable coefficients are zero. We can test this null hypothesis using a standard F-test. This particular F-test
is known as a Chow test.
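A hedged sketch of the mechanics on simulated data; EDUC and SOUTH are illustrative stand-ins for the regressors in the wage example:

```python
# Chow test: F-test of the joint significance of a dummy and all its
# interactions, from restricted (pooled) and unrestricted SSEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
N = 500
educ = rng.uniform(8, 18, N)
south = rng.integers(0, 2, N)
wage = 4 + 1.1 * educ - 1.0 * south + rng.normal(0, 3, N)   # simulated

X_r = sm.add_constant(educ)                                  # pooled model
X_u = sm.add_constant(np.column_stack([educ, south, south * educ]))
res_r = sm.OLS(wage, X_r).fit()
res_u = sm.OLS(wage, X_u).fit()

J = 2                                        # dummy plus one interaction
F = ((res_r.ssr - res_u.ssr) / J) / (res_u.ssr / res_u.df_resid)
print(F)
```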
Explaining wage as a function of individual characteristics:
$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + e$$
To test if there are differences between the wage regressions for the south and the rest of the country we
estimate the model:
The two regression equations are:
If south = 1
If south = 0
A Chow test at the 10% level of significance:
Controlling For Time:
Dummy variables are frequently used to control for:
 Seasonal effects
 Annual effects
 Regime effects (government)
Example: Emergency room cases
Data on number of emergency room cases per day is available in the file fullmoon.wk1. The model:
Example – Stockton House prices
Example – Investment tax credits
ECONOMETRICS: ECON2300 – Lecture 7
Heteroskedasticity
If we were to guess food expenditure for a low-income household and food expenditure for a high-
income household, we would be more accurate for the low-income household, as they have less choice
and only have a limited income which they MUST spend on food. Alternatively a high-income
household could have extravagant or simple food taste – a large variance at high income levels:
resulting in heteroskedasticity.
How can we model this phenomenon?
Note that assumption MR3 says that the errors have equal variance, or equal (homo) spread
(skedasticity). An alternative and much more general assumption is:
$$\mathrm{var}(e_i) = \sigma_i^2$$
Heteroskedasticity is often encountered in cross-section studies, where different individuals may have
very different characteristics. It is less common in time-series studies.
Properties of the OLS Estimator:
If the errors are heteroskedastic then:
 OLS is still a linear and unbiased estimator. But it is inefficient in that it is no longer BLUE
– Best linear unbiased estimator
 The variances of the OLS estimators are no longer given by the formulas we discussed in
earlier lectures. Thus, confidence intervals and hypothesis tests based on these variances
are no longer valid.
There are three alternative courses of action to deal with heteroskedasticity:
1. If in doubt, use least squares for the parameters and a standard-error formula that works either way (White robust standard errors).
2. If heteroskedasticity is known to be present, use Generalised Least Squares (Weighted Least Squares) – BLUE if the variance is known.
3. Test for heteroskedasticity (Goldfeld-Quandt test, White's general test, or Breusch-Pagan test):
a. If present, use Feasible Generalised Least Squares (if the variance is unknown and must be estimated)
b. If no evidence, use least squares as it is BLUE
White’s Approximate Estimator for the Variances of the Least Sqaures Estimator under
Heteroskedasticity:
White's estimator:
a) Is strictly appropriate only in large samples
b) If errors are homoskedastic, it converges to the least squares formula
The variances of the OLS estimators depend on $\sigma_i^2$ rather than $\sigma^2$. In the case of the simple linear model:
$$y_i = \beta_1 + \beta_2 x_i + e_i$$
The variance of b2 is given by:

$$\mathrm{var}(b_2) = \frac{\sum_{i=1}^{N}(x_i-\bar{x})^2\,\sigma_i^2}{\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]^2} = \sum_{i=1}^{N} w_i^2\,\sigma_i^2, \qquad w_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
If we replace $\sigma_i^2$ with $\hat{e}_i^2$ we obtain White's heteroskedasticity-consistent estimator.
White’s Robust Standard errors give the same coefficients but a reduced standard error.
What would happen if we always compute the standard errors (and therefore t-ratios) using White’s
formula instead of the traditional Least Squares?
This is known as Heteroskedasticity-Robust Inference, and it is used by many applied economists.
Robust estimation is a “branch” of econometrics.
When the true variance is homoskedastic and the sample is large, White's formula converges approximately to the usual least squares formula based on $\hat{\sigma}^2 = SSE/N$.
The Generalised Least Squares (Weighted Least Squares):
1. Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator
2. One way of overcoming this dilemma is to change or transform our statistical model into one
with homoskedastic errors and then use Least squares
3. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model.
If $\sigma_i^2$ is known then we can weight the original data (including the constant term) and then perform OLS on the transformed model. The transformed model is:

$$\frac{y_i}{\sigma_i} = \beta_1\frac{1}{\sigma_i} + \beta_2\frac{x_{i2}}{\sigma_i} + \dots + \beta_K\frac{x_{iK}}{\sigma_i} + \frac{e_i}{\sigma_i}$$

or

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + \dots + \beta_K x_{iK}^* + e_i^*$$
The transformed model satisfies all the assumptions of the multiple regression model (including
homoskedasticity). Thus, applying OLS to the transformed model yields best linear unbiased
estimates. The estimator is known as Generalised Least Squares (GLS) or Weighted Least Squares
(WLS).
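A sketch of the transformation for the case var(e_i) = σ²x_i², either by hand or via statsmodels' WLS (simulated data):

```python
# GLS/WLS: divide every variable, including the constant, by x_i.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
N = 200
x = rng.uniform(1, 10, N)
y = 80 + 10 * x + rng.normal(size=N) * x        # sigma_i proportional to x_i

y_star = y / x
X_star = np.column_stack([1 / x, np.ones(N)])   # transformed constant, slope
print(sm.OLS(y_star, X_star).fit().params)      # estimates of beta1, beta2

# Equivalently, WLS with weights 1 / sigma_i^2 = 1 / x_i^2:
print(sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit().params)
```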
Sometimes $\sigma_i^2$ is only known up to a factor of proportionality. In this case, we can still transform the original model in such a way that the transformed errors are homoskedastic. Some popular heteroskedastic specifications:
$$\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_{ij}^2 \;\Rightarrow\; \text{divide by } x_{ij}$$
$$\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_{ij} \;\Rightarrow\; \text{divide by } \sqrt{x_{ij}}$$
If our assumptions about the form of heteroskedasticity are incorrect then GLS will yield biased
estimates.
For $\sigma_i^2 = \sigma^2 x_i^2$, divide by $x_i$:
$$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\frac{e_i}{x_i}\right) = \frac{1}{x_i^2}\,\mathrm{var}(e_i) = \frac{1}{x_i^2}\,\sigma^2 x_i^2 = \sigma^2$$
For $\sigma_i^2 = \sigma^2 x_i$, divide by $\sqrt{x_i}$:
$$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\frac{e_i}{\sqrt{x_i}}\right) = \frac{1}{x_i}\,\mathrm{var}(e_i) = \frac{1}{x_i}\,\sigma^2 x_i = \sigma^2$$
Feasible Generalised Least Squares:
If we reject the null hypotheses of homoskedasticity, we might wish to use an estimation technique for
the coefficients and the standard errors that accounts for heteroskedasticity.
We have already shown that if we “weight” the original data by some appropriate value we can
achieve a transformed model with homoskedastic errors that can be estimated by Ordinary Least
Squares (OLS).
We also note that the task of finding an appropriate weight in a multiple regression model is more complicated, as several variables are potentially an option.
Feasible Generalised Least Squares is based on the idea that we should use all the information
available, therefore, we will construct a suitable weight that is a function of all the explanatory
variables in the original model.
If $\sigma_i^2$ is unknown then it must be estimated. The resulting estimator is known as Feasible Generalised Least Squares (FGLS). A popular specification:

$$\sigma_i^2 = \exp(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS})$$
In this case, we estimate the model:
$$\ln(\hat{e}_i^2) = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} + v_i$$
And then use the variance estimator:
$$\hat{\sigma}_i^2 = \exp(\hat{\alpha}_1 + \hat{\alpha}_2 z_{i2} + \dots + \hat{\alpha}_S z_{iS})$$
The aim is to produce a prediction $\hat{\sigma}_i^2$ based on this model, and then use it to weight the original model.
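A hedged sketch of the full FGLS recipe on simulated data with the exponential variance function above:

```python
# Feasible GLS: OLS residuals -> variance model -> weighted least squares.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
N = 300
z = rng.uniform(1, 5, N)
sigma2 = np.exp(0.5 + 0.6 * z)                        # true variance function
y = 2 + 3 * z + rng.normal(size=N) * np.sqrt(sigma2)

X = sm.add_constant(z)
e_hat = sm.OLS(y, X).fit().resid                      # step 1: OLS residuals
aux = sm.OLS(np.log(e_hat**2), X).fit()               # step 2: ln(e^2) on z's
sigma2_hat = np.exp(aux.fittedvalues)                 # step 3: predicted variances
fgls = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()   # step 4: weighted LS
print(fgls.params, fgls.bse)
```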
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models

More Related Content

Similar to Econometrics Lecture 1: Intro to Models

Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataArthur Charpentier
 
Statistics final seminar
Statistics final seminarStatistics final seminar
Statistics final seminarTejas Jagtap
 
Advanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxAdvanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxakashayosha
 
Statistics
StatisticsStatistics
Statisticspikuoec
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regressionjamuga gitulho
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi
 
Statistics for Managers notes.pdf
Statistics for Managers notes.pdfStatistics for Managers notes.pdf
Statistics for Managers notes.pdfVelujv
 
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...pace130557
 
Module-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data scienceModule-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data sciencepujashri1975
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfkobra22
 
Statics for management
Statics for managementStatics for management
Statics for managementparth06
 
Modeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesModeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesFellowBuddy.com
 
Master of Computer Application (MCA) – Semester 4 MC0079
Master of Computer Application (MCA) – Semester 4  MC0079Master of Computer Application (MCA) – Semester 4  MC0079
Master of Computer Application (MCA) – Semester 4 MC0079Aravind NC
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 

Similar to Econometrics Lecture 1: Intro to Models (20)

Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big data
 
Statistics final seminar
Statistics final seminarStatistics final seminar
Statistics final seminar
 
Advanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxAdvanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptx
 
Statistics
StatisticsStatistics
Statistics
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regression
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
Statistics for Managers notes.pdf
Statistics for Managers notes.pdfStatistics for Managers notes.pdf
Statistics for Managers notes.pdf
 
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
 
Principles of Econometrics
Principles of Econometrics Principles of Econometrics
Principles of Econometrics
 
Module-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data scienceModule-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data science
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
 
Statics for management
Statics for managementStatics for management
Statics for management
 
Modeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesModeling & Simulation Lecture Notes
Modeling & Simulation Lecture Notes
 
Crowdfunding
CrowdfundingCrowdfunding
Crowdfunding
 
MModule 1 ppt.pptx
MModule 1 ppt.pptxMModule 1 ppt.pptx
MModule 1 ppt.pptx
 
Master of Computer Application (MCA) – Semester 4 MC0079
Master of Computer Application (MCA) – Semester 4  MC0079Master of Computer Application (MCA) – Semester 4  MC0079
Master of Computer Application (MCA) – Semester 4 MC0079
 
Notes1
Notes1Notes1
Notes1
 
Data science
Data scienceData science
Data science
 
Econometrics
EconometricsEconometrics
Econometrics
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 

More from MrDampha

Doctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentationDoctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentationMrDampha
 
Philosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptxPhilosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptxMrDampha
 
Lecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptxLecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptxMrDampha
 
PROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. PresentationtPROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. PresentationtMrDampha
 
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptxCAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptxMrDampha
 
AGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentationAGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentationMrDampha
 
Why do we ask questions.pptx
Why do we ask questions.pptxWhy do we ask questions.pptx
Why do we ask questions.pptxMrDampha
 
Scales of measurement.pdf
Scales of measurement.pdfScales of measurement.pdf
Scales of measurement.pdfMrDampha
 
CHAPTER 9 PPT.ppt
CHAPTER 9 PPT.pptCHAPTER 9 PPT.ppt
CHAPTER 9 PPT.pptMrDampha
 
CHAPTER 2 PPT.ppt
CHAPTER 2 PPT.pptCHAPTER 2 PPT.ppt
CHAPTER 2 PPT.pptMrDampha
 
Accounting for Managers.pdf
Accounting for Managers.pdfAccounting for Managers.pdf
Accounting for Managers.pdfMrDampha
 

More from MrDampha (11)

Doctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentationDoctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentation
 
Philosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptxPhilosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptx
 
Lecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptxLecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptx
 
PROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. PresentationtPROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. Presentationt
 
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptxCAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
 
AGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentationAGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentation
 
Why do we ask questions.pptx
Why do we ask questions.pptxWhy do we ask questions.pptx
Why do we ask questions.pptx
 
Scales of measurement.pdf
Scales of measurement.pdfScales of measurement.pdf
Scales of measurement.pdf
 
CHAPTER 9 PPT.ppt
CHAPTER 9 PPT.pptCHAPTER 9 PPT.ppt
CHAPTER 9 PPT.ppt
 
CHAPTER 2 PPT.ppt
CHAPTER 2 PPT.pptCHAPTER 2 PPT.ppt
CHAPTER 2 PPT.ppt
 
Accounting for Managers.pdf
Accounting for Managers.pdfAccounting for Managers.pdf
Accounting for Managers.pdf
 

Recently uploaded

Bladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证jdkhjh
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithAdamYassin2
 
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...First NO1 World Amil baba in Faisalabad
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...amilabibi1
 
The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasCherylouCamus
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppmiss dipika
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...Henry Tapper
 
Financial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and DisadvantagesFinancial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and Disadvantagesjayjaymabutot13
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfshaunmashale756
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...Amil Baba Dawood bangali
 
Quantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector CompaniesQuantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector Companiesprashantbhati354
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证rjrjkk
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一S SDS
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办fqiuho152
 
Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Commonwealth
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Sonam Pathan
 
Current Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxCurrent Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxuzma244191
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
purposes, often by government agencies or industry. The data may be in:

- Time-series form – data collected over discrete intervals of time (a stock market index, the CPI, GDP, interest rates, the annual price of wheat in Australia from 1880 to 2009)
- Cross-sectional form – data collected over sample units in a particular time period (income in suburbs in Brisbane during 2009, or the household census)
- Panel data form – data that follow individual microunits over time (data for 30 countries for the period 1980–2005, the monthly value of 3 stock market indices over the last 5 years)

Data may be collected at various levels of aggregation:

- Micro – data collected on individual economic decision-making units such as individuals, households, or firms
- Macro – data resulting from pooling or aggregating over individuals, households, or firms at the local, state, or national level

Data collected may also represent a flow or a stock:

- Flow – an outcome measured over a period of time, such as the consumption of petrol during the last quarter of 2005
- Stock – an outcome measured at a particular point in time, such as the quantity of crude oil held by BHP in its Australian storage tanks on 1 April 2002, or the asset value of Macquarie Bank on 5 July 2009
Data collected may be quantitative or qualitative:

- Quantitative – numerical data; data that can be expressed as numbers or some transformation of them, such as real prices or per capita income
- Qualitative – outcomes of an "either-or" situation, that is, whether an attribute is present or not, e.g. colour, or whether a consumer purchased a certain good or not (dummy variables)

Statistical Inference:

The aim of statistics is to "infer" or learn something about the real world by analysing a sample of data. The ways in which statistical inference is carried out include:

- Estimating economic parameters, such as elasticities
- Predicting economic outcomes, such as the enrolments in bachelor degree programs in Australia for the next 5 years
- Testing economic hypotheses, such as: Is newspaper advertising better than "email" advertising for increasing sales?

Econometrics includes all of these aspects of statistical inference.

There are two types of inference:
1. Deductive: go from a general case to a specific case – this is used in mathematical proofs
2. Inferential: go from a specific case to a general case – this is used in statistics
Review of Statistical Concepts: Random Variables (Discrete and Continuous)

Random variable: A random variable is a variable whose value is unknown until it is observed; it is not perfectly predictable. The value of the random variable results from an experiment (controlled or uncontrolled). Uppercase letters (e.g. X) are usually used to denote random variables, and lowercase letters (e.g. x) are usually used to denote values of random variables.

Discrete random variable: A discrete random variable can take only a finite number of values, which can be counted using the positive integers.
- E.g. the number of cars you own, your age in whole years, etc.
- Dummy variables: D = 1 if the person is female, D = 0 if the person is not female.

Probability distribution of a discrete random variable: A discrete random variable has a probability density function which summarises all the possible values of the variable together with their associated probabilities. It can be in the form of a table, formula or graph. Two key features of a probability distribution are its centre (location) and width (dispersion): the mean, μ, and the variance, σ², respectively.

For a discrete random variable X:

Mean: μ = E(X) = Σ x·P(X = x)

Variance: σ² = Var(X) = E[(X − μ)²] = Σ (x − μ)²·P(X = x)

Only distinct values of x receive positive probability, which is what makes the variable discrete – the probability density function is NOT continuous.

Discrete probability distributions are:
1. Mutually exclusive – no overlap between values
2. Collectively exhaustive – the full sample space is covered; every possibility is included
Example: A 5-sided die is biased; the sides show 0, 1, 2, 3 and 4 respectively, and the following table gives the probability distribution.

x        0     1     2     3     4
P(X=x)   0.10  0.45  0.30  0.10  0.05

a) Calculate the mean and variance of X
b) Sketch the probability distribution of X
c) Find P(X ≤ 2)

Solution:

a) i) Mean:
μ = E(X) = Σ x·P(X = x)
  = 0(0.10) + 1(0.45) + 2(0.30) + 3(0.10) + 4(0.05)
  = 1.55

ii) Variance:
σ² = Σ (x − μ)²·P(X = x)
   = (0 − 1.55)²(0.10) + (1 − 1.55)²(0.45) + (2 − 1.55)²(0.30) + (3 − 1.55)²(0.10) + (4 − 1.55)²(0.05)
   = 0.9475

b) The sketch is a bar chart of P(X = x) against x = 0, 1, 2, 3, 4, with heights 0.10, 0.45, 0.30, 0.10 and 0.05.

c) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.10 + 0.45 + 0.30 = 0.85
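These calculations are easy to check numerically. A minimal Python sketch (numpy is assumed available; the arrays simply transcribe the table above):

```python
import numpy as np

# Probability distribution of the biased 5-sided die
x = np.array([0, 1, 2, 3, 4])
p = np.array([0.10, 0.45, 0.30, 0.10, 0.05])

mean = np.sum(x * p)                     # E(X) = sum of x * P(X = x)
variance = np.sum((x - mean) ** 2 * p)   # Var(X) = sum of (x - mu)^2 * P(X = x)
prob_le_2 = p[x <= 2].sum()              # P(X <= 2)

print(mean)       # ~1.55
print(variance)   # ~0.9475
print(prob_le_2)  # ~0.85
```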
Continuous random variable: A continuous random variable can take any real value (not just whole numbers, and not only positive values); it is generally something measurable.
- E.g. your height, the temperature, etc.

An easy way to decide is to pick an arbitrary number, e.g. 3.4314135315, and ask whether the variable can take that value. If yes, it is continuous; if no, it is discrete.

Probability distribution of a continuous random variable: A continuous random variable has a probability density function (pdf), a smooth non-negative function representing likely and unlikely values of the random variable. Two key features of a probability distribution are its centre (location) and width (dispersion): the mean, μ, and the variance, σ², respectively.

Let f(x) denote the pdf of a continuous random variable X. Then:

Mean: μ = E(X) = ∫ x f(x) dx  (integrating over the whole real line)

Variance: σ² = Var(X) = E[(X − μ)²] = ∫ (x − μ)² f(x) dx

There are an infinite number of points in any interval of a continuous random variable, so a positive probability cannot be assigned to each point – the area of a line is 0. Therefore, for a continuous random variable, P(X = x) = 0. We can only assign probabilities to a range of values; to put it another way, we can only assign a probability that X will lie within a certain range:

P(x1 ≤ X ≤ x2) = ∫ from x1 to x2 of f(x) dx

Note that it does not matter whether strict or non-strict inequality symbols are used, as the difference is negligible (the probability of a single value is 0).
The Normal Distribution:

The most useful continuous distribution is the normal distribution. The normal distribution has a probability density function (pdf) of:

f(x) = (1/√(2πσ²)) · e^(−(x − μ)²/(2σ²)),  −∞ < x < ∞

Important parameters of the normal distribution:
1. μ = mean: the centre of the distribution
2. σ² = variance: the level of dispersion
3. Properties of the normal distribution:
- Symmetric about the mean
- Bell shaped
- The mean μ, the median and the mode are all the same
- Used to find the probability of a range of values
- The probability of any single value is 0, e.g. P(X = 3) = 0
- There is a different normal distribution for each pair of values of μ and σ
- The area under the probability density function is equal to 1; as the distribution is symmetric, each side has area 0.5
- Probability is measured by the area under the curve – the cumulative distribution function

The Standardised Normal Distribution:
- Variance and standard deviation of 1
- Mean of 0
- Values greater than the mean have positive Z-values
- Values less than the mean have negative Z-values

The most useful property of the normal distribution is that we can "standardise" any normal random variable to the standard normal distribution, for which we have tables to determine probabilities (Z values).
Z = (X − μ)/σ

Example: In a given population, heights of people are normally distributed with a mean of 160 cm and a standard deviation of 10 cm.
a) What is the probability that a person is more than 163.5 cm tall?
b) What proportion of people have heights between 155 cm and 163.5 cm?

Solution:

a)
P(X > 163.5) = P(Z > (163.5 − 160)/10)
             = P(Z > 0.35)
             = 0.5 − 0.1368
             = 0.3632

b)
P(155 < X < 163.5) = P((155 − 160)/10 < Z < (163.5 − 160)/10)
                   = P(−0.5 < Z < 0.35)
                   = 0.1915 + 0.1368
                   = 0.3283
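The table look-ups can be reproduced with scipy's normal CDF. A small sketch, using only the mean and standard deviation given above:

```python
from scipy.stats import norm

mu, sigma = 160, 10

# a) P(X > 163.5)
p_a = 1 - norm.cdf(163.5, loc=mu, scale=sigma)

# b) P(155 < X < 163.5)
p_b = norm.cdf(163.5, loc=mu, scale=sigma) - norm.cdf(155, loc=mu, scale=sigma)

print(round(p_a, 4))  # ~0.3632
print(round(p_b, 4))  # ~0.3283
```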
The Chi-Square Distribution:

Chi-square random variables arise when standard normal random variables are squared. If Z1, Z2, ..., Zm denote m independent N(0,1) random variables, then

V = Z1² + Z2² + ... + Zm² ~ χ²(m)

The notation V ~ χ²(m) is read as: the random variable V has a chi-square distribution with m degrees of freedom. The degrees of freedom parameter m indicates the number of independent N(0,1) random variables that are squared and summed to form V. The value of m determines the entire shape of the chi-square distribution, including its mean and variance:

E(V) = m
var(V) = 2m

The values of V must be non-negative, v ≥ 0, because V is formed by squaring and summing m standardised normal N(0,1) random variables. The distribution has a long tail, or is
skewed to the right (a long tail to the right). As the degrees of freedom m get larger, the distribution becomes more symmetric and "bell-shaped"; eventually the chi-square distribution converges to, and essentially becomes, the normal distribution.

The Student t Distribution:

A t random variable is formed by dividing a standard normal random variable Z ~ N(0,1) by the square root of an independent chi-square random variable V ~ χ²(m) divided by its degrees of freedom:

t = Z/√(V/m) ~ t(m)

The t distribution's shape is completely determined by the degrees of freedom parameter m, and the distribution is symbolised by tm. Note that the t distribution is more spread out than the standard normal distribution and less peaked, with mean and variance:

E(t(m)) = 0
var(t(m)) = m/(m − 2)

As the number of degrees of freedom approaches infinity, the t distribution approaches the standard normal, N(0,1).
The F Distribution:

An F random variable is formed by the ratio of two independent chi-square random variables, each divided by its degrees of freedom. If V1 ~ χ²(m1) and V2 ~ χ²(m2), and if V1 and V2 are independent, then:

F = (V1/m1)/(V2/m2) ~ F(m1, m2)

The F distribution is said to have m1 numerator degrees of freedom and m2 denominator degrees of freedom. The values of m1 and m2 determine the shape of the distribution, which in general is right-skewed; different pairs of degrees of freedom give a range of such shapes.
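The constructions of the chi-square, t and F variables above are easy to verify by simulation. A sketch under arbitrary choices of degrees of freedom, seed and sample size (none of which come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, n = 5, 10, 200_000

z = rng.standard_normal((n, m1))
v1 = (z ** 2).sum(axis=1)            # chi-square with m1 df: sum of m1 squared N(0,1)
print(v1.mean(), v1.var())           # ~ m1 and ~ 2*m1

t = rng.standard_normal(n) / np.sqrt(v1 / m1)   # t with m1 df (numerator independent of v1)
print(t.var())                       # ~ m1/(m1 - 2)

v2 = (rng.standard_normal((n, m2)) ** 2).sum(axis=1)
f = (v1 / m1) / (v2 / m2)            # F(m1, m2): ratio of independent chi-squares over their df
print(f.mean())                      # ~ m2/(m2 - 2) for m2 > 2
```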
Laws of Expectation and Variance:

Where a and b are constants and X and Y are random variables:

E[b] = b                  Var[b] = 0
E[aX] = aE[X]             Var[aX] = a²Var[X]
E[aX + b] = aE[X] + b     Var[aX + b] = a²Var[X]
E[X + Y] = E[X] + E[Y]
Var[X + Y] = Var[X] + Var[Y]   (when X and Y are uncorrelated)

The Error Term:

The error term in a regression model is a random variable. Like other random variables it is characterised by:
a) A mean (or expected value)
b) A variance
c) A distribution (i.e. a probability density function)

We usually assume the random error term of an econometric model to:
a) Have an expected value of zero
b) Have a variance which we will call σ²
The smaller the variance of the error term, the more efficient the model.

Sampling Distributions:

We can usually draw many samples of size n from a population. Each sample can be used to compute a sample statistic (e.g. a sample mean), and these statistics will vary from sample to sample. If we take infinitely many samples of a normally distributed random variable X in the population, the sample statistic X̄ will also be normally distributed. The probability distribution that gives all possible values of a statistic and their associated probabilities is known as a sampling distribution.

If Xi ~ N(μ, σ²), then X̄ ~ N(μ, σ²/N).

If the distribution of X is non-normal but n is large, then X̄ is approximately normally distributed. The approximation is good when n ≥ 30 – this is a consequence of the central limit theorem.

Central Limit Theorem: If Y1, ..., YN are independent and identically distributed random variables with mean μ and variance σ², and Ȳ = ΣYi/N, then

ZN = (Ȳ − μ)/(σ/√N)

has a probability distribution that converges to the standard normal N(0,1) as N → ∞.

Estimators and Estimates:

A point estimator is a rule or formula which tells us how to use a set of sample observations to estimate the value of a parameter of interest. A point estimate is the value obtained after the observations have been substituted into the formula.

Estimator: a formula used to obtain an estimate
Estimate: a particular value for a parameter

Desirable properties of point estimators include:
- Unbiasedness – an estimator θ̂ is an unbiased estimator of the population parameter θ if E(θ̂) = θ
- Efficiency – θ̂1 is more efficient than θ̂2 if var(θ̂1) < var(θ̂2)
- Consistency – the distribution of the estimator becomes more concentrated about the population parameter as the sample size becomes larger; both the bias and the variance approach 0 as n approaches infinity
Examples:
- X̄ = ΣXi/N is the best linear unbiased estimator of μ = E(X)
- σ̂² = Σ(Xi − X̄)²/N is a biased but consistent estimator of σ² = E(X − μ)²
- σ̂² = Σ(Xi − X̄)²/(N − 1) is an unbiased and consistent estimator of σ² = E(X − μ)²

Confidence Intervals:

A confidence interval, or interval estimate, is a range of values which contains information not only about the location of the population mean but also about the precision with which we estimate it. We can generally use the sampling distribution of an estimator to derive a confidence interval for the population parameter. In general, a 100(1 − α)% confidence interval for the population mean is given by:

CI = x̄ ± z(α/2) · σ/√n

where 100(1 − α)% is the level of confidence. Prior to selecting a random sample, the probability that a CI will contain the population parameter is 100(1 − α)%. For example, if we took many samples of size n and calculated the many corresponding random intervals x̄ ± z(α/2)·σ/√n, then 100(1 − α)% of them would contain μ.
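As an illustration of the interval formula, a hedged sketch; the sample values and the "known" population σ are invented for the example:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample with an assumed-known population standard deviation
sample = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8])
sigma = 0.3          # assumed known population std dev (illustrative)
alpha = 0.05

xbar = sample.mean()
z = norm.ppf(1 - alpha / 2)                 # z_{alpha/2} for a 95% interval
half_width = z * sigma / np.sqrt(len(sample))

print(f"95% CI: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")
```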
After we construct a confidence interval, either it does or it does not contain the population parameter, with probability 1 or 0, so we can only say we are 100(1 − α)% confident that a particular confidence interval contains the parameter.

General conclusion: "We can say with 100(1 − α)% confidence that the population parameter is between [lower bound] and [upper bound]."

Hypothesis Testing:

A hypothesis is a statement or claim about the value(s) of one or more population parameters. To test a hypothesis we:
1. Identify a test statistic and find its sampling distribution when the hypothesis is true
2. Reject the hypothesis if the test statistic takes a value that is deemed unlikely

5 steps:
1. State H0 and H1 – H0 must contain an equality (=, ≤, ≥)
2. State a decision rule – "Reject H0 if ..."
3. Calculate the test statistic
4. Compare, and make a decision
5. Write a conclusion

Note:
- One-tail or two-tail tests can be used
- Either critical values or the p-value method can be used
Econometrics: ECON2300 – Lecture 2

An Econometric Model:

For a given set of data, the aim of an econometric model is to fit a regression line and then check how well it fits. In order to investigate the relationship between expenditure and income we must build an economic model, and then a corresponding econometric model that forms the basis for a quantitative or empirical economic analysis. We must express mathematically which variables are dependent and which are independent. (In this case we can say that weekly expenditure depends on income – y depends on x.)

We represent our economic model mathematically by the conditional mean:

E(y|x) = μ(y|x) = β1 + β2x

The conditional mean E(y|x) is called a simple regression function, as there is only one explanatory variable. The unknown regression parameters β1 and β2 are the intercept and slope respectively:

β2 = dE(y|x)/dx
For each value of x there is potentially a range of values of y – in fact, each has a probability distribution. The regression line passes through the mean of the distribution of expenditure at each level of income. The difference between the actual value of y and its expected value is known as the random error term:

e = y − E(y) = y − (β1 + β2x)

If we rearrange:

y = β1 + β2x + e
Assumptions of the Simple Linear Regression (SLR) Model:

1. The population can be represented by: y = β1 + β2x + e

2. The mean value of y, for each value of x, is given by the linear regression function E(y|x) = β1 + β2x.
   Error term: this means that the mean error is zero, E(e) = 0.

3. For each value of x, the values of y are distributed about their mean value, following probability distributions that all have the same variance: var(y|x) = σ².
   Error term: this means that the error terms are homoskedastic (constant variance), var(e) = var(y) = σ². Violation of this assumption is heteroskedasticity.

4. The sample values of y are all uncorrelated and have zero covariance, implying there is no linear association among them: cov(yi, yj) = 0.
   Error term: there is no serial correlation, cov(ei, ej) = 0. Note that this assumption can be made stronger by assuming that the random errors e are all statistically independent, in which case the values of y are also statistically independent.
5. The variable x is not random and must take at least two different values.

6. (optional) The values of y are normally distributed about their mean for each value of x: y ~ N(β1 + β2x, σ²).
   Error term: the values of e are normally distributed about their mean, e ~ N(0, σ²). If the values of y are normally distributed, then so are the errors, and vice versa.

The Error Term:

If the regression parameters β1 and β2 were known, then for any value of y we could calculate:

e = y − E(y) = y − (β1 + β2x)

However, the values of β1 and β2 are never known for certain, and therefore it is impossible to calculate e. The random error e represents all factors affecting y other than x. These factors cause individual observations y to differ from the mean value E(y) = β1 + β2x.

Estimating the Parameters of the Simple Linear Regression:

Our problem is to estimate the location of E(y) = β1 + β2x in the way that best represents our data. We would expect this line to be somewhere in the middle of all the data points since it represents mean, or average, behaviour. To estimate β1 and β2 we could simply draw a line through the middle of the data and then measure the slope and intercept with a ruler. The problem with this method is that different people would draw different lines – in fact there would be an infinite set of possibilities – and it would not be accurate.
The estimated regression line is given by:

ŷi = b1 + b2xi

The Least Squares Principle:

The least squares method involves finding estimators b1 and b2 that provide the smallest sum of squared residuals:

min Σ êi² = min Σ (yi − ŷi)²

b2 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

b1 = ȳ − b2x̄

We usually use a computer to calculate these values, as the process would take too long and be too tedious by hand.

Interpreting the estimates:
- The value of b2 is an estimate of β2, the amount by which y increases per unit increase in x
- The value of b1 is an estimate of β1, what y would be when x = 0
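The least squares formulas translate directly into code. A minimal numpy sketch with placeholder data (the arrays are invented for illustration):

```python
import numpy as np

# Placeholder data: y is some outcome, x an explanatory variable
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([5.1, 7.9, 10.2, 12.8, 16.1, 18.0])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

residuals = y - (b1 + b2 * x)
print(b1, b2)
print(np.sum(residuals ** 2))   # the minimised sum of squared residuals
```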
Because the least squares estimates are generated using sample data, different samples will lead to different values of b1 and b2; therefore b1 and b2 are random variables. In this context we call b1 and b2 the least squares estimators; when actual sample values are substituted, we obtain estimates.

Estimators: formulas for estimates
Estimates: actual values given by the estimators

The Variances and Covariance of b1 and b2:

var(b1) = σ² [ Σxi² / (N Σ(xi − x̄)²) ]

var(b2) = σ² / Σ(xi − x̄)²

cov(b1, b2) = σ² [ −x̄ / Σ(xi − x̄)² ]

The square roots of the estimated variances are known as standard errors.

Summary – the variances and covariance of b1 and b2:
- The larger the variance of the error term, σ², the greater the uncertainty in the statistical model, and the larger the variances and covariance of the least squares estimators.
- The larger the sum of squares Σ(xi − x̄)², the smaller the variances of the least squares estimators and the more precisely we can estimate the unknown parameters.
When the data are bunched together, Σ(xi − x̄)² is smaller and we cannot estimate the line very accurately; when the data are more spread out, Σ(xi − x̄)² is larger and we can estimate the unknown parameters more precisely.

- The larger the sample size N, the smaller the variances and covariance of the least squares estimators.
- The larger the term Σxi², the larger the variance of the least squares estimator b1. The further our data are from x = 0, the more difficult it is to interpret β1.
- The absolute magnitude of the covariance increases the larger in magnitude the sample mean x̄ is, and the covariance has a sign opposite that of x̄.

The Probability Distribution of the Least Squares Estimators:
- If the normality assumption about the error terms is correct, then the least squares estimators are normally distributed.
- If assumptions 1–5 hold, and if the sample size is sufficiently large (n ≥ 30), then by the central limit theorem the least squares estimators have a distribution that approximates the normal distribution.

The Gauss-Markov Theorem:

Under assumptions SR1–SR5 of the linear regression model, the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased Estimators (BLUE) of β1 and β2.

To clarify what the Gauss-Markov theorem does, and does not, say:
1. The estimators b1 and b2 are "best" when compared to similar estimators – those that are linear and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators.
2. They are the "best" within their class because they have the minimum variance. When comparing two linear and unbiased estimators, we always want to use the one with the smallest variance.
3. In order for the Gauss-Markov theorem to hold, assumptions SR1–SR5 must be true. If any of these assumptions is not true, then b1 and b2 are not the best linear unbiased estimators of β1 and β2.
4. The Gauss-Markov theorem does not depend on the assumption of normality.
5. In simple linear regression these are the estimators to use.
6. The theorem applies to the least squares estimators. It does not apply to the least squares estimates from a single sample.
Estimating the Variance of the Error Term:

The variance of the random error ei is:

var(ei) = σ² = E[ei − E(ei)]² = E(ei − 0)² = E(ei²)

assuming that the zero-mean-error assumption, E(ei) = 0, is correct.

The unbiased estimator of the variance is:

σ̂² = Σêi² / (N − 2),  with E(σ̂²) = σ²
Interval Estimation:

Confidence interval:

CI = bk ± tcrit · se(bk)

Where:
- bk = b1 or b2
- tcrit = the critical value t(1 − α/2, N − 2), where N − 2 is the degrees of freedom
- se(bk) is the standard error given by the regression estimation

Before sampling, we can make the probability statement that there is a 100(1 − α)% chance that the interval will contain the real value. After sampling, we can only make a confidence statement – we are 100(1 − α)% confident that the real value lies within the interval.

Example: Construct a 95% confidence interval for β2 for the following equation, estimated from 40 observations (standard errors in parentheses):

ŷ = 83.4 + 10.21x
    (43.4) (2.09)

Solution:

CI = b2 ± tcrit · se(b2)
   = 10.21 ± t(1 − 0.05/2, 40 − 2) × 2.09
   = 10.21 ± t(0.975, 38) × 2.09
   = 10.21 ± 2.024 × 2.09
   = 10.21 ± 4.23

We can say with 95% confidence that the true value of β2 lies within the interval 5.98 to 14.44.
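The same interval can be reproduced programmatically from the reported estimate and standard error. A sketch in which scipy's t.ppf supplies the critical value:

```python
from scipy.stats import t

b2, se_b2, n = 10.21, 2.09, 40
alpha = 0.05

t_crit = t.ppf(1 - alpha / 2, df=n - 2)   # t(0.975, 38) ~ 2.024
lower = b2 - t_crit * se_b2
upper = b2 + t_crit * se_b2

print(f"t_crit = {t_crit:.3f}, CI = ({lower:.2f}, {upper:.2f})")  # ~ (5.98, 14.44)
```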
Hypothesis Testing:

We can conduct a hypothesis test on the slope of the regression line.

Step 1: State the hypotheses:
H0: βk = c,  βk ≤ c,  or  βk ≥ c
H1: βk ≠ c,  βk > c,  or  βk < c

Step 2: Decision rule – Reject H0 if ...
Step 3: Calculate the test statistic
Step 4: Compare and make a decision
Step 5: Conclusion
Example: Using 40 observations on food expenditure (standard errors in parentheses):

ŷ = 83.4 + 10.21x
    (43.4) (2.09)

Test whether β2 is less than or equal to 0 at the 5% level of significance.

Step 1: State the hypotheses
H0: β2 ≤ 0
H1: β2 > 0

Step 2: Decision rule
Reject H0 if tcalc > tcrit. For this one-tail test at the 5% level, tcrit = t(0.95, 38) = 1.686.

Step 3: Calculate the test statistic
tcalc = (b2 − 0)/se(b2) = (10.21 − 0)/2.09 = 4.88

Step 4: Compare and decision
4.88 > 1.686, therefore reject H0.

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that β2 – the increase in expenditure for a one-unit increase in income – is greater than 0.

Types of errors:

                 | H0 true            | H0 false
Reject H0        | Type 1 error = α   | No error
Do not reject H0 | No error           | Type 2 error
Econometrics: ECON2300 – Lecture 3

The Least Squares Predictor:

The linear regression model provides a way to predict y given any value of x. This is extremely important for forecasters, be it in politics, finance or business; accurate predictions provide a basis for better decision making.

Our first SR assumption is that the model is linear: for a given value of the explanatory variable, x0, the value of the dependent variable y0 is given by the econometric model:

y0 = β1 + β2x0 + e0

where e0 is a random error. This random error has:
1. Mean: E(e0) = 0
2. Variance: var(e0) = σ²
3. Covariance: cov(e0, ei) = 0 – it is uncorrelated with the sample errors

The least squares predictor (or estimator) of y0 (given x0) is:

ŷ0 = b1 + b2x0

To evaluate how well this predictor performs, we define the forecast error, which is analogous to the least squares residual:

f = ŷ0 − y0 = (b1 + b2x0) − (β1 + β2x0 + e0) = (b1 − β1) + (b2 − β2)x0 − e0

Now, if we apply assumptions SR1 to SR5:

E(f) = E(ŷ0 − y0) = [E(b1) − β1] + [E(b2) − β2]x0 − E(e0) = 0 + 0 + 0 = 0

since E(b1) = β1, E(b2) = β2 and E(e0) = 0.
var(f) = var(ŷ0 − y0) = σ² [ 1 + 1/N + (x0 − x̄)² / Σ(xi − x̄)² ]

If SR6 holds, or the sample size is large enough, then the prediction error is normally distributed. Note that the further x0 is from the sample mean, the larger the variance of the prediction error.
- This means that the more you extrapolate, the less accurate your predictions will be.

The variance of the forecast error is smaller when:
i) The overall uncertainty in the model is smaller, as measured by the variance of the random errors σ²
ii) The sample size N is larger
iii) The variation in the explanatory variable is larger
iv) The distance of x0 from x̄ is smaller

The forecast error variance is estimated by replacing σ² with its estimator σ̂²:

var̂(f) = σ̂² [ 1 + 1/N + (x0 − x̄)² / Σ(xi − x̄)² ]
       = σ̂² + σ̂²/N + (x0 − x̄)² · [ σ̂² / Σ(xi − x̄)² ]
       = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)

Intuitively, a prediction made at a value of x near the sample mean will be close to the actual value, because the regression is based on many data points around it; at a value of x far from the data, the prediction will be less accurate, i.e. will have a larger variance. We can do a better job of predicting in the region where we have more sample information.

The standard error of the forecast:

se(f) = √var̂(f)

Hence, we can construct a 100(1 − α)% prediction interval for y0:

ŷ0 ± tcrit · se(f)
Example: Calculate a 95% prediction interval for y when x0 = 20:

ŷ0 ± tcrit · se(f)

Step 1: Linear equation

From the regression output we can determine the fitted regression:

ŷ = b1 + b2x = 83.416 + 10.21x
     (43.41)   (2.093)

Therefore, when x0 = 20:

ŷ0 = 83.416 + 10.21(20) = 287.616

Step 2: Determine se(f)

se(f) = √var̂(f)

var̂(f) = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)
       = 89.517² + 89.517²/40 + (20 − 19.605)² × (2.0932)²
       = 8214.34

where σ̂ = 89.517 is the standard error of the regression, N = 40 is the sample size, x̄ = 19.605 is the mean of x, and var̂(b2) = se(b2)² = (2.0932)².
Step 3: Prediction interval

ŷ0 ± tcrit · se(f) = 287.616 ± t(0.975, 38) × √8214.34
                   = 287.616 ± 2.024 × 90.63

104.17 ≤ y0 ≤ 471.06

Therefore we can say with 95% confidence that the true expenditure on food will be between $104.17 and $471.06.

Transforming x to obtain se(f):

A simple way to obtain the prediction and prediction interval estimates with EViews (or any other econometrics package, including Excel) is as follows:
1. Transform the independent variable x by subtracting x0 from each of its values, generating a new variable: Genr → x2 = x − x0
2. Estimate the regression model using the transformed variable
3. The estimated standard error of the forecast is then given by: se(f) = √(var̂(b1) + σ̂²)
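These steps can also be reproduced from the reported summary statistics alone. A sketch using only the figures quoted above:

```python
import numpy as np
from scipy.stats import t

b1, b2 = 83.416, 10.21
sigma_hat, se_b2 = 89.517, 2.0932   # S.E. of regression and se(b2)
n, xbar, x0 = 40, 19.605, 20

y0_hat = b1 + b2 * x0
var_f = sigma_hat**2 + sigma_hat**2 / n + (x0 - xbar)**2 * se_b2**2
se_f = np.sqrt(var_f)

t_crit = t.ppf(0.975, df=n - 2)
print(f"{y0_hat:.2f} +/- {t_crit * se_f:.2f}")   # ~ 287.62 +/- 183.5
```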
The transformation shifts the data so that x0 becomes the origin: the fitted line and its fit are unchanged, but the intercept of the transformed regression now estimates E(y|x0), so its reported variance is exactly what the forecast standard error requires.

Measuring Goodness-of-Fit:

There are two major reasons for analysing the model y = β1 + β2x + e:
1. To explain how the dependent variable (yi) changes as the independent variable (xi) changes
2. To predict y0 given an x0

These two objectives come under the broad headings of estimation and prediction. Closely allied with the prediction problem discussed in the previous section is the desire to use xi to explain as much of the variation in the dependent variable yi as possible.
The variation in y about its sample mean decomposes as:

yi − ȳ = (ŷi − ȳ) + (yi − ŷi) = explained component + unexplained residual êi

- SST = total sum of squares – a measure of the total variation in the dependent variable about its sample mean
- SSR = regression sum of squares – the part that is explained by the regression
- SSE = sum of squared errors – the part of the total variation that is unexplained

Coefficient of determination, R²:

The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the regression model:

R² = SSR/SST = 1 − SSE/SST,  with 0 ≤ R² ≤ 1

If R² = 1, the data fall exactly on the fitted least squares regression line (SSE = 0) and we have a perfect fit. If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is horizontal and equal to the mean of y, so SSR = 0 and R² = 0.

For a simple regression model, R² can also be computed as the square of the correlation coefficient between yi and ŷi.

Note:
1. R² is a descriptive measure.
2. By itself, it does NOT measure the quality of the regression model.
3. It is NOT the objective of regression analysis to find the model with the highest R².
4. By adding more variables, R² will automatically increase even if the variables have no economic justification; this is why we use the adjusted R² in multiple regression analysis (we will expand on this when we study multiple regression):

R̄² = 1 − [SSE/(N − K)] / [SST/(N − 1)]
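The sum-of-squares decomposition and both R² measures are easy to compute for any fitted line. A numpy sketch with placeholder data (K = 2 parameters in the simple regression):

```python
import numpy as np

x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([5.1, 7.9, 10.2, 12.8, 16.1, 18.0])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation about the mean
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation; sst = ssr + sse

n, k = len(y), 2
r2 = 1 - sse / sst                               # equals ssr / sst
r2_adj = 1 - (sse / (n - k)) / (sst / (n - 1))   # adjusted R-squared
print(r2, r2_adj)
```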
The Effects of Scaling the Data:

The data we obtain are not always in a convenient form for presentation in a table or for use in a regression analysis. When the scale of the data is not convenient, it can be altered without changing any of the real underlying relationships between variables.

If we scale x by 1/c:

y = β1 + β2x + e  becomes  y = β1 + (cβ2)(x/c) + e

If we scale y by 1/c:

y = β1 + β2x + e  becomes  y/c = β1/c + (β2/c)x + e/c

Example: if b2 = 10.21 when income is measured in $100 units (so x = 2 represents $200), then measuring income in dollar units instead (x = 200) gives b2 = 0.1021. The underlying model is unchanged.

When the scale of x is altered, the standard error of the regression coefficient changes by the same multiplicative factor as the coefficient, so their ratio, the t-statistic, is unaffected; all other regression statistics are unchanged. When y is scaled, the error term is scaled as well, so the least squares residuals are also scaled. This affects the standard errors of the regression coefficients, but does not affect t-statistics or R².

Choosing a Functional Form:

So far we have assumed that mean household food expenditure is a linear function of household income. That is, we assumed the underlying economic relationship to be E(y) = β1 + β2x, which implies a linear, straight-line relationship between E(y) and x.
In the real world this might not be the case; it was only assumed to make the analysis easier. The starting point in all econometric analysis is economic theory. What does economics really say about the relation between food expenditure and income, holding all else constant? We expect there to be a positive relationship between these variables because food is a normal good. But nothing says the relationship must be a straight line. In fact, we do not expect that as household income rises, food expenditure will continue to rise indefinitely at the same constant rate. Instead, as income rises, we expect food expenditure to rise at a decreasing rate – the law of diminishing returns.

The term "linear" in "linear regression model":
1. Does not mean a linear relationship between the economic variables.
2. Does mean that the model is "linear in the parameters" (the βk values must not be raised to powers or multiplied by other parameters, etc.), but not necessarily "linear in the variables" (e.g. x can enter as x², x³, etc.).

Linear in parameters: the parameters are not multiplied together, divided, squared, cubed, etc.:

f(x) = β0 + β1x1 + ... + βkxk

1. Each explanatory variable in the function is multiplied by an unknown parameter,
2. there is at most one unknown parameter with no corresponding explanatory variable, and
3. all of the individual terms are summed to produce the final function value.

Examples of models that are non-linear in the parameters are:

f(x) = β0 + β0β1x   or   f(x) = β0 x^β1

The first is non-linear because the slope is expressed as a product of two parameters. As a result, nonlinear least squares regression must be used to fit such models; linear least squares cannot be used.

Because of this fact, the simple linear regression model is much more flexible than it appears at first glance. By transforming the variables y and x, we can represent many curved, nonlinear relationships and still use the linear regression model. Choosing an algebraic form for the relationship means choosing transformations of the original variables; the slopes can be determined by taking the derivatives of the function.

Note: the most important implication of transforming variables is that the interpretation of the regression results changes – both the slope and the elasticity differ from the linear case.

Some common function types are the linear, log-log, log-linear, linear-log and reciprocal forms (the table of functional forms is not reproduced here).
A Practical Approach:
1. Plotting the data and choosing economically-plausible models
2. Testing hypotheses concerning the parameters
3. Performing residual analysis
4. Assessing forecasting performance
5. Measuring goodness-of-fit (R²)
6. Using the principle of parsimony – preferring the simplest model

Example on Food Expenditure:

1. Plotting the data (scatter plots of food expenditure against income under each candidate functional form; plots not reproduced here)
2. Testing hypotheses: all slope coefficients are significantly different from zero at the 5% level of significance.

3. Performing residual analysis – testing for normally distributed errors:

The k-th moment (a term borrowed from physics) of the random variable e is:

μk = E[(e − μ)^k]

where μ denotes the mean of e. Measures of spread, symmetry and "peakedness" are:

Variance: σ² = μ2
Skewness: S = μ3/σ³
Kurtosis: K = μ4/σ⁴ – whether the tails are thicker or thinner than expected

If e is normally distributed, then S = 0 and K = 3.

Formalising this is the Jarque-Bera test. The Jarque-Bera test measures how far the residual skewness and kurtosis are from 0 and 3 (normality). To test the null hypothesis of normality of the errors, we use the test statistic:

JB = (N/6) · [ S² + (K − 3)²/4 ]

where N = sample size, S = skewness and K = kurtosis.
When the null hypothesis is true, the Jarque-Bera statistic JB has a χ² distribution with 2 degrees of freedom.

Step 1: State the hypotheses:
H0: the errors are normally distributed
H1: the errors are not normally distributed

Step 2: Decision rule:
Reject H0 if JB > χ²(0.95, 2) = 5.991

Step 3: Calculate the test statistic:

JB = (N/6)[S² + (K − 3)²/4] = (40/6)[(0.097)² + (2.99 − 3)²/4] = 0.063

Step 4: Compare and decision
0.063 < 5.991, therefore do not reject H0.

Step 5: Conclusion
There is insufficient evidence to conclude that the errors are not normally distributed at the 5% level of significance.
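A sketch of the same test applied to a vector of residuals; the residual series here is simulated rather than the food-expenditure residuals:

```python
import numpy as np
from scipy.stats import skew, kurtosis, chi2

rng = np.random.default_rng(1)
resid = rng.normal(size=40)          # stand-in for regression residuals

n = len(resid)
s = skew(resid)
k = kurtosis(resid, fisher=False)    # Pearson kurtosis; a normal sample gives ~3

jb = (n / 6) * (s**2 + (k - 3)**2 / 4)
crit = chi2.ppf(0.95, df=2)          # 5.991

print(jb, crit, "reject H0" if jb > crit else "do not reject H0")
```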
4. Assessing forecasting performance (comparison of forecasts across the candidate models; output not reproduced here)

5. Measuring goodness-of-fit with different dependent variables:

The R² from a linear model measures how well the linear model explains the variation in y, while the R² from a log-linear model measures how well that model explains the variation in ln(y). The two measures should NOT be compared directly, as each has a different dependent variable. To compare goodness-of-fit in models with different dependent variables, we can compute the generalised R²:

R²_g = [corr(y, ŷ)]² = r²(y, ŷ)
6. Using the principle of parsimony – use the simplest model:

The principle of parsimony states that you should use the simplest model if two models appear to be of equal forecasting ability.
Econometrics: ECON2300 – Lecture 4

Multiple Regression A:

The simple regression model we have studied so far relates the dependent variable y to only ONE explanatory variable x. When we turn an economic model with more than one explanatory variable into its corresponding statistical model, we refer to it as a multiple regression model.

Changes and extensions from the simple regression model:

1. Interpretation of the β parameters:

The population regression line is:

E(yi | xi2, ..., xiK) = β1 + β2xi2 + ... + βKxiK

The k-th slope coefficient measures the effect of a change in the variable xk upon the expected value of y, all other variables held constant:

βk = ∂E(yi | xi2, ..., xiK)/∂xik, holding all other x's constant

Note: the x subscripts start at 2, as subscript 1 refers to the intercept term (which has no slope variable).

2. The assumptions concerning the characteristics of the explanatory (x) variables:

The assumptions of the multiple regression model are:

MR1: yi = β1 + β2xi2 + ... + βKxiK + ei, where i = 1, ..., N – the model is linear in the parameters but may be non-linear in the variables

MR2: E(yi) = β1 + β2xi2 + ... + βKxiK, which is synonymous with E(ei) = 0 – the expected (average) value of yi depends on the values of the explanatory variables and the unknown parameters

MR3: var(yi) = var(ei) = σ² – the error terms are homoskedastic (have constant variance)

MR4: cov(yi, yj) = cov(ei, ej) = 0 – there is no serial correlation
MR5: The values of each xik are not random and are not exact linear functions of the other explanatory variables.

MR6: (optional) yi ~ N[(β1 + β2xi2 + ... + βKxiK), σ²], which is equivalent to ei ~ N(0, σ²).

3. The degrees of freedom for the t-distribution (we will go into further detail later in the summary).

Least Squares Estimation:

The fitted regression line for the multiple regression model is:

ŷi = b1 + b2xi2 + ... + bKxiK

The least squares residual is:

êi = yi − ŷi = yi − b1 − b2xi2 − ... − bKxiK

Similarly to the simple linear regression, estimates of the unknown parameters β1, ..., βK are obtained by minimising the residual sum of squares:

Σ êi² = Σ (yi − ŷi)² = Σ (yi − b1 − b2xi2 − ... − bKxiK)²

Solving the first-order conditions for a minimum yields messy expressions for the ordinary least squares estimators, even when K is small (e.g. K = 3). In practice we use matrix algebra to solve these systems: stacking the observations into a design matrix X and a vector y, the solution is b = (X′X)⁻¹X′y.
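A minimal sketch of the matrix solution with simulated placeholder data (in practice numpy.linalg.lstsq is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 75

# Simulated data loosely echoing the hamburger-sales example below
price = rng.uniform(4, 7, n)
advert = rng.uniform(0.5, 3, n)
sales = 118.9 - 7.9 * price + 1.9 * advert + rng.normal(0, 5, n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), price, advert])
y = sales

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) b = X'y, i.e. b = (X'X)^{-1} X'y
print(b)                                # approximately recovers [118.9, -7.9, 1.9]
```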
To understand graphically what a multiple regression model embodies: with two explanatory variables, the fitted equation forms a surface or plane (rather than a line, as in simple regression) which describes the position of the dependent variable.

Example:
The model is given by (standard errors in parentheses):

Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
    (6.352)    (1.096)          (0.683)

Interpretation of the coefficients:
- b2: sales are expected to fall by $7,908 when the price increases by $1, holding the amount of advertising constant.
- b3: sales are expected to increase by $1,863 when advertising expenditure increases by $1, holding the price constant.

Properties of the OLS Estimators (OLS = Ordinary Least Squares):

The Gauss-Markov theorem says that if MR1 to MR5 are correct, the OLS estimators b1, ..., bK have the smallest variance of all linear and unbiased estimators of β1, ..., βK – they are the Best Linear Unbiased Estimators (BLUE). Remember that the Gauss-Markov theorem does not depend on the assumption of normality (MR6). However, if MR6 does hold, then the OLS estimators are also normally distributed.

Again, with larger values of K the formulas for the variances of the OLS estimators are messy. For example, when K = 3, we can show that:

var(b2) = σ² / [ (1 − r²23) Σ(xi2 − x̄2)² ]

where r23 is the sample correlation coefficient between x2 and x3, with −1 < r23 < 1.
The variances and covariances are often presented in the form of a covariance matrix; for K = 3 this matrix has var(b1), var(b2) and var(b3) on the diagonal and the covariances off the diagonal.

In practice, however, σ², the population error variance, is unknown. So instead we use an unbiased estimator of the error variance:

σ̂² = Σ (yi − ŷi)² / (N − K) = Σ êi² / (N − K)

The estimated variances and covariances of the OLS estimators are obtained by replacing σ² with σ̂² in the appropriate formulas. The square roots of the estimated variances are still known as standard errors.

It is important to understand the factors affecting the variance of bi (i = 2, ..., K):
1. The larger σ², the larger the variance of the least squares estimators.
2. The larger the sample size, the smaller the variances.
3. More variation in an explanatory variable around its mean leads to a smaller variance of the least squares estimator.
4. The larger the correlation between the explanatory variables, the larger the variance of the least squares estimators. "Independent" variables ideally exhibit variation that is independent of the variation in the other explanatory variables.
5. Variation in one explanatory variable that is connected to variation in another explanatory variable is known as multicollinearity (see next week). E.g. a larger correlation between x2 and x3 leads to a larger variance of b2.

Inferences in the Multiple Regression Model:

If the assumptions MR1–MR6 hold, we can:
1. Construct confidence intervals for each of the K parameters
2. Conduct a significance test for each of the K parameters
3. Conduct a hypothesis test on any of the parameters or combinations of parameters

The approach is that followed in weeks 2 and 3 for the parameters of the simple regression model.

1. Confidence interval:

A 100(1 − α)% confidence interval for βk is given by:

bk ± tcrit · se(bk), for k = 1, ..., K

Where:
- K = the number of βi parameters, e.g. for ŷi = b1 + b2xi2 + b3xi3, K = 3
- tcrit = t(1 − α/2, N − K)
- se(bk) = the standard error of bk given in the regression output

Example: construct a 95% confidence interval for the coefficient of advertising for the following model, which was based on N = 75 observations on hamburger sales (standard errors in parentheses):

Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
    (6.352)    (1.096)          (0.683)

Solution:

b3 ± t(1 − 0.05/2, 75 − 3) · se(b3) = b3 ± t(0.975, 72) · se(b3)
                                    = 1.863 ± 1.993 × 0.683

0.502 ≤ β3 ≤ 3.224

We can say with 95% confidence that the true change in sales for a one-dollar increase in advertising is between $502 and $3,224.

2. Hypothesis testing:

2.1 A simple null hypothesis is a null hypothesis with a single restriction on one or more parameters. Under MR1 to MR6, we can test the null hypothesis H0: βk = c using the t-statistic:

t = (bk − c)/se(bk) ~ t(N − K)

Even if MR6 doesn't hold, the test is still valid provided the sample size is large.

Example: test whether revenue is related to price at the 5% level of significance when N = 75:

Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
    (6.352)    (1.096)          (0.683)

Solution:
Step 1: State the hypotheses
H0: β2 = 0
H1: β2 ≠ 0

Step 2: Decision rule
Reject H0 if |tcalc| > tcrit, where tcrit = t(1 − 0.05/2, 75 − 3) = t(0.975, 72) = 1.993

Step 3: Calculate the test statistic
tcalc = (b2 − β2)/se(b2) = (−7.908 − 0)/1.096 = −7.215

Step 4: Compare and decision
|−7.215| > 1.993, therefore reject H0.

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has no effect on revenue, i.e. we can conclude at the 5% level of significance that price has an effect on revenue.

2.2 Testing a null hypothesis consisting of two or more hypotheses about the parameters in the multiple regression model – F-tests.

F-tests are used for:
1. The overall significance of the model
2. Testing economic hypotheses involving more than one parameter in the model
3. Misspecification tests
4. Testing for heteroskedasticity
5. Testing for serial correlation

Note: we adopt assumptions MR1–MR6 (i.e. including normality). If the errors are not normal, then the results presented will hold approximately if the sample is large.
A Familiar Form of the F-test:

From ECON1320 we saw that we could express F as:

F = [SSR/(K − 1)] / [SSE/(N − K)] = [(SST − SSE)/(K − 1)] / [SSE/(N − K)]

However, this is just a particular example of a more general F-statistic that can be used to test sets of joint hypotheses.

The General F-test:

A joint null hypothesis is a null hypothesis with two or more restrictions on two or more parameters. Under MR1 to MR6, we can test a joint null hypothesis using the F-statistic:

F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] ~ F(J, N − K)

Where:
- J = the number of restrictions in H0
- SSE_U = the unrestricted sum of squared errors from the original, unrestricted multiple regression model
- SSE_R = the restricted sum of squared errors from a regression model in which the null hypothesis is assumed to be true

Note: even if MR6 doesn't hold, the test is still valid provided the sample size is large (by the central limit theorem).

The general F-test can be used to test three types of hypotheses:
1. When used to test H0: βk = 0 against H1: βk ≠ 0, the F-test is equivalent to a t-test (J = 1)
2. When used to test H0: β2 = β3 = ... = βK = 0 against H1: at least one βk ≠ 0 (J = K − 1)
3. The F-test can also be used to test whether some combination of parameters is collectively significant to the model (1 ≤ J < K)

Restrictions:

When we have a restriction, we assume that the null hypothesis is true; for example, if the null hypothesis sets a parameter to 0, then we set that βk to 0 in the regression equation. Instead of using the least squares estimates that minimise the sum of squared errors, we find estimates that minimise the sum of squared errors subject to the parameter constraints (restrictions). This means that the sum of squared errors will increase: a constrained minimum is larger than an unconstrained minimum.

The theory behind the F-test is that if the two sums of squared errors are significantly different, then imposing the null hypothesis has significantly reduced the ability of the model to fit the data, and thus the data do not support the null hypothesis. On the other hand, if the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters – we would expect little change in the sum of squared errors when the null hypothesis is true.
1. Testing with 1 restriction (J = 1)
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
(se)  (6.352)    (1.096)           (0.683)
Solution:
Step 1: State Hypotheses & apply the restriction
H0: β2 = 0    H1: β2 ≠ 0
Now impose the restriction, assuming the null is correct (i.e. price is not significant, so β2 = 0), and re-estimate the regression equation:
Ŝ = 74.180 + 1.733·ADVERT
(se) (1.80)   (0.890)
Step 2: Decision Rule
Reject H0 if F_calc > F_crit, where F_crit = F(1−α, J, N−K) = F(0.95, 1, 75−3) = 3.97
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)] = [(2961.827 − 1718.943)/1] / [1718.943/(75−3)] = 52.06
Step 4: Compare and Decision
52.06 > 3.97, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that price has an effect on revenue.
The t-test and F-test – a relationship:
When conducting a two-tail test for a single parameter, either a t-test or an F-test can be used, and the outcomes will be identical. In fact, the square of a t random variable with df degrees of freedom is an F random variable with distribution F(1, df):
F-statistic = (t-statistic)²    52.06 = (−7.215)²
F-crit = (t-crit)²              3.97 = (1.993)²
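A short sketch of the general F-test, using the restricted and unrestricted sums of squared errors reported in the example above; it also verifies that, for J = 1, F equals the squared t-statistic.

```python
# General F-test from SSE_R and SSE_U (worked-example values).
from scipy import stats

N, K, J, alpha = 75, 3, 1, 0.05
SSE_U, SSE_R = 1718.943, 2961.827

F_calc = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))   # 52.06
F_crit = stats.f.ppf(1 - alpha, J, N - K)            # 3.97

print(F_calc > F_crit)         # True: reject H0
print(F_calc, (-7.215) ** 2)   # F equals t² when J = 1
```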
2. Testing with J = K − 1 restrictions: the overall significance of the model
An important application of the F-test is for what is called "testing the overall significance of a model". Consider the general multiple regression model with (K − 1) explanatory variables and K unknown coefficients.
Unrestricted model: y_i = β1 + β2x_i2 + β3x_i3 + ... + βK x_iK + e_i
To examine whether we have a viable explanatory model, we set up the following null and alternative hypotheses.
Restricted model: y_i = β1 + e_i
Therefore SSE_R = SST_U, while SSE_U is unchanged.
Note: the null has K − 1 hypotheses; it is referred to as a joint hypothesis.
Step 1: State Hypotheses and estimate the restricted model
H0: β2 = 0, β3 = 0, ..., βK = 0    H1: at least one of the βk is nonzero
Estimated restricted model:
Ŝ = 77.375
(se) (0.749)
SSE_R = 3115.482 (= SST_U)
Step 2: Decision Rule
Reject H0 if F_calc > F_crit, where F_crit = F(1−α, J, N−K) = F(0.95, 3−1, 75−3) = 3.12
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)] = [(3115.482 − 1718.943)/2] / [1718.943/(75−3)] = 29.248
Step 4: Compare and Decision
29.248 > 3.12, therefore reject H0.
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that at least one of the explanatory variables has an effect on sales.
3. Testing a Group of Parameters (1 ≤ J < K)
Consider a model in which advertising enters through both a linear and a squared term, so that K = 4:
S_i = β1 + β2·PRICE_i + β3·ADVERT_i + β4·ADVERT_i² + e_i
Does advertising have an effect on sales?
Step 1: State Hypotheses
H0: β3 = 0 and β4 = 0    H1: β3 ≠ 0 or β4 ≠ 0 (or both are nonzero)
Step 2: Decision Rule
Reject H0 if F_calc > F_crit, where F_crit = F(1−α, J, N−K) = F(0.95, 2, 75−4) = 3.126
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)] = [(1896.391 − 1532.084)/2] / [1532.084/(75−4)] = 8.44
Step 4: Compare and Decision
8.44 > 3.126, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that advertising has a statistically significant effect on sales.
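In practice these joint tests are usually run in software. The sketch below uses statsmodels on synthetic stand-in data (the course data file is not reproduced here, so the variable values are assumptions); the printed overall F corresponds to case 2 above and the f_test call to case 3.

```python
# Overall-significance F and a joint test on a group of coefficients.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 75
df = pd.DataFrame({"PRICE": rng.uniform(4, 7, n), "ADVERT": rng.uniform(0.5, 3, n)})
df["ADVERT2"] = df["ADVERT"] ** 2
df["SALES"] = 110 - 8 * df["PRICE"] + 12 * df["ADVERT"] - 2.8 * df["ADVERT2"] \
              + rng.normal(0, 5, n)

res = smf.ols("SALES ~ PRICE + ADVERT + ADVERT2", data=df).fit()
print(res.fvalue, res.f_pvalue)               # overall significance (J = K - 1)
print(res.f_test("ADVERT = 0, ADVERT2 = 0"))  # joint test on the advertising terms
```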
Prediction:
A prediction ŷ0 is the value of y estimated for given values of the explanatory variables, x0. The prediction error (or forecast error) is f = y0 − ŷ0. The prediction error is a random variable with a mean and a variance. If assumptions MR1 to MR5 hold, then E(f) = E(y0 − ŷ0) = 0 and var(f) = var(y0 − ŷ0), an expression with many terms, each involving σ². The prediction error variance is estimated by replacing σ² with σ̂². The square root of the estimated forecast error variance is still called the standard error of the forecast, se(f).
If assumption MR6 (normality) is correct, or the sample size is large, then a 100(1−α)% confidence interval or prediction interval for y0 is:
ŷ0 − t(1−α/2, N−K)·se(f) ≤ y0 ≤ ŷ0 + t(1−α/2, N−K)·se(f)
Example: Construct a 95% prediction interval for y0 when PRICE = 5.50 and ADVERT = 1.2 (advertising expenditure of $1200; the coefficient magnitudes imply ADVERT is measured in $ thousands).
Solution:
ŷ0 = 118.91 − 7.91(5.50) + 1.863(1.2) = 77.66
t_c = t(1 − 0.05/2, 75 − 3) = t(0.975, 72) = 1.993
To obtain se(f), create two new variables, P* = (PRICE − 5.50) and A* = (ADVERT − 1.2), and re-estimate the model. The intercept b1* of this transformed model equals ŷ0, and se(f) = √(var(b1*) + σ̂²) = 4.9429.
Therefore:
77.66 − 1.993 × 4.9429 ≤ y0 ≤ 77.66 + 1.993 × 4.9429
67.809 ≤ y0 ≤ 87.5112
We can therefore say with 95% confidence that, when the price is $5.50 and advertising expenditure is $1200, the true value of sales lies between 67.809 thousand and 87.5112 thousand.
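Software will produce the same interval directly. A minimal sketch with statsmodels follows; the data are synthetic stand-ins, so the numbers will not reproduce 77.66 exactly.

```python
# Prediction interval for y0 at PRICE = 5.50, ADVERT = 1.2 (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"PRICE": rng.uniform(4, 7, 75), "ADVERT": rng.uniform(0.5, 3, 75)})
df["SALES"] = 118.9 - 7.9 * df["PRICE"] + 1.86 * df["ADVERT"] + rng.normal(0, 4.9, 75)

res = smf.ols("SALES ~ PRICE + ADVERT", data=df).fit()
new = pd.DataFrame({"PRICE": [5.50], "ADVERT": [1.2]})
frame = res.get_prediction(new).summary_frame(alpha=0.05)
# obs_ci_* is the prediction interval for y0; mean_ci_* would be for E(y0)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])
```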
A reminder:
Estimated regression models describe the relationship between the economic variables for values similar to those found in the sample data. Extrapolating the results to extreme values is generally not a good idea. Predicting the value of the dependent variable for values of the explanatory variables far from the sample values invites disaster.
Goodness of Fit:
If the regression model contains an intercept, we can still decompose the variation in the dependent variable (SST) into its explainable and unexplainable components (SSR and SSE). The coefficient of determination then still measures the proportion of the variation in the dependent variable that is explained by the regression model:
R² = SSR/SST = 1 − SSE/SST
The interpretation of R² is identical to its interpretation in the simple regression model, i.e. R² × 100% of the variation can be explained by the estimated equation (1 implies a perfect fit).
Adjusted R²:
A problem with R² is that it can be made large by adding more and more variables to the model, even when they have no economic justification. The adjusted R-squared imposes a penalty for adding more variables:
R̄² = 1 − [SSE/(N−K)] / [SST/(N−1)]
Adjusted R-squared does not give the proportion of variation in the dependent variable that is explained by the model. It should not be used as a criterion for adding or deleting variables (if we add a variable, adjusted R-squared will increase whenever the t-statistic on the new variable is greater than 1 in absolute value!).
A useful identity for recovering SST from summary statistics: SST = (N − 1) × (s.d. of the dependent variable)²
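As a quick arithmetic check of the goodness-of-fit formulas, using the SSE and SST values from the sales example above (N = 75, K = 3):

```python
# R² and adjusted R² from the sums of squares.
N, K = 75, 3
SSE, SST = 1718.943, 3115.482

R2 = 1 - SSE / SST                              # ≈ 0.448
R2_adj = 1 - (SSE / (N - K)) / (SST / (N - 1))  # ≈ 0.433
print(R2, R2_adj)
```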
Econometrics: ECON2300 – Lecture 5
Multiple Regression B:
Non-sample information:
In many estimation problems, economic theory and experience provide us with information on the parameters that is over and above the information contained in the sample data. If this non-sample information is correct, and if we can combine it with the sample information, then we can estimate the parameters with greater precision.
Some non-sample information can be written in the form of linear equality restrictions on the unknown parameters (e.g. several parameters sum to one). We can incorporate this information into the estimation process by simply substituting the restrictions into the model.
One example is a firm with constant returns to scale. Take the Cobb–Douglas production function, whose parameters α and β must sum to 1 under constant returns to scale:
y_t = A·K_t^α·L_t^β
We can show that when K and L increase by the proportion λ, output y also increases by the proportion λ under constant returns to scale:
y_t* = A·(λK_t)^α·(λL_t)^β = λ^(α+β)·A·K_t^α·L_t^β = λ^(α+β)·y_t = λ·y_t when α + β = 1
In order to incorporate the non-sample information and impose constant returns to scale, we should then estimate the model:
y_t = A·K_t^α·L_t^(1−α)
The model is now a function of a single unknown parameter α. The technique for obtaining an estimate of α in this case is known as restricted least squares – we "force" β = 1 − α.
To estimate the above model in practice we can use the least squares method, as the model is linear in its parameters once we convert it to a log-log form:
ln(y_t) = ln(A) + α·ln(K_t) + (1 − α)·ln(L_t) + e_t
To ensure the restriction holds, we rearrange and collect terms:
ln(y_t/L_t) = ln(A) + α·ln(K_t/L_t) + e_t
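A hedged sketch of this substitution approach: regress ln(y/L) on ln(K/L); the slope is α̂ and the implied β̂ is 1 − α̂. The production data here are synthetic stand-ins.

```python
# Restricted least squares by substitution (constant returns to scale).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
K_in = rng.uniform(1, 10, 200)
L = rng.uniform(1, 10, 200)
y = 2.0 * K_in**0.3 * L**0.7 * np.exp(rng.normal(0, 0.05, 200))  # true α = 0.3

X = sm.add_constant(np.log(K_in / L))
res = sm.OLS(np.log(y / L), X).fit()
alpha_hat = res.params[1]
print(alpha_hat, 1 - alpha_hat)   # α̂ and the implied β̂ = 1 − α̂
```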
The Restricted Least Squares Estimator:
The least squares estimates we obtain after imposing the restrictions are known as restricted least squares (RLS) estimates. The RLS estimator:
• Is biased unless the restrictions are EXACTLY true
• Has a smaller variance than the OLS (ordinary least squares) estimator, whether or not the restrictions are true
By incorporating the additional information with the data, we usually give up unbiasedness in return for reduced variances. Evidence on whether the restrictions are true can, of course, be obtained using an F-test (Wald test).
Model Specification:
There are several key questions you should ask yourself when specifying a model:
Q1. What are the important considerations when choosing a model?
A1. The problem at hand and the economic model
Q2. What are the consequences of choosing the wrong model?
A2. If the wrong model is used, there can be omitted or irrelevant variables in the model
Q3. Are there ways of assessing whether a model is adequate?
A3. Yes, you can use model diagnostics – tests of adequate functional form
In examining these model specification issues we will look at the following example.
Omitted variables:
It is possible that a chosen model may have important variables omitted. Our economic principles may have overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic theory.
We will consider a sample of married couples in which both husbands and wives work. This sample was used by labour economist Tom Mroz in a classic paper on female labour force participation. The variables from this sample are in edu_inc.dat. We are interested in the impact of the level of education, both the husband's education (HEDU) and the wife's education (WEDU), on family income (FAMINC = the combined income of husband and wife). Summary statistics for the data appear in Table 6.2.
From the estimated relationship, we estimate that an additional year of education for the husband will increase annual income by $3132, and an additional year of education for the wife will increase income by $4523.
Suppose we now incorrectly omit the wife's education from the equation.
If we omit a relevant variable, then the least squares estimator will generally be biased, although it will have lower variance. Including irrelevant variables does not cause the least squares estimator to be biased – however, the variances, and therefore the standard errors, will be greater.
When we omit WEDU, we overstate the effect of an extra year of education for the husband by about $2000. This change in the magnitude of a coefficient is typical of the effect of incorrectly omitting a relevant variable.
To write a general expression for this bias for the case where one explanatory variable is omitted from a model with two explanatory variables, we write the underlying model as:
y_i = β1 + β2x_i2 + β3x_i3 + e_i
Omitting x3 from the equation is equivalent to imposing the restriction β3 = 0. It can be viewed as imposing an incorrect constraint on the parameters. This has the implication of a reduced variance, but causes biased coefficient estimators. We can show (in Appendix 6B) that for the new estimate b2* of β2:
bias(b2*) = E(b2*) − β2 = β3 · cov(x2, x3)/var(x2)
Omission of a relevant variable leads to omitted variable bias. The bias increases with the correlation between the included and the omitted relevant variable. Note: if cov(x2, x3) = 0 or if β3 = 0, then the bias will be 0, i.e. b2* will be unbiased.
We can include further variables, for instance KL6 – the number of children under the age of 6. The larger the number of young children, the fewer the hours likely to be worked, and hence a lower family income would be expected.
F̂AMINC = −7755 + 3211·HEDU + 4777·WEDU − 14311·KL6
(se)      (11163)  (796)       (1061)       (5004)
(p-value) (0.488)  (0.000)     (0.000)      (0.004)
Notice that, compared with the original estimated equation, the coefficients on HEDU and WEDU have not changed considerably. This outcome occurs because KL6 is not highly correlated with the education variables: Corr(KL6, HEDU) = 0.105 and Corr(KL6, WEDU) = 0.129. From a general modelling perspective, it means that useful results can still be obtained when a relevant variable is omitted, if that variable is uncorrelated with the included variables and our interest is in the coefficients of the included variables.
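The bias formula can be seen directly in a small simulation, sketched below with assumed parameter values: when x3 is omitted, the estimate of β2 settles near β2 + β3·cov(x2, x3)/var(x2).

```python
# Omitted-variable bias: the short regression recovers β2 plus the bias term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, beta2, beta3 = 100_000, 3.0, 5.0
x2 = rng.normal(0, 1, n)
x3 = 0.5 * x2 + rng.normal(0, 1, n)          # x3 correlated with x2
y = 1.0 + beta2 * x2 + beta3 * x3 + rng.normal(0, 1, n)

short = sm.OLS(y, sm.add_constant(x2)).fit()  # x3 incorrectly omitted
bias = beta3 * np.cov(x2, x3)[0, 1] / np.var(x2)
print(short.params[1], beta2 + bias)          # both ≈ 5.5
```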
Irrelevant Variables:
The consequences of omitting relevant variables may lead you to think that a good strategy is to include as many variables as possible in your model. However, this will:
1. Complicate your model
2. Inflate the variances of your estimates
To examine this, we will add two artificially generated variables, X5 and X6. These variables were constructed so that they are correlated with HEDU and WEDU, but are not expected to influence family income.
F̂AMINC = −7759 + 3340·HEDU + 5869·WEDU − 14200·KL6 + 889·X5 − 1067·X6
(se)      (11195)  (1250)      (2278)       (5044)      (2242)   (1982)
(p-value) (0.488)  (0.000)     (0.000)      (0.004)     (0.692)  (0.591)
The first thing we notice is that the p-values for the two new coefficients are much greater than 0.05; they do indeed appear to be irrelevant variables. Also, the standard errors of the coefficients of all other variables have increased, with p-values increasing correspondingly. The inclusion of these irrelevant variables has reduced the precision of the estimated coefficients of the other variables in the equation. This result follows because, by the Gauss–Markov theorem, the least squares estimator of the correct model is the minimum variance linear unbiased estimator.
A Practical Approach:
We should choose a functional form that:
1. Is consistent with what economic theory tells us about the relationship between the variables
2. Is compatible with assumptions MR1 to MR5
3. Is flexible enough to fit the data
In a multiple regression context, this mainly involves:
1. Hypothesis testing
2. Performing residual analysis
3. Assessing forecasting performance
4. Comparing information criteria
5. Using the principle of parsimony
Hypothesis Testing:
The usual t- and F-tests are available for testing simple and joint hypotheses concerning the coefficients. As usual, failure to reject a null hypothesis can occur because the data are not sufficiently rich to disprove the hypothesis. If a variable has an insignificant coefficient, it can either be (a) discarded because it is irrelevant, or (b) retained because there are strong theoretical reasons for including it.
The adequacy of a model can also be tested using a general specification test known as RESET.
Testing for Model Misspecification: RESET
RESET (Regression Specification Error Test) is designed to detect omitted variables and incorrect functional form.
Intuition (Ramsey RESET test): If the chosen model and algebraic form are correct, then squared and cubed terms of the "fitted" or "predicted" values should not contain any explanatory power. If we can significantly improve the model by artificially including powers of the predictions of the model, then the original model must have been inadequate.
Hypotheses:
H0: The functional form is correct and there are no omitted variables (the extra terms are not statistically significant)
H1: The functional form is incorrect and/or there are omitted variables (the extra terms are statistically significant)
Suppose that we have specified and estimated the regression model:
y_i = β1 + β2x_i2 + β3x_i3 + e_i
The predicted or "fitted" values of y are:
ŷ_i = b1 + b2x_i2 + b3x_i3
There are two alternative forms of the test:
Artificial Model 1: y_i = β1 + β2x_i2 + β3x_i3 + γ1ŷ_i² + e_i
Artificial Model 2: y_i = β1 + β2x_i2 + β3x_i3 + γ1ŷ_i² + γ2ŷ_i³ + e_i
Example: FAMINC model
Step 1: State Hypotheses
H0: γ = 0    H1: γ ≠ 0
Step 2: Decision Rule
Reject H0 if p-value < α = 0.05
Step 3: Calculate Test Statistic
p-value = 0.0440
Step 4: Compare and Decision
0.0440 < 0.05, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that there are omitted variables or that the functional form is incorrect.
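RESET can be carried out by hand, as sketched below on synthetic stand-in data: fit the model, add powers of the fitted values, and F-test the added terms. (Recent versions of statsmodels also provide a built-in helper for this, but the manual version makes the mechanics explicit.)

```python
# RESET by hand: augment the model with ŷ² and ŷ³ and test them jointly.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x2": rng.uniform(1, 10, 200), "x3": rng.uniform(1, 10, 200)})
df["y"] = 2 + 0.5 * df["x2"] ** 2 + 3 * df["x3"] + rng.normal(0, 2, 200)  # true form nonlinear

base = smf.ols("y ~ x2 + x3", data=df).fit()
df["yhat2"] = base.fittedvalues ** 2
df["yhat3"] = base.fittedvalues ** 3

aug = smf.ols("y ~ x2 + x3 + yhat2 + yhat3", data=df).fit()
print(aug.f_test("yhat2 = 0, yhat3 = 0"))   # small p-value signals misspecification
```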
Selection of Models – Information Criteria
Akaike Information Criterion (AIC):
• Is often used in model selection for non-nested alternatives – smaller values of the AIC are preferred:
AIC = ln(SSE/N) + 2K/N
The Schwarz Criterion (SC):
• Is an alternative to the AIC that imposes a larger penalty for additional coefficients:
SC = ln(SSE/N) + K·ln(N)/N
Adjusted R²:
• Penalizes the addition of regressors that do not contribute to the explanatory power of the model. It is sometimes used to select regressors, although the AIC and SC are superior. It does not have the interpretation of R².
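The AIC and SC formulas above are easy to apply directly. The sketch below compares the two sales models met earlier by their sums of squared errors (the SSE values are from those examples; smaller is better for both criteria).

```python
# AIC and SC from SSE, N and K, using the course formulas.
import numpy as np

N = 75
models = [("K = 3", 3, 1718.943), ("K = 4", 4, 1532.084)]  # (label, K, SSE)
for label, K, SSE in models:
    aic = np.log(SSE / N) + 2 * K / N
    sc = np.log(SSE / N) + K * np.log(N) / N
    print(label, round(aic, 3), round(sc, 3))
```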
Collinear Economic Variables:
When data are the result of an uncontrolled experiment, many of the economic variables may move together in systematic ways. Such variables are said to be collinear (moving together in a linear way), and the problem is labelled collinearity, or multicollinearity when several variables are involved.
When there is collinearity, there is no guarantee that the data will be "rich in information", nor that it will be possible to isolate the economic relationship or the parameters of interest.
Consequences of collinearity:
1. One or more exact linear relationships among the explanatory variables: exact collinearity, or exact multicollinearity. The least squares estimator is not defined. Recall that:
b = (XᵀX)⁻¹Xᵀy
From linear algebra, a matrix whose rows and columns are not linearly independent does not have an inverse, so under exact collinearity (XᵀX)⁻¹, and hence b, cannot be calculated.
2. Nearly exact linear dependencies among the explanatory variables: some of the variances, standard errors and covariances of the least squares estimators may be large. In the model with two explanatory variables:
var(b2) = σ² / [(1 − r23²)·Σ(x_i2 − x̄2)²]
For perfect collinearity, r23 = −1 or 1, so (1 − r23²) = 0; for near-perfect collinearity, r23 ≈ −1 or 1, so (1 − r23²) ≈ 0.
3. Large standard errors make the usual t-values small and lead to the conclusion that parameter estimates are not significantly different from 0, ALTHOUGH a high R² or F-value indicates "significant" explanatory power of the model as a whole:
t_calc = b_i/se(b_i) = a small value
In general we reject H0 (βi = 0) only if |t_calc| > |t_crit|; with inflated standard errors we fail to reject, and would conclude that βi is not different from 0.
4. Estimates may be very sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable.
5. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate forecasts may still be possible.
Example – Chinese Coal Production
We can detect multicollinearity by:
• Computing sample correlation coefficients between variables. A common rule of thumb is that multicollinearity is a problem if the sample correlation between any pair of variables is greater than about 0.8 or 0.9. (This only looks at pairs of variables.)
• Estimating auxiliary regressions, i.e. regressing each explanatory variable on all the others. Multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater than about 0.8. (This looks at combinations of variables, e.g. x2 = 2x3 + 5x4.)
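Both detection devices are sketched below on synthetic data in which x3 is constructed to be nearly collinear with x2; the auxiliary R² is also converted to the variance inflation factor (VIF = 1/(1 − R²)).

```python
# Detecting multicollinearity: pairwise correlations and an auxiliary regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
x2 = rng.normal(size=300)
x3 = 2 * x2 + rng.normal(scale=0.1, size=300)   # nearly collinear with x2
x4 = rng.normal(size=300)
X = pd.DataFrame({"x2": x2, "x3": x3, "x4": x4})

print(X.corr().round(3))                        # rule of thumb: |r| > 0.8 or 0.9

aux = sm.OLS(X["x3"], sm.add_constant(X[["x2", "x4"]])).fit()
print(aux.rsquared, 1 / (1 - aux.rsquared))     # auxiliary R² and the VIF
```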
Pair-wise Correlations:
Conclusion: The pair-wise correlation between some of the inputs is extremely high, such as between ln(x2) and ln(x3).
Auxiliary regression on ln(x3):
Solution: A possible solution in this case is to use non-sample information:
1. Constant returns to scale
2. Variables 4, 5 & 6 are all statistically insignificant (= 0)
Conduct a Wald test of:
H0: β2 + β3 + β4 + β5 + β6 + β7 = 1, and β4 = 0, β5 = 0, β6 = 0
Mitigating the Effects of Multicollinearity:
The collinearity problem occurs because the data do not contain enough information about the individual effects of the explanatory variables. We can bring more information into the estimation process by:
• Obtaining more, and better, data – not always possible in non-experimental contexts
• Introducing non-sample information into the estimation process in the form of restrictions on the parameters
Nonlinear Relationships:
Relationships between economic variables cannot always be adequately represented by straight lines. We saw in Week 4 that we can add more flexibility to a regression model by considering logarithmic, reciprocal, polynomial and various other nonlinear-in-the-variables functional forms – models that are linear in the parameters but non-linear in the variables.
We can also use these types of functional forms in multiple regression models. In multiple regression models we additionally use models that involve interaction terms. When using these types of models, some changes in model interpretation are required.
Introductory Econometrics: ECON2300 – Dummy Variable Models
The Use of Dummy Variables in Econometric Models:
Assumption MR1 in the multiple regression model is:
y_i = β1 + β2x_i2 + ... + βK x_iK + e_i,  for i = 1, ..., N
1. The statistical model we assume is appropriate for all N observations in our sample
2. The parameters of the model, βk, are the same for each and every observation
3. If this assumption does not hold, and the parameters are not the same for all the observations, then the meaning of the least squares estimates of the parameters is not clear
There are some economic problems or questions where we might expect the parameters to be different for different observations:
1. Everything else the same, is there a difference between male and female earnings?
2. Does studying econometrics make a difference in the starting salaries of graduates?
3. Does having a pool make a difference to a house's sale price in the Brisbane market?
4. Is there a difference in the demand for illicit drugs across race groups?
Dummy variables:
1. Are the simplest procedure for extending the multiple regression model to situations in which the regression parameters are different for some or all of the observations in a sample
2. Are explanatory variables that take only two values, usually 0 and 1
3. Are a very powerful tool for capturing qualitative characteristics of individuals, such as gender, race and geographic region of residence
There are two main types of dummy variables:
1. Intercept dummy variables: coefficients denoted δ
2. Slope dummy variables: coefficients denoted γ
Intercept Dummy Variables:
Intercept dummy variables allow the intercept to change for a subset of observations in the sample. Models with intercept dummy variables take the form:
y_i = β1 + δD_i + β2x_i2 + ... + βK x_iK + e_i
where D_i = 1 if the i-th observation has a certain characteristic, and D_i = 0 otherwise.
E(y_i) = (β1 + δ) + β2x_i2 + ... + βK x_iK   if D_i = 1  (intercept: β1 + δ)
E(y_i) = β1 + β2x_i2 + ... + βK x_iK         if D_i = 0  (intercept: β1)
Note that the properties of the least squares estimator are not affected by the fact that one of the explanatory variables consists only of zeros and ones – D is treated like any other explanatory variable. We can construct an interval estimate for δ, or we can test the significance of its least squares estimate. Such a test is a statistical test of whether the effect is "statistically significant". If δ = 0, the characteristic has no effect on the variable in question.
Example: House prices
A model that allows the intercept to vary with the presence or absence of a particular characteristic. Estimated equation:
P̂rice = 29.68 + 5.69·Pool + 8.60·Sqft
In this model the value Pool = 0 defines the reference group (homes with no pool). Substituting the two values of Pool gives the two equivalent equations:
Pool = 1: P̂rice = (29.68 + 5.69) + 8.60·Sqft = 35.37 + 8.60·Sqft
Pool = 0: P̂rice = 29.68 + 8.60·Sqft
Log-Linear Models:
Suppose ln(PRICE) = β1 + β2SQFT + δPOOL + e. Then:
If a pool:  ln(PRICE_pool) = (β1 + δ) + β2SQFT + e
If no pool: ln(PRICE_nopool) = β1 + β2SQFT + e
Subtracting: ln(PRICE_pool) − ln(PRICE_nopool) = δ
Then:
ln(PRICE_pool/PRICE_nopool) = δ  ⟹  PRICE_pool/PRICE_nopool = e^δ
And:
(PRICE_pool − PRICE_nopool)/PRICE_nopool = e^δ − 1
Thus, houses with pools are 100(e^δ − 1)% more expensive than houses without pools, all other things being equal.
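A short sketch of this interpretation on synthetic data (the true δ is set to 0.10 here purely for illustration): fit the log-linear model with an intercept dummy, then convert δ̂ to a percentage effect.

```python
# Log-linear model with an intercept dummy and the 100(e^δ − 1)% interpretation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 500
df = pd.DataFrame({"SQFT": rng.uniform(10, 40, n),
                   "POOL": rng.integers(0, 2, n)})
df["lnPRICE"] = 3.0 + 0.04 * df["SQFT"] + 0.10 * df["POOL"] + rng.normal(0, 0.1, n)

res = smf.ols("lnPRICE ~ SQFT + POOL", data=df).fit()
delta_hat = res.params["POOL"]
print(100 * (np.exp(delta_hat) - 1))   # ≈ 10.5% premium for a pool
```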
Slope Dummy Variables:
Slope dummy variables allow the slope to change for a subset of observations in the sample. A model that allows the coefficient of x2 to vary across observations takes the form:
y_i = β1 + β2x_i2 + γ(D_i·x_i2) + β3x_i3 + ... + βK x_iK + e_i
E(y_i) = β1 + (β2 + γ)x_i2 + ... + βK x_iK   if D_i = 1  (slope of x2: β2 + γ)
E(y_i) = β1 + β2x_i2 + ... + βK x_iK         if D_i = 0  (slope of x2: β2)
Slope and Intercept Dummy Variables Combined:
Testing for Qualitative Effects:
Dummy variables are frequently used to measure:
1. Interactions between qualitative factors (e.g. race and gender)
2. The effects of qualitative factors having more than two categories (e.g. level of schooling)
Example: WAGES
Explaining wages as a function of individual characteristics, using white males as the reference group:
WAGE = β1 + β2EDUC + δ1BLACK + δ2FEMALE + γ(BLACK×FEMALE) + e
Note that the interaction coefficient γ applies only to workers who are both black and female.
To test the null hypothesis that neither race nor gender affects wages, at the 1% level of significance, we test whether δ1, δ2 and γ are jointly zero.
Now, explaining wages as a function of location, using workers in the northeast as the reference group:
WAGE = β1 + β2EDUC + δ1SOUTH + δ2MIDWEST + δ3WEST + e
(In the estimated model, the regional dummies are not significant at the 5% level of significance.)
Testing the Equivalence of Two Regressions:
By including an intercept dummy variable and an interaction term for every variable in a regression model, we allow every coefficient in the model to differ based on the qualitative factor – we are specifying two regressions. A test of the equivalence of the two regressions is a test of the joint null hypothesis that all the dummy variable coefficients are zero. We can test this null hypothesis using a standard F-test. This particular F-test is known as a Chow test.
Explaining wage as a function of individual characteristics:
WAGE = β1 + β2EDUC + δ1BLACK + δ2FEMALE + γ(BLACK×FEMALE) + e
To test whether there are differences between the wage regressions for the south and the rest of the country, we estimate a model that adds SOUTH, and the interaction of SOUTH with every regressor, to this equation (with coefficients θ1, ..., θ5). The two regression equations are:
If SOUTH = 1: WAGE = (β1 + θ1) + (β2 + θ2)EDUC + (δ1 + θ3)BLACK + (δ2 + θ4)FEMALE + (γ + θ5)(BLACK×FEMALE) + e
If SOUTH = 0: WAGE = β1 + β2EDUC + δ1BLACK + δ2FEMALE + γ(BLACK×FEMALE) + e
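The Chow test can be computed directly from the sums of squared errors: the pooled regression is the restricted model, and the two subsample regressions together form the unrestricted model. A sketch on synthetic data, with a simplified model (WAGE on EDUC only) for brevity:

```python
# Chow test: pooled SSE (restricted) vs sum of subsample SSEs (unrestricted).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({"EDUC": rng.uniform(8, 20, n),
                   "SOUTH": rng.integers(0, 2, n)})
df["WAGE"] = 2 + 1.2 * df["EDUC"] - 0.5 * df["SOUTH"] + rng.normal(0, 3, n)

pooled = smf.ols("WAGE ~ EDUC", data=df).fit()
south = smf.ols("WAGE ~ EDUC", data=df[df.SOUTH == 1]).fit()
rest = smf.ols("WAGE ~ EDUC", data=df[df.SOUTH == 0]).fit()

K = 2                                   # coefficients per regime
J = K                                   # restrictions: all coefficients equal
SSE_R, SSE_U = pooled.ssr, south.ssr + rest.ssr
F = ((SSE_R - SSE_U) / J) / (SSE_U / (n - 2 * K))
p = 1 - stats.f.cdf(F, J, n - 2 * K)
print(F, p)
```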
A Chow test at the 10% level of significance compares the restricted (pooled) and unrestricted (separate) sums of squared errors in the usual F-statistic.
Controlling For Time:
Dummy variables are frequently used to control for:
• Seasonal effects
• Annual effects
• Regime effects (e.g. changes of government policy)
Example: Emergency room cases
Data on the number of emergency room cases per day are available in the file fullmoon.wk1. The model explains the daily number of cases using time dummy variables, including an indicator for days with a full moon.
Example – Stockton house prices
Example – Investment tax credits
ECONOMETRICS: ECON2300 – Lecture 7
Heteroskedasticity
If we were to guess food expenditure for a low-income household and for a high-income household, we would be more accurate for the low-income household: it has less choice, and only a limited income which it MUST largely spend on food. A high-income household, on the other hand, could have extravagant or simple food tastes – there is a large variance in food expenditure at high income levels, resulting in heteroskedasticity.
How can we model this phenomenon? Note that assumption MR3 says that the errors have equal variance, or equal (homo) spread (skedasticity). An alternative and much more general assumption is:
var(e_i) = σ_i²
Heteroskedasticity is often encountered in cross-section studies, where different individuals may have very different characteristics. It is less common in time-series studies.
Properties of the OLS Estimator:
If the errors are heteroskedastic then:
• OLS is still a linear and unbiased estimator, but it is inefficient: it is no longer BLUE (the best linear unbiased estimator)
• The variances of the OLS estimators are no longer given by the formulas we discussed in earlier lectures. Thus, confidence intervals and hypothesis tests based on those variances are no longer valid.
There are three alternative courses of action to deal with heteroskedasticity:
1. If in doubt, use least squares for the parameters together with a standard-error formula that works either way (White robust standard errors)
2. If the form of the heteroskedasticity is known, use generalised least squares (weighted least squares) – BLUE if the variance function is known
3. Test for heteroskedasticity (Goldfeld–Quandt test, White's general test, or the Breusch–Pagan test):
a. If it is present, use feasible generalised least squares (when the variances are unknown and must be estimated)
b. If there is no evidence of it, use least squares, as it is then BLUE
White's Approximate Estimator for the Variances of the Least Squares Estimator under Heteroskedasticity:
White's estimator:
a) Is strictly appropriate only in large samples
b) If the errors are homoskedastic, converges to the least squares formula
The variances of the OLS estimators depend on σ_i² rather than σ². In the case of the simple linear model y_i = β1 + β2x_i + e_i, the variance of b2 is given by:
var(b2) = Σᵢ [(x_i − x̄)/Σ(x_i − x̄)²]²·σ_i² = Σᵢ w_i²·σ_i²
If we replace σ_i² with ê_i², we obtain White's heteroskedasticity-consistent estimator. White's robust standard errors leave the coefficient estimates unchanged but give different (corrected) standard errors.
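In statsmodels this is a one-line option, as sketched below on synthetic data in which the error variance grows with x: the coefficients are identical under both fits, but the standard errors differ.

```python
# White (heteroskedasticity-robust) standard errors vs conventional ones.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 300)
e = rng.normal(0, 0.5 * x)                  # error variance grows with x
y = 2 + 3 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White's robust standard errors
print(ols.params, robust.params)            # identical coefficients
print(ols.bse, robust.bse)                  # different standard errors
```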
What would happen if we always computed the standard errors (and therefore the t-ratios) using White's formula instead of the traditional least squares formula? This is known as heteroskedasticity-robust inference, and it is used by many applied economists. Robust estimation is a "branch" of econometrics. When the true variance is homoskedastic and the sample is large, White's formula converges approximately to the usual least squares formula, with σ̂² = SSE/N.
The Generalised Least Squares (Weighted Least Squares) Estimator:
1. Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator
2. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors, and then use least squares
3. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic-error model into a homoskedastic-error model
If σ_i² is known, then we can weight the original data (including the constant term) by 1/σ_i and then perform OLS on the transformed model:
y_i/σ_i = β1(1/σ_i) + β2(x_i2/σ_i) + ... + βK(x_iK/σ_i) + e_i/σ_i
or
y_i* = β1x_i1* + β2x_i2* + ... + βK x_iK* + e_i*
The transformed model satisfies all the assumptions of the multiple regression model (including homoskedasticity). Thus, applying OLS to the transformed model yields best linear unbiased estimates. The estimator is known as generalised least squares (GLS) or weighted least squares (WLS).
Sometimes σ_i² is known only up to a factor of proportionality. In this case we can still transform the original model in such a way that the transformed errors are homoskedastic. Some popular heteroskedastic specifications:
If σ_i² = σ²·x_ij², divide by x_ij:
var(e_i*) = var(e_i/x_ij) = (1/x_ij²)·var(e_i) = (1/x_ij²)·σ²·x_ij² = σ²
If σ_i² = σ²·x_ij, divide by √x_ij:
var(e_i*) = var(e_i/√x_ij) = (1/x_ij)·var(e_i) = (1/x_ij)·σ²·x_ij = σ²
If our assumption about the form of the heteroskedasticity is incorrect, GLS is no longer best, and its reported standard errors are invalid.
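A sketch of weighted least squares under the first specification above, var(e_i) = σ²·x_i², on synthetic data: in statsmodels, the weights argument is proportional to 1/σ_i², here 1/x_i².

```python
# WLS when the error standard deviation is proportional to x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(0, 0.8 * x)      # sd proportional to x

X = sm.add_constant(x)
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()  # weights ∝ 1/σ_i²
print(wls.params, wls.bse)
```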
Feasible Generalised Least Squares:
If we reject the null hypothesis of homoskedasticity, we might wish to use an estimation technique for the coefficients and the standard errors that accounts for heteroskedasticity. We have already shown that if we "weight" the original data by some appropriate value, we can achieve a transformed model with homoskedastic errors that can be estimated by ordinary least squares (OLS). Note, however, that the task of finding an appropriate weight in a multiple regression model is more complicated, as several variables are potential candidates.
Feasible generalised least squares is based on the idea that we should use all the information available; therefore we construct a suitable weight that is a function of all the explanatory variables in the original model. If σ_i² is unknown, it must be estimated. The resulting estimator is known as feasible generalised least squares (FGLS). A popular specification is:
σ_i² = exp(α1 + α2z_i2 + ... + αS z_iS)
In this case, we estimate the model:
ln(ê_i²) = α1 + α2z_i2 + ... + αS z_iS + v_i
and then use the variance estimator:
σ̂_i² = exp(α̂1 + α̂2z_i2 + ... + α̂S z_iS)
The aim is to produce a prediction σ̂_i², based on this model, and then use it to weight the original model.
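The two-step procedure is sketched below on synthetic data with a known exponential variance function: regress ln(ê²) on the z variables (here simply z = x), predict σ̂_i², then run WLS with weights 1/σ̂_i².

```python
# Feasible GLS: estimate the variance function, then weight the model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 500
x = rng.uniform(1, 10, n)
sigma2 = np.exp(0.5 + 0.3 * x)                  # true variance function
y = 2 + 3 * x + rng.normal(0, np.sqrt(sigma2))

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Step 1: regress ln(ê²) on the z variables (here z = x)
aux = sm.OLS(np.log(ols.resid**2), X).fit()
sigma2_hat = np.exp(aux.fittedvalues)

# Step 2: weighted least squares with the estimated variances
fgls = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
print(fgls.params, fgls.bse)
```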