Econometrics: ECON2300 – Lecture 1
The Econometric Model:
Econometrics is about how we can use theory and data from economics, business and the social
sciences, along with tools from statistics, to answer “how much” type questions.
In economics we express our ideas about relationships between economic variables using the
mathematical concept of a function. An example of this is expressing the price of a house in
terms of its size.
Price = f(size)
Hedonic Model: A model that decomposes the item being researched into its constituent
characteristics, and obtains estimates of the contributory value of each characteristic
An example of a hedonic model for house price might be expressed as:
Price = f(size, bedrooms, bathrooms, stories, age, pool, airconditioning)
Economic theory does not claim to be able to predict the specific behaviour of any individual or firm,
but rather it describes the average or systematic behaviour of many individuals or firms.
Economic models = Generalisation
In fact we realise that there will be a random and unpredictable component e that we will call random
error. Hence the econometric model for price would be
Price = f(size, bedrooms, bathrooms, stories, age, pool, airconditioning) + e
The random error e accounts for the many factors that affect price that we have omitted from this
simplistic model, and it also reflects the intrinsic uncertainty in economic activity.
Take for example the demand relation:
qd = f(p, ps, pc, i) = β1 + β2·p + β3·ps + β4·pc + β5·i
The corresponding econometric model is:
qd = f(p, ps, pc, i) + e = β1 + β2·p + β3·ps + β4·pc + β5·i + e
Econometric Models include the error term, e
In every model there are two parts:
1. A systematic portion – part we obtain from economic theory, includes assumptions about the
functional form.
2. An unobservable random component – “noise” component which obscures our understanding
of the relationship among variables: e.
How Do we Obtain Data?
In an ideal world:
1. We would design an experiment to obtain economic observations or sample information
2. Repeating the experiment N times would create a sample of N sample observations
In the real world:
Economists work in a complex world in which data on variables are "observed" and rarely obtained
from a controlled experiment. It is often not feasible to conduct an experiment to obtain data. Thus we
use non-experimental data generated by an uncontrolled "experiment".
Experimental data: Variables can be fixed at specific values in repeated trials of the
experiment
Non-experimental data: Values are neither fixed nor repeatable
Most economic, financial or accounting data are collected for administrative rather than research
purposes, often by government agencies or industry. The data may be:
 Time-series form – data collected over discrete intervals of time (stock market index, CPI,
GDP, interest rates, the annual price of wheat in Australia from 1880 to 2009)
 Cross-sectional form – data collected over sample units in a particular time period (income in
suburbs in Brisbane during 2009, or household census)
 Panel data form – data that follow individual microunits over time (data for 30 countries for
the period 1980-2005, monthly value of 3 stock market indices over the last 5 years)
Data may be collected at various levels of aggregation:
 Micro – data collected on individual economic decision-making units such as
individuals, households, or firms
 Macro – data resulting from a pooling or aggregating over individuals, households, or firms
at the local, state, or national levels
Data collected may also represent flow or a stock:
 Flow – outcome measures over a period of time, such as the consumption of petrol during the
last quarter of 2005
 Stock – outcome measured at a particular point in time, such as the quantity of crude oil held
by BHP in its Australian storage tanks on April 1, 2002, or the asset value of Macquarie Bank
on 5th July 2009.
Data collected may be quantitative or qualitative:
 Quantitative – numerical data, data that can be expressed as numbers or some transformation
of them, such as real prices or per capita income
 Qualitative – outcomes of an "either-or" situation, that is, whether an attribute is present
or not. E.g. colour, or whether a consumer purchased a certain good or not (dummy
variables)
Statistical Inference:
The aim of statistics is to “infer” or learn something about the real world by analysing a sample of
data. The ways which statistical inference are carried out include:
 Estimating economic parameters, such as elasticities
 Predicting economic outcomes, such as the enrolments in bachelor degree programs in
Australia for the next 5 years.
 Testing economic hypotheses, such as: Is newspaper advertising better than "email"
advertising for increasing sales?
Econometrics includes all of these aspects of statistical inference. There are two types of inference:
1. Deductive: go from a general case → a specific case: this is used in mathematical
proofs
2. Inferential: go from a specific case → a general case: this is used in statistics
Review of Statistical Concepts:
Random variables: Discrete and Continuous
Random variable: A random variable is a variable whose value is unknown until it is observed; it is
not perfectly predictable. The value of the random variable results from an experiment (controlled or
uncontrolled). Uppercase letters (e.g. X) are usually used to denote random variables. Lowercase
letters (e.g. x) are usually used to denote values of random variables.
Discrete random variable:
A discrete random variable can take only a finite number of values that can be counted by using the
positive integers
 E.g. The number of cars you own, your age in whole years, etc.
 Dummy variables:
D = 1 if person is female
D = 0 if person is not female
Probability distribution of a discrete random variable:
A discrete random variable has a probability density function which summarises all the possible
values of a discrete random variable together with their associated probabilities. It can be in the form
of a table, formula or graph.
Two key features of a probability distribution are its centre (location) and width (dispersion); the
mean, μ, and variance, σ², respectively. For a discrete random variable X:

Mean: μ = E(X) = Σ x·P(X = x)

Variance: σ² = Var(X) = E[(X − μ)²] = Σ (x − μ)²·P(X = x)
It can be seen in the graph above that there are only distinct values that the variable x can take which
is what a discrete variable is – the probability density function is NOT continuous.
Discrete probability distributions are:
1. Mutually exclusive – no overlap between values
2. Collectively exhaustive – full sample space covered, includes every possibility
Example: A 5-sided die is biased; the sides show 0, 1, 2, 3 and 4 respectively. The following table
shows the probability distribution.
a) Calculate the mean and variance of X
b) Sketch the probability distribution of X
c) Find P(X ≤ 2)

X     0     1     2     3     4
P(X)  0.10  0.45  0.30  0.10  0.05
Solution:
a) i) Mean:
E(X) = Σ x·P(X = x) = 0(0.10) + 1(0.45) + 2(0.30) + 3(0.10) + 4(0.05) = 1.55
ii) Variance:
Var(X) = Σ (x − μ)²·P(X = x)
= (0 − 1.55)²(0.10) + (1 − 1.55)²(0.45) + (2 − 1.55)²(0.30) + (3 − 1.55)²(0.10) + (4 − 1.55)²(0.05)
= 0.9475
b) [Sketch: a bar chart of P(X = x) against x = 0, 1, 2, 3, 4, with bar heights 0.10, 0.45, 0.30, 0.10 and 0.05.]

c) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.10 + 0.45 + 0.30 = 0.85
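A minimal Python sketch (my addition, not part of the original notes) that checks these calculations numerically:

import numpy as np

x = np.array([0, 1, 2, 3, 4])                  # values of X
p = np.array([0.10, 0.45, 0.30, 0.10, 0.05])   # P(X = x), sums to 1

mean = np.sum(x * p)                # E(X) = sum over x of x * P(X = x)
var = np.sum((x - mean) ** 2 * p)   # Var(X) = sum over x of (x - mu)^2 * P(X = x)
prob = np.sum(p[x <= 2])            # P(X <= 2)

print(mean, var, prob)              # 1.55 0.9475 0.85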
Continuous random variable:
A continuous random variable can take any real value (not just whole numbers); it is generally
something measurable.
 E.g. your height, the temperature, etc.
An easy way to decide is to pick an arbitrary number, e.g. 3.4314135315, and ask whether the variable can take
that value. If yes, it is continuous; if no, it is discrete.
Probability distribution of a continuous random variable:
A continuous random variable has a probability density function which is a smooth non-negative
function representing likely and unlikely values of the random variable.
Two key features of a probability distribution are its centre (location) and width (dispersion); the
mean, μ, and variance, σ², respectively. Let f(x) denote the pdf for a continuous random variable X.
Mean: μ = E(X) = ∫ x·f(x) dx

Variance: σ² = Var(X) = E[(X − μ)²] = ∫ (x − μ)²·f(x) dx
There are an infinite number of points in an interval of a continuous random variable, so a positive
probability cannot be assigned to each point – the area of a line = 0. Therefore, for a continuous
random variable, P(X= x) = 0.
We can only assign probabilities to a range of values or to put it another way, we can only assign a
probability that X will lie within a certain range of variables.
P(x1 ≤ X ≤ x2) = ∫ from x1 to x2 of f(x) dx
Note that it does not matter whether "greater than" or "greater than or equal to" symbols are used, as the
difference is negligible (the probability of a single value is 0).
The Normal Distribution:
The most useful continuous distribution is the normal distribution. The normal distribution has a
probability density function (pdf) of:
f(x) = [1 / √(2πσ²)] · exp( −(x − μ)² / (2σ²) ),   −∞ < x < ∞
Important Parameters of the normal distribution:
1. μ = mean: the centre of the distribution.
2. σ² = variance: the level of dispersion
Properties of the normal distribution:
 Symmetric about the mean
 Bell shaped
 The mean, median and mode are all the same
 Used to find the probabilities of a range of values
 Probabilities of a single value = 0, e.g. P(X = 3) = 0
 There is an infinite number of normal distributions, one for each pair of values of μ and σ
 The area under the probability density function is equal to 1
o As the distribution is symmetric, each side has 0.5 area
 Probability is measured by the area under the curve – the cumulative distribution function
The Standardised Normal Distribution:
 Variance and Standard Deviation of 1
 Mean of 0
 Values greater than the mean have positive Z-Values
 Values less than the mean have negative Z-Values
The most useful element of the normal distribution is that we can “standardise” it to the standard
normal distribution of which we have tables to determine probabilities (Z values)
Z = (X − μ) / σ
Example: In a given population, heights of people are normally distributed with a mean of 160cm
and standard deviation of 10cm.
a) What is the probability that a person is more than 163.5cm tall?
b) What proportion of people have heights between 155cm and 163.5cm?
Solution:
a)
P(X > 163.5) = P( (X − 160)/10 > (163.5 − 160)/10 ) = P(Z > 0.35)
= 0.5 − P(0 ≤ Z ≤ 0.35) = 0.5 − 0.1368 = 0.3632
b)
P(155 ≤ X ≤ 163.5) = P( (155 − 160)/10 ≤ Z ≤ (163.5 − 160)/10 ) = P(−0.5 ≤ Z ≤ 0.35)
= P(−0.5 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.35) = 0.1915 + 0.1368 = 0.3283
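A minimal sketch (my addition; scipy assumed available) verifying both probabilities without Z tables:

from scipy.stats import norm

mu, sigma = 160, 10
# a) P(X > 163.5)
print(1 - norm.cdf(163.5, loc=mu, scale=sigma))                               # ~ 0.3632
# b) P(155 <= X <= 163.5)
print(norm.cdf(163.5, loc=mu, scale=sigma) - norm.cdf(155, loc=mu, scale=sigma))  # ~ 0.3283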
The Chi-Square Distribution:
The Chi-square random variables arise when standard normal random variables are squared. If Z1,
Z2, ..., Zm denote m independent N(0,1) random variables, then
V = Z1² + Z2² + … + Zm² = Σ (i = 1 to m) Zi² ~ χ²(m)
The notation V ~ χ²(m) is read as: the random variable V has a chi-square distribution with m
degrees of freedom.
The degrees of freedom parameter m indicates the number of independent N(0,1) random variables
that are squared and summed to form V. The value of m determines the entire shape of the chi-
square distribution – including its mean and variance.
E(V) = m,   var(V) = E[V − E(V)]² = 2m
The values of V must be non-negative, v ≥ 0, because V is formed by squaring and summing m
standardised normal N(0,1) random variables. The distribution has a long tail, or is
skewed to the right (long tail to the right). As the degrees of freedom m increase, the
distribution becomes more symmetric and "bell-shaped"; as m gets larger still, the chi-square
distribution converges to, and essentially becomes, the normal distribution.
The student ‘t’ Distribution:
A 't' random variable is formed by dividing a standard normal random variable Z ~ N(0,1) by the
square root of an independent chi-square random variable V ~ χ²(m) that has been divided by its
degrees of freedom:

t = Z / √(V/m) ~ t(m)
The t-distribution's shape is completely determined by the degrees of freedom parameter m, and the
distribution is symbolised by t(m).
Note that the t distribution is more spread out than the standard normal distribution and less peaked.
With mean and variance:

E(t) = 0   and   var(t) = m / (m − 2), for m > 2

As the number of degrees of freedom approaches infinity, the t distribution approaches the standard
normal, N(0,1).
The F distribution:
An F random variable is formed by the ratio of two independent chi-square random variables that
have been divided by their degrees of freedom. If V1 ~ χ²(m1) and V2 ~ χ²(m2), and if V1 and V2 are
independent, then:

F = (V1/m1) / (V2/m2) ~ F(m1, m2)
The F-distribution is said to have m1 numerator degrees of freedom and m2 denominator degrees of
freedom. The values of m1 and m2 determine the shape of the distribution, which in general looks
like the figure below.
The graph below shows the range of shapes the distribution can take for different degrees of
freedom.
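A minimal simulation sketch (my addition, not in the notes) illustrating how chi-square, t and F random variables are built from standard normals:

import numpy as np

rng = np.random.default_rng(0)
n, m1, m2 = 100_000, 4, 10

Z = rng.standard_normal((n, m1))
V1 = (Z ** 2).sum(axis=1)                      # chi-square with m1 df
print(V1.mean(), V1.var())                     # ~ m1 = 4 and ~ 2*m1 = 8

t = rng.standard_normal(n) / np.sqrt(V1 / m1)  # t with m1 df
print(t.mean(), t.var())                       # ~ 0 and ~ m1/(m1 - 2) = 2

V2 = (rng.standard_normal((n, m2)) ** 2).sum(axis=1)
F = (V1 / m1) / (V2 / m2)                      # F with (m1, m2) df
print(F.mean())                                # ~ m2/(m2 - 2) = 1.25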
Laws of Expectation and Variation:
E[b] = b   and   Var[b] = 0
E[aX] = a·E[X]   and   Var[aX] = a²·Var[X]
E[aX + b] = a·E[X] + b   and   Var[aX + b] = a²·Var[X]
E[X + Y] = E[X] + E[Y]
Var[X + Y] = Var[X] + Var[Y]   (when X and Y are uncorrelated)

Where a and b are constants, and X and Y are random variables.
The Error Term:
The error term in a regression model is a random variable. Like other random variables it is
characterised by:
a) A mean (or expected value)
b) A variance
c) A distribution (i.e. probability density function)
We usually assume the random error term of an econometric model to:
a) Have expected value of zero
b) Have a variance which we will call σ2
The smaller the variance of the error term, the more efficient the model.
Sampling Distributions:
We can usually draw many samples of size n from a population. Each sample can be used to
compute a sample statistic (e.g. a sample mean); these statistics will vary from sample to sample. If
we take infinitely many samples of a normally distributed random variable X in the population, the
sample statistic X̄ will also be normally distributed.
The probability distribution that gives all possible values of a statistic and associated probabilities is
known as a sampling distribution.
If Xi ~ N(μ, σ²), then X̄ ~ N(μ, σ²/N).
If the distribution of X is non-normal but n is large, then X̄ is approximately normally distributed.
The approximation is good when n ≥ 30 – this is known as the central limit theorem.
Central limit Theorem:
If Y1, ..., YN are independent and identically distributed random variables with mean μ and variance
σ², and Ȳ = Σ Yi / N, then

ZN = (Ȳ − μ) / (σ / √N)

has a probability distribution that converges to the standard normal, N(0,1), as N → ∞.
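A minimal simulation sketch (my addition) of the central limit theorem using a skewed population:

import numpy as np

rng = np.random.default_rng(1)
N, reps = 50, 20_000
mu, sigma = 1.0, 1.0                     # mean and sd of an exponential(1) population

Y = rng.exponential(scale=1.0, size=(reps, N))
Z = (Y.mean(axis=1) - mu) / (sigma / np.sqrt(N))   # Z_N for each sample
print(Z.mean(), Z.std())                 # ~ 0 and ~ 1: approximately N(0,1)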
Estimators & Estimates:
A point estimator is a rule or formula which tells us how to use a set of sample observations to
estimate the value of a parameter of interest. A point estimate is the value obtained after the
observations have been substituted into the formula.
Desirable properties of point estimators include:
 Unbiased – an estimator θ̂ is an unbiased estimator of the population parameter θ if E(θ̂) = θ
 Efficiency – θ̂1 is more efficient than θ̂2 if var(θ̂1) < var(θ̂2)
 Consistency- the distribution of the estimator becomes more concentrated about the
population parameter as the sample size becomes larger
Note that both bias and variance approach 0 as n approaches infinity.
Estimate: a particular value for a parameter
Estimator: a formula used to obtain estimates
Examples:
 X̄ = Σ Xi / N is the best linear unbiased estimator of μ = E(X)
 σ̂² = Σ (Xi − X̄)² / N is a biased but consistent estimator of σ² = E(X − μ)²
 σ̂² = Σ (Xi − X̄)² / (N − 1) is an unbiased and consistent estimator of σ² = E(X − μ)²
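A minimal sketch (my addition) contrasting the biased (divide by N) and unbiased (divide by N − 1) variance estimators:

import numpy as np

rng = np.random.default_rng(2)
N, reps, sigma2 = 10, 50_000, 4.0

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))
print(X.var(axis=1, ddof=0).mean())   # divide by N: ~ 3.6, biased downward
print(X.var(axis=1, ddof=1).mean())   # divide by N - 1: ~ 4.0, unbiased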
Confidence Intervals:
A confidence interval or interval estimate, is a range of values which contains information not only
about the location of the population mean, but about the precision with which we estimate it.
We can generally use the sampling distribution of an estimator to derive a confidence interval for the
population parameter.
In general, a 100(1-α)% confidence interval for the population mean is given by:

CI = x̄ ± z(α/2) · σ/√n
Where α is the significance level (100(1-α)% is the level of confidence).
Prior to selecting a random sample, the probability that a CI will contain the population parameter is
100(1-α)%. E.g. if we took many samples of size n and calculated the many corresponding random
intervals x̄ ± z(α/2)·σ/√n, then 100(1-α)% of them would contain μ.
After we construct a confidence interval, either it does or it does not contain the population
parameter, with probability 1 or 0 (so we can only say we are 100(1-α)% confident that a
particular confidence interval contains the parameter).
General conclusion: “We can say with 100(1-α)% confidence that the population parameter is
between lower bound and upper bound.”
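A minimal sketch (my addition; the numbers are assumed, not from the notes) of a 100(1 − α)% confidence interval for the mean:

import numpy as np
from scipy.stats import norm

xbar, sigma, n, alpha = 160.0, 10.0, 40, 0.05   # assumed sample mean, known sigma
z = norm.ppf(1 - alpha / 2)                     # ~ 1.96
half = z * sigma / np.sqrt(n)
print(xbar - half, xbar + half)                 # the 95% confidence interval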
Hypothesis Testing:
An hypothesis is a statement or claim about the value(s) of one or more population parameters. To
test a hypothesis we
1. Identify a test statistic and find its sampling distribution when the hypothesis is true
2. Reject the hypothesis if the test statistic takes a value that is deemed unlikely
5 steps:
1. State H0 and H1 – H0 must contain an equality (=, ≤ or ≥)
2. State a decision rule – Reject H0 if...
3. Calculate test statistic
4. Compare, and make decision
5. Write conclusion
Note:
o One-tail or two tail tests can be used
o Can use critical values or p-value method
Econometrics: ECON2300 – Lecture 2
An Econometric Model:
For a given set of data the aim of an econometric model is to fit a regression line and then check how
well it fits.
In order to investigate this relationship between expenditure and income we must build an economic
model and then a corresponding econometric model that forms the basis for a quantitative or
empirical economic analysis.
We must express mathematically which variables are dependent and independent. (In this case we
can say that the weekly expenditure depends on income – y depends on x)
We represent our economic model mathematically by the conditional mean:

E(y|x) = μ(y|x) = β1 + β2·x
The conditional mean E(y|x) is called a simple regression function, as there is only one
explanatory variable. The unknown regression parameters β1 and β2 are the intercept and slope
respectively.
β2 = ΔE(y|x)/Δx = dE(y|x)/dx
For each value of x there is potentially a range of values of y – in fact each has a probability
distribution.
The figure above shows that the regression line passes through the mean of each distribution of
expenditure at each level of income.
The difference between the actual value of y and the expected value is known as the random error
term.
e = y − E(y) = y − (β1 + β2·x)
If we rearrange:

y = β1 + β2·x + e
Assumptions of the Simple Linear Regression (SLR) Model:
1. The population can be represented by:
y = β1 + β2·x + e
2. The mean value of y, for each value of x is given by the linear regression function
E(y|x) = β1 + β2·x

Error term: This means that the mean error term is 0: E(e) = 0.
3. For each value of x, the values of y are distributed about their mean value, following
probability distributions that all have the same variance
var(y|x) = σ²

Error term: This means that the error terms are homoskedastic (constant variance):
var(e) = var(y) = σ². Violation of this assumption is called heteroskedasticity.
4. The sample values of y are all uncorrelated and have zero covariance, implying there is no
linear association among them:

cov(yi, yj) = 0

Error term: There is no serial correlation. Note that this assumption can be made stronger
by assuming that the random errors e are all statistically independent, in which case the values
of y are also statistically independent.
5. The variable x is not random and must take at least two different values.
6. (optional) The values of y are normally distributed about their mean for each value of x.
y ~ N(β1 + β2·x, σ²)
Error term: The values of e are normally distributed about their mean:

e ~ N(0, σ²)

This holds if the values of y are normally distributed, and vice versa.
The Error term:
If the regression parameters β1 and β2 were known, then for any value of y we could calculate:

e = y − E(y) = y − (β1 + β2·x)

However, the values of β1 and β2 are never known for certain, and therefore it is impossible to
calculate e.
The random error e represents all factors affecting y other than x. These factors cause individual
observations y to differ from the mean value E(y) = β1 + β2·x.
Estimating the Parameters of the Simple Linear Regression:
Our problem is to estimate the location of E(y) = β1 + β2·x that best represents our data. We would
expect this line to be somewhere in the middle of all the data points since it represents mean, or
average, behaviour. To estimate β1 and β2 we could simply draw a line through the middle of the
data and then measure the slope and intercept with a ruler. The problem with this method is that
different people would draw different lines – in fact there would be an infinite set of possibilities –
and that it would not be accurate.
The estimated regression line is given by:
ŷi = b1 + b2·xi
The least squares principle:
The least squares method involves finding estimators b1 and b2 that provide the smallest sum of
squared residuals:

min Σ êi² = min Σ (yi − ŷi)²

The solutions are:

b2 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
b1 = ȳ − b2·x̄
We usually use a computer to calculate these values as the process would take too long and be too
tedious to do by hand.
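A minimal sketch (my addition; the data are simulated, not the notes' food-expenditure sample) of the least squares formulas:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 30, 40)
y = 83.4 + 10.21 * x + rng.normal(0, 20, 40)   # hypothetical sample

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
print(b1, b2)                                  # estimates of the intercept and slope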
Interpreting the estimates:
 The value of b2 is an estimate of β2, the amount by which y increases per unit increase in x
 The value of b1 is an estimate of β1, what y would be when x = 0
Because the least squares estimate is generated using sample data, different samples will lead to
different values of b1 and b2. Therefore b1 and b2 are random variables.
In this context we call b1 and b2 the least squares estimators, but when actual sample values are
substituted then we obtain values of random variables which are estimates.
Estimators: Formulas for estimates
Estimates: Actual values given by the estimators
The variances and Covariance of b1 and b2:
var(b1) = σ² · [ Σ xi² / (N · Σ (xi − x̄)²) ]

var(b2) = σ² / Σ (xi − x̄)²
The square roots of the estimated variances are known as standard errors.
cov(b1, b2) = σ² · [ −x̄ / Σ (xi − x̄)² ]
Summary: the variances and covariances of b1 and b2
 The larger the variance of the error term, σ², the greater the uncertainty there is in the
statistical model, and the larger the variances and covariance of the least squares estimators.
 The larger the sum of squares, Σ (xi − x̄)², the smaller the variances of the least squares
estimators and the more precisely we can estimate the unknown parameters.
In (a) the data are bunched, so Σ (xi − x̄)² is smaller and we cannot estimate the line very
accurately. In (b) the data are more spread out, so Σ (xi − x̄)² is larger and we can estimate the
unknown parameters more precisely.
 The larger the sample size N, the smaller the variances and covariances of the least squares
estimators.
 The larger the term Σ xi², the larger the variance of the least squares estimator b1.
The further our data are from x = 0, the more difficult it is to interpret β1.
 The absolute magnitude of the covariance increases the larger in magnitude is the sample
mean x̄, and the covariance has a sign opposite that of x̄.
The probability distribution of the Least Squares Estimators:
 If the normality assumption about the error terms is correct, then the least squares estimators
are normally distributed.
 If assumptions 1 – 5 hold, and if the sample size is sufficiently large (n ≥ 30), then by the
central limit theorem the least squares estimators have a distribution that approximates the
normal distribution.
The Gauss-Markov Theorem:
Under the assumptions SR1-SR5 of the linear regression model, the estimators b1 and b2 have the
smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased
Estimators (BLUE) of β1 and β2.
To clarify what the Gauss-Markov theorem does, and does not, say:
1. The estimators b1 and b2 are “best” when compared to similar estimators, those that are linear
and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators.
2. They are the “best” within their class because they have the minimum variance. When
comparing two linear and unbiased estimators we always want to use the one with the
smallest variance.
3. In order for the Gauss-Markov theorem to hold, assumptions SR1-SR5 must be true. If any
of these assumptions are not true, then b1 and b2 are not the best linear unbiased estimators of
β1 and β2.
4. The Gauss-Markov theorem does not depend on the assumption of normality.
5. In simple linear regression these are the estimators to use.
6. The theorem applies to the least squares estimators. It does not apply to the least squares
estimates from a single sample.
Estimating the variance of the Error term:
The variance of the random error ei is:
    )
(
)
0
(
)
(
)
var(
2
2
2
2
i
i
i
i
i e
E
e
E
e
E
e
E
e 




 
Assuming that the mean error = 0 assumption is correct.
The unbiased estimator of variance is:
σ̂² = Σ êi² / (N − 2),   with E(σ̂²) = σ²
Interval Estimation:
Confidence interval:
CI = bk ± tcrit · se(bk)

Where:
bk = b1 or b2
tcrit = the critical value t(1 − α/2, N − 2), where N − 2 is the degrees of freedom
se(bk) = the standard error given by the regression estimation
Before sampling, we can make the probability statement there is a 100(1-α)% chance that the real
value lies within the interval.
After sampling, we can only make a confidence interval – we are 100(1-α)% confident that the real
value lies within the interval.
Example:
Construct a 95% confidence interval for B2 for the following equation when there are 40
observations.
ŷ = 83.4 + 10.21x
(se)  (43.4)  (2.09)
Solution:

CI = b2 ± t(1 − 0.05/2, 40 − 2) · se(b2) = 10.21 ± t(0.975, 38) · (2.09)
   = 10.21 ± 2.024(2.09) = 10.21 ± 4.23
We can say with 95% confidence that the true value of β2 lies within the interval 5.98 to 14.44.
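A minimal sketch (my addition) reproducing this interval with scipy:

from scipy.stats import t

b2, se_b2, N = 10.21, 2.09, 40
tcrit = t.ppf(0.975, N - 2)                    # ~ 2.024
print(b2 - tcrit * se_b2, b2 + tcrit * se_b2)  # ~ (5.98, 14.44)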
Hypothesis Testing:
We can conduct a hypothesis test on the slope of the regression line.
Step 1: State Hypothesis:
H0: βk = c (or βk ≤ c, or βk ≥ c)
H1: βk ≠ c (or βk > c, or βk < c)
Step 2: Decision rule:
Reject H0 if .....
Step 3: Calculate test statistic
Step 4: Compare and decision
Step 5: Conclusion
Example:
Using 40 observations on food expenditure.
ŷ = 83.4 + 10.21x
(se)  (43.4)  (2.09)
Test whether β2 is less than or equal to 0 at the 5% level of significance.
Step 1: State Hypothesis
H0: β2 ≤ 0
H1: β2 > 0
Step 2: Decision Rule
Reject H0 if tcalc > tcrit, where tcrit = t(1 − 0.05/2, 40 − 2) = t(0.975, 38) = 2.024
Step 3: Calculate test statistic

tcalc = (b2 − 0) / se(b2) = (10.21 − 0) / 2.09 = 4.88
Step 4: Compare and decision
4.88 > 2.024 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that β2, the
increase in expenditure for a one-unit increase in income, is greater than 0.
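A minimal sketch (my addition) of the same one-tail test using a p-value instead of a critical value:

from scipy.stats import t

tcalc = (10.21 - 0) / 2.09          # ~ 4.885
pval = 1 - t.cdf(tcalc, 40 - 2)     # one-tail p-value
print(tcalc, pval)                  # pval << 0.05, so reject H0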
Types of errors:
                    H0 true             H0 false
Reject H0           Type 1 error = α    No error
Do not reject H0    No error            Type 2 error
[Figure: the rejection region is the right tail of the t(38) distribution beyond tcrit = t(0.975, 38) = 2.024.]
Econometrics: ECON2300 – Lecture 3
The least Squares Predictor:
The linear regression model provides a way to predict y given any value of x. This is extremely
important for forecasters; be it in politics, finance or business. Accurate predictions provide a basis
for better decision making.
Our first SR assumption is that our model is linear: for a given value of the explanatory variable, x0,
the value of the dependent variable y0 is given by the econometric model:

y0 = β1 + β2·x0 + e0
Where e0 is a random error. This random error has:
1. Mean: E(e0)= 0
2. Variance: var(e0) = σ2
3. Covariance: cov(e0,e1) = 0
The least squares predictor (or estimator) of y0 (given x0) is:

ŷ0 = b1 + b2·x0
To evaluate how well this predictor or estimator performs we define the forecast error, which is
analogous to the least squares residual:

f = ŷ0 − y0 = (b1 + b2·x0) − (β1 + β2·x0 + e0) = (b1 − β1) + (b2 − β2)·x0 − e0
Now: if we apply the assumptions SR1 to SR5:
E(f) = E(ŷ0 − y0) = [E(b1) − β1] + [E(b2) − β2]·x0 − E(e0) = 0

As: E(b1) = β1, E(b2) = β2 and E(e0) = 0.
var(f) = var(ŷ0 − y0) = σ² · [ 1 + 1/N + (x0 − x̄)² / Σ (xi − x̄)² ]
If SR6 holds, or the sample size is large enough, then the prediction error is normally distributed.
Note that, the further x0 is from the sample mean, the larger the variance of the prediction error.
 This means that as you extrapolate more and more your predictions will be less accurate.
Note the variance of the forecast error is smaller when:
i) The overall uncertainty in the model is smaller, as measured by the variance of the
random errors σ2
ii) The sample size N is larger
iii) The variation in the explanatory variable is larger
iv) The distance of x0 from x̄ is smaller
The forecast error variance is estimated by replacing σ² with its estimator σ̂²:

var̂(f) = σ̂² · [ 1 + 1/N + (x0 − x̄)² / Σ (xi − x̄)² ]
        = σ̂² + σ̂²/N + (x0 − x̄)² · [ σ̂² / Σ (xi − x̄)² ]
        = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)
[Figure: fitted line ŷi = b1 + b2·xi, with x1 close to the sample mean x̄ and x2 far from it.]
Obviously:
The estimate that the estimator or predictor
gives at x1 will be close to the actual value as
there are lots of data points that the regression
is based on round x1 – it is close to the sample
mean.
At x2, there are no points very close that the
regression was based on, so the prediction will
be less accurate aka will have a larger variance.
i.e. We can do a better job of predicting in the
region where we have more sample
information.
The standard error of the forecast:

se(f) = √ var̂(f)

Hence, we can construct a (1 − α) × 100% prediction interval for y0:

ŷ0 ± tcrit · se(f)
Example:
Calculate a 95% confidence interval for y when x0 = 20:
ŷ0 ± tcrit · se(f)
Step 1: Linear equation
From the output above we can determine the linear regression:

ŷ = b1 + b2·x = 83.416 + 10.21x
(se)          (43.41)   (2.093)

Therefore, when x0 = 20:

ŷ0 = 83.416 + 10.21(20) = 287.616
Step 2: Determine se(f)

var̂(f) = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)
        = (89.517)² + (89.517)²/40 + (20 − 19.605)² · (2.0932)²
        = 8214.34

se(f) = √ var̂(f) = √8214.34 ≈ 90.63

Here 89.517 is the S.E. of the regression, N = 40 is the sample size, x0 = 20 is the x-value,
x̄ = 19.605 is the mean of x, and 2.0932 = se(b2); note that var̂(b2) = se(b2)².
Step 3: Confidence interval

ŷ0 ± tcrit · se(f) = 287.616 ± t(1 − α/2, N − 2) · √ var̂(f) = 287.616 ± t(0.975, 38) · √8214.34
                  = 287.616 ± 2.024 × 90.63

104.17 ≤ y0 ≤ 471.06
Therefore we can say with 95% confidence that the true expenditure on food will be between
$104.17 and $471.06.
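A minimal sketch (my addition) of the whole calculation using the quantities quoted above:

import numpy as np
from scipy.stats import t

b1, b2 = 83.416, 10.21
sigma_hat, N, x0, xbar, se_b2 = 89.517, 40, 20, 19.605, 2.0932

y0_hat = b1 + b2 * x0                                               # point prediction
var_f = sigma_hat**2 + sigma_hat**2 / N + (x0 - xbar)**2 * se_b2**2
half = t.ppf(0.975, N - 2) * np.sqrt(var_f)
print(y0_hat - half, y0_hat + half)                                 # ~ (104.2, 471.0)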
Transforming x to obtain se(f):
A simple way to obtain the prediction and prediction interval estimates with EViews ( or any other
econometrics package, including Excel) is as follows:
1. Transform the independent variable x by subtracting x0 from each of the values.
Generate a new variable:
Genr  x2 = x – x0
2. Then estimate the regression model by running a regression analysis of y on the transformed variable
3. The estimated standard error of the forecast is given by:
se(f) = √( var̂(b1) + σ̂² )

where b1 is the intercept of the transformed regression.
Example:
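The EViews output for this example is not reproduced here; as an illustration only, a minimal statsmodels sketch (my addition, with simulated data) of the same trick:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 30, 40)
y = 83.4 + 10.21 * x + rng.normal(0, 90, 40)      # hypothetical sample
x0 = 20

res = sm.OLS(y, sm.add_constant(x - x0)).fit()    # regress y on (x - x0)
se_f = np.sqrt(res.bse[0] ** 2 + res.mse_resid)   # se(f) = sqrt(var(b1) + sigma_hat^2)
print(res.params[0], se_f)                        # prediction at x0 and its standard error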
The transformation has the following effect:
Measuring Goodness-of-Fit:
Two major reasons for analysing the model
y = β1 + β2·x + e

are:
1. To explain how the dependent variable (yi) changes as the independent variable (xi) changes
2. To predict y0 given an x0
These two objectives come under the broad headings of estimation and prediction. Closely allied
with the prediction problem discussed in the previous section is the desire to use xi to explain as
much of the variation in the dependent variable yi as possible.
ŷi = b1 + b2·xi
SST = total sum of squares – measure of total variation in the dependent variable about its sample
mean
SSR = regression sum of squares – the part that is explained by the regression
SSE = sum of squared errors – that part of the total variation that is unexplained
Coefficient of determination: R2
The coefficient of determination measures the proportion of the variation in the dependent variable
that is explained by the regression model:
R² = SSR/SST = 1 − SSE/SST,   0 ≤ R² ≤ 1
If R² = 1, the data fall exactly on the fitted least squares regression line and we have a perfect fit. If the
sample data for y and x are uncorrelated and show no linear association, then the least squares fitted
line is "horizontal", so SSR = 0 and R² = 0.
For a simple regression model, R² can also be computed as the square of the correlation coefficient
between yi and ŷi.
 R² = 1: all the sample data fall exactly on the fitted least squares line, SSE = 0
 R² = 0: the sample data for y and x are uncorrelated; the least squares fitted line is horizontal
and equal to the mean of y, so that SSR = 0
Note:
1. R² is a descriptive measure.
2. By itself, it does NOT measure the quality of the regression model.
3. It is NOT the objective of regression analysis to find the model with the highest R².
4. By adding more variables, R² will automatically increase even if the variables have no
economic justification. This is why we use the adjusted R² in multiple regression analysis
(we will expand on this when we study multiple regression):
[Figure: for each observation, the deviation from the sample mean decomposes as
yi − ȳ (total, SST) = (ŷi − ȳ) (explained, SSR) + (yi − ŷi = êi) (unexplained, SSE).]
R̄² = 1 − [ SSE/(N − K) ] / [ SST/(N − 1) ]
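A minimal sketch (my addition) computing R² and the adjusted R̄² from a fitted model's predictions:

import numpy as np

def r_squared(y, y_hat, K):
    # R-squared and adjusted R-squared for a model with K parameters
    sse = np.sum((y - y_hat) ** 2)          # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)       # total variation
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (len(y) - K)) / (sst / (len(y) - 1))
    return r2, r2_adj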
Example:
For the same data as before:
The Effects of Scaling the Data:
Data we obtain is not always in a convenient form for presentation in a table or use in a regression
analysis. When the scale of the data is not convenient, it can be altered without changing any of the
real underlying relationships between variables.
If we scale x by 1/c:

y = β1 + β2·x + e   becomes   y = β1 + (c·β2)(x/c) + e
If we scale y by 1/c:

y = β1 + β2·x + e   becomes   y/c = β1/c + (β2/c)·x + e/c
Example: if we now report income in $100 units: because b2 = 10.21 and x = 200 before scaling,
after scaling b2 = 0.1021 and x = 2. This makes no change to the underlying model.

When the scale of x is altered, the standard error of the regression coefficient changes by the same
multiplicative factor as the coefficient, so that their ratio, the t-statistic, is unaffected. All other
regression statistics are unchanged.

When y is scaled, the error term is scaled in the process, so the least squares residuals will also be
scaled. This will affect the standard errors of the regression coefficients, but will not affect
t-statistics or R².
Choosing a Functional Form:
So far we have assumed that the mean household food expenditure is a linear function of household
income. That is, we assumed the underlying economic relationship to be E(y) = β1 + β2·x, which
implies that there is a linear, straight-line relationship between E(y) and x.
In the real world this might not be the case, and this was only assumed to make the analysis easier.
The starting point in all econometric analysis is economic theory. What does economics really say
about the relation between food expenditure and income, holding all else constant? We expect there
to be a positive relationship between these variables because food is a normal good. But nothing says
the relationship must be a straight line. In fact we do not expect that as household income rises that
food expenditures will continue to rise indefinitely at the same constant rate. Instead, as income rises,
we expect food expenditures to rise, but at a decreasing rate – the law of diminishing returns.
The term "linear" in "linear regression model":
1. Does not mean a linear relationship between the economic variables.
2. Does mean that the model is "linear in the parameters" (e.g. the βk values must not be raised to
powers or multiplied by other parameters), but not necessarily "linear in the variables"
(e.g. x can appear as x², x³, etc.).
Linear in parameters: the parameters are not multiplied together, divided, squared, cubed etc.
f(x) = β0 + β1·x1 + … + βk·xk
1. each explanatory variable in the function is multiplied by an unknown parameter,
2. there is at most one unknown parameter with no corresponding explanatory variable, and
3. all of the individual terms are summed to produce the final function value.
An example of a non-linear in parameter model is:
f(x) = β0 + β0·β1·x   or   f(x) = β0·x^β1
This is non-linear because the slope of this line is expressed as a product of two parameters.
As a result, nonlinear least squares regression must be used to fit this model, but linear least
squares cannot be used.
Because of this fact, the simple linear regression model is much more flexible than it appears at first
glance. By transforming the variables y and x, we can represent many curved, nonlinear
relationships and still use the linear regression model. Choosing an algebraic form for the
relationship means choosing transformations of the original variables.
The slopes of which can be determined by taking the derivatives of the function.
Note: the most important implication of transforming variables is that the regression result
interpretations change. Both the slope and elasticity change from the linear relationship case.
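As a concrete illustration (my addition, with simulated data), a log-linear model ln(y) = β1 + β2·x is still linear in the parameters, so ordinary least squares applies after transforming y:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 50)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.1, 50))   # hypothetical data

res = sm.OLS(np.log(y), sm.add_constant(x)).fit()    # linear in the parameters
print(res.params)                                    # ~ [0.5, 0.2]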
Some common function types are:
A Practical Approach:
1. Plotting the data and choosing economically-plausible models
2. Testing hypotheses concerning the parameters
3. Performing residual analysis
4. Assessing forecasting performance
5. Measuring goodness-of-fit (R²)
6. Using the principle of parsimony – simplest model
Example on Food Expenditure:
1. Plotting data
2. Testing hypotheses:
All slope coefficients are significantly different from zero at the 5% level of significance.
3. Performing residual analysis: Testing for normally distributed Errors
The k-th moment (from physics) of the random variable e is:
μk = E[(e − μ)^k]
Where μ denotes the mean of e. Measures of spread, symmetry and “peakedness” are:
Variance: σ² = μ2
Skewness: S = μ3 / σ³
Kurtosis: K = μ4 / σ⁴ – whether the tails are thicker or thinner than expected
If e is normally distributed then S = 0 and K = 3. Formalising this is the Jarque-Bera test:
the Jarque-Bera test is a test of how far measures of residual skewness and kurtosis are from 0 and
3 (normality).
To test the null hypothesis of normality of the errors, we use the test statistic:
JB = (N/6) · [ S² + (K − 3)²/4 ]
Where:
N = sample size
S = skewness
K = Kurtosis
When the null hypothesis is true, the Jarque-Bera statistic JB has a χ² distribution with 2 degrees of freedom.
Step 1: State the hypothesis:
H0: the errors are normally distributed
H1:the errors are not normally distributed
Step 2: Decision rule:
Reject H0 if JB > χ²(0.95, 2) = 5.991
Step 3: Calculate test statistic:
JB = (40/6) · [ (0.097)² + (2.99 − 3)²/4 ] = 0.063
Step 4: Compare and decision
0.063 < 5.991 therefore do not reject H0.
Step 5: conclusion
There is insufficient evidence to conclude that the errors are not normally distributed at the
5% level of significance.
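A minimal sketch (my addition) of the same test with scipy:

import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(6)
resid = rng.normal(0, 1, 40)      # hypothetical residuals
stat, pval = jarque_bera(resid)
print(stat, pval)                 # do not reject normality if pval > 0.05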
4. Assessing forecasting performance
5. Measuring goodness-of-fit: With different dependent variables:
Goodness of fit with different dependent variables:
The R2 from a linear model, measures how well the linear model explains the variation in y, while
the R2 from a log-linear model measures how well that model explains the variation in ln(y). The
two measures should NOT be compared.
To compare goodness-of-fit in models with different dependent variables, we can compute the
generalised R²:

R²g = [corr(y, ŷ)]² = r²(y, ŷ)

(We cannot compare the ordinary R² values directly, as each model has a different dependent variable.)
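A minimal sketch (my addition): the generalised R² is just the squared correlation between y and its prediction in the original units.

import numpy as np

def generalised_r2(y, y_hat):
    # square of the correlation coefficient between y and y-hat
    return np.corrcoef(y, y_hat)[0, 1] ** 2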
6. Using the principle of parsimony – Use the simplest model
The principle of parsimony states that you should use the simplest model if two models
appear to be of equal forecasting ability.
Econometrics: ECON2300 – Lecture 4
Multiple Regression A:
The simple regression model we have studied so far relates the dependent variable y to only ONE
explanatory variable x.
When we turn an economic model with more than one explanatory variable into its corresponding statistical
model, we refer to it as a multiple regression model.
Changes and Extensions from the simple regression model:
1. Interpretation of the β parameters:
The population regression line is:
E(yi | xi2, …, xiK) = β1 + β2·xi2 + … + βK·xiK
The k-th slope coefficient measures the effect of a change in the variable xk, upon the expected value
of y, all other variables held constant. Mathematically:
βk = ∂E(yi | xi2, …, xiK) / ∂xik,   holding all other x's constant
Note: the x’s start at 2 as 1 refers to the intercept term (which has no slope).
2. The assumption concerning the characteristics of the explanatory (x) variables
The assumptions of the multiple regression model are:
MR1: yi = β1 + β2·xi2 + … + βK·xiK + ei, where i = 1, …, N
- The model is linear in parameters but may be non-linear in the variables
MR2: E(yi) = β1 + β2·xi2 + … + βK·xiK, which is synonymous with E(ei) = 0
- The expected (average) value of yi depends on the values of the explanatory variables and
the unknown parameters.
MR3: var(yi) = var(ei) = σ² – the error terms are homoskedastic (have constant variance)
MR4: cov(yi,yj) = cov(ei,ej) = 0 – There is no serial correlation
MR5: The values of each xik are not random and are not exact linear functions of the other
explanatory variables
MR6: (optional) yi ~ N[(β1 + β2·xi2 + … + βK·xiK), σ²], which is equivalent to ei ~ N(0, σ²)
3. The degrees of freedom for the t-distribution
We will go into further detail on this later in the summary.
Least Squares Estimation:
The fitted regression line for the multiple regression model is:
ŷi = b1 + b2·xi2 + … + bK·xiK
The least squares residual is:
êi = yi − ŷi = yi − b1 − b2·xi2 − … − bK·xiK
Similarly to the simple linear regression, the unknown parameters β1,...,βK are obtained by
minimising the residual sum of squares:
Σ (i = 1 to N) êi² = Σ (yi − ŷi)² = Σ (yi − b1 − b2·xi2 − … − bK·xiK)²
Solving the first-order conditions for a minimum yields messy expressions for the ordinary least
squares estimators, even when K is small.
For example, when K = 3 the OLS method gives closed-form but messy expressions for b1, b2 and b3.
In practice we use matrix algebra to solve these systems:
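A minimal matrix-algebra sketch (my addition, with simulated data) of the least squares solution b = (X'X)⁻¹X'y:

import numpy as np

rng = np.random.default_rng(7)
N = 75
X = np.column_stack([np.ones(N),               # intercept column
                     rng.uniform(4, 7, N),     # hypothetical price
                     rng.uniform(0, 3, N)])    # hypothetical advertising
y = X @ np.array([118.9, -7.9, 1.86]) + rng.normal(0, 5, N)

b = np.linalg.solve(X.T @ X, X.T @ y)          # solves (X'X) b = X'y
print(b)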
To understand graphically what a multiple regression model embodies look at the image below:
The equation forms a surface or plane which describes the position of the variable.
Example:
The model is given by:

Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)   (6.352)       (1.096)          (0.683)
Interpretation of the coefficients:
b2: Sales are expected to fall by $7908 when the price increases by $1, holding
the amount of advertising constant.
b3: Sales are expected to increase by $1863 when advertising increases by $1,
holding the price constant.
Properties Of The OLS Estimators: (OLS = Ordinary Least Squares)
The Gauss-Markov Theorem says that:
If MR1 to MR5 are correct, the OLS estimators b1,...,bK have the smallest variance of all linear and
unbiased estimators of β1, ...,βK – they are the Best Linear and Unbiased Estimators (BLUE).
Remember that the Gauss-Markov theorem does not depend on the assumption of normality (MR6).
However, if MR6 does hold, then the OLS estimators are also normally distributed.
Again, with larger values of K the formulas for the variances of the OLS estimators are messy. For
example, when K = 3, we can show that the variance of b2 involves r23, the sample correlation
coefficient between x2 and x3 (−1 < r < 1).
The variances and covariances are often presented in the form of a covariance matrix. For K = 3, this matrix
takes the form:
In practice, however, σ², the population variance, is unknown. So instead we use an unbiased estimator of the
error variance:
σ̂² = Σ (i = 1 to N) êi² / (N − K) = Σ (yi − ŷi)² / (N − K)
The estimated variances and covariances of the OLS estimators are obtained by replacing σ² with σ̂² within the
appropriate formulas. The square roots of the estimated variances are still known as standard errors.
It is important to understand the factors affecting the variance of bi (i = 2,...,K):
1. The larger σ², the larger the variance of the least squares estimators.
2. The larger the sample size, the smaller the variances.
3. More variation in an explanatory variable around its mean leads to a smaller variance of the
least squares estimator.
4. The larger the correlation between the explanatory variables, the larger is the variance of the
least squares estimators. "Independent" variables ideally exhibit variation that is "independent"
of the variation in other explanatory variables.
5. Variation in one explanatory variable that is connected to variation in another explanatory variable is
known as multicollinearity (see next week). E.g. a larger correlation between x2 and x3 leads to a
larger variance of b2.

Inferences in the Multiple Regression Model:
If the assumptions MR1 – MR6 hold, we can:
1. Construct confidence intervals for each of the K parameters
2. Conduct a significance test for each of the K parameters
3. Conduct a hypothesis test on any of the parameters or combinations of parameters
The approach is the same as that followed in weeks 2 and 3 for the parameters of the
simple regression model.
1. Confidence interval:
A 100(1-α)% confidence interval for βk is given by:
bk ± tcrit · se(bk),   for k = 1, …, K

Where:
K = the number of β parameters, e.g. for ŷi = b1 + b2·xi2 + b3·xi3, K = 3
tcrit = t(1 − α/2, N − K)
se(bk) = the standard error of bk given in the regression output
Example: construct a 95% confidence interval for the coefficient of advertising for the following
model which was based on N = 75 observations on hamburger sales.
Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)   (6.352)       (1.096)          (0.683)
Solution:

b3 ± t(1 − α/2, N − K) · se(b3) = b3 ± t(1 − 0.05/2, 75 − 3) · se(b3) = 1.863 ± t(0.975, 72)(0.683)
= 1.863 ± 1.993(0.683)

0.502 ≤ β3 ≤ 3.224

We can say with 95% confidence that the true change in sales for a one dollar increase in
advertising is between $502 and $3224.
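A minimal sketch (my addition) of this interval with scipy:

from scipy.stats import t

b3, se_b3, N, K = 1.863, 0.683, 75, 3
tcrit = t.ppf(0.975, N - K)                    # ~ 1.993
print(b3 - tcrit * se_b3, b3 + tcrit * se_b3)  # ~ (0.50, 3.22)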
2. Hypotheses Testing
2.1.A simple null hypothesis is a null hypothesis with a single restriction on one or more parameters.
Under MR1 to MR6, we can test the null hypothesis H0: βk = c using the t-statistic:
t = (bk − c) / se(bk) ~ t(N − K)
Even if MR6 doesn’t hold, the test is still valid provided the sample size is large.
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)   (6.352)       (1.096)          (0.683)
Solution:
Step 1: State Hypotheses
H0: β2 = 0
H1: β2 ≠ 0
Step 2: Decision Rule
Reject H0 if |tcalc| > tcrit, where tcrit = t(1 − α/2, N − K) = t(0.975, 72) = 1.993
Step 3: Calculate Test Statistic
tcalc = (b2 − c) / se(b2) = (−7.908 − 0) / 1.096 = −7.215
Step 4: Compare and Decision
|-7.215| > 1.993 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has no
effect on revenue; i.e. we can conclude at the 5% level of significance that price has an
effect on revenue.
2.2.Testing of a null hypothesis consisting of two or more hypotheses about the parameters in the
multiple regression model.
F- Tests
Used in:
1. Overall significance of the Model
2. Testing economic hypotheses involving more than one parameter in the model
3. Misspecification Tests
4. Testing for Heteroskedasticity
5. Testing for Serial correlation
Note: We adopt assumptions MR1-MR6 (i.e. including normality). If the errors are not normal, then
the results presented will hold approximately if the sample is large.
A Familiar Form of the F-test:
From ECON1320 we saw that we could express F as:
F = [ SSR/(K − 1) ] / [ SSE/(N − K) ] = [ (SST − SSE)/(K − 1) ] / [ SSE/(N − K) ]
However, this is just a particular example of a more general F-statistic that can be used to test
sets of joint hypotheses.
The general F-test:
A joint null hypothesis is a null hypothesis with two or more restrictions on two or more parameters.
Under MR1 to MR6, we can test a joint null hypothesis concerning the parameters using the F statistic:

F = [ (SSE_R − SSE_U)/J ] / [ SSE_U/(N − K) ] ~ F(J, N − K)
Where:
J = the number of restrictions in H0
SSEU = The unrestricted sum of squared errors from the original, unrestricted multiple regression
Model.
SSER = The restricted sum of squared errors from a regression model in which the null hypothesis
is assumed to be true
Note: Even if MR6 doesn’t hold, the test is still valid provided the sample size is large (by the central
limit theorem)
The General F-test can be used to test 3 types of hypotheses:
1. When used to test H0: βk = 0 against H1: βk ≠ 0, the F-test is equivalent to a t-test
(J = 1)
2. When used to test H0: β2 = β3 = … = βK = 0 against H1: at least one βk ≠ 0
(J = K − 1)
3. The F-test can also be used to test whether some combination of parameters is
collectively significant to the model (1 ≤ J < K)
Restrictions:
When we have a restriction, we assume that the null hypothesis is true; for example, if the null hypothesis is
βk = 0, then we set βk to 0 in the regression equation. Instead of using
the least squares estimates that minimise the sum of squared errors, we find estimates that minimise the sum
of squared errors subject to parameter constraints – restrictions. This means that the sum of squared errors
will increase; a constrained minimum is larger than an unconstrained minimum.
The theory behind the F-test, is that if the Errors are significantly different, then the assumption that the
parameter is the value assumed in the null hypothesis has significantly reduced the ability of the model to fit
the data, and thus the data do not support the null hypothesis. On the other hand, if the null hypothesis is
true, we expect that the data are compatible with the conditions placed on the parameters – we would expect
little change in the sum of squared errors when the null hypothesis is true.
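A minimal sketch of the mechanics, assuming only the restricted and unrestricted SSEs are available (the numbers come from the single-restriction example that follows):

```python
# General F-test from restricted and unrestricted sums of squared errors.
from scipy import stats

SSE_R, SSE_U = 2961.827, 1718.943
J, N, K = 1, 75, 3           # restrictions, sample size, coefficients

F_calc = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))   # about 52.06
F_crit = stats.f.ppf(0.95, J, N - K)                  # about 3.97
print(F_calc, F_crit, F_calc > F_crit)                # True -> reject H0
```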
1. Testing with 1 restriction (J=1)
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
$$\hat{S} = 118.9136 - 7.907854\,PRICE + 1.862583\,ADVERT$$
$$(se)\qquad (6.352)\qquad\quad (1.096)\qquad\qquad\ (0.683)$$
Solution:
Step 1: State Hypotheses & apply restriction
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$
Now, impose the restriction assuming the null is correct, i.e. price is not significant and β2 = 0, and then estimate the restricted regression equation.
$$\hat{S} = 74.180 + 1.733\,ADVERT$$
$$(se)\quad\ (1.80)\qquad (0.890)$$
Step 2: Decision Rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate Test Statistic
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(2961.827 - 1718.943)/1}{1718.943/(75-3)} = 52.06$$
Step 4: Compare and Decision
52.06 > 3.97 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has no effect on revenue; i.e. we can conclude at the 5% level of significance that price has an effect on revenue.
The t-test and F-test - a relationship:
When conducting a two-tail test for a single parameter, either a t-test or an F-test can be used and the
outcomes will be identical.
In fact, the square of a t random variable with df degrees of freedom is an F random variable with
distribution F(1,df)
F-statistic = (t-statistic)2
F-crit = (t-crit)2
52.06 = (-7.215)2
3.97 = (1.993)2
(Here $F_{crit} = F_{(1-\alpha,\,J,\,N-K)} = F_{(0.95,\,1,\,72)} = 3.97$, and we reject $H_0$ if $F_{calc} > F_{crit}$.)
2. Testing with J = K-1 restrictions: the overall significance of the model
An important application of the F-test is for what is called “Testing the overall significance of a model”.
Consider the general multiple regression model with (K - 1) explanatory variables and K unknown
coefficients.
Unrestricted model: $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i$
To examine whether we have a viable explanatory model, we set up the following null and alternative
hypotheses.
Restricted model: $y_i = \beta_1 + e_i$
Therefore $SSE_R = SST_U$ (the restricted model explains none of the variation, so its SSE equals the unrestricted model's total sum of squares), while $SSE_U$ is the usual unrestricted sum of squared errors.
Step 1: State Hypotheses and calculate restricted model
$$H_0: \beta_2 = 0,\ \beta_3 = 0,\ \dots,\ \beta_K = 0$$
$$H_1: \text{at least one of the } \beta_k \text{ is nonzero}$$
Estimate restricted model:
$$\hat{S} = 77.375$$
$$(se)\ \ (0.749)$$
$$SSE_R = 3115.482\ (= SST_U)$$
Step 2: Decision rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate test statistic
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(3115.482 - 1718.943)/2}{1718.943/(75-3)} = 29.248$$
Step 4: Compare and decision
29.248 > 3.12 Therefore reject H0.
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that at least one of the explanatory variables has an effect on sales.
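A hedged sketch of the overall-significance test in Python; the data here are simulated stand-ins for the sales example, so the numbers will not match the notes exactly:

```python
# Overall-significance F test, both via statsmodels and by hand.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
N = 75
price = rng.uniform(4, 7, N)
advert = rng.uniform(0.5, 3, N)
sales = 119 - 7.9 * price + 1.86 * advert + rng.normal(0, 4.9, N)  # simulated

X = sm.add_constant(np.column_stack([price, advert]))
res = sm.OLS(sales, X).fit()
print(res.fvalue, res.f_pvalue)        # statsmodels reports the test directly

# Equivalently, from the restricted (intercept-only) and unrestricted SSEs:
SSE_U = res.ssr
SSE_R = np.sum((sales - sales.mean()) ** 2)   # = SST of the unrestricted model
J, K = 2, 3
F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))
print(F, stats.f.ppf(0.95, J, N - K))
```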
3. Testing a Group of parameters (1 ≤ J < K)
Consider the model:
(Note: the null has K − 1 hypotheses, so it is referred to as a joint hypothesis.)
(Here $F_{crit} = F_{(1-\alpha,\,J,\,N-K)} = F_{(0.95,\,3-1,\,75-3)} = 3.12$, and we reject $H_0$ if $F_{calc} > F_{crit}$.)
Does advertising have an effect on sales?
Step 1: State Hypotheses
$$H_0: \beta_3 = 0,\ \beta_4 = 0$$
$$H_1: \beta_3 \neq 0 \text{ or } \beta_4 \neq 0 \text{ or both are nonzero}$$
Step 2: Decision rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate test statistic
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(1896.391 - 1532.084)/2}{1532.084/(75-4)} = 8.44$$
Step 4: Compare and decision
8.44 > 3.126, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that advertising has a
statistically significant effect on sales.
(Here $F_{crit} = F_{(1-\alpha,\,J,\,N-K)} = F_{(0.95,\,2,\,75-4)} = 3.126$, and we reject $H_0$ if $F_{calc} > F_{crit}$.)
Prediction:
We often want to predict the value of y, say $y_0$, when the explanatory variables take particular values $x_0$.
The prediction error (or forecast error) is $f = y_0 - \hat{y}_0$. The prediction error is a random variable with a mean and a variance. If assumptions MR1 to MR5 hold then

$$E(f) = E(y_0 - \hat{y}_0) = 0 \qquad \text{and} \qquad \mathrm{var}(f) = \mathrm{var}(y_0 - \hat{y}_0),$$

an expression with many terms, each involving $\sigma^2$. The prediction error variance is estimated by replacing $\sigma^2$ with $\hat{\sigma}^2$. The square root of the estimated forecast error variance is still called the standard error of the forecast. If assumption MR6 (normality) is correct, or the sample size is large, then a 100(1−α)% confidence interval or prediction interval for $y_0$ is:

$$\hat{y}_0 - t_c\,se(f) \le y_0 \le \hat{y}_0 + t_c\,se(f), \qquad t_c = t_{(1-\alpha/2,\,N-K)}$$
Example: Construct a 95% confidence interval for the prediction of y0 when P = 5.50 and A = 1200
Solution:
$$t_c = t_{(1-0.05/2,\,75-3)} = 1.993$$

With advertising measured in thousands of dollars, A = 1200 enters the equation as 1.2, so the point prediction is

$$\hat{y}_0 = 118.91 - 7.91(5.50) + 1.863(1.2) = 77.66$$

and the forecast variance is estimated as

$$\widehat{\mathrm{var}}(f) = \widehat{\mathrm{var}}(b_1^*) + \hat{\sigma}^2$$

where $b_1^*$ is the intercept of the model re-estimated in terms of the re-centred variables below.
Therefore create two new variables: P* = (P − 5.50) and A* = (A − 1200). Re-estimating the model with P* and A* makes the intercept $b_1^*$ equal to $\hat{y}_0$, with estimated variance $\widehat{\mathrm{var}}(b_1^*)$.
$$\hat{y}_0 \pm t_c\,se(f) = 77.66 \pm 1.993(4.9429)$$
$$67.809 \le y_0 \le 87.5112$$
We can therefore say with 95% confidence that, when the price is $5.50 and advertising expenditure is $1200, the true value of sales lies between 67.8 thousand and 87.5 thousand dollars.
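A small sketch of the interval computation, using the reported point forecast and forecast standard error (advertising in $1000s):

```python
# 95% prediction interval from the reported forecast and its standard error.
from scipy import stats

y0_hat = 118.91 - 7.91 * 5.50 + 1.863 * 1.2   # about 77.66
se_f = 4.9429                                  # standard error of forecast
t_c = stats.t.ppf(0.975, 75 - 3)               # about 1.993

print(y0_hat - t_c * se_f, y0_hat + t_c * se_f)   # roughly 67.81 to 87.51
```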
A reminder:
Estimated regression models describe the relationship between the economic variables for values similar to
those found in the sample data. Extrapolating the results to extreme values is generally not a good idea.
Predicting the value of dependent variables for values of the explanatory variables far from the sample
values invites disaster.
Goodness of Fit:
If the regression model contains an intercept, we can still decompose the variation in the dependent variable (SST) into its explainable and unexplainable components (SSR and SSE). Then the coefficient of determination still measures the proportion of the variation in the dependent variable that is explained by the regression model:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
The interpretation of R2 is identical to its interpretation in the simple regression model: R² × 100% of the variation in the dependent variable is explained by the estimated equation (R² = 1 implies a perfect fit).
Adjusted R2:
A problem with R2
is that it can be made large by adding more and more variables to the model, even when
they have no economic justification. The adjusted R-squared imposes a penalty for adding more variables:
$$\bar{R}^2 = 1 - \frac{SSE/(N-K)}{SST/(N-1)}$$
Adjusted R-squared does not give the proportion of variation in the dependent variable that is explained by
the model. It should not be used as a criterion for adding or deleting variables (if we add a variable, adjusted
R-Squared will increase if the t-statistic on the new variable is greater than 1 in absolute value!)
SST = (N − 1) × (s.d. of dependent variable)²
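The fit measures above are easy to compute by hand; a sketch using the SSE and SST from the sales example:

```python
# R^2 and adjusted R^2, following the formulas above.
def fit_measures(sse, sst, N, K):
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (N - K)) / (sst / (N - 1))
    return r2, adj_r2

# SSE = 1718.943, SST = 3115.482, N = 75, K = 3
print(fit_measures(1718.943, 3115.482, 75, 3))  # about (0.448, 0.433)
```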
Econometrics: ECON2300 – Lecture 5
Multiple Regression B:
Non-sample information:
In many estimation problems, economic theory and experience provides us with information on the
parameters that is over and above the information contained in the sample data. If this non-sample
information is correct, and if we can combine it with the sample information, then we can estimate the
parameters with greater precision.
Some non-sample information can be written in the form of linear equality restrictions on the unknown
parameters. (e.g. several parameters sum to one). We can incorporate this information into the estimation
process by simply substituting the restrictions into the model.
One example is a firm with constant returns to scale – take the Cobb-Douglas function, whose parameters α and β must sum to 1 under constant returns to scale:
$$y_t = A K_t^{\alpha} L_t^{\beta}$$
We can show that when K and L both increase by the proportion λ, y also increases by the proportion λ under constant returns to scale:
$$A(\lambda K_t)^{\alpha}(\lambda L_t)^{\beta} = \lambda^{\alpha+\beta} A K_t^{\alpha} L_t^{\beta} = \lambda\, y_t \quad \text{when } \alpha + \beta = 1$$
In order to incorporate the non-sample information, and impose constant returns to scale we should then
estimate the following model:
$$y_t = A K_t^{\alpha} L_t^{1-\alpha}$$
The model is now a function of a single unknown parameter, α.
A technique to obtain an estimate of α in this case is known as restricted least squares - we “force” β = 1 – α
To estimate the above model in practice we can use the least squares method, as the model is linear in its parameters once converted to a log-log form:
$$\ln(y_t) = \ln(A) + \alpha \ln(K_t) + (1-\alpha)\ln(L_t) + e_t$$
To ensure the restriction holds we re-arrange and collect terms:
$$\ln(y_t/L_t) = \ln(A) + \alpha \ln(K_t/L_t) + e_t$$
The restricted Least Squares Estimator:
The least squares estimates we obtain after imposing the restrictions are known as restricted least squares
(RLS) estimates.
The RLS estimator:
 Is biased unless the restrictions are EXACTLY true
 Has a smaller variance than the OLS (ordinary least squares) estimator, whether or not the
restrictions are true
By incorporating the additional information with the data, we usually give up unbiasedness in return for
reduced variances. Evidence on whether the restrictions are true can, of course, be obtained using an F-test
(Wald test).
Model Specification:
There are several key questions you should ask yourself when specifying a model:
Q1. What are the important considerations when choosing a model?
A1. The problem, the economic model
Q2. What are the consequences of choosing the wrong model?
A2. If the wrong model is used, there can be omitted and irrelevant variables in the model
Q3. Are there ways of assessing whether a model is adequate?
A3. Yes you can use model Diagnostics – A test of adequate functional form
In examining these model specifications we will look at the following example:
Omitted variables:
It is possible that a chosen model may have important variables omitted. Our economic principles may have
overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic
theory.
We will consider a sample of married couples where both husbands and wives work. This sample was used
by labour economist Tom Mroz in a classic paper on female labour force participation. The variables from
this sample are in edu_inc.dat.
We are interested in the impact of level of education, both the husband’s education and the wife’s education,
on family income. Summary statistics for the data appear in table 6.2. The estimated relationship is:
We estimate that an additional year of education for the husband will increase annual income by $3132, and
an additional year of education for the wife will increase income by $4523. If we now incorrectly omit
wife’s education from the equation:
FAMINC = the combined income of husband and wife
If we omit a relevant variable, then the least squares estimator will generally be biased, although it will
have lower variance.
Including irrelevant variables does not cause least squares method to be biased – however variance and
therefore standard errors will be greater.
When we omit WEDU it leads us to overstate the effect of an extra year of education for the husband by about $2000. This change in the magnitude of a coefficient is typical of the effect of incorrectly omitting a relevant variable.
To write a general expression for this bias for the case where one explanatory variable is omitted from a model with two explanatory variables, we write the underlying model as:
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$
Omitting x3 from the equation is equivalent to imposing the restriction β3 = 0. It can be viewed as imposing an incorrect constraint on the parameters. This has the implication of a reduced variance, but causes biased coefficient estimators. We can show (in appendix 6B) that the bias of the new estimator b2* of β2 is:
$$\mathrm{bias}(b_2^*) = E(b_2^*) - \beta_2 = \beta_3\,\frac{\widehat{\mathrm{cov}}(x_2, x_3)}{\widehat{\mathrm{var}}(x_2)}$$
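The bias formula is easy to verify by simulation; a sketch (simulated data, not the Mroz sample):

```python
# Omitted-variable bias: dropping x3 shifts the slope on x2 by
# beta3 * cov(x2, x3) / var(x2).
import numpy as np

rng = np.random.default_rng(2)
N, beta1, beta2, beta3 = 100_000, 1.0, 2.0, 3.0
x2 = rng.normal(size=N)
x3 = 0.5 * x2 + rng.normal(size=N)            # cov(x2, x3) = 0.5
y = beta1 + beta2 * x2 + beta3 * x3 + rng.normal(size=N)

b2_star = np.cov(x2, y)[0, 1] / np.var(x2, ddof=1)  # slope of y on x2 alone
print(b2_star)   # about beta2 + beta3 * 0.5 = 3.5, as the formula predicts
```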
We can include further variables for instance, KL6 – the number of children under the age of 6. The larger
the number of young children, the fewer the number of hours likely to be worked and hence a lower family
income would be expected.
$$\widehat{FAMINC} = 7755 + 3211\,HEDU + 4777\,WEDU - 14311\,KL6$$
$$(se)\qquad\quad (11163)\ \ (796)\qquad\ \ (1061)\qquad\ (5004)$$
$$(p\text{-value})\ \ (0.488)\ \ (0.000)\qquad (0.000)\qquad (0.004)$$
Notice that, compared to the original estimated equation, the coefficients for HEDU and WEDU haven't changed considerably.
This outcome occurs because KL6 is not highly correlated with the education variables. From a general
modelling perspective, it means that useful results can still be obtained when a relevant variable is
omitted if that variable is uncorrelated with the included variables and our interest is on the coefficients
of the included variables.
Omission of a relevant variable leads to omitted variable bias. The bias increases with the correlation between the included and omitted relevant variable. (Note: if cov(x2, x3) = 0 or if β3 = 0, the bias will be 0, i.e. b2* will be unbiased.) Here Corr(KL6, HEDU) = 0.105 and Corr(KL6, WEDU) = 0.129.
Irrelevant Variables:
The consequences of omitting relevant variables may lead you to think that a good strategy is to include as many variables as possible in your model. However, this will:
1. Complicate your model
2. Inflate the variances of your estimates
To examine this, we will add two artificially generated variables X5 and X6. These variables were
constructed so that they are correlated with HEDU and WEDU, but are not expected to influence family
income.
$$\widehat{FAMINC} = 7759 + 3340\,HEDU + 5869\,WEDU - 14200\,KL6 + 889\,X_5 - 1067\,X_6$$
$$(se)\qquad\quad (11195)\ (1250)\qquad (2278)\qquad\ (5044)\qquad (2242)\quad (1982)$$
$$(p\text{-value})\ \ (0.488)\ (0.000)\qquad (0.000)\qquad (0.004)\qquad (0.692)\quad (0.591)$$
The first thing that we notice is that the p-values for the two new coefficients are much greater than 0.05.
They do indeed appear to be irrelevant variables. Also, the standard errors of the coefficients for all other
variables have increased, with p-values increasing correspondingly. The inclusion of these irrelevant
variables has reduced the precision of the estimated coefficients for other variables in the equation.
The result follows because, by the Gauss-Markov theorem, the least squares estimator of the correct model
is the minimum variance linear unbiased estimator.
A Practical Approach:
We should choose a functional form that:
1. Is consistent with what economic theory tells us about the relationship between the variables
2. Is compatible with assumptions MR1 to MR5
3. Is flexible enough to fit the data
In a multiple regression context, this mainly involves:
1. Hypothesis testing
2. Performing residual analysis
3. Assessing forecasting performance
4. Comparing information criteria
5. Using the principle of parsimony
Hypothesis Testing:
The usual t- and F-tests are available for testing simple and joint hypotheses concerning the coefficients.
As usual, failure to reject a null hypothesis can occur because the data are not sufficiently rich to disprove
the hypothesis. If a variable has an insignificant coefficient, it can either be (a) discarded because it is
irrelevant, or (b) retained because there are strong theoretical reasons for including it.
The adequacy of a model can also be tested using a general specification test known as RESET.
Testing for Model Misspecification: RESET
RESET (Regression Specification Error Test) is designed to detect omitted variables and incorrect
functional form.
Intuition:
Hypotheses:
H0: The functional form is correct and no variables are omitted (the extra terms are not statistically significant)
H1: The functional form is incorrect and/or there are omitted variables (the extra terms are statistically significant)
Suppose that we have specified and estimated the regression model:
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$
The predicted or "fitted" values of yi are:
$$\hat{y}_i = b_1 + b_2 x_{i2} + b_3 x_{i3}$$
There are two alternative forms for the test:
Artificial Model 1: $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 \hat{y}_i^2 + e_i$
Artificial Model 2: $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + e_i$
Example: FAMINC model:
Step 1: State hypothesis
H0: γ1 = 0 (and, for Model 2, γ2 = 0)
H1: at least one of the γ coefficients is nonzero
Step 2: Decision Rule
Reject H0 if p-value < α = 0.05
Step 3: Calculate test statistic
Ramsey RESET Test:
If the chosen model and algebraic form are correct, then squared and cubed terms of the “fitted or
predicted” values should not contain any explanatory power.
If we can significantly improve the model by artificially including powers of the predictions of the model,
then the original model must have been inadequate
p-value = 0.0440
Step 4: Compare
0.0440 < 0.05 Therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that there are omitted
variables or the functional form is incorrect.
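A hedged sketch of RESET in Python with simulated data (the true relation is quadratic, so a linear fit should fail the test):

```python
# RESET: augment the fitted model with powers of y-hat and F-test them.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
N = 200
x = rng.uniform(1, 10, N)
y = 5 + 0.8 * x**2 + rng.normal(0, 2, N)        # quadratic truth

res1 = sm.OLS(y, sm.add_constant(x)).fit()      # misspecified linear model
yhat = res1.fittedvalues

X2 = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
res2 = sm.OLS(y, X2).fit()

J = 2
F = ((res1.ssr - res2.ssr) / J) / (res2.ssr / res2.df_resid)
print(F, stats.f.sf(F, J, res2.df_resid))       # small p-value -> reject H0
```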
Selection of Models – Information Criteria
Akaike Information Criterion (AIC):
 Is often used in model selection for non-nested alternatives – smaller values of the AIC are preferred
$$AIC = \ln\!\left(\frac{SSE}{N}\right) + \frac{2K}{N}$$
The Schwarz Criterion (SC):
 Is an alternative to the AIC that imposes a larger penalty for additional coefficients
$$SC = \ln\!\left(\frac{SSE}{N}\right) + \frac{K\ln(N)}{N}$$
Adjusted R²:
 Penalizes the addition of regressors which do not contribute to the explanatory power of the model. It is sometimes used to select regressors, although the AIC and SC are superior. It does not have the interpretation of R².
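A small sketch of the two criteria as defined above (note that statsmodels' .aic and .bic are log-likelihood versions that differ by constants, so compute these directly):

```python
# AIC and SC as defined in the notes; smaller values are preferred.
import numpy as np

def aic(sse, N, K):
    return np.log(sse / N) + 2 * K / N

def sc(sse, N, K):
    return np.log(sse / N) + K * np.log(N) / N

print(aic(1718.943, 75, 3), sc(1718.943, 75, 3))
```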
Collinear Economic Variables:
When data are the result of an uncontrolled experiment many of the economic variables may move together
in systematic ways.
Such variables are said to be collinear, and the problem is labelled collinearity, or multicollinearity when
several variables are involved.
Co-linearity: Moving together in a linear way
When there is collinearity, there is no guarantee that the data will be "rich in information", nor that it will be possible to isolate the economic relationships or parameters of interest.
Consequences of collinearity:
1. One or more exact linear relationships among the explanatory variables: exact collinearity, or exact multicollinearity. The least squares estimator is not defined.
Multicollinearity calculation:

$$b = (X^{T}X)^{-1}X^{T}y$$

From linear algebra, we know that a matrix whose rows and columns are not linearly independent does not have an inverse; under exact collinearity $X^{T}X$ has no inverse, so b cannot be calculated.
2. Nearly exact linear dependencies among the explanatory variables: some of the variances, standard
errors and covariances of the least squares estimators may be large.
$$\mathrm{var}(b_2) = \frac{\sigma^2}{(1 - r_{23}^2)\sum_i (x_{i2} - \bar{x}_2)^2}$$

For perfect collinearity: $r_{23} = -1$ or $1$, therefore $(1 - r_{23}^2) = 0$ and the variance is undefined.
3. Large standard errors make the usual t-values small and lead to the conclusion that parameter
estimates are not significantly different from 0, ALTHOUGH high R2 or F-values indicate
“significant” explanatory power of the model as a whole.
$$t_{calc} = \frac{b_i}{se(b_i)} = \text{a small value}$$

In general we reject $H_0: \beta_i = 0$ only if $|t_{calc}| > |t_{crit}|$; with inflated standard errors $|t_{calc}|$ tends to be small, so we fail to reject and conclude that $\beta_i$ is 0 even when the variable matters.
4. Estimates may be very sensitive to the addition or deletion of a few observations, or the deletion of
an apparently insignificant variable.
5. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate
forecasts may still be possible.
For near-perfect collinearity: $r_{23} \approx -1$ or $1$, therefore $(1 - r_{23}^2) \approx 0$ and the variances blow up.
Example – Chinese Coal Production
We can detect multicollinearity by:
 Computing sample correlation coefficients between variables. A common rule of thumb is that multicollinearity is a problem if the sample correlation between any pair of variables is greater than 0.8 or 0.9. (This only looks at pairs.)
 Estimating auxiliary regressions, i.e. regressing each explanatory variable on all the others. Multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater than about 0.8. (This looks at combinations of variables, e.g. x2 = 2x3 + 5x4.)
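A hedged sketch of both detection devices on simulated, deliberately collinear data:

```python
# Detecting multicollinearity: pairwise correlations and auxiliary R^2s.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
N = 100
x2 = rng.normal(size=N)
x3 = 2 * x2 + rng.normal(scale=0.1, size=N)   # nearly collinear with x2
X = np.column_stack([x2, x3])

print(np.corrcoef(X, rowvar=False))            # look for |r| > 0.8-0.9

for j in range(X.shape[1]):                    # auxiliary regressions
    others = np.delete(X, j, axis=1)
    r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    print(j, r2)                               # a problem if R^2 > ~0.8
```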
Pair-wise Correlations:
Conclusion:
The pairwise correlation between some of the inputs is extremely high, such as between ln(x2) and ln(x3).
Auxiliary regression on ln(x3):
Solution:
A possible solution in this case is to use non-sample information:
1. Constant returns to scale
2. Variables 4,5 & 6 all are statistically insignificant (=0)
Conduct a Wald Test:
$$H_0:\ \sum_{i=2}^{7}\beta_i = 1,\quad \beta_4 = 0,\quad \beta_5 = 0,\quad \beta_6 = 0$$
Mitigating the Effects of Multicollinearity:
The collinearity problem occurs because the data do not contain enough information about the effects of the
individual explanatory variables. We can include more information into the estimation process by:
 Obtaining more, and better data – not always possible in non-experimental contexts
 Introducing non-sample information into the estimation process in the form of restrictions on the
parameters.
Nonlinear Relationships:
Relationships between economic variables cannot always be adequately represented by straight lines. We
saw in Week 4 that we can add more flexibility to a regression model by considering logarithmic, reciprocal,
polynomial and various other nonlinear-in-the-variables functional forms.
 Linear in parameters, non-linear in variables
We can also use these types of functional forms in multiple regression models. In multiple regression
models, we also use models that involve interaction terms. When using these types of models some changes
in model interpretation are required.
Introductory Econometrics: ECON2300 – Dummy Variable Models
The Use of Dummy Variables in Econometric Models:
Assumption MR1 in the multiple regression model is:
$$y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + e_i \quad \text{for } i = 1, \dots, N$$
1. The statistical model we assume is appropriate for all N observations in our sample
2. The parameters of the model, βk, are the same for each and every observation
3. If this assumption does not hold, and if the parameters are not the same for all the observations, then
the meaning of the least squares estimates of the parameters is not clear
There are some economic problems or questions where we might expect the parameters to be different for
different observations:
1. Everything else the same, is there a difference between male and female earnings?
2. Does studying econometrics make a difference in starting salaries of graduates?
3. Does having a pool make a difference in a house’s sale price in the Brisbane market?
4. Is there a difference in the demand for illicit drugs across race groups?
Dummy variables:
1. The simplest procedures for extending the multiple regression model to situations in which the
regression parameters are different for some or all of the observations in a sample
2. Dummy variables are explanatory variables that take only two values, usually 0 and 1
3. These simple variables are a very powerful tool for capturing qualitative characteristics of
individuals, such as gender, race and geographic region of residence.
There are two main types of dummy variables:
1. Intercept Dummy Variables: parameter (coefficients) denoted - δ
2. Slope Dummy variables: parameter (coefficients) denoted – γ
Intercept Dummy Variables:
Intercept dummy variables allow the intercept to change for a subset of observations in the sample. Models
with intercept dummy variables take the form:
$$y_i = \beta_1 + \delta D_i + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + e_i$$
where Di = 1 if the i-th observation has a certain characteristic and Di = 0 otherwise:
$$E(y_i) = \begin{cases} (\beta_1 + \delta) + \beta_2 x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 1 \quad (\text{intercept: } \beta_1 + \delta) \\ \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 0 \quad (\text{intercept: } \beta_1) \end{cases}$$
Note that the least squares estimator properties are not affected by the fact that one of the explanatory
variables consists only of zeros and ones – D is treated as any other explanatory variable. We can construct
an interval estimate for δ, or we can test the significance of its least squares estimate. Such a test is a
statistical test of whether the effect is “statistically significant”. If δ = 0, the variable has no effect on the
variable in question.
Example: House prices
A model that allows the intercept to vary with the presence or absence of a particular characteristic
Estimated equation:
$$\widehat{Price} = 29.68 + 5.69\,Pool + 8.60\,Sqft$$
In this model the value of Pool = 0 defines the reference group (homes with no pool). Two equivalent models would be:
Log-Linear Models:

$$\ln(PRICE) = \beta_1 + \delta\,POOL + \beta_2\,SQFT + e$$

If the house has a pool: $\ln(PRICE_{pool}) = \beta_1 + \delta + \beta_2\,SQFT + e$
If it does not: $\ln(PRICE_{nopool}) = \beta_1 + \beta_2\,SQFT + e$

Then:

$$\delta = \ln(PRICE_{pool}) - \ln(PRICE_{nopool}) = \ln\!\left(\frac{PRICE_{pool}}{PRICE_{nopool}}\right)$$

$$e^{\delta} = \frac{PRICE_{pool}}{PRICE_{nopool}}$$

And:

$$e^{\delta} - 1 = \frac{PRICE_{pool} - PRICE_{nopool}}{PRICE_{nopool}}$$
Thus, houses with pools are 100(e^δ − 1)% more expensive than houses without pools, all other things being equal.
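For instance (illustrative value of δ, not an estimate from the notes):

```python
# Exact percentage interpretation of a dummy coefficient in a log model.
import numpy as np

delta = 0.10
print(100 * (np.exp(delta) - 1))   # about 10.5%, not exactly 10%
```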
Slope Dummy variables:
Slope dummy variables allow the slope to change for a subset of observations in the sample. A model that
allows β2 to vary across observations takes the form:
$$y_i = \beta_1 + \beta_2 x_{i2} + \gamma D_i x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i$$
$$E(y_i) = \begin{cases} \beta_1 + (\beta_2 + \gamma) x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 1 \quad (\text{slope of } x_2\text{: } \beta_2 + \gamma) \\ \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} & \text{if } D_i = 0 \quad (\text{slope of } x_2\text{: } \beta_2) \end{cases}$$
Slope and Intercept Dummy Variables Combined:
Testing for Qualitative Effects:
Dummy variables are frequently used to measure:
1. Interactions between qualitative factors (e.g. race and gender)
2. The effects of qualitative factors having more than two categories (eg. level of schooling)
Example: WAGES
Explaining wages as a function of individual characteristics using white males as the reference group:
$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + e$$

(The interaction coefficient γ operates only when the worker is both black and female.)
To test the null hypothesis that neither race nor gender affect wages at the 1% Level of significance:
Now: Explaining wages as a function of location using workers in the northeast as the reference
group:
$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 SOUTH + \delta_2 MIDWEST + \delta_3 WEST + e$$
(The regional dummy coefficients are not significant at the 5% level of significance.)
Testing the Equivalence of Two regressions:
By including an intercept dummy variable and an interaction term for every variable in a regression model,
we are allowing every coefficient in the model to differ based on the qualitative factor – we are specifying
two regressions.
A test of the equivalence of the two regressions is a test of the joint null hypothesis that all the dummy
variable coefficients are zero. We can test this null hypothesis using a standard F-test. This particular F-test
is known as a Chow test.
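A hedged sketch of the mechanics on simulated data; EDUC and SOUTH are illustrative stand-ins for the regressors in the wage example:

```python
# Chow test: F-test of the joint significance of a dummy and all its
# interactions, from restricted (pooled) and unrestricted SSEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
N = 500
educ = rng.uniform(8, 18, N)
south = rng.integers(0, 2, N)
wage = 4 + 1.1 * educ - 1.0 * south + rng.normal(0, 3, N)   # simulated

X_r = sm.add_constant(educ)                                  # pooled model
X_u = sm.add_constant(np.column_stack([educ, south, south * educ]))
res_r = sm.OLS(wage, X_r).fit()
res_u = sm.OLS(wage, X_u).fit()

J = 2                                        # dummy plus one interaction
F = ((res_r.ssr - res_u.ssr) / J) / (res_u.ssr / res_u.df_resid)
print(F)
```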
Explaining wage as a function of individual characteristics:
$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + e$$
To test if there are differences between the wage regressions for the south and the rest of the country we
estimate the model:
The two regression equations are:
If south = 1
If south = 0
A Chow test at the 10% level of significance:
Controlling For Time:
Dummy variables are frequently used to control for:
 Seasonal effects
 Annual effects
 Regime effects (government)
Example: Emergency room cases
Data on number of emergency room cases per day is available in the file fullmoon.wk1. The model:
Example – Stockton House prices
Example – Investment tax credits
ECONOMETRICS: ECON2300 – Lecture 7
Heteroskedasticity
If we were to guess food expenditure for a low-income household and food expenditure for a high-
income household, we would be more accurate for the low-income household, as they have less choice
and only have a limited income which they MUST spend on food. Alternatively a high-income
household could have extravagant or simple food taste – a large variance at high income levels:
resulting in heteroskedasticity.
How can we model this phenomenon?
Note that assumption MR3 says that the errors have equal variance, or equal (homo) spread
(skedasticity). An alternative and much more general assumption is:
$$\mathrm{var}(e_i) = \sigma_i^2$$
Heteroskedasticity is often encountered in cross-section studies, where different individuals may have
very different characteristics. It is less common in time-series studies.
Properties of the OLS Estimator:
If the errors are heteroskedastic then:
 OLS is still a linear and unbiased estimator. But it is inefficient in that it is no longer BLUE
– Best linear unbiased estimator
 The variances of the OLS estimators are no longer given by the formulas we discussed in
earlier lectures. Thus, confidence intervals and hypothesis tests based on these variances
are no longer valid.
There are three alternative courses of action to deal with heteroskedasticity:
1. If in doubt, use least squares for the parameters and a standard-error formula that works either way (White robust standard errors).
2. If heteroskedasticity is known to be present, use Generalised Least Squares (Weighted Least Squares) – BLUE if the variance is known.
3. Test for heteroskedasticity (Goldfeld-Quandt test, White's general test, or Breusch-Pagan test):
a. If present, use Feasible Generalised Least Squares (if the variance is unknown and must be estimated)
b. If no evidence, use least squares as it is BLUE
White’s Approximate Estimator for the Variances of the Least Sqaures Estimator under
Heteroskedasticity:
White's estimator:
a) Is strictly appropriate only in large samples
b) If errors are homoskedastic, it converges to the least squares formula
The variances of the OLS estimators depend on $\sigma_i^2$ rather than $\sigma^2$. In the case of the simple linear model:
$$y_i = \beta_1 + \beta_2 x_i + e_i$$
The variance of b2 is given by:

$$\mathrm{var}(b_2) = \frac{\sum_{i=1}^{N}(x_i-\bar{x})^2\,\sigma_i^2}{\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]^2} = \sum_{i=1}^{N} w_i^2\,\sigma_i^2, \qquad w_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
If we replace $\sigma_i^2$ with $\hat{e}_i^2$ we obtain White's heteroskedasticity-consistent estimator.
White’s Robust Standard errors give the same coefficients but a reduced standard error.
What would happen if we always compute the standard errors (and therefore t-ratios) using White’s
formula instead of the traditional Least Squares?
This is known as Heteroskedasticity-Robust Inference, and it is used by many applied economists.
Robust estimation is a “branch” of econometrics.
When the true variance is homoskedastic and the sample is large, White's formula converges approximately to the usual least squares formula based on $\hat{\sigma}^2 = SSE/N$.
The Generalised Least Squares (Weighted Least Squares):
1. Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator
2. One way of overcoming this dilemma is to change or transform our statistical model into one
with homoskedastic errors and then use Least squares
3. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model.
If $\sigma_i^2$ is known then we can weight the original data (including the constant term) and then perform OLS on the transformed model. The transformed model is:

$$\frac{y_i}{\sigma_i} = \beta_1\frac{1}{\sigma_i} + \beta_2\frac{x_{i2}}{\sigma_i} + \dots + \beta_K\frac{x_{iK}}{\sigma_i} + \frac{e_i}{\sigma_i}$$

or

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + \dots + \beta_K x_{iK}^* + e_i^*$$
The transformed model satisfies all the assumptions of the multiple regression model (including
homoskedasticity). Thus, applying OLS to the transformed model yields best linear unbiased
estimates. The estimator is known as Generalised Least Squares (GLS) or Weighted Least Squares
(WLS).
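A sketch of the transformation for the case var(e_i) = σ²x_i², either by hand or via statsmodels' WLS (simulated data):

```python
# GLS/WLS: divide every variable, including the constant, by x_i.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
N = 200
x = rng.uniform(1, 10, N)
y = 80 + 10 * x + rng.normal(size=N) * x        # sigma_i proportional to x_i

y_star = y / x
X_star = np.column_stack([1 / x, np.ones(N)])   # transformed constant, slope
print(sm.OLS(y_star, X_star).fit().params)      # estimates of beta1, beta2

# Equivalently, WLS with weights 1 / sigma_i^2 = 1 / x_i^2:
print(sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit().params)
```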
Sometimes $\sigma_i^2$ is only known up to a factor of proportionality. In this case, we can still transform the original model in such a way that the transformed errors are homoskedastic. Some popular heteroskedastic specifications:
$$\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_{ij}^2 \;\Rightarrow\; \text{divide by } x_{ij}$$
$$\mathrm{var}(e_i) = \sigma_i^2 = \sigma^2 x_{ij} \;\Rightarrow\; \text{divide by } \sqrt{x_{ij}}$$
If our assumptions about the form of heteroskedasticity are incorrect then GLS will yield biased
estimates.
For $\sigma_i^2 = \sigma^2 x_i^2$, divide by $x_i$:
$$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\frac{e_i}{x_i}\right) = \frac{1}{x_i^2}\,\mathrm{var}(e_i) = \frac{1}{x_i^2}\,\sigma^2 x_i^2 = \sigma^2$$
For $\sigma_i^2 = \sigma^2 x_i$, divide by $\sqrt{x_i}$:
$$\mathrm{var}(e_i^*) = \mathrm{var}\!\left(\frac{e_i}{\sqrt{x_i}}\right) = \frac{1}{x_i}\,\mathrm{var}(e_i) = \frac{1}{x_i}\,\sigma^2 x_i = \sigma^2$$
Feasible Generalised Least Squares:
If we reject the null hypotheses of homoskedasticity, we might wish to use an estimation technique for
the coefficients and the standard errors that accounts for heteroskedasticity.
We have already shown that if we “weight” the original data by some appropriate value we can
achieve a transformed model with homoskedastic errors that can be estimated by Ordinary Least
Squares (OLS).
We also note that the task of finding an appropriate weight in a multiple regression model is more complicated, as several variables are potentially an option.
Feasible Generalised Least Squares is based on the idea that we should use all the information
available, therefore, we will construct a suitable weight that is a function of all the explanatory
variables in the original model.
If $\sigma_i^2$ is unknown then it must be estimated. The resulting estimator is known as Feasible Generalised Least Squares (FGLS). A popular specification:

$$\sigma_i^2 = \exp(\alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS})$$
In this case, we estimate the model:
$$\ln(\hat{e}_i^2) = \alpha_1 + \alpha_2 z_{i2} + \dots + \alpha_S z_{iS} + v_i$$
And then use the variance estimator:
$$\hat{\sigma}_i^2 = \exp(\hat{\alpha}_1 + \hat{\alpha}_2 z_{i2} + \dots + \hat{\alpha}_S z_{iS})$$
The aim is to produce a prediction $\hat{\sigma}_i^2$ based on this model, and then use it to weight the original model.
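A hedged sketch of the full FGLS recipe on simulated data with the exponential variance function above:

```python
# Feasible GLS: OLS residuals -> variance model -> weighted least squares.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
N = 300
z = rng.uniform(1, 5, N)
sigma2 = np.exp(0.5 + 0.6 * z)                        # true variance function
y = 2 + 3 * z + rng.normal(size=N) * np.sqrt(sigma2)

X = sm.add_constant(z)
e_hat = sm.OLS(y, X).fit().resid                      # step 1: OLS residuals
aux = sm.OLS(np.log(e_hat**2), X).fit()               # step 2: ln(e^2) on z's
sigma2_hat = np.exp(aux.fittedvalues)                 # step 3: predicted variances
fgls = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()   # step 4: weighted LS
print(fgls.params, fgls.bse)
```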
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models
Econometrics Lecture 1: Intro to Models

More Related Content

Similar to Econometrics Lecture 1: Intro to Models

Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataArthur Charpentier
 
Statistics final seminar
Statistics final seminarStatistics final seminar
Statistics final seminarTejas Jagtap
 
Advanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxAdvanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxakashayosha
 
Statistics
StatisticsStatistics
Statisticspikuoec
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regressionjamuga gitulho
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi
 
Statistics for Managers notes.pdf
Statistics for Managers notes.pdfStatistics for Managers notes.pdf
Statistics for Managers notes.pdfVelujv
 
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...pace130557
 
Module-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data scienceModule-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data sciencepujashri1975
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfkobra22
 
Statics for management
Statics for managementStatics for management
Statics for managementparth06
 
Modeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesModeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesFellowBuddy.com
 
Master of Computer Application (MCA) – Semester 4 MC0079
Master of Computer Application (MCA) – Semester 4  MC0079Master of Computer Application (MCA) – Semester 4  MC0079
Master of Computer Application (MCA) – Semester 4 MC0079Aravind NC
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 

Similar to Econometrics Lecture 1: Intro to Models (20)

Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big data
 
Statistics final seminar
Statistics final seminarStatistics final seminar
Statistics final seminar
 
Advanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxAdvanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptx
 
Statistics
StatisticsStatistics
Statistics
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regression
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
Statistics for Managers notes.pdf
Statistics for Managers notes.pdfStatistics for Managers notes.pdf
Statistics for Managers notes.pdf
 
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
!!!!!!!!!!!!!!!!!!!!!!!!Optimal combinationofrealizedvolatilityestimators!!!!...
 
Principles of Econometrics
Principles of Econometrics Principles of Econometrics
Principles of Econometrics
 
Module-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data scienceModule-2_Notes-with-Example for data science
Module-2_Notes-with-Example for data science
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
 
Statics for management
Statics for managementStatics for management
Statics for management
 
Modeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesModeling & Simulation Lecture Notes
Modeling & Simulation Lecture Notes
 
Crowdfunding
CrowdfundingCrowdfunding
Crowdfunding
 
MModule 1 ppt.pptx
MModule 1 ppt.pptxMModule 1 ppt.pptx
MModule 1 ppt.pptx
 
Master of Computer Application (MCA) – Semester 4 MC0079
Master of Computer Application (MCA) – Semester 4  MC0079Master of Computer Application (MCA) – Semester 4  MC0079
Master of Computer Application (MCA) – Semester 4 MC0079
 
Notes1
Notes1Notes1
Notes1
 
Data science
Data scienceData science
Data science
 
Econometrics
EconometricsEconometrics
Econometrics
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 

More from MrDampha

Doctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentationDoctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentationMrDampha
 
Philosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptxPhilosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptxMrDampha
 
Lecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptxLecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptxMrDampha
 
PROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. PresentationtPROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. PresentationtMrDampha
 
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptxCAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptxMrDampha
 
AGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentationAGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentationMrDampha
 
Why do we ask questions.pptx
Why do we ask questions.pptxWhy do we ask questions.pptx
Why do we ask questions.pptxMrDampha
 
Scales of measurement.pdf
Scales of measurement.pdfScales of measurement.pdf
Scales of measurement.pdfMrDampha
 
CHAPTER 9 PPT.ppt
CHAPTER 9 PPT.pptCHAPTER 9 PPT.ppt
CHAPTER 9 PPT.pptMrDampha
 
CHAPTER 2 PPT.ppt
CHAPTER 2 PPT.pptCHAPTER 2 PPT.ppt
CHAPTER 2 PPT.pptMrDampha
 
Accounting for Managers.pdf
Accounting for Managers.pdfAccounting for Managers.pdf
Accounting for Managers.pdfMrDampha
 

More from MrDampha (11)

Doctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentationDoctor of Management in Philosophy presentation
Doctor of Management in Philosophy presentation
 
Philosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptxPhilosophy of Management powerpoint presentationpptx
Philosophy of Management powerpoint presentationpptx
 
Lecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptxLecture Five Philisophy - Positivist .pptx
Lecture Five Philisophy - Positivist .pptx
 
PROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. PresentationtPROPOSAL WRITING TRAINING. Presentationt
PROPOSAL WRITING TRAINING. Presentationt
 
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptxCAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
CAPITAL MARKETS AND EFFICIENCY GROUP ASSIGNMENT.pptx
 
AGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentationAGO-Presentation_powerpoint presentation
AGO-Presentation_powerpoint presentation
 
Why do we ask questions.pptx
Why do we ask questions.pptxWhy do we ask questions.pptx
Why do we ask questions.pptx
 
Scales of measurement.pdf
Scales of measurement.pdfScales of measurement.pdf
Scales of measurement.pdf
 
CHAPTER 9 PPT.ppt
CHAPTER 9 PPT.pptCHAPTER 9 PPT.ppt
CHAPTER 9 PPT.ppt
 
CHAPTER 2 PPT.ppt
CHAPTER 2 PPT.pptCHAPTER 2 PPT.ppt
CHAPTER 2 PPT.ppt
 
Accounting for Managers.pdf
Accounting for Managers.pdfAccounting for Managers.pdf
Accounting for Managers.pdf
 

Recently uploaded

Bladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证jdkhjh
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithAdamYassin2
 
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...First NO1 World Amil baba in Faisalabad
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...amilabibi1
 
The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasCherylouCamus
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppmiss dipika
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...Henry Tapper
 
Financial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and DisadvantagesFinancial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and Disadvantagesjayjaymabutot13
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfshaunmashale756
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...Amil Baba Dawood bangali
 
Quantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector CompaniesQuantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector Companiesprashantbhati354
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证rjrjkk
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一S SDS
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办fqiuho152
 
Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Commonwealth
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Sonam Pathan
 
Current Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxCurrent Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxuzma244191
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
purposes, often by government agencies or industry. The data may be in:

- Time-series form – data collected over discrete intervals of time (a stock market index, the CPI, GDP, interest rates, the annual price of wheat in Australia from 1880 to 2009)
- Cross-sectional form – data collected over sample units in a particular time period (income in suburbs in Brisbane during 2009, or the household census)
- Panel data form – data that follow individual microunits over time (data for 30 countries for the period 1980–2005, the monthly value of 3 stock market indices over the last 5 years)

Data may be collected at various levels of aggregation:

- Micro – data collected on individual economic decision-making units such as individuals, households, or firms
- Macro – data resulting from pooling or aggregating over individuals, households, or firms at the local, state, or national level

Data collected may also represent a flow or a stock:

- Flow – an outcome measured over a period of time, such as the consumption of petrol during the last quarter of 2005
- Stock – an outcome measured at a particular point in time, such as the quantity of crude oil held by BHP in its Australian storage tanks on 1 April 2002, or the asset value of Macquarie Bank on 5 July 2009
Data collected may be quantitative or qualitative:

- Quantitative – numerical data; data that can be expressed as numbers or some transformation of them, such as real prices or per capita income
- Qualitative – outcomes of an "either-or" situation, that is, whether an attribute is present or not, e.g. colour, or whether a consumer purchased a certain good or not (dummy variables)

Statistical Inference:

The aim of statistics is to "infer" or learn something about the real world by analysing a sample of data. The ways in which statistical inference is carried out include:

- Estimating economic parameters, such as elasticities
- Predicting economic outcomes, such as the enrolments in bachelor degree programs in Australia for the next 5 years
- Testing economic hypotheses, such as: Is newspaper advertising better than "email" advertising for increasing sales?

Econometrics includes all of these aspects of statistical inference.

There are two types of inference:
1. Deductive: go from a general case to a specific case – this is used in mathematical proofs
2. Inferential: go from a specific case to a general case – this is used in statistics
Review of Statistical Concepts: Random Variables (Discrete and Continuous)

Random variable: A random variable is a variable whose value is unknown until it is observed; it is not perfectly predictable. The value of the random variable results from an experiment (controlled or uncontrolled). Uppercase letters (e.g. X) are usually used to denote random variables, and lowercase letters (e.g. x) are usually used to denote values of random variables.

Discrete random variable: A discrete random variable can take only a finite number of values, which can be counted using the positive integers.
- E.g. the number of cars you own, your age in whole years, etc.
- Dummy variables: D = 1 if the person is female, D = 0 if the person is not female.

Probability distribution of a discrete random variable: A discrete random variable has a probability density function which summarises all the possible values of the variable together with their associated probabilities. It can be in the form of a table, formula or graph. Two key features of a probability distribution are its centre (location) and width (dispersion): the mean, μ, and the variance, σ², respectively.

For a discrete random variable X:

Mean: μ = E(X) = Σ x·P(X = x)

Variance: σ² = Var(X) = E[(X − μ)²] = Σ (x − μ)²·P(X = x)

Only distinct values of x receive positive probability, which is what makes the variable discrete – the probability density function is NOT continuous.

Discrete probability distributions are:
1. Mutually exclusive – no overlap between values
2. Collectively exhaustive – the full sample space is covered; every possibility is included
Example: A 5-sided die is biased; the sides show 0, 1, 2, 3 and 4 respectively, and the following table gives the probability distribution.

x        0     1     2     3     4
P(X=x)   0.10  0.45  0.30  0.10  0.05

a) Calculate the mean and variance of X
b) Sketch the probability distribution of X
c) Find P(X ≤ 2)

Solution:

a) i) Mean:
μ = E(X) = Σ x·P(X = x)
  = 0(0.10) + 1(0.45) + 2(0.30) + 3(0.10) + 4(0.05)
  = 1.55

ii) Variance:
σ² = Σ (x − μ)²·P(X = x)
   = (0 − 1.55)²(0.10) + (1 − 1.55)²(0.45) + (2 − 1.55)²(0.30) + (3 − 1.55)²(0.10) + (4 − 1.55)²(0.05)
   = 0.9475

b) The sketch is a bar chart of P(X = x) against x = 0, 1, 2, 3, 4, with heights 0.10, 0.45, 0.30, 0.10 and 0.05.

c) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.10 + 0.45 + 0.30 = 0.85
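These calculations are easy to check numerically. A minimal Python sketch (numpy is assumed available; the arrays simply transcribe the table above):

```python
import numpy as np

# Probability distribution of the biased 5-sided die
x = np.array([0, 1, 2, 3, 4])
p = np.array([0.10, 0.45, 0.30, 0.10, 0.05])

mean = np.sum(x * p)                     # E(X) = sum of x * P(X = x)
variance = np.sum((x - mean) ** 2 * p)   # Var(X) = sum of (x - mu)^2 * P(X = x)
prob_le_2 = p[x <= 2].sum()              # P(X <= 2)

print(mean)       # ~1.55
print(variance)   # ~0.9475
print(prob_le_2)  # ~0.85
```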
Continuous random variable: A continuous random variable can take any real value (not just whole numbers, and not only positive values); it is generally something measurable.
- E.g. your height, the temperature, etc.

An easy way to decide is to pick an arbitrary number, e.g. 3.4314135315, and ask whether the variable can take that value. If yes, it is continuous; if no, it is discrete.

Probability distribution of a continuous random variable: A continuous random variable has a probability density function (pdf), a smooth non-negative function representing likely and unlikely values of the random variable. Two key features of a probability distribution are its centre (location) and width (dispersion): the mean, μ, and the variance, σ², respectively.

Let f(x) denote the pdf of a continuous random variable X. Then:

Mean: μ = E(X) = ∫ x f(x) dx  (integrating over the whole real line)

Variance: σ² = Var(X) = E[(X − μ)²] = ∫ (x − μ)² f(x) dx

There are an infinite number of points in any interval of a continuous random variable, so a positive probability cannot be assigned to each point – the area of a line is 0. Therefore, for a continuous random variable, P(X = x) = 0. We can only assign probabilities to a range of values; to put it another way, we can only assign a probability that X will lie within a certain range:

P(x1 ≤ X ≤ x2) = ∫ from x1 to x2 of f(x) dx

Note that it does not matter whether strict or non-strict inequality symbols are used, as the difference is negligible (the probability of a single value is 0).
The Normal Distribution:

The most useful continuous distribution is the normal distribution. The normal distribution has a probability density function (pdf) of:

f(x) = (1/√(2πσ²)) · e^(−(x − μ)²/(2σ²)),  −∞ < x < ∞

Important parameters of the normal distribution:
1. μ = mean: the centre of the distribution
2. σ² = variance: the level of dispersion
3. Properties of the normal distribution:
- Symmetric about the mean
- Bell shaped
- The mean μ, the median and the mode are all the same
- Used to find the probability of a range of values
- The probability of any single value is 0, e.g. P(X = 3) = 0
- There is a different normal distribution for each pair of values of μ and σ
- The area under the probability density function is equal to 1; as the distribution is symmetric, each side has area 0.5
- Probability is measured by the area under the curve – the cumulative distribution function

The Standardised Normal Distribution:
- Variance and standard deviation of 1
- Mean of 0
- Values greater than the mean have positive Z-values
- Values less than the mean have negative Z-values

The most useful property of the normal distribution is that we can "standardise" any normal random variable to the standard normal distribution, for which we have tables to determine probabilities (Z values).
Z = (X − μ)/σ

Example: In a given population, heights of people are normally distributed with a mean of 160 cm and a standard deviation of 10 cm.
a) What is the probability that a person is more than 163.5 cm tall?
b) What proportion of people have heights between 155 cm and 163.5 cm?

Solution:

a)
P(X > 163.5) = P(Z > (163.5 − 160)/10)
             = P(Z > 0.35)
             = 0.5 − 0.1368
             = 0.3632

b)
P(155 < X < 163.5) = P((155 − 160)/10 < Z < (163.5 − 160)/10)
                   = P(−0.5 < Z < 0.35)
                   = 0.1915 + 0.1368
                   = 0.3283
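The table look-ups can be reproduced with scipy's normal CDF. A small sketch, using only the mean and standard deviation given above:

```python
from scipy.stats import norm

mu, sigma = 160, 10

# a) P(X > 163.5)
p_a = 1 - norm.cdf(163.5, loc=mu, scale=sigma)

# b) P(155 < X < 163.5)
p_b = norm.cdf(163.5, loc=mu, scale=sigma) - norm.cdf(155, loc=mu, scale=sigma)

print(round(p_a, 4))  # ~0.3632
print(round(p_b, 4))  # ~0.3283
```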
The Chi-Square Distribution:

Chi-square random variables arise when standard normal random variables are squared. If Z1, Z2, ..., Zm denote m independent N(0,1) random variables, then

V = Z1² + Z2² + ... + Zm² ~ χ²(m)

The notation V ~ χ²(m) is read as: the random variable V has a chi-square distribution with m degrees of freedom. The degrees of freedom parameter m indicates the number of independent N(0,1) random variables that are squared and summed to form V. The value of m determines the entire shape of the chi-square distribution, including its mean and variance:

E(V) = m
var(V) = 2m

The values of V must be non-negative, v ≥ 0, because V is formed by squaring and summing m standardised normal N(0,1) random variables. The distribution has a long tail, or is
skewed to the right (a long tail to the right). As the degrees of freedom m get larger, the distribution becomes more symmetric and "bell-shaped"; eventually the chi-square distribution converges to, and essentially becomes, the normal distribution.

The Student t Distribution:

A t random variable is formed by dividing a standard normal random variable Z ~ N(0,1) by the square root of an independent chi-square random variable V ~ χ²(m) divided by its degrees of freedom:

t = Z/√(V/m) ~ t(m)

The t distribution's shape is completely determined by the degrees of freedom parameter m, and the distribution is symbolised by tm. Note that the t distribution is more spread out than the standard normal distribution and less peaked, with mean and variance:

E(t(m)) = 0
var(t(m)) = m/(m − 2)

As the number of degrees of freedom approaches infinity, the t distribution approaches the standard normal, N(0,1).
The F Distribution:

An F random variable is formed by the ratio of two independent chi-square random variables, each divided by its degrees of freedom. If V1 ~ χ²(m1) and V2 ~ χ²(m2), and if V1 and V2 are independent, then:

F = (V1/m1)/(V2/m2) ~ F(m1, m2)

The F distribution is said to have m1 numerator degrees of freedom and m2 denominator degrees of freedom. The values of m1 and m2 determine the shape of the distribution, which in general is right-skewed; different pairs of degrees of freedom give a range of such shapes.
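The constructions of the chi-square, t and F variables above are easy to verify by simulation. A sketch under arbitrary choices of degrees of freedom, seed and sample size (none of which come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, n = 5, 10, 200_000

z = rng.standard_normal((n, m1))
v1 = (z ** 2).sum(axis=1)            # chi-square with m1 df: sum of m1 squared N(0,1)
print(v1.mean(), v1.var())           # ~ m1 and ~ 2*m1

t = rng.standard_normal(n) / np.sqrt(v1 / m1)   # t with m1 df (numerator independent of v1)
print(t.var())                       # ~ m1/(m1 - 2)

v2 = (rng.standard_normal((n, m2)) ** 2).sum(axis=1)
f = (v1 / m1) / (v2 / m2)            # F(m1, m2): ratio of independent chi-squares over their df
print(f.mean())                      # ~ m2/(m2 - 2) for m2 > 2
```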
Laws of Expectation and Variance:

Where a and b are constants and X and Y are random variables:

E[b] = b                  Var[b] = 0
E[aX] = aE[X]             Var[aX] = a²Var[X]
E[aX + b] = aE[X] + b     Var[aX + b] = a²Var[X]
E[X + Y] = E[X] + E[Y]
Var[X + Y] = Var[X] + Var[Y]   (when X and Y are uncorrelated)

The Error Term:

The error term in a regression model is a random variable. Like other random variables it is characterised by:
a) A mean (or expected value)
b) A variance
c) A distribution (i.e. a probability density function)

We usually assume the random error term of an econometric model to:
a) Have an expected value of zero
b) Have a variance which we will call σ²
The smaller the variance of the error term, the more efficient the model.

Sampling Distributions:

We can usually draw many samples of size n from a population. Each sample can be used to compute a sample statistic (e.g. a sample mean), and these statistics will vary from sample to sample. If we take infinitely many samples of a normally distributed random variable X in the population, the sample statistic X̄ will also be normally distributed. The probability distribution that gives all possible values of a statistic and their associated probabilities is known as a sampling distribution.

If Xi ~ N(μ, σ²), then X̄ ~ N(μ, σ²/N).

If the distribution of X is non-normal but n is large, then X̄ is approximately normally distributed. The approximation is good when n ≥ 30 – this is a consequence of the central limit theorem.

Central Limit Theorem: If Y1, ..., YN are independent and identically distributed random variables with mean μ and variance σ², and Ȳ = ΣYi/N, then

ZN = (Ȳ − μ)/(σ/√N)

has a probability distribution that converges to the standard normal N(0,1) as N → ∞.

Estimators and Estimates:

A point estimator is a rule or formula which tells us how to use a set of sample observations to estimate the value of a parameter of interest. A point estimate is the value obtained after the observations have been substituted into the formula.

Estimator: a formula used to obtain an estimate
Estimate: a particular value for a parameter

Desirable properties of point estimators include:
- Unbiasedness – an estimator θ̂ is an unbiased estimator of the population parameter θ if E(θ̂) = θ
- Efficiency – θ̂1 is more efficient than θ̂2 if var(θ̂1) < var(θ̂2)
- Consistency – the distribution of the estimator becomes more concentrated about the population parameter as the sample size becomes larger; both the bias and the variance approach 0 as n approaches infinity
Examples:
- X̄ = ΣXi/N is the best linear unbiased estimator of μ = E(X)
- σ̂² = Σ(Xi − X̄)²/N is a biased but consistent estimator of σ² = E(X − μ)²
- σ̂² = Σ(Xi − X̄)²/(N − 1) is an unbiased and consistent estimator of σ² = E(X − μ)²

Confidence Intervals:

A confidence interval, or interval estimate, is a range of values which contains information not only about the location of the population mean but also about the precision with which we estimate it. We can generally use the sampling distribution of an estimator to derive a confidence interval for the population parameter. In general, a 100(1 − α)% confidence interval for the population mean is given by:

CI = x̄ ± z(α/2) · σ/√n

where 100(1 − α)% is the level of confidence. Prior to selecting a random sample, the probability that a CI will contain the population parameter is 100(1 − α)%. For example, if we took many samples of size n and calculated the many corresponding random intervals x̄ ± z(α/2)·σ/√n, then 100(1 − α)% of them would contain μ.
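As an illustration of the interval formula, a hedged sketch; the sample values and the "known" population σ are invented for the example:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample with an assumed-known population standard deviation
sample = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8])
sigma = 0.3          # assumed known population std dev (illustrative)
alpha = 0.05

xbar = sample.mean()
z = norm.ppf(1 - alpha / 2)                 # z_{alpha/2} for a 95% interval
half_width = z * sigma / np.sqrt(len(sample))

print(f"95% CI: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")
```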
After we construct a confidence interval, either it does or it does not contain the population parameter, with probability 1 or 0, so we can only say we are 100(1 − α)% confident that a particular confidence interval contains the parameter.

General conclusion: "We can say with 100(1 − α)% confidence that the population parameter is between [lower bound] and [upper bound]."

Hypothesis Testing:

A hypothesis is a statement or claim about the value(s) of one or more population parameters. To test a hypothesis we:
1. Identify a test statistic and find its sampling distribution when the hypothesis is true
2. Reject the hypothesis if the test statistic takes a value that is deemed unlikely

5 steps:
1. State H0 and H1 – H0 must contain an equality (=, ≤, ≥)
2. State a decision rule – "Reject H0 if ..."
3. Calculate the test statistic
4. Compare, and make a decision
5. Write a conclusion

Note:
- One-tail or two-tail tests can be used
- Either critical values or the p-value method can be used
Econometrics: ECON2300 – Lecture 2

An Econometric Model:

For a given set of data, the aim of an econometric model is to fit a regression line and then check how well it fits. In order to investigate the relationship between expenditure and income we must build an economic model, and then a corresponding econometric model that forms the basis for a quantitative or empirical economic analysis. We must express mathematically which variables are dependent and which are independent. (In this case we can say that weekly expenditure depends on income – y depends on x.)

We represent our economic model mathematically by the conditional mean:

E(y|x) = μ(y|x) = β1 + β2x

The conditional mean E(y|x) is called a simple regression function, as there is only one explanatory variable. The unknown regression parameters β1 and β2 are the intercept and slope respectively:

β2 = dE(y|x)/dx
For each value of x there is potentially a range of values of y – in fact, each has a probability distribution. The regression line passes through the mean of the distribution of expenditure at each level of income. The difference between the actual value of y and its expected value is known as the random error term:

e = y − E(y) = y − (β1 + β2x)

If we rearrange:

y = β1 + β2x + e
Assumptions of the Simple Linear Regression (SLR) Model:

1. The population can be represented by: y = β1 + β2x + e

2. The mean value of y, for each value of x, is given by the linear regression function E(y|x) = β1 + β2x.
   Error term: this means that the mean error is zero, E(e) = 0.

3. For each value of x, the values of y are distributed about their mean value, following probability distributions that all have the same variance: var(y|x) = σ².
   Error term: this means that the error terms are homoskedastic (constant variance), var(e) = var(y) = σ². Violation of this assumption is heteroskedasticity.

4. The sample values of y are all uncorrelated and have zero covariance, implying there is no linear association among them: cov(yi, yj) = 0.
   Error term: there is no serial correlation, cov(ei, ej) = 0. Note that this assumption can be made stronger by assuming that the random errors e are all statistically independent, in which case the values of y are also statistically independent.
5. The variable x is not random and must take at least two different values.

6. (optional) The values of y are normally distributed about their mean for each value of x: y ~ N(β1 + β2x, σ²).
   Error term: the values of e are normally distributed about their mean, e ~ N(0, σ²). If the values of y are normally distributed, then so are the errors, and vice versa.

The Error Term:

If the regression parameters β1 and β2 were known, then for any value of y we could calculate:

e = y − E(y) = y − (β1 + β2x)

However, the values of β1 and β2 are never known for certain, and therefore it is impossible to calculate e. The random error e represents all factors affecting y other than x. These factors cause individual observations y to differ from the mean value E(y) = β1 + β2x.

Estimating the Parameters of the Simple Linear Regression:

Our problem is to estimate the location of E(y) = β1 + β2x in the way that best represents our data. We would expect this line to be somewhere in the middle of all the data points since it represents mean, or average, behaviour. To estimate β1 and β2 we could simply draw a line through the middle of the data and then measure the slope and intercept with a ruler. The problem with this method is that different people would draw different lines – in fact there would be an infinite set of possibilities – and it would not be accurate.
The estimated regression line is given by:

ŷi = b1 + b2xi

The Least Squares Principle:

The least squares method involves finding estimators b1 and b2 that provide the smallest sum of squared residuals:

min Σ êi² = min Σ (yi − ŷi)²

b2 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

b1 = ȳ − b2x̄

We usually use a computer to calculate these values, as the process would take too long and be too tedious by hand.

Interpreting the estimates:
- The value of b2 is an estimate of β2, the amount by which y increases per unit increase in x
- The value of b1 is an estimate of β1, what y would be when x = 0
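The least squares formulas translate directly into code. A minimal numpy sketch with placeholder data (the arrays are invented for illustration):

```python
import numpy as np

# Placeholder data: y is some outcome, x an explanatory variable
x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([5.1, 7.9, 10.2, 12.8, 16.1, 18.0])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

residuals = y - (b1 + b2 * x)
print(b1, b2)
print(np.sum(residuals ** 2))   # the minimised sum of squared residuals
```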
Because the least squares estimates are generated using sample data, different samples will lead to different values of b1 and b2; therefore b1 and b2 are random variables. In this context we call b1 and b2 the least squares estimators; when actual sample values are substituted, we obtain estimates.

Estimators: formulas for estimates
Estimates: actual values given by the estimators

The Variances and Covariance of b1 and b2:

var(b1) = σ² [ Σxi² / (N Σ(xi − x̄)²) ]

var(b2) = σ² / Σ(xi − x̄)²

cov(b1, b2) = σ² [ −x̄ / Σ(xi − x̄)² ]

The square roots of the estimated variances are known as standard errors.

Summary – the variances and covariance of b1 and b2:
- The larger the variance of the error term, σ², the greater the uncertainty in the statistical model, and the larger the variances and covariance of the least squares estimators.
- The larger the sum of squares Σ(xi − x̄)², the smaller the variances of the least squares estimators and the more precisely we can estimate the unknown parameters.
When the data are bunched together, Σ(xi − x̄)² is smaller and we cannot estimate the line very accurately; when the data are more spread out, Σ(xi − x̄)² is larger and we can estimate the unknown parameters more precisely.

- The larger the sample size N, the smaller the variances and covariance of the least squares estimators.
- The larger the term Σxi², the larger the variance of the least squares estimator b1. The further our data are from x = 0, the more difficult it is to interpret β1.
- The absolute magnitude of the covariance increases the larger in magnitude the sample mean x̄ is, and the covariance has a sign opposite that of x̄.

The Probability Distribution of the Least Squares Estimators:
- If the normality assumption about the error terms is correct, then the least squares estimators are normally distributed.
- If assumptions 1–5 hold, and if the sample size is sufficiently large (n ≥ 30), then by the central limit theorem the least squares estimators have a distribution that approximates the normal distribution.

The Gauss-Markov Theorem:

Under assumptions SR1–SR5 of the linear regression model, the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased Estimators (BLUE) of β1 and β2.

To clarify what the Gauss-Markov theorem does, and does not, say:
1. The estimators b1 and b2 are "best" when compared to similar estimators – those that are linear and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators.
2. They are the "best" within their class because they have the minimum variance. When comparing two linear and unbiased estimators, we always want to use the one with the smallest variance.
3. In order for the Gauss-Markov theorem to hold, assumptions SR1–SR5 must be true. If any of these assumptions is not true, then b1 and b2 are not the best linear unbiased estimators of β1 and β2.
4. The Gauss-Markov theorem does not depend on the assumption of normality.
5. In simple linear regression these are the estimators to use.
6. The theorem applies to the least squares estimators. It does not apply to the least squares estimates from a single sample.
Estimating the Variance of the Error Term:

The variance of the random error ei is:

var(ei) = σ² = E[ei − E(ei)]² = E(ei − 0)² = E(ei²)

assuming that the zero-mean-error assumption, E(ei) = 0, is correct.

The unbiased estimator of the variance is:

σ̂² = Σêi² / (N − 2),  with E(σ̂²) = σ²
Interval Estimation:

Confidence interval:

CI = bk ± tcrit · se(bk)

Where:
- bk = b1 or b2
- tcrit = the critical value t(1 − α/2, N − 2), where N − 2 is the degrees of freedom
- se(bk) is the standard error given by the regression estimation

Before sampling, we can make the probability statement that there is a 100(1 − α)% chance that the interval will contain the real value. After sampling, we can only make a confidence statement – we are 100(1 − α)% confident that the real value lies within the interval.

Example: Construct a 95% confidence interval for β2 for the following equation, estimated from 40 observations (standard errors in parentheses):

ŷ = 83.4 + 10.21x
    (43.4) (2.09)

Solution:

CI = b2 ± tcrit · se(b2)
   = 10.21 ± t(1 − 0.05/2, 40 − 2) × 2.09
   = 10.21 ± t(0.975, 38) × 2.09
   = 10.21 ± 2.024 × 2.09
   = 10.21 ± 4.23

We can say with 95% confidence that the true value of β2 lies within the interval 5.98 to 14.44.
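The same interval can be reproduced programmatically from the reported estimate and standard error. A sketch in which scipy's t.ppf supplies the critical value:

```python
from scipy.stats import t

b2, se_b2, n = 10.21, 2.09, 40
alpha = 0.05

t_crit = t.ppf(1 - alpha / 2, df=n - 2)   # t(0.975, 38) ~ 2.024
lower = b2 - t_crit * se_b2
upper = b2 + t_crit * se_b2

print(f"t_crit = {t_crit:.3f}, CI = ({lower:.2f}, {upper:.2f})")  # ~ (5.98, 14.44)
```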
Hypothesis Testing:

We can conduct a hypothesis test on the slope of the regression line.

Step 1: State the hypotheses:
H0: βk = c,  βk ≤ c,  or  βk ≥ c
H1: βk ≠ c,  βk > c,  or  βk < c

Step 2: Decision rule – Reject H0 if ...
Step 3: Calculate the test statistic
Step 4: Compare and make a decision
Step 5: Conclusion
Example: Using 40 observations on food expenditure (standard errors in parentheses):

ŷ = 83.4 + 10.21x
    (43.4) (2.09)

Test whether β2 is less than or equal to 0 at the 5% level of significance.

Step 1: State the hypotheses
H0: β2 ≤ 0
H1: β2 > 0

Step 2: Decision rule
Reject H0 if tcalc > tcrit. For this one-tail test at the 5% level, tcrit = t(0.95, 38) = 1.686.

Step 3: Calculate the test statistic
tcalc = (b2 − 0)/se(b2) = (10.21 − 0)/2.09 = 4.88

Step 4: Compare and decision
4.88 > 1.686, therefore reject H0.

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that β2 – the increase in expenditure for a one-unit increase in income – is greater than 0.

Types of errors:

                 | H0 true            | H0 false
Reject H0        | Type 1 error = α   | No error
Do not reject H0 | No error           | Type 2 error
Econometrics: ECON2300 – Lecture 3

The Least Squares Predictor:

The linear regression model provides a way to predict y given any value of x. This is extremely important for forecasters, be it in politics, finance or business; accurate predictions provide a basis for better decision making.

Our first SR assumption is that the model is linear: for a given value of the explanatory variable, x0, the value of the dependent variable y0 is given by the econometric model:

y0 = β1 + β2x0 + e0

where e0 is a random error. This random error has:
1. Mean: E(e0) = 0
2. Variance: var(e0) = σ²
3. Covariance: cov(e0, ei) = 0 – it is uncorrelated with the sample errors

The least squares predictor (or estimator) of y0 (given x0) is:

ŷ0 = b1 + b2x0

To evaluate how well this predictor performs, we define the forecast error, which is analogous to the least squares residual:

f = ŷ0 − y0 = (b1 + b2x0) − (β1 + β2x0 + e0) = (b1 − β1) + (b2 − β2)x0 − e0

Now, if we apply assumptions SR1 to SR5:

E(f) = E(ŷ0 − y0) = [E(b1) − β1] + [E(b2) − β2]x0 − E(e0) = 0 + 0 + 0 = 0

since E(b1) = β1, E(b2) = β2 and E(e0) = 0.
var(f) = var(ŷ0 − y0) = σ² [ 1 + 1/N + (x0 − x̄)² / Σ(xi − x̄)² ]

If SR6 holds, or the sample size is large enough, then the prediction error is normally distributed. Note that the further x0 is from the sample mean, the larger the variance of the prediction error.
- This means that the more you extrapolate, the less accurate your predictions will be.

The variance of the forecast error is smaller when:
i) The overall uncertainty in the model is smaller, as measured by the variance of the random errors σ²
ii) The sample size N is larger
iii) The variation in the explanatory variable is larger
iv) The distance of x0 from x̄ is smaller

The forecast error variance is estimated by replacing σ² with its estimator σ̂²:

var̂(f) = σ̂² [ 1 + 1/N + (x0 − x̄)² / Σ(xi − x̄)² ]
       = σ̂² + σ̂²/N + (x0 − x̄)² · [ σ̂² / Σ(xi − x̄)² ]
       = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)

Intuitively, a prediction made at a value of x near the sample mean will be close to the actual value, because the regression is based on many data points around it; at a value of x far from the data, the prediction will be less accurate, i.e. will have a larger variance. We can do a better job of predicting in the region where we have more sample information.

The standard error of the forecast:

se(f) = √var̂(f)

Hence, we can construct a 100(1 − α)% prediction interval for y0:

ŷ0 ± tcrit · se(f)
Example: Calculate a 95% prediction interval for y when x0 = 20:

ŷ0 ± tcrit · se(f)

Step 1: Linear equation

From the regression output we can determine the fitted regression:

ŷ = b1 + b2x = 83.416 + 10.21x
     (43.41)   (2.093)

Therefore, when x0 = 20:

ŷ0 = 83.416 + 10.21(20) = 287.616

Step 2: Determine se(f)

se(f) = √var̂(f)

var̂(f) = σ̂² + σ̂²/N + (x0 − x̄)² · var̂(b2)
       = 89.517² + 89.517²/40 + (20 − 19.605)² × (2.0932)²
       = 8214.34

where σ̂ = 89.517 is the standard error of the regression, N = 40 is the sample size, x̄ = 19.605 is the mean of x, and var̂(b2) = se(b2)² = (2.0932)².
Step 3: Prediction interval

ŷ0 ± tcrit · se(f) = 287.616 ± t(0.975, 38) × √8214.34
                   = 287.616 ± 2.024 × 90.63

104.17 ≤ y0 ≤ 471.06

Therefore we can say with 95% confidence that the true expenditure on food will be between $104.17 and $471.06.

Transforming x to obtain se(f):

A simple way to obtain the prediction and prediction interval estimates with EViews (or any other econometrics package, including Excel) is as follows:
1. Transform the independent variable x by subtracting x0 from each of its values, generating a new variable: Genr → x2 = x − x0
2. Estimate the regression model using the transformed variable
3. The estimated standard error of the forecast is then given by: se(f) = √(var̂(b1) + σ̂²)
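These steps can also be reproduced from the reported summary statistics alone. A sketch using only the figures quoted above:

```python
import numpy as np
from scipy.stats import t

b1, b2 = 83.416, 10.21
sigma_hat, se_b2 = 89.517, 2.0932   # S.E. of regression and se(b2)
n, xbar, x0 = 40, 19.605, 20

y0_hat = b1 + b2 * x0
var_f = sigma_hat**2 + sigma_hat**2 / n + (x0 - xbar)**2 * se_b2**2
se_f = np.sqrt(var_f)

t_crit = t.ppf(0.975, df=n - 2)
print(f"{y0_hat:.2f} +/- {t_crit * se_f:.2f}")   # ~ 287.62 +/- 183.5
```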
The transformation shifts the data so that x0 becomes the origin: the fitted line and its fit are unchanged, but the intercept of the transformed regression now estimates E(y|x0), so its reported variance is exactly what the forecast standard error requires.

Measuring Goodness-of-Fit:

There are two major reasons for analysing the model y = β1 + β2x + e:
1. To explain how the dependent variable (yi) changes as the independent variable (xi) changes
2. To predict y0 given an x0

These two objectives come under the broad headings of estimation and prediction. Closely allied with the prediction problem discussed in the previous section is the desire to use xi to explain as much of the variation in the dependent variable yi as possible.
The variation in y about its sample mean decomposes as:

yi − ȳ = (ŷi − ȳ) + (yi − ŷi) = explained component + unexplained residual êi

- SST = total sum of squares – a measure of the total variation in the dependent variable about its sample mean
- SSR = regression sum of squares – the part that is explained by the regression
- SSE = sum of squared errors – the part of the total variation that is unexplained

Coefficient of determination, R²:

The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the regression model:

R² = SSR/SST = 1 − SSE/SST,  with 0 ≤ R² ≤ 1

If R² = 1, the data fall exactly on the fitted least squares regression line (SSE = 0) and we have a perfect fit. If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is horizontal and equal to the mean of y, so SSR = 0 and R² = 0.

For a simple regression model, R² can also be computed as the square of the correlation coefficient between yi and ŷi.

Note:
1. R² is a descriptive measure.
2. By itself, it does NOT measure the quality of the regression model.
3. It is NOT the objective of regression analysis to find the model with the highest R².
4. By adding more variables, R² will automatically increase even if the variables have no economic justification; this is why we use the adjusted R² in multiple regression analysis (we will expand on this when we study multiple regression):

R̄² = 1 − [SSE/(N − K)] / [SST/(N − 1)]
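The sum-of-squares decomposition and both R² measures are easy to compute for any fitted line. A numpy sketch with placeholder data (K = 2 parameters in the simple regression):

```python
import numpy as np

x = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
y = np.array([5.1, 7.9, 10.2, 12.8, 16.1, 18.0])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation about the mean
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation; sst = ssr + sse

n, k = len(y), 2
r2 = 1 - sse / sst                               # equals ssr / sst
r2_adj = 1 - (sse / (n - k)) / (sst / (n - 1))   # adjusted R-squared
print(r2, r2_adj)
```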
The Effects of Scaling the Data:

The data we obtain are not always in a convenient form for presentation in a table or for use in a regression analysis. When the scale of the data is not convenient, it can be altered without changing any of the real underlying relationships between variables.

If we scale x by 1/c:

y = β1 + β2x + e  becomes  y = β1 + (cβ2)(x/c) + e

If we scale y by 1/c:

y = β1 + β2x + e  becomes  y/c = β1/c + (β2/c)x + e/c

Example: if b2 = 10.21 when income is measured in $100 units (so x = 2 represents $200), then measuring income in dollar units instead (x = 200) gives b2 = 0.1021. The underlying model is unchanged.

When the scale of x is altered, the standard error of the regression coefficient changes by the same multiplicative factor as the coefficient, so their ratio, the t-statistic, is unaffected; all other regression statistics are unchanged. When y is scaled, the error term is scaled as well, so the least squares residuals are also scaled. This affects the standard errors of the regression coefficients, but does not affect t-statistics or R².

Choosing a Functional Form:

So far we have assumed that mean household food expenditure is a linear function of household income. That is, we assumed the underlying economic relationship to be E(y) = β1 + β2x, which implies a linear, straight-line relationship between E(y) and x.
In the real world this might not be the case; it was only assumed to make the analysis easier. The starting point in all econometric analysis is economic theory. What does economics really say about the relation between food expenditure and income, holding all else constant? We expect there to be a positive relationship between these variables because food is a normal good. But nothing says the relationship must be a straight line. In fact, we do not expect that as household income rises, food expenditure will continue to rise indefinitely at the same constant rate. Instead, as income rises, we expect food expenditure to rise at a decreasing rate – the law of diminishing returns.

The term "linear" in "linear regression model":
1. Does not mean a linear relationship between the economic variables.
2. Does mean that the model is "linear in the parameters" (the βk values must not be raised to powers or multiplied by other parameters, etc.), but not necessarily "linear in the variables" (e.g. x can enter as x², x³, etc.).

Linear in parameters: the parameters are not multiplied together, divided, squared, cubed, etc.:

f(x) = β0 + β1x1 + ... + βkxk

1. Each explanatory variable in the function is multiplied by an unknown parameter,
2. there is at most one unknown parameter with no corresponding explanatory variable, and
3. all of the individual terms are summed to produce the final function value.

Examples of models that are non-linear in the parameters are:

f(x) = β0 + β0β1x   or   f(x) = β0 x^β1

The first is non-linear because the slope is expressed as a product of two parameters. As a result, nonlinear least squares regression must be used to fit such models; linear least squares cannot be used.

Because of this fact, the simple linear regression model is much more flexible than it appears at first glance. By transforming the variables y and x, we can represent many curved, nonlinear relationships and still use the linear regression model. Choosing an algebraic form for the relationship means choosing transformations of the original variables; the slopes can be determined by taking the derivatives of the function.

Note: the most important implication of transforming variables is that the interpretation of the regression results changes – both the slope and the elasticity differ from the linear case.

Some common function types are the linear, log-log, log-linear, linear-log and reciprocal forms (the table of functional forms is not reproduced here).
A Practical Approach:
1. Plotting the data and choosing economically-plausible models
2. Testing hypotheses concerning the parameters
3. Performing residual analysis
4. Assessing forecasting performance
5. Measuring goodness-of-fit (R²)
6. Using the principle of parsimony – preferring the simplest model

Example on Food Expenditure:

1. Plotting the data (scatter plots of food expenditure against income under each candidate functional form; plots not reproduced here)
2. Testing hypotheses: all slope coefficients are significantly different from zero at the 5% level of significance.

3. Performing residual analysis – testing for normally distributed errors:

The k-th moment (a term borrowed from physics) of the random variable e is:

μk = E[(e − μ)^k]

where μ denotes the mean of e. Measures of spread, symmetry and "peakedness" are:

Variance: σ² = μ2
Skewness: S = μ3/σ³
Kurtosis: K = μ4/σ⁴ – whether the tails are thicker or thinner than expected

If e is normally distributed, then S = 0 and K = 3.

Formalising this is the Jarque-Bera test. The Jarque-Bera test measures how far the residual skewness and kurtosis are from 0 and 3 (normality). To test the null hypothesis of normality of the errors, we use the test statistic:

JB = (N/6) · [ S² + (K − 3)²/4 ]

where N = sample size, S = skewness and K = kurtosis.
When the null hypothesis is true, the Jarque-Bera statistic JB has a χ² distribution with 2 degrees of freedom.

Step 1: State the hypotheses:
H0: the errors are normally distributed
H1: the errors are not normally distributed

Step 2: Decision rule:
Reject H0 if JB > χ²(0.95, 2) = 5.991

Step 3: Calculate the test statistic:

JB = (N/6)[S² + (K − 3)²/4] = (40/6)[(0.097)² + (2.99 − 3)²/4] = 0.063

Step 4: Compare and decision
0.063 < 5.991, therefore do not reject H0.

Step 5: Conclusion
There is insufficient evidence to conclude that the errors are not normally distributed at the 5% level of significance.
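A sketch of the same test applied to a vector of residuals; the residual series here is simulated rather than the food-expenditure residuals:

```python
import numpy as np
from scipy.stats import skew, kurtosis, chi2

rng = np.random.default_rng(1)
resid = rng.normal(size=40)          # stand-in for regression residuals

n = len(resid)
s = skew(resid)
k = kurtosis(resid, fisher=False)    # Pearson kurtosis; a normal sample gives ~3

jb = (n / 6) * (s**2 + (k - 3)**2 / 4)
crit = chi2.ppf(0.95, df=2)          # 5.991

print(jb, crit, "reject H0" if jb > crit else "do not reject H0")
```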
4. Assessing forecasting performance (comparison of forecasts across the candidate models; output not reproduced here)

5. Measuring goodness-of-fit with different dependent variables:

The R² from a linear model measures how well the linear model explains the variation in y, while the R² from a log-linear model measures how well that model explains the variation in ln(y). The two measures should NOT be compared directly, as each has a different dependent variable. To compare goodness-of-fit in models with different dependent variables, we can compute the generalised R²:

R²_g = [corr(y, ŷ)]² = r²(y, ŷ)
6. Using the principle of parsimony – use the simplest model:

The principle of parsimony states that you should use the simplest model if two models appear to be of equal forecasting ability.
Econometrics: ECON2300 – Lecture 4

Multiple Regression A:

The simple regression model we have studied so far relates the dependent variable y to only ONE explanatory variable x. When we turn an economic model with more than one explanatory variable into its corresponding statistical model, we refer to it as a multiple regression model.

Changes and extensions from the simple regression model:

1. Interpretation of the β parameters:

The population regression line is:

E(yi | xi2, ..., xiK) = β1 + β2xi2 + ... + βKxiK

The k-th slope coefficient measures the effect of a change in the variable xk upon the expected value of y, all other variables held constant:

βk = ∂E(yi | xi2, ..., xiK)/∂xik, holding all other x's constant

Note: the x subscripts start at 2, as subscript 1 refers to the intercept term (which has no slope variable).

2. The assumptions concerning the characteristics of the explanatory (x) variables:

The assumptions of the multiple regression model are:

MR1: yi = β1 + β2xi2 + ... + βKxiK + ei, where i = 1, ..., N – the model is linear in the parameters but may be non-linear in the variables

MR2: E(yi) = β1 + β2xi2 + ... + βKxiK, which is synonymous with E(ei) = 0 – the expected (average) value of yi depends on the values of the explanatory variables and the unknown parameters

MR3: var(yi) = var(ei) = σ² – the error terms are homoskedastic (have constant variance)

MR4: cov(yi, yj) = cov(ei, ej) = 0 – there is no serial correlation
MR5: The values of each xik are not random and are not exact linear functions of the other explanatory variables.

MR6: (optional) yi ~ N[(β1 + β2xi2 + ... + βKxiK), σ²], which is equivalent to ei ~ N(0, σ²).

3. The degrees of freedom for the t-distribution (we will go into further detail later in the summary).

Least Squares Estimation:

The fitted regression line for the multiple regression model is:

ŷi = b1 + b2xi2 + ... + bKxiK

The least squares residual is:

êi = yi − ŷi = yi − b1 − b2xi2 − ... − bKxiK

Similarly to the simple linear regression, estimates of the unknown parameters β1, ..., βK are obtained by minimising the residual sum of squares:

Σ êi² = Σ (yi − ŷi)² = Σ (yi − b1 − b2xi2 − ... − bKxiK)²

Solving the first-order conditions for a minimum yields messy expressions for the ordinary least squares estimators, even when K is small (e.g. K = 3). In practice we use matrix algebra to solve these systems: stacking the observations into a design matrix X and a vector y, the solution is b = (X′X)⁻¹X′y.
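A minimal sketch of the matrix solution with simulated placeholder data (in practice numpy.linalg.lstsq is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 75

# Simulated data loosely echoing the hamburger-sales example below
price = rng.uniform(4, 7, n)
advert = rng.uniform(0.5, 3, n)
sales = 118.9 - 7.9 * price + 1.9 * advert + rng.normal(0, 5, n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), price, advert])
y = sales

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) b = X'y, i.e. b = (X'X)^{-1} X'y
print(b)                                # approximately recovers [118.9, -7.9, 1.9]
```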
To understand graphically what a multiple regression model embodies: with two explanatory variables, the fitted equation forms a surface or plane (rather than a line, as in simple regression) which describes the position of the dependent variable.

Example:
The model is given by (standard errors in parentheses):

Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
    (6.352)    (1.096)          (0.683)

Interpretation of the coefficients:
- b2: sales are expected to fall by $7,908 when the price increases by $1, holding the amount of advertising constant.
- b3: sales are expected to increase by $1,863 when advertising expenditure increases by $1, holding the price constant.

Properties of the OLS Estimators (OLS = Ordinary Least Squares):

The Gauss-Markov theorem says that if MR1 to MR5 are correct, the OLS estimators b1, ..., bK have the smallest variance of all linear and unbiased estimators of β1, ..., βK – they are the Best Linear Unbiased Estimators (BLUE). Remember that the Gauss-Markov theorem does not depend on the assumption of normality (MR6). However, if MR6 does hold, then the OLS estimators are also normally distributed.

Again, with larger values of K the formulas for the variances of the OLS estimators are messy. For example, when K = 3, we can show that:

var(b2) = σ² / [ (1 − r²23) Σ(xi2 − x̄2)² ]

where r23 is the sample correlation coefficient between x2 and x3, with −1 < r23 < 1.
The variances and covariances are often presented in the form of a covariance matrix; for K = 3 this matrix has var(b1), var(b2) and var(b3) on the diagonal and the covariances off the diagonal.

In practice, however, σ², the population error variance, is unknown. So instead we use an unbiased estimator of the error variance:

σ̂² = Σ (yi − ŷi)² / (N − K) = Σ êi² / (N − K)

The estimated variances and covariances of the OLS estimators are obtained by replacing σ² with σ̂² in the appropriate formulas. The square roots of the estimated variances are still known as standard errors.

It is important to understand the factors affecting the variance of bi (i = 2, ..., K):
1. The larger σ², the larger the variance of the least squares estimators.
2. The larger the sample size, the smaller the variances.
3. More variation in an explanatory variable around its mean leads to a smaller variance of the least squares estimator.
4. The larger the correlation between the explanatory variables, the larger the variance of the least squares estimators. "Independent" variables ideally exhibit variation that is independent of the variation in the other explanatory variables.
5. Variation in one explanatory variable that is connected to variation in another explanatory variable is known as multicollinearity (see next week). E.g. a larger correlation between x2 and x3 leads to a larger variance of b2.

Inferences in the Multiple Regression Model:

If the assumptions MR1–MR6 hold, we can:
1. Construct confidence intervals for each of the K parameters
2. Conduct a significance test for each of the K parameters
3. Conduct a hypothesis test on any of the parameters or combinations of parameters

The approach is that followed in weeks 2 and 3 for the parameters of the simple regression model.

1. Confidence interval:

A 100(1 − α)% confidence interval for βk is given by:

bk ± tcrit · se(bk), for k = 1, ..., K

Where:
- K = the number of βi parameters, e.g. for ŷi = b1 + b2xi2 + b3xi3, K = 3
- tcrit = t(1 − α/2, N − K)
- se(bk) = the standard error of bk given in the regression output

Example: construct a 95% confidence interval for the coefficient of advertising for the following model, which was based on N = 75 observations on hamburger sales (standard errors in parentheses):

Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
    (6.352)    (1.096)          (0.683)

Solution:

b3 ± t(1 − 0.05/2, 75 − 3) · se(b3) = b3 ± t(0.975, 72) · se(b3)
                                    = 1.863 ± 1.993 × 0.683

0.502 ≤ β3 ≤ 3.224

We can say with 95% confidence that the true change in sales for a one-dollar increase in advertising is between $502 and $3,224.

2. Hypothesis testing:

2.1 A simple null hypothesis is a null hypothesis with a single restriction on one or more parameters. Under MR1 to MR6, we can test the null hypothesis H0: βk = c using the t-statistic:

t = (bk − c)/se(bk) ~ t(N − K)

Even if MR6 doesn't hold, the test is still valid provided the sample size is large.

Example: test whether revenue is related to price at the 5% level of significance when N = 75:

Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
    (6.352)    (1.096)          (0.683)

Solution:
Step 1: State the hypotheses
H0: β2 = 0
H1: β2 ≠ 0

Step 2: Decision rule
Reject H0 if |tcalc| > tcrit, where tcrit = t(1 − 0.05/2, 75 − 3) = t(0.975, 72) = 1.993

Step 3: Calculate the test statistic
tcalc = (b2 − β2)/se(b2) = (−7.908 − 0)/1.096 = −7.215

Step 4: Compare and decision
|−7.215| > 1.993, therefore reject H0.

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has no effect on revenue, i.e. we can conclude at the 5% level of significance that price has an effect on revenue.

2.2 Testing a null hypothesis consisting of two or more hypotheses about the parameters in the multiple regression model – F-tests.

F-tests are used for:
1. The overall significance of the model
2. Testing economic hypotheses involving more than one parameter in the model
3. Misspecification tests
4. Testing for heteroskedasticity
5. Testing for serial correlation

Note: we adopt assumptions MR1–MR6 (i.e. including normality). If the errors are not normal, then the results presented will hold approximately if the sample is large.
A Familiar Form of the F-test:

From ECON1320 we saw that we could express F as:

F = [SSR/(K − 1)] / [SSE/(N − K)] = [(SST − SSE)/(K − 1)] / [SSE/(N − K)]

However, this is just a particular example of a more general F-statistic that can be used to test sets of joint hypotheses.

The General F-test:

A joint null hypothesis is a null hypothesis with two or more restrictions on two or more parameters. Under MR1 to MR6, we can test a joint null hypothesis using the F-statistic:

F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] ~ F(J, N − K)

Where:
- J = the number of restrictions in H0
- SSE_U = the unrestricted sum of squared errors from the original, unrestricted multiple regression model
- SSE_R = the restricted sum of squared errors from a regression model in which the null hypothesis is assumed to be true

Note: even if MR6 doesn't hold, the test is still valid provided the sample size is large (by the central limit theorem).

The general F-test can be used to test three types of hypotheses:
1. When used to test H0: βk = 0 against H1: βk ≠ 0, the F-test is equivalent to a t-test (J = 1)
2. When used to test H0: β2 = β3 = ... = βK = 0 against H1: at least one βk ≠ 0 (J = K − 1)
3. The F-test can also be used to test whether some combination of parameters is collectively significant to the model (1 ≤ J < K)

Restrictions:

When we have a restriction, we assume that the null hypothesis is true; for example, if the null hypothesis sets a parameter to 0, then we set that βk to 0 in the regression equation. Instead of using the least squares estimates that minimise the sum of squared errors, we find estimates that minimise the sum of squared errors subject to the parameter constraints (restrictions). This means that the sum of squared errors will increase: a constrained minimum is larger than an unconstrained minimum.

The theory behind the F-test is that if the two sums of squared errors are significantly different, then imposing the null hypothesis has significantly reduced the ability of the model to fit the data, and thus the data do not support the null hypothesis. On the other hand, if the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters – we would expect little change in the sum of squared errors when the null hypothesis is true.
1. Testing with 1 restriction (J = 1)
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
(se)  (6.352)    (1.096)           (0.683)
Solution:
Step 1: State Hypotheses & apply the restriction
H0: β2 = 0    H1: β2 ≠ 0
Now impose the restriction, assuming the null is correct (i.e. price is not significant, so β2 = 0), and re-estimate the regression equation:
Ŝ = 74.180 + 1.733·ADVERT
(se) (1.80)   (0.890)
Step 2: Decision Rule
Reject H0 if F_calc > F_crit, where F_crit = F(1−α, J, N−K) = F(0.95, 1, 75−3) = 3.97
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)] = [(2961.827 − 1718.943)/1] / [1718.943/(75−3)] = 52.06
Step 4: Compare and Decision
52.06 > 3.97, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that price has an effect on revenue.
The t-test and F-test – a relationship:
When conducting a two-tail test for a single parameter, either a t-test or an F-test can be used, and the outcomes will be identical. In fact, the square of a t random variable with df degrees of freedom is an F random variable with distribution F(1, df):
F-statistic = (t-statistic)²    52.06 = (−7.215)²
F-crit = (t-crit)²              3.97 = (1.993)²
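A short sketch of the general F-test, using the restricted and unrestricted sums of squared errors reported in the example above; it also verifies that, for J = 1, F equals the squared t-statistic.

```python
# General F-test from SSE_R and SSE_U (worked-example values).
from scipy import stats

N, K, J, alpha = 75, 3, 1, 0.05
SSE_U, SSE_R = 1718.943, 2961.827

F_calc = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))   # 52.06
F_crit = stats.f.ppf(1 - alpha, J, N - K)            # 3.97

print(F_calc > F_crit)         # True: reject H0
print(F_calc, (-7.215) ** 2)   # F equals t² when J = 1
```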
2. Testing with J = K − 1 restrictions: the overall significance of the model
An important application of the F-test is for what is called "testing the overall significance of a model". Consider the general multiple regression model with (K − 1) explanatory variables and K unknown coefficients.
Unrestricted model: y_i = β1 + β2x_i2 + β3x_i3 + ... + βK x_iK + e_i
To examine whether we have a viable explanatory model, we set up the following null and alternative hypotheses.
Restricted model: y_i = β1 + e_i
Therefore SSE_R = SST_U, while SSE_U is unchanged.
Note: the null has K − 1 hypotheses; it is referred to as a joint hypothesis.
Step 1: State Hypotheses and estimate the restricted model
H0: β2 = 0, β3 = 0, ..., βK = 0    H1: at least one of the βk is nonzero
Estimated restricted model:
Ŝ = 77.375
(se) (0.749)
SSE_R = 3115.482 (= SST_U)
Step 2: Decision Rule
Reject H0 if F_calc > F_crit, where F_crit = F(1−α, J, N−K) = F(0.95, 3−1, 75−3) = 3.12
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)] = [(3115.482 − 1718.943)/2] / [1718.943/(75−3)] = 29.248
Step 4: Compare and Decision
29.248 > 3.12, therefore reject H0.
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that at least one of the explanatory variables has an effect on sales.
3. Testing a Group of Parameters (1 ≤ J < K)
Consider a model in which advertising enters through both a linear and a squared term, so that K = 4:
S_i = β1 + β2·PRICE_i + β3·ADVERT_i + β4·ADVERT_i² + e_i
Does advertising have an effect on sales?
Step 1: State Hypotheses
H0: β3 = 0 and β4 = 0    H1: β3 ≠ 0 or β4 ≠ 0 (or both are nonzero)
Step 2: Decision Rule
Reject H0 if F_calc > F_crit, where F_crit = F(1−α, J, N−K) = F(0.95, 2, 75−4) = 3.126
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)] = [(1896.391 − 1532.084)/2] / [1532.084/(75−4)] = 8.44
Step 4: Compare and Decision
8.44 > 3.126, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that advertising has a statistically significant effect on sales.
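In practice these joint tests are usually run in software. The sketch below uses statsmodels on synthetic stand-in data (the course data file is not reproduced here, so the variable values are assumptions); the printed overall F corresponds to case 2 above and the f_test call to case 3.

```python
# Overall-significance F and a joint test on a group of coefficients.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 75
df = pd.DataFrame({"PRICE": rng.uniform(4, 7, n), "ADVERT": rng.uniform(0.5, 3, n)})
df["ADVERT2"] = df["ADVERT"] ** 2
df["SALES"] = 110 - 8 * df["PRICE"] + 12 * df["ADVERT"] - 2.8 * df["ADVERT2"] \
              + rng.normal(0, 5, n)

res = smf.ols("SALES ~ PRICE + ADVERT + ADVERT2", data=df).fit()
print(res.fvalue, res.f_pvalue)               # overall significance (J = K - 1)
print(res.f_test("ADVERT = 0, ADVERT2 = 0"))  # joint test on the advertising terms
```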
Prediction:
A prediction ŷ0 is the value of y estimated for given values of the explanatory variables, x0. The prediction error (or forecast error) is f = y0 − ŷ0. The prediction error is a random variable with a mean and a variance. If assumptions MR1 to MR5 hold, then E(f) = E(y0 − ŷ0) = 0 and var(f) = var(y0 − ŷ0), an expression with many terms, each involving σ². The prediction error variance is estimated by replacing σ² with σ̂². The square root of the estimated forecast error variance is still called the standard error of the forecast, se(f).
If assumption MR6 (normality) is correct, or the sample size is large, then a 100(1−α)% confidence interval or prediction interval for y0 is:
ŷ0 − t(1−α/2, N−K)·se(f) ≤ y0 ≤ ŷ0 + t(1−α/2, N−K)·se(f)
Example: Construct a 95% prediction interval for y0 when PRICE = 5.50 and ADVERT = 1.2 (advertising expenditure of $1200; the coefficient magnitudes imply ADVERT is measured in $ thousands).
Solution:
ŷ0 = 118.91 − 7.91(5.50) + 1.863(1.2) = 77.66
t_c = t(1 − 0.05/2, 75 − 3) = t(0.975, 72) = 1.993
To obtain se(f), create two new variables, P* = (PRICE − 5.50) and A* = (ADVERT − 1.2), and re-estimate the model. The intercept b1* of this transformed model equals ŷ0, and se(f) = √(var(b1*) + σ̂²) = 4.9429.
Therefore:
77.66 − 1.993 × 4.9429 ≤ y0 ≤ 77.66 + 1.993 × 4.9429
67.809 ≤ y0 ≤ 87.5112
We can therefore say with 95% confidence that, when the price is $5.50 and advertising expenditure is $1200, the true value of sales lies between 67.809 thousand and 87.5112 thousand.
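Software will produce the same interval directly. A minimal sketch with statsmodels follows; the data are synthetic stand-ins, so the numbers will not reproduce 77.66 exactly.

```python
# Prediction interval for y0 at PRICE = 5.50, ADVERT = 1.2 (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"PRICE": rng.uniform(4, 7, 75), "ADVERT": rng.uniform(0.5, 3, 75)})
df["SALES"] = 118.9 - 7.9 * df["PRICE"] + 1.86 * df["ADVERT"] + rng.normal(0, 4.9, 75)

res = smf.ols("SALES ~ PRICE + ADVERT", data=df).fit()
new = pd.DataFrame({"PRICE": [5.50], "ADVERT": [1.2]})
frame = res.get_prediction(new).summary_frame(alpha=0.05)
# obs_ci_* is the prediction interval for y0; mean_ci_* would be for E(y0)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])
```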
A reminder:
Estimated regression models describe the relationship between the economic variables for values similar to those found in the sample data. Extrapolating the results to extreme values is generally not a good idea. Predicting the value of the dependent variable for values of the explanatory variables far from the sample values invites disaster.
Goodness of Fit:
If the regression model contains an intercept, we can still decompose the variation in the dependent variable (SST) into its explainable and unexplainable components (SSR and SSE). The coefficient of determination then still measures the proportion of the variation in the dependent variable that is explained by the regression model:
R² = SSR/SST = 1 − SSE/SST
The interpretation of R² is identical to its interpretation in the simple regression model, i.e. R² × 100% of the variation can be explained by the estimated equation (1 implies a perfect fit).
Adjusted R²:
A problem with R² is that it can be made large by adding more and more variables to the model, even when they have no economic justification. The adjusted R-squared imposes a penalty for adding more variables:
R̄² = 1 − [SSE/(N−K)] / [SST/(N−1)]
Adjusted R-squared does not give the proportion of variation in the dependent variable that is explained by the model. It should not be used as a criterion for adding or deleting variables (if we add a variable, adjusted R-squared will increase whenever the t-statistic on the new variable is greater than 1 in absolute value!).
A useful identity for recovering SST from summary statistics: SST = (N − 1) × (s.d. of the dependent variable)²
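As a quick arithmetic check of the goodness-of-fit formulas, using the SSE and SST values from the sales example above (N = 75, K = 3):

```python
# R² and adjusted R² from the sums of squares.
N, K = 75, 3
SSE, SST = 1718.943, 3115.482

R2 = 1 - SSE / SST                              # ≈ 0.448
R2_adj = 1 - (SSE / (N - K)) / (SST / (N - 1))  # ≈ 0.433
print(R2, R2_adj)
```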
Econometrics: ECON2300 – Lecture 5
Multiple Regression B:
Non-sample information:
In many estimation problems, economic theory and experience provide us with information on the parameters that is over and above the information contained in the sample data. If this non-sample information is correct, and if we can combine it with the sample information, then we can estimate the parameters with greater precision.
Some non-sample information can be written in the form of linear equality restrictions on the unknown parameters (e.g. several parameters sum to one). We can incorporate this information into the estimation process by simply substituting the restrictions into the model.
One example is a firm with constant returns to scale. Take the Cobb–Douglas production function, whose parameters α and β must sum to 1 under constant returns to scale:
y_t = A·K_t^α·L_t^β
We can show that when K and L increase by the proportion λ, output y also increases by the proportion λ under constant returns to scale:
y_t* = A·(λK_t)^α·(λL_t)^β = λ^(α+β)·A·K_t^α·L_t^β = λ^(α+β)·y_t = λ·y_t when α + β = 1
In order to incorporate the non-sample information and impose constant returns to scale, we should then estimate the model:
y_t = A·K_t^α·L_t^(1−α)
The model is now a function of a single unknown parameter α. The technique for obtaining an estimate of α in this case is known as restricted least squares – we "force" β = 1 − α.
To estimate the above model in practice we can use the least squares method, as the model is linear in its parameters once we convert it to a log-log form:
ln(y_t) = ln(A) + α·ln(K_t) + (1 − α)·ln(L_t) + e_t
To ensure the restriction holds, we rearrange and collect terms:
ln(y_t/L_t) = ln(A) + α·ln(K_t/L_t) + e_t
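A hedged sketch of this substitution approach: regress ln(y/L) on ln(K/L); the slope is α̂ and the implied β̂ is 1 − α̂. The production data here are synthetic stand-ins.

```python
# Restricted least squares by substitution (constant returns to scale).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
K_in = rng.uniform(1, 10, 200)
L = rng.uniform(1, 10, 200)
y = 2.0 * K_in**0.3 * L**0.7 * np.exp(rng.normal(0, 0.05, 200))  # true α = 0.3

X = sm.add_constant(np.log(K_in / L))
res = sm.OLS(np.log(y / L), X).fit()
alpha_hat = res.params[1]
print(alpha_hat, 1 - alpha_hat)   # α̂ and the implied β̂ = 1 − α̂
```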
The Restricted Least Squares Estimator:
The least squares estimates we obtain after imposing the restrictions are known as restricted least squares (RLS) estimates. The RLS estimator:
• Is biased unless the restrictions are EXACTLY true
• Has a smaller variance than the OLS (ordinary least squares) estimator, whether or not the restrictions are true
By incorporating the additional information with the data, we usually give up unbiasedness in return for reduced variances. Evidence on whether the restrictions are true can, of course, be obtained using an F-test (Wald test).
Model Specification:
There are several key questions you should ask yourself when specifying a model:
Q1. What are the important considerations when choosing a model?
A1. The problem at hand and the economic model
Q2. What are the consequences of choosing the wrong model?
A2. If the wrong model is used, there can be omitted or irrelevant variables in the model
Q3. Are there ways of assessing whether a model is adequate?
A3. Yes, you can use model diagnostics – tests of adequate functional form
In examining these model specification issues we will look at the following example.
Omitted variables:
It is possible that a chosen model may have important variables omitted. Our economic principles may have overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic theory.
We will consider a sample of married couples in which both husbands and wives work. This sample was used by labour economist Tom Mroz in a classic paper on female labour force participation. The variables from this sample are in edu_inc.dat. We are interested in the impact of the level of education, both the husband's education (HEDU) and the wife's education (WEDU), on family income (FAMINC = the combined income of husband and wife). Summary statistics for the data appear in Table 6.2.
From the estimated relationship, we estimate that an additional year of education for the husband will increase annual income by $3132, and an additional year of education for the wife will increase income by $4523.
Suppose we now incorrectly omit the wife's education from the equation.
If we omit a relevant variable, then the least squares estimator will generally be biased, although it will have lower variance. Including irrelevant variables does not cause the least squares estimator to be biased – however, the variances, and therefore the standard errors, will be greater.
When we omit WEDU, we overstate the effect of an extra year of education for the husband by about $2000. This change in the magnitude of a coefficient is typical of the effect of incorrectly omitting a relevant variable.
To write a general expression for this bias for the case where one explanatory variable is omitted from a model with two explanatory variables, we write the underlying model as:
y_i = β1 + β2x_i2 + β3x_i3 + e_i
Omitting x3 from the equation is equivalent to imposing the restriction β3 = 0. It can be viewed as imposing an incorrect constraint on the parameters. This has the implication of a reduced variance, but causes biased coefficient estimators. We can show (in Appendix 6B) that for the new estimate b2* of β2:
bias(b2*) = E(b2*) − β2 = β3 · cov(x2, x3)/var(x2)
Omission of a relevant variable leads to omitted variable bias. The bias increases with the correlation between the included and the omitted relevant variable. Note: if cov(x2, x3) = 0 or if β3 = 0, then the bias will be 0, i.e. b2* will be unbiased.
We can include further variables, for instance KL6 – the number of children under the age of 6. The larger the number of young children, the fewer the hours likely to be worked, and hence a lower family income would be expected.
F̂AMINC = −7755 + 3211·HEDU + 4777·WEDU − 14311·KL6
(se)      (11163)  (796)       (1061)       (5004)
(p-value) (0.488)  (0.000)     (0.000)      (0.004)
Notice that, compared with the original estimated equation, the coefficients on HEDU and WEDU have not changed considerably. This outcome occurs because KL6 is not highly correlated with the education variables: Corr(KL6, HEDU) = 0.105 and Corr(KL6, WEDU) = 0.129. From a general modelling perspective, it means that useful results can still be obtained when a relevant variable is omitted, if that variable is uncorrelated with the included variables and our interest is in the coefficients of the included variables.
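The bias formula can be seen directly in a small simulation, sketched below with assumed parameter values: when x3 is omitted, the estimate of β2 settles near β2 + β3·cov(x2, x3)/var(x2).

```python
# Omitted-variable bias: the short regression recovers β2 plus the bias term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, beta2, beta3 = 100_000, 3.0, 5.0
x2 = rng.normal(0, 1, n)
x3 = 0.5 * x2 + rng.normal(0, 1, n)          # x3 correlated with x2
y = 1.0 + beta2 * x2 + beta3 * x3 + rng.normal(0, 1, n)

short = sm.OLS(y, sm.add_constant(x2)).fit()  # x3 incorrectly omitted
bias = beta3 * np.cov(x2, x3)[0, 1] / np.var(x2)
print(short.params[1], beta2 + bias)          # both ≈ 5.5
```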
Irrelevant Variables:
The consequences of omitting relevant variables may lead you to think that a good strategy is to include as many variables as possible in your model. However, this will:
1. Complicate your model
2. Inflate the variances of your estimates
To examine this, we will add two artificially generated variables, X5 and X6. These variables were constructed so that they are correlated with HEDU and WEDU, but are not expected to influence family income.
F̂AMINC = −7759 + 3340·HEDU + 5869·WEDU − 14200·KL6 + 889·X5 − 1067·X6
(se)      (11195)  (1250)      (2278)       (5044)      (2242)   (1982)
(p-value) (0.488)  (0.000)     (0.000)      (0.004)     (0.692)  (0.591)
The first thing we notice is that the p-values for the two new coefficients are much greater than 0.05; they do indeed appear to be irrelevant variables. Also, the standard errors of the coefficients of all other variables have increased, with p-values increasing correspondingly. The inclusion of these irrelevant variables has reduced the precision of the estimated coefficients of the other variables in the equation. This result follows because, by the Gauss–Markov theorem, the least squares estimator of the correct model is the minimum variance linear unbiased estimator.
A Practical Approach:
We should choose a functional form that:
1. Is consistent with what economic theory tells us about the relationship between the variables
2. Is compatible with assumptions MR1 to MR5
3. Is flexible enough to fit the data
In a multiple regression context, this mainly involves:
1. Hypothesis testing
2. Performing residual analysis
3. Assessing forecasting performance
4. Comparing information criteria
5. Using the principle of parsimony
Hypothesis Testing:
The usual t- and F-tests are available for testing simple and joint hypotheses concerning the coefficients. As usual, failure to reject a null hypothesis can occur because the data are not sufficiently rich to disprove the hypothesis. If a variable has an insignificant coefficient, it can either be (a) discarded because it is irrelevant, or (b) retained because there are strong theoretical reasons for including it.
The adequacy of a model can also be tested using a general specification test known as RESET.
Testing for Model Misspecification: RESET
RESET (Regression Specification Error Test) is designed to detect omitted variables and incorrect functional form.
Intuition (Ramsey RESET test): If the chosen model and algebraic form are correct, then squared and cubed terms of the "fitted" or "predicted" values should not contain any explanatory power. If we can significantly improve the model by artificially including powers of the predictions of the model, then the original model must have been inadequate.
Hypotheses:
H0: The functional form is correct and there are no omitted variables (the extra terms are not statistically significant)
H1: The functional form is incorrect and/or there are omitted variables (the extra terms are statistically significant)
Suppose that we have specified and estimated the regression model:
y_i = β1 + β2x_i2 + β3x_i3 + e_i
The predicted or "fitted" values of y are:
ŷ_i = b1 + b2x_i2 + b3x_i3
There are two alternative forms of the test:
Artificial Model 1: y_i = β1 + β2x_i2 + β3x_i3 + γ1ŷ_i² + e_i
Artificial Model 2: y_i = β1 + β2x_i2 + β3x_i3 + γ1ŷ_i² + γ2ŷ_i³ + e_i
Example: FAMINC model
Step 1: State Hypotheses
H0: γ = 0    H1: γ ≠ 0
Step 2: Decision Rule
Reject H0 if p-value < α = 0.05
Step 3: Calculate Test Statistic
p-value = 0.0440
Step 4: Compare and Decision
0.0440 < 0.05, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that there are omitted variables or that the functional form is incorrect.
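RESET can be carried out by hand, as sketched below on synthetic stand-in data: fit the model, add powers of the fitted values, and F-test the added terms. (Recent versions of statsmodels also provide a built-in helper for this, but the manual version makes the mechanics explicit.)

```python
# RESET by hand: augment the model with ŷ² and ŷ³ and test them jointly.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x2": rng.uniform(1, 10, 200), "x3": rng.uniform(1, 10, 200)})
df["y"] = 2 + 0.5 * df["x2"] ** 2 + 3 * df["x3"] + rng.normal(0, 2, 200)  # true form nonlinear

base = smf.ols("y ~ x2 + x3", data=df).fit()
df["yhat2"] = base.fittedvalues ** 2
df["yhat3"] = base.fittedvalues ** 3

aug = smf.ols("y ~ x2 + x3 + yhat2 + yhat3", data=df).fit()
print(aug.f_test("yhat2 = 0, yhat3 = 0"))   # small p-value signals misspecification
```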
Selection of Models – Information Criteria
Akaike Information Criterion (AIC):
• Is often used in model selection for non-nested alternatives – smaller values of the AIC are preferred:
AIC = ln(SSE/N) + 2K/N
The Schwarz Criterion (SC):
• Is an alternative to the AIC that imposes a larger penalty for additional coefficients:
SC = ln(SSE/N) + K·ln(N)/N
Adjusted R²:
• Penalizes the addition of regressors that do not contribute to the explanatory power of the model. It is sometimes used to select regressors, although the AIC and SC are superior. It does not have the interpretation of R².
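The AIC and SC formulas above are easy to apply directly. The sketch below compares the two sales models met earlier by their sums of squared errors (the SSE values are from those examples; smaller is better for both criteria).

```python
# AIC and SC from SSE, N and K, using the course formulas.
import numpy as np

N = 75
models = [("K = 3", 3, 1718.943), ("K = 4", 4, 1532.084)]  # (label, K, SSE)
for label, K, SSE in models:
    aic = np.log(SSE / N) + 2 * K / N
    sc = np.log(SSE / N) + K * np.log(N) / N
    print(label, round(aic, 3), round(sc, 3))
```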
Collinear Economic Variables:
When data are the result of an uncontrolled experiment, many of the economic variables may move together in systematic ways. Such variables are said to be collinear (moving together in a linear way), and the problem is labelled collinearity, or multicollinearity when several variables are involved.
When there is collinearity, there is no guarantee that the data will be "rich in information", nor that it will be possible to isolate the economic relationship or the parameters of interest.
Consequences of collinearity:
1. One or more exact linear relationships among the explanatory variables: exact collinearity, or exact multicollinearity. The least squares estimator is not defined. Recall that:
b = (XᵀX)⁻¹Xᵀy
From linear algebra, a matrix whose rows and columns are not linearly independent does not have an inverse, so under exact collinearity (XᵀX)⁻¹, and hence b, cannot be calculated.
2. Nearly exact linear dependencies among the explanatory variables: some of the variances, standard errors and covariances of the least squares estimators may be large. In the model with two explanatory variables:
var(b2) = σ² / [(1 − r23²)·Σ(x_i2 − x̄2)²]
For perfect collinearity, r23 = −1 or 1, so (1 − r23²) = 0; for near-perfect collinearity, r23 ≈ −1 or 1, so (1 − r23²) ≈ 0.
3. Large standard errors make the usual t-values small and lead to the conclusion that parameter estimates are not significantly different from 0, ALTHOUGH a high R² or F-value indicates "significant" explanatory power of the model as a whole:
t_calc = b_i/se(b_i) = a small value
In general we reject H0 (βi = 0) only if |t_calc| > |t_crit|; with inflated standard errors we fail to reject, and would conclude that βi is not different from 0.
4. Estimates may be very sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable.
5. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate forecasts may still be possible.
Example – Chinese Coal Production
We can detect multicollinearity by:
• Computing sample correlation coefficients between variables. A common rule of thumb is that multicollinearity is a problem if the sample correlation between any pair of variables is greater than about 0.8 or 0.9. (This only looks at pairs of variables.)
• Estimating auxiliary regressions, i.e. regressing each explanatory variable on all the others. Multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater than about 0.8. (This looks at combinations of variables, e.g. x2 = 2x3 + 5x4.)
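Both detection devices are sketched below on synthetic data in which x3 is constructed to be nearly collinear with x2; the auxiliary R² is also converted to the variance inflation factor (VIF = 1/(1 − R²)).

```python
# Detecting multicollinearity: pairwise correlations and an auxiliary regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
x2 = rng.normal(size=300)
x3 = 2 * x2 + rng.normal(scale=0.1, size=300)   # nearly collinear with x2
x4 = rng.normal(size=300)
X = pd.DataFrame({"x2": x2, "x3": x3, "x4": x4})

print(X.corr().round(3))                        # rule of thumb: |r| > 0.8 or 0.9

aux = sm.OLS(X["x3"], sm.add_constant(X[["x2", "x4"]])).fit()
print(aux.rsquared, 1 / (1 - aux.rsquared))     # auxiliary R² and the VIF
```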
Pair-wise Correlations:
Conclusion: The pair-wise correlation between some of the inputs is extremely high, such as between ln(x2) and ln(x3).
Auxiliary regression on ln(x3):
Solution: A possible solution in this case is to use non-sample information:
1. Constant returns to scale
2. Variables 4, 5 & 6 are all statistically insignificant (= 0)
Conduct a Wald test of:
H0: β2 + β3 + β4 + β5 + β6 + β7 = 1, and β4 = 0, β5 = 0, β6 = 0
Mitigating the Effects of Multicollinearity:
The collinearity problem occurs because the data do not contain enough information about the individual effects of the explanatory variables. We can bring more information into the estimation process by:
• Obtaining more, and better, data – not always possible in non-experimental contexts
• Introducing non-sample information into the estimation process in the form of restrictions on the parameters
Nonlinear Relationships:
Relationships between economic variables cannot always be adequately represented by straight lines. We saw in Week 4 that we can add more flexibility to a regression model by considering logarithmic, reciprocal, polynomial and various other nonlinear-in-the-variables functional forms – models that are linear in the parameters but non-linear in the variables.
We can also use these types of functional forms in multiple regression models. In multiple regression models we additionally use models that involve interaction terms. When using these types of models, some changes in model interpretation are required.
Introductory Econometrics: ECON2300 – Dummy Variable Models
The Use of Dummy Variables in Econometric Models:
Assumption MR1 in the multiple regression model is:
y_i = β1 + β2x_i2 + ... + βK x_iK + e_i,  for i = 1, ..., N
1. The statistical model we assume is appropriate for all N observations in our sample
2. The parameters of the model, βk, are the same for each and every observation
3. If this assumption does not hold, and the parameters are not the same for all the observations, then the meaning of the least squares estimates of the parameters is not clear
There are some economic problems or questions where we might expect the parameters to be different for different observations:
1. Everything else the same, is there a difference between male and female earnings?
2. Does studying econometrics make a difference in the starting salaries of graduates?
3. Does having a pool make a difference to a house's sale price in the Brisbane market?
4. Is there a difference in the demand for illicit drugs across race groups?
Dummy variables:
1. Are the simplest procedure for extending the multiple regression model to situations in which the regression parameters are different for some or all of the observations in a sample
2. Are explanatory variables that take only two values, usually 0 and 1
3. Are a very powerful tool for capturing qualitative characteristics of individuals, such as gender, race and geographic region of residence
There are two main types of dummy variables:
1. Intercept dummy variables: coefficients denoted δ
2. Slope dummy variables: coefficients denoted γ
Intercept Dummy Variables:
Intercept dummy variables allow the intercept to change for a subset of observations in the sample. Models with intercept dummy variables take the form:
y_i = β1 + δD_i + β2x_i2 + ... + βK x_iK + e_i
where D_i = 1 if the i-th observation has a certain characteristic, and D_i = 0 otherwise.
E(y_i) = (β1 + δ) + β2x_i2 + ... + βK x_iK   if D_i = 1  (intercept: β1 + δ)
E(y_i) = β1 + β2x_i2 + ... + βK x_iK         if D_i = 0  (intercept: β1)
Note that the properties of the least squares estimator are not affected by the fact that one of the explanatory variables consists only of zeros and ones – D is treated like any other explanatory variable. We can construct an interval estimate for δ, or we can test the significance of its least squares estimate. Such a test is a statistical test of whether the effect is "statistically significant". If δ = 0, the characteristic has no effect on the variable in question.
Example: House prices
A model that allows the intercept to vary with the presence or absence of a particular characteristic. Estimated equation:
P̂rice = 29.68 + 5.69·Pool + 8.60·Sqft
In this model the value Pool = 0 defines the reference group (homes with no pool). Substituting the two values of Pool gives the two equivalent equations:
Pool = 1: P̂rice = (29.68 + 5.69) + 8.60·Sqft = 35.37 + 8.60·Sqft
Pool = 0: P̂rice = 29.68 + 8.60·Sqft
Log-Linear Models:
Suppose ln(PRICE) = β1 + β2SQFT + δPOOL + e. Then:
If a pool:  ln(PRICE_pool) = (β1 + δ) + β2SQFT + e
If no pool: ln(PRICE_nopool) = β1 + β2SQFT + e
Subtracting: ln(PRICE_pool) − ln(PRICE_nopool) = δ
Then:
ln(PRICE_pool/PRICE_nopool) = δ  ⟹  PRICE_pool/PRICE_nopool = e^δ
And:
(PRICE_pool − PRICE_nopool)/PRICE_nopool = e^δ − 1
Thus, houses with pools are 100(e^δ − 1)% more expensive than houses without pools, all other things being equal.
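A short sketch of this interpretation on synthetic data (the true δ is set to 0.10 here purely for illustration): fit the log-linear model with an intercept dummy, then convert δ̂ to a percentage effect.

```python
# Log-linear model with an intercept dummy and the 100(e^δ − 1)% interpretation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 500
df = pd.DataFrame({"SQFT": rng.uniform(10, 40, n),
                   "POOL": rng.integers(0, 2, n)})
df["lnPRICE"] = 3.0 + 0.04 * df["SQFT"] + 0.10 * df["POOL"] + rng.normal(0, 0.1, n)

res = smf.ols("lnPRICE ~ SQFT + POOL", data=df).fit()
delta_hat = res.params["POOL"]
print(100 * (np.exp(delta_hat) - 1))   # ≈ 10.5% premium for a pool
```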
Slope Dummy Variables:
Slope dummy variables allow the slope to change for a subset of observations in the sample. A model that allows the coefficient of x2 to vary across observations takes the form:
y_i = β1 + β2x_i2 + γ(D_i·x_i2) + β3x_i3 + ... + βK x_iK + e_i
E(y_i) = β1 + (β2 + γ)x_i2 + ... + βK x_iK   if D_i = 1  (slope of x2: β2 + γ)
E(y_i) = β1 + β2x_i2 + ... + βK x_iK         if D_i = 0  (slope of x2: β2)
Slope and Intercept Dummy Variables Combined:
Testing for Qualitative Effects:
Dummy variables are frequently used to measure:
1. Interactions between qualitative factors (e.g. race and gender)
2. The effects of qualitative factors having more than two categories (e.g. level of schooling)
Example: WAGES
Explaining wages as a function of individual characteristics, using white males as the reference group:
WAGE = β1 + β2EDUC + δ1BLACK + δ2FEMALE + γ(BLACK×FEMALE) + e
Note that the interaction coefficient γ applies only to workers who are both black and female.
To test the null hypothesis that neither race nor gender affects wages, at the 1% level of significance, we test whether δ1, δ2 and γ are jointly zero.
Now, explaining wages as a function of location, using workers in the northeast as the reference group:
WAGE = β1 + β2EDUC + δ1SOUTH + δ2MIDWEST + δ3WEST + e
(In the estimated model, the regional dummies are not significant at the 5% level of significance.)
Testing the Equivalence of Two Regressions:
By including an intercept dummy variable and an interaction term for every variable in a regression model, we allow every coefficient in the model to differ based on the qualitative factor – we are specifying two regressions. A test of the equivalence of the two regressions is a test of the joint null hypothesis that all the dummy variable coefficients are zero. We can test this null hypothesis using a standard F-test. This particular F-test is known as a Chow test.
Explaining wage as a function of individual characteristics:
WAGE = β1 + β2EDUC + δ1BLACK + δ2FEMALE + γ(BLACK×FEMALE) + e
To test whether there are differences between the wage regressions for the south and the rest of the country, we estimate a model that adds SOUTH, and the interaction of SOUTH with every regressor, to this equation (with coefficients θ1, ..., θ5). The two regression equations are:
If SOUTH = 1: WAGE = (β1 + θ1) + (β2 + θ2)EDUC + (δ1 + θ3)BLACK + (δ2 + θ4)FEMALE + (γ + θ5)(BLACK×FEMALE) + e
If SOUTH = 0: WAGE = β1 + β2EDUC + δ1BLACK + δ2FEMALE + γ(BLACK×FEMALE) + e
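The Chow test can be computed directly from the sums of squared errors: the pooled regression is the restricted model, and the two subsample regressions together form the unrestricted model. A sketch on synthetic data, with a simplified model (WAGE on EDUC only) for brevity:

```python
# Chow test: pooled SSE (restricted) vs sum of subsample SSEs (unrestricted).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({"EDUC": rng.uniform(8, 20, n),
                   "SOUTH": rng.integers(0, 2, n)})
df["WAGE"] = 2 + 1.2 * df["EDUC"] - 0.5 * df["SOUTH"] + rng.normal(0, 3, n)

pooled = smf.ols("WAGE ~ EDUC", data=df).fit()
south = smf.ols("WAGE ~ EDUC", data=df[df.SOUTH == 1]).fit()
rest = smf.ols("WAGE ~ EDUC", data=df[df.SOUTH == 0]).fit()

K = 2                                   # coefficients per regime
J = K                                   # restrictions: all coefficients equal
SSE_R, SSE_U = pooled.ssr, south.ssr + rest.ssr
F = ((SSE_R - SSE_U) / J) / (SSE_U / (n - 2 * K))
p = 1 - stats.f.cdf(F, J, n - 2 * K)
print(F, p)
```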
A Chow test at the 10% level of significance compares the restricted (pooled) and unrestricted (separate) sums of squared errors in the usual F-statistic.
Controlling For Time:
Dummy variables are frequently used to control for:
• Seasonal effects
• Annual effects
• Regime effects (e.g. changes of government policy)
Example: Emergency room cases
Data on the number of emergency room cases per day are available in the file fullmoon.wk1. The model explains the daily number of cases using time dummy variables, including an indicator for days with a full moon.
Example – Stockton house prices
Example – Investment tax credits
ECONOMETRICS: ECON2300 – Lecture 7
Heteroskedasticity
If we were to guess food expenditure for a low-income household and for a high-income household, we would be more accurate for the low-income household: it has less choice, and only a limited income which it MUST largely spend on food. A high-income household, on the other hand, could have extravagant or simple food tastes – there is a large variance in food expenditure at high income levels, resulting in heteroskedasticity.
How can we model this phenomenon? Note that assumption MR3 says that the errors have equal variance, or equal (homo) spread (skedasticity). An alternative and much more general assumption is:
var(e_i) = σ_i²
Heteroskedasticity is often encountered in cross-section studies, where different individuals may have very different characteristics. It is less common in time-series studies.
Properties of the OLS Estimator:
If the errors are heteroskedastic then:
• OLS is still a linear and unbiased estimator, but it is inefficient: it is no longer BLUE (the best linear unbiased estimator)
• The variances of the OLS estimators are no longer given by the formulas we discussed in earlier lectures. Thus, confidence intervals and hypothesis tests based on those variances are no longer valid.
There are three alternative courses of action to deal with heteroskedasticity:
1. If in doubt, use least squares for the parameters together with a standard-error formula that works either way (White robust standard errors)
2. If the form of the heteroskedasticity is known, use generalised least squares (weighted least squares) – BLUE if the variance function is known
3. Test for heteroskedasticity (Goldfeld–Quandt test, White's general test, or the Breusch–Pagan test):
a. If it is present, use feasible generalised least squares (when the variances are unknown and must be estimated)
b. If there is no evidence of it, use least squares, as it is then BLUE
White's Approximate Estimator for the Variances of the Least Squares Estimator under Heteroskedasticity:
White's estimator:
a) Is strictly appropriate only in large samples
b) If the errors are homoskedastic, converges to the least squares formula
The variances of the OLS estimators depend on σ_i² rather than σ². In the case of the simple linear model y_i = β1 + β2x_i + e_i, the variance of b2 is given by:
var(b2) = Σᵢ [(x_i − x̄)/Σ(x_i − x̄)²]²·σ_i² = Σᵢ w_i²·σ_i²
If we replace σ_i² with ê_i², we obtain White's heteroskedasticity-consistent estimator. White's robust standard errors leave the coefficient estimates unchanged but give different (corrected) standard errors.
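In statsmodels this is a one-line option, as sketched below on synthetic data in which the error variance grows with x: the coefficients are identical under both fits, but the standard errors differ.

```python
# White (heteroskedasticity-robust) standard errors vs conventional ones.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 300)
e = rng.normal(0, 0.5 * x)                  # error variance grows with x
y = 2 + 3 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White's robust standard errors
print(ols.params, robust.params)            # identical coefficients
print(ols.bse, robust.bse)                  # different standard errors
```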
What would happen if we always computed the standard errors (and therefore the t-ratios) using White's formula instead of the traditional least squares formula? This is known as heteroskedasticity-robust inference, and it is used by many applied economists. Robust estimation is a "branch" of econometrics. When the true variance is homoskedastic and the sample is large, White's formula converges approximately to the usual least squares formula, with σ̂² = SSE/N.
The Generalised Least Squares (Weighted Least Squares) Estimator:
1. Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator
2. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors, and then use least squares
3. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic-error model into a homoskedastic-error model
If σ_i² is known, then we can weight the original data (including the constant term) by 1/σ_i and then perform OLS on the transformed model:
y_i/σ_i = β1(1/σ_i) + β2(x_i2/σ_i) + ... + βK(x_iK/σ_i) + e_i/σ_i
or
y_i* = β1x_i1* + β2x_i2* + ... + βK x_iK* + e_i*
The transformed model satisfies all the assumptions of the multiple regression model (including homoskedasticity). Thus, applying OLS to the transformed model yields best linear unbiased estimates. The estimator is known as generalised least squares (GLS) or weighted least squares (WLS).
Sometimes σ_i² is known only up to a factor of proportionality. In this case we can still transform the original model in such a way that the transformed errors are homoskedastic. Some popular heteroskedastic specifications:
If σ_i² = σ²·x_ij², divide by x_ij:
var(e_i*) = var(e_i/x_ij) = (1/x_ij²)·var(e_i) = (1/x_ij²)·σ²·x_ij² = σ²
If σ_i² = σ²·x_ij, divide by √x_ij:
var(e_i*) = var(e_i/√x_ij) = (1/x_ij)·var(e_i) = (1/x_ij)·σ²·x_ij = σ²
If our assumption about the form of the heteroskedasticity is incorrect, GLS is no longer best, and its reported standard errors are invalid.
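A sketch of weighted least squares under the first specification above, var(e_i) = σ²·x_i², on synthetic data: in statsmodels, the weights argument is proportional to 1/σ_i², here 1/x_i².

```python
# WLS when the error standard deviation is proportional to x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(0, 0.8 * x)      # sd proportional to x

X = sm.add_constant(x)
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()  # weights ∝ 1/σ_i²
print(wls.params, wls.bse)
```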
Feasible Generalised Least Squares:
If we reject the null hypothesis of homoskedasticity, we might wish to use an estimation technique for the coefficients and the standard errors that accounts for heteroskedasticity. We have already shown that if we "weight" the original data by some appropriate value, we can achieve a transformed model with homoskedastic errors that can be estimated by ordinary least squares (OLS). Note, however, that the task of finding an appropriate weight in a multiple regression model is more complicated, as several variables are potential candidates.
Feasible generalised least squares is based on the idea that we should use all the information available; therefore we construct a suitable weight that is a function of all the explanatory variables in the original model. If σ_i² is unknown, it must be estimated. The resulting estimator is known as feasible generalised least squares (FGLS). A popular specification is:
σ_i² = exp(α1 + α2z_i2 + ... + αS z_iS)
In this case, we estimate the model:
ln(ê_i²) = α1 + α2z_i2 + ... + αS z_iS + v_i
and then use the variance estimator:
σ̂_i² = exp(α̂1 + α̂2z_i2 + ... + α̂S z_iS)
The aim is to produce a prediction σ̂_i², based on this model, and then use it to weight the original model.
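The two-step procedure is sketched below on synthetic data with a known exponential variance function: regress ln(ê²) on the z variables (here simply z = x), predict σ̂_i², then run WLS with weights 1/σ̂_i².

```python
# Feasible GLS: estimate the variance function, then weight the model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 500
x = rng.uniform(1, 10, n)
sigma2 = np.exp(0.5 + 0.3 * x)                  # true variance function
y = 2 + 3 * x + rng.normal(0, np.sqrt(sigma2))

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Step 1: regress ln(ê²) on the z variables (here z = x)
aux = sm.OLS(np.log(ols.resid**2), X).fit()
sigma2_hat = np.exp(aux.fittedvalues)

# Step 2: weighted least squares with the estimated variances
fgls = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
print(fgls.params, fgls.bse)
```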