12. Example of Correlation questions
Is there an association between:
Educational attainment and income
Children’s IQ and Parents’ IQ
Urban growth and air quality violations?
Number of police patrols and number of crimes
Grade on exam and time on exam
13. Scatterplot
The relationship between any two variables
can be portrayed graphically on an x- and
y-axis.
Each subject i has a pair of scores (xi, yi). When the scores
for an entire sample are plotted, the result
is called a scatterplot.
15. How is the Correlation Coefficient
Computed?
The conceptual formula for the
correlation coefficient is:

r = Σ(X − X̄)(Y − Ȳ) / √( [Σ(X − X̄)²] [Σ(Y − Ȳ)²] )

where X is a person’s or case’s score on the independent variable, Y is a person’s or case’s
score on the dependent variable, and X̄ and Ȳ are the means of the scores on the
independent and dependent variables, respectively. The quantity in the numerator is called the
sum of the cross-products (SP). The quantity in the denominator is the square root of the
product of the sums of squares for both variables (SSx and SSy).
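A minimal sketch of this formula, assuming Python with NumPy (not part of the slides); the scores are made up for illustration, and np.corrcoef is used only as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation via the conceptual formula: SP / sqrt(SSx * SSy)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()           # deviations from the means
    sp = np.sum(dx * dy)                          # sum of cross-products (SP)
    ssx, ssy = np.sum(dx ** 2), np.sum(dy ** 2)   # sums of squares (SSx, SSy)
    return sp / np.sqrt(ssx * ssy)

# Hypothetical scores, not from the slides
x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 6, 8]
print(pearson_r(x, y))             # conceptual formula
print(np.corrcoef(x, y)[0, 1])     # NumPy's built-in, should agree
```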
16. Direction of the relationship
Variables can be positively or negatively
correlated.
Positive correlation: as the value of one variable
increases, the value of the other variable increases.
Negative correlation: as the value of one variable
increases, the value of the other variable decreases.
Zero correlation: the two variables are not
related.
25. Types of Variables
Discrete variables:
Take only whole-number values; cannot be decimals
Number of children
Number of calls you make a day
26. Types of Variables
Continuous variables:
Always numeric
Can be any number, positive or negative
Examples: age in years, weight, blood pressure
readings, temperature, concentrations of
pollutants and other measurements
Categorical variables:
Information that can be sorted into categories
Types of categorical variables – ordinal, nominal
and dichotomous (binary)
27. Categorical Variables:
Ordinal Variables
Ordinal variable—a categorical variable with
some intrinsic order or numeric value
Examples of ordinal variables:
Education (no high school degree, HS degree,
some college, college degree)
Agreement (strongly disagree, disagree, neutral,
agree, strongly agree)
Rating (excellent, good, fair, poor)
Frequency (always, often, sometimes, never)
Any other scale (“On a scale of 1 to 5...”)
28. Categorical Variables:
Nominal Variables
Nominal variable – a categorical variable
without an intrinsic order
Examples of nominal variables:
Where a person lives in the U.S. (Northeast,
South, Midwest, etc.)
Sex (male, female)
Nationality (American, Mexican, French)
Race/ethnicity (African American, Hispanic, White,
Asian American)
Favorite pet (dog, cat, fish, snake)
29. Categorical Variables:
Dichotomous Variables
Dichotomous (or binary) variable – a
categorical variable with only two categories
(levels)
Often represents the answer to a yes or no
question
For example:
“Did you attend the church picnic on May 24?”
“Did you eat potato salad at the picnic?”
Anything with only 2 categories
30. DUMMY VARIABLES
Let’s say that we want to predict the salary a
customer service agent earns. We think that years of
experience is one of the variables (x1).
We would also like to include whether or not the person is a
college graduate. We will use a dummy
variable to include this information. Therefore x2 will
be:
x2 = 0, if the person is not a college graduate.
x2 = 1, if the person is a college graduate.
34. DUMMY VARIABLE EXAMPLE
Y: annual salary
X1: years of experience
X2: 1 if the person has a college degree, 0
otherwise.
Assume that the person has 5 years of experience.
What would his salary be if he is not a college
graduate? What would his salary be if he is a college
graduate?
Estimated regression equation: ŷ = 25 + 2.5x1 + 8x2
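A small sketch of the prediction, assuming Python; the coefficients are those of the fitted equation shown above.

```python
# Prediction from the fitted equation above: y_hat = 25 + 2.5*x1 + 8*x2
# (salary in the slide's units; x2 is the college-degree dummy).
def predict_salary(years_experience, college_grad):
    x1 = years_experience
    x2 = 1 if college_grad else 0   # dummy variable
    return 25 + 2.5 * x1 + 8 * x2

print(predict_salary(5, college_grad=False))  # 5 years, no degree -> 37.5
print(predict_salary(5, college_grad=True))   # 5 years, with degree -> 45.5
```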
42. How to obtain the
parameters
43. Least Squares Method 1
The regression line: E(Y) = Ŷ = a + bX
Error (residual): Y − Ŷ = Y − (a + bX)
Sum of squared errors: Σ(Y − Ŷ)² = Σ(Y − a − bX)²
Expanding one term: (Y − a − bX)² = Y² + a² + b²X² − 2aY − 2bXY + 2abX
Goal: Min Σ(Y − a − bX)²
How do we get the a and b that minimize the sum
of squares of errors?
44. Least Squares Method 2
• Linear algebraic solution
• Compute a and b so that the partial derivatives
with respect to a and b are equal to zero
Partial derivative with respect to a:
∂Σ(Y − a − bX)² / ∂a = −2Σ(Y − a − bX) = 0
ΣY − na − bΣX = 0
a = ΣY/n − b(ΣX/n) = Ȳ − bX̄
45. Least Squares Method 3
Take the partial derivative with respect to b and
plug in the a obtained above, a = Ȳ − bX̄:
∂Σ(Y − a − bX)² / ∂b = −2ΣX(Y − a − bX) = 0
bΣX² − ΣXY + aΣX = 0
bΣX² − ΣXY + (ΣY/n − b·ΣX/n)ΣX = 0
bΣX² − ΣXY + ΣXΣY/n − b(ΣX)²/n = 0
b[ΣX² − (ΣX)²/n] = ΣXY − ΣXΣY/n
b = [ΣXY − ΣXΣY/n] / [ΣX² − (ΣX)²/n] = [nΣXY − ΣXΣY] / [nΣX² − (ΣX)²]
46. Least Squares Method 4
The least squares method is an algebraic solution
that minimizes the sum of squares of errors
(the variance component of error).

b = [nΣXY − ΣXΣY] / [nΣX² − (ΣX)²] = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = SPxy / SSx

a = ΣY/n − b(ΣX/n) = Ȳ − bX̄

a = [ΣYΣX² − ΣXΣXY] / [nΣX² − (ΣX)²]   ← Not recommended
47. OLS: Example 1
No    x     y      x−x̄     y−ȳ     (x−x̄)(y−ȳ)   (x−x̄)²
1     43    128    −14.5    −8.5     123.25       210.25
2     48    120     −9.5   −16.5     156.75        90.25
3     56    135     −1.5    −1.5       2.25         2.25
4     61    143      3.5     6.5      22.75        12.25
5     67    141      9.5     4.5      42.75        90.25
6     70    152     12.5    15.5     193.75       156.25
Mean  57.5  136.5
Sum   345   819                      541.5        561.5
b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = SPxy / SSx = 541.5 / 561.5 = .9644
a = Ȳ − bX̄ = 136.5 − .9644(57.5) = 81.0481
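A sketch reproducing the slide's computation, assuming Python with NumPy (not part of the slides).

```python
import numpy as np

x = np.array([43, 48, 56, 61, 67, 70], dtype=float)
y = np.array([128, 120, 135, 143, 141, 152], dtype=float)

sp  = np.sum((x - x.mean()) * (y - y.mean()))   # SPxy = 541.5
ssx = np.sum((x - x.mean()) ** 2)               # SSx  = 561.5

b = sp / ssx                  # slope     = .9644
a = y.mean() - b * x.mean()   # intercept = 81.0481
print(b, a)

# Cross-check with NumPy's least-squares polynomial fit
print(np.polyfit(x, y, deg=1))   # [slope, intercept]
```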
48. OLS: Example 10-5 (3)
[Scatterplot of y and the fitted values against x (x roughly 40–70, y roughly 120–150), with the fitted regression line]
Ŷ = 81.048 + .964X
49. Hypothesis Testing: regression
parameters
How reliable are the a and b we computed?
A t-test (a Wald test in general) can answer this.
The test statistic is the standardized effect size (effect size /
standard error).
The effect sizes are a − 0 and b − 0, assuming 0 is the
hypothesized value; H0: α=0, H0: β=0
Degrees of freedom is N−K, where K is the
number of regressors + 1
How do we compute the standard error (deviation)?
50. Illustration: Test b
How do we test whether beta is zero (no effect)?
Like y, α and β follow a normal distribution; a
and b follow the t distribution.
b = .9644, SE(b) = .2381, df = N−K = 6−2 = 4
Hypothesis testing:
1. H0: β=0 (no effect), Ha: β≠0 (two-tailed)
2. Significance level = .05, CV = 2.776, df = 6−2 = 4
3. TS = (.9644 − 0)/.2381 = 4.0510 ~ t(N−K)
4. TS (4.051) > CV (2.776), so reject H0
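A sketch of this test, assuming Python with NumPy/SciPy; SE(b) is computed with the usual simple-regression formula √(MSE/SSx), which reproduces the .2381 on the slide.

```python
import numpy as np
from scipy import stats

x = np.array([43, 48, 56, 61, 67, 70], dtype=float)
y = np.array([128, 120, 135, 143, 141, 152], dtype=float)
n, k = len(x), 2                       # K = number of regressors + 1

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

mse  = np.sum(resid ** 2) / (n - k)                # SSE / (N-K) = 31.82
se_b = np.sqrt(mse / np.sum((x - x.mean()) ** 2))  # SE(b) = .2381

ts = (b - 0) / se_b                                # 4.051
cv = stats.t.ppf(0.975, df=n - k)                  # 2.776 for df = 4
print(ts, cv, ts > cv)                             # reject H0: beta = 0
```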
51. Illustration: Test a
How do we test whether alpha is zero?
Like y, α and β follow a normal distribution; a
and b follow the t distribution.
a = 81.0481, SE(a) = 13.8809, df = N−K = 6−2 = 4
Hypothesis testing:
1. H0: α=0, Ha: α≠0 (two-tailed)
2. Significance level = .05, CV = 2.776
3. TS = (81.0481 − 0)/13.8809 = 5.8388 ~ t(N−K)
4. TS (5.839) > CV (2.776), so reject H0
53. ANOVA Table: F-test
H0: all parameters are zero, β0 = β1 = 0
Ha: at least one parameter is not zero
CV is 12.22 for (1, 4) df; TS > CV, so reject H0

Sources    Sum of Squares   DF    Mean Squares       F
Model      SSM              K−1   MSM = SSM/(K−1)    MSM/MSE
Residual   SSE              N−K   MSE = SSE/(N−K)
Total      SST              N−1

Sources    Sum of Squares   DF    Mean Squares   F
Model      522.2124         1     522.2124       16.41047
Residual   127.2876         4     31.8219
Total      649.5000         5
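A sketch of the F test on this table, assuming Python with NumPy/SciPy (not part of the slides).

```python
import numpy as np
from scipy import stats

x = np.array([43, 48, 56, 61, 67, 70], dtype=float)
y = np.array([128, 120, 135, 143, 141, 152], dtype=float)
n, k = len(x), 2

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

ssm = np.sum((y_hat - y.mean()) ** 2)   # model sum of squares    = 522.21
sse = np.sum((y - y_hat) ** 2)          # residual sum of squares = 127.29
sst = np.sum((y - y.mean()) ** 2)       # total sum of squares    = 649.5

F = (ssm / (k - 1)) / (sse / (n - k))   # MSM / MSE = 16.41
p = stats.f.sf(F, k - 1, n - k)         # p-value for F(1, 4)
print(F, p)
```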
54. R2 and Goodness-of-fit
Goodness-of-fit measures evaluate how well
a regression model fits the data.
The smaller the SSE, the better the model fits.
The F test examines whether all parameters are zero
(a large F and a small p-value indicate good fit).
R2 (the coefficient of determination) is SSM/SST;
it measures how much of the overall variance of Y
the model explains.
R2 = SSM/SST = 522.2/649.5 = .80
A large R square means the model fits the data well.
55. Myth and Misunderstanding in R2
R square is the Karl Pearson correlation coefficient
squared: r² = .8967² = .80
If a regression model includes many regressors, R2 is
less useful, if not useless.
Adding any regressor always increases R2,
regardless of the relevance of the regressor.
Adjusted R2 gives a penalty for adding regressors:
Adj. R2 = 1 − [(N−1)/(N−K)](1 − R2)
R2 is not a panacea although its interpretation is
intuitive; if the intercept is omitted, R2 is incorrect.
Check the specification, F, SSE, and the individual parameter
estimates to evaluate your model; a model with a
smaller R2 can be better in some cases.
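A short sketch of these two formulas, assuming Python; the numbers are the SSM, SST, N, and K from the example above.

```python
# R-squared and adjusted R-squared from the example's sums of squares
ssm, sst, n, k = 522.2124, 649.5, 6, 2

r2     = ssm / sst                            # about .80
adj_r2 = 1 - (n - 1) / (n - k) * (1 - r2)     # penalizes extra regressors
print(r2, adj_r2)
```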
113. Ordinary Least Squares (OLS)
The objective of OLS is to minimize the sum of
squared residuals:

min Σ êi²   (sum over i = 1, …, n)

where
Yi = β0 + β1X1i + β2X2i + … + βKXKi + εi
ei = Yi − Ŷi

Remember that OLS is not the only possible
estimator of the βs.
But OLS is the best estimator under certain
assumptions…
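A sketch of this objective, assuming Python with NumPy and hypothetical data: the β̂ vector that minimizes Σêi² can be computed with a standard least-squares solver.

```python
import numpy as np

# Hypothetical data: Y and two regressors X1, X2 (not from the slides)
rng = np.random.default_rng(0)
n = 100
X1, X2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + eps

# Design matrix with a column of ones for the intercept (beta_0)
X = np.column_stack([np.ones(n), X1, X2])

# Coefficients minimizing sum(e_i^2), where e_i = Y_i - Y_hat_i
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_hat
print(beta_hat, np.sum(resid ** 2))
```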
114. Classical Assumptions
1. Regression is linear in parameters
2. Error term has zero population mean
3. Error term is not correlated with X’s
4. No serial correlation
5. No heteroskedasticity
6. No perfect multicollinearity
and we usually add:
7. Error term is normally distributed
115. Assumption 1: Linearity
The regression model:
A) is linear
It can be written as
Yi = β0 + β1X1i + β2X2i + … + βKXKi + εi
This doesn’t mean that the theory must be linear.
For example… suppose we believe that CEO salary is
related to the firm’s sales and the CEO’s tenure.
We might believe the model is:
log(salaryi) = β0 + β1log(salesi) + β2tenurei + β3tenurei² + εi
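A sketch of this point, assuming Python with NumPy and hypothetical CEO data: the model is nonlinear in the variables, but it is still linear in the parameters, so the transformed regressors simply become columns of the design matrix.

```python
import numpy as np

# Hypothetical CEO data (not from the slides)
rng = np.random.default_rng(1)
n = 200
sales  = rng.lognormal(mean=8, sigma=1, size=n)
tenure = rng.uniform(0, 30, size=n)
eps    = rng.normal(scale=0.2, size=n)
log_salary = 4 + 0.3 * np.log(sales) + 0.05 * tenure - 0.001 * tenure**2 + eps

# log(sales), tenure, and tenure^2 enter the design matrix like any other X
X = np.column_stack([np.ones(n), np.log(sales), tenure, tenure**2])
beta_hat, *_ = np.linalg.lstsq(X, log_salary, rcond=None)
print(beta_hat)   # close to (4, 0.3, 0.05, -0.001)
```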
116. Assumption 1: Linearity
The regression model:
B) is correctly specified
The model must have the right variables
No omitted variables
The model must have the correct functional form
This is all untestable; we need to rely on economic
theory.
117. Assumption 1: Linearity
The regression model:
C) must have an additive error term
The model must include the additive term + εi
118. Assumption 2: E(εi)=0
Error term has a zero population mean
E(εi)=0
Each observation has a random error with
a mean of zero
What if E(εi)≠0?
This is actually fixed by adding a constant
(AKA intercept) term
119. Assumption 2: E(εi)=0
Example: Suppose instead the mean of εi
was -4.
Then we know E(εi+4)=0
We can add 4 to the error term and
subtract 4 from the constant term:
Yi =β0+ β1Xi+εi
Yi =(β0-4)+ β1Xi+(εi+4)
120. Assumption 2: E(εi)=0
Yi =β0+ β1Xi+εi
Yi =(β0-4)+ β1Xi+(εi+4)
We can rewrite:
Yi =β0*+ β1Xi+εi*
Where β0*= β0-4 and εi*=εi+4
Now E(εi*)=0, so we are OK.
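A small simulation sketch of this argument, assuming Python with NumPy; all numbers are hypothetical.

```python
import numpy as np

# The true error has mean -4, but a regression that includes an intercept
# simply absorbs the -4 into beta_0; the slope is unaffected.
rng = np.random.default_rng(2)
n = 10_000
x = rng.uniform(0, 10, size=n)
eps = rng.normal(loc=-4, scale=1, size=n)   # E(eps) = -4, not 0
y = 3 + 2 * x + eps                         # true beta_0 = 3, beta_1 = 2

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # roughly (-1, 2): intercept absorbs the -4, slope is fine
```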
121. Assumption 3: Exogeneity
Important!!
All explanatory variables are uncorrelated
with the error term
E(εi|X1i, X2i, …, XKi) = 0
Explanatory variables are determined
outside of the model (They are
exogenous)
122. Assumption 3: Exogeneity
What happens if assumption 3 is violated?
Suppose we have the model,
Yi =β0+ β1Xi+εi
Suppose Xi and εi are positively correlated
When Xi is large, εi tends to be large as
well.
126. Assumption 3: Exogeneity
Why would x and ε be correlated?
Suppose you are trying to study the
relationship between the price of a
hamburger and the quantity sold across a
wide variety of Ventura County
restaurants.
127. Assumption 3: Exogeneity
We estimate the relationship using the
following model:
salesi= β0+β1pricei+εi
What’s the problem?
128. Assumption 3: Exogeneity
What’s the problem?
What else determines sales of hamburgers?
How would you decide between buying a
burger at McDonald’s ($0.89) or a burger at TGI
Fridays ($9.99)?
Quality differs
In salesi = β0 + β1pricei + εi, quality isn’t an X
variable even though it should be.
It becomes part of εi.
129. Assumption 3: Exogeneity
What’s the problem?
But price and quality are highly positively
correlated
Therefore x and ε are also positively correlated.
This means that the estimate of β1 will be too
high.
This is called “Omitted Variables Bias” (More in
Chapter 6)
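A small simulation sketch of this story, assuming Python with NumPy; the price, quality, and sales numbers are hypothetical.

```python
import numpy as np

# Quality raises sales and is positively correlated with price; omitting it
# biases the estimated price coefficient upward relative to the true effect.
rng = np.random.default_rng(3)
n = 5_000
quality = rng.normal(size=n)
price   = 5 + 2 * quality + rng.normal(size=n)      # price and quality correlated
sales   = 100 - 3 * price + 8 * quality + rng.normal(size=n)

X_short = np.column_stack([np.ones(n), price])            # quality omitted
X_full  = np.column_stack([np.ones(n), price, quality])   # quality included

b_short, *_ = np.linalg.lstsq(X_short, sales, rcond=None)
b_full,  *_ = np.linalg.lstsq(X_full,  sales, rcond=None)
print(b_short[1])   # biased upward, well above the true -3
print(b_full[1])    # close to the true -3
```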
130. Assumption 4: No Serial Correlation
Serial Correlation: The error terms across
observations are correlated with each
other
i.e. ε1 is correlated with ε2, etc.
This is most important in time series
If errors are serially correlated, an
increase in the error term in one time
period affects the error term in the next.
131. Assumption 4: No Serial Correlation
The assumption that there is no serial
correlation can be unrealistic in time series
Think of data from a stock market…
132. Assumption 4: No Serial Correlation
[Figure: Real S&P 500 stock price index, 1870–2020]
Stock data is serially correlated!
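A sketch of serially correlated errors, assuming Python with NumPy; the AR(1) process here is hypothetical, not the S&P data.

```python
import numpy as np

# AR(1) errors: each period's error carries over part of the previous one,
# so eps_t is correlated with eps_{t-1}, violating the no-serial-correlation assumption.
rng = np.random.default_rng(4)
T, rho = 500, 0.9
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + rng.normal()

# Correlation between eps_t and eps_{t-1} is near rho, not 0
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])
```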
133. Assumption 5: Homoskedasticity
Homoskedasticity: The error has a
constant variance
This is what we want…as opposed to
Heteroskedasticity: The variance of the
error depends on the values of Xs.
137. Assumption 6: No Perfect Multicollinearity
Two variables are perfectly collinear if one
can be determined perfectly from the other
(i.e. if you know the value of x, you can
always find the value of z).
Example: If we regress income on age,
and include both age in months and age in
years.
But age in years = age in months/12
e.g. if we know someone is 246 months old, we
also know that they are 20.5 years old.
138. Assumption 6: No Perfect Multicollinearity
What’s wrong with this?
incomei= β0 + β1agemonthsi +
β2ageyearsi + εi
What is β1?
It is the change in income associated with
a one unit increase in “age in months,”
holding age in years constant.
But if you hold age in years constant, age in
months doesn’t change!
139. Assumption 6: No Perfect Multicollinearity
β1 = Δincome/Δagemonths
Holding Δageyears = 0
If Δageyears = 0, then Δagemonths = 0
So β1 = Δincome/0
It is undefined!
140. Assumption 6: No Perfect Multicollinearity
When one independent variable
is a perfect linear combination of one or more of the other
independent variables, it is called Perfect
Multicollinearity
Example: Total Cholesterol, HDL and LDL
Total Cholesterol = LDL + HDL
Can’t include all three as independent
variables in a regression.
Solution: Drop one of the variables.
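A sketch of the age-in-months/age-in-years problem, assuming Python with NumPy; the ages are hypothetical.

```python
import numpy as np

# Age in years is an exact linear function of age in months, so the
# design matrix is rank deficient and the coefficients are not identified.
rng = np.random.default_rng(5)
n = 50
age_months = rng.integers(240, 600, size=n).astype(float)
age_years  = age_months / 12                      # perfectly collinear column

X = np.column_stack([np.ones(n), age_months, age_years])
print(np.linalg.matrix_rank(X))                   # 2, not 3: rank deficient
# X.T @ X is singular (or numerically so), so the normal equations have no
# unique solution; the fix is to drop one of the collinear variables.
```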
142. Assumption 7: Normally Distributed Error
This is not required for OLS, but it
is important for hypothesis testing.
More on this assumption next time.
143. Putting it all together
Last class, we talked about how to compare
estimators. We want:
1. β̂ is unbiased: E(β̂) = β
on average, the estimator is equal to the population
value
2. β̂ is efficient
The variance of the estimator is as small as possible
145. Gauss-Markov Theorem
Given OLS assumptions 1 through 6, the
OLS estimator of βk is the minimum
variance estimator from the set of all linear
unbiased estimators of βk for k=0,1,2,…,K
OLS is BLUE
The Best, Linear, Unbiased Estimator
146. Gauss-Markov Theorem
What happens if we add assumption 7?
Given assumptions 1 through 7, OLS is
the best unbiased estimator
Even among the non-linear estimators
OLS is BUE?
147. Gauss-Markov Theorem
With Assumptions 1-7, OLS is:
1. Unbiased: E(β̂) = β
2. Minimum Variance – the variance of the sampling
distribution is as small as possible
3. Consistent – as n → ∞, the estimators
converge to the true parameters
As n increases, the variance gets smaller, so each estimate
approaches the true value of β.
4. Normally Distributed. You can apply
statistical tests to them.