Problem set 4
Jonathan Zimmermann
7 November 2015
Exercise 1
Use the data in meap00_01.RData to answer this question. The original source of this data
set is Michigan Department of Education (www.michigan.gov/mde).
a) Estimate the model
math4 = β0 + β1lunch + β2log(enroll) + β3log(exppp) + u
by OLS and obtain the usual standard errors and the robust standard errors. How do they generally compare?
>>
library(dplyr)
library(ggvis)
library(sandwich)
library(lmtest)
library(car)
load("meap00_01.RData")
Standard OLS:
a = lm(math4 ~ lunch + log(enroll) + log(exppp), data) %>% print()
##
## Call:
## lm(formula = math4 ~ lunch + log(enroll) + log(exppp), data = data)
##
## Coefficients:
## (Intercept) lunch log(enroll) log(exppp)
## 91.9324 -0.4487 -5.3992 3.5248
summary(a)
##
## Call:
## lm(formula = math4 ~ lunch + log(enroll) + log(exppp), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.326 -8.758 0.914 9.164 51.416
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91.93240 19.96170 4.605 4.42e-06 ***
## lunch -0.44874 0.01464 -30.648 < 2e-16 ***
## log(enroll) -5.39915 0.94041 -5.741 1.11e-08 ***
## log(exppp) 3.52475 2.09785 1.680 0.0931 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.3 on 1688 degrees of freedom
## Multiple R-squared: 0.3729, Adjusted R-squared: 0.3718
## F-statistic: 334.6 on 3 and 1688 DF, p-value: < 2.2e-16
Robust OLS:
b = coeftest(a, vcov = vcovHC(a, "HC1")) %>% print()
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91.932404 23.087103 3.9820 7.125e-05 ***
## lunch -0.448743 0.016584 -27.0583 < 2.2e-16 ***
## log(enroll) -5.399152 1.131175 -4.7730 1.971e-06 ***
## log(exppp) 3.524751 2.353737 1.4975 0.1344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The sample size is:
summary(a)$df[2] + length(coef(a)) # = degrees of freedom + number of coefficients; nrow(data) would have given the same result here
## [1] 1692
The r-squared is:
summary(a)$r.squared
## [1] 0.3728875
Regular OLS regression results (standard errors in parentheses):
math4 = 91.93240 (19.96170) − 0.44874 (0.01464) lunch − 5.39915 (0.94041) log(enroll) + 3.52475 (2.09785) log(exppp)
Robust OLS regression results:
math4 = 91.93240 (23.087103) − 0.44874 (0.016584) lunch − 5.39915 (1.131175) log(enroll) + 3.52475 (2.353737) log(exppp)
As we can see, the two models give identical coefficient estimates, but the standard errors are larger in the robust version. This is typical of heteroskedasticity-robust inference: the robust standard errors are usually larger, reflecting the presence (or suspicion) of heteroskedasticity.
Note: when the error variance tends to decrease as X moves away from its mean, the robust standard error can be smaller than the usual one. But this case is quite rare in practice, and in most data sets the robust standard errors are at least as large as the usual ones.
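To make the comparison concrete, the HC1 covariance used above can be reproduced by hand. A minimal sketch, assuming the fitted model a from above; the resulting standard errors should match the coeftest output up to rounding:
# HC1 ("White") robust variance computed by hand
X <- model.matrix(a)                 # design matrix
u <- residuals(a)                    # OLS residuals
n <- nrow(X); k <- ncol(X)
meat <- t(X) %*% (X * u^2)           # sum_i u_i^2 * x_i x_i'
bread <- solve(t(X) %*% X)           # (X'X)^{-1}
V_hc1 <- n / (n - k) * bread %*% meat %*% bread
sqrt(diag(V_hc1))                    # robust standard errors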
b) Apply the White test for heteroskedasticity. What is the value of the F test? What do you conclude?
>>
We can perform the White test in R using a modified Breusch-Pagan function:
fitted.a <- a$fitted.values
bptest(a, ~fitted.a + I(fitted.a^2))
##
## studentized Breusch-Pagan test
##
## data: a
## BP = 229.78, df = 2, p-value < 2.2e-16
The resulting p-value is extremely low. We can therefore safely reject the null hypothesis of homoskedasticity, i.e. Gauss-Markov assumption 5 does not hold and OLS is not the best linear unbiased estimator.
Alternatively, we could have computed the White test manually:
data$fittedValues <- fitted.a
data$residual <- a$residuals
data$fittedValuesSq <- data$fittedValues^2
data$residualMathSq <- data$residual^2
whiteRegression <- lm(residualMathSq ~ fittedValues + fittedValuesSq,
data)
summary(whiteRegression)
##
## Call:
## lm(formula = residualMathSq ~ fittedValues + fittedValuesSq,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -593.52 -163.92 -72.42 52.12 2735.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1664.12800 315.21436 5.279 1.46e-07 ***
## fittedValues -28.29903 9.21685 -3.070 0.00217 **
## fittedValuesSq 0.11553 0.06591 1.753 0.07982 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 365.1 on 1689 degrees of freedom
## Multiple R-squared: 0.1358, Adjusted R-squared: 0.1348
## F-statistic: 132.7 on 2 and 1689 DF, p-value: < 2.2e-16
whiteRsquared <- summary(whiteRegression)$r.squared
# Calculating the F-statistic. Note: %>% binds more tightly than /, so only
# the denominator (1 - R^2)/(1692 - 3) is piped into print(); the value shown
# below is that denominator, not the F statistic. fStatistic itself still
# equals (R^2/2)/((1 - R^2)/(1692 - 3)), i.e. about 132.7, which matches the
# F-statistic reported by summary(whiteRegression) on 2 and 1689 df.
fStatistic <- (whiteRsquared/2)/((1 - whiteRsquared)/(1692 -
3)) %>% print
## [1] 0.0005116607
# Critical value for alpha = 5%
qf(0.95, df1 = 2, df2 = 1692 - 3)
## [1] 3.001052
# Critical value for alpha = 1%
qf(0.99, df1 = 2, df2 = 1692 - 3)
## [1] 4.617749
The value of the F test is therefore about 132.7, far above both critical values, so the White test also leads us to reject the null hypothesis of homoskedasticity.
c) Obtain ĝi as the fitted values from the regression of log(ûi²) on math4i and math4i², where the math4i are the OLS fitted values and the ûi are the OLS residuals. Let ĥi = exp(ĝi). Use the ĥi to obtain WLS estimates. Are there big differences with the OLS coefficients?
>>
fitted_square = fitted.a^2
c = lm(log(a$residuals^2) ~ fitted.a + fitted_square) %>% print()
##
## Call:
## lm(formula = log(a$residuals^2) ~ fitted.a + fitted_square)
##
## Coefficients:
## (Intercept) fitted.a fitted_square
## 4.2075441 0.0596369 -0.0008415
g = c$fitted.values
h = exp(g)
WLS_model = lm(math4 ~ lunch + log(enroll) + log(exppp), data,
weight = 1/h) %>% print()
##
## Call:
## lm(formula = math4 ~ lunch + log(enroll) + log(exppp), data = data,
## weights = 1/h)
##
## Coefficients:
## (Intercept) lunch log(enroll) log(exppp)
## 50.4782 -0.4486 -2.6473 6.4742
summary(WLS_model)
##
## Call:
## lm(formula = math4 ~ lunch + log(enroll) + log(exppp), data = data,
## weights = 1/h)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -10.7122 -1.1914 0.1403 1.3741 3.9151
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.47815 16.51032 3.057 0.002268 **
## lunch -0.44859 0.01461 -30.697 < 2e-16 ***
## log(enroll) -2.64728 0.83594 -3.167 0.001569 **
## log(exppp) 6.47420 1.68588 3.840 0.000127 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.9 on 1688 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.3588
## F-statistic: 316.5 on 3 and 1688 DF, p-value: < 2.2e-16
Regular OLS regression results (standard errors in parentheses):
math4 = 91.93240 (19.96170) − 0.44874 (0.01464) lunch − 5.39915 (0.94041) log(enroll) + 3.52475 (2.09785) log(exppp)
Regular WLS regression results:
math4 = 50.47815 (16.51032) − 0.44859 (0.01461) lunch − 2.64728 (0.83594) log(enroll) + 6.47420 (1.68588) log(exppp)
Yes, there are sizeable differences between the WLS and OLS results. None of the coefficients changed sign, but their magnitudes changed considerably. In particular:
• The intercept decreased from 91.93240 to 50.47815
• The coefficient on log(enroll) increased from -5.39915 to -2.64728 (i.e. higher school enrollment is now estimated to have a smaller negative effect on 4th grade math performance)
• The coefficient on log(exppp) increased from 3.52475 to 6.47420 (i.e. higher spending per pupil is now estimated to have a larger positive effect on 4th grade math performance)
The coefficient “lunch” almost didn’t change.
The significance of some coefficients also changed. In particular, log(exppp) is now highly statistically
significant.
Reminder: the interpretation of the coefficients does not change when WLS is used instead of OLS, and in both models the estimators are unbiased (as long as assumptions 1 to 4 hold in the original model).
d) Obtain the standard errors for WLS that allow misspecification of the variance function. Do these differ
much from the usual WLS standard errors?
>>
If we are not sure of the specification of the variance function, we can couple WLS with the use of robust
standard errors:
WLS_model_robust = coeftest(WLS_model, vcov = vcovHC(WLS_model,
"HC1")) %>% print()
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.478152 18.925179 2.6672 0.0077206 **
## lunch -0.448592 0.014254 -31.4711 < 2.2e-16 ***
## log(enroll) -2.647282 1.054608 -2.5102 0.0121590 *
## log(exppp) 6.474200 1.812868 3.5712 0.0003652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Regular WLS regression results (standard errors in parentheses):
math4 = 50.47815 (16.51032) − 0.44859 (0.01461) lunch − 2.64728 (0.83594) log(enroll) + 6.47420 (1.68588) log(exppp)
Robust WLS regression results:
math4 = 50.47815 (18.925179) − 0.44859 (0.014254) lunch − 2.64728 (1.054608) log(enroll) + 6.47420 (1.812868) log(exppp)
As expected, the robust standard errors are somewhat larger than the usual WLS standard errors, but in this case the differences are small. The only notable consequence is that the coefficient on log(enroll) now has a p-value of 0.0122, compared with 0.0016 using the usual WLS standard errors.
e) For estimating the effect of spending on math4, does OLS or WLS appear to be more precise?
>>
WLS appears to be considerably more precise for this coefficient.
In the usual OLS results, the p-value for log(exppp) is 0.0931, whereas in the usual WLS results it is 0.000127. The same holds when we compare the robust OLS results with the robust WLS results, where the p-values for log(exppp) are 0.1344 and 0.000365, respectively. The WLS standard error for log(exppp) is also clearly smaller (1.69 versus 2.10).
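To see this side by side, a minimal sketch collecting the four standard errors of log(exppp), using the objects a, b, WLS_model and WLS_model_robust defined above:
# Standard errors of log(exppp) under the four specifications
c(OLS        = summary(a)$coefficients["log(exppp)", "Std. Error"],
  OLS_robust = b["log(exppp)", "Std. Error"],
  WLS        = summary(WLS_model)$coefficients["log(exppp)", "Std. Error"],
  WLS_robust = WLS_model_robust["log(exppp)", "Std. Error"])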
Exercise 2
Use the data from jtrain.RData for this exercise.
a) Consider the simple regression model
log(scrap) = β0 + β1grant + u
where scrap is the firm scrap rate and grant is a dummy variable indicating whether a firm received a job
training grant. Can you think of some reasons why the unobserved factors in u might be correlated with
grant?
>>
load("jtrain.RData")
There are many potential reasons. We are interested in factors that affect both grant and scrap and are not already included in the regression (i.e. unobserved factors).
For example, the award of a grant could be linked to the characteristics of the workforce: a firm may be more likely to receive a job training grant if its workers have specific attributes (e.g. age, previous qualifications). Those attributes, in turn, are likely to affect the scrap rate, for example if a worker's age has a causal impact on their efficiency.
To give a comprehensive answer, it would be useful to understand the grant allocation process in more detail. If eligibility requires meeting specific criteria at the firm level, such as being based in a particular geographical area, then these criteria might in turn affect the scrap rate (for example, some geographical areas may increase the likelihood of defects because of weather conditions).
b) Estimate the simple regression model using the data for 1988. (You should have 54 observations.) Does
receiving a job training grant significantly lower a firm’s scrap rate?
>>
The SRF is:
data_88 = filter(data, year == "1988")
model_88 = lm(log(scrap) ~ grant, data_88) %>% print
##
## Call:
## lm(formula = log(scrap) ~ grant, data = data_88)
##
## Coefficients:
## (Intercept) grant
## 0.4085 0.0566
summary(model_88)
##
## Call:
## lm(formula = log(scrap) ~ grant, data = data_88)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4043 -0.9536 -0.0465 0.9636 2.8103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4085 0.2406 1.698 0.0954 .
## grant 0.0566 0.4056 0.140 0.8895
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.423 on 52 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.0003744, Adjusted R-squared: -0.01885
## F-statistic: 0.01948 on 1 and 52 DF, p-value: 0.8895
Regression results (standard errors in parentheses):
log(scrap) = 0.4085 (0.2406) + 0.0566 (0.4056) grant
The sample size is:
summary(model_88)$df[2] + length(coef(model_88)) # = degrees of freedom + number of coefficients
## [1] 54
The r-squared is:
summary(model_88)$r.squared
## [1] 0.0003744378
No, there is no evidence that receiving a job training grant significantly lowers a firm's scrap rate, at least based on the 1988 data: the coefficient on grant is positive and far from statistically significant (p = 0.89).
c) Now, add as an explanatory variable log(scrap87). How does this change the estimated effect of grant?
Interpret the coefficient on grant. Is it statistically significant at the 5% level against the one-sided
alternative H1 : βgrant < 0?
>>
new_data_88 = filter(data, year == "1987") %>% select(scrap87 = scrap) %>%
cbind(data_88)
new_model_88 = lm(log(scrap) ~ grant + log(scrap87), new_data_88) %>%
print
##
## Call:
## lm(formula = log(scrap) ~ grant + log(scrap87), data = new_data_88)
##
## Coefficients:
## (Intercept) grant log(scrap87)
## 0.02124 -0.25397 0.83116
summary(new_model_88)
##
## Call:
## lm(formula = log(scrap) ~ grant + log(scrap87), data = new_data_88)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9146 -0.1763 0.0057 0.2308 1.5991
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02124 0.08910 0.238 0.8126
## grant -0.25397 0.14703 -1.727 0.0902 .
## log(scrap87) 0.83116 0.04444 18.701 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5127 on 51 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.8728, Adjusted R-squared: 0.8678
## F-statistic: 174.9 on 2 and 51 DF, p-value: < 2.2e-16
The sample size is:
summary(new_model_88)$df[2] + length(coef(new_model_88)) # = degrees of freedom + number of coefficients
## [1] 54
The r-squared is:
summary(new_model_88)$r.squared
## [1] 0.8727805
Let's compare the two models, (2) being the one with the lagged dependent variable.
Independent Variables (1) (2)
(Intercept) 0.4085 (0.2406) 0.02124 (0.08910)
grant 0.0566 (0.4056) -0.25397 (0.14703)
log(scrap87) – 0.83116 (0.04444)
Observations 54 54
R-Squared 0.0003744378 0.8727805
Adding the lagged dependent variable as a regressor changed the sign of the grant coefficient and increased its statistical significance.
In the initial model (1), the interpretation of βgrant was: "If a firm receives a job training grant, its scrap rate is expected to increase by approximately 5.66%." In the new model (2), the interpretation of βgrant is: "If a firm receives a job training grant, its scrap rate is expected to decrease by approximately 25.4%."
Thus, by controlling for the previous year's scrap rate, the estimated effect of grant changes considerably.
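Because grant is a dummy variable in a log-level model, the 25.4% figure is only the log approximation; the exact percentage effect is somewhat smaller in magnitude. A minimal sketch, assuming new_model_88 from above:
# Exact percentage change in the scrap rate implied by the grant coefficient
100 * (exp(coef(new_model_88)[["grant"]]) - 1)  # about -22%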
We want to test the following hypothesis:
H0 : βgrant = 0,  H1 : βgrant < 0
The t-statistic of βgrant is:
beta_grant.t_statistic = (coef(new_model_88)[["grant"]]/summary(new_model_88)$coefficients["grant",
"Std. Error"]) %>% print
## [1] -1.727319
At the 5% significance level, the one-sided t-test critical value is:
beta_grant.critical_value = qt(0.05, summary(new_model_88)$df[2] +
length(coef(new_model_88))) %>% print #P(T < critical_value) = 5%
## [1] -1.673565
(Strictly, the residual degrees of freedom, 51, should be used here rather than the sample size, giving a critical value of about -1.675; the conclusion below is unchanged.)
Do we reject H0?
# Reject H0 if beta_grant.t_statistic <
# beta_grant.critical_value (both the statistic and the critical value are negative).
beta_grant.t_statistic < beta_grant.critical_value
## [1] TRUE
We can reject H0 at the 5% significance level against the one-sided alternative that βgrant is lower than 0.
This means that, in this new model, βgrant is statistically significant and smaller than 0.
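Equivalently, the one-sided p-value can be computed directly; a minimal sketch using the objects above and the residual degrees of freedom (51):
# One-sided p-value P(T < t) under H0
pt(beta_grant.t_statistic, df = summary(new_model_88)$df[2])  # about 0.045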
d) Test the null hypothesis that the parameter on log(scrap87) is one against the two-sided alternative.
Report the p-value for the test.
We want to test the following hypothesis:
H0 : βlog(scrap87) = 1,  H1 : βlog(scrap87) ≠ 1
The t-statistic of βlog(scrap87) is:
beta_logScrap87.t_statistic = ((coef(new_model_88)[["log(scrap87)"]] -
1)/summary(new_model_88)$coefficients["log(scrap87)", "Std. Error"]) %>%
print
## [1] -3.798889
At the 5% significance level, the two-sided t-test critical value is:
beta_logScrap87.critical_value = qt(0.05/2, summary(new_model_88)$df[2] +
length(coef(new_model_88))) %>% print #P(T < critical_value) = 2.5%
## [1] -2.004879
(As in part (c), the residual degrees of freedom, 51, should strictly be used, giving a critical value of about -2.008; the conclusion below is unchanged.)
Do we reject H0?
# Reject H0 if abs(beta_logScrap87.t_statistic) >
# abs(beta_logScrap87.critical_value).
abs(beta_logScrap87.t_statistic) > abs(beta_logScrap87.critical_value)
## [1] TRUE
We can reject H0 at the 5% significance level against the two-sided alternative.
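The two-sided p-value can also be computed directly from the t statistic; a minimal sketch using the objects above (it should be close to the linearHypothesis() p-value reported below):
# Two-sided p-value for H0: the coefficient on log(scrap87) equals 1, residual df = 51
2 * pt(-abs(beta_logScrap87.t_statistic), df = summary(new_model_88)$df[2])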
We could also have performed that test using the “car” package:
linearHypothesis(new_model_88, "log(scrap87)=1")
## Linear hypothesis test
##
## Hypothesis:
## log(scrap87) = 1
##
## Model 1: restricted model
## Model 2: log(scrap) ~ grant + log(scrap87)
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 52 17.197
## 2 51 13.404 1 3.793 14.432 0.0003885 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value for the test is 0.0003885.
e) Repeat parts (c) and (d), using heteroskedasticity-robust standard errors, and briefly discuss any notable
differences.
We still want to test the following hypothesis, except that now it is with robust standard errors:
H0 : βgrant = 0,  H1 : βgrant < 0
linearHypothesis(new_model_88, "grant=0", white.adjust = "hc1")
## Linear hypothesis test
##
## Hypothesis:
## grant = 0
##
## Model 1: restricted model
## Model 2: log(scrap) ~ grant + log(scrap87)
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 52
## 2 51 1 3.0105 0.08876 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The one-sided p-value is 0.04438 (two-sided p-value divided by two), so we can still reject the null hypothesis
at the 5% significance level (the p-value is actually lower with the robust standard errors in this case).
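The same one-sided p-value can be obtained from a robust t test; a minimal sketch, assuming new_model_88 and the HC1 covariance used elsewhere in this document:
# Robust t test, then halve the two-sided p-value for the one-sided alternative
rob <- coeftest(new_model_88, vcov = vcovHC(new_model_88, "HC1"))
rob["grant", "Pr(>|t|)"] / 2  # one-sided robust p-value, about 0.044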
Regarding log(scrap87), we still want to test the following hypothesis, but this time with robust standard
errors:
H0 : βlog(scrap87) = 1,  H1 : βlog(scrap87) ≠ 1
linearHypothesis(new_model_88, "log(scrap87)=1", white.adjust = "hc1")
## Linear hypothesis test
##
## Hypothesis:
## log(scrap87) = 1
##
## Model 1: restricted model
## Model 2: log(scrap) ~ grant + log(scrap87)
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 52
## 2 51 1 5.271 0.02583 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is now 0.02583, compared to 0.0003885 with the non-robust standard errors. We can therefore still reject the null hypothesis at the 5% significance level, but using robust standard errors in the hypothesis test considerably weakens the evidence against it.
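For completeness, the robust t statistic behind this F test can be computed directly; a minimal sketch assuming new_model_88 and the HC1 covariance as above (its square should equal the F statistic of about 5.27, up to rounding):
# Robust t statistic for H0: the coefficient on log(scrap87) equals 1
se_rob <- sqrt(diag(vcovHC(new_model_88, "HC1")))[["log(scrap87)"]]
(coef(new_model_88)[["log(scrap87)"]] - 1) / se_rob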