Problem set 3
Jonathan Zimmermann
31 October 2015
Exercise 1
Suppose you collect data from a survey on wages, education, experience, and gender. In
addition, you ask for information about marijuana usage. The original question is: “On how
many separate occasions last month did you smoke marijuana?”
a) Write an equation that would allow you to estimate the effects of marijuana usage on wage, while
controlling for other factors. You should be able to make statements such as, “Smoking marijuana five
more times per month is estimated to change wage by x%.”
>>
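To allow that interpretation, we need a log-linear model, with log(wage) as the dependent variable and
marijuana usage in levels. The regression equation would be:
log(wage) = β0 + β1marijuana_usage + β2education + β3experience + δ1gender + u
Here 100 · β1 is the approximate percentage change in wage for one additional occasion per month, so five
more occasions per month change wage by roughly 500 · β1 percent.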
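As a rough illustration of how the "five more times per month" statement could be read off such a model, the sketch below fits the log-linear regression on simulated data. The data frame `survey`, its column names, and all parameter values are assumptions made up for this example; they do not come from any real survey.

```r
# Simulated survey data; every name and parameter value here is hypothetical.
set.seed(1)
n <- 500
survey <- data.frame(
  marijuana_usage = rpois(n, 2),                      # occasions last month
  education       = sample(8:20, n, replace = TRUE),  # years of schooling
  experience      = sample(0:30, n, replace = TRUE),  # years of work experience
  gender          = rbinom(n, 1, 0.5)                 # dummy variable
)
survey$wage <- exp(1.5 - 0.02 * survey$marijuana_usage +
                   0.08 * survey$education + 0.01 * survey$experience -
                   0.10 * survey$gender + rnorm(n, sd = 0.3))

fit_a <- lm(log(wage) ~ marijuana_usage + education + experience + gender,
            data = survey)
b1 <- unname(coef(fit_a)["marijuana_usage"])

# Estimated effect of five more occasions per month on wage, in percent
approx_pct <- 100 * 5 * b1              # usual log-linear approximation
exact_pct  <- 100 * (exp(5 * b1) - 1)   # exact percentage change
c(approx = approx_pct, exact = exact_pct)
```

The approximation and the exact figure stay close as long as 5 · β1 is small; for larger effects the exact formula is the safer one to report.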
b) Write a model that would allow you to test whether drug usage has different effects on wages for men
and women. How would you test that there are no differences in the effects of drug usage for men and
women?
>>
We would need to add an interaction term between gender and marijuana usage. The new
regression equation would be:
log(wage) = β0 + β1marijuana_usage + β2education + β3experience + δ1gender + δ2gender ∗ marijuana_usage + u
To test whether the effect of drug usage differs between men and women, we can test the
following hypothesis with a two-sided t-test:
H0 : δ2 = 0
H1 : δ2 ≠ 0
The t-statistic is the estimated interaction coefficient divided by its standard error:
t = (δ̂2 − 0) / se(δ̂2)
We would then look up the critical value at the (1 − α/2) percentile of the t distribution with
n − k − 1 = n − 6 degrees of freedom (k = 5 regressors plus the intercept). If the absolute value of the
t-statistic is greater than the critical value, we reject H0 and conclude that the effect differs by gender.
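The test can be sketched in R as follows. This is only an illustration on simulated data: the data frame `survey`, its column names, and the parameter values are hypothetical, and the 5% significance level is an arbitrary choice. The t-statistic is the estimated interaction coefficient divided by its standard error, and the degrees of freedom are the residual degrees of freedom of the regression.

```r
# Simulated data; names and parameter values are hypothetical.
set.seed(2)
n <- 500
survey <- data.frame(
  marijuana_usage = rpois(n, 2),
  education       = sample(8:20, n, replace = TRUE),
  experience      = sample(0:30, n, replace = TRUE),
  gender          = rbinom(n, 1, 0.5)   # dummy variable
)
survey$wage <- exp(1.5 - 0.02 * survey$marijuana_usage +
                   0.08 * survey$education + 0.01 * survey$experience -
                   0.10 * survey$gender + rnorm(n, sd = 0.3))

# gender:marijuana_usage adds only the interaction term (delta2)
fit_b <- lm(log(wage) ~ marijuana_usage + education + experience + gender +
              gender:marijuana_usage, data = survey)

est <- coef(summary(fit_b))
row <- grep(":", rownames(est), fixed = TRUE)   # the interaction row
t_stat <- est[row, "Estimate"] / est[row, "Std. Error"]
df     <- df.residual(fit_b)                    # n - 6
crit   <- qt(1 - 0.05 / 2, df)                  # two-sided test, alpha = 5%
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
c(t = t_stat, critical = crit, p = p_val)
```

The same t value and p-value appear in the interaction row of `summary(fit_b)`; computing them by hand only makes the steps explicit.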
c) Suppose you think it is better to measure marijuana usage by putting people into one of four categories:
nonuser, light user (1 to 5 times per month), moderate user (6 to 10 times per month), and heavy
user (more than 10 times per month). Now, write a model that allows you to estimate the effects of
marijuana usage on wage.
>>
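Replacing the usage count in the model from a) with three dummy variables, and using nonusers as the base group, we would have:
log(wage) = β0 + β2education + β3experience + δ1gender + δ2light_user + δ3moderate_user + δ4heavy_user + u
Each of δ2, δ3 and δ4 measures the approximate proportional wage difference between that category and nonusers,
holding the other factors fixed. The coefficients can be estimated by ordinary least squares as usual.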
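One possible way to build the four categories from the raw counts is `cut()`, sketched below on a handful of made-up usage values. The vector `usage` and the label names are assumptions for illustration only.

```r
# Hypothetical monthly usage counts, for illustration only
usage <- c(0, 1, 5, 6, 10, 11, 30)

# Right-closed intervals: (-Inf,0], (0,5], (5,10], (10,Inf]
category <- cut(usage,
                breaks = c(-Inf, 0, 5, 10, Inf),
                labels = c("nonuser", "light_user", "moderate_user", "heavy_user"))

# The first level (nonuser) becomes the base group; the remaining three
# levels become the dummy columns of the design matrix.
dummies <- model.matrix(~ category)
table(category)
colnames(dummies)
```

Passing such a factor directly to `lm()` has the same effect as adding the three dummy variables by hand, with the first factor level left out as the base group.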
d) Using the model in part (c), explain in detail how to test the null hypothesis that marijuana usage has
no effect on wage.
>>
We need to test whether δ2, δ3 and δ4 are jointly significant, using an F-test:
H0 : δ2 = 0 AND δ3 = 0 AND δ4 = 0
H1 : H0 is false
Let’s call the model in c) the “unrestricted model”. The “restricted model” would then be:
log(wage) = β0 + β2education + β3experience + δ1gender + u
We then calculate the F-statistic, using the following formula:
F = [(SSRrestricted − SSRunrestricted)/q] / [SSRunrestricted/(n − k − 1)]
where q = number of restrictions = 3 (because we test three parameters) and k = number of explanatory
variables in the unrestricted model = 6.
We would then reject H0 if the F-statistic is higher than the critical value of the F distribution with
q and n − k − 1 degrees of freedom at the chosen significance level.
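A minimal sketch of this F-test in R, again on simulated data, might look as follows. The data frame `d`, the variable names, and the parameter values are hypothetical; the simulation deliberately contains no usage effect, so the test should usually fail to reject H0.

```r
# Simulated data; names and parameter values are hypothetical.
set.seed(3)
n <- 400
lv <- c("nonuser", "light_user", "moderate_user", "heavy_user")
d <- data.frame(
  education  = sample(8:20, n, replace = TRUE),
  experience = sample(0:30, n, replace = TRUE),
  gender     = rbinom(n, 1, 0.5),
  category   = factor(sample(lv, n, replace = TRUE), levels = lv)
)
d$lwage <- 1.5 + 0.08 * d$education + 0.01 * d$experience - 0.10 * d$gender +
  rnorm(n, sd = 0.3)   # no true usage effect in this simulation

unrestricted <- lm(lwage ~ education + experience + gender + category, data = d)
restricted   <- lm(lwage ~ education + experience + gender, data = d)

ssr_ur <- sum(resid(unrestricted)^2)
ssr_r  <- sum(resid(restricted)^2)
q   <- 3                          # number of restrictions
df2 <- df.residual(unrestricted)  # n - k - 1 with k = 6

f_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df2)
p_val  <- pf(f_stat, q, df2, lower.tail = FALSE)
c(F = f_stat, p = p_val)

# Built-in equivalent: the second row of the table holds the same F and p-value
anova(restricted, unrestricted)
```

Computing the statistic by hand mirrors the formula, while `anova()` on the two nested fits is the shorter route in practice.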
e) What are some potential problems with drawing causal inference using the survey data that you
collected?
>>
The survey data might have several problems that would make it unrepresentative of the population or bias
the estimates. Two of the biggest issues are self-selection and social desirability bias. In this study, we
could expect individuals to report, deliberately or unconsciously, lower values than their actual marijuana
consumption, for fear of looking like an addict (social desirability). Such misreporting is measurement error
in the key explanatory variable, so the estimated effect on wage would be biased. Other issues might be linked
to the way the data was collected. For example, if the survey was conducted in a particular area or at
a particular time of day, the respondents might not be a truly random sample of the population. This
would be the case if the survey were conducted by phone during the day, at times when the active
population is at work, which would result in an overrepresentation of unemployed people, housewives, retired
people, etc. There are of course many other response biases that could make the data inaccurate, such as
acquiescence bias.
Exercise 2
** Use the data in nbasal.RData for this exercise. **
a) Estimate a linear regression model relating points per game to experience in the league and position
(guard, forward, or center). Include experience in quadratic form and use centers as the base group.
Report the results (including SRF, the sample size, and R-squared).
>>
load("nbasal.RData")
The regression model is:
points = β0 + β1exper + β2expersq + δ1guard + δ2forward + u
The SRF is:
a = lm(points~exper+expersq+guard+forward,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## 4.76076 1.28067 -0.07184 2.31469 1.54457
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.220 -4.268 -1.003 3.444 22.265
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.76076 1.17862 4.039 7.03e-05 ***
## exper 1.28067 0.32853 3.898 0.000123 ***
## expersq -0.07184 0.02407 -2.985 0.003106 **
## guard 2.31469 1.00036 2.314 0.021444 *
## forward 1.54457 1.00226 1.541 0.124492
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.668 on 264 degrees of freedom
## Multiple R-squared: 0.09098, Adjusted R-squared: 0.07721
## F-statistic: 6.606 on 4 and 264 DF, p-value: 4.426e-05
Regression results (standard errors in parentheses):
points = 4.76076 (1.17862) + 1.28067 (0.32853) exper − 0.07184 (0.02407) expersq + 2.31469 (1.00036) guard + 1.54457 (1.00226) forward
The sample size is:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.09097856
b) Holding experience fixed, does a guard score more than a center? How much more? Is the difference
statistically significant?
>>
Yes, a guard is estimated to score more than a center. Controlling for experience and experience squared, a
guard scores on average 2.31469 (δ1) more points per game than a center.
To check whether this positive difference is statistically significant, we can test the following hypothesis:
H0 : δ1 = 0
H1 : δ1 > 0
The one-sided p-value of δ1 is 0.010722 (the two-sided p-value divided by two), so the difference is statistically
significant at the 5% level and only just misses significance at the 1% level.
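The one-sided p-value can be reproduced from the rounded estimate and standard error in the regression output. The sketch below uses only those printed numbers, so the last digits may differ slightly from values computed on the unrounded fit.

```r
# Values copied from the summary(a) output of part a)
estimate  <- 2.31469
std_error <- 1.00036
df        <- 264

t_stat      <- estimate / std_error
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)   # H1: delta1 > 0

round(c(t = t_stat, two_sided = p_two_sided, one_sided = p_one_sided), 4)
```

Halving the two-sided p-value is only valid here because the estimate has the sign stated in H1; with a negative estimate the one-sided p-value would be above 0.5.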
c) Now, add marital status to the equation. Holding position and experience fixed, are married players
more productive (based on points per game)?
>>
The new regression model is:
points = β0 + β1exper + β2expersq + δ1guard + δ2forward + δ3marr + u
The SRF is:
a = lm(points~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## 4.70294 1.23326 -0.07037 2.28632 1.54091
## marr
## 0.58427
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.874 -4.227 -1.251 3.631 22.412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.70294 1.18174 3.980 8.93e-05 ***
## exper 1.23326 0.33421 3.690 0.000273 ***
## expersq -0.07037 0.02416 -2.913 0.003892 **
## guard 2.28632 1.00172 2.282 0.023265 *
## forward 1.54091 1.00298 1.536 0.125660
## marr 0.58427 0.74040 0.789 0.430751
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.672 on 263 degrees of freedom
## Multiple R-squared: 0.09313, Adjusted R-squared: 0.07588
## F-statistic: 5.401 on 5 and 263 DF, p-value: 9.526e-05
Regression results (standard errors in parentheses):
points = 4.70294 (1.18174) + 1.23326 (0.33421) exper − 0.07037 (0.02416) expersq + 2.28632 (1.00172) guard + 1.54091 (1.00298) forward + 0.58427 (0.74040) marr
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.09312579
In the sample, married players seem to be slightly more productive than non-married players. Controlling for
experience, experience squared and position, a married player scores on average 0.58427 (δ3) more points per
game. However, the difference is not statistically significant.
To check whether there is a statistically significant positive effect, we test the following
hypothesis:
H0 : δ3 = 0
H1 : δ3 > 0
The one-sided p-value of δ3 is 0.2153757 (the two-sided p-value divided by two), so H0 would only be rejected
at significance levels above about 21.5%. We therefore cannot reject H0 at any conventional level (10%, 5% or
1%), and the estimated effect of marriage is not statistically significant.
d) Add interactions of marital status with both experience variables. In this expanded model, is there
strong evidence that marital status affects points per game?
>>
The new regression model is:
points = β0 + β1exper + β2expersq + δ1guard + δ2forward + δ3marr + δ4marr ∗ exper + δ5marr ∗ expersq + u
The SRF is:
a = lm(points~exper+expersq+guard+forward+marr+marr*exper+marr*expersq,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
## marr * exper + marr * expersq, data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## 5.81615 0.70255 -0.02950 2.25079 1.62915
## marr exper:marr expersq:marr
## -2.53750 1.27965 -0.09359
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
## marr * exper + marr * expersq, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.239 -4.328 -1.067 3.742 22.197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.81615 1.34878 4.312 2.29e-05 ***
## exper 0.70255 0.43405 1.619 0.1067
## expersq -0.02950 0.03267 -0.903 0.3674
## guard 2.25079 1.00002 2.251 0.0252 *
## forward 1.62915 1.00199 1.626 0.1052
## marr -2.53750 2.03822 -1.245 0.2143
## exper:marr 1.27965 0.68229 1.876 0.0618 .
## expersq:marr -0.09359 0.04887 -1.915 0.0566 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.654 on 261 degrees of freedom
## Multiple R-squared: 0.1058, Adjusted R-squared: 0.08184
## F-statistic: 4.413 on 7 and 261 DF, p-value: 0.0001188
Regression results (standard errors in parentheses):
points = 5.81615 (1.34878) + 0.70255 (0.43405) exper − 0.02950 (0.03267) expersq + 2.25079 (1.00002) guard + 1.62915 (1.00199) forward − 2.53750 (2.03822) marr + 1.27965 (0.68229) exper ∗ marr − 0.09359 (0.04887) expersq ∗ marr
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.1058214
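This time, marital status enters the model through three coefficients, so we need a joint hypothesis test:
we want to know whether, together, all the terms that include marital status have an effect on points:
H0 : δ3 = 0 AND δ4 = 0 AND δ5 = 0
H1 : H0 is false
A single t-test on δ3 (two-sided p-value 0.2142624) is not enough here, because it ignores the two interaction
terms. Instead, we use an F-test with the model from a) as the restricted model (R-squared 0.09097856) and the
expanded model as the unrestricted model (R-squared 0.1058214), with q = 3 restrictions and n − k − 1 = 261
degrees of freedom:
F = [(0.1058214 − 0.09097856)/3] / [(1 − 0.1058214)/261] ≈ 1.44
This is below the 10% critical value of the F distribution with 3 and 261 degrees of freedom (about 2.1), so we
cannot reject H0 even at the 10% level. The two interaction terms are individually only marginally significant
(p-values 0.0618 and 0.0566). So no, there is no strong evidence that marital status affects
points per game.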
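The joint hypothesis on the three marital status terms can be checked with an F-test built from the R-squared values printed for parts a) and d). The sketch below uses only those reported numbers and the R-squared form of the F-statistic, which applies because both models have the same dependent variable and the same sample.

```r
# R-squared values copied from the outputs of parts a) and d)
r2_restricted   <- 0.09097856   # model without any marr terms (part a)
r2_unrestricted <- 0.1058214    # model with marr and both interactions (part d)
q   <- 3                        # restrictions: delta3 = delta4 = delta5 = 0
df2 <- 261                      # residual degrees of freedom, unrestricted model

f_stat <- ((r2_unrestricted - r2_restricted) / q) / ((1 - r2_unrestricted) / df2)
p_val  <- pf(f_stat, q, df2, lower.tail = FALSE)
crit10 <- qf(0.90, q, df2)      # 10% critical value

round(c(F = f_stat, critical_10pct = crit10, p = p_val), 3)
```

With the data loaded, fitting the two nested models and passing them to `anova()` should give the same F-statistic without any manual arithmetic.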
e) Estimate the model from part (c) but use assists per game as the dependent variable. Are there any
notable differences from part (c)? Discuss.
>>
The new regression model is:
assists = β0 + β1exper + β2expersq + δ1guard + δ2forward + δ3marr + u
The SRF is:
a = lm(assists~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## -0.22581 0.44360 -0.02673 2.49167 0.44747
## marr
## 0.32190
summary(a)
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3127 -1.0780 -0.3157 0.6788 8.2488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.225809 0.354904 -0.636 0.52516
## exper 0.443603 0.100372 4.420 1.45e-05 ***
## expersq -0.026726 0.007256 -3.683 0.00028 ***
## guard 2.491672 0.300842 8.282 6.19e-15 ***
## forward 0.447471 0.301220 1.486 0.13860
## marr 0.321899 0.222359 1.448 0.14891
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.704 on 263 degrees of freedom
## Multiple R-squared: 0.3499, Adjusted R-squared: 0.3375
## F-statistic: 28.31 on 5 and 263 DF, p-value: < 2.2e-16
Regression results (standard errors in parentheses):
assists = −0.225809 (0.354904) + 0.443603 (0.100372) exper − 0.026726 (0.007256) expersq + 2.491672 (0.300842) guard + 0.447471 (0.301220) forward + 0.321899 (0.222359) marr
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.3498759
As we can see, there are some differences compared to c), but nothing major. Except for the intercept,
which changed sign, the direction of all the effects is the same. The intercept, which was highly statistically
significant in c), is no longer statistically significant, and the variable “guard” is now much more significant
than in c) (t value 8.282 instead of 2.282). The model also explains more of the variation: the R-squared rises
from 0.093 to 0.350. Marital status remains statistically insignificant (two-sided p-value 0.149). All the
coefficients changed in magnitude, sometimes substantially. Most of these differences
in magnitude are explained by the different scales of “assists” and “points”:
mean(data$assists)
## [1] 2.408922
mean(data$points)
## [1] 10.21041
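To put the two sets of coefficients on a common footing, one option is to express each coefficient as a share of the sample mean of its dependent variable. The sketch below does this with the estimates and means reported in the outputs; scaling by the mean is an illustrative choice, not the only way to compare effect sizes.

```r
# Coefficients from parts c) and e), and the reported sample means
coef_points  <- c(guard = 2.28632, forward = 1.54091, marr = 0.58427)
coef_assists <- c(guard = 2.491672, forward = 0.447471, marr = 0.321899)
mean_points  <- 10.21041
mean_assists <- 2.408922

# Each effect as a share of the average value of its dependent variable
relative <- rbind(points  = coef_points / mean_points,
                  assists = coef_assists / mean_assists)
round(relative, 3)
```

On this relative scale the guard effect is much larger for assists than for points, which is in line with the far larger t value for “guard” in the assists regression.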
 

Kürzlich hochgeladen (20)

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 

Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperial College London

  • 1. Problem set 3
Jonathan Zimmermann
31 October 2015

Exercise 1
Suppose you collect data from a survey on wages, education, experience, and gender. In addition, you ask for information about marijuana usage. The original question is: “On how many separate occasions last month did you smoke marijuana?"

a) Write an equation that would allow you to estimate the effects of marijuana usage on wage, while controlling for other factors. You should be able to make statements such as, “Smoking marijuana five more times per month is estimated to change wage by x%."
>> To be able to interpret the coefficients in that way, we need to build a log-linear model. The regression equation would look like this:
log(wage) = β0 + β1 marijuana_usage + β2 education + β3 experience + δ1 gender + u
Under this model, smoking marijuana five more times per month is estimated to change wage by approximately 100 · 5 · β1 percent.

b) Write a model that would allow you to test whether drug usage has different effects on wages for men and women. How would you test that there are no differences in the effects of drug usage for men and women?
>> We would need to add an interaction term between the gender and the marijuana variables. The new regression equation would look like this:
log(wage) = β0 + β1 marijuana_usage + β2 education + β3 experience + δ1 gender + δ2 gender ∗ marijuana_usage + u
To test whether there are differences in the effects of drug usage for men and women, we could test the following hypothesis with a t-test:
H0 : δ2 = 0
H1 : δ2 ≠ 0
To perform the t-test, we would first calculate the t-statistic as the estimated coefficient divided by its standard error:
$$t = \frac{\hat{\delta}_2 - 0}{\operatorname{se}(\hat{\delta}_2)}$$
We would then look up the critical value at the (1 − α/2) percentile of the t distribution with n − k − 1 = n − 6 degrees of freedom (k = 5 regressors). If the absolute value of the t-statistic is greater than the critical value, we would reject H0.
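The test in b) can be sketched in R on simulated data. This is only an illustration: the data frame `survey`, its column names, the 0/1 coding of `gender` and every simulated coefficient are assumptions made for the example, not values from an actual survey.

```r
# Simulated stand-in data (hypothetical; not from a real survey).
set.seed(1)
n <- 500
survey <- data.frame(
  marijuana_usage = rpois(n, 2),
  education       = round(rnorm(n, 13, 2)),
  experience      = round(runif(n, 0, 30)),
  gender          = rbinom(n, 1, 0.5)  # assumed coding: 1 = female, 0 = male
)
survey$lwage <- 1.5 - 0.01 * survey$marijuana_usage + 0.08 * survey$education +
  0.02 * survey$experience - 0.10 * survey$gender + rnorm(n, sd = 0.4)

# Log-linear model from b), including the interaction term.
fit <- lm(lwage ~ marijuana_usage + education + experience + gender +
            marijuana_usage:gender, data = survey)

# t-statistic for delta2: estimate divided by its standard error.
est    <- coef(summary(fit))["marijuana_usage:gender", ]
t_stat <- est["Estimate"] / est["Std. Error"]

# Residual degrees of freedom: n - k - 1 with k = 5 regressors.
dof   <- df.residual(fit)
p_two <- 2 * pt(abs(t_stat), df = dof, lower.tail = FALSE)

# Approximate % change in wage for five more occasions per month
# (for the base gender group, since the interaction is in the model).
pct_five <- 100 * 5 * coef(fit)["marijuana_usage"]

print(c(t = unname(t_stat), df = dof, p = unname(p_two)))
```

The hand-computed statistic and p-value match the `t value` and `Pr(>|t|)` columns that `summary(fit)` reports for the interaction row.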
c) Suppose you think it is better to measure marijuana usage by putting people into one of four categories: nonuser, light user (1 to 5 times per month), moderate user (6 to 10 times per month), and heavy user (more than 10 times per month). Now, write a model that allows you to estimate the effects of marijuana usage on wage.
  • 2. >> Incorporating this change into the model in a), with nonusers as the base group, we would have:
log(wage) = β0 + β2 education + β3 experience + δ1 gender + δ2 light_user + δ3 moderate_user + δ4 heavy_user + u
Each of the coefficients can now be estimated by running the regression as usual.

d) Using the model in part (c), explain in detail how to test the null hypothesis that marijuana usage has no effect on wage.
>> We would need to test the following hypothesis (i.e. we want to test whether δ2, δ3 and δ4 are jointly significant), using an F-test:
H0 : δ2 = 0 AND δ3 = 0 AND δ4 = 0
H1 : H0 is false
Let’s call the model in c) the “unrestricted model”. The “restricted model” would then be:
log(wage) = β0 + β2 education + β3 experience + δ1 gender + u
We then calculate the F-statistic, using the following formula:
$$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k - 1)}$$
where q = number of restrictions = 3 (because we test three parameters) and k = number of variables in the unrestricted model = 6.
We would then reject H0 if the F-statistic is higher than the critical value (based on the F distribution with d1 = q and d2 = n − k − 1 degrees of freedom).

e) What are some potential problems with drawing causal inference using the survey data that you collected?
>> The survey data might have multiple problems that would make it unrepresentative of the population. Two of the biggest issues are self-selection and social desirability bias. In this study, for example, we could expect individuals to deliberately (or unconsciously) report lower values than their actual marijuana consumption, for fear of looking like an addict (social desirability). Other issues might be linked to the way the data has been collected. For example, if the survey has been conducted in a particular area or at a particular time of the day, the respondents might not be a truly random sample of the population. This would be the case if the survey is conducted by phone during the day, at times when the active population is at work, which would result in an overrepresentation of unemployed people, housewives, retired people, etc. There are of course many other response biases that could make the data inaccurate, such as acquiescence bias.
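The joint test described in d) can be sketched in R as follows. The data are simulated, and the names `survey`, `times` and `usage`, as well as all coefficients used to generate `lwage`, are hypothetical. `cut()` builds the four categories from c), with nonusers as the base level.

```r
# Simulated stand-in data (hypothetical; not from a real survey).
set.seed(2)
n     <- 400
times <- sample(0:15, n, replace = TRUE)  # occasions last month
survey <- data.frame(
  education  = round(rnorm(n, 13, 2)),
  experience = round(runif(n, 0, 30)),
  gender     = rbinom(n, 1, 0.5),
  # 0 -> nonuser, 1-5 -> light, 6-10 -> moderate, >10 -> heavy
  usage = cut(times, breaks = c(-Inf, 0, 5, 10, Inf),
              labels = c("nonuser", "light", "moderate", "heavy"))
)
survey$lwage <- 1.5 + 0.08 * survey$education + 0.02 * survey$experience -
  0.10 * survey$gender - 0.05 * (survey$usage == "heavy") + rnorm(n, sd = 0.4)

unrestricted <- lm(lwage ~ education + experience + gender + usage, data = survey)
restricted   <- lm(lwage ~ education + experience + gender, data = survey)

# F-statistic built from the two sums of squared residuals.
ssr_r  <- sum(resid(restricted)^2)
ssr_ur <- sum(resid(unrestricted)^2)
q <- 3  # number of restrictions
k <- 6  # regressors in the unrestricted model
F_stat  <- ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
p_value <- pf(F_stat, q, n - k - 1, lower.tail = FALSE)

# anova() on the nested models performs the same comparison.
print(anova(restricted, unrestricted))
print(c(F = F_stat, p = p_value))
```

Computing the statistic by hand and through `anova()` side by side makes it easy to check that q and n − k − 1 were counted correctly.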
  • 3. Exercise 2
** Use the data in nbasal.RData for this exercise. **

a) Estimate a linear regression model relating points per game to experience in the league and position (guard, forward, or center). Include experience in quadratic form and use centers as the base group. Report the results (including SRF, the sample size, and R-squared).
>>
load("nbasal.RData")

The regression model is:
points = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + u

The SRF is:
a = lm(points~exper+expersq+guard+forward,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Coefficients:
## (Intercept)        exper      expersq        guard      forward
##     4.76076      1.28067     -0.07184      2.31469      1.54457

summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -11.220  -4.268  -1.003   3.444  22.265
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.76076    1.17862   4.039 7.03e-05 ***
## exper        1.28067    0.32853   3.898 0.000123 ***
## expersq     -0.07184    0.02407  -2.985 0.003106 **
## guard        2.31469    1.00036   2.314 0.021444 *
## forward      1.54457    1.00226   1.541 0.124492
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.668 on 264 degrees of freedom
## Multiple R-squared: 0.09098, Adjusted R-squared: 0.07721
## F-statistic: 6.606 on 4 and 264 DF, p-value: 4.426e-05
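The quantities requested in a) (coefficients with standard errors, sample size, R-squared) can all be read off a fitted `lm` object. The sketch below uses R's built-in `mtcars` data as a stand-in, since nbasal.RData is not reproduced here; the model `mpg ~ wt + hp` is purely illustrative.

```r
# Stand-in model on a built-in dataset (not the NBA data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

# Coefficient table: estimates and standard errors for the SRF.
srf <- coef(s)[, c("Estimate", "Std. Error")]

# Sample size: residual degrees of freedom plus number of coefficients,
# or directly via nobs().
n_from_df <- s$df[2] + length(coef(fit))
n_direct  <- nobs(fit)

# R-squared.
r2 <- s$r.squared

print(srf)
print(c(n_from_df = n_from_df, n_direct = n_direct))
print(r2)
```

`nobs()` counts the observations actually used in the fit, so it stays correct when rows with missing values are dropped, which `nrow()` on the raw data would not.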
  • 4. Regression results (standard errors in parentheses):
points = 4.76076 (1.17862) + 1.28067 (0.32853) exper − 0.07184 (0.02407) expersq + 2.31469 (1.00036) guard + 1.54457 (1.00226) forward

The sample size is:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269

The R-squared is:
summary(a)$r.squared
## [1] 0.09097856

b) Holding experience fixed, does a guard score more than a center? How much more? Is the difference statistically significant?
>> Yes, a guard seems to score more than a center. When we control for experience and experience^2, a guard scores on average 2.31469 (δ1) more points per game. To know whether this is a statistically significant positive effect, we can test the following hypothesis:
H0 : δ1 = 0
H1 : δ1 > 0
The one-sided p-value of δ1 is 0.010722 (the two-sided p-value divided by two), so the difference is statistically significant at the 5% level and very nearly at the 1% level.

c) Now, add marital status to the equation. Holding position and experience fixed, are married players more productive (based on points per game)?
>> The new regression model is:
points = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + δ3 marr + u

The SRF is:
a = lm(points~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Coefficients:
## (Intercept)        exper      expersq        guard      forward
##     4.70294      1.23326     -0.07037      2.28632      1.54091
##        marr
##     0.58427
  • 5. summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -10.874  -4.227  -1.251   3.631  22.412
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.70294    1.18174   3.980 8.93e-05 ***
## exper        1.23326    0.33421   3.690 0.000273 ***
## expersq     -0.07037    0.02416  -2.913 0.003892 **
## guard        2.28632    1.00172   2.282 0.023265 *
## forward      1.54091    1.00298   1.536 0.125660
## marr         0.58427    0.74040   0.789 0.430751
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.672 on 263 degrees of freedom
## Multiple R-squared: 0.09313, Adjusted R-squared: 0.07588
## F-statistic: 5.401 on 5 and 263 DF, p-value: 9.526e-05

Regression results (standard errors in parentheses):
points = 4.70294 (1.18174) + 1.23326 (0.33421) exper − 0.07037 (0.02416) expersq + 2.28632 (1.00172) guard + 1.54091 (1.00298) forward + 0.58427 (0.74040) marr

The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269

The R-squared is:
summary(a)$r.squared
## [1] 0.09312579

Yes, married players seem to be more productive than non-married players. When we control for experience, experience^2 and position, a married player scores on average 0.58427 (δ3) more points per game. However, the difference might not be statistically significant. To know whether it is a statistically significant positive effect, we need to test the following hypothesis:
H0 : δ3 = 0
H1 : δ3 > 0
The one-sided p-value of δ3 is 0.2153757 (the two-sided p-value divided by two), so the effect would be statistically significant only at about the 21.5% level. For most practical purposes, we therefore cannot consider it statistically significant.
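The one-sided p-values used in b) and c) can be recomputed from the reported estimates, standard errors and residual degrees of freedom. In the sketch below the helper name `one_sided_p` is hypothetical, and the inputs are the rounded figures from the regression output, so the results agree with the text only up to rounding.

```r
# Upper-tail p-value for H1: coefficient > 0 (hypothetical helper).
one_sided_p <- function(estimate, std_error, df) {
  pt(estimate / std_error, df = df, lower.tail = FALSE)
}

# guard in the model from a): 264 residual degrees of freedom.
p_guard <- one_sided_p(2.31469, 1.00036, df = 264)

# marr in the model from c): 263 residual degrees of freedom.
p_marr <- one_sided_p(0.58427, 0.74040, df = 263)

print(c(guard = p_guard, marr = p_marr))
```

Halving the two-sided p-value, as done in the text, gives the same number only when the estimate has the sign stated in H1; for a negative estimate the one-sided p-value would be one minus half the two-sided value.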
  • 6. d) Add interactions of marital status with both experience variables. In this expanded model, is there strong evidence that marital status affects points per game?
>> The new regression model is:
points = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + δ3 marr + δ4 marr ∗ exper + δ5 marr ∗ expersq + u

The SRF is:
a = lm(points~exper+expersq+guard+forward+marr+marr*exper+marr*expersq,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
##     marr * exper + marr * expersq, data = data)
##
## Coefficients:
##  (Intercept)         exper       expersq         guard       forward
##      5.81615       0.70255      -0.02950       2.25079       1.62915
##         marr    exper:marr  expersq:marr
##     -2.53750       1.27965      -0.09359

summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
##     marr * exper + marr * expersq, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -10.239  -4.328  -1.067   3.742  22.197
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.81615    1.34878   4.312 2.29e-05 ***
## exper         0.70255    0.43405   1.619   0.1067
## expersq      -0.02950    0.03267  -0.903   0.3674
## guard         2.25079    1.00002   2.251   0.0252 *
## forward       1.62915    1.00199   1.626   0.1052
## marr         -2.53750    2.03822  -1.245   0.2143
## exper:marr    1.27965    0.68229   1.876   0.0618 .
## expersq:marr -0.09359    0.04887  -1.915   0.0566 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.654 on 261 degrees of freedom
## Multiple R-squared: 0.1058, Adjusted R-squared: 0.08184
## F-statistic: 4.413 on 7 and 261 DF, p-value: 0.0001188
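In an R formula, `marr*exper` already expands to `marr + exper + marr:exper`, so the long formula above contains redundant terms that R silently drops. The sketch below, run on simulated data, shows that a shorter formula gives the same fit and how `anova()` could be used to ask whether the three marital-status terms matter jointly. The data frame `players` and all simulated values are hypothetical and are not the NBA data.

```r
# Simulated stand-in data (hypothetical; not nbasal.RData).
set.seed(3)
n   <- 269
pos <- sample(c("center", "forward", "guard"), n, replace = TRUE)
players <- data.frame(
  exper   = sample(1:15, n, replace = TRUE),
  guard   = as.integer(pos == "guard"),
  forward = as.integer(pos == "forward"),
  marr    = rbinom(n, 1, 0.45)
)
players$expersq <- players$exper^2
players$points  <- 5 + 1.2 * players$exper - 0.07 * players$expersq +
  2 * players$guard + 1.5 * players$forward + rnorm(n, sd = 5.5)

# Formula as written in the text, and an equivalent compact form.
long_fit  <- lm(points ~ exper + expersq + guard + forward + marr +
                  marr * exper + marr * expersq, data = players)
short_fit <- lm(points ~ guard + forward + marr * (exper + expersq),
                data = players)
same_fit <- isTRUE(all.equal(fitted(long_fit), fitted(short_fit)))

# Joint test of the three marr terms against the model without them.
no_marr <- lm(points ~ exper + expersq + guard + forward, data = players)
joint   <- anova(no_marr, long_fit)

print(same_fit)
print(joint)
```

Comparing fitted values rather than coefficient names avoids depending on how R orders the variables inside an interaction label (`exper:marr` versus `marr:exper`).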
  • 7. Regression results (standard errors in parentheses):
points = 5.81615 (1.34878) + 0.70255 (0.43405) exper − 0.02950 (0.03267) expersq + 2.25079 (1.00002) guard + 1.62915 (1.00199) forward − 2.53750 (2.03822) marr + 1.27965 (0.68229) exper ∗ marr − 0.09359 (0.04887) expersq ∗ marr

The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269

The R-squared is:
summary(a)$r.squared
## [1] 0.1058214

This time, we want to perform a two-sided test (because we are interested in whether there is an effect in either direction) on three coefficients at the same time. This is therefore a joint hypothesis test: we want to know whether, together, all the coefficients that involve marital status have an effect on points:
H0 : δ3 = 0 AND δ4 = 0 AND δ5 = 0
H1 : H0 is false
The two-sided p-value of δ3 alone is 0.2142624, so on its own it would be statistically significant only at about the 21.4% level. Because the hypothesis is joint, however, the appropriate test is an F-test against the model from a), which contains no marital-status terms. Using the R-squared values reported for the two models (0.09097856 restricted, 0.1058214 unrestricted), q = 3 restrictions and 261 residual degrees of freedom:
F = [(0.1058214 − 0.09097856)/3] / [(1 − 0.1058214)/261] ≈ 1.44
This is below the 10% critical value of the F distribution with 3 and 261 degrees of freedom (about 2.1), so we cannot reject H0. So no, we cannot really say there is strong evidence that marital status affects points per game.

e) Estimate the model from part (c) but use assists per game as the dependent variable. Are there any notable differences from part (c)? Discuss.
>> The new regression model is:
assists = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + δ3 marr + u

The SRF is:
a = lm(assists~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Coefficients:
## (Intercept)        exper      expersq        guard      forward
##    -0.22581      0.44360     -0.02673      2.49167      0.44747
##        marr
##     0.32190
  • 8. summary(a)
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3127 -1.0780 -0.3157  0.6788  8.2488
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.225809   0.354904  -0.636  0.52516
## exper        0.443603   0.100372   4.420 1.45e-05 ***
## expersq     -0.026726   0.007256  -3.683  0.00028 ***
## guard        2.491672   0.300842   8.282 6.19e-15 ***
## forward      0.447471   0.301220   1.486  0.13860
## marr         0.321899   0.222359   1.448  0.14891
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.704 on 263 degrees of freedom
## Multiple R-squared: 0.3499, Adjusted R-squared: 0.3375
## F-statistic: 28.31 on 5 and 263 DF, p-value: < 2.2e-16

Regression results (standard errors in parentheses):
assists = −0.225809 (0.354904) + 0.443603 (0.100372) exper − 0.026726 (0.007256) expersq + 2.491672 (0.300842) guard + 0.447471 (0.301220) forward + 0.321899 (0.222359) marr

The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269

The R-squared is:
summary(a)$r.squared
## [1] 0.3498759

As we can see, there are some differences compared to c), but nothing major. Except for the intercept, which changed sign, the direction of all the effects is the same. The intercept, which was highly statistically significant in c), is no longer statistically significant, and the variable “guard” is now much more significant than in c). All the coefficients changed in magnitude, sometimes substantially. Most of these differences in magnitude are explained by the different scales of “assists” and “points”: