Some study materials

Multiple Linear Regression
The following case study and data set are taken from the book:
Chatterjee, S. and Hadi, A.S., 2015. Regression analysis by example. John Wiley & Sons.
The authors have used the data from a study in Industrial Psychology (Management).
An exploratory study was carried out in a large financial organization in an attempt to explain
the relationship between specific supervisor characteristics/traits and overall satisfaction with
supervisors as perceived by the employees. Data were collected from 30 departments in the
organization. Table 1 below provides details of the variables (all can be treated as continuous)
used in the study.
Table 1.
Variable   Description
Y          Overall rating of job being done by supervisor (0-100)
X1         Handles employee complaints (0-100)
X2         Does not allow special privileges (0-100)
X3         Opportunity to learn new things (0-100)
X4         Raises based on performance (0-100)
X5         Too critical of employee’s poor performance (0-100)
X6         Rate of advancing to better job (0-100)
A multiple linear regression model is fitted on the data, and the output is presented in Table 2.
Answer the following questions:
a) Write down the equation for the full MLR model.
b) Use the model to predict a supervisor’s performance (overall rating of job being done)
when his scores for X1 – X6 for a department are all 50.
c) Explain the coefficient value ‘0.6132’ and standard error ‘0.1609’ corresponding to X1
given in the output.
d) Explain R2 for this model.
e) Find the missing values ‘a’, ‘b’, ‘c’ and ‘d’ from the output.
f) Is the model significant in predicting or explaining Y? Justify with proper explanation.
g) Do you find any individual factor being significant for Y? Justify with proper explanation.
• For any test of hypothesis, consider 5% as the level of significance if nothing
is specifically mentioned.

Table 2. Regression summary output (reproduced within the model answers below).
Model answers:
a) The full multiple linear regression (MLR) equation is given by:
Y = 10.787 + 0.613 X1 − 0.073 X2 + 0.320 X3 + 0.082 X4 + 0.038 X5 − 0.217 X6    (1)
b) Y = 10.787 + 0.613 × 50 − 0.073 × 50 + 0.320 × 50 + 0.082 × 50 + 0.038 × 50
− 0.217 × 50 = 48.937
The supervisor’s predicted overall rating of job being done is 48.937.
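The arithmetic in (b) can be checked with a short script. This is a minimal sketch in Python; the function and variable names are illustrative only. The rounded coefficients come from equation (1) and the unrounded ones from the Coefficients column of Table 2. With the unrounded coefficients the prediction comes out slightly higher (about 48.96), because equation (1) rounds every coefficient to three decimals before multiplying by 50.

```python
# Coefficients rounded to three decimals, as written in equation (1).
rounded = {"Intercept": 10.787, "X1": 0.613, "X2": -0.073, "X3": 0.320,
           "X4": 0.082, "X5": 0.038, "X6": -0.217}

# Unrounded coefficients copied from the Coefficients column of Table 2.
full = {"Intercept": 10.78707639, "X1": 0.613187608, "X2": -0.073050143,
        "X3": 0.320332116, "X4": 0.081732134, "X5": 0.038381447,
        "X6": -0.217056682}


def predict(coefs, scores):
    """Return the fitted value of Y for a dict of predictor scores."""
    return coefs["Intercept"] + sum(coefs[name] * x for name, x in scores.items())


# All six predictor scores are 50, as in question (b).
scores = {f"X{i}": 50 for i in range(1, 7)}

pred_rounded = predict(rounded, scores)
pred_full = predict(full, scores)
print(f"rounded coefficients: {pred_rounded:.3f}")
print(f"full coefficients:    {pred_full:.3f}")
```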
c) The partial regression coefficient for X1 is 0.6132. It means that if the score X1
(handling employee complaints) increases by 1 unit, then Y (supervisor’s overall rating
of job being done) increases by 0.613 units on average when all other predictors (X2-X6)
are held constant at the same level. The standard error of this estimate is 0.1609, which
indicates the degree of imprecision (sampling variability) involved in the estimation.
The lower the standard error, the more precise the estimated value.
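One way to see what the standard error implies is to turn it into a t ratio and an approximate 95% confidence interval for the X1 coefficient. The sketch below is illustrative, not part of the original output: the critical value 2.0687 (t distribution, 23 residual df, 2.5% in each tail) is taken from standard t tables and is an assumption, since Table 2 does not print it.

```python
coef_x1 = 0.613187608   # coefficient of X1 from Table 2
se_x1 = 0.160983115     # its standard error from Table 2

# t ratio: how many standard errors the estimate lies away from zero.
t_stat = coef_x1 / se_x1

# Two-sided 5% critical value of t with 30 - 6 - 1 = 23 df.
# Assumption: read from standard t tables, not from Table 2.
t_crit = 2.0687

# Interval = estimate plus/minus critical value times standard error.
ci_low = coef_x1 - t_crit * se_x1
ci_high = coef_x1 + t_crit * se_x1
print(f"t = {t_stat:.4f}, approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The t ratio reproduces the 't Stat' entry for X1 in Table 2, and the interval excludes 0, which agrees with the small p-value reported for X1.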
d) R2 = SSR/SST is a measure of adequacy for a fitted linear regression model. R2
indicates the proportion (%) of the variability present in Y that is explained by the linear
regression model. In this case, R2 is 0.7326, which means 73.26% of the total
variation in Y (supervisor’s overall rating of job being done) is explained by
the linear regression model given in equation 1, and the remaining 100 - 73.26 = 26.74%
of the total variability in Y could not be explained by the model. In this case, we
can conclude that the model fit is not adequate based on R2 value only, but in
practice other measures (Adjusted R2, PRESS, AIC, BIC) also need to be checked
to comment on model adequacy.

Table 2. SUMMARY OUTPUT

Regression Statistics
Multiple R           0.855921721
R Square             0.732601993
Adjusted R Square    0.662845991
Standard Error       7.067993765
Observations         30

ANOVA
              df    SS             MS             F    Significance F
Regression    6     b              c              d    1.24041E-05
Residual      a     1149.000325    49.95653586
Total         29    4296.966667

              Coefficients     Standard Error    t Stat          P-value
Intercept     10.78707639      11.58925724       0.930782375     0.361633721
X1            0.613187608      0.160983115       3.809018158     0.000902868
X2            -0.073050143     0.13572469        -0.53822295     0.595593921
X3            0.320332116      0.168520319       1.900851595     0.069925346
X4            0.081732134      0.221477677       0.369031022     0.715480088
X5            0.038381447      0.146995442       0.261106377     0.796334264
X6            -0.217056682     0.178209471       -1.217986229    0.235577049

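The 'Regression Statistics' block of Table 2 can be recovered from the two sums of squares printed in the ANOVA table. The following is a small sketch with illustrative variable names. The adjusted R2 formula used here, 1 - (1 - R2)(n - 1)/(n - p - 1), is the standard textbook definition and is not spelled out in the answer above.

```python
import math

ss_total = 4296.966667      # SS Total from the ANOVA table
ss_residual = 1149.000325   # SS Residual from the ANOVA table
n, p = 30, 6                # observations and number of predictors

# R2: share of the total variation in Y explained by the model.
r2 = (ss_total - ss_residual) / ss_total

# Adjusted R2 penalises R2 for the number of predictors used.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# 'Multiple R' is the square root of R2.
multiple_r = math.sqrt(r2)

# 'Standard Error' of the regression is the square root of MS Residual.
std_error = math.sqrt(ss_residual / (n - p - 1))

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
print(f"Multiple R = {multiple_r:.4f}, standard error = {std_error:.4f}")
```

The gap between R2 (about 0.73) and adjusted R2 (about 0.66) reflects the cost of fitting six predictors with only 30 observations.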
e) a = Residual df = Total df - Regression df = 29-6 = 23
b = SS Regression = SS Total – SS Residual = 4296.967 – 1149.000 = 3147.967
c = MS Regression = (SS Regression) / (Regression df) = 3147.967/6 = 524.661
d = F statistic value = (MS Regression)/(MS Residual) = 524.661/49.956 = 10.502
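The four missing entries follow mechanically from the rest of the ANOVA table, so they are easy to script. This is a minimal sketch with names chosen for illustration; the inputs are the values printed in Table 2.

```python
n_obs, n_predictors = 30, 6
ss_total, ss_residual = 4296.966667, 1149.000325   # from the ANOVA table

df_regression = n_predictors
df_total = n_obs - 1

a = df_total - df_regression      # residual df
b = ss_total - ss_residual        # SS Regression
c = b / df_regression             # MS Regression
ms_residual = ss_residual / a     # MS Residual (printed in Table 2)
d = c / ms_residual               # F statistic

print(f"a = {a}, b = {b:.3f}, c = {c:.3f}, d = {d:.3f}")
```

Recomputing MS Residual from its own row is a useful consistency check: it should match the 49.9565 shown in the table.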
f) To check whether the model is significant in predicting or explaining Y, we carry out the
following test of hypothesis:
H0: β1 = β2 = ⋯ = β6 = 0 vs. H1: at least one βj ≠ 0 (j = 1, ..., 6)
The test statistic for this test is F = MSR/MSE, which follows an F distribution
(sampling distribution) with 6 and 23 degrees of freedom under H0. The value of F is
10.502, obtained from the ANOVA table given in Table 2. The p-value for this test is
given in the column ‘Significance F’ of the ANOVA table. The p-value is
1.2404 × 10^-5 < 0.05. Hence, we reject the null hypothesis at the 5% level of
significance, which means at least one β is significantly different from 0 and the
model is significant in explaining or predicting Y.
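The 'Significance F' value can be reproduced without statistical software. The sketch below relies on the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 × f), where I_x is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The helper names are hypothetical. In Excel the same tail area should be available as F.DIST.RT(10.502, 6, 23).

```python
import math


def regularized_incomplete_beta(x, a, b, steps=2000):
    """Approximate I_x(a, b) with Simpson's rule (adequate for a, b > 1)."""
    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


def f_upper_tail(f_value, df1, df2):
    """P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(x, df2 / 2, df1 / 2)


# F = MSR / MSE, built from the sums of squares in Table 2.
ms_regression = (4296.966667 - 1149.000325) / 6
ms_residual = 1149.000325 / 23
f_stat = ms_regression / ms_residual

p_value = f_upper_tail(f_stat, 6, 23)
print(f"F = {f_stat:.3f}, p-value = {p_value:.3e}")
print("reject H0 at 5%" if p_value < 0.05 else "do not reject H0 at 5%")
```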
g) The test for an individual predictor tests one of the following hypotheses at a time:
H0: β1 = 0 vs. H1: β1 ≠ 0
OR
H0: β2 = 0 vs. H1: β2 ≠ 0
:
:
H0: β6 = 0 vs. H1: β6 ≠ 0
The test statistic values for these tests are given in the column ‘t Stat’ and the
p-values in the column ‘P-value’ against the predictors. We note that the p-value
for the test
H0: β1 = 0 vs. H1: β1 ≠ 0
is 0.0009 < 0.05. Hence, we can reject the null hypothesis in favor of the
alternative, and say that X1 is a significant predictor of Y individually when all other
factors are adjusted for. No other p-value is less than 0.05. Hence, X1 (Handles
employee complaints) is found to be the only significant predictor which affects Y
(supervisor’s overall rating in job being done).
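The ‘t Stat’ and ‘P-value’ columns can be rebuilt from the coefficients and standard errors alone. The following sketch (helper names are illustrative) computes each t ratio and a two-sided p-value by numerically integrating the density of the t distribution with 23 residual degrees of freedom, then lists the predictors that are significant at the 5% level.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p(t_value, df, steps=2000):
    """P(|T| > |t_value|), using Simpson's rule on the density."""
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # area between -|t| and +|t|
    return 1 - central_area


# (coefficient, standard error) pairs copied from Table 2.
estimates = {
    "X1": (0.613187608, 0.160983115),
    "X2": (-0.073050143, 0.13572469),
    "X3": (0.320332116, 0.168520319),
    "X4": (0.081732134, 0.221477677),
    "X5": (0.038381447, 0.146995442),
    "X6": (-0.217056682, 0.178209471),
}
df_residual = 23

p_values = {}
for name, (coef, se) in estimates.items():
    t_value = coef / se
    p_values[name] = two_sided_p(t_value, df_residual)
    print(f"{name}: t = {t_value:7.3f}, p = {p_values[name]:.4f}")

significant = [name for name, p in p_values.items() if p < 0.05]
print("significant at 5%:", significant)
```

X3 comes closest to the threshold among the remaining predictors, but its p-value of about 0.07 is still above 0.05.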

Testing of hypothesis
Based on Mini Case 9.2:
Lisa has been working at a beauty counter in a department store for 5 years. In her spare time, she has
also been creating lotions and fragrances using all natural products. After receiving positive feedback from
her friends and family about her beauty products, Lisa decides to open her own store. Lisa knows that
convincing a bank to help fund her new business will require more than a few positive testimonials from
family. Based on her experience working at the department store, Lisa believes women in her area spend
more than the national average on fragrance products. This fact could help her make her business successful.
Lisa would like to be able to support her belief with data to include in a business plan proposal that she
would then use to obtain a small business loan. Lisa took a business statistics course while in college and
decides to use the hypothesis tool she learned. After conducting research, she learns that the national
average spending by women on fragrance products is $59 every 3 months. Lisa takes a random sample of
25 women from the local area and finds that the sample mean is $68.10, and the sample standard deviation
is $14.46. Assume the amount spent every 3 months on fragrance products by women in the area follows
a normal distribution. Find the corresponding output in Table 3.
Table 3
                                Variable 1      Variable 2
Mean                            68.1055132      0
Variance                        209.0951071     0
Observations                    25              25
Hypothesized Mean Difference    59
df                              24
t Stat                          3.148491299
P(T<=t) one-tail                0.002174776
t Critical one-tail             1.71088208
P(T<=t) two-tail                0.004349552
t Critical two-tail             2.063898562

Use the above case study to answer the following questions:
a) What is the population considered in the study?
- Amount of money spent every 3 months by women on fragrance products in
the local area (not the country)
b) What is the population parameter?
- Population mean (μ) of the amount spent every 3 months by women on fragrance
products in the local area
c) What is the related test of hypothesis?
- H0: μ = $59 vs H1: μ > $59
d) Which test of hypothesis should Lisa use?
- One sample t-test because the population distribution is normal, the population
SD is unknown, and the sample size (of 25) is not large
e) What is the sampling distribution related to the test statistic?
- t-distribution with n − 1 = 25 − 1 = 24 df
f) What is the test statistic value?
- 3.1484
g) What is the p-value for this test?
- 0.0021
- 0.0043 is NOT the p-value for this test; it is the p-value for the test H0: μ = $59
vs H1: μ ≠ $59
h) What is the critical value related to this test?
- 1.7108
- 2.0638 and -2.0638 are NOT the critical values for this test; they are the critical
values for the test H0: μ = $59 vs H1: μ ≠ $59
i) How can you calculate the critical value using Excel?
- T.INV(0.95, 24) = 1.7108
- T.INV(0.975, 24) = 2.0638 and T.INV(0.025, 24) = -2.0638 for H0: μ = $59 vs
H1: μ ≠ $59
- Also T.INV.2T(0.05, 24) = 2.0638 gives the critical values for H0: μ = $59 vs
H1: μ ≠ $59
j) What is the area above the critical value 1.7108 under the curve of t-distribution known as?
- Rejection region
- Rejection regions for H0: μ = $59 vs H1: μ ≠ $59 would be the area less than
-2.0638 and above 2.0638
k) What is the area above the test statistic value 3.1484 under the curve of t-distribution known
as?
- P-value = 0.0021
- P-value = 0.0043 for H0: μ = $59 vs H1: μ ≠ $59 would be the combined area
of less than -3.1484 and above 3.1484
l) What can you infer from the test at 5% level of significance?
- The p-value is less than 0.05, so we reject H0 at the 5% level. This means there is
sufficient evidence to believe that the average amount spent on fragrance
products by the women every 3 months in the area is indeed significantly
greater than $59.
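The test statistic and the decision can be reproduced from the summary figures. The sketch below uses illustrative names; the critical value is copied from the 't Critical one-tail' row of Table 3. It computes the statistic twice: once from the rounded values quoted in the case ($68.10 and $14.46) and once from the unrounded mean and variance in Table 3, which explains why a hand calculation gives roughly 3.147 while the output shows 3.1485.

```python
import math

n = 25
mu0 = 59.0                                        # national average under H0
mean_summary, sd_summary = 68.10, 14.46           # rounded values from the case
mean_table, var_table = 68.1055132, 209.0951071   # unrounded values from Table 3


def t_statistic(mean, sd, n, mu0):
    """One-sample t statistic: (sample mean - mu0) / (sd / sqrt(n))."""
    return (mean - mu0) / (sd / math.sqrt(n))


t_summary = t_statistic(mean_summary, sd_summary, n, mu0)
t_table = t_statistic(mean_table, math.sqrt(var_table), n, mu0)

# One-tail 5% critical value with 24 df, copied from Table 3.
t_critical = 1.71088208

# Upper-tailed test: reject H0 when the statistic exceeds the critical value.
reject_h0 = t_table > t_critical

print(f"t from rounded summary = {t_summary:.4f}")
print(f"t from Table 3 values  = {t_table:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```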
m) Provide a 95% CI for the population mean.
( X̄ − t(n−1; α/2) × S/√n ,  X̄ + t(n−1; α/2) × S/√n )
= ( 68.10 − T.INV(0.975, 24) × 14.46/√25 ,  68.10 + T.INV(0.975, 24) × 14.46/√25 )
= (62.13, 74.06)
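The interval can be checked numerically. This is a brief sketch with illustrative names; the critical value is the 't Critical two-tail' entry from Table 3, which is what T.INV(0.975, 24) returns. The upper limit evaluates to about 74.069, so it appears as 74.06 when truncated and as 74.07 when rounded to two decimals.

```python
import math

n = 25
sample_mean, sample_sd = 68.10, 14.46
t_two_tail = 2.063898562   # t critical two-tail (24 df) from Table 3

# Margin of error = critical value times the standard error of the mean.
margin = t_two_tail * sample_sd / math.sqrt(n)
ci = (sample_mean - margin, sample_mean + margin)

print(f"margin of error = {margin:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```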
n) How would you interpret this confidence interval?
- We would say that we are 95% confident that the true value of the population
mean (average amount spent on fragrance products by the women every 3
months in the area) lies within (62.13, 74.06)
- Confidence intervals are random; out of every 100 randomly drawn samples
and their confidence intervals, approximately 95 would contain the
true population mean.
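The "approximately 95 out of 100" statement can be illustrated with a small simulation. The population mean and SD below (65 and 14) are hypothetical values chosen only for the demonstration; they are not estimates from the case. Each simulated sample of 25 gets its own interval, and the script counts how often the interval covers the true mean.

```python
import math
import random
import statistics

random.seed(1)                    # fixed seed so the run is repeatable
true_mean, true_sd = 65.0, 14.0   # hypothetical population, for illustration only
n = 25
t_two_tail = 2.063898562          # t critical two-tail (24 df) from Table 3
n_samples = 4000

hits = 0
for _ in range(n_samples):
    # Draw one sample from a normal population and build its 95% interval.
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    mean = statistics.mean(sample)
    margin = t_two_tail * statistics.stdev(sample) / math.sqrt(n)
    if mean - margin <= true_mean <= mean + margin:
        hits += 1

coverage = hits / n_samples
print(f"{coverage:.3f} of the simulated intervals contain the true mean")
```

The printed proportion should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure over repeated sampling.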
