2. Ph.D Islamia College Peshawar Chap 12-2
Chapter Goals
After completing this chapter, you should be
able to:
Explain the simple linear regression model
Obtain and interpret the simple linear regression
equation for a set of data
Apply the tests used in regression analysis, such as the t-test and the F-test
Explain measures of variation and determine whether
the independent variable is significant
3. Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the
value of at least one independent variable
Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain
the dependent variable
4. Simple Linear Regression Model
Only one independent variable, X
Relationship between X and Y is
described by a linear function
Changes in Y are assumed to be caused
by changes in X
5. Simple Linear Regression Model
The population regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
$Y_i$ = dependent variable
$X_i$ = independent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$\varepsilon_i$ = random error term
Linear component: $\beta_0 + \beta_1 X_i$
Random error component: $\varepsilon_i$
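6. Simple Linear Regression Model (continued)
[Figure: the population regression line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted as Y against X, with intercept = β0 and slope = β1. For a given Xi the plot marks the observed value of Y, the predicted value of Y on the line, and the random error εi for this Xi value, which is the gap between the two.]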
7. Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
The individual random error terms ei have a mean of zero
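The zero-mean property can be checked on sample residuals. The following is a minimal sketch, assuming the house-price sample used in this chapter's example and an ordinary least-squares fit that includes an intercept; the helper name `fit_line` is illustrative and does not come from the slides.

```python
# House-price sample from this chapter's example
# (price in $1000s, size in square feet).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line through the points (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(sqft, price)

# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(sqft, price)]
mean_residual = sum(residuals) / len(residuals)

# The mean is zero up to floating-point rounding.
print(abs(mean_residual) < 1e-9)
```

The check relies on the intercept being part of the fitted equation; a line forced through the origin would not, in general, give residuals that average to zero.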
8. Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]
9. Types of Relationships (continued)
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]
10. Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship]
11. The Multiple Regression Model
Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi)
Multiple Regression Model with k Independent Variables:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error
12. Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data
Multiple regression equation with k independent variables:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients
In this chapter we will always use Excel to obtain the
regression slope coefficients and other regression
summary measures.
13. Important components of Regression
Intercept coefficient
Slope coefficient(s)
t-value
F-value
R-square
Adjusted R-square
14. Finding the Least Squares Equation
The coefficients b0 and b1 , and other
regression results in this chapter, will be
found using Excel and Stata software
Formulas are shown in the text at the end of
the chapter for those who are interested
15. Interpretation of the Slope and the Intercept
b0 is the estimated average value of Y when the value of X is zero
b1 is the estimated change in the average value of Y as a result of a one-unit change in X
16. Simple Linear Regression Example
A real estate agent wishes to examine the
relationship between the selling price of a home
and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
17. Sample Data for House Price Model
House Price in $1000s (Y)   Square Feet (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
18. Graphical Presentation
House price model: scatter plot
[Figure: scatter plot of House Price ($1000s), axis 0 to 450, against Square Feet, axis 0 to 3000]
19. Regression Using Excel
Tools / Data Analysis / Regression
20. Excel Output
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10
ANOVA
              df    SS            MS            F         Significance F
Regression    1     18934.9348    18934.9348    11.0848   0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
21. Graphical Presentation
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]
house price = 98.24833 + 0.10977 (square feet)
22. Interpretation of the Intercept, b0
b0 is the estimated average value of Y when the
value of X is zero (if X = 0 is in the range of
observed X values)
Here, no houses had 0 square feet, so b0 = 98.24833
just indicates that, for houses within the range of
sizes observed, $98,248.33 is the portion of the
house price not explained by square feet
house price = 98.24833 + 0.10977 (square feet)
23. Interpretation of the Slope Coefficient, b1
b1 measures the estimated change in the
average value of Y as a result of a one-
unit change in X
Here, b1 = .10977 tells us that the average value of a
house increases by .10977($1000) = $109.77, on
average, for each additional one square foot of size
house price = 98.24833 + 0.10977 (square feet)
24. Predictions using Regression Analysis
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq.ft.)
            = 98.25 + 0.1098(2000)
            = 317.85
The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850
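The same prediction can be written as a small function. This is a sketch only: the function name `predict_price` is illustrative, and the default coefficients are the rounded values 98.25 and 0.1098 used on this slide.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1000s from the rounded fitted equation."""
    return b0 + b1 * square_feet


predicted = predict_price(2000)
print(f"{predicted:.2f} ($1000s) = ${predicted * 1000:,.0f}")
```

Using the unrounded coefficients from the Excel output (98.24833 and 0.10977) gives a slightly different figure, so the rounding of b0 and b1 should be stated alongside any reported prediction.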
25. Interpolation vs. Extrapolation
When using a regression model for prediction, only predict within the relevant range of data
[Figure: scatter plot of House Price ($1000s) against Square Feet, with the relevant range for interpolation marked over the observed X values]
Do not try to extrapolate beyond the range of observed X's
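One way to enforce this rule in code is to refuse predictions outside the observed X range. The sketch below assumes the house-price sample, whose sizes run from 1100 to 2450 square feet; the function name and the choice to raise an error are illustrative design decisions, not part of the slides.

```python
OBSERVED_MIN_SQFT = 1100  # smallest X in the house-price sample
OBSERVED_MAX_SQFT = 2450  # largest X in the house-price sample


def predict_within_range(square_feet, b0=98.24833, b1=0.10977):
    """Predict price in $1000s, refusing to extrapolate beyond observed X values."""
    if not OBSERVED_MIN_SQFT <= square_feet <= OBSERVED_MAX_SQFT:
        raise ValueError(
            f"{square_feet} sq ft is outside the observed range "
            f"{OBSERVED_MIN_SQFT} to {OBSERVED_MAX_SQFT}"
        )
    return b0 + b1 * square_feet


print(predict_within_range(2000))   # inside the range: allowed
try:
    predict_within_range(3000)      # outside the range: rejected
except ValueError as err:
    print("refused:", err)
```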
26. Least Squares Method
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between Y and $\hat{Y}$:
$$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left(Y_i - (b_0 + b_1 X_i)\right)^2$$
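28. The process of differentiation yields the following equations for estimating β1 and β2:
$$\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2 \qquad (3.1.4)$$
$$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i \qquad (3.1.5)$$
where n is the sample size. These simultaneous equations are known as the normal equations. In this notation β1 is the intercept and β2 is the slope, corresponding to b0 and b1 on the other slides. Solving the normal equations simultaneously, we obtain
$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$$
29. where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y and where we define $x_i = (X_i - \bar{X})$ and $y_i = (Y_i - \bar{Y})$. Henceforth we adopt the convention of letting the lowercase letters denote deviations from mean values.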
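The deviation-form solution of the normal equations can be checked numerically. The following is a sketch, assuming the house-price sample from this chapter (price in $1000s, size in square feet); the variable names are illustrative. It computes the slope as the sum of x_i y_i over the sum of x_i squared, the intercept as the mean of Y minus the slope times the mean of X, and then verifies that both normal equations hold.

```python
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Lowercase letters denote deviations from the sample means.
x_dev = [x - x_bar for x in sqft]
y_dev = [y - y_bar for y in price]

slope = sum(x * y for x, y in zip(x_dev, y_dev)) / sum(x * x for x in x_dev)
intercept = y_bar - slope * x_bar

# Right-hand sides of the two normal equations at the solution.
rhs_sum_y = n * intercept + slope * sum(sqft)
rhs_sum_xy = intercept * sum(sqft) + slope * sum(x * x for x in sqft)

# Left-hand sides computed directly from the data.
lhs_sum_y = sum(price)
lhs_sum_xy = sum(x * y for x, y in zip(sqft, price))

print(f"intercept = {intercept:.5f}, slope = {slope:.5f}")
```

The printed values agree with the Intercept and Square Feet coefficients in the Excel output, 98.24833 and 0.10977.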
30. T-Tests
The t test checks whether the sample slope (b1) provides statistically significant evidence about the population slope (β1).
A significant result indicates that there is sufficient evidence that the X variable (square footage) affects the Y variable (house price).
This is done by comparing the calculated t value with the critical t value at the 5% or 1% level of significance (95% or 99% confidence).
As a rule of thumb for large samples, the two-tailed critical value is about 1.96 at the 5% level and about 2.58 at the 1% level. For small samples, use the t distribution with n - 2 degrees of freedom.
31. Inference about the Slope: t Test
t test for a population slope
Is there a linear relationship between X and Y?
Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test statistic
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
b1 = regression slope coefficient
β1 = hypothesized slope
Sb1 = standard error of the slope
32. Inferences about the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
Here b1 = 0.10977 and Sb1 = 0.03297, so
$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
33. Inferences about the Slope: t Test Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic: t = 3.329
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price
34. How to compute Standard Error
$$t = \frac{b_1 - \beta_1}{S_{b_1}}$$
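The standard error of the slope can be computed from the data as S_YX divided by the square root of the sum of squared deviations of X, where S_YX is the square root of SSE / (n - 2). The sketch below assumes the house-price sample from this chapter; the variable names are illustrative, and the critical value 2.306 is the two-tailed 5% table value for 8 degrees of freedom, entered by hand rather than computed.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Error sum of squares and standard error of the estimate (n - 2 d.f.).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_yx = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta1 = 0.
s_b1 = s_yx / math.sqrt(sxx)
t_stat = (b1 - 0) / s_b1

T_CRITICAL_5PCT_DF8 = 2.306  # two-tailed table value, 8 degrees of freedom
reject_h0 = abs(t_stat) > T_CRITICAL_5PCT_DF8

print(f"S_b1 = {s_b1:.5f}, t = {t_stat:.5f}, reject H0: {reject_h0}")
```

The results agree with the Square Feet row of the Excel output (Standard Error 0.03297, t Stat 3.32938).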
35. F-Test / ANOVA for Significance
The F-test for significance shows whether or not the overall model is statistically significant.
In multiple regression, some variables may be significant and others may not be, as measured by their t-statistics.
However, t-statistics say nothing about the overall model.
To check whether the overall model is statistically significant, we use the F test / ANOVA (analysis of variance).
37. F-Test for Significance
If the calculated F value exceeds the critical F value (about 4 as a rule of thumb), the overall model is statistically significant and can be used for prediction.
If the P-value of the F-statistic is less than the chosen significance level (5% or 1%), the overall model is considered statistically significant.
38. Excel Output
ANOVA (house price model)
              df    SS            MS            F         Significance F
Regression    1     18934.9348    18934.9348    11.0848   0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
$$F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$
With 1 and 8 degrees of freedom
P-value for the F-Test: Significance F = 0.01039
39. Measures of Variation
Total variation is made up of two parts:
$$SST = SSR + SSE$$
Total Sum of Squares: $SST = \sum (Y_i - \bar{Y})^2$
Regression Sum of Squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
Error Sum of Squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = Average value of the dependent variable
$Y_i$ = Observed values of the dependent variable
$\hat{Y}_i$ = Predicted value of Y for the given Xi value
40. Measures of Variation (continued)
SST = total sum of squares
Measures the variation of the Yi values around their mean $\bar{Y}$
SSR = regression sum of squares
Explained variation attributable to the relationship between X and Y
SSE = error sum of squares
Variation attributable to factors other than the relationship between X and Y
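The three sums of squares can be computed directly from their definitions. This is a minimal sketch, assuming the house-price sample from this chapter and a least-squares fit; the variable names are illustrative.

```python
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least-squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sum(
    (x - x_bar) ** 2 for x in sqft
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)                          # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)              # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, predicted))   # unexplained

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print("SST equals SSR + SSE:", abs(sst - (ssr + sse)) < 1e-6)
```

The values agree with the SS column of the ANOVA table in the Excel output (32600.5000, 18934.9348, and 13665.5652).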
41. F-Test for Significance
F Test statistic:
$$F = \frac{MSR}{MSE}$$
where
$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1}$$
where F follows an F distribution with k numerator and (n - k - 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
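The F statistic follows mechanically from SSR, SSE, n, and k. A sketch under the definitions above, with an illustrative function name; the first call uses the ANOVA values of the house-price model and the second uses the ANOVA values from the multiple regression output shown later in this chapter.

```python
def f_statistic(ssr, sse, n, k):
    """Return F = MSR / MSE for a regression with k independent variables."""
    msr = ssr / k              # mean square regression, k d.f.
    mse = sse / (n - k - 1)    # mean square error, n - k - 1 d.f.
    return msr / mse


# House-price model: one independent variable, 10 observations.
f_house = f_statistic(ssr=18934.9348, sse=13665.5652, n=10, k=1)

# Multiple regression output: two independent variables, 15 observations.
f_multiple = f_statistic(ssr=29460.027, sse=27033.306, n=15, k=2)

print(f"F (house price) = {f_house:.4f}")
print(f"F (multiple regression) = {f_multiple:.4f}")
```

With a single independent variable, F equals the square of the slope's t statistic, which is why the F-test and the t test report the same P-value (0.01039) for the house-price model.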
43. Coefficient of Determination, r2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called r-squared and is denoted as r2
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le r^2 \le 1$
44. Examples of Approximate r2 Values
r2 = 1
[Figure: two scatter plots of Y against X, each labelled r2 = 1, with every point on a straight line]
Perfect linear relationship between X and Y:
100% of the variation in Y is explained by variation in X
45. Examples of Approximate r2 Values
0 < r2 < 1
[Figure: two scatter plots of Y against X with points scattered around a line]
Weaker linear relationships between X and Y:
Some but not all of the variation in Y is explained by variation in X
46. Examples of Approximate r2 Values
r2 = 0
[Figure: scatter plot of Y against X labelled r2 = 0]
No linear relationship between X and Y:
The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)
47. Excel Output
Regression Statistics
R Square             0.58082
ANOVA (house price model)
              df    SS            MS            F         Significance F
Regression    1     18934.9348    18934.9348    11.0848   0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
$$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$
58.08% of the variation in house prices is explained by variation in square feet
48. Excel Output
Regression Statistics (house price model)
Standard Error       41.33032
Observations         10
$$S_{YX} = 41.33032$$
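The Standard Error reported by Excel is the square root of the mean square error, that is, the square root of SSE / (n - k - 1). A short sketch with an illustrative function name, using the Residual rows of the two ANOVA tables in this chapter:

```python
import math


def standard_error_of_estimate(sse, n, k):
    """Return S_YX = sqrt(SSE / (n - k - 1)) for k independent variables."""
    return math.sqrt(sse / (n - k - 1))


# House-price model: SSE = 13665.5652, n = 10, k = 1.
s_house = standard_error_of_estimate(13665.5652, n=10, k=1)

# Multiple regression output: SSE = 27033.306, n = 15, k = 2.
s_multiple = standard_error_of_estimate(27033.306, n=15, k=2)

print(f"{s_house:.5f}  {s_multiple:.5f}")
```

Because Y is measured in $1000s in the house-price model, S_YX = 41.33032 is also in $1000s.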
49. Estimating a Multiple Linear Regression Equation
Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression
Excel:
Tools / Data Analysis... / Regression
Stata:
Type the command reg y x1 x2 (dependent variable first, then the independent variables)
50. Multiple Regression Output
Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15
ANOVA
              df    SS           MS           F         Significance F
Regression    2     29460.027    14730.013    6.53861   0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333
              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
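The fitted equation can be evaluated for any pair of inputs. In the sketch below the function name is illustrative, and the inputs 5.50 and 3.5 are hypothetical values chosen only to show the arithmetic; the units of Price and Advertising are not stated on this slide.

```python
def predict_sales(price, advertising, b0=306.526, b1=-24.975, b2=74.131):
    """Predicted sales from the fitted two-variable regression equation."""
    return b0 + b1 * price + b2 * advertising


# Hypothetical inputs, for illustration only.
estimate = predict_sales(price=5.50, advertising=3.5)
print(f"predicted sales = {estimate:.3f}")

# Holding advertising fixed, a one-unit rise in price changes the
# prediction by the price coefficient b1.
change = predict_sales(6.50, 3.5) - predict_sales(5.50, 3.5)
print(f"effect of a one-unit price increase = {change:.3f}")
```

Each slope coefficient is read as the change in predicted sales for a one-unit change in that variable with the other variable held constant.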
51. Adjusted r2
r2 never decreases when a new X variable is
added to the model
This can be a disadvantage when comparing
models
What is the net effect of adding a new variable?
We lose a degree of freedom when a new X
variable is added
Did the new X variable add enough
explanatory power to offset the loss of one
degree of freedom?
52. Adjusted r2 (continued)
Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used
$$r^2_{adj} = 1 - \left(1 - r^2_{Y.12 \ldots k}\right)\left(\frac{n - 1}{n - k - 1}\right)$$
(where n = sample size, k = number of independent variables)
Penalize excessive use of unimportant independent variables
Smaller than r2
Useful in comparing among models
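The adjustment is a one-line calculation. A sketch with an illustrative function name, applied to the R Square values from the two Excel outputs in this chapter (house-price model: n = 10, k = 1; multiple regression output: n = 15, k = 2):

```python
def adjusted_r_squared(r2, n, k):
    """Return 1 - (1 - r2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


adj_house = adjusted_r_squared(0.58082, n=10, k=1)
adj_multiple = adjusted_r_squared(0.52148, n=15, k=2)

print(f"{adj_house:.5f}  {adj_multiple:.5f}")
```

Both results agree with the Adjusted R Square rows of the outputs (0.52842 and 0.44172) to within rounding of the inputs.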
53. Adjusted r2 (continued)
Regression Statistics (multiple regression output)
R Square             0.52148
Adjusted R Square    0.44172
Observations         15
$$r^2_{adj} = 0.44172$$
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
54. Residuals in Multiple Regression
Two variable model
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
[Figure: a sample observation Yi plotted above the point (x1i, x2i) in the X1, X2 plane, together with its fitted value $\hat{Y}_i$]
Residual = $e_i = (Y_i - \hat{Y}_i)$
The best fit equation, $\hat{Y}$, is found by minimizing the sum of squared errors, $\sum e^2$