2. Module Objective
Agenda
•Introduce the concept of Simple Linear Regression
•Walk through the process of plotting our data
•Apply regression techniques
•Evaluate our model
•Interpret the results
Expected Learning
•Understand key Simple Linear Regression terminology
•Evaluate the relationship between a continuous X and a continuous Y
•Use regression analysis to make predictions about a process
3. Historical Note
•Sir Francis Galton (1822-1911) used the term "regression"
to explain the relationship between the heights (in inches)
of fathers and their sons.
4. Simple Linear Regression
Regression analysis is used to predict the value of one variable (the dependent
variable) on the basis of other variables (the independent variables).
Dependent variable: denoted Y
Independent variables: denoted X1, X2, …, Xk
If we have only ONE independent variable, the model is $Y = \beta_0 + \beta_1 X + \varepsilon$ (the simple linear regression model).
5. Simple Linear Regression Analysis
Variables:
X = Independent Variable (we provide this)
Y = Dependent Variable (we observe this)
Parameters:
β0 = Intercept
The y-intercept of a line is the point at which the line crosses the y-axis
(i.e., where the x value equals 0).
β1 = Slope
Change in the mean of Y for a unit change in X
ε ~ Normal random variable with mean με = 0 and an unknown standard deviation σε, which is estimated from the data [Noise]
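The two parameter meanings above follow from the model once the noise is assumed to have mean zero. Here is a short worked derivation. It is a sketch, and neither result depends on the value of σε:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \varepsilon, \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2) \\
E[Y \mid X = x] &= \beta_0 + \beta_1 x + E[\varepsilon] = \beta_0 + \beta_1 x \\
E[Y \mid X = x + 1] - E[Y \mid X = x] &= \beta_1 \quad \text{(slope: change in the mean of } Y \text{ per unit change in } X\text{)} \\
E[Y \mid X = 0] &= \beta_0 \quad \text{(intercept: mean of } Y \text{ where } X = 0\text{)}
\end{aligned}
```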
6. Simple Linear Regression Analysis
Meaning of β0 and β1
β1 = slope = rise/run
β1 > 0 [positive slope]; β1 < 0 [negative slope]
β0 = y-intercept
[Figure: a straight line on x and y axes, annotated with rise, run, slope and y-intercept]
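7. Effect of Larger Values of σε
[Figure: lower vs. higher variability of Y about the line $Y = \beta_0 + \beta_1 X + \varepsilon$; axes X and Y]
The larger σε is, the more widely the observed Y values scatter about the regression line.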
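A small simulation makes the comparison on this slide concrete. It is a sketch with made-up values: BETA0, BETA1 and the two σε values are hypothetical and do not come from the call-center data. Both runs use the same random seed, so the size of σε is the only thing that differs.

```python
import random
import statistics

# Hypothetical "true" parameters, chosen only for illustration.
BETA0 = 5.0
BETA1 = 2.0


def simulate(n, sigma, seed):
    """Draw n points from Y = BETA0 + BETA1*X + eps, with eps ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 40) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def scatter_about_true_line(xs, ys):
    """Standard deviation of the vertical distances from the true line."""
    deviations = [y - (BETA0 + BETA1 * x) for x, y in zip(xs, ys)]
    return statistics.stdev(deviations)


# Same seed for both runs, so only sigma (the noise level) differs.
low = scatter_about_true_line(*simulate(500, sigma=2.0, seed=1))
high = scatter_about_true_line(*simulate(500, sigma=10.0, seed=1))
print(f"lower sigma:  {low:.2f}")
print(f"higher sigma: {high:.2f}")
```

Because the seed is shared, each simulated deviation for σε = 10 is exactly five times the matching deviation for σε = 2. The scatter about the line therefore grows in direct proportion to σε.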
8. The Least Squares Method
The least squares line minimizes the sum of the squared differences between the data points and the line.
These differences are called residuals or errors.
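The least squares line has a closed form: $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. Here is a minimal sketch on a small made-up data set (not the call-center data). The function names are illustrative only.

```python
def least_squares(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed minus fitted values."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Small made-up data set (not the call-center data).
staff = [4, 10, 15, 20, 25, 30, 35]
calls = [5, 22, 37, 52, 70, 84, 99]
b0, b1 = least_squares(staff, calls)
res = residuals(staff, calls, b0, b1)
sse = sum(r ** 2 for r in res)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.2f}")
```

Two properties are worth checking. The residuals of a least squares fit with an intercept sum to zero. Moving the slope away from b1 can only increase the sum of squared residuals.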
9. Exercise
A Black Belt is working to optimize a call center in a retail bank, where clients place
inquiries and orders over the phone from 6 am to 6 pm (Monday to Friday). The
current staffing plan begins with about 4 associates at 6 am and increases to about 35
associates by 9 am. At 3:30 pm the number of associates begins to drop, reaching about 7 by 6 pm. The
Black Belt anticipates an increase in call volume. To staff the call center better, the Black Belt wants to know how many calls
can be answered in a 30-minute interval at various staffing levels.
The Black Belt obtained data on the number of associates and the number of calls answered
for each 30-minute interval over the last two weeks, giving a total of 240 samples.
Data sheet
11. Fitted Line Plot
Regression Analysis: CallsAnswd versus Staff
The regression equation is
CallsAnswd = - 8.844 + 3.099 Staff
S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%
Analysis of Variance
Source       DF      SS      MS        F      P
Regression    1  164091  164091  1192.37  0.000
Error       238   32753     138
Total       239  196844
[Figure: Fitted Line: CallsAnswd versus Staff]
12. Minitab Output-Session Window
CallsAnswd = -8.844 + 3.099 Staff
CallsAnswd = -8.844 + 3.099(17) = 43.839
CallsAnswd = -8.844 + 3.099(22) = 59.334
How confident are you in this model?
[Figure: Staff of 22 → about 59 calls]
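Plugging a staffing level into the fitted equation gives only a point estimate. Here is a minimal sketch using the coefficients printed in the session window. The function name is hypothetical.

```python
B0 = -8.844   # intercept from the Minitab session window
B1 = 3.099    # slope (rounded) from the Minitab session window


def predicted_calls(staff):
    """Point estimate of calls answered in one 30-minute interval."""
    return B0 + B1 * staff


for staff in (17, 22):
    print(staff, round(predicted_calls(staff), 3))
```

This reproduces the two hand calculations above, but it says nothing about how confident we can be in them. That question is what R-squared and the confidence and prediction intervals address. In the exercise, staffing ranges from about 4 to about 35 associates, so predictions far outside that range would be extrapolation.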
13. Assessing The Model
Evaluate the strength of the regression model
1. Determine how much of the variation in the calls answered data is actually explained by
staff.
2. Determine the strength of the relationship between Calls Answered and Staff.
Regression Analysis: CallsAnswd versus Staff
The regression equation is
CallsAnswd = - 8.844 + 3.099 Staff
S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%
Analysis of Variance
Source       DF      SS      MS        F      P
Regression    1  164091  164091  1192.37  0.000
Error       238   32753     138
Total       239  196844
[Figure: Fitted Line: CallsAnswd versus Staff]
R-sq: the % of variation in the Y values explained by the linear relationship with X; here, the % of variation in Calls Answered explained by Staff.
14. % of Variation Explained
$R^2 = \dfrac{\text{Explained variation in } Y}{\text{Total variation in } Y}$, a value between 0 and 1 (0% to 100%)
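The R-Sq value in the session window can be reproduced from the ANOVA table. The explained variation is SS Regression and the total variation is SS Total. Here is a minimal sketch using the numbers from the call-center output:

```python
# Sums of squares from the Minitab ANOVA table for CallsAnswd versus Staff.
SS_REGRESSION = 164091
SS_ERROR = 32753
SS_TOTAL = 196844

# Explained + unexplained variation = total variation.
assert SS_REGRESSION + SS_ERROR == SS_TOTAL

r_squared = SS_REGRESSION / SS_TOTAL
print(f"R-Sq = {r_squared:.1%}")
print(f"Unexplained = {1 - r_squared:.1%}")
```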
15. Properties of R
The Correlation Coefficient
$R = \pm\sqrt{R^2}$ (Pearson's correlation coefficient) measures the strength of the linear relationship. In simple linear regression, R takes the sign of the slope.
Strength of the relationship by value of R:
-1 to -0.8: strong
-0.8 to -0.5: moderate
-0.5 to 0: low
0 to 0.5: low
0.5 to 0.8: moderate
0.8 to +1: strong
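Here is a sketch of Pearson's correlation coefficient. On a small made-up data set, it checks that r² equals the R-Sq of the least squares fit. It then recovers r for the call-center model from R-Sq = 83.4% and the positive slope. The helper names are illustrative.

```python
import math


def _centered_sums(xs, ys):
    """Means plus Sxx, Syy and Sxy about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson's correlation coefficient."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """R-Sq of the least squares line, computed as 1 - SSE / SST."""
    x_bar, y_bar, sxx, syy, sxy = _centered_sums(xs, ys)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy


xs = [4, 10, 15, 20, 25, 30, 35]   # made-up data
ys = [5, 22, 37, 52, 70, 84, 99]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R-Sq = {r_squared(xs, ys):.4f}")

# Call-center model: R-Sq = 83.4% and the slope (3.099) is positive.
r_calls = math.copysign(math.sqrt(0.834), 3.099)
print(f"r (calls vs staff) = {r_calls:.2f}")
```

An r of about 0.91 for calls answered versus staff falls in the strong band of the scale above.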
16. Caution: Correlation and Causation
[Figure: Calls Answered and Staff are correlated]
Correlation: two things vary together.
Causation: a change in one variable causes a change in another.
21. Interpreting The Model
Regression Analysis: CallsAnswd versus Staff
The regression equation is
CallsAnswd = - 8.84 + 3.10 Staff
Predictor Table
Predictor      Coef  SE Coef      T      P
Constant     -8.844    2.247  -3.94  0.000
Staff       3.09947  0.08976  34.53  0.000
Additional Statistics
S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%
ANOVA Table
Analysis of Variance
Source          DF      SS      MS        F      P
Regression       1  164091  164091  1192.37  0.000
Residual Error 238   32753     138
Total          239  196844
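Each T value in the predictor table is the coefficient divided by its standard error. Here is a quick check with the printed values:

```python
# Coef and SE Coef copied from the Minitab predictor table.
predictors = {
    "Constant": (-8.844, 2.247),
    "Staff": (3.09947, 0.08976),
}

t_values = {}
for name, (coef, se_coef) in predictors.items():
    t_values[name] = coef / se_coef   # T = Coef / SE Coef
    print(f"{name:<8} T = {t_values[name]:.2f}")
```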
22. Interpreting The Model
Regression Analysis: CallsAnswd versus Staff
The regression equation is
CallsAnswd = - 8.84 + 3.10 Staff
Predictor Table
Predictor      Coef  SE Coef      T      P
Constant     -8.844    2.247  -3.94  0.000
Staff       3.09947  0.08976  34.53  0.000
S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%
Analysis of Variance
Source          DF      SS      MS        F      P
Regression       1  164091  164091  1192.37  0.000
Residual Error 238   32753     138
Total          239  196844
Ho: Slope = 0
•No difference in Y when X changes
•X has no impact on Y
Ha: Slope ≠ 0
•Y changes as X changes
•X has an impact on Y
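The P value for Staff is compared with a chosen significance level. The sketch below shows that decision rule. It assumes α = 0.05 and uses a normal approximation to the t distribution. With 238 error degrees of freedom the two distributions are very close, but Minitab itself uses the t distribution.

```python
from statistics import NormalDist

ALPHA = 0.05          # assumed significance level
T_SLOPE = 34.53       # T for Staff from the predictor table


def two_sided_p(t_value):
    # Normal approximation to the t distribution (reasonable for 238 df).
    return 2 * (1 - NormalDist().cdf(abs(t_value)))


p = two_sided_p(T_SLOPE)
if p < ALPHA:
    print(f"p = {p:.3f} < {ALPHA}: reject Ho, the slope differs from 0")
else:
    print(f"p = {p:.3f} >= {ALPHA}: fail to reject Ho")
```

Minitab prints P = 0.000 when the P value rounds to zero at three decimals. It does not mean the probability is exactly zero.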
23. Interpreting The Model
Regression Analysis: CallsAnswd versus Staff
The regression equation is
CallsAnswd = - 8.84 + 3.10 Staff
Predictor      Coef  SE Coef      T      P
Constant     -8.844    2.247  -3.94  0.000
Staff       3.09947  0.08976  34.53  0.000
Additional Statistics
S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%
Analysis of Variance
Source          DF      SS      MS        F      P
Regression       1  164091  164091  1192.37  0.000
Residual Error 238   32753     138
Total          239  196844
R-squared
•The amount of variation in Y explained by the model (that is, by X)
R-squared (Adjusted)
•Accounts for the number of X's used in the model (used in multiple regression)
S
•Standard deviation of the leftover, or unexplained, variation.
It is used to construct confidence and prediction intervals.
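S, R-Sq and R-Sq(adj) can all be recovered from the ANOVA table. In this sketch, N = 240 observations comes from the exercise and K = 1 is the single predictor, Staff:

```python
import math

N = 240          # observations (30-minute intervals)
K = 1            # number of predictors (Staff)
SSE = 32753      # Residual Error SS
SST = 196844     # Total SS

s = math.sqrt(SSE / (N - K - 1))                      # S = sqrt(MS Error)
r_sq = 1 - SSE / SST
r_sq_adj = 1 - (SSE / (N - K - 1)) / (SST / (N - 1))  # penalizes extra X's

print(f"S = {s:.2f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

With only one X, R-Sq(adj) is barely smaller than R-Sq (83.3% vs. 83.4%). The adjustment matters more as X's are added in multiple regression.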
24. Interpreting The Model
Regression Analysis: CallsAnswd versus Staff
The regression equation is
CallsAnswd = - 8.84 + 3.10 Staff
Predictor      Coef  SE Coef      T      P
Constant     -8.844    2.247  -3.94  0.000
Staff       3.09947  0.08976  34.53  0.000
S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%
ANOVA Table
Analysis of Variance
Source          DF      SS      MS        F      P
Regression       1  164091  164091  1192.37  0.000
Residual Error 238   32753     138
Total          239  196844
Regression
•The explained variation
Residual Error
•The unexplained variation
Ho:
•The model does not explain the observed variation
Ha:
•The model does explain the observed variation
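The F statistic in the ANOVA table is the ratio of the two mean squares, where MS = SS / DF. The sketch below uses the printed values. It also shows that with a single X, the F test and the t test on the slope are equivalent (F = T²):

```python
SS_REG, DF_REG = 164091, 1
SS_ERR, DF_ERR = 32753, 238

ms_reg = SS_REG / DF_REG     # explained variation per degree of freedom
ms_err = SS_ERR / DF_ERR     # unexplained variation per degree of freedom
f_stat = ms_reg / ms_err
print(f"MS Error = {ms_err:.0f}, F = {f_stat:.2f}")

# With a single X, F equals the square of the slope's T value.
t_staff = 34.53
print(f"T^2 = {t_staff ** 2:.1f}")
```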
25. Confidence on Output
CallsAnswd = -8.844 + 3.099 Staff
If Staff = 10, then -8.844 + 3.099(10) ≈ 22
If Staff = 20, then -8.844 + 3.099(20) ≈ 53
If Staff = 30, then -8.844 + 3.099(30) ≈ 84
How confident can we be in these predictions?
R-squared = 83.4%
Unexplained variation = 16.6%
Report results using confidence intervals.
26. Confidence on Output
95% confidence interval: we are 95% confident that the true
regression line lies within this band.
95% prediction interval: we expect 95% of the data points
(individual observations) to fall within this band.
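The two bands above can be written down explicitly. Below is a sketch of the standard formulas for a new X value $x_0$. Here $n$ is the number of observations, $S$ is the S value from the session window, and $t_{0.025,\,n-2}$ is the two-sided 95% critical value of the t distribution:

```latex
\begin{aligned}
\hat{y}_0 &= b_0 + b_1 x_0 \\
SE_{fit} &= S \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \\
\text{95\% CI (mean response)} &: \ \hat{y}_0 \pm t_{0.025,\,n-2} \, SE_{fit} \\
\text{95\% PI (single new observation)} &: \ \hat{y}_0 \pm t_{0.025,\,n-2} \sqrt{S^2 + SE_{fit}^2}
\end{aligned}
```

The extra $S^2$ term is why the prediction interval is always wider than the confidence interval. It is also why the prediction interval does not shrink toward zero as the sample grows.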
28. Predicted Value for Y
Suppose management sets a maximum of 20 staff. How many calls can these
20 associates answer?
Predicted Values for New Observations
New Obs     Fit  SE Fit  95% CI            95% PI
1        53.145   0.822  (51.526, 54.765)  (29.979, 76.312)
Values of Predictors for New Observations
New Obs  Staff
1         20.0
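The 95% CI and 95% PI in the Minitab output can be rebuilt from Fit, SE Fit and S. This is a sketch. T_CRIT ≈ 1.970 is an assumed value for the two-sided 95% t critical value with 238 degrees of freedom, read from a t-table rather than computed.

```python
import math

FIT = 53.145        # Fit for Staff = 20
SE_FIT = 0.822      # SE Fit
S = 11.7311         # S from the session window
T_CRIT = 1.970      # assumed: approx. t(0.025, 238) from a t-table

ci_half = T_CRIT * SE_FIT                          # uncertainty in the mean
pi_half = T_CRIT * math.sqrt(S ** 2 + SE_FIT ** 2)  # plus individual scatter

ci = (FIT - ci_half, FIT + ci_half)
pi = (FIT - pi_half, FIT + pi_half)
print("95% CI: ({:.3f}, {:.3f})".format(*ci))
print("95% PI: ({:.3f}, {:.3f})".format(*pi))
```

The prediction interval is much wider because it adds the scatter of individual intervals (S) to the uncertainty in the fitted line (SE Fit). With 20 staff, management can expect about 53 calls on average. Any single 30-minute interval, however, could plausibly see roughly 30 to 76 calls.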
29. Review Methodology
1. Get familiar with the data
• Identify the output, or Y variable
• Identify the X variable, or predictor
• Look at the time series, dot plots, histograms and scatter plot
• Run descriptive statistics
Check for outliers
Check for gaps in the data
2. Fit the model to the data
• Run the regression
• Use the regression
3. Check the model and the assumptions
• Look at the residual plots
Are the assumptions valid?
Do the residuals look OK?
Are there unusual observations? If yes, address them if possible and rerun the analysis
• What is R-squared?
How much variation can be explained by the model?
• Look at the P-values
Does X have a significant impact on Y?
4. Report the results and use the equation
• Summarize the results for your stakeholders
• If you have excluded data, explain why
• If you kept outliers in the data, explain why
• How much variation in Y can be explained by X?
• Remember causation vs. correlation
• Make predictions using confidence and prediction intervals
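Steps 2 and 3 of the methodology can be bundled into one helper. This is a minimal sketch on made-up data. The function name and the 2·S cut-off for unusual residuals are illustrative assumptions, not Minitab's exact rules. Steps 1 and 4 still need plots and judgment.

```python
import math


def fit_and_check(xs, ys):
    """Fit a simple linear regression and return basic diagnostics (sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)

    # Step 2: fit the least squares line.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Step 3: residuals, S, R-Sq and the T value for the slope.
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in resid)
    s = math.sqrt(sse / (n - 2))
    r_sq = 1 - sse / sst
    t_slope = b1 / (s / math.sqrt(sxx))   # SE of slope = S / sqrt(Sxx)
    # Assumed rule of thumb: flag residuals larger than 2*S for a closer look.
    unusual = [i for i, r in enumerate(resid) if abs(r) > 2 * s]
    return {"b0": b0, "b1": b1, "S": s, "R-Sq": r_sq,
            "T slope": t_slope, "unusual": unusual}


xs = [4, 10, 15, 20, 25, 30, 35, 20]   # made-up staff levels
ys = [5, 22, 37, 52, 70, 84, 99, 90]   # made-up calls; the last point is unusual
result = fit_and_check(xs, ys)
for key, value in result.items():
    print(key, value)
```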