NG BB 37 Multiple Regression
1. UNCLASSIFIED / FOUO
National Guard
Black Belt Training
Module 37
Multiple Regression
2. UNCLASSIFIED / FOUO
CPI Roadmap – Analyze
8-STEP PROCESS
1. Validate the Problem
2. Identify Performance Gaps
3. Set Improvement Targets
4. Determine Root Cause
5. Develop Counter-Measures
6. See Counter-Measures Through
7. Confirm Results & Process
8. Standardize Successful Processes
Define, Measure, Analyze, Improve, Control
ACTIVITIES
• Identify Potential Root Causes
• Reduce List of Potential Root Causes
• Confirm Root Cause to Output Relationship
• Estimate Impact of Root Causes on Key Outputs
• Prioritize Root Causes
• Complete Analyze Tollgate
TOOLS
• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression
Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
3. UNCLASSIFIED / FOUO
Learning Objectives
Understand how to identify correlation with multiple
variables
Learn how to create a mathematical model for the
effect of multiple inputs on an output variable
Understand and identify multicollinearity
Understand how to use best subsets to identify the
best model
Examine unusual observations to learn more about
the data
4. UNCLASSIFIED / FOUO
Multiple Regression
In Simple Linear Regression, we had:
Y = f(X)
Y = B0 + B1X
In Multiple Linear Regression, we have:
Y = B0 + B1X1 + B2X2 + B3X3
[Diagram: inputs X1, X2, X3, X4, and X5 pointing to output Y]
We’d like to identify which, if any, of the predictor variables are useful in predicting Y
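The equation above can be fitted by least squares. The following is a minimal sketch, assuming NumPy is available; the function name and the synthetic data (true coefficients 2.0, 0.5, and -1.5) are made up for illustration and are not part of the course material.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return [B0, B1, ..., Bk] minimizing the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient is the constant B0.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Synthetic, noise-free illustration: Y = 2.0 + 0.5*X1 - 1.5*X2
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 2.0 + 0.5 * X[:, 0] - 1.5 * X[:, 1]

coefs = fit_multiple_regression(X, y)
print(coefs)  # approximately [2.0, 0.5, -1.5]
```

With real data the response contains noise, so the fitted coefficients are estimates and the later slides on P values and residuals apply.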
5. UNCLASSIFIED / FOUO
When Should I Use Multiple Regression?
                          Independent Variable (X)
Dependent Variable (Y)    Continuous             Attribute
Continuous                Regression             ANOVA
Attribute                 Logistic Regression    Chi-Square (χ²) Test
The tool depends on the data type. Regression is typically used with a continuous
input and a continuous response but may also be used with count or categorical
inputs and outputs.
6. UNCLASSIFIED / FOUO
Basic Steps for Regression Modeling
Step 1: Process Flowchart, SIPOC
  Objective: To identify KPIVs and KPOVs
  Key question: Which KPIVs will significantly improve which KPOVs?
Step 2: Scatter Plot, Histogram
  Objective: To visualize the data
  Key question: Does it look like there is a C&E relationship?
Step 3: Correlation, Hypothesis Test
  Objective: To qualify the C&E relationship (Strength, % Variability, P-value)
  Key question: How strong is the C&E relationship?
Step 4: Regression Analysis
  Objective: To quantify the C&E relationship (Method of Least Squares)
  Key question: What is the prediction equation?
Step 5: Residual Analysis
  Objective: To validate the model selected
  Key question: Is there anything suspicious with the model selected?
KPIV = Key Process Input Variables KPOV = Key Process Output Variables
7. UNCLASSIFIED / FOUO
Example: Production Plant
A chemical engineer is investigating the amount of silver required in the high-volume production of contact switches for a new Army radio. Although only a small amount of silver is deposited on the switches, a larger amount is wasted through a multiple-step process. She has collected data and would like to develop a prediction model.
Data file: A-06 Production Plant
Step 1: The variables identified as KPIVs are given below:
X1 = Average temperature of rinse bath (degrees C)
X2 = Speed of reel that feeds the switches through the line (inches/min)
X3 = Thickness of silver deposit (angstroms)
X4 = Water consumed (gallons per day)
Y = Amount of silver consumed (pounds/day)
What questions would you ask about this data?
Source: Applied Regression Analysis, Draper and Smith
8. UNCLASSIFIED / FOUO
Visualize the Data!
Step 2:
Visualize the Data
Data file: A-06 Production
Plant.mtw
Select Graph>Matrix Plot
9. UNCLASSIFIED / FOUO
Step 2: Visualize the Data!
Looking for relationships between variables...
This dialog box comes up first
Select Matrix of Plots – Simple, since we have only one (Y) variable and no groups
Click on OK to go to the next dialog box
10. UNCLASSIFIED / FOUO
Step 2: Visualize the Data!
Double click on all of the
variables you want to include in
the Matrix, to place them in the
Graph variables box
Select Matrix Options to move
on to the next dialog box
11. UNCLASSIFIED / FOUO
Step 2: Visualize the Data!
Select Lower left to place all
the graph labels to the
lower left of the boxes
Click on OK here and on the
previous dialog box to get
the matrix
12. UNCLASSIFIED / FOUO
Correlation Table
There appear to be some relationships
between certain variables and the response.
[Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag. Amt of Ag is the Response Variable (Y).]
Is this good or bad?
13. UNCLASSIFIED / FOUO
Quantify the Relationships Between Variables
Step 3: Quantify the relationship
Select Stat>Basic
Statistics> Correlation
14. UNCLASSIFIED / FOUO
Correlation Matrix
Evaluating coefficients of correlation among predictors...
Double click on all of the
variables you want to
include, to place them in
the Variables box
Check to display p-values
(default setting)
Click on OK to get the
Correlation Matrix in your
Session Window
15. UNCLASSIFIED / FOUO
Correlation Matrix
The TOP number in each pair is the Pearson Coefficient of Correlation (r-value), while the BOTTOM number is the p-value
Pairwise correlations between predictor variables larger than about 0.5 to 0.7 are signs of trouble: multicollinearity. We will explain more shortly.
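As a rough illustration of the pairwise check described above, the sketch below computes Pearson correlations among predictors and flags pairs above a threshold. It assumes NumPy; the function name, the 0.7 default, and the synthetic data are illustrative choices, and it does not compute the p-values that Minitab reports.

```python
import numpy as np

def flag_correlated_predictors(X, names, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| above threshold."""
    r = np.corrcoef(X, rowvar=False)  # each column of X is one variable
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged

# Synthetic data: X2 is built to track X1 closely, X3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + 0.1 * rng.normal(size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

pairs = flag_correlated_predictors(X, ["X1", "X2", "X3"])
print(pairs)
```

Only predictor columns are passed in here; correlation between a predictor and the response is desirable, while strong correlation between two predictors is the warning sign.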
16. UNCLASSIFIED / FOUO
Finding the Regression Equation...
Step 4: Develop a prediction model
Select: Stat>
Regression>
Regression
17. UNCLASSIFIED / FOUO
Finding the Regression Equation... (Cont.)
Double click on C5 Amt of Ag to place it in the Response: variable box, then double click on all the variables you want to place in the Predictors: box.
Select Options to go to the next dialog box.
18. UNCLASSIFIED / FOUO
Finding the Regression Equation... (Cont.)
In this dialog box, the only
thing you have to do is check
Variance inflation factors
Click on OK here and on
previous dialog box to get the
regression analysis in your
Session Window
19. UNCLASSIFIED / FOUO
Regression Equation
Minitab displays the following regression equation:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P      VIF
Constant    5.72      10.83     0.53   0.607
Temp        -0.01558  0.02616   -0.60  0.563  1.276
Speed       0.2393    0.2644    0.90   0.383  10.997
Thickness   0.443     1.033     0.43   0.675  11.671
Water       0.04495   0.01481   3.04   0.010  1.731
S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
• R-Sq: This new model explains 80.9% of response variability
• R-Sq(adj) adjusts for degrees of freedom due to variables that have no real value. It should be used when comparing models
• The P-values indicate whether a particular predictor is significant in the presence of other predictors in the model
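Two numbers in this output can be reproduced by hand. The sketch below uses the standard formulas for adjusted R-squared and for the T value (coefficient divided by its standard error); the function names are illustrative, and n = 17 is taken from the N reported on the residual plots for this data set.

```python
def adjusted_r_squared(r_sq, n, k):
    """Adjusted R-squared for n observations and k predictors plus a constant."""
    return 1.0 - (1.0 - r_sq) * (n - 1) / (n - k - 1)

def t_statistic(coef, se_coef):
    """T value for a term: the coefficient divided by its standard error."""
    return coef / se_coef

# Values read from the Minitab output above (full model, four predictors).
n, k = 17, 4
print(round(adjusted_r_squared(0.809, n, k), 3))  # 0.745, i.e. 74.5%
print(round(t_statistic(0.04495, 0.01481), 2))    # 3.04, the T value for Water
```

Because the penalty grows with k, adding a predictor that explains almost nothing can lower R-Sq(adj) even though R-Sq itself never decreases.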
20. UNCLASSIFIED / FOUO
Interpreting P-values
The P column gives the p-value (observed significance level) for each term in the model
Typically, if a P value is less than or equal to 0.05, the variable is considered significant (i.e., the null hypothesis is rejected)
If a P value is greater than 0.10, the term is typically removed from the model. A practitioner might leave the term in the model if the P value is within the gray region between these two probability levels
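The rule of thumb above can be written as a small helper. This is only a sketch of the slide's thresholds (0.05 and 0.10); the function name and the returned labels are made up for illustration.

```python
def classify_term(p_value, keep_below=0.05, drop_above=0.10):
    """Apply the rule of thumb above to one term's P value."""
    if p_value <= keep_below:
        return "significant"
    if p_value > drop_above:
        return "remove"
    return "gray region"  # left to the practitioner's judgment

# P values from the full Production Plant model shown earlier.
p_values = {"Temp": 0.563, "Speed": 0.383, "Thickness": 0.675, "Water": 0.010}
for term, p in p_values.items():
    print(term, classify_term(p))
```

When predictors are correlated, P values change as terms leave the model, so terms are usually removed one at a time rather than all at once. In the reduced model later in this module, Speed has a P value of 0.001.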
21. UNCLASSIFIED / FOUO
Regression Equation
Regression output in Minitab’s Session Window
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P      VIF (Variance Inflation Factor)
Constant    5.72      10.83     0.53   0.607
Temp        -0.01558  0.02616   -0.60  0.563  1.276
Speed       0.2393    0.2644    0.90   0.383  10.997
Thickness   0.443     1.033     0.43   0.675  11.671
Water       0.04495   0.01481   3.04   0.010  1.731
S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
High VIF values are signs of trouble (VIF > 10)
22. UNCLASSIFIED / FOUO
Problems with Several Predictor Variables
Sometimes the Xs are correlated (dependent). This condition is
known as Multicollinearity
Multicollinearity can cause problems (sometimes severe)
Estimates of the coefficients are affected (unstable, inflated
variances)
Difficulty isolating the effects of each X
Coefficients depend on which Xs are included in the model
High multicollinearity inflates the standard error estimates,
which increases the P values
In case of extreme multicollinearity, Minitab will throw out one term and give you notice
23. UNCLASSIFIED / FOUO
Graphical Representation of Multicollinearity
[Figure: Venn diagram in which the variation explained by X1 and the variation explained by X2 overlap each other and the total variation in Y]
• Overlap represents correlation
• X1 and X2 are both correlated with Y
• X1 and X2 are highly correlated
• If X1 is in the model, we don’t need X2, and vice versa
24. UNCLASSIFIED / FOUO
Assessing the Degree of Multicollinearity
We use a metric called Variance Inflation Factor (VIF):
$VIF_i = \frac{1}{1 - R_i^2}$
Select Stat>Regression>Regression>Options> Display variance inflation factors
Where:
$R_i^2$ is the $R^2$ value you get when you regress $X_i$ against the other X’s
A large $R_i^2$ suggests that a variable is redundant
Rule of Thumb:
$R_i^2 > 0.9$ is a cause for concern (high degree of collinearity) (VIF > 10)
$0.8 < R_i^2 < 0.9$ (moderate degree of collinearity) (VIF > 5)
For the Production Plant data, Minitab gives us:
Predictor   VIF
Temp        1.276
Speed       10.997
Thickness   11.671
Water       1.731
Two VIFs are a bit large, but in this case, with an R-Sq of 80.9%, some multicollinearity can be tolerated
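The VIF definition above translates directly into code: regress each predictor on the others, take that R-squared, and apply 1 / (1 - R²). The sketch below assumes NumPy; the function names and the synthetic predictors are illustrative, not the Production Plant data.

```python
import numpy as np

def vif_from_r_squared(r_sq_i):
    """VIF for one predictor, given the R-squared of regressing it on the others."""
    return 1.0 / (1.0 - r_sq_i)

def variance_inflation_factors(X):
    """Regress each column of X on the remaining columns and return the VIFs."""
    n, k = X.shape
    vifs = []
    for i in range(k):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coefs
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        vifs.append(vif_from_r_squared(1.0 - ss_res / ss_tot))
    return vifs

print(vif_from_r_squared(0.9))  # about 10, the "cause for concern" threshold
print(vif_from_r_squared(0.8))  # about 5, the "moderate" threshold

# Synthetic predictors: columns 0 and 1 nearly duplicate each other.
rng = np.random.default_rng(2)
a = rng.normal(size=40)
X = np.column_stack([a, a + 0.2 * rng.normal(size=40), rng.normal(size=40)])
vifs = variance_inflation_factors(X)
print(vifs)
```

Read in reverse, the reported VIF of 10.997 for Speed corresponds to an R-squared of about 1 - 1/10.997 ≈ 0.91 when Speed is regressed on the other three predictors.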
25. UNCLASSIFIED / FOUO
Some Cautions About the Coefficients
Remember the prediction equation obtained earlier:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Relative importance of predictors cannot be
determined from the size of their coefficients:
The coefficients are scale dependent
The coefficients are influenced by correlation among
the predictor variables
If a high degree of multicollinearity exists, even the
signs of the coefficients may be misleading
26. UNCLASSIFIED / FOUO
Residual Analysis
Step 5: Validate the selected model
Select Stat>
Regression>
Regression
Is there anything
suspicious with
this model?
27. UNCLASSIFIED / FOUO
Residual Analysis (Cont.)
Double click on C5 Amt of Ag to place it in the Response variable box, then double click on all the variables you want to place in the Predictors box
Select Graphs to go to the next dialog box
28. UNCLASSIFIED / FOUO
Residual Analysis (Cont.)
Select Four in one to get all four residual plots on one graph, or pick and choose the plots you want
Click on OK here and on the previous dialog box to get the residual plots
29. UNCLASSIFIED / FOUO
Residual Analysis (Cont.)
Not too bad overall…
[Figure: Residual Plots for Amt of Ag, four in one. Normal Probability Plot (Percent vs. Residual; N = 17, AD = 0.249, P-Value = 0.705); Versus Fits (Residual vs. Fitted Value); Histogram (Frequency vs. Residual); Versus Order (Residual vs. Observation Order)]
If you want to see the value for any observation, just hold your cursor over that point
30. UNCLASSIFIED / FOUO
How to Address Multicollinearity
Eliminate one or more input variables
We’ll look at a technique called Best Subsets
Regression
Collect additional data
Use process knowledge to determine the principal
relationship
Use DOE to further assess the multicollinearity
If neither is significant, then eliminate both from the analysis
31. UNCLASSIFIED / FOUO
Best Subsets Regression
Rather than relying on the p-values alone, the
computer looks at all possible combinations of
variables and prints the resulting model
characteristics
Statistics like adjusted R-Sq and MSError will improve
as important model terms are added, then worsen as
“junk” terms are added to the model
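The idea of trying every combination of predictors can be sketched in a few lines. This is an illustrative, exhaustive version assuming NumPy and synthetic data; it is not Minitab's implementation, which reports only the best few models of each size. The function names and the data are made up for the example.

```python
from itertools import combinations
import numpy as np

def sse(X, y):
    """Residual sum of squares for a least-squares fit with a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    return float(resid @ resid)

def best_subsets(X, y, names):
    """Fit every non-empty subset of predictors and report fit statistics."""
    n, k = X.shape
    ss_tot = float(((y - y.mean()) ** 2).sum())
    mse_full = sse(X, y) / (n - k - 1)  # error variance estimate from the full model
    rows = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            p = size + 1  # number of parameters, counting the constant
            ss_res = sse(X[:, list(cols)], y)
            r_sq = 1 - ss_res / ss_tot
            rows.append({
                "vars": [names[c] for c in cols],
                "r_sq": r_sq,
                "r_sq_adj": 1 - (1 - r_sq) * (n - 1) / (n - p),
                "cp": ss_res / mse_full - (n - 2 * p),
                "s": (ss_res / (n - p)) ** 0.5,
            })
    return rows

# Synthetic data: Y depends on X1 and X2; X3 is a "junk" term.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=30)

rows = best_subsets(X, y, ["X1", "X2", "X3"])
for row in sorted(rows, key=lambda r: r["cp"]):
    print(row["vars"], round(row["r_sq_adj"], 3), round(row["cp"], 1))
```

The number of subsets doubles with each added predictor, so an exhaustive loop like this is only practical for a modest number of inputs.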
32. UNCLASSIFIED / FOUO
Best Subsets Regression Considerations
Objective: We want to select a model with predictive
accuracy and minimum multicollinearity
Seek compromise between:
Overfitting (including model terms with only
marginal, or no, contribution)
Underfitting (ignoring or deleting relatively
important model terms)
[Figure: illustrations labeled “overfit” and “underfit”]
What are some problems with overfitting?
What are some problems with underfitting?
33. UNCLASSIFIED / FOUO
Best Subsets Regression
Evaluating Candidate Models
Four things to look at when evaluating candidate models:
1. R2 (large R2 is desired, although R2 increases as we add more
predictors to the model, so this should only be used for
comparing models with the same number of terms)
2. Adjusted R2 (large is desired)
3. Mallows Cp statistic (small Cp desired, close to the number of
terms in the model)
4. s (the estimate of the standard deviation around the regression)
Generally, the best three models are selected and checked for
significance of all factors and residual assumptions
34. UNCLASSIFIED / FOUO
More on the Mallows C-p Statistic
In practice, the minimum number of parameters needed in the model is the number at which the Mallows’ C-p statistic reaches its minimum
Rule of Thumb:
We want C-p to be small and close to the number of input variables in the model
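For reference, the usual textbook definition of Mallows' C-p is given below. The slides do not state the formula, so treat this as background: $p$ is the number of parameters in the candidate model counting the constant, $SSE_p$ is its residual sum of squares, $MSE_{\text{full}}$ is the mean squared error of the model with all predictors, and $n$ is the number of observations.

```latex
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)
```

As a check against the Production Plant output, take the Speed and Water model with $n = 17$ (the N shown on the residual plots), $p = 3$, and $S = 0.39047$, so $SSE_p = S^2 (n - p) \approx 2.13$. The full model has $S = 0.41275$, so $MSE_{\text{full}} \approx 0.170$. Then $C_p \approx 12.53 - 11 \approx 1.5$, which agrees with the Best Subsets output. For the full model the formula always returns $C_p = p$, which is why the four-predictor row shows 5.0.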
35. UNCLASSIFIED / FOUO
Best Subsets Regression
Minitab data set: Production Plant
Select Stat>
Regression>
Best Subsets
36. UNCLASSIFIED / FOUO
Best Subsets Regression (Cont.)
Enter Response variable
Enter Predictor variables
(Input Variables)
Click on OK to get analysis
in Session Window
37. UNCLASSIFIED / FOUO
Best Subsets Regression (Cont.)
Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water
Response is Amt of Ag
Predictor columns: Temp, Speed, Thickness, Water (X marks an included predictor)
Vars  R-Sq  R-Sq(adj)  Mallows Cp  S        Predictors
1     64.4  62.0       9.4         0.50387  X
1     62.3  59.8       10.7        0.51836  X
2     80.0  77.2       1.5         0.39047  XX
2     78.8  75.8       2.3         0.40200  X X
3     80.6  76.1       3.2         0.39959  X X X
3     80.3  75.8       3.4         0.40237  X X X
4     80.9  74.5       5.0         0.41275  X X X X
What model(s) are the best candidates?
38. UNCLASSIFIED / FOUO
Best Subsets Regression (Cont.)
R-Sq: Look for the highest value when comparing models with the same number of input variables
39. UNCLASSIFIED / FOUO
Best Subsets Regression (Cont.)
R-Sq (adj): Look for the highest value when comparing models with different numbers of input variables
40. UNCLASSIFIED / FOUO
Best Subsets Regression (Cont.)
Cp: Look for models where Cp is small and close to the number of input variables in the model
41. UNCLASSIFIED / FOUO
Best Subsets Regression (Cont.)
S: We want S, the estimate of the standard deviation about the regression, to be as small as possible
42. UNCLASSIFIED / FOUO
Once the Candidate Models Are Identified
Evaluate the candidate models under a “microscope”
Outliers
High leverage
Influential observations
Residuals
Prediction quality
Once a model has been selected, find the new
regression equation
Test its predictive capability for observations NOT
originally used in the modeling
43. UNCLASSIFIED / FOUO
Regression with Reduced Model
We select the best model with two variables, Speed & Water,
and run Minitab again to obtain the new regression equation:
Select Stat>
Regression>
Regression
44. UNCLASSIFIED / FOUO
Regression with Reduced Model (Cont.)
Enter Amt of Ag as the
Response
Enter only Speed and Water
as Predictors
Click on OK to get analysis
in Session Window
45. UNCLASSIFIED / FOUO
Regression with Reduced Model (Cont.)
Session window of Minitab yields the following regression
equation for the reduced model:
Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
Predictor Coef SE Coef T P
Constant 9.919 1.694 5.86 0.000
Speed 0.35689 0.08544 4.18 0.001
Water 0.04253 0.01206 3.53 0.003
S = 0.3905 R-Sq = 80.0% R-Sq(adj) = 77.2%
…to compare with the previous model:
Amt of Ag = 5.7 - 0.0156 Temp. + 0.239 Speed
+ 0.44 Thickness + 0.0449 Water
Predictor Coef SE Coef T P
Constant 5.72 10.83 0.53 0.607
H20 Temp -0.01558 0.02616 -0.60 0.563
Speed 0.2393 0.2644 0.90 0.383
Thick. 0.443 1.033 0.43 0.675
Water 0.04495 0.01481 3.04 0.010
S = 0.4127 R-Sq = 80.9% R-Sq(adj) = 74.5%
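Once the reduced equation is in hand, making a prediction is simple arithmetic. The sketch below hard-codes the reduced-model coefficients from the Session window output above; the function name and the operating point (Speed = 10.0 inches/min, Water = 160.0 gallons per day) are hypothetical values chosen only for illustration.

```python
def predict_amt_of_ag(speed, water):
    """Predicted silver consumed (pounds/day) from the reduced model."""
    return 9.919 + 0.35689 * speed + 0.04253 * water

# Hypothetical operating point, not a row from the data file.
prediction = predict_amt_of_ag(speed=10.0, water=160.0)
print(round(prediction, 2))  # 20.29
```

Predictions are most trustworthy for inputs inside the range of the data used to build the model.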
46. UNCLASSIFIED / FOUO
Unusual Observations
Session window of Minitab also gives us the following output:
Unusual Observations
Obs Speed Amt of A Fit SE Fit Residual St Resid
3 11.5 21.0000 20.3784 0.2477 0.6216 2.06R
R denotes an observation with a large standardized residual
An unusual observation here means one with a large standardized residual
Let’s see what would happen if we
eliminated such an observation
from our collected data!
47. UNCLASSIFIED / FOUO
Impact of the Unusual Observation
Without the Unusual Observation, the Session window of Minitab
yields the following regression equation:
Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water
Predictor Coef SE Coef T P
Constant 8.610 1.567 5.49 0.000
Speed 0.23698 0.08960 2.64 0.020
Water 0.05775 0.01226 4.71 0.000
S = 0.3383 R-Sq = 85.0% R-Sq(adj) = 82.7%
R-Sq goes up a little because we’ve gotten rid of “noise” in the model
…to compare with the regression equation of our
previous reduced model
Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
Predictor Coef SE Coef T P
Constant 9.919 1.694 5.86 0.000
Speed 0.35689 0.08544 4.18 0.001
Water 0.04253 0.01206 3.53 0.003
S = 0.3905 R-Sq = 80.0% R-Sq(adj) = 77.2%
48. UNCLASSIFIED / FOUO
Takeaways
Regression analysis can be used with historical data as well as
data from designed experiments to build prediction models
Care must be exercised when using historical data
Correlation does not imply a cause and effect relationship
There may be serious problems with multicollinearity and
high leverage observations
There are several diagnostic tools available to evaluate
regression models:
Fit: R2, adjusted R2, Cp, S
Unusual observations: residual plots, leverage, Cook’s D
Multicollinearity: VIFs (Variance Inflation Factors)
49. UNCLASSIFIED / FOUO
Considerations in Regression
Set goals before doing the analysis (what do you want to learn,
how well do you need to predict, etc.).
Gather enough observations to adequately measure error and
check the model assumptions.
Make sure that the sample of data is representative of the
population.
Excessive measurement error of the inputs (Xs) creates
uncertainty in the estimated coefficients, predictions, etc.
Be sure to collect data on all potentially important explanatory
variables.
50. UNCLASSIFIED / FOUO
Regression Checklist
Scatterplots (Y vs. X)
Histograms and/or Boxplots of Ys and Xs
Coefficients
Significance (p < .05 - .10)
R2 and adjusted R2
S
Residuals (no obvious pattern)
Unusual Y values (standardized residuals > 2)
Unusual X values (leverage > 2p/n)
Overfitting vs. underfitting (C-p close to the number of input variables in model)
Multicollinearity (VIF > 5-10)
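The two "unusual observation" checks in the list above can be computed from the hat matrix. The sketch below assumes NumPy and synthetic data with one planted unusual Y value and one planted unusual X value; it also assumes that p in the 2p/n rule counts the constant. The function names are illustrative.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Return (leverage, standardized residuals) for a fit with a constant."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]  # number of parameters, counting the constant
    # Hat matrix H = A (A'A)^-1 A'; its diagonal holds the leverages.
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)
    resid = y - hat @ y
    s = np.sqrt(resid @ resid / (n - p))
    std_resid = resid / (s * np.sqrt(1.0 - leverage))
    return leverage, std_resid

def flag_unusual(X, y):
    """Indices of unusual Y values (|std resid| > 2) and unusual X values (leverage > 2p/n)."""
    leverage, std_resid = regression_diagnostics(X, y)
    n, p = len(y), X.shape[1] + 1
    unusual_y = np.flatnonzero(np.abs(std_resid) > 2)
    unusual_x = np.flatnonzero(leverage > 2 * p / n)
    return unusual_y, unusual_x

# Synthetic data with planted problems.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 2))
X[12] = [6.0, -6.0]  # far from the other X values: high leverage
y = 3.0 + X[:, 0] + 2.0 * X[:, 1] + 0.3 * rng.normal(size=20)
y[5] += 3.0          # shifted response: large standardized residual

leverage, std_resid = regression_diagnostics(X, y)
unusual_y, unusual_x = flag_unusual(X, y)
print("Unusual Y values at rows:", unusual_y)
print("Unusual X values at rows:", unusual_x)
```

A flagged row is a prompt to investigate the observation, not an automatic reason to delete it.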
51. UNCLASSIFIED / FOUO
What other comments or questions
do you have?
52. UNCLASSIFIED / FOUO
References
Neter, Wasserman, and Kutner, Applied Linear Regression Models, Irwin, 1989
Draper and Smith, Applied Regression Analysis, Wiley, 1981
Schulman, Robert S., Statistics in Plain English, Chapman and Hall, 1992.
Gunst and Mason, Regression Analysis and its Application, Marcel Dekker, 1980
Myers, Raymond H., Classical and Modern Regression with Applications,
Duxbury, 1990
Dielman, Applied Regression Analysis for Business and Economics, Duxbury,
1991
Hosmer and Lemeshow, Applied Logistic Regression, Wiley, 1989
Iglewicz and Hoaglin, How to Detect and Handle Outliers, ASQ Press
Crocker, Douglas C., How to use Regression Analysis in Quality Control, ASQ
Press
53. UNCLASSIFIED / FOUO
National Guard
Black Belt Training
APPENDIX
Additional Exercises
Anthony’s Pizza
Customer Satisfaction
A Study of Supervisor
Performance
54. UNCLASSIFIED / FOUO
Additional Practice Example:
Anthony’s Pizza
We have received Voice of the Customer feedback
telling us that customers are dissatisfied if we cannot
accurately predict the time of their pizza delivery when
it is beyond the 30 minute target
We would like to develop a model so that when the
customer calls, we can accurately predict delivery time
55. UNCLASSIFIED / FOUO
Additional Practice Example:
Six Sigma Pizza
Our Minitab data can be found in the file Multiple
Regression - Pizza.mpj
Based on the data that we have collected, we are going to
study the effects of total pizzas ordered, defects, and
incorrect order on delivery time
56. UNCLASSIFIED / FOUO
Additional Practice Exercise:
Customer Satisfaction
Bob Black Belt would like to get a better understanding of the
customer satisfaction data
Use the data provided in the Minitab file A-06 Customer
Satisfaction Data.mtw to create a Regression Model to predict
Overall Satisfaction
Each row of data is a monthly average of how customers rated the
services on a scale of 1-10. For example, in January, the average
of customer ratings for Staff Responsiveness was a 7.9.
57. UNCLASSIFIED / FOUO
Additional Practice Exercise:
Customer Satisfaction (Cont.)
Consider Staff Responsiveness, Check-out Speed,
Frequent Guest Program, and Problems Resolved as
possible inputs that could be used to predict Overall
Satisfaction.
First, study correlation with a Matrix Plot and Correlation
Table
Next, create the initial Regression Model
Find the best combination of inputs with Best Subsets
Finally, run the reduced Regression Model
58. UNCLASSIFIED / FOUO
Additional Practice Exercise:
A Study of Supervisor Performance
A recent survey of clerical employees in a large financial organization
included questions related to employee satisfaction with their
supervisors. The company was interested in any relationships between
specific supervisor characteristics and overall satisfaction with
supervisors as perceived by the employees:
Y = Overall rating of the job being done by the supervisor
X1 = Handles employee complaints
X2 = Does not allow special privileges
X3 = Provides opportunity to learn new things
X4 = Raises based on performance
X5 = Too critical of poor performance
X6 = Rate of advancing to better jobs (employee’s perception
of their own advancement rate)
Source: Regression Analysis by Example, Chatterjee and Price
59. UNCLASSIFIED / FOUO
Additional Practice Exercise:
A Study of Supervisor Performance
The survey responses were on a scale of 1-5
For purposes of analysis, a score of 1 or 2 was considered
“favorable”, while a score of 3, 4, or 5 was considered “unfavorable”
Data was collected from 30 departments, selected randomly from
the organization. Each department had approximately 35 employees
with one supervisor
For each department, the data was aggregated and the data
recorded was the percent favorable for each item
Data file is A-06 Attitude.mtw
Questions:
Can we predict the overall supervisor rating using this data?
What variable(s) have the strongest correlation with the supervisor rating?
Are there any unusual observations?
Comments on the data?