2. General Linear Model
The regression models we have been looking at (Simple Linear
Regression, Multiple Regression) are part of a larger family of models
referred to as the General Linear Model (GLM).
We extend the notation of the regression model to yield the GLM:
y = β0 + β1z1 + β2z2 + ... + βpzp + ε
where each independent variable zj is a function of x1, ..., xk, the
actual variables on which data are collected.
This allows us to model more complex relationships.
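To make the idea concrete, here is a minimal Python sketch (the data values are illustrative, not from the slides) showing how the z's can be built as functions of the observed x's. The model stays linear in the β's even though the z's are nonlinear in the x's:

```python
import math

# Illustrative observed data on the actual variables x1 and x2.
x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]

# Each z_j is some function of x1, ..., xk:
z1 = x1[:]                            # z1 = x1 (identity)
z2 = [v ** 2 for v in x1]             # z2 = x1^2 (curvilinear term)
z3 = [a * b for a, b in zip(x1, x2)]  # z3 = x1 * x2 (interaction term)
z4 = [math.log(v) for v in x1]        # z4 = ln(x1) (log transform)
```

These columns would then be supplied to an ordinary multiple regression as if they were separate predictors.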
3. Modeling Curvilinear Relationships
It is quite possible your data may not fit a line, but rather a
curve. Figure 16.1 gives a good example of this using 1
independent variable.
For example, a second order model with 1 predictor variable
has the following notation:
y = β0 + β1x1 + β2x1² + ε
Notice we now have two slope coefficients (β1 and β2) with only 1 independent variable.
4. Modeling Curvilinear Relationships
So two big questions:
1. How do we know we have a curvilinear relationship?
2. How do we account for this in our calculations?
For Question 1:
This is why it is important to understand all the ways we can
analyze our data.
What does our scatter plot look like?
What do our F and t tests reveal?
Are there any patterns with respect to our standardized
residuals?
For Question 2:
To use a model like the one on the previous slide, in Excel we
would simply create a new column for the squared variable.
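The same "new column" trick can be sketched in Python (data values are illustrative; the fitting is done by solving the normal equations directly, which is one standard way to compute least-squares estimates, not necessarily what Excel does internally):

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Illustrative data generated from y = 2 + 3x + x^2 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 6.0, 12.0, 20.0, 30.0]

# Design matrix: intercept column, x column, and the new squared column.
X = [[1.0, x, x * x] for x in xs]

# Normal equations: (X'X) beta = X'y.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(X[r][i] * ys[r] for r in range(len(X))) for i in range(3)]
beta = solve(XtX, Xty)
# On this exact data, beta recovers (2, 3, 1)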
5. Interaction Terms
There will be times when it is more fruitful to model the joint effect that two (or more)
variables have on a response.
When developing a model, we may see the combined effects of two variables
help us form a better prediction.
In that case we can add an interaction term to the model.
The following is a model with an interaction term:
y = β0 + β1x1 + β2x2 + β3x1x2 + ε
Knowing that the joint effects yield a better predictive model may lead us to
use an interaction term.
In general, when performing a regression analysis, the model will undoubtedly
undergo several iterations. Luckily we have tools like Excel, SAS, SPSS, R, etc.
which make it very easy for us to revise the model and find new estimates.
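An interaction term is fit the same way as any other predictor: add a column holding x1·x2. A minimal Python sketch, with illustrative noise-free data and least squares computed via the normal equations:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Illustrative data generated from y = 1 + 2*x1 + 3*x2 + 4*x1*x2 (no noise).
x1 = [0.0, 1.0, 0.0, 1.0, 2.0, 1.0]
x2 = [0.0, 0.0, 1.0, 1.0, 1.0, 2.0]
ys = [1.0, 3.0, 4.0, 10.0, 16.0, 17.0]

# Design matrix with the interaction column x1*x2 added as a predictor.
X = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]

# Normal equations: (X'X) beta = X'y.
p = 4
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(X[r][i] * ys[r] for r in range(len(X))) for i in range(p)]
beta = solve(XtX, Xty)
# On this exact data, beta recovers (1, 2, 3, 4)
```

The estimated β3 measures how much the effect of x1 on y changes per unit of x2 (and vice versa).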
6. Transformations of the Dependent Variable
In the previous examples we showed transformations of the independent
variables.
It might be worthwhile to consider a transformation of the dependent variable...
why, you say?
We can use this as a tool to correct for non-constant variance, so that our key
assumptions still apply.
A good way to check for non-constant variance is to look at the standardized
residuals.
There are two common types of transformations for the dependent variable:
1. Modeling the log of the dependent variable:
ln(y) = β0 + β1x1 + β2x2 + ε; then to get the actual y value, take e^(ln(y))
2. Modeling the reciprocal of the dependent variable:
1/y = β0 + β1x1 + β2x2 + ε
There is no way to tell which would be better without actually trying them out!
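The log transformation can be sketched in Python with one predictor (data values are illustrative and noise-free, so the fitted coefficients come out exact): regress ln(y) on x using the closed-form simple-regression formulas, then back-transform a prediction with e^(ln(y)):

```python
import math

# Illustrative data where ln(y) is exactly linear in x: y = e^(0.5 + 0.3x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]

ly = [math.log(y) for y in ys]          # transform the dependent variable
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ly) / n

# Closed-form simple least squares on (x, ln y).
b1 = sum((x - xbar) * (l - ybar) for x, l in zip(xs, ly)) \
     / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar                   # recovers b0 = 0.5, b1 = 0.3 here

# Back-transform: predicted y at x = 6 is e^(predicted ln(y)).
y_hat = math.exp(b0 + b1 * 6.0)
```

The reciprocal model works the same way: fit 1/y on the left-hand side, then invert the prediction.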
7. Nonlinear Models (That are intrinsically linear)
A model in which the parameters (β0, β1, ..., βp) all have exponents
of 1 is considered linear.
But even when these parameters do not have exponents of 1,
we can perform some transformations that allow us to do
regression analysis.
For example, we could have an exponential equation:
E(y) = β0β1^x
That could then be transformed by taking the log of both sides:
log E(y) = log β0 + x log β1
Luckily, we don't see this too much in business statistics!
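The exponential model above can be sketched in Python (illustrative, noise-free data): fit log(y) against x by simple least squares, then exponentiate the intercept and slope to recover β0 and β1:

```python
import math

# Illustrative data from the intrinsically linear model E(y) = 2 * 1.5^x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * 1.5 ** x for x in xs]

# log E(y) = log b0 + x * log b1, so regress log(y) on x.
ly = [math.log(y) for y in ys]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ly) / n
slope = sum((x - xbar) * (l - ybar) for x, l in zip(xs, ly)) \
        / sum((x - xbar) ** 2 for x in xs)
intercept = ybar - slope * xbar

# Undo the log to get back the original parameters.
b0 = math.exp(intercept)                # recovers 2.0 on this exact data
b1 = math.exp(slope)                    # recovers 1.5 on this exact data
```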