Logistic regression for an ordered dependent variable with more than 2 levels
- 2. Logistic regression CAN handle dependent variables
with more than two categories
It is important to note whether the response variable
is ordinal (ordered categories such as young,
middle-aged, old) or nominal (unordered categories
such as red, blue, black)
Some multinomial logistic models are appropriate only
for ordered responses
It is not mathematically necessary to take the natural
ordering into account when modeling an ordinal response,
but doing so
leads to a more parsimonious model
increases the power to detect relationships with other variables
January 1, 2013 ©Arup Guha - Indian Institute of Foreign Trade - New Delhi, India
- 3. Applying logistic regression considering the natural
order is done using a modeling technique called the
“Proportional Odds Model”
Say the dependent variable Y has 4 states measuring
the impact of radiation on the human body: fine,
sick, serious, dead
Let p1=prob of fine, p2=prob of sick, p3=prob of
serious, p4=prob of dead
Let us define a baseline category: fine, since this is
the normal state (we shall see why we need this
later)
- 4. What if we break up the modeling of the 4-level
ordered dependent variable into 3 binary logistic
comparisons: 1 – (fine, sick), 2 – (fine, serious), 3 –
(fine, dead)?
Then we would have 3 logit equations:
log(p2/p1)=B11+B12X1+B13X2
log(p3/p1)=B21+B22X1+B23X2
log(p4/p1)=B31+B32X1+B33X2
X is the degree of radiation, a categorical variable with 3 levels,
so it is broken into 2 binary dummies, X1 and X2
So, 9 parameters have to be estimated
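
As a rough illustration of the three baseline-category logits above, the sketch below computes log(pj/p1) from a set of category probabilities and inverts them again. The probability values and the function names are made up for the example; they are not taken from the radiation data.

```python
import math

def baseline_logits(probs):
    """Return log(p_j / p_1) for j = 2..J, using category 1 as the baseline."""
    p1 = probs[0]
    return [math.log(pj / p1) for pj in probs[1:]]

def probs_from_baseline_logits(logits):
    """Invert the baseline-category logits back to category probabilities."""
    denom = 1.0 + sum(math.exp(eta) for eta in logits)
    return [1.0 / denom] + [math.exp(eta) / denom for eta in logits]

# Hypothetical probabilities of fine, sick, serious, dead (assumed values)
p = [0.50, 0.25, 0.15, 0.10]
logits = baseline_logits(p)
recovered = probs_from_baseline_logits(logits)

# Three equations, each with one intercept and two dummy slopes
n_params = (len(p) - 1) * (1 + 2)

print(logits)
print(recovered)
print(n_params)
```

Each logit compares one category with the baseline only, which is why every equation carries its own intercept and its own pair of slopes.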
- 5. Now consider an alternative model for the same
situation
Cumulative logit model:
L1=log(p1/(p2+p3+p4))
L2=log((p1+p2)/(p3+p4))
L3=log((p1+p2+p3)/p4)
The obvious way to introduce covariates is
L1=B11+B12X1+B13X2
L2=B21+B22X1+B23X2
L3=B31+B32X1+B33X2
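
A minimal sketch of the cumulative logits defined on this slide, computed from assumed category probabilities (the numbers are illustrative only). Each logit compares everything at or below a cut-point with everything above it.

```python
import math

def cumulative_logits(probs):
    """Return L_j = log(P(Y <= j) / P(Y > j)) for j = 1..J-1."""
    logits = []
    cum = 0.0
    for pj in probs[:-1]:
        cum += pj                      # running P(Y <= j)
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

# Hypothetical probabilities of fine, sick, serious, dead (assumed values)
p = [0.50, 0.25, 0.15, 0.10]
L = cumulative_logits(p)
print(L)
```

Because the cumulative probability can only grow with j, the three logits come out in non-decreasing order.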
- 6. Let us simplify the model by specifying that
the slope parameters are identical across the
logit equations. Then,
L1=A1+B1X1+B2X2
L2=A2+B1X1+B2X2
L3=A3+B1X1+B2X2
This is the proportional odds cumulative logit
model; it has 5 parameters (3 intercepts and 2 slopes)
instead of 9
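
To make the proportional odds model concrete, the sketch below turns a set of intercepts and shared slopes into category probabilities. The values of A and B are invented for illustration, and `category_probs` is a hypothetical helper, not part of any library.

```python
import math

def expit(t):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(intercepts, slopes, x):
    """Category probabilities under L_j = A_j + B1*X1 + B2*X2."""
    shift = sum(b * xi for b, xi in zip(slopes, x))
    # Cumulative probabilities P(Y <= j); the last category closes at 1
    cum = [expit(a + shift) for a in intercepts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

A = [-1.0, 0.5, 2.0]   # assumed intercepts, must be increasing
B = [0.8, 1.5]         # assumed slopes shared by all three logits

p_ref = category_probs(A, B, [0, 0])   # both dummies at 0
p_x1 = category_probs(A, B, [1, 0])    # first dummy switched on

n_params = len(A) + len(B)
print(p_ref, p_x1, n_params)
```

With these assumed values the positive B1 moves probability toward the lowest category when X1 goes from 0 to 1, since all three cumulative logits rise by the same amount.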
- 7. Suppose that the categorical outcome is actually a
categorized version of an unobservable (latent)
continuous variable Z which has a logistic distribution
The continuous scale is divided into five regions by
four cut-points c1, c2, c3, c4 which are determined by
nature
If Z ≤ c1 we observe Y = 1; if c1 < Z ≤ c2 we observe Y =
2; and so on
Suppose that Z is related to the X’s through a linear
regression
Then the coarsened categorical variable Y will be
related to the X’s by a proportional-odds
cumulative logit model
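
The step from the latent regression to the proportional-odds model can be written out as follows. This is a sketch under the assumption that the error term is standard logistic; the symbols β1, β2, ε and F are introduced here for the derivation and do not appear on the slides.

```latex
% Assumption: Z = \beta_1 X_1 + \beta_2 X_2 + \varepsilon, with \varepsilon standard logistic,
% so its distribution function is F(t) = 1 / (1 + e^{-t}).
\begin{align*}
P(Y \le j \mid X) &= P(Z \le c_j \mid X)
                   = P(\varepsilon \le c_j - \beta_1 X_1 - \beta_2 X_2)
                   = F(c_j - \beta_1 X_1 - \beta_2 X_2), \\
L_j = \log\frac{P(Y \le j \mid X)}{P(Y > j \mid X)}
    &= c_j - \beta_1 X_1 - \beta_2 X_2 .
\end{align*}
```

Only the cut-point c_j changes with j, so the slopes are the same in every logit. In the notation of the cumulative logit model, Aj = c_j and Bk = -βk.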
- 8. Let us go back to the model
L1=A1+B1X1+B2X2
L2=A2+B1X1+B2X2
L3=A3+B1X1+B2X2
Note that Lj is the log-odds of falling into or below category j
versus falling above it
Aj is the log-odds of falling into or below category j when X1 =
X2 = 0
Bk is the increase in log-odds of falling into or below any
category associated with a one-unit increase in Xk, holding all
the other X-variables constant.
Therefore, a positive slope indicates a tendency for the
response level to decrease as the variable increases
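
The slope interpretation can also be stated on the odds scale. The short derivation below follows directly from the three equations on this slide; odds(Y ≤ j) is shorthand introduced here for P(Y ≤ j)/P(Y > j).

```latex
\begin{align*}
L_j(X_k + 1) - L_j(X_k) &= B_k \quad \text{for every } j, \\
\frac{\operatorname{odds}(Y \le j \mid X_k + 1)}{\operatorname{odds}(Y \le j \mid X_k)} &= e^{B_k}.
\end{align*}
```

The same odds ratio applies at every cut-point, which is where the name "proportional odds" comes from.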
- 9. Our example of 4 levels of impact of radiation
corresponding to 3 levels of radiation
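proc logistic data=radiation_impact;
   freq count;   /* count holds the frequency of each row */
   class radiation / order=data param=ref ref=first;
   /* cumulative logit model with the response order reversed */
   model sickness (order=data descending) = radiation /
         link=logit aggregate=(radiation) scale=none;
run;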
- 10. freq count
This is important for specifying grouped data
count is the variable that contains the frequency of
occurrence of each observation
In its absence, each row would be treated as a
single observation
class radiation
Specifies that radiation is a classification variable to
be used in the analysis
With the param=ref option, SAS automatically generates
n-1 binary dummies for the n categories of radiation
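
The two ideas on this slide can be sketched outside SAS. The example below expands grouped rows by their frequency and builds n-1 reference-coded dummies. The level names and counts are invented for illustration, and `reference_dummies` is a hypothetical helper that imitates reference coding in spirit only.

```python
def reference_dummies(levels, reference):
    """Map each level to n-1 indicators; the reference level gets all zeros."""
    others = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == o else 0 for o in others] for lv in levels}

# Hypothetical radiation levels in data order (assumed names)
levels = ["low", "medium", "high"]
coding = reference_dummies(levels, reference=levels[0])

# Hypothetical grouped rows: (radiation, sickness, count)
grouped = [("low", "fine", 3), ("high", "dead", 2)]
# A frequency variable makes each row stand for `count` observations
expanded = [(rad, sick) for rad, sick, count in grouped for _ in range(count)]

print(coding)
print(len(expanded))
```

Two grouped rows stand in for five observations here, and the three radiation levels need only two dummy columns.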
- 11. Order=data
Simply tells SAS to arrange the response categories in
the order they occur in the input data 1,2,3,4
Param=ref
This requests reference (dummy) coding for the
classification variable ‘radiation’ listed in the class statement
Ref=first
Designates the first ordered level of the classification
variable ‘radiation’ as the reference level
- 12. Order=data descending
This tells SAS to reverse the order of the logits
So, instead of the cumulative logit model being
L1=log(p1/(p2+p3+p4))
L2=log((p1+p2)/(p3+p4))
L3=log((p1+p2+p3)/p4), it becomes
L1=log(p4/(p1+p2+p3))
L2=log((p4+p3)/(p1+p2))
L3=log((p4+p3+p2)/p1)
Now, a positive B1 indicates that a higher value of X1
leads to a greater chance of more severe radiation sickness
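
A small numeric check of what reversing the order does, using assumed probabilities: the descending logits are the ascending ones with the sign flipped, taken in reverse order.

```python
import math

def cumulative_logits(probs):
    """Return log(P(first j categories) / P(remaining categories))."""
    logits = []
    cum = 0.0
    for pj in probs[:-1]:
        cum += pj
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

# Hypothetical probabilities of fine, sick, serious, dead (assumed values)
p = [0.50, 0.25, 0.15, 0.10]

ascending = cumulative_logits(p)          # fine first
descending = cumulative_logits(p[::-1])   # dead first

# descending[0] = log(p4/(p1+p2+p3)) is the negative of ascending[2]
flipped = [-value for value in reversed(ascending)]
print(descending)
print(flipped)
```

This sign flip is why a slope that was negative under the ascending order shows up as positive after `descending` is requested.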
- 13. Link=logit
Fits the cumulative logit model when there are more
than two response categories
Aggregate=(radiation)
Indicates that the goodness-of-fit statistics are to be
calculated on the subpopulations defined by the variable
radiation
Scale=none
No correction is made for the dispersion parameter
A correction is needed under overdispersion, that is, when the
goodness-of-fit statistic clearly exceeds its degrees of freedom;
for background, read up on overdispersion
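
As a hedged illustration of what a dispersion correction would do if one were requested, the sketch below estimates the dispersion as a goodness-of-fit statistic divided by its degrees of freedom and inflates a standard error accordingly. All numbers are hypothetical, and the function names are invented for the example.

```python
import math

def dispersion_estimate(fit_statistic, df):
    """Goodness-of-fit statistic divided by its degrees of freedom."""
    return fit_statistic / df

def scaled_std_error(std_error, dispersion):
    """Standard error after multiplying the variance by the dispersion."""
    return std_error * math.sqrt(dispersion)

# Hypothetical fit statistic and degrees of freedom (assumed values)
phi = dispersion_estimate(30.0, 20)
se = scaled_std_error(0.20, phi)
print(phi, se)
```

A value of phi near 1 suggests that no correction is needed, which is the situation Scale=none assumes.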
- 14. When we fit this model, the first output we
see:
Score Test for the Proportional Odds Assumption
Chi-Square    DF    Pr > ChiSq
17.2866       21    0.6936
The null hypothesis is that the current proportional-odds
cumulative logit model is true
With p = 0.6936 we fail to reject the null, so we can proceed to the
rest of the output under the current assumption
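
The p-value in the score test is the upper tail of a chi-square distribution. The sketch below computes that tail with the closed-form series for integer degrees of freedom, using only the standard library; `chi2_sf` is a name chosen for this example, not a SAS or library function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite Poisson-type sum
        term = math.exp(-half)
        total = term
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return total
    # Odd df: start from df = 1 and add one term per extra 2 degrees of freedom
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

# Values taken from the score test output above
p_value = chi2_sf(17.2866, 21)
print(round(p_value, 4))
```

The result should come out close to the 0.6936 reported by SAS, well above the usual 0.05 threshold.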
- 15. Ultimately we are interested in the predicted
probabilities
OUTPUT <OUT=SAS-data-set><options>
Predicted=
For a cumulative model, this is the predicted cumulative
probability (that is, the probability that the response
variable is less than or equal to the value of _LEVEL_)
PREDPROBS=I or C
INDIVIDUAL | I requests the predicted probability of each
response level
CUMULATIVE | C requests the cumulative predicted
probability of each response level
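
The individual and cumulative predicted probabilities are linked by simple differencing. The sketch below recovers one from the other using assumed cumulative values; the numbers are illustrative and do not come from the fitted model.

```python
def individual_from_cumulative(cumulative):
    """Difference successive cumulative probabilities to get per-level ones."""
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical cumulative probabilities for four ordered levels (assumed)
cum = [0.50, 0.75, 0.90, 1.00]
ind = individual_from_cumulative(cum)
print(ind)
```

The per-level values sum to 1, so either form of output carries the same information.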