Logistic regression for ordered dependent variable with more than 2 levels

Multinomial Logistic Regression Models

January 1, 2013 ©Arup Guha - Indian Institute of Foreign Trade - New Delhi, India

• Logistic regression CAN handle dependent variables with more than two categories
• It is important to note whether the response variable is ordinal (ordered categories such as young, middle-aged, old) or nominal (unordered categories such as red, blue, black)
• Some multinomial logistic models are appropriate only for ordered responses
• It is not mathematically necessary to consider the natural ordering when modeling an ordinal response, but considering the natural ordering
  - Leads to a more parsimonious model
  - Increases the power to detect relationships with other variables


• Logistic regression that takes the natural order into account is done using a modeling technique called the “Proportional Odds Model”
• Say the dependent variable Y has 4 states measuring the impact of radiation on the human body: fine, sick, serious, dead
• Let p1 = prob of fine, p2 = prob of sick, p3 = prob of serious, p4 = prob of dead
• Let us define a baseline category: fine, since this is the normal state (we shall see why we need this later)

• What if we break up the modeling of the 4-level ordered dependent variable into 3 binary logistic situations: (1) fine vs. sick, (2) fine vs. serious, (3) fine vs. dead?
• Then we would have 3 logit equations:
  - log(p2/p1) = B11 + B12X1 + B13X2
  - log(p3/p1) = B21 + B22X1 + B23X2
  - log(p4/p1) = B31 + B32X1 + B33X2
• X is the degree of radiation, a categorical variable with 3 levels, so it is coded as 2 binary dummies X1 and X2
• So, 9 parameters to be estimated
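The three baseline-category logits above determine all four probabilities, because p1 + p2 + p3 + p4 = 1. As a sketch, writing \eta_j for the right-hand side of the j-th equation (a symbol introduced here only for brevity, not part of the original slides):

```latex
\begin{aligned}
\eta_j &= \log\frac{p_{j+1}}{p_1} = B_{j1} + B_{j2}X_1 + B_{j3}X_2, \qquad j = 1, 2, 3 \\
p_1 &= \frac{1}{1 + e^{\eta_1} + e^{\eta_2} + e^{\eta_3}}, \qquad
p_{j+1} = \frac{e^{\eta_j}}{1 + e^{\eta_1} + e^{\eta_2} + e^{\eta_3}}
\end{aligned}
```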


• Now consider an alternative model for the same situation
• Cumulative logit model:
  - L1 = log(p1/(p2+p3+p4))
  - L2 = log((p1+p2)/(p3+p4))
  - L3 = log((p1+p2+p3)/p4)
• The obvious way to introduce covariates is
  - L1 = B11 + B12X1 + B13X2
  - L2 = B21 + B22X1 + B23X2
  - L3 = B31 + B32X1 + B33X2


• Let us simplify the model by specifying that the slope parameters are identical across the logit equations. Then,
  - L1 = A1 + B1X1 + B2X2
  - L2 = A2 + B1X1 + B2X2
  - L3 = A3 + B1X1 + B2X2
• This is the proportional odds cumulative logit model, with 5 parameters (3 intercepts and 2 slopes) instead of 9
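The name "proportional odds" can be read off the common slopes. As a sketch, take two covariate settings x = (x_1, x_2) and x' = (x_1', x_2'), and write Y \le j for "falls into or below category j" (notation assumed here for illustration). The intercept cancels, so the odds ratio is the same at every cut j:

```latex
\begin{aligned}
L_j(x) - L_j(x') &= B_1 (x_1 - x_1') + B_2 (x_2 - x_2'), \qquad j = 1, 2, 3 \\
\frac{\mathrm{odds}(Y \le j \mid x)}{\mathrm{odds}(Y \le j \mid x')}
  &= \exp\{ B_1 (x_1 - x_1') + B_2 (x_2 - x_2') \}
\end{aligned}
```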


• Suppose that the categorical outcome is actually a categorized version of an unobservable (latent) continuous variable Z which has a logistic distribution
• The continuous scale is divided into regions by cut-points which are determined by nature; for instance, four cut-points c1, c2, c3, c4 divide it into five regions
• If Z ≤ c1 we observe Y = 1; if c1 < Z ≤ c2 we observe Y = 2; and so on
• Suppose that Z is related to the X’s through a linear regression
• Then the coarsened categorical variable Y will be related to the X’s by a proportional-odds cumulative logit model
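The link between the latent regression and the proportional-odds model can be sketched in a few lines. The coefficients \gamma_1, \gamma_2 and the error term \varepsilon are symbols assumed here for illustration; \varepsilon is taken to be standard logistic:

```latex
\begin{aligned}
Z &= \gamma_1 X_1 + \gamma_2 X_2 + \varepsilon, \qquad \varepsilon \sim \mathrm{Logistic}(0, 1) \\
P(Y \le j) &= P(Z \le c_j) = \frac{1}{1 + \exp\{-(c_j - \gamma_1 X_1 - \gamma_2 X_2)\}} \\
\log\frac{P(Y \le j)}{P(Y > j)} &= c_j - \gamma_1 X_1 - \gamma_2 X_2
\end{aligned}
```

Under these assumptions the cut-points play the role of the intercepts (A_j = c_j) and the slopes are shared across all j (B_k = -\gamma_k), which is exactly the proportional-odds structure.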

• Let us go back to the model
  - L1 = A1 + B1X1 + B2X2
  - L2 = A2 + B1X1 + B2X2
  - L3 = A3 + B1X1 + B2X2
• Note that Lj is the log-odds of falling into or below category j versus falling above it
• Aj is the log-odds of falling into or below category j when X1 = X2 = 0
• Bk is the increase in log-odds of falling into or below any category associated with a one-unit increase in Xk, holding all the other X-variables constant
• Therefore, a positive slope indicates a tendency for the response level to decrease as the variable increases
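Once the cumulative logits are known, the individual category probabilities follow by inverting the logit and differencing. A short sketch, using P(Y \le j) as shorthand for p_1 + \dots + p_j:

```latex
\begin{aligned}
P(Y \le j) &= \frac{e^{L_j}}{1 + e^{L_j}}, \qquad j = 1, 2, 3 \\
p_1 &= P(Y \le 1), \qquad p_2 = P(Y \le 2) - P(Y \le 1), \\
p_3 &= P(Y \le 3) - P(Y \le 2), \qquad p_4 = 1 - P(Y \le 3)
\end{aligned}
```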

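• Our example: 4 levels of radiation impact (sickness) modeled on 3 levels of radiation

proc logistic data=radiation_impact;
   freq count;                                        /* cell frequencies for grouped data */
   class radiation / order=data param=ref ref=first;  /* reference-coded dummies for radiation */
   model sickness (order=data descending) = radiation /
         link=logit                                   /* cumulative logit link */
         aggregate=(radiation) scale=none;            /* goodness of fit by radiation subpopulation */
run;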

• freq count
  - This is important for specifying grouped data
  - Count is the variable that contains the frequency of occurrence of each observation
  - In its absence, each row would be treated as a single observation
• class radiation
  - Specifies that radiation is a classification variable to be used in the analysis
  - With the param=ref option, SAS automatically generates n-1 binary dummies for the n categories of radiation
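To make the grouped layout concrete, here is a minimal sketch of what the input data set might look like. The radiation level names (low, medium, high) and every count are hypothetical placeholders invented for illustration; they are not the author's data.

```sas
/* Hypothetical grouped data: one row per radiation x sickness cell.  */
/* Level names for radiation and all counts are made up.              */
data radiation_impact;
   length radiation $ 6 sickness $ 7;
   input radiation $ sickness $ count;
   datalines;
low     fine     40
low     sick     10
low     serious   4
low     dead      1
medium  fine     25
medium  sick     15
medium  serious   8
medium  dead      4
high    fine     10
high    sick     14
high    serious  12
high    dead     11
;
run;
```

Because the rows list sickness as fine, sick, serious, dead, and radiation starting with low, order=data would pick up exactly that ordering, and ref=first would make low the reference level in this sketch.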

• Order=data
  - Tells SAS to order levels as they first occur in the input data. In the class statement this applies to the levels of radiation; in the response options of the model statement it applies to the categories of sickness (1, 2, 3, 4)
• Param=ref
  - Requests reference (dummy) coding for the classification variable ‘radiation’ listed in the class statement
• Ref=first
  - Designates the first ordered level of the class variable radiation as the reference level


• Order=data descending
  - This tells SAS to reverse the order of the response categories, and hence of the logits
  - So, instead of the cumulative logit model being
    L1 = log(p1/(p2+p3+p4))
    L2 = log((p1+p2)/(p3+p4))
    L3 = log((p1+p2+p3)/p4)
    it becomes
    L1 = log(p4/(p1+p2+p3))
    L2 = log((p4+p3)/(p1+p2))
    L3 = log((p4+p3+p2)/p1)
  - Now, a positive B1 indicates that a higher value of X1 leads to a greater chance of more severe radiation sickness
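The two sets of logits above are tied together by a sign flip. As a sketch, write L_j for the original (ascending) logits and L_j' for the descending ones (the prime is notation assumed here). Each descending logit is the negative of an ascending one, so under the proportional-odds model the intercepts are negated and reversed while every slope changes sign:

```latex
\begin{aligned}
L_1' &= \log\frac{p_4}{p_1 + p_2 + p_3} = -L_3, \qquad
L_2' = \log\frac{p_4 + p_3}{p_1 + p_2} = -L_2, \qquad
L_3' = \log\frac{p_4 + p_3 + p_2}{p_1} = -L_1 \\
A_j' &= -A_{4-j}, \qquad B_k' = -B_k
\end{aligned}
```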

• Link=logit
  - Fits the cumulative logit model when there are more than two response categories
• Aggregate=(radiation)
  - Indicates that the goodness-of-fit statistics are to be calculated on the subpopulations defined by the variable radiation
• Scale=none
  - No correction is applied for the dispersion parameter
  - To understand this, read up on overdispersion. It is suspected when the goodness-of-fit statistic exceeds its degrees of freedom, in which case the dispersion needs to be corrected for

• When we fit this model, the first output we see:

Score Test for the Proportional Odds Assumption
Chi-Square    DF    Pr > ChiSq
   17.2866    21        0.6936

• The null hypothesis is that the current proportional-odds cumulative logit model is true
• With a p-value of 0.6936 we fail to reject the null, and so can proceed to the rest of the output under the current assumption


• Ultimately we are interested in the predicted probabilities
OUTPUT <OUT=SAS-data-set> <options>
• Predicted=
  - For a cumulative model, this is the predicted cumulative probability (that is, the probability that the response variable is less than or equal to the value of _LEVEL_)
• PREDPROBS=I or C
  - INDIVIDUAL | I requests the predicted probability of each response level
  - CUMULATIVE | C requests the cumulative predicted probability of each response level
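As a sketch of how these options might be combined with the model fitted earlier, the step below adds an OUTPUT statement. The output data set name pred is an assumption chosen for illustration; the other names are taken from the example above.

```sas
/* Sketch: same model as before, plus an OUTPUT statement.            */
/* The data set name "pred" is a hypothetical choice.                 */
proc logistic data=radiation_impact;
   freq count;
   class radiation / order=data param=ref ref=first;
   model sickness (order=data descending) = radiation / link=logit;
   /* request both individual and cumulative predicted probabilities */
   output out=pred predprobs=(individual cumulative);
run;

/* Inspect the predicted probabilities */
proc print data=pred;
run;
```

With PREDPROBS=, SAS adds one column per response level to the output data set, with the prefix IP_ for individual probabilities and CP_ for cumulative probabilities.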
