AN INTRODUCTION TO GENERALIZED LINEAR MIXED MODELS
S. D. KACHMAN
Abstract. Linear mixed models provide a powerful means of predicting breeding values. However,
for many traits of economic importance the assumptions of linear responses, constant variance,
and normality are questionable. Generalized linear mixed models provide a means of modeling
these deviations from the usual linear mixed model. This paper will examine what constitutes a
generalized linear mixed model, issues involved in constructing a generalized linear mixed model,
and the modifications necessary to convert a linear mixed model program into a generalized linear
mixed model program.
1. Introduction
Generalized linear mixed models (GLMM) [1, 2, 3, 6] have attracted considerable attention over
the years. With the advent of SAS’s GLIMMIX macro [5] generalized linear mixed models have
become available to a larger audience. However, in a typical breeding evaluation generic packages
are too inefficient and implementations in FORTRAN or C are needed. In addition, GLMMs pose
additional challenges, with some solutions heading for ±∞.
The objective of this paper is to provide an introduction to generalized linear mixed models.
In section 2, I will discuss some of the deficiencies of a linear model. In section 3, I will present
the generalized linear mixed model. In section 4, I will present the estimation equations for the
fixed and random effects. In section 5, I will present a set of estimating equations for the variance
components. In section 6, I will discuss some of the computational issues involved when these
approaches are used in practice.
2. Mixed models
In this section I will discuss the linear mixed model and when the assumptions it makes are not
appropriate. A linear mixed model is
y|u ∼ N(Xβ + Zu, R)
where u ∼ N(0, G), X and Z are known design matrices, and the covariance matrices R and G
may depend on a set of unknown variance components. The linear mixed model assumes that the
responses can be modeled as a linear function, the variance is not a function of the mean response,
and that the random effects follow a normal distribution. Any or all of these assumptions may be
violated for certain traits.
A case where the assumption of linear responses is questionable is pregnancy rate. Pregnancy
is a zero/one trait, that is at a given point an animal is either pregnant (1) or is not pregnant
(0). A change in the management system which is expected to increase pregnancy rate by .1 in a
herd with a current pregnancy rate of .5 would be expected to have a smaller effect in a herd with
a current pregnancy rate of .8. That is, a treatment, an environmental factor, or a sire would be
expected to have a larger effect when the pregnancy rate is .5 than when it is .8.
Another case where the assumption of linear responses is questionable is growth. Cattle, pigs,
sheep, and mice have similar growth curves over time: after a period of rapid growth they reach
maturity and the growth rate slows considerably. The effect of time on growth is therefore not
linear, with time having a much larger effect when the animal is young and a very small effect
when the animal is mature.
The assumption of constant variance is also questionable for pregnancy. When the predicted
pregnancy rate µ for a cow is .5 the variance is µ(1 − µ) = .25. If on the other hand the predicted
pregnancy rate for a cow is .8 the variance drops to .16. For some production traits the variance
increases as mean level of production increases.
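The two variances computed above can be checked directly; a trivial sketch in Python (the function name is my own):

```python
def binom_var(mu):
    # variance of a 0/1 trait with mean mu
    return mu * (1.0 - mu)

print(binom_var(0.5))  # 0.25
print(binom_var(0.8))  # ~0.16
```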
The assumption of normality is also questionable for pregnancy. It is hard to justify the
assumption that the density function of a random variable which only takes on two values is close
to a continuous bell-shaped curve from −∞ to +∞.
A number of approaches have been taken to address the deficiencies of a linear mixed model.
Transformations have been used to stabilize the variance, obtain a linear response, and normalize
the distribution. However, the transformation needed to stabilize the variance may not be the same
transformation needed to obtain a linear response. For example, a log transformation to stabilize
the variance has the side effect that the model on the original scale is multiplicative. Linear and
multiplicative adjustments are used to adjust to a common base and to account for heterogeneous
variances. Multiple trait analysis can be used to account for heterogeneity of responses in different
environments. Separate variances can be estimated for different environmental groups, where the
environmental groups are based on the observed production. A final option is to ignore the
deficiencies of the linear mixed model and proceed as if a linear model does hold.
The above options have the appeal that they are relatively simple and cheap to implement. Given
the robustness of the estimation procedures, they can be expected to produce reasonable results.
However, these options sidestep the issue that the linear mixed model is incorrect. Specifically we
have a set of estimation procedures which are based on a linear mixed model and manipulate the
data to make it fit a linear mixed model. It seems more reasonable to start with an appropriate
model for the data and use an estimation procedure derived from that model. A generalized linear
mixed model gives us the extra flexibility needed to develop an appropriate model for the data.
3. A Generalized Linear Mixed Model
In this section I will present a formulation of the generalized linear mixed model. It differs from
presentations such as [1] in that it focuses more on the inverse link function instead of the link
function to model the relationship between the linear predictor and the conditional mean. It also
includes the nonlinear mixed models of [4].
Figure 1 provides a symbolic representation of a generalized linear mixed model. As in a linear
mixed model, a generalized linear mixed model includes fixed effects β, random effects u ∼ N(0, G),
design matrices X and Z, and a vector of observations y whose conditional distribution given the
random effects has mean µ and covariance R. In addition, a generalized linear mixed model includes
a linear predictor η and a link and/or inverse link function: the conditional mean µ depends on
the linear predictor through an inverse link function h(·), and the covariance matrix R depends on
µ through a variance function.
3.1. Linear Predictor η. As in a linear mixed model, the fixed and random effects are combined
to form a linear predictor
η = Xβ + Zu.
In a linear mixed model the vector of observations y is obtained by adding a residual term
e ∼ N(0, R) as follows
y = η + e = Xβ + Zu + e.
Equivalently, the residual variability can be modeled as
y|u ∼ N(η, R).
[Figure 1 is a flow diagram: the variance components σ determine the random-effects covariance G;
the fixed effects β and the random effects u ∼ N(0, G) combine to form the linear predictor
η = Xβ + Zu; the inverse link h(η) maps the linear predictor to the mean µ; and noise with
covariance R produces the observations y.]

Figure 1. Symbolic representation of a generalized linear mixed model.
Except when y has a Normal distribution, the formulation using e is clumsy. Therefore, a general-
ized linear mixed model uses the second approach to model the residual variability. The relationship
between the linear predictor and the vector of observations in a generalized linear mixed model is
modeled as
y|u ∼ (h(η), R)
where the notation y ∼ (µ, Σ) says that the conditional distribution of y given u has mean µ
and covariance Σ. The conditional distribution of y given u will be referred to as the error
distribution. Selection of which fixed and random effects to include in the model will follow the same
considerations you would use in a linear mixed model.
It is important to note that the effect of the linear predictor is expressed through an inverse
link function. Except for an inverse link of the form h(η) = η + c, the effect of a one-unit change in
ηi will not correspond to a one-unit change in the conditional mean. That is, a predicted progeny
difference will depend on a progeny’s environment. This will be considered in more detail in the
next section.
3.2. (Inverse) Link Function. The inverse link function is used to map the value of the linear
predictor for observation i to the conditional mean for observation i. For many traits the inverse
link function is one to one, that is both µi and ηi are scalars. For threshold models, µi is a t × 1
vector, where t is the number of ordinal levels. For growth curve models, µi is an ni × 1 vector and
ηi is a p × 1 vector, where the animal is measured ni times and there are p growth curve parameters.

    Distribution   Link                  Inverse Link
    Normal         identity              identity
    Binomial/n     logit: ln[µ/(1 − µ)]  e^η/(1 + e^η)
                   probit: Φ⁻¹(µ)        Φ(η)
    Poisson        ln(µ)                 e^η
    Gamma          1/µ                   1/η
                   ln(µ)                 e^η

    Table 1. Common link functions for various distributions

    Distribution   v(µ)
    Normal         1
    Binomial/n     µ(1 − µ)/n
    Poisson        µ
    Gamma          µ²

    Table 2. Common variance functions for various distributions
For the linear mixed model, the inverse link function is the identity function h(ηi) = ηi. For
zero/one traits a logit link function ηi = ln(µi/[1 − µi]) is often used; the corresponding inverse
link function is µi = e^ηi/(1 + e^ηi). The logit link function, unlike the identity link function, will
always yield estimated means in the range of zero to one. However, the effect of a one-unit change
in the linear predictor is not constant. When the linear predictor is 0 the corresponding mean is .5;
increasing the linear predictor by 1 increases the mean by .23. When the linear predictor is 3 the
corresponding mean is .95; increasing the linear predictor by 1 increases the mean by only .03. For
most univariate link functions, the link and inverse link functions are monotonically increasing; in
other words, an increase in the linear predictor results in an increase in the conditional mean.
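These numbers can be reproduced in a few lines of Python (an illustrative sketch; h is my name for the inverse logit):

```python
import math

def h(eta):
    # inverse logit link: maps a linear predictor to a conditional mean in (0, 1)
    return math.exp(eta) / (1.0 + math.exp(eta))

print(h(0.0))           # 0.5
print(h(1.0) - h(0.0))  # about 0.23
print(h(4.0) - h(3.0))  # about 0.03
```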
Selection of an inverse link function is typically based on the error distribution. Table 1 lists
a number of common distributions along with their link functions. Other considerations include
simplicity and interpretability.
3.3. Variance Function. The variance function is used to model the residual variability. Typically
in a generalized linear model residual variability arises from two sources. First, variability arises
from the sampling distribution. For example, a Poisson random variable with mean µ has a variance
of µ. Second, additional variability, or over-dispersion, is often observed.
Modeling the variability due to the sampling distribution is straightforward. Table 2 gives the
variance functions for some common sampling distributions.
Variability due to over-dispersion is typically modeled in one of several ways. One approach is to
scale the residual variability as var(yi|u) = φv(µi), where φ is an over-dispersion parameter. A
second approach is to add an additional random effect ei ∼ N(0, φ) to the linear predictor for each
observation. A third approach is to select another distribution; for example, instead of a one-
parameter (µ) Poisson distribution for count data, a two-parameter (µ, φ) Negative Binomial could
be used. All three approaches involve the estimation of an additional parameter φ. Scaling the
residual variability is the simplest, but can yield poor results. Adding a random effect greatly
increases the computational costs.
3.4. The parts. To summarize, a generalized linear mixed model is composed of three parts. First,
a linear predictor, η = Xβ + Zu, is used to model the relationship between the fixed and random
effects; the residual variability contained in the "+e" of the linear mixed model is instead
incorporated in the variance function. Second, an inverse link function, µi = h(ηi), is used to model
the relationship between the linear predictor and the conditional mean of the observed trait. In
general, the link function is selected to be both simple and reasonable. Third, a variance function,
v(µi, φ), is used to model the residual variability. Its selection is typically dictated by the error
distribution, and because the observed residual variability is typically greater than expected from
sampling alone, over-dispersion needs to be accounted for.
4. Estimation & Prediction
The estimating equations for a generalized linear mixed model can be derived in a number of
ways. From a Bayesian perspective the solution to the estimating equations are posterior mode
predictors. The estimating equations can be obtained by using a Laplacian approximation of the
likelihood. The estimating equations for the fixed and random effects are

    [ X′H′R⁻¹HX   X′H′R⁻¹HZ       ] [ β ]   [ X′H′R⁻¹y* ]
    [ Z′H′R⁻¹HX   Z′H′R⁻¹HZ + G⁻¹ ] [ u ] = [ Z′H′R⁻¹y* ]        (1)

where

    H = ∂µ/∂η′,    R = var(y|u),    y* = y − µ + Hη.
The first thing to note is the similarity to the usual mixed model equations. This is easier to see if
we rewrite the equations in the following form
    [ X′WX   X′WZ       ] [ β ]   [ X′hry ]
    [ Z′WX   Z′WZ + G⁻¹ ] [ u ] = [ Z′hry ]        (2)

where

    W = H′R⁻¹H    and    hry = H′R⁻¹y*.
Unlike the mixed model equations, the estimating equations (1) for a generalized linear mixed
model must be solved iteratively.
4.1. Univariate Logit. To see how all the pieces fit together, we will examine a univariate logit
model, which would be appropriate for proportion data. Let yi be the proportion out of n; a
reasonable error distribution for n yi is Binomial with parameters n and µi. The variance function
for a scaled Binomial random variable is v(µi) = µi(1 − µi)/n, and the inverse link function is
µi = e^ηi/(1 + e^ηi). The weights and standardized dependent variables for observation i are

    ∂µi/∂ηi = µi(1 − µi)
    Wii = n µi(1 − µi)
    hryi = n[yi − µi + µi(1 − µi)ηi].
Translated into a FORTRAN subroutine yields
6 S. D. KACHMAN
SUBROUTINE LINK(Y,WT,ETA,MU,W,HRY)
C     Logit link with binomial variance: Y = y_i, WT = n, ETA = eta_i
REAL*8 Y,WT,ETA,MU,W,VAR,H
mu=exp(eta)/(1.+exp(eta))
h=mu*(1.-mu)
var=h/wt
W=(H/VAR)*H
HRY=(H/VAR)*(Y-MU)+W*ETA
RETURN
END
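For comparison, the same computation sketched in Python (an illustrative translation, not part of the original program; the function returns its outputs rather than writing to subroutine arguments):

```python
import math

def link(y, wt, eta):
    # logit inverse link with binomial variance function
    # y: observed proportion, wt: binomial n, eta: linear predictor
    mu = math.exp(eta) / (1.0 + math.exp(eta))
    h = mu * (1.0 - mu)            # d mu / d eta
    var = h / wt                   # v(mu) = mu(1 - mu)/n
    w = (h / var) * h              # W_ii = n mu(1 - mu)
    hry = (h / var) * (y - mu) + w * eta
    return mu, w, hry
```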
In the above subroutine Y= yi, WT= n, and ETA= ηi are passed to the subroutine. The subroutine
returns W=Wii, MU= µi, and HRY= hryi. In the subroutine the lines
mu=exp(eta)/(1.+exp(eta))
h=mu*(1.-mu)
would need to be changed if a different link function was selected. The line
var=h/wt
would need to be changed if a different variance function was selected. The changes to the LINK
subroutine to take into account boundary conditions will be discussed in section 6.
Each round of solving the estimating equations (2), a new set of linear predictors needs to be
calculated. This can be accomplished during the construction of the LHS and RHS, provided the
previous solutions are not destroyed. The FORTRAN pseudocode for this is
DO I=1,N
Read in record
ETA=0
DO J=1,NEFF
ETA=ETA+X*SOL
END DO
CALL LINK(Y,WT,ETA,MU,W,HRY)
Build LHS and RHS
END DO
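The loop above can be illustrated end to end with a deliberately tiny example in Python: an intercept-only logit model with fixed effects only, so the coefficient matrix is a scalar. The function name and data are invented for illustration.

```python
import math

def irls_logit_intercept(y, n, rounds=25):
    # each round rebuilds W and hry at the current linear predictor
    # and re-solves the (here scalar) estimating equations (2)
    beta = 0.0
    for _ in range(rounds):
        lhs = rhs = 0.0
        for yi, ni in zip(y, n):
            mu = math.exp(beta) / (1.0 + math.exp(beta))
            h = mu * (1.0 - mu)
            lhs += ni * h                     # X'WX  (X is a column of ones)
            rhs += ni * (yi - mu + h * beta)  # X'hry
        beta = rhs / lhs
    return beta

beta = irls_logit_intercept([0.6, 0.7], [10, 10])
```

At convergence β equals the logit of the pooled proportion (here ln(.65/.35)), which illustrates why the equations must be solved iteratively: W and hry depend on the current solution.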
5. Variance Component Estimation
The code given above assumes that the variance components are known. In this section I will
discuss how estimates of the variance components can be obtained. Conceptually the variance
component problem can be broken into two parts. The first part is the estimation of the variance
components associated with the random effects. The second part is the estimation of the variance
components associated with the error distribution.
Estimation of the variance components associated with the random effects does not require any
additional changes except those associated with (2).
Estimation of the variance components associated with the error distribution is more problematic.
In the case where the over-dispersion parameter φ simply scales the variance, then estimation of
φ is identical to the estimation of the residual variance when weights are used with data following
a Normal distribution. In the general case, estimation of the over-dispersion parameter proceeds
as follows. First, ei = yi − µi is obtained. Second, the prediction error variance for the residual is
obtained as
    PEV(ei) = hi [ x′i  z′i ] LHS⁻ [ xi ] hi
                                   [ zi ]

Third, with

    Qi = v(µi, φ)⁻¹ [∂v(µi, φ)/∂φ] v(µi, φ)⁻¹,

the equation

    Σi ei Qi ei = Σi tr( Qi { v(µi, φ) − PEV(ei) } )

is solved for φ. The major difficulty is that this is computationally expensive.
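For the simple scaled case var(yi|u) = φ v(µi), a cheap moment estimator based on Pearson residuals is often used. This is a standard quasi-likelihood device rather than the equation above, and the sketch below ignores the PEV correction; names are my own:

```python
def pearson_phi(y, mu, v, n_params):
    # phi_hat = (sum of squared Pearson residuals) / (residual degrees of freedom)
    chi2 = sum((yi - mi) ** 2 / v(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Poisson variance function v(mu) = mu; phi near 1 suggests no over-dispersion
phi = pearson_phi([1.0, 3.0], [2.0, 2.0], lambda m: m, 1)
```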
6. Some Computational Issues
While the mathematics are “straightforward,” the implementation in practice is often challenging.
Many of the difficulties associated with generalized linear mixed models are related to either
estimates going to infinity or divide-by-zero problems. Consider the univariate logit: the
calculations involve 1/[µi(1 − µi)]. Provided 0 < µi < 1 this quantity is well defined, but as µi
approaches either zero or one, 1/[µi(1 − µi)] approaches infinity. Estimates of µ of zero or one
occur when a contemporary group consists entirely of zeros or ones.
Several options exist for handling this situation. One approach is to remove any troublesome
contemporary groups from the data set; while mathematically sound, this requires an additional
edit that removes legitimate data values. A second approach is to treat contemporary group effects
as random effects; however, the decision to treat a set of effects as fixed or random should be
driven from a modelling standpoint and not be an artifact of the estimation procedure. A third
approach is to adjust yi away from 0 and 1; for example, one could use (yi + ∆)/(1 + 2∆). A fourth
approach is to look at what happens to the quantities Wii and hryi when ηi approaches infinity:
    ηi → ±∞  ⇒  Wii → 0  and  hryi → n(yi − µi).
In the limit Wii and hryi are both well defined. One could then recode the link function as
SUBROUTINE LINK(Y,WT,ETA,MU,W,HRY)
C     Guarded version: in the limit |eta| -> infinity, W = 0 and HRY = n(y - mu)
REAL*8 Y,WT,ETA,MU,W,VAR,H
if(abs(eta) > BIG) then
if(eta > 0) then
mu=1.
else
mu=0.
end if
w=0.
hry=wt*(y-mu)
return
end if
mu=exp(eta)/(1.+exp(eta))
h=mu*(1.-mu)
var=h/wt
W=(H/VAR)*H
HRY=(H/VAR)*(Y-MU)+W*ETA
RETURN
END
where BIG is a sufficiently large, but not too large, number; for example, BIG = 10 might be a
reasonable choice. However, using Wii = 0 will usually result in a zero diagonal in the estimating
equations. When these equations are solved, the fixed effect which was approaching infinity will
be set to zero, which can result in some interesting convergence problems. A solution would be to
“fix” the offensive fixed effect by making the following changes:
SUBROUTINE LINK(Y,WT,ETA,MU,W,HRY)
C     Clamp eta to [-BIG, BIG] so that mu stays strictly inside (0, 1)
REAL*8 Y,WT,ETA,MU,W,VAR,H
if(abs(eta) > BIG) then
if(eta > 0) then
eta=BIG
else
eta=-BIG
end if
end if
mu=exp(eta)/(1.+exp(eta))
h=mu*(1.-mu)
var=h/wt
W=(H/VAR)*H
HRY=(H/VAR)*(Y-MU)+W*ETA
RETURN
END
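The clamping fix in Python form (illustrative only; big corresponds to the constant BIG discussed above):

```python
import math

def link_clamped(y, wt, eta, big=10.0):
    # clamp the linear predictor so mu stays strictly inside (0, 1)
    # and the diagonal weight W stays nonzero
    eta = max(-big, min(big, eta))
    mu = math.exp(eta) / (1.0 + math.exp(eta))
    h = mu * (1.0 - mu)
    var = h / wt
    w = (h / var) * h
    hry = (h / var) * (y - mu) + w * eta
    return mu, w, hry
```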
6.1. Additional Parameters. Models such as threshold models with more than two classes and
growth curve models require more than one linear predictor per observation. In the case of a
threshold model the value of an observation is determined by the usual linear predictor and a
threshold linear predictor. In growth curve models animal i has ni observations. The linear
predictor for animal i is a p × 1 vector, where p is the number of growth curve parameters.
Unlike the univariate case, the matrices H, W, and R may not be diagonal. Typically, the
structure of these matrices is

    R = Diag(Ri),    Ri an ni × ni residual covariance matrix,

    H = Diag(Hi),    Hi an ni × p matrix of partial derivatives,

    Hi = [ ∂µi1/∂ηi1   ...   ∂µi1/∂ηip  ]
         [     ...     ...      ...     ]
         [ ∂µini/∂ηi1  ...   ∂µini/∂ηip ]

    W = Diag(Wi),    Wi = Hi′ Ri⁻¹ Hi.
7. Conclusions
Generalized linear mixed models provide a very flexible means of modeling production traits.
This flexibility allows the researcher to focus more on selecting an appropriate model as opposed
to manipulations to make the data fit a restricted class of models. The flexibility of a generalized
linear mixed model provides an extra challenge in selecting an appropriate model. As a general rule
the aim should be to select as simple a model as possible which does a reasonable job of modeling
the data.
While generalized linear mixed model programs are not as readily available as linear mixed
model programs, the modifications to a linear mixed model program should be minor. The two
changes needed are the addition of a LINK subroutine and the iterative solution of the estimating
equations. In constructing a link subroutine it is important to handle boundary conditions
gracefully.
When selecting a link function it is important to remember that you will need to both interpret
the results of the analysis and present them in a meaningful manner. Generalized linear mixed
models provide us with a very powerful tool; however, just because it is fancy and expensive does
not imply that it is better.
References
[1] N. E. Breslow and D. G. Clayton. Approximate inference in generalized linear mixed models. J. Amer. Statist.
Assoc., 88:9–25, 1993.
[2] B. Engel and A. Keen. A simple approach for the analysis of generalized linear mixed models. Statist. Neerlandica,
48(1):1–22, 1994.
[3] D. Gianola and J. L. Foulley. Sire evaluation for ordered categorical data with a threshold model. Génét. Sél.
Évol., 15(2):210–224, 1983.
[4] M. J. Lindstrom and D. M. Bates. Nonlinear mixed effects models for repeated measures data. Biometrics, 46:673–
687, 1990.
[5] Ramon C. Littell, George A. Milliken, Walter W. Stroup, and Russell D. Wolfinger. SAS System for Mixed Models.
SAS Institute Inc., Cary, NC, 1996.
[6] I. Misztal, D. Gianola, and J. L. Foulley. Computing aspects of a nonlinear method of sire evaluation for categorical
data. J. Dairy Sci., 72:1557–1568, 1989.
Department of Biometry, University of Nebraska–Lincoln
E-mail address: skachman@unl.edu
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 

responses can be modeled as a linear function, that the variance is not a function of the mean response, and that the random effects follow a normal distribution. Any or all of these assumptions may be violated for certain traits.

A case where the assumption of a linear response is questionable is pregnancy rate. Pregnancy is a zero/one trait: at a given point in time an animal is either pregnant (1) or not pregnant (0). A change in the management system which is expected to increase pregnancy rate by .1 in a herd with a current pregnancy rate of .5 would be expected to have a smaller effect in a herd with a current pregnancy rate of .8. That is, a treatment, an environmental factor, or a sire would be expected to have a larger effect when the pregnancy rate is .5 than when it is .8.

Another case where the assumption of a linear response is questionable is growth. Cattle, pigs, sheep, and mice have similar growth curves over time: after a period of rapid growth they reach maturity and the growth rate becomes considerably slower. The effect of time on growth is therefore not linear, with time having a much larger effect when the animal is young and a very small effect when the animal is mature.

The assumption of constant variance is also questionable for pregnancy. When the predicted pregnancy rate µ for a cow is .5 the variance is µ(1 − µ) = .25. If on the other hand the predicted pregnancy rate is .8, the variance drops to .16. For some production traits the variance increases as the mean level of production increases.

The assumption of normality is also questionable for pregnancy. It is hard to justify the assumption that the density function of a random variable which takes on only two values is close to a continuous bell-shaped curve from −∞ to +∞.

A number of approaches have been taken to address the deficiencies of a linear mixed model. Transformations have been used to stabilize the variance, obtain a linear response, and normalize the distribution. However, the transformation needed to stabilize the variance may not be the same transformation needed to obtain a linear response. For example, a log transformation to stabilize the variance has the side effect that the model on the original scale is multiplicative. Linear and multiplicative adjustments are used to adjust to a common base and to account for heterogeneous variances. Multiple-trait analysis can be used to account for heterogeneity of responses in different environments. Separate variances can be estimated for different environmental groups, where the environmental groups are based on the observed production. A final option is to ignore the deficiencies of the linear mixed model and proceed as if a linear model does hold.

The above options have the appeal that they are relatively simple and cheap to implement. Given the robustness of the estimation procedures, they can be expected to produce reasonable results. However, these options sidestep the issue that the linear mixed model is incorrect.
Specifically, we have a set of estimation procedures which are based on a linear mixed model, and we manipulate the data to make it fit a linear mixed model. It seems more reasonable to start with an appropriate model for the data and use an estimation procedure derived from that model. A generalized linear mixed model is a model which gives us the extra flexibility needed to develop an appropriate model for the data.

3. A Generalized Linear Mixed Model

In this section I will present a formulation of the generalized linear mixed model. It differs from presentations such as [1] in that it focuses more on the inverse link function than on the link function to model the relationship between the linear predictor and the conditional mean. It also includes the nonlinear mixed models of [4].

Figure 1 provides a symbolic representation of a generalized linear mixed model. As in a linear mixed model, a generalized linear mixed model includes fixed effects β, random effects u ∼ N(0, G), design matrices X and Z, and a vector of observations y whose conditional distribution given the random effects has mean µ and covariance R. In addition, a generalized linear mixed model includes a linear predictor η and a link and/or inverse link function. The conditional mean µ depends on the linear predictor through an inverse link function h(·), and the covariance matrix R depends on µ through a variance function.

3.1. Linear Predictor η. As in a linear mixed model, the fixed and random effects are combined to form a linear predictor

    η = Xβ + Zu.

In a linear mixed model the vector of observations y is obtained by adding a residual term e ∼ N(0, R) as follows:

    y = η + e = Xβ + Zu + e.

Equivalently, the residual variability can be modeled as

    y|u ∼ N(η, R).
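For the Normal case, the equivalence of the additive and conditional formulations can be checked by simulation. The following Python sketch (with made-up scalar variances and a single fixed effect, not from the paper) draws records both ways; either construction gives marginal mean β and marginal variance G + R:

```python
import random

random.seed(1)
G, R_VAR = 1.0, 0.5        # hypothetical variance components
BETA = 2.0                 # a single fixed effect, with X = Z = 1

def draw_additive():
    """y = Xb + Zu + e with u ~ N(0, G) and e ~ N(0, R)."""
    u = random.gauss(0.0, G ** 0.5)
    e = random.gauss(0.0, R_VAR ** 0.5)
    return BETA + u + e

def draw_conditional():
    """Draw u first, then y | u ~ N(eta, R) with eta = Xb + Zu."""
    u = random.gauss(0.0, G ** 0.5)
    eta = BETA + u
    return random.gauss(eta, R_VAR ** 0.5)

# Check the marginal moments of the conditional construction.
ys = [draw_conditional() for _ in range(20000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
```

The sample mean should be close to BETA = 2 and the sample variance close to G + R = 1.5.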
[Figure 1. Symbolic representation of a generalized linear mixed model: variance components σ determine G; fixed effects β and random effects u ∼ N(0, G) form the linear predictor η = Xβ + Zu; the inverse link gives the mean µ = h(η); noise with covariance R yields the observations y.]

Except when y has a Normal distribution, the formulation using e is clumsy. Therefore, a generalized linear mixed model uses the second approach to model the residual variability. The relationship between the linear predictor and the vector of observations in a generalized linear mixed model is modeled as

    y|u ∼ (h(η), R)

where the notation y ∼ (µ, Σ) says that the distribution of y has mean µ and variance Σ. The conditional distribution of y given u will be referred to as the error distribution. Selection of which fixed and random effects to include in the model follows the same considerations you would use in a linear mixed model.

It is important to note that the effect of the linear predictor is expressed through an inverse link function. Except for the inverse link function h(η) = η + c, the effect of a one-unit change in ηi will not correspond to a one-unit change in the conditional mean. That is, a predicted progeny difference will depend on a progeny's environment. This will be considered in more detail in the next section.

3.2. (Inverse) Link Function. The inverse link function is used to map the value of the linear predictor for observation i to the conditional mean for observation i. For many traits the inverse link function is one to one, that is, both µi and ηi are scalars. For threshold models, µi is a t × 1 vector, where t is the number of ordinal levels. For growth curve models, µi is an ni × 1 vector and ηi is a p × 1 vector, where the animal is measured ni times and there are p growth curve parameters.

    Distribution    Link                   Inverse Link
    Normal          identity               identity
    Binomial/n      logit: ln[µ/(1 − µ)]   eη/(1 + eη)
                    probit: Φ⁻¹(µ)         Φ(η)
    Poisson         ln(µ)                  eη
    Gamma           1/µ                    1/η
                    ln(µ)                  eη

    Table 1. Common link functions for various distributions

    Distribution    v(µ)
    Normal          1
    Binomial/n      µ(1 − µ)/n
    Poisson         µ
    Gamma           µ²

    Table 2. Common variance functions for various distributions

For the linear mixed model, the inverse link function is the identity function h(ηi) = ηi. For zero/one traits a logit link function ηi = ln(µi/[1 − µi]) is often used; the corresponding inverse link function is µi = eηi/(1 + eηi). The logit link function, unlike the identity link function, will always yield estimated means in the range of zero to one. However, the effect of a one-unit change in the linear predictor is not constant. When the linear predictor is 0 the corresponding mean is .5; increasing the linear predictor by 1 increases the corresponding mean by .23. If the linear predictor is 3 the corresponding mean is .95; increasing the linear predictor by 1 increases the corresponding mean by only .03.

For most univariate traits, the link and inverse link functions are monotonically increasing. In other words, an increase in the linear predictor results in an increase in the conditional mean. Selection of an inverse link function is typically based on the error distribution. Table 1 lists a number of common distributions along with their link functions. Other considerations include simplicity and interpretability.

3.3. Variance Function. The variance function is used to model the residual variability. Typically in a generalized linear model residual variability arises from two sources. First, variability arises from the sampling distribution.
For example, a Poisson random variable with mean µ has a variance of µ. Second, additional variability, or over-dispersion, is often observed.

Modeling the variability due to the sampling distribution is straightforward. In Table 2 the variance functions for some common sampling distributions are given. Variability due to over-dispersion is typically modeled in one of a number of ways. One approach is to scale the residual variability as var(yi|u) = φv(µi), where φ is an over-dispersion parameter. A second approach is to add an additional random effect ei ∼ N(0, φ) to the linear predictor for each observation. A third approach is to select another distribution. For example, instead of using a one-parameter (µ) Poisson distribution for count data, a two-parameter (µ, φ) Negative Binomial could be used. The three approaches all involve the estimation of an additional parameter φ. Scaling the residual variability is the simplest, but can yield poor results. The addition of a random effect greatly increases the computational costs.
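The inverse link and variance function for the logit/Binomial case can be written in a few lines. The following Python sketch (an illustration, not code from the paper) reproduces the diminishing one-unit effects noted in section 3.2 and the nonconstant variance of section 2:

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps the linear predictor into (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def binomial_variance(mu, n):
    """Variance function v(mu) = mu(1 - mu)/n from Table 2."""
    return mu * (1.0 - mu) / n

# A one-unit change in eta has a diminishing effect on the mean...
print(round(inv_logit(1.0) - inv_logit(0.0), 2))   # 0.23
print(round(inv_logit(4.0) - inv_logit(3.0), 2))   # 0.03

# ...and the variance depends on the mean: largest at mu = .5.
print(binomial_variance(0.5, 1))                   # 0.25
print(round(binomial_variance(0.8, 1), 2))         # 0.16
```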
3.4. The parts. To summarize, a generalized linear mixed model is composed of three parts. First, a linear predictor, η = Xβ + Zu, is used to model the relationship between the fixed and random effects. The residual variability contained in the +e of the linear mixed model is incorporated in the variance function of the generalized linear mixed model. Second, an inverse link function, µi = h(ηi), is used to model the relationship between the linear predictor and the conditional mean of the observed trait. In general, the link function is selected to be both simple and reasonable. Third, a variance function, v(µi, φ), is used to model the residual variability. Selection of the variance function is typically dictated by the error distribution that was used. In addition, the observed residual variability is typically greater than expected due to sampling alone and needs to be accounted for.

4. Estimation & Prediction

The estimating equations for a generalized linear mixed model can be derived in a number of ways. From a Bayesian perspective, the solutions to the estimating equations are posterior mode predictors. The estimating equations can also be obtained by using a Laplacian approximation of the likelihood. The estimating equations for the fixed and random effects are

    [ X′H′R⁻¹HX    X′H′R⁻¹HZ        ] [β̂]   [ X′H′R⁻¹y* ]
    [ Z′H′R⁻¹HX    Z′H′R⁻¹HZ + G⁻¹  ] [û] = [ Z′H′R⁻¹y* ]       (1)

where

    H = ∂µ/∂η′,    R = var(y|u),    y* = y − µ + Hη.

The first thing to note is the similarity to the usual mixed model equations. This is easier to see if we rewrite the equations in the following form:

    [ X′WX    X′WZ        ] [β̂]   [ X′hry ]
    [ Z′WX    Z′WZ + G⁻¹  ] [û] = [ Z′hry ]       (2)

where

    W = H′R⁻¹H    and    hry = H′R⁻¹y*.

Unlike the mixed model equations, the estimating equations (1) for a generalized linear mixed model must be solved iteratively.

4.1. Univariate Logit. To see how all the pieces fit together, we will examine a univariate logit model, which would be appropriate for examining proportion data.
Let yi be a proportion out of n. A reasonable error distribution for nyi is then Binomial with parameters n and µi. The variance function for a scaled Binomial random variable is v(µi) = µi(1 − µi)/n. The inverse link function is µi = eηi/(1 + eηi). The weights and standardized dependent variables for observation i are

    ∂µi/∂ηi = µi(1 − µi)
    Wii = n µi(1 − µi)
    hryi = n [yi − µi + µi(1 − µi)ηi].

Translated into a FORTRAN subroutine this yields
          SUBROUTINE LINK(Y,WT,ETA,MU,W,HRY)
          REAL*8 Y,WT,ETA,MU,W,HRY,VAR,H
          mu=exp(eta)/(1.+exp(eta))
          h=mu*(1.-mu)
          var=h/wt
          W=(H/VAR)*H
          HRY=(H/VAR)*(Y-MU)+W*ETA
          RETURN
          END

In the above subroutine Y = yi, WT = n, and ETA = ηi are passed to the subroutine. The subroutine returns W = Wii, MU = µi, and HRY = hryi. In the subroutine, the lines

          mu=exp(eta)/(1.+exp(eta))
          h=mu*(1.-mu)

would need to be changed if a different link function were selected. The line

          var=h/wt

would need to be changed if a different variance function were selected. The changes to the LINK subroutine needed to take boundary conditions into account will be discussed in section 6.

In each round of solving the estimating equations (2), a new set of linear predictors needs to be calculated. This can be accomplished during the construction of the LHS and RHS, assuming solutions are not destroyed. The FORTRAN code for this is

          DO I=1,N
             Read in record
             ETA=0
             DO J=1,NEFF
                ETA=ETA+X*SOL
             END DO
             CALL LINK(Y,WT,ETA,MU,W,HRY)
             Build LHS and RHS
          END DO

5. Variance Component Estimation

The code given above assumes that the variance components are known. In this section I will discuss how estimates of the variance components can be obtained. Conceptually, the variance component problem can be broken into two parts: the estimation of the variance components associated with the random effects, and the estimation of the variance components associated with the error distribution.

Estimation of the variance components associated with the random effects does not require any changes beyond those associated with (2). Estimation of the variance components associated with the error distribution is more problematic. In the case where the over-dispersion parameter φ simply scales the variance, estimation of φ is identical to the estimation of the residual variance when weights are used with data following a Normal distribution.
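In the scaled-variance case, a common moment estimator (a sketch of the standard approach, not code from the paper) divides the Pearson chi-square statistic by the residual degrees of freedom:

```python
def pearson_dispersion(y, mu, v, dof):
    """Estimate phi as sum of (y_i - mu_i)^2 / v(mu_i) over residual d.f."""
    chi2 = sum((yi - mi) ** 2 / v(mi) for yi, mi in zip(y, mu))
    return chi2 / dof

# Hypothetical proportions out of n = 10 trials, both with fitted mean .5;
# dof here is illustrative rather than n minus the number of parameters.
phi_hat = pearson_dispersion(
    y=[0.2, 0.8], mu=[0.5, 0.5],
    v=lambda m: m * (1.0 - m) / 10.0,   # Binomial/n variance function
    dof=2)
print(round(phi_hat, 1))  # 3.6, indicating strong over-dispersion
```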
In the general case, estimation of the over-dispersion parameter proceeds as follows. First, the residual ei = yi − µi is obtained. Second, the prediction error variance of the residual is obtained as

    PEV(ei) = hi (xi′ zi′) LHS⁻ (xi′ zi′)′ hi

where hi = ∂µi/∂ηi and LHS⁻ is a generalized inverse of the coefficient matrix of (2).
Third, with

    Qi = v(µi, φ)⁻¹ [∂v(µi, φ)/∂φ] v(µi, φ)⁻¹,

the equation

    Σi eiQiei = Σi tr( Qi { v(µi, φ) − PEV(ei) } )

is solved for φ. The major difficulty is that this is computationally expensive.

6. Some Computational Issues

While the mathematics are "straightforward," the implementation in practice is often challenging. Many of the difficulties associated with generalized linear mixed models are related to either estimates going to infinity or divide-by-zero problems. Consider the univariate logit. The calculations involve 1/[µi(1 − µi)]. Provided 0 < µi < 1 this quantity is well defined. If µi approaches either zero or one, then the quantity 1/[µi(1 − µi)] approaches infinity. Estimates of µ of zero or one occur when a contemporary group consists entirely of zeros or ones.

Several options exist for handling this situation. One approach is to remove from the data set any troublesome contemporary groups. While mathematically sound, it does involve an additional edit to remove legitimate data values. A second approach is to treat contemporary group effects as random effects. However, the decision to treat a set of effects as fixed or random should be driven from a modeling standpoint and not as an artifact of the estimation procedure. A third approach is to adjust yi away from 0 and 1; for example, one could use (yi + ∆)/(1 + 2∆). A fourth approach is to look at what happens to the quantities Wii and hryi when ηi approaches infinity:

    ηi → ±∞  ⇒  Wii → 0  and  hryi → n(yi − µi).

In the limit, Wii and hryi are both well defined. One could then recode the link function as

          SUBROUTINE LINK(Y,WT,ETA,MU,W,HRY)
          REAL*8 Y,WT,ETA,MU,W,HRY,VAR,H
          if(abs(eta) > BIG) then
             if(eta > 0) then
                mu=1.
             else
                mu=0.
             end if
             w=0.
             hry=wt*(y-mu)
             return
          end if
          mu=exp(eta)/(1.+exp(eta))
          h=mu*(1.-mu)
          var=h/wt
          W=(H/VAR)*H
          HRY=(H/VAR)*(Y-MU)+W*ETA
          RETURN
          END
  • 8. 8 S. D. KACHMAN where BIG is a sufficiently but not too large of a number. For example BIG= 10 might be a reasonable choice. However, using Wii = 0 will usually result in a zero diagonal in the estimating equations. When these equations are solved, the fixed effect which was approaching infinity will be set to zero. This can result in some interesting convergence problems. A solution, would be to “fix” the offensive fixed effect by making the following changes SUBROUTINE LINK(Y,WT,ETA,MU,W,HRY) REAL*8 Y,WT,ETA,MU,W,R,VAR,H if(abs(eta) > BIG) then if(eta > 0) then eta=BIG else eta=-BIG end if end if mu=exp(eta)/(1.+exp(eta)) h=mu*(1.-mu) var=h/wt W=(H/VAR)*H HRY=(H/VAR)*(Y-MU)+W*ETA RETURN END 6.1. Additional Parameters. Models such as threshold models with more than two classes and growth curve models require more than one linear predictor per observation. In the case of a threshold model the value of an observation is determined by the usual linear predictor and a threshold linear predictor. In growth curve models animal i has ni observations. The linear predictor for animal i is a p × 1 vector, where p is the the number of growth curve parameters. Unlike the univariate case the matrices H, W , and R may not be diagonal. Typically, the structure for these matrices are R = Diag(Ri) Ri is a ni × ni residual covariance matrix H = Diag(Hi) Hi is a ni × p matrix of partial derivatives Hi =      ∂µi1 ∂ηi1 . . . ∂µi1 ∂ηip ... ∂µini ∂ηi1 . . . ∂µini ∂ηip      W = Diag(W i) W i = HiR−1 i Hi 7. Conclusions Generalized linear mixed models provide a very flexible means of modeling production traits. This flexibility allows the researcher to focus more on selecting an appropriate model as opposed to manipulations to make the data fit a restricted class of models. The flexibility of a generalized
7. Conclusions

Generalized linear mixed models provide a very flexible means of modeling production traits. This flexibility allows the researcher to focus on selecting an appropriate model as opposed to manipulations that make the data fit a restricted class of models. The flexibility of a generalized linear mixed model also provides an extra challenge in selecting an appropriate model. As a general rule, the aim should be to select as simple a model as possible that does a reasonable job of modeling the data.

While generalized linear mixed model programs are not as readily available as linear mixed model programs, the modifications to a linear mixed model program should be minor. The two changes that are needed are the addition of a LINK subroutine and iteratively solving the estimating equations. In constructing a link subroutine it is important to handle boundary conditions gracefully. When selecting a link function it is important to remember that when you get the results from the analysis, you will need to both interpret your results and present them in a meaningful manner.

Generalized linear mixed models provide us with a very powerful tool. However, just because it is fancy and expensive does not imply that it is better.

References

[1] N. E. Breslow and D. G. Clayton. Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc., 88:9–25, 1993.
[2] B. Engel and A. Keen. A simple approach for the analysis of generalized linear mixed models. Statist. Neerlandica, 48(1):1–22, 1994.
[3] D. Gianola and J. L. Foulley. Sire evaluation for ordered categorical data with a threshold model. Génét. Sél. Évol., 15(2):210–224, 1983.
[4] M. J. Lindstrom and D. M. Bates. Nonlinear mixed effects models for repeated measures data. Biometrics, 46:673–687, 1990.
[5] Ramon C. Littell, George A. Milliken, Walter W. Stroup, and Russell D. Wolfinger. SAS System for Mixed Models. SAS Institute Inc., Cary, NC, 1996.
[6] I. Misztal, D. Gianola, and J. L. Foulley. Computing aspects of a nonlinear method of sire evaluation for categorical data. J. Dairy Sci., 72:1557–1568, 1989.

Department of Biometry, University of Nebraska–Lincoln
E-mail address: skachman@unl.edu