Generalized linear models, and extensions, in R

                                        Ben Bolker

           Departments of Mathematics & Statistics and Biology, McMaster University


                                     7 January 2011




Ben Bolker (McMaster University)           GLMs in R                      7 January 2011   1 / 25
1   Introduction


2   Example


3   Challenges, tricks, extensions


4   (Extended examples)




What are generalized linear models?




      Modeling framework to solve two common statistical problems:
             Non-normal data
             Non-linearity (continuous predictors)
     . . . superset of, and often confused with,
     “general” linear models (i.e. ANOVA/ANCOVA/regression:
     SAS PROC GLM)
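
A minimal sketch of the "superset" point, on simulated data (the names `x`, `y`, `fit_lm` and `fit_glm` are illustrative, not from the talk): with a Gaussian family and identity link, `glm()` should reproduce an ordinary `lm()` fit.

```r
set.seed(1)

# Simulated stand-in data: a straight line plus normal noise
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.3)

# "General" linear model via least squares
fit_lm <- lm(y ~ x)

# The same model expressed as a GLM: normal errors, identity link
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# The coefficient estimates agree up to numerical tolerance
all.equal(coef(fit_lm), coef(fit_glm))
```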




GLMs: technical details


      Constraints:
             Distributions from exponential family
             (Normal, Poisson, binomial, Gamma, inverse Gaussian)
             Invertible nonlinearities, i.e. there exists a link function that would
             make the relationship linear
             (log, logit, probit, inverse, square root, “cauchit” . . . )
      Efficient, stable algorithm: iteratively re-weighted least squares
      (IRLS / Fisher scoring)
      standard methods (methods(class="glm")):
      coef, summary, plot, predict, residuals, vcov, profile,
      update, confint, simulate, anova, add1/drop1, logLik, AIC, . . .
      logistic and Poisson regression probably make up 99% of GLMs . . .
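
To make the IRLS idea concrete, here is a hand-rolled sketch for logistic regression on simulated data. It is not the code `glm()` runs internally, only the textbook update (weights `w`, working response `z`, then a weighted least-squares solve); all names are illustrative.

```r
set.seed(1)

# Simulated stand-in data for a logistic regression
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)                      # design matrix with intercept

beta <- c(0, 0)                       # starting values
for (i in 1:25) {
  eta <- drop(X %*% beta)             # linear predictor
  mu  <- plogis(eta)                  # inverse logit link
  w   <- mu * (1 - mu)                # IRLS weights for the logit link
  z   <- eta + (y - mu) / w           # working response
  # Weighted least squares: solve (X' W X) beta = X' W z
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Compare with the built-in fit
fit <- glm(y ~ x, family = binomial)
cbind(irls = beta, glm = coef(fit))
```

The two columns should agree closely; `glm()` stops on a deviance-based criterion, so tiny differences in the last digits are expected.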



Google scholar scraping



      Google Scholar hits ("Ghits") by search term (plotted on a log scale, 10^4 to 10^6):
             logistic+regression          580000
             Poisson+regression            39300
             generalized+linear+model      28700
             binomial+regression           13500


Example: reed frog predation data

      Vonesh and Bolker (2005):

      > library(emdbook)
      > data(ReedfrogFuncresp)
      > glm1 <- glm(Killed/Initial~Initial,
                    weights=Initial,
                    family=binomial,
                    data=ReedfrogFuncresp)

      [Figure: Fraction killed (0 to 1) vs. Initial density (0 to 100)]
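
The proportion-plus-weights form used for this model is one of two common ways to specify a binomial GLM; the other is a two-column response, `cbind(successes, failures)`. The sketch below checks that they give the same fit. It uses a simulated stand-in data frame, `frogs`, with hypothetical densities and coefficients, not the ReedfrogFuncresp data.

```r
set.seed(2)

# Simulated stand-in for the reed frog data (values are hypothetical)
frogs <- data.frame(Initial = rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2))
frogs$Killed <- rbinom(nrow(frogs), size = frogs$Initial,
                       prob = plogis(-0.1 - 0.008 * frogs$Initial))

# Form 1: proportion as response, number of trials as weights
fit_prop <- glm(Killed / Initial ~ Initial, weights = Initial,
                family = binomial, data = frogs)

# Form 2: two-column (successes, failures) response
fit_cbind <- glm(cbind(Killed, Initial - Killed) ~ Initial,
                 family = binomial, data = frogs)

# Same likelihood, so the same estimates
all.equal(coef(fit_prop), coef(fit_cbind))
```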




Summary
> summary(glm1)
Call:
glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
    weights = Initial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.4132  -0.7275   0.4347   1.0120   1.8172

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.094563   0.188952   -0.50  0.61675
Initial     -0.008416   0.002697   -3.12  0.00181 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 47.518  on 15  degrees of freedom
Residual deviance: 37.717  on 14  degrees of freedom
AIC: 98.639

Number of Fisher Scoring iterations: 4
Diagnostics

      [Figure: the four standard plot() panels for glm1: Residuals vs Fitted,
      Normal Q−Q, Scale−Location, and Residuals vs Leverage (with Cook's
      distance contours); observations 11, 13 and 16 are labelled]

      diagnostics inherit from plot.lm
      overdispersion: residual deviance ≈ χ²(n−p)
      (Venables and Ripley, 2002, p. 209):
      sum(residuals(glm1,type="pearson")^2) = 34.3: p < 0.05
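
The overdispersion check can be reproduced from the quoted numbers alone. The sketch below takes the Pearson statistic (34.3) and the residual degrees of freedom (16 observations minus 2 parameters, matching the 14 reported by summary) and computes the upper-tail χ² probability and the ratio statistic. The helper `overdisp_test()` is a hypothetical convenience function, not part of any package.

```r
# Values quoted in the talk
pearson_chisq <- 34.3
df_resid <- 16 - 2

# Upper-tail probability under the chi-square reference distribution
p_value <- pchisq(pearson_chisq, df = df_resid, lower.tail = FALSE)

# Ratio of Pearson chi-square to residual df (the dispersion estimate)
phi_hat <- pearson_chisq / df_resid

# Hypothetical helper that does the same for any fitted glm object
overdisp_test <- function(model) {
  chisq <- sum(residuals(model, type = "pearson")^2)
  df <- df.residual(model)
  c(chisq = chisq, df = df, ratio = chisq / df,
    p = pchisq(chisq, df = df, lower.tail = FALSE))
}

c(p_value = p_value, phi_hat = phi_hat)
```

With the fitted model available, `overdisp_test(glm1)` would be the one-line equivalent.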




Inference


      Coefficients: may be hard to communicate (reflect differences on the
      scale of linear predictor, e.g. logit/log-odds differences)
      Wald statistics: beware the Hauck-Donner effect
      (Venables and Ripley, 2002, p. 198). Wald CI of slope:
      stats:::confint.lm(glm1) (-0.0142,-0.0026)
      Likelihood ratio test, via anova:
      > anova(glm1,test="Chisq") ## OR
      > glm0 <- update(glm1, . ~ . - Initial)
      > anova(glm1,glm0,test="Chisq")
      Likelihood profiles (via MASS::profile.glm),
      profile confidence intervals:
      MASS:::confint.glm(glm1) (-0.0137,-0.0031)
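
A sketch of the three inferential routes side by side, on a simulated stand-in data frame (`frogs`; the densities and coefficients are hypothetical, not the reed frog data). It uses `confint.default()` for the Wald interval, `anova()` for the likelihood ratio test, and `confint()` for the profile interval; the last line shows one way to report the slope on a more communicable scale, as an odds ratio per 10 extra individuals.

```r
set.seed(3)

# Simulated stand-in data (hypothetical values)
frogs <- data.frame(Initial = rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2))
frogs$Killed <- rbinom(nrow(frogs), size = frogs$Initial,
                       prob = plogis(-0.1 - 0.008 * frogs$Initial))

fit  <- glm(cbind(Killed, Initial - Killed) ~ Initial,
            family = binomial, data = frogs)
fit0 <- update(fit, . ~ . - Initial)          # intercept-only null model

# Wald interval: estimate +/- 1.96 * standard error
wald_ci <- confint.default(fit)["Initial", ]

# Likelihood ratio test of the slope
lrt <- anova(fit0, fit, test = "Chisq")

# Profile-likelihood interval (may print a profiling message on older R)
prof_ci <- suppressMessages(confint(fit))["Initial", ]

# Slope re-expressed as an odds ratio per 10 additional individuals
odds_ratio_per_10 <- exp(10 * coef(fit)[["Initial"]])

rbind(wald = wald_ci, profile = prof_ci)
```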



Estimation issues




      Convergence difficulties, especially with non-standard links: set
      starting values, center/scale variables (?)
      Complete separation: brglm, logistf, arm (bayesglm)
      Big data: biglm (bigglm)
      Many predictors (penalized regression):
      glmnet, glmpath, penalized (Machine learning task view)
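
A small illustration of what complete separation looks like in an ordinary `glm()` fit, using a toy data set constructed so that `x > 5` predicts `y` perfectly. The packages listed above are the suggested remedies; this sketch only shows the symptom.

```r
# Toy data: the predictor separates the two outcome classes perfectly
x <- 1:10
y <- as.numeric(x > 5)

# glm() normally warns here (fitted probabilities of 0 or 1);
# the warnings are suppressed so the symptoms can be inspected directly
fit_sep <- suppressWarnings(glm(y ~ x, family = binomial))

# Symptoms: a very large slope with an even larger standard error,
# and fitted probabilities that are numerically 0 or 1
coef(fit_sep)
sqrt(diag(vcov(fit_sep)))
max(abs(fitted(fit_sep) - y))
```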




Tricks (within GLM framework)


      non-standard link functions:
             fitting hyperbolic models of predator attack rates (Michaelis-Menten)
             via binomial/inverse link
             (http://emdbolker.wikidot.com/voneshglm)
             exponential survivorship models via binomial/log link (Strong et al.,
             1999; Tiwari et al., 2006)
             Gaussian family with log link: fit exponential growth models with
             constant variance
      subtleties with Gamma GLMs and dispersion parameter:
      V&R MASS online complements,
      Paul Johnson’s notes
      offsets: variation in sampling area/intensity
      (e.g. strict proportionality)
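
A sketch of the offset trick on simulated counts: if the expected count is strictly proportional to the sampled area, `log(area)` enters a log-link Poisson model as an offset (a term with its coefficient fixed at 1). The variable names and true parameter values are made up for illustration.

```r
set.seed(4)

# Simulated survey: expected count = area * exp(0.5 + 0.3 * x)
n <- 200
area <- runif(n, min = 0.5, max = 5)
x <- rnorm(n)
counts <- rpois(n, lambda = area * exp(0.5 + 0.3 * x))

# log(area) as an offset models density (count per unit area)
fit_off <- glm(counts ~ x + offset(log(area)), family = poisson)

# Estimates should be near the simulated values 0.5 and 0.3
coef(fit_off)
```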



Overdispersion

      Quasilikelihood models:
      > glmQ <- update(glm1,family="quasibinomial")
      > anova(glmQ,test="F")
      (φ̂ = 2.45). No likelihood: qAIC requires some contortions
      extended GLMs
             negative binomial: MASS (glm.nb)
             beta-binomial:
                     aod (betabin)
                     gnlm (gnlr)
                     VGAM (vglm)
                     bbmle (mle2)
      GLMMs: lognormal-Poisson, logit-normal-binomial
      robust estimation (lmtest, sandwich):
      > coeftest(glm1,vcov=sandwich)
See also the vignette for the pscl package.
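
A sketch of what the quasibinomial refit changes, on simulated overdispersed stand-in data (`frogs`, hypothetical values): the point estimates stay the same, the dispersion estimate equals the Pearson chi-square divided by the residual degrees of freedom, and every standard error is inflated by the square root of that estimate.

```r
set.seed(5)

# Simulated stand-in data with extra-binomial variation:
# each replicate gets its own kill probability drawn from a beta
frogs <- data.frame(Initial = rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2))
frogs$Killed <- rbinom(nrow(frogs), size = frogs$Initial,
                       prob = rbeta(nrow(frogs), 4, 6))

fit_b <- glm(cbind(Killed, Initial - Killed) ~ Initial,
             family = binomial, data = frogs)
fit_q <- update(fit_b, family = quasibinomial)

# Dispersion estimate reported by the quasi-likelihood fit
phi_hat <- summary(fit_q)$dispersion

# The same quantity computed by hand from the binomial fit
pearson_ratio <- sum(residuals(fit_b, type = "pearson")^2) / df.residual(fit_b)

# Standard errors scale by sqrt(phi_hat); coefficients are unchanged
se_ratio <- sqrt(diag(vcov(fit_q))) / sqrt(diag(vcov(fit_b)))

c(phi_hat = phi_hat, pearson_ratio = pearson_ratio)
```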
Extensions




      Generalized additive models (Wood, 2006): mgcv, gamlss
      Zero-inflated/altered/hurdle models: pscl, VGAM
      Beta regression: betareg
      Generalized regression models: bbmle, VGAM, gnlm
      Random effects (generalized linear mixed models): lme4 and other
      packages (http://glmm.wikidot.com/faq)




References



Strong, D.R., Whipple, A.V., et al., 1999. Ecology, 80:2750–2761.
Tiwari, M., Bjorndal, K.A., et al., 2006. Marine Ecology Progress Series,
  326:283–293.
Venables, W. and Ripley, B.D., 2002. Modern Applied Statistics with S.
  Springer, New York, 4th edition.
Vonesh, J.R. and Bolker, B.M., 2005. Ecology, 86(6):1580–1591.
Wood, S.N., 2006. Generalized Additive Models: An Introduction with R.
 Chapman & Hall/CRC.




Basic ggplot code




> library(ggplot2)
> ggplot(ReedfrogFuncresp,aes(Initial,Killed/Initial))+
   geom_point()+
   geom_smooth(method="glm",method.args=list(family=binomial),
               aes(weight=Initial))




Confidence intervals on # killed, by hand



> pframe <- data.frame(Initial=1:100)
> pp <- predict(glm1,newdata=pframe,se.fit=TRUE)
> pmat <- with(pp,plogis(cbind(fit,
                              fit-1.96*se.fit,
                              fit+1.96*se.fit)))
> par(bty="l",las=1)
> with(ReedfrogFuncresp,plot(Initial,Killed/Initial,
                            xlim=c(0,100),ylim=c(0,1),
                            pch=16))
> matlines(pframe$Initial,pmat,lty=c(1,2,2),col=1,type="l")




Prediction intervals

> simhack <- function(params) {
   glmnew <- glm1
   glmnew$coefficients <- params
   ## simulates on PROBABILITY scale
   simulate(glmnew)[[1]]
 }
> set.seed(101)
> params <- MASS::mvrnorm(1000,mu=coef(glm1),
                         Sigma=vcov(glm1))
> sims <- apply(params,1,simhack)
> qmat <- t(apply(sims,1,quantile,
                 c(0.5,0.025,0.975)))

[Figure: Killed/Initial (0 to 1) vs. Initial (0 to 100)]

(Constructing the simulated values at Initial densities from 1 to 100
is a bit more work; ideally all simulate methods would have newdata
and newparam arguments . . . )
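
One way the "bit more work" could be done by hand, sketched on a simulated stand-in data frame (`frogs`, hypothetical values) rather than the reed frog data. Instead of going through `simulate()`, it recomputes the predicted probability at each density from every parameter draw and then draws binomial outcomes directly, so both parameter uncertainty and binomial sampling noise feed into the intervals.

```r
library(MASS)   # for mvrnorm()
set.seed(101)

# Simulated stand-in data (hypothetical values)
frogs <- data.frame(Initial = rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2))
frogs$Killed <- rbinom(nrow(frogs), size = frogs$Initial,
                       prob = plogis(-0.1 - 0.008 * frogs$Initial))
fit <- glm(Killed / Initial ~ Initial, weights = Initial,
           family = binomial, data = frogs)

# New densities at which to predict, and the matching design matrix
pframe <- data.frame(Initial = 1:100)
X <- model.matrix(~ Initial, data = pframe)

# Parameter draws from the approximate sampling distribution
params <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Predicted probabilities: one row per density, one column per draw
probs <- plogis(X %*% t(params))

# Binomial draws at each density, converted to fractions killed
sims <- apply(probs, 2, function(p) {
  rbinom(nrow(pframe), size = pframe$Initial, prob = p) / pframe$Initial
})

# Median and 95% prediction limits at each density
qmat <- t(apply(sims, 1, quantile, c(0.5, 0.025, 0.975)))
head(qmat)
```

The resulting `qmat` has one row per density and could be drawn with `matlines(pframe$Initial, qmat, lty = c(1, 2, 2), col = 1)`.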



Alternative display (display, coefplot from arm package)

[Figure: coefplot of glm1 showing the estimate for Initial; x-axis from −0.015 to 0.000]




> display(glm1)
glm(formula = Killed/Initial ~ Initial, family = binomial, data = Re
    weights = Initial)
            coef.est coef.se
(Intercept) -0.09     0.19
Initial     -0.01     0.00
---
  n = 16, k = 2
  residual deviance = 37.7, null deviance = 47.5 (difference = 9.8)
Beta-binomial with aod




> library(aod)
> glmBB1 <- betabin(cbind(Killed, Initial-Killed)~Initial,
                       random=~1,
                       data=ReedfrogFuncresp)




Beta-binomial with bbmle




> library(bbmle)
> glmBB3 <- mle2(Killed~dbetabinom(prob=plogis(logitp),
      theta=exp(logtheta),size=Initial),
      parameters=list(logitp~Initial),
      data=ReedfrogFuncresp,
      start=list(logitp=0,logtheta=0))
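
To show what such a fit does under the hood, here is a base-R sketch of the same kind of model: a hand-written beta-binomial log-density and a direct minimisation with `optim()`. It assumes the prob/theta parameterisation (beta shape parameters `prob*theta` and `(1-prob)*theta`), which is believed to match `dbetabinom` but should be checked against the package documentation. The function names `dbb` and `nll` and the stand-in data `frogs` are hypothetical.

```r
set.seed(6)

# Simulated overdispersed stand-in data (hypothetical values)
frogs <- data.frame(Initial = rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2))
frogs$Killed <- rbinom(nrow(frogs), size = frogs$Initial,
                       prob = rbeta(nrow(frogs), 4, 6))

# Beta-binomial density; assumed parameterisation:
# shape1 = prob * theta, shape2 = (1 - prob) * theta
dbb <- function(k, n, prob, theta, log = TRUE) {
  a <- prob * theta
  b <- (1 - prob) * theta
  ll <- lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
  if (log) ll else exp(ll)
}

# Negative log-likelihood: logit(prob) linear in Initial, log(theta) constant
nll <- function(par, d) {
  p <- plogis(par[1] + par[2] * d$Initial)
  val <- -sum(dbb(d$Killed, d$Initial, p, exp(par[3])))
  if (is.finite(val)) val else 1e10   # guard against numerical overflow
}

# Same starting point as the mle2 call: all parameters at 0
opt <- optim(c(0, 0, 0), nll, d = frogs)
setNames(opt$par, c("logitp.(Intercept)", "logitp.Initial", "logtheta"))
```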




Beta-binomial with VGAM




> library(VGAM)
> glmBB4 <- vglm(cbind(Killed,Initial-Killed)~Initial,
                betabinomial,
                data=ReedfrogFuncresp)
> coef(glmBB4,matrix=TRUE)




Beta-binomial with gnlm



> library(gnlm)
> attach(ReedfrogFuncresp) ## no data= argument!
> glmBB2 <- gnlr(cbind(Killed,Initial-Killed),
      dist="beta binomial",
      pmu=c(0,0),pshape=0,
      mu=function(p,linear) plogis(linear),
      linear=~Initial)
> detach(ReedfrogFuncresp)
> detach("package:gnlm")
> detach("package:rmutil")




Logit-normal-binomial with lme4




> library(lme4)
> ReedfrogFuncresp$ID <- 1:nrow(ReedfrogFuncresp)
> glmLNP <- glmer(cbind(Killed,Initial-Killed)~Initial+(1|ID),
                 family=binomial,
                 data=ReedfrogFuncresp)
> summary(glmLNP)




Alternate link functions for reed frog data


[Figure: Fraction killed (0 to 1) vs. Initial density (0 to 100), reed frog data]


Comparing overdispersion estimates


[Figure: estimates of the initial density effect (x-axis, −0.015 to 0.000) by
model: LN−binomial, beta−binomial, sandwich, q−binom Wald, binomial profile,
binomial Wald]


Ben Bolker (McMaster University)          GLMs in R                      7 January 2011   25 / 25

Weitere ähnliche Inhalte

Was ist angesagt?

σημειώσεις 1.1 1.7
σημειώσεις 1.1   1.7σημειώσεις 1.1   1.7
σημειώσεις 1.1 1.7mitsoz
 
ჯანსაღი ცხოვრება
ჯანსაღი ცხოვრებაჯანსაღი ცხოვრება
ჯანსაღი ცხოვრებაshorena984
 
εργο δυναμης ελατηριου και βαρους
εργο δυναμης ελατηριου και βαρουςεργο δυναμης ελατηριου και βαρους
εργο δυναμης ελατηριου και βαρουςnmandoulidis
 
τραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3ο
τραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3οτραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3ο
τραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3οCHRISTOS Xr.Tsif
 
ηλεκτρικο ρευμα
ηλεκτρικο ρευμαηλεκτρικο ρευμα
ηλεκτρικο ρευμαtvagelis96
 
Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....
Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....
Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....koskal
 
Κινηση-στηριξη μονοκύτταροι και φυτά
Κινηση-στηριξη μονοκύτταροι και φυτάΚινηση-στηριξη μονοκύτταροι και φυτά
Κινηση-στηριξη μονοκύτταροι και φυτάDespina Setaki
 
ვისწავლოთ ანბანი ასო-ბგერაჯ
ვისწავლოთ ანბანი   ასო-ბგერაჯვისწავლოთ ანბანი   ასო-ბგერაჯ
ვისწავლოთ ანბანი ასო-ბგერაჯmakaafriamashvili
 

Was ist angesagt? (8)

σημειώσεις 1.1 1.7
σημειώσεις 1.1   1.7σημειώσεις 1.1   1.7
σημειώσεις 1.1 1.7
 
ჯანსაღი ცხოვრება
ჯანსაღი ცხოვრებაჯანსაღი ცხოვრება
ჯანსაღი ცხოვრება
 
εργο δυναμης ελατηριου και βαρους
εργο δυναμης ελατηριου και βαρουςεργο δυναμης ελατηριου και βαρους
εργο δυναμης ελατηριου και βαρους
 
τραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3ο
τραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3οτραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3ο
τραπεζα θεματων 2014 γεωμετρια α λυκειου 4ο θεμα τευχος 3ο
 
ηλεκτρικο ρευμα
ηλεκτρικο ρευμαηλεκτρικο ρευμα
ηλεκτρικο ρευμα
 
Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....
Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....
Ιοντική ισορροπία - "Γενική Χημεία Γ Λυκείου" Κ. Καλαματιανος Κεφ2 Ενότητα 2....
 
Κινηση-στηριξη μονοκύτταροι και φυτά
Κινηση-στηριξη μονοκύτταροι και φυτάΚινηση-στηριξη μονοκύτταροι και φυτά
Κινηση-στηριξη μονοκύτταροι και φυτά
 
ვისწავლოთ ანბანი ასო-ბგერაჯ
ვისწავლოთ ანბანი   ასო-ბგერაჯვისწავლოთ ანბანი   ასო-ბგერაჯ
ვისწავლოთ ანბანი ასო-ბგერაჯ
 

GLMs and extensions in R

  • 1. Generalized linear models, and extensions, in R Ben Bolker Departments of Mathematics & Statistics and Biology, McMaster University 7 January 2011 Ben Bolker (McMaster University) GLMs in R 7 January 2011 1 / 25
  • 2. 1 Introduction; 2 Example; 3 Challenges, tricks, extensions; 4 (Extended examples)
  • 3. What are generalized linear models?
      Modeling framework to solve two common statistical problems:
      - Non-normal data
      - Non-linearity (continuous predictors)
      . . . superset of, and often confused with, “general” linear models (i.e. ANOVA/ANCOVA/regression: SAS PROC GLM)
  • 4. GLMs: technical details
      Constraints:
      - Distributions from exponential family (Normal, Poisson, binomial, Gamma, inverse Gaussian)
      - Invertible nonlinearities, i.e. there exists a link function that would make the relationship linear (log, logit, probit, inverse, square root, “cauchit”, . . . )
      Efficient, stable algorithm: iteratively re-weighted least squares (IRLS / Fisher scoring)
      Standard methods (methods(class="glm")): coef, summary, plot, predict, residuals, vcov, profile, update, confint, simulate, anova, add1/drop1, logLik, AIC, . . .
      Logistic and Poisson regression probably make up 99% of GLMs . . .
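
To make the IRLS step on this slide concrete, here is a minimal sketch that fits a logistic regression by hand and compares it with glm(). The helper name irls_logistic and the simulated data are assumptions made for illustration; they do not come from the talk, and glm() itself adds safeguards (step halving, deviance-based convergence) that are omitted here.

```r
# Hypothetical helper: IRLS for a binomial GLM with logit link.
# X: design matrix, y: number of successes, n: number of trials.
irls_logistic <- function(X, y, n, tol = 1e-10, maxit = 50) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxit)) {
    eta <- drop(X %*% beta)                       # linear predictor
    mu  <- plogis(eta)                            # inverse logit link
    w   <- n * mu * (1 - mu)                      # working weights
    z   <- eta + (y / n - mu) / (mu * (1 - mu))   # working response
    # weighted least-squares update: solve (X'WX) beta = X'Wz
    beta_new  <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Simulated data (an assumption; not the reed frog data)
set.seed(1)
x <- seq(5, 100, length.out = 16)
n <- rep(20, 16)
y <- rbinom(16, size = n, prob = plogis(-0.1 - 0.008 * x))

b_irls <- irls_logistic(cbind(1, x), y, n)
b_glm  <- coef(glm(cbind(y, n - y) ~ x, family = binomial))
print(rbind(irls = b_irls, glm = b_glm))
```

Each pass is an ordinary weighted least-squares fit; only the weights and the working response change between iterations.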
  • 5. Google Scholar scraping
      Query                       Ghits
      logistic+regression        580000
      Poisson+regression          39300
      generalized+linear+model    28700
      binomial+regression         13500
      [Figure: dot plot of Ghits per query, log scale from 10^4 to 10^6]
  • 6. Example: reed frog predation data (Vonesh and Bolker, 2005):
      > library(emdbook)
      > data(ReedfrogFuncresp)
      > glm1 <- glm(Killed/Initial~Initial,
                    weights=Initial,
                    family=binomial,
                    data=ReedfrogFuncresp)
      [Figure: fraction killed (0 to 1) vs. initial density (20 to 100)]
  • 7. Summary
      > summary(glm1)
      Call:
      glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
          weights = Initial)
      Deviance Residuals:
          Min       1Q   Median       3Q      Max
      -4.4132  -0.7275   0.4347   1.0120   1.8172
      Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
      (Intercept) -0.094563   0.188952   -0.50  0.61675
      Initial     -0.008416   0.002697   -3.12  0.00181 **
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      (Dispersion parameter for binomial family taken to be 1)
          Null deviance: 47.518 on 15 degrees of freedom
      Residual deviance: 37.717 on 14 degrees of freedom
      AIC: 98.639
      Number of Fisher Scoring iterations: 4
  • 8. Diagnostics
      - diagnostics inherit from plot.lm
      - overdispersion: residual deviance ≈ χ² on n−p degrees of freedom (Venables and Ripley, 2002, p. 209): sum(residuals(glm1, type="pearson")^2) = 34.3: p < 0.05
      [Figure: plot.lm diagnostic panels for glm1: Residuals vs Fitted, Normal Q−Q, Scale−Location, Residuals vs Leverage (with Cook's distance); points 5, 11, 13 and 16 are labelled]
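
The overdispersion check on this slide can be sketched as follows. The data are simulated with extra-binomial variation (an assumption for illustration; they are not the reed frog data), and the names fit, phi_hat and p_value are hypothetical.

```r
# Simulated overdispersed binomial data (hypothetical; not the reed frog data)
set.seed(101)
Initial <- rep(c(5, 10, 15, 20, 25, 35, 50, 75), each = 2)
mu      <- plogis(-0.1 - 0.008 * Initial)
p_i     <- rbeta(length(Initial), mu * 5, (1 - mu) * 5)  # extra-binomial variation
Killed  <- rbinom(length(Initial), size = Initial, prob = p_i)

fit <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial)

pearson_chisq <- sum(residuals(fit, type = "pearson")^2)  # Pearson chi-square
rdf           <- df.residual(fit)                         # n - p
phi_hat       <- pearson_chisq / rdf                      # estimated dispersion
p_value       <- pchisq(pearson_chisq, df = rdf, lower.tail = FALSE)
print(c(chisq = pearson_chisq, df = rdf, phi = phi_hat, p = p_value))
```

With the slide's own numbers, 34.3 on 14 residual degrees of freedom gives a dispersion estimate of 34.3/14 ≈ 2.45.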
  • 9. Inference
      - Coefficients: may be hard to communicate (reflect differences on the scale of linear predictor, e.g. logit/log-odds differences)
      - Wald statistics: beware the Hauck-Donner effect (Venables and Ripley, 2002, p. 198). Wald CI of slope: stats:::confint.lm(glm1) (-0.0142,-0.0026)
      - Likelihood ratio test, via anova:
          > anova(glm1,test="Chisq")
          ## OR
          > glm0 <- update(glm1, . ~ . - Initial)
          > anova(glm1,glm0,test="Chisq")
      - Likelihood profiles (via MASS::profile.glm), profile confidence intervals: MASS:::confint.glm(glm1) (-0.0137,-0.0031)
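
As a rough illustration of the Wald interval and likelihood ratio test mentioned on this slide, the sketch below computes both by hand on simulated data. The data and the names fit, fit0, wald_ci and lrt_p are assumptions for illustration, not the talk's objects.

```r
# Simulated binomial data (hypothetical; not the reed frog data)
set.seed(2)
Initial <- rep(c(5, 10, 15, 20, 25, 35, 50, 75), each = 2)
Killed  <- rbinom(length(Initial), size = Initial,
                  prob = plogis(-0.1 - 0.008 * Initial))

fit  <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial)
fit0 <- glm(cbind(Killed, Initial - Killed) ~ 1, family = binomial)  # no slope

# Wald interval: estimate +/- normal quantile * standard error
est     <- coef(fit)["Initial"]
se      <- sqrt(diag(vcov(fit)))["Initial"]
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se

# Likelihood ratio test: the drop in deviance is compared to chi-square(1)
lrt_stat <- deviance(fit0) - deviance(fit)
lrt_p    <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

print(wald_ci)
print(c(LRT = lrt_stat, p = lrt_p))
```

The Wald interval is symmetric on the link scale by construction; profile intervals need not be, which is why the two intervals on the slide differ slightly.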
  • 10. Estimation issues
      - Convergence difficulties, especially with non-standard links: set starting values, center/scale variables (?)
      - Complete separation: brglm, logistf, arm (bayesglm)
      - Big data: biglm (bigglm)
      - Many predictors (penalized regression): glmnet, glmpath, penalized (Machine learning task view)
  • 11. Tricks (within GLM framework)
      - non-standard link functions:
          fitting hyperbolic models of predator attack rates (Michaelis-Menten) via binomial/inverse link (http://emdbolker.wikidot.com/voneshglm)
          exponential survivorship models via binomial/log link (Strong et al., 1999; Tiwari et al., 2006)
      - Gaussian family with log link: fit exponential growth models with constant variance
      - subtleties with Gamma GLMs and dispersion parameter: V&R MASS online complements, Paul Johnson’s notes
      - offsets: variation in sampling area/intensity (e.g. strict proportionality)
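
The offset trick in the last item can be sketched as below: counts are assumed to be strictly proportional to the sampled area, so log(area) enters the linear predictor with its coefficient fixed at 1. The data and variable names (area, x, counts) are hypothetical.

```r
# Simulated counts from plots of differing area (hypothetical data)
set.seed(42)
area   <- runif(50, min = 1, max = 10)
x      <- rnorm(50)
counts <- rpois(50, lambda = area * exp(0.5 + 0.3 * x))  # density * area

# Offset inside the formula: log(area) has its coefficient fixed at 1
fit_off <- glm(counts ~ x + offset(log(area)), family = poisson)

# Equivalent specification through the offset argument
fit_off2 <- glm(counts ~ x, offset = log(area), family = poisson)

print(coef(fit_off))   # coefficients describe log density per unit area
```

Both calls fit the same model; the formula version has the practical advantage that predict() with newdata picks up the offset from the new data frame.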
  • 12. Overdispersion
      - Quasilikelihood models:
          > glmQ <- update(glm1,family="quasibinomial")
          > anova(glmQ,test="F")
        (φ̂ = 2.45). No likelihood: qAIC requires some contortions
      - extended GLMs
          negative binomial: MASS (glm.nb)
          beta-binomial: aod (betabin), gnlm (gnlr), VGAM (vglm), bbmle (mle2)
      - GLMMs: lognormal-Poisson, logit-normal-binomial
      - robust estimation (lmtest, sandwich):
          > coeftest(glm1,vcov=sandwich)
      See also the vignette for the pscl package.
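
A minimal sketch of what the quasilikelihood fit changes, on simulated overdispersed data (an assumption; not the reed frog data): the point estimates are identical to the binomial fit, and the standard errors are inflated by the square root of the estimated dispersion. The names fit_b, fit_q and phi_hat are hypothetical.

```r
# Simulated overdispersed binomial data (hypothetical)
set.seed(101)
Initial <- rep(c(5, 10, 15, 20, 25, 35, 50, 75), each = 2)
mu      <- plogis(-0.1 - 0.008 * Initial)
p_i     <- rbeta(length(Initial), mu * 5, (1 - mu) * 5)
Killed  <- rbinom(length(Initial), size = Initial, prob = p_i)

fit_b <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial)
fit_q <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = quasibinomial)

phi_hat <- summary(fit_q)$dispersion   # Pearson chi-square / residual df
se_b    <- sqrt(diag(vcov(fit_b)))     # binomial standard errors
se_q    <- sqrt(diag(vcov(fit_q)))     # quasibinomial standard errors

print(rbind(binomial = se_b, quasibinomial = se_q,
            scaled_binomial = se_b * sqrt(phi_hat)))
```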
  • 13. Extensions
      - Generalized additive models (Wood, 2006): mgcv, gamlss
      - Zero-inflated/altered/hurdle models: pscl, VGAM
      - Beta regression: betareg
      - Generalized regression models: bbmle, VGAM, gnlm
      - Random effects (generalized linear mixed models): lme4 and other packages (http://glmm.wikidot.com/faq)
  • 14. References
      Strong, D.R., Whipple, A.V., et al., 1999. Ecology, 80:2750–2761.
      Tiwari, M., Bjorndal, K.A., et al., 2006. Marine Ecology Progress Series, 326:283–293.
      Venables, W. and Ripley, B.D., 2002. Modern Applied Statistics with S. Springer, New York, 4th edition.
      Vonesh, J.R. and Bolker, B.M., 2005. Ecology, 86(6):1580–1591.
      Wood, S.N., 2006. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.
  • 15. Basic ggplot code
      > qplot(Initial,Killed/Initial,data=ReedfrogFuncresp)+
          geom_smooth(method=glm,method.args=list(family=binomial),
                      aes(weight=Initial,group=NA))
  • 16. Confidence intervals on # killed, by hand
      > pframe <- data.frame(Initial=1:100)
      > pp <- predict(glm1,newdata=pframe,se.fit=TRUE)
      > pmat <- with(pp,plogis(cbind(fit,
                                     fit-1.96*se.fit,
                                     fit+1.96*se.fit)))
      > par(bty="l",las=1)
      > with(ReedfrogFuncresp,plot(Initial,Killed/Initial,
                                   xlim=c(0,100),ylim=c(0,1),
                                   pch=16))
      > matlines(pframe$Initial,pmat,lty=c(1,2,2),col=1,type="l")
  • 17. Prediction intervals
      > simhack <- function(params) {
          glmnew <- glm1
          glmnew$coefficients <- params
          ## simulate() draws from the stored fitted values, so update them as well
          glmnew$fitted.values <- plogis(drop(model.matrix(glm1) %*% params))
          ## simulates on PROBABILITY scale
          simulate(glmnew)[[1]]
        }
      > set.seed(101)
      > params <- MASS::mvrnorm(1000,mu=coef(glm1),
                                Sigma=vcov(glm1))
      > sims <- apply(params,1,simhack)
      > qmat <- t(apply(sims,1,quantile,
                        c(0.5,0.025,0.975)))
      (Constructing the simulated values at Initial densities from 1 to 100 is a bit more work; ideally all simulate methods would have newdata and newparam arguments . . . )
      [Figure: Killed/Initial (0 to 1) vs. Initial (0 to 100)]
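
One way to do the "bit more work" of simulating at initial densities from 1 to 100 is sketched below in base R. This is an assumption-laden illustration, not the author's code: the data are simulated, the parameter draws use a Cholesky factor in place of MASS::mvrnorm, and the names newN, params, probs and qmat_new are hypothetical.

```r
# Simulated binomial data (hypothetical; not the reed frog data)
set.seed(101)
Initial <- rep(c(5, 10, 15, 20, 25, 35, 50, 75), each = 2)
Killed  <- rbinom(length(Initial), size = Initial,
                  prob = plogis(-0.1 - 0.008 * Initial))
fit <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial)

newN <- 1:100                      # densities at which to predict
Xnew <- cbind(1, newN)             # design matrix for the new densities
nsim <- 1000

# Draw parameters from the approximate sampling distribution N(coef, vcov)
L      <- t(chol(vcov(fit)))                                   # lower-triangular factor
params <- coef(fit) + L %*% matrix(rnorm(2 * nsim), nrow = 2)  # 2 x nsim

# Probabilities on the response scale, then binomial sampling noise on top
probs <- plogis(Xnew %*% params)                               # 100 x nsim
sims  <- matrix(rbinom(length(probs), size = newN, prob = probs),
                nrow = length(newN)) / newN                    # fraction killed

# Median and 95% prediction limits at each density
qmat_new <- t(apply(sims, 1, quantile, c(0.5, 0.025, 0.975)))
head(qmat_new)
```

The intervals combine parameter uncertainty (the draws in params) with binomial sampling variation, so they are wider than the confidence band on the mean and are visibly discrete at small densities.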
  • 18. Alternative display (display, coefplot from arm package)
      [Figure: coefplot of the Initial coefficient, axis from −0.015 to 0.000]
      > display(glm1)
      glm(formula = Killed/Initial ~ Initial, family = binomial, data = Re
          weights = Initial)
                  coef.est coef.se
      (Intercept) -0.09     0.19
      Initial     -0.01     0.00
      ---
        n = 16, k = 2
        residual deviance = 37.7, null deviance = 47.5 (difference = 9.8)
  • 19. Beta-binomial with aod
      > library(aod)
      > glmBB1 <- betabin(cbind(Killed, Initial-Killed)~Initial,
                          random=~1,
                          data=ReedfrogFuncresp)
  • 20. Beta-binomial with bbmle
      > library(bbmle)
      > glmBB3 <- mle2(Killed~dbetabinom(prob=plogis(logitp),
                                         theta=exp(logtheta),size=Initial),
                       parameters=list(logitp~Initial),
                       data=ReedfrogFuncresp,
                       start=list(logitp=0,logtheta=0))
  • 21. Beta-binomial with VGAM
      > library(VGAM)
      > glmBB4 <- vglm(cbind(Killed,Initial-Killed)~Initial,
                       betabinomial,
                       data=ReedfrogFuncresp)
      > coef(glmBB4,matrix=TRUE)
  • 22. Beta-binomial with gnlm
      > library(gnlm)
      > attach(ReedfrogFuncresp) ## no data= argument!
      > glmBB2 <- gnlr(cbind(Killed,Initial-Killed),
                       dist="beta binomial",
                       pmu=c(0,0),pshape=0,
                       mu=function(p,linear) plogis(linear),
                       linear=~Initial)
      > detach(ReedfrogFuncresp)
      > detach("package:gnlm")
      > detach("package:rmutil")
  • 23. Logit-normal-binomial with lme4
      > library(lme4)
      > ReedfrogFuncresp$ID <- 1:nrow(ReedfrogFuncresp)
      > glmLNP <- glmer(cbind(Killed,Initial-Killed)~Initial+(1|ID),
                        family=binomial,
                        data=ReedfrogFuncresp)
      > summary(glmLNP)
  • 24. Alternate link functions for reed frog data
      [Figure: fraction killed (0 to 1) vs. initial density (20 to 100)]
  • 25. Comparing overdispersion estimates
      [Figure: estimates of the initial density effect (axis from −0.015 to 0.000) by model: LN−binomial, beta−binomial, sandwich, q−binom Wald, binomial profile, binomial Wald]