Trondheim glmm
1. Precursors GLMMs Results Conclusions References
Generalized linear mixed models for ecologists:
coping with non-normal, spatially and temporally
correlated data
Ben Bolker
McMaster University
Departments of Mathematics & Statistics and Biology
30 August 2011
Ben Bolker McMaster University Departments of Mathematics & Statistics and Biology
GLMMs
Outline
1 Precursors
Examples
Definitions
ANOVA vs. (G)LMMs
2 GLMMs
Estimation
Inference
3 Results
Coral symbionts
Glycera
Arabidopsis
4 Conclusions
Examples
Coral protection by symbionts
[Figure: Number of predation events; y-axis: Number of blocks; x-axis: Symbionts (none, shrimp, crabs, both)]
Definitions
Generalized linear models (GLMs)
non-normal data: binary, binomial, count (Poisson/negative binomial)
non-linearity: log/exponential, logit/logistic: link function L
flexibility via linear predictor: L(response) = a + b_i + c x . . .
stable, robust, fast, easy to use
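A minimal sketch of the GLM idea above, on simulated data. The variable names and parameter values are illustrative assumptions, not from the talk. The log link plays the role of L, so that log(mean response) = a + c x.

```r
## Simulated counts whose log-mean is linear in x (values chosen arbitrarily)
set.seed(101)
n <- 2000
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

## Poisson GLM with log link: L(response) = a + c*x, with L = log
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)        # estimates on the link (log) scale
exp(coef(fit))   # back-transformed: multiplicative effects on the mean
```

Swapping the family (e.g. binomial with a logit link) changes the distribution and link but leaves the linear predictor syntax unchanged.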
Definitions
Random vs. fixed effects
Fixed effects (FE): interested in specific levels (“treatments”)
Random effects (RE) [2]: interested in distribution (“blocks”)
Experimental
Temporal, spatial
Genera, species, genotypes
Individuals (“repeated measures”)
inference on population of blocks
(blocks randomly selected?)
(large number of blocks [> 5–7]?)
ANOVA vs. (G)LMMs
Mixed models: classical approach
traditional approach to non-independence
nested, randomized block, split-plot . . .
sum-of-squares decomposition/ANOVA: figure out treatment SSQ/df, error SSQ/df [3]
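As a sketch of the sum-of-squares bookkeeping in the classical approach, the following works through a balanced randomized block design by hand and compares the result with R's built-in ANOVA table. The data are simulated and all names (d, trt, block, ssq_*) are illustrative assumptions.

```r
## Balanced randomized block design: 3 treatments x 8 blocks, one obs per cell
set.seed(1)
d <- expand.grid(trt = factor(1:3), block = factor(1:8))
blk_eff <- rnorm(8)
d$y <- c(0, 1, 2)[as.integer(d$trt)] + blk_eff[as.integer(d$block)] +
  rnorm(nrow(d), sd = 0.5)

n_trt <- nlevels(d$trt)
n_blk <- nlevels(d$block)
grand <- mean(d$y)

## Sum-of-squares decomposition: total = treatment + block + error
ssq_trt <- n_blk * sum((tapply(d$y, d$trt, mean) - grand)^2)
ssq_blk <- n_trt * sum((tapply(d$y, d$block, mean) - grand)^2)
ssq_err <- sum((d$y - grand)^2) - ssq_trt - ssq_blk

## F statistic: (treatment SSQ/df) / (error SSQ/df)
df_trt <- n_trt - 1
df_err <- (n_trt - 1) * (n_blk - 1)
F_trt  <- (ssq_trt / df_trt) / (ssq_err / df_err)
p_trt  <- pf(F_trt, df_trt, df_err, lower.tail = FALSE)

## Same test from the built-in ANOVA table
tab <- anova(lm(y ~ trt + block, data = d))
c(by_hand = F_trt, anova = tab["trt", "F value"])
```

The hand calculation relies on balance; with unequal cell counts the simple decomposition no longer holds.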
ANOVA vs. (G)LMMs
You can use an ANOVA if . . .
data are normal
(or can be transformed)
responses are linear
design is (nearly) balanced
simple design (single or nested REs)
(not crossed REs: e.g. year effects that apply across all spatial blocks)
no spatial or temporal correlation within blocks
ANOVA vs. (G)LMMs
“Modern” mixed models
Data still normal(izable), linear, but unbalanced/crossed/correlated
Balance (dispersion of observation around block mean) with (dispersion of block means around overall average)
Good for large, messy data
. . . and when variation is interesting
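The balance described above can be written out for the simplest case, a random-intercept model with known variances. This is a standard textbook result, sketched with assumed notation that does not come from the talk: n_i observations in block i, within-block variance σ²_within, among-block variance σ²_among.

```latex
\hat{\mu}_i \;=\; \bar{y} \;+\;
  \frac{\sigma^2_{\text{among}}}
       {\sigma^2_{\text{among}} + \sigma^2_{\text{within}} / n_i}
  \,\bigl(\bar{y}_i - \bar{y}\bigr)
```

The estimated block mean is pulled from the raw block mean toward the overall average. The pull is stronger when the block has few observations or when the within-block dispersion is large relative to the among-block dispersion.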
ANOVA vs. (G)LMMs
GLMMs
Data not normal(izable), nonlinear
Standard distributions (Poisson, binomial etc.)
Specific forms of nonlinearity (exponential, logistic etc.)
Conceptually very similar to LMMs, but harder
ANOVA vs. (G)LMMs
Challenges
Small # RE levels (<5–6)
Big data (> 1000 observations)
Spatial/temporal correlation structure (in GLMMs)
Unusual distributions of data (in GLMMs)
Estimation
Penalized quasi-likelihood (PQL) [1]
flexible (e.g. handles spatial/temporal correlations)
least accurate: biased for small samples (low counts per block)
SAS PROC GLIMMIX, R MASS:glmmPQL
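A minimal sketch of a PQL fit with MASS:glmmPQL, using the bacteria example data that ships with MASS. The data set and model are not from the talk. The response is binary, which is the low-information setting where the slide warns that PQL can be biased.

```r
library(MASS)   # provides glmmPQL and the 'bacteria' example data

## Binary response (bacteria present or not), random intercept per child (ID)
fit_pql <- glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
                   family = binomial, data = bacteria, verbose = FALSE)

## Fixed-effect estimates with Wald-type t tests
summary(fit_pql)$tTable
```

Because glmmPQL works through nlme's lme, a correlation structure can be passed via its correlation argument, which is the flexibility the slide refers to.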
Estimation
Laplace and Gauss-Hermite quadrature
more accurate than PQL: speed/accuracy tradeoff
lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder), gamlss.mx:gamlssNP, repeated
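A sketch of the speed/accuracy tradeoff in lme4:glmer, using the cbpp example data that ships with lme4 (an assumption for illustration, not one of the talk's data sets). The nAGQ argument sets the number of quadrature points: 1 gives the Laplace approximation, larger values give adaptive Gauss-Hermite quadrature.

```r
library(lme4)

## Laplace approximation (nAGQ = 1, the glmer default)
fit_lap <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                 data = cbpp, family = binomial, nAGQ = 1)

## Adaptive Gauss-Hermite quadrature with 10 points: slower, more accurate.
## In lme4 this requires a model with a single scalar random effect.
fit_agq <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                 data = cbpp, family = binomial, nAGQ = 10)

## Compare fixed-effect estimates from the two approximations
cbind(Laplace = fixef(fit_lap), AGQ = fixef(fit_agq))
```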
Estimation
Bayesian approaches
usually slow but flexible
best confidence intervals
must specify priors, assess convergence
specialized: glmmAK, MCMCglmm [6], INLA
general: BUGS (glmmBUGS, R2WinBUGS, BRugs, WinBUGS, OpenBUGS, R2jags, rjags, JAGS)
Estimation
Extensions
Overdispersion: variance > expected from statistical model
quasi-likelihood: MASS:glmmPQL
overdispersed distributions (e.g. negative binomial): glmmADMB, gamlss.mx:gamlssNP
observation-level random effects (e.g. lognormal-Poisson): lme4, MCMCglmm
Zero-inflation: overabundance of zeros in a discrete distribution
zero-inflated models: glmmADMB, MCMCglmm
hurdle models: MCMCglmm
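One common way to check for overdispersion, sketched here on simulated data with a plain Poisson GLM rather than a GLMM: compare the sum of squared Pearson residuals with the residual degrees of freedom. The data and names are illustrative assumptions.

```r
set.seed(42)
n <- 500
x <- runif(n)

## Negative binomial counts: variance = mu + mu^2/size, larger than Poisson
y <- rnbinom(n, mu = exp(1 + x), size = 1)

## Fit the (misspecified) Poisson model
fit_pois <- glm(y ~ x, family = poisson)

## Pearson chi-square / residual df: near 1 if the Poisson variance is adequate
disp <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
disp
```

In lme4, an observation-level random effect would be added as a term such as (1 | obs), where obs is a hypothetical factor with one level per row of the data.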
Inference
Wald tests/CIs
Widely available (e.g. summary())
Assume data set is large/well-behaved
Always approximate, sometimes awful; bad for variance estimates
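A sketch of what a Wald interval is, computed by hand for a plain logistic GLM on simulated data (names and values are illustrative assumptions). The same estimate plus or minus a multiple of the standard error is what summary()-style output relies on.

```r
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))   # standard errors from the curvature at the MLE

## Wald z statistics and 95% confidence intervals
wald_z  <- est / se
wald_ci <- cbind(lower = est - qnorm(0.975) * se,
                 upper = est + qnorm(0.975) * se)
wald_ci
```

The interval is symmetric on the link scale by construction, which is one reason it does poorly for variance parameters, whose sampling distributions are typically skewed.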
Inference
Likelihood ratio tests
Compare models (easy)
Confidence intervals: expensive and rarely available (lme4a for LMMs)
Asymptotic assumption
LMMs: F tests; estimate “equivalent” denominator df? approximations [8, 13]: doBy:KRmodcomp
don’t really know what to do for GLMMs
OK if number of observations ≫ number of parameters and large # of blocks . . .
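The mechanics of a likelihood ratio test, sketched with two nested Poisson GLMs on simulated data. Plain GLMs are used only to keep the example in base R; the names and values are illustrative assumptions. The chi-square reference distribution is the asymptotic assumption the slide warns about.

```r
set.seed(3)
n <- 300
x <- runif(n)
z <- runif(n)
y <- rpois(n, exp(0.2 + 0.7 * x))        # z has no true effect

m0 <- glm(y ~ x,     family = poisson)   # reduced model
m1 <- glm(y ~ x + z, family = poisson)   # full model

## LRT statistic: twice the difference in maximized log-likelihoods
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
df  <- attr(logLik(m1), "df") - attr(logLik(m0), "df")

## p value from the asymptotic chi-square distribution
p <- pchisq(lrt, df = df, lower.tail = FALSE)
c(statistic = lrt, df = df, p.value = p)
```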
Inference
Information-theoretic approaches
Above issues apply, but less well understood [4, 5, 7, 11]: AIC is asymptotic too
For comparing models with different REs, or for AICc, what is p?
“Level of focus” issue: what are you trying to predict? [5, 14, 15] (cAIC)
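To make the "what is p?" question concrete, here is a sketch of the usual small-sample correction, AICc = AIC + 2p(p+1)/(n − p − 1). The helper aicc is hypothetical (not from any package), and it takes p from the df attribute of logLik(), which is one convention among several.

```r
## Hypothetical helper: small-sample corrected AIC
aicc <- function(fit) {
  p <- attr(logLik(fit), "df")   # parameters as counted by logLik()
  n <- nobs(fit)
  AIC(fit) + 2 * p * (p + 1) / (n - p - 1)
}

## Simple linear model on a built-in data set: p = 3 (intercept, slope, sigma)
fit <- lm(dist ~ speed, data = cars)
c(AIC = AIC(fit), AICc = aicc(fit))
```

For a mixed model, logLik() typically counts each variance parameter once, which corresponds to a population-level focus; a cluster-level focus would call for a different effective p.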
Inference
Bootstrapping
1 fit null model to data
2 simulate “data” from null model
3 fit null and working model, compute likelihood difference
4 repeat to estimate null distribution
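simulate/refit methods; bootMer in lme4a (LMMs only!), doBy:PBmodcomp, or “by hand”:
library(lme4)
## one parametric-bootstrap replicate: m0 = null model, m1 = working model
pboot <- function(m0, m1) {
  s <- simulate(m0)[[1]]   # simulate a response vector from the null model
  ## refit both models to the simulated response; return the LRT statistic
  as.numeric(2 * (logLik(refit(m1, s)) - logLik(refit(m0, s))))
}
## fm2 (null) and fm1 (working) are fitted models, not defined on the slide
nulldist <- replicate(1000, pboot(fm2, fm1))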
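A self-contained sketch of the four bootstrap steps, ending with a p value. The sleepstudy data (shipped with lme4) and the two models below are assumptions standing in for the slide's fm1 and fm2, which are not defined in the talk. The number of replicates is kept small for speed.

```r
library(lme4)

## Stand-in models (assumed): fm1 = working model, fm2 = null model without Days
fm1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
fm2 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy, REML = FALSE)

## One bootstrap replicate: simulate from the null, refit both, return LRT
pboot <- function(m0, m1) {
  s <- simulate(m0)[[1]]
  as.numeric(2 * (logLik(refit(m1, s)) - logLik(refit(m0, s))))
}

set.seed(101)
obs  <- as.numeric(2 * (logLik(fm1) - logLik(fm2)))   # observed statistic
null <- replicate(200, pboot(fm2, fm1))               # null distribution

## Bootstrap p value, counting the observed statistic as one of the draws
p_boot <- (sum(null >= obs) + 1) / (length(null) + 1)
p_boot
```

Models are fitted by maximum likelihood (REML = FALSE) because the two differ in their fixed effects.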
Inference
Bayesian inference
CIs, prediction intervals etc. computationally “free” after estimation
Post hoc MCMC sampling: (glmmADMB, R2ADMB, lme4:MCMCsamp)
Inference
Bottom line
Large data: computation slow, inference easy
Bayesian computation slow, inference easy
Small data: computation fast
Problems with zero variance (blme), correlations = ±1
Bootstrapping for inference?
Glycera
Glycera: parametric bootstrap results
[Figure: inferred p value vs. true p value, both axes from 0.001 to 0.5; panels: Osm, Cu, H2S, Anoxia; legend (variable): normal, t7, t14]
Arabidopsis
Arabidopsis results
[Figure: regression estimates (axis from −1.0 to 1.0) for statusTransplant, statusPetri.Plate, rack2, nutrient8:amdclipped, amdclipped, nutrient8]
What about space and/or time?
if in blocks, no problem (crossed random effects) [10]
test residuals, try to fail to reject null hypothesis of no autocorrelation
if normal (LMM), corStruct in lme, spdep
otherwise . . . spatcounts, geoRglm, geoBUGS, . . . ???
big data [9]
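For the normal (LMM) case, a sketch of the "test residuals, then add a corStruct" workflow with nlme's lme. The Orthodont data (shipped with nlme) and the AR(1) structure are assumptions chosen for illustration, not an analysis from the talk.

```r
library(nlme)

## Random-intercept LMM for repeated measurements on each child
fit0 <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont,
            method = "ML")

## Empirical autocorrelation of within-subject residuals by lag
ACF(fit0, maxLag = 3)

## Same model with AR(1) residual correlation within each subject
fit1 <- update(fit0, correlation = corAR1(form = ~ 1 | Subject))

## Likelihood ratio comparison of the fits with and without autocorrelation
anova(fit0, fit1)

## Estimated AR(1) parameter on its natural scale (between -1 and 1)
phi <- coef(fit1$modelStruct$corStruct, unconstrained = FALSE)
phi
```

With only four observations per subject, the autocorrelation estimate here is weakly determined; the point is the syntax rather than the result.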
Primary tools
Special-purpose:
lme4: multiple/crossed REs, (profiling): fast
MCMCglmm: Bayesian, fairly flexible
glmmADMB: negative binomial, zero-inflated etc.
General-purpose:
AD Model Builder (and interfaces)
BUGS/JAGS (and interfaces)
INLA [12]
Tools are getting better, but still not easy!
Info: http://glmm.wikidot.com
Slides: http://www.slideshare.net/bbolker
Acknowledgements
Funding: NSF, NSERC, NCEAS
Data: Josh Banta and Massimo Pigliucci (Arabidopsis); Adrian Stier and Seabird McKeon (coral symbionts); Courtney Kagan, Jocelynn Ortega, David Julian (Glycera)
Co-authors: Mollie Brooks, Connie Clark, Shane Geange, John Poulsen, Hank Stevens, Jada White
References
[1] Breslow NE, 2004. In DY Lin & PJ Heagerty, eds., Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pp. 1–22. Springer. ISBN 0387208623.
[2] Gelman A, 2005. Annals of Statistics, 33(1):1–53. doi:10.1214/009053604000001048.
[3] Gotelli NJ & Ellison AM, 2004. A Primer of Ecological Statistics. Sinauer, Sunderland, MA.
[4] Greven S, 2008. Non-Standard Problems in Inference for Additive and Linear Mixed Models. Cuvillier Verlag, Göttingen, Germany. ISBN 3867274916. URL http://www.cuvillier.de/flycms/en/html/30/-UickI3zKPS,3cEY=/Buchdetails.html?SID=wVZnpL8f0fbc.
[5] Greven S & Kneib T, 2010. Biometrika, 97(4):773–789. URL http://www.bepress.com/jhubiostat/paper202/.
[6] Hadfield JD, 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660. URL http://www.jstatsoft.org/v33/i02.
[7] Hurvich CM & Tsai CL, Jun. 1989. Biometrika, 76(2):297–307. doi:10.1093/biomet/76.2.297. URL http://biomet.oxfordjournals.org/content/76/2/297.abstract.
[8] Kenward MG & Roger JH, 1997. Biometrics, 53(3):983–997.
[9] Latimer AM, Banerjee S et al., 2009. Ecology Letters, 12(2):144–154.
[10] Ozgul A, Oli MK et al., Apr. 2009. Ecological Applications: A Publication of the Ecological Society of America, 19(3):786–798. ISSN 1051-0761. URL http://www.ncbi.nlm.nih.gov/pubmed/19425439. PMID: 19425439.
[11] Richards SA, 2005. Ecology, 86(10):2805–2814. doi:10.1890/05-0074.
[12] Rue H, Martino S, & Chopin N, 2009. Journal of the Royal Statistical Society, Series B, 71(2):319–392.
[13] Schaalje G, McBride J, & Fellingham G, 2002. Journal of Agricultural, Biological & Environmental Statistics, 7(14):512–524. URL http://www.ingentaconnect.com/content/asa/jabes/2002/00000007/00000004/art00004.
[14] Spiegelhalter DJ, Best N et al., 2002. Journal of the Royal Statistical Society B, 64:583–640.
[15] Vaida F & Blanchard S, Jun. 2005. Biometrika, 92(2):351–370. doi:10.1093/biomet/92.2.351. URL http://biomet.oxfordjournals.org/cgi/content/abstract/92/2/351.
Extras
Spatial and temporal correlation (R-side effects):
MASS:glmmPQL (sort of), GLMMarp, INLA;
WinBUGS, AD Model Builder
Additive models: amer, gamm4, mgcv, lmeSplines
Ordinal models: ordinal
Population genetics: pedigreemm, kinship
Survival: coxme, kinship, phmm