1. Grid spacing and quality of spatially predicted species
abundances
A case-study for zero-inflated spatial data
Olga Lyashevska* Dick Brus** Jaap van der Meer*
*Royal Netherlands Institute for Sea Research
Department of Marine Ecology
**Alterra, Wageningen University and Research Centre
olga.lyashevska@nioz.nl
July 2, 2014
Lyashevska et al, 2014 olga.lyashevska@nioz.nl July, 2 2014 1 / 16
5. Problem
Sampling is expensive, so it is important to evaluate sampling designs statistically before a monitoring network is implemented;
This has been done before (Bijleveld et al., 2012; Brus and de Gruijter, 2013), but
spatial empirical ecological data are typically zero-inflated
and accounting for spatial dependence in such data is not straightforward.
7. Aim
1. To develop a methodology for the statistical evaluation of sampling designs for zero-inflated, spatially correlated count data;
2. To test the proposed methodology in a real-world case study.
14. Methodology
Postulate a statistical model of the spatial distribution of the variable;
Use prior data to calibrate this model;
Simulate a large number of pseudo-realities;
Sample each pseudo-reality repeatedly with the candidate sampling designs;
Predict the variable of interest at validation points;
Compute performance statistics;
Select the best of the evaluated candidate designs.
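The steps above amount to a nested simulation loop. The following Python sketch shows one way to organise it. The function `evaluate_designs` and the toy callables passed to it are hypothetical placeholders for illustration (a 1-D transect with nearest-neighbour prediction), not the authors' implementation.

```python
import random
from statistics import mean


def evaluate_designs(simulate, sample, predict, truth, designs,
                     n_realities=10, n_samples=5, seed=0):
    """Return (best_design, scores), scores = mean squared error per design.

    simulate(rng)                -> one pseudo-reality
    sample(reality, design, rng) -> observations
    predict(observations)        -> predictions at the validation points
    truth(reality)               -> true values at the validation points
    """
    rng = random.Random(seed)
    scores = {}
    for design in designs:
        errors = []
        for _ in range(n_realities):          # loop over pseudo-realities
            reality = simulate(rng)
            true_values = truth(reality)
            for _ in range(n_samples):        # repeated sampling per reality
                obs = sample(reality, design, rng)
                pred = predict(obs)
                errors.append(mean((t - p) ** 2
                                   for t, p in zip(true_values, pred)))
        scores[design] = mean(errors)
    best = min(scores, key=scores.get)        # smallest mean error wins
    return best, scores


# Toy illustration (hypothetical): linear trend on a 1-D transect,
# systematic sampling with step k, nearest-neighbour prediction.
POINTS = list(range(20))

def toy_simulate(rng):
    offset = rng.random()
    return [offset + 0.5 * x for x in POINTS]

def toy_sample(reality, step, rng):
    start = rng.randrange(step)               # random grid origin
    return [(x, reality[x]) for x in POINTS[start::step]]

def toy_predict(obs):
    return [min(obs, key=lambda o: abs(o[0] - x))[1] for x in POINTS]

def toy_truth(reality):
    return reality

best, scores = evaluate_designs(toy_simulate, toy_sample, toy_predict,
                                toy_truth, designs=[1, 2, 5])
print(best, scores)
```

In this toy setting the densest design (step 1) observes every validation point and is therefore selected; in practice the choice would also weigh sampling cost.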
15. Case Study
Dutch Wadden Sea;
Area: 2483 km²;
Abundance of Baltic tellin (M. balthica);
Centrifuge tube (17.3–17.7 cm) to a depth of 25 cm;
June–October 2010.
16. Field data - Species Abundance
[Figure: histogram of species abundance; axis labels: Species abundance, Counts]
90% of observations are zeros
max = 100 individuals
µ = 1.39 individuals
var = 24 individuals²
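A quick check of these summary statistics shows why a plain Poisson model does not fit. The short Python sketch below uses only the numbers from the slide; the variable names are illustrative.

```python
import math

mean_abundance = 1.39          # mean from the slide
variance = 24.0                # variance from the slide
observed_zero_fraction = 0.90  # share of zero observations from the slide

# Under a Poisson model with the same mean, P(Y = 0) = exp(-mean).
poisson_zero_prob = math.exp(-mean_abundance)

# A Poisson variable has variance equal to its mean, so this ratio would be 1.
dispersion_index = variance / mean_abundance

print(f"Poisson P(0)    = {poisson_zero_prob:.3f}")
print(f"observed zeros  = {observed_zero_fraction:.2f}")
print(f"variance / mean = {dispersion_index:.1f}")
```

A Poisson model with this mean would predict roughly a quarter of the observations to be zero, far fewer than the 90% observed, and the variance is about 17 times the mean.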
18. Modelling of the spatial distribution
1. Calibrate a zero-inflated Poisson mixture model (assuming independent data);
2. Use the fitted model to classify each zero as either a Bernoulli zero or a Poisson zero;
3. Model the Bernoulli and Poisson variables separately (accounting for spatial dependence).
19. Modelling of the spatial distribution
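1. Zero-inflated Poisson mixture model (Lambert, 1992):
\[ P(y \mid x) = \frac{\exp(-\mu)\,\mu^{y}}{y!} \tag{1} \]
\[ \operatorname{logit}(\psi) = \log\left(\frac{\psi}{1-\psi}\right) = x^{T}\beta \tag{2} \]
\[ P(Y = y) = \begin{cases} \psi + (1-\psi)\exp(-\mu), & y = 0 \\ (1-\psi)\dfrac{\exp(-\mu)\,\mu^{y}}{y!}, & y = 1, 2, 3, \ldots \end{cases} \tag{3} \]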
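The mixture probabilities can be written as a small function. The Python sketch below is a direct transcription of the zero-inflated Poisson probabilities, with ψ the probability of a Bernoulli zero and µ the Poisson mean; the parameter values are hypothetical and chosen only to check that the probabilities sum to one.

```python
import math


def inv_logit(eta):
    """Inverse of logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def zip_pmf(y, psi, mu):
    """Zero-inflated Poisson probability P(Y = y) for integer y >= 0."""
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        return psi + (1.0 - psi) * poisson   # Bernoulli zero or Poisson zero
    return (1.0 - psi) * poisson


psi = inv_logit(1.5)   # hypothetical value of the linear predictor x^T beta
mu = 3.0               # hypothetical Poisson mean
total = sum(zip_pmf(y, psi, mu) for y in range(60))
print(round(total, 12))
```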
20. Modelling of the spatial distribution
2. Bernoulli/Poisson zeros;
Compute the ratio of the probability of a Bernoulli zero to the total probability of a zero:
\[ \frac{\psi}{\psi + (1-\psi)\exp(-\mu)} \tag{1} \]
Randomly allocate each zero to a Bernoulli zero or a Poisson zero.
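A minimal Python sketch of this allocation step follows. It assumes that each zero is labelled a Bernoulli zero with probability equal to the ratio above, and it uses a single ψ and µ for all observations for simplicity (in the fitted model ψ depends on covariates). The function names and data are hypothetical.

```python
import math
import random


def bernoulli_zero_prob(psi, mu):
    """Probability that an observed zero is a Bernoulli zero."""
    return psi / (psi + (1.0 - psi) * math.exp(-mu))


def allocate_zeros(counts, psi, mu, rng):
    """Label zeros as 'bernoulli' or 'poisson'; positive counts as 'count'."""
    p = bernoulli_zero_prob(psi, mu)
    labels = []
    for y in counts:
        if y > 0:
            labels.append("count")
        elif rng.random() < p:
            labels.append("bernoulli")
        else:
            labels.append("poisson")
    return labels


rng = random.Random(42)
counts = [0, 0, 3, 0, 1, 0, 0, 12, 0, 0]   # made-up example data
labels = allocate_zeros(counts, psi=0.8, mu=2.0, rng=rng)
print(labels)
```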
21. Modelling of the spatial distribution
3. The Bernoulli and Poisson variables are modelled separately with a generalized linear geostatistical model (GLGM; Diggle et al., 1998; Christensen, 2004);
A GLGM is a GLM for dependent data (it adds a spatial random effect);
The transformed model parameters, logit(ψ) and log(µ), are modelled as Gaussian random fields:
\[ S_1 = \operatorname{logit}(\psi) = x_1\beta_1 + \varepsilon_1 \tag{1} \]
\[ S_2 = \log(\mu) = x_2\beta_2 + \varepsilon_2 \tag{2} \]
The model parameters are obtained through Markov chain Monte Carlo maximum likelihood (MCML);
MCML is computationally prohibitive for large data sets.
24. Simulation of the pseudo-realities
Simulate signals S (a linear combination of covariates and Gaussian noise) with the GLGM models for the Bernoulli and Poisson variables at the sampling locations (original grid);
Use sequential Gaussian simulation to simulate the signals on a very fine grid (100 m × 100 m) supplemented with validation points;
Combine the simulated fields of Bernoulli indicators and Poisson counts pairwise into pseudo-realities of zero-inflated Poisson counts;
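The combination of the two simulated signals into one pseudo-reality can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: it draws unconditional Gaussian fields by Cholesky factorisation instead of sequential Gaussian simulation, uses a constant mean instead of covariates, and assumes an exponential covariance with made-up parameters. All function names are hypothetical.

```python
import math
import random


def exp_cov(points, sill, range_):
    """Exponential covariance matrix (an assumed covariance model)."""
    return [[sill * math.exp(-math.dist(p, q) / range_) for q in points]
            for p in points]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_field(chol, mean, rng):
    """One realisation of a Gaussian field: mean + L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in chol]
    return [mean + sum(row[k] * z[k] for k in range(i + 1))
            for i, row in enumerate(chol)]


def poisson_draw(mu, rng):
    """Poisson variate by Knuth's multiplication method (adequate for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip_reality(points, rng):
    chol = cholesky(exp_cov(points, sill=1.0, range_=300.0))
    s1 = gaussian_field(chol, mean=1.0, rng=rng)   # S1 = logit(psi)
    s2 = gaussian_field(chol, mean=0.5, rng=rng)   # S2 = log(mu)
    counts = []
    for a, b in zip(s1, s2):
        psi = 1.0 / (1.0 + math.exp(-a))
        bernoulli_zero = rng.random() < psi        # Bernoulli indicator
        counts.append(0 if bernoulli_zero else poisson_draw(math.exp(b), rng))
    return counts


# 6 x 6 grid with 100 m spacing (small, to keep the dense factorisation cheap)
grid = [(100.0 * i, 100.0 * j) for i in range(6) for j in range(6)]
reality = simulate_zip_reality(grid, random.Random(1))
print(reality)
```

A dense Cholesky factorisation scales cubically with the number of grid nodes, which is one reason a sequential method is attractive for a fine grid over a large area.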
25. Simulated data vs Original
Figure: Simulated data, species occurrence
30. Grid spacing and Performance
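Sample each pseudo-reality of zero-inflated Poisson data repeatedly by grid sampling with a given spacing;
Repeat this for all grid spacings considered;
Predict values at the validation points with inverse distance weighting (IDW) interpolation;
Calculate the performance statistics, the mean squared error (MSE) and its mean over all simulations and samples (MMSE):
\[ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y(a_{0,i}) - \hat{Y}(a_{0,i})\right)^{2} \tag{3} \]
\[ \mathrm{MMSE} = \frac{1}{R \cdot S}\sum_{i=1}^{R}\sum_{j=1}^{S}\mathrm{MSE}_{ji} \tag{4} \]
where a_{0,i} is the i-th validation point, N is the number of validation points, R the number of simulations and S the number of samples per simulation.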
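The prediction and scoring steps can be illustrated with a small Python sketch. The IDW power of 2 is an assumption (the slides do not state the exponent), and the function names and the tiny data set are made up for the example.

```python
import math
from statistics import mean


def idw_predict(obs, target, power=2.0):
    """Inverse distance weighted prediction at `target` from (x, y, value) obs."""
    num = den = 0.0
    for x, y, value in obs:
        d = math.dist((x, y), target)
        if d == 0.0:
            return value              # target coincides with an observation
        w = d ** -power
        num += w * value
        den += w
    return num / den


def mse(truth, predicted):
    """Mean squared error over the N validation points."""
    return mean((t - p) ** 2 for t, p in zip(truth, predicted))


def mmse(mse_table):
    """Mean of MSE_ji over R simulations (rows) and S samples (columns)."""
    return mean(m for row in mse_table for m in row)


obs = [(0.0, 0.0, 0.0), (2.0, 0.0, 4.0)]     # sampled grid nodes (made up)
validation = [(1.0, 0.0), (2.0, 0.0)]        # validation points (made up)
truth = [1.0, 4.0]                           # simulated values at those points
pred = [idw_predict(obs, a0) for a0 in validation]
print(pred, mse(truth, pred))
print(mmse([[0.5, 1.5], [1.0, 3.0]]))        # R = 2 simulations, S = 2 samples
```

The first validation point lies midway between the two observations, so both receive equal weight; the second coincides with an observation and is predicted exactly.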
36. Conclusions
Sampling designs for zero-inflated spatial count data were evaluated;
A strong monotonic increase of the MMSE with grid spacing is observed;
MSEji varies strongly between simulations and samples, especially for large grid spacings;
so many simulations and samples are needed to estimate the MMSE;
Spatial modelling of zero-inflated spatial data is laborious and computer-intensive.
Is there an easier way: INLA?
37. Thanks!
Acknowledgements:
This work was done in the framework of the WaLTER (Wadden Sea Long-Term
Ecosystem Research) project (WP5)
www.walterproject.nl
38. References I
Bijleveld, A. I., van Gils, J. A., van der Meer, J., Dekinga, A., Kraan, C., van der Veer, H. W., and Piersma, T. (2012). Designing a benthic monitoring programme with multiple conflicting objectives. Methods in Ecology and Evolution, 3(3):526–536.
Brus, D. and de Gruijter, J. (2013). Effects of spatial pattern persistence on the performance of sampling designs for regional trend monitoring analyzed by simulation of space-time fields. Computers & Geosciences, 61:175–183.
Christensen, O. F. (2004). Monte Carlo maximum likelihood in model-based geostatistics. Journal of Computational and Graphical Statistics, 13(3):702–718.
Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998). Model-based geostatistics. Journal of the Royal Statistical Society. Series C (Applied Statistics), 47(3):299–350.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14.