Undergraduate Modeling Workshop - Hierarchical Models for Sparsely Sampled High-Dimensional LiDAR and Forest Variables, Andrew Finley, May 22, 2018
1. Hierarchical models for sparsely sampled
high-dimensional LiDAR and forest variables: An
interior Alaska FIA case study
Andrew Finley (Michigan State University)
Hans-Erik Andersen (Forest Service, Forest Inventory and Analysis)
Sudipto Banerjee (University of California, LA)
Bruce Cook (NASA Goddard Space Flight Center)
Doug Morton (NASA Goddard Space Flight Center)
SAMSI Undergraduate Modelling Workshop
2. Extraordinary opportunities to understand the spatial and temporal
complexity of environmental processes at broad scales.
Unprecedented investment to collect, develop, and distribute data and
tools to further large-scale and long-term science.
For example:
National Ecological Observatory Network (NEON)
designed to detect and enable forecasting of ecological change at
continental scales over time (NSF, $434 million, 30-year project)
USDA Forest Service Forest Inventory and Analysis (FIA)
designed to monitor status and trends in forest land; since 1998,
FIA has measured 510,340 inventory plots across the conterminous US,
covering 5,839,642 trees (now with 2+ repeated measurements)!
National Aeronautics and Space Administration (NASA)
Global Ecosystem Dynamics Investigation LiDAR (GEDI): $95
million, 5-year project
4. Key challenges in spatiotemporal environmental data analysis
Data sets often exhibit:
missingness and misalignment among outcomes
space- and time-varying impact of covariates
complex residual dependence structures
nonstationarity among multiple outcomes across locations
unknown time and perhaps space lags between outcomes and
covariates
Interest in modeling frameworks that:
incorporate many sources of space and time indexed data
accommodate structured residual dependence
propagate uncertainty through to predictions
scale to effectively exploit information in massive data sets
5. Joint NASA and FIA Forest Service initiative
Project goal: Design and implement an operational forest inventory in
Interior AK by extending sparse networks of ground samples with space
and airborne multi-sensor data.
Data products:
1. Complete coverage maps (e.g., 15×15 m resolution) of forest:
From inventory plots:
Above ground biomass (AGB; Mg/ha)
Basal area (BA; m^2/ha)
Density (TPH; trees/ha)
From LiDAR:
Fractional cover (FC; %)
Canopy height (P95; m)
2. Pixel level prediction with uncertainty estimates
3. Biologically consistent relationships among predictions
4. Reporting with uncertainty for user defined areas
Fun read: www.wired.com/2014/12/alaska-laser-survey-3d-map
6. Tanana Inventory Unit (TIU)
Data:
1. LiDAR transects
50,000 km flight lines
25 TB data
∼43 million signals
{FC, P95}
2. Inventory plots
1,461 plots, ∼7 m radius
{AGB, BA, TPH}
3. Complete coverage
% forest canopy (TC)
Forest fire (FIRE)
Figure: map of the TIU with axes Easting (km) and Northing (km), showing forest fire within 20 years.
11. Key inferential challenges
incorporate many sources of spatially indexed data
address misalignment (missingness) among responses
accommodate and leverage residual spatial dependence
propagate parameter uncertainty through to predictions
deliver statistically valid probabilistic prediction of arbitrary areas
maintain observed covariance among multivariate predictions
scale to effectively exploit information in massive data sets
Some anticipated extensions:
incorporate time-indexed observations
model nonstationarity among multiple responses across locations
estimate space- and time-varying impact of covariates
12. Hierarchical Gaussian process models
Say we observe q outcomes at a given location $\ell$ within domain $\mathcal{L}$. A
multivariate spatial regression:
$y_k(\ell) = x_k(\ell)^\top \beta_k + w_k(\ell) + e_k(\ell)$, for $k = 1, 2, \ldots, q$
$y_k(\ell)$ is the kth outcome at generic location $\ell$ (e.g., AGB, BA, TPH, FC, P95)
Mean: $x_k(\ell)$ includes an intercept, TC, and FIRE
Cov: $w(\ell) = (w_1(\ell), w_2(\ell), \ldots, w_q(\ell))^\top \sim MVGP(0, \Gamma_\theta(\cdot, \cdot))$ where
$\Gamma_\theta(\ell, \ell') = \{Cov(w_i(\ell), w_j(\ell'))\}$ for $i, j = 1, 2, \ldots, q$
Error: $e(\ell) = (e_1(\ell), e_2(\ell), \ldots, e_q(\ell))^\top \sim MVN(0, \Psi)$
For the TIU we must accommodate spatial misalignment (i.e., the $y_k$'s are partially
observed at some locations); see, e.g., Gelfand et al. 2004, Finley et al. 2014.
13. Hierarchical Gaussian process models
Multivariate spatial regression model for TIU and similar settings
Annotated model diagram (image not recovered). Diagram labels:
Forest variables: trend, spatial random effect, spatial decay
LiDAR variables: trend, spatial random effect, spatial decay
Landsat, etc.
Non-spatial error variance-covariance
Spatial variance-covariance
18. Spatiotemporal regression models
Start with a simple univariate regression:
$y(\ell) = x(\ell)^\top \beta + w(\ell) + e(\ell)$
Potentially very rich: understand spatially and/or temporally varying
impact of intercept or predictors on outcome
Produce maps for random effects: $\{w(\ell) : \ell \in \mathcal{L}\}$
$\mathcal{L}$ is a spatial domain (e.g., $D \subset \mathbb{R}^d$) or spatiotemporal domain (e.g.,
$D \subset \mathbb{R}^d \times \mathbb{R}^+$)
Model-based predictions: $y(\ell_0) \mid \{y(\ell_1), y(\ell_2), \ldots, y(\ell_n)\}$
19. Gaussian spatiotemporal process
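$\{w(\ell) : \ell \in \mathcal{L}\} \sim GP(0, K_\theta(\cdot, \cdot))$ implies
$w = (w(\ell_1), w(\ell_2), \ldots, w(\ell_n))^\top \sim MVN(0, K_\theta)$
for every finite set of points $\ell_1, \ell_2, \ldots, \ell_n$.
$K_\theta = \{K_\theta(\ell_i, \ell_j)\}$ is a spatial variance-covariance matrix, where
$\theta = \{\sigma, \phi\}$
Stationarity: $K_\theta(\ell, \ell') = K_\theta(\ell - \ell')$. Isotropy:
$K_\theta(\ell, \ell') = K_\theta(\|\ell - \ell'\|)$.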
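The finite-dimensional definition above can be illustrated with a minimal Python sketch that builds $K_\theta$ and draws one realization of $w$. The isotropic exponential form $\sigma^2 \exp(-\phi \|\ell - \ell'\|)$, the parameter values, and the function names `exp_cov` and `draw_gp` are assumptions made for illustration; the slides only state $\theta = \{\sigma, \phi\}$.

```python
import numpy as np

def exp_cov(coords, sigma_sq=1.0, phi=3.0):
    """Isotropic exponential covariance (assumed form): sigma_sq * exp(-phi * distance)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma_sq * np.exp(-phi * dist)

def draw_gp(coords, sigma_sq=1.0, phi=3.0, seed=0):
    """Draw w ~ MVN(0, K_theta) at the given locations using a Cholesky factor."""
    rng = np.random.default_rng(seed)
    K = exp_cov(coords, sigma_sq, phi)
    L = np.linalg.cholesky(K)  # K is positive definite for distinct locations
    return L @ rng.standard_normal(len(coords))

# Hypothetical locations: 50 points scattered over the unit square.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 1.0, size=(50, 2))
K = exp_cov(coords)
w = draw_gp(coords)
```

Because the covariance here depends on the locations only through their distance, it is both stationary and isotropic in the sense defined above.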
20. Likelihood from (full rank) GP models
Assuming $\{w(\ell) : \ell \in \mathcal{L}\} \sim GP(0, K_\theta(\cdot, \cdot))$, we have
$w = (w(\ell_1), w(\ell_2), \ldots, w(\ell_n))^\top \sim MVN(0, K_\theta)$
Estimating process parameters from the likelihood involves:
$\log p(w) = -\frac{1}{2} \log\det(K_\theta) - \frac{1}{2} w^\top K_\theta^{-1} w + \text{const}$
Bayesian inference: priors on θ and many Markov chain Monte Carlo
(MCMC) iterations
See, e.g., Finley et al. 2015 and Finley et al. 2017 for some coding tips.
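A minimal sketch of how this log-density is commonly evaluated, using a Cholesky factor $K_\theta = LL^\top$ so that no explicit inverse or determinant is formed. The function name `gp_loglik`, the exponential covariance, and the parameter values are assumptions for illustration and are not taken from the cited papers.

```python
import numpy as np

def gp_loglik(w, K):
    """Zero-mean Gaussian log-density of w with covariance K, via K = L L^T."""
    n = len(w)
    L = np.linalg.cholesky(K)
    # np.linalg.solve is a general solver; a triangular solver such as
    # scipy.linalg.solve_triangular would exploit the structure of L.
    z = np.linalg.solve(L, w)                    # z = L^{-1} w, so z @ z = w^T K^{-1} w
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log det(K)
    return -0.5 * log_det - 0.5 * z @ z - 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical data: 30 locations, assumed exponential covariance.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(30, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
K = 2.0 * np.exp(-3.0 * dist)                    # sigma^2 = 2, phi = 3 (assumed values)
w = np.linalg.cholesky(K) @ rng.standard_normal(30)
ll = gp_loglik(w, K)
```

In an MCMC sampler this evaluation is repeated at every proposed value of θ, which is why the cost of the factorization dominates.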
21. Computation issues
Storage: $n^2$ pairwise distances to compute $K_\theta$
$K_\theta$ is dense; need to solve $K_\theta x = b$ and need $\det(K_\theta)$
This is best achieved using $chol(K_\theta) = LDL^\top$
Complexity: roughly $O(n^3)$ flops
Computationally infeasible for large datasets
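To put the $n^2$ storage and $O(n^3)$ flop counts in perspective, here is a rough back-of-the-envelope sketch. It assumes 8-byte doubles, the leading $n^3/3$ term for a Cholesky factorization, and a hypothetical sustained rate of 1 teraflop per second; the value of n is the number of locations reported for the TIU analysis later in this deck. The function name `dense_gp_cost` is made up for this example.

```python
def dense_gp_cost(n, bytes_per_double=8, flops_per_second=1e12):
    """Rough storage and time for one dense Cholesky of an n x n covariance matrix."""
    storage_tb = n * n * bytes_per_double / 1e12   # full matrix, in terabytes
    chol_flops = n ** 3 / 3.0                      # leading-order flop count
    days = chol_flops / flops_per_second / 86400.0 # at the assumed flop rate
    return storage_tb, chol_flops, days

storage_tb, chol_flops, days = dense_gp_cost(5_001_461)
print(f"{storage_tb:.0f} TB, {chol_flops:.2e} flops, {days:.0f} days per factorization")
```

Under these assumptions the matrix alone needs roughly 200 TB, and a single factorization takes on the order of a year, before multiplying by the number of MCMC iterations.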
22. Burgeoning literature on spatial big data
Low-rank models: (Wahba 1990; Higdon 2002; Kammann & Wand 2003;
Paciorek 2007; Rasmussen & Williams 2006; Stein 2007, 2008; Cressie &
Johannesson 2008; Banerjee et al. 2008, 2010; Gramacy & Lee 2008;
Finley et al. 2009; Sang et al. 2011, 2012; Lemos et al. 2011; Guhaniyogi
et al. 2011, 2013; Salazar et al. 2013; Katzfuss 2016)
Spectral approximations and composite likelihoods: (Fuentes 2007;
Paciorek 2007; Eidsvik et al. 2016)
Multi-resolution approaches: (Nychka, 2014; Johannesson et al. 2007;
Matsuo et al. 2010; Tzeng & Huang 2015; Katzfuss 2016)
Sparsity: (Solve $Ax = b$ by (i) sparse $A$, or (ii) sparse $A^{-1}$)
1. Covariance tapering (Furrer et al. 2006; Du et al. 2009; Kaufman et
al. 2009; Shaby and Ruppert 2013)
2. GMRFs to GPs: INLA (Rue et al. 2009; Lindgren et al. 2011)
3. LAGP (Gramacy et al. 2014; Gramacy & Apley 2015)
4. Nearest-neighbor Gaussian Process (NNGP) models (Datta et al.
2015, 2016; Finley et al. 2017)
23. Reduced (Low) rank models
$K_\theta \approx B_\theta K^*_\theta B_\theta^\top + D_\theta$
$B_\theta$ is an $n \times r$ matrix of spatial basis functions, $r \ll n$
$K^*_\theta$ is an $r \times r$ spatial covariance matrix
$D_\theta$ is either diagonal or sparse
Examples: Kernel projections, Splines, Predictive process, FRK,
spectral basis, ...
Computations exploit the above structure: roughly $O(nr^2) \ll O(n^3)$ flops
24. Low-rank models: hierarchical approach
$N(w^* \mid 0, K^*_\theta) \times N(w \mid B_\theta w^*, D)$
$w$ is $n \times 1$ and n is large
$w^*$ is $r \times 1$, where $r \ll n$; so $K^*_\theta$ is $r \times r$
$B_\theta$ is an $n \times r$ matrix of “basis” functions
$D$ is $n \times n$, but easy to invert (e.g., diagonal)
Derive $var(w)$ (or $var(w^* \mid y)$) in alternate ways to obtain
$(B_\theta K^*_\theta B_\theta^\top + D)^{-1} = D^{-1} - D^{-1} B_\theta (K^{*-1}_\theta + B_\theta^\top D^{-1} B_\theta)^{-1} B_\theta^\top D^{-1}$.
This is the famous Sherman-Woodbury-Morrison formula.
Modeling: specifying $w^*$ and $B_\theta$.
See Finley et al. 2015 for implementation details in spBayes R package
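The identity above can be checked numerically. The following is a small sketch with randomly generated stand-ins for $B_\theta$, $K^*_\theta$, and a diagonal $D$; the sizes, the random inputs, and the function name `woodbury_inverse` are assumptions for illustration and have no connection to the spBayes implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 10

B = rng.standard_normal((n, r))        # hypothetical n x r basis matrix
A = rng.standard_normal((r, r))
K_star = A @ A.T + r * np.eye(r)       # r x r positive definite stand-in for K*
d = rng.uniform(0.5, 2.0, size=n)      # diagonal entries of D

def woodbury_inverse(B, K_star, d):
    """(B K* B^T + D)^{-1} using only an r x r solve and the diagonal of D."""
    D_inv_B = B / d[:, None]                          # D^{-1} B
    core = np.linalg.inv(K_star) + B.T @ D_inv_B      # K*^{-1} + B^T D^{-1} B  (r x r)
    return np.diag(1.0 / d) - D_inv_B @ np.linalg.solve(core, D_inv_B.T)

direct = np.linalg.inv(B @ K_star @ B.T + np.diag(d))  # O(n^3) reference
fast = woodbury_inverse(B, K_star, d)
max_abs_diff = np.max(np.abs(direct - fast))
```

The sketch forms the full n × n inverse only so the two sides can be compared; in practice one would apply the right-hand side to a vector, which keeps the cost at roughly $O(nr^2)$.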
25. Oversmoothing due to reduced-rank models
(a) True w (b) Full GP (c) PPGP 64 knots
Figure: Comparing full GP vs low-rank GP with 2500 locations. Figure (c)
exhibits oversmoothing by a low-rank process (predictive process with 64 knots)
See Stein 2014 for good reasons not to use reduced-rank spatial models
27. Simple method of introducing sparsity (e.g. graphical models)
$p(w) = N(w \mid 0, K_\theta)$
$= p(w_1)\, p(w_2 \mid w_1)$
$\times\, p(w_3 \mid w_1, w_2)$
$\times\, p(w_4 \mid \cancel{w_1}, w_2, w_3)$
$\times\, p(w_5 \mid \cancel{w_1}, \cancel{w_2}, w_3, w_4)$
$\times\, p(w_6 \mid \cancel{w_1}, \cancel{w_2}, \cancel{w_3}, w_4, w_5)$
$\times\, p(w_7 \mid w_1, \cancel{w_2}, \cancel{w_3}, w_4, \cancel{w_5}, w_6)$.
We need to solve n − 1 linear systems of size at most m × m, where m is
the number of neighbors in the conditional set.
28. Sparse likelihood approximations (Vecchia, 1988; Stein et al., 2004)
With $w(\ell) \sim GP(0, K_\theta(\cdot, \cdot))$, write the joint density $p(w)$ as:
$N(w \mid 0, K_\theta) = \prod_{i=1}^{n} p(w(\ell_i) \mid w_{H(\ell_i)}) \approx \prod_{i=1}^{n} p(w(\ell_i) \mid w_{N(\ell_i)}) = N(w \mid 0, \tilde{K}_\theta)$,
where $H(\ell_i) = \{\ell_1, \ldots, \ell_{i-1}\}$ is the history of $\ell_i$ and $N(\ell_i) \subseteq H(\ell_i)$.
Shrinkage: Choose $N(\ell_i)$ as the set of “m nearest-neighbors” among
$H(\ell_i)$. Theory: “Screening” effect of kriging.
$\tilde{K}_\theta^{-1}$ depends on $K_\theta$, but is sparser with at most $nm^2$ non-zero
entries
Extension to a GP (Datta et al., JASA, 2016) called the Nearest
Neighbor Gaussian Process (NNGP)
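The nearest-neighbor factorization above can be sketched as a sum of univariate conditional Gaussian log-densities. This is a minimal illustration, not the spNNGP implementation: the exponential covariance, its parameter values, the ordering of locations by their first coordinate, and the names `exp_cov` and `vecchia_loglik` are all assumptions.

```python
import numpy as np

def exp_cov(a, b, sigma_sq=1.0, phi=3.0):
    """Assumed exponential covariance between two sets of locations."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma_sq * np.exp(-phi * dist)

def vecchia_loglik(w, coords, m):
    """Sum over i of log p(w_i | w_N(i)), N(i) = up to m nearest earlier locations."""
    total = 0.0
    for i in range(len(w)):
        mean = 0.0
        var = exp_cov(coords[i:i + 1], coords[i:i + 1])[0, 0]
        if i > 0:
            d = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nbrs = np.argsort(d)[:m]                  # neighbor set within the history
            K_nn = exp_cov(coords[nbrs], coords[nbrs])
            k_in = exp_cov(coords[i:i + 1], coords[nbrs])[0]
            b = np.linalg.solve(K_nn, k_in)           # linear system of size at most m x m
            mean = b @ w[nbrs]                        # conditional mean
            var = var - k_in @ b                      # conditional variance
        total += -0.5 * (np.log(2.0 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return total

# Hypothetical data: 60 locations, ordered by their first coordinate.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(60, 2))
coords = coords[np.argsort(coords[:, 0])]
w = np.linalg.cholesky(exp_cov(coords, coords)) @ rng.standard_normal(60)

approx = vecchia_loglik(w, coords, m=10)            # sparse approximation
exact = vecchia_loglik(w, coords, m=len(w) - 1)     # full history for every location
```

With m = n − 1 every conditional uses its full history, so the sum reproduces the exact log-density; smaller m trades some accuracy for n − 1 solves of size at most m × m.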
29. Figure panels: (a) True w (b) Full GP (c) PPGP 64 knots (d) NNGP, m = 10 (e) NNGP, m = 20
30. Figure: Choice of m in NNGP models: Out-of-sample Root Mean Squared
Prediction Error (RMSPE) and mean width between the upper and lower 95%
posterior predictive credible intervals for a range of m (1 to 25) for the univariate
synthetic data analysis. Axes: m (horizontal); RMSPE and mean 95% CI width (vertical). Series: NNGP and full GP.
31. Figure: Wall time required for one MCMC iteration by number of locations n
and m=10 nearest neighbors (both axes are on the log scale).
32. Concluding remarks: Storage and computation
Algorithms: Gibbs, RWM, HMC, VB, INLA; NNGP/HMC especially
promising
Model-based solution for spatial “BIG DATA”
Never needs to store the n × n distance matrix; it stores n small m × m
matrices, where m is the number of nearest neighbors considered
and $m \ll n$, e.g., m ≈ 15.
Total flop count per iteration is $O(nm^3)$, i.e., linear in n
Scalable to massive datasets because m is small; you never need
more than a few neighbors.
Compare with reduced-rank models: $O(nm^3) \ll O(nr^2)$.
New R package spNNGP (on CRAN:
https://cran.r-project.org/web/packages/spNNGP)
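The linear-in-n scaling can be made concrete with a short arithmetic sketch. It assumes 8-byte doubles and drops all constants from the $O(nm^3)$ count, so the numbers are orders of magnitude only; the function name `nngp_cost` is made up, and n and m are the values quoted for the TIU analysis in this deck.

```python
def nngp_cost(n, m, bytes_per_double=8):
    """Rough storage and per-iteration flops for n dense m x m neighbor blocks."""
    storage_gb = n * m * m * bytes_per_double / 1e9  # n small m x m matrices
    flops = n * m ** 3                               # order-of-magnitude count, constants dropped
    return storage_gb, flops

storage_gb, flops = nngp_cost(5_001_461, 15)
doubled_flops = nngp_cost(2 * 5_001_461, 15)[1]      # doubling n doubles the work
print(f"{storage_gb:.1f} GB, {flops:.2e} flops per iteration")
```

Under these assumptions the neighbor blocks fit in about 9 GB, and the work grows in direct proportion to n.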
33. Tanana Valley initial run results
Initial analysis fit the multivariate spatial NNGP model (with
misalignment between inventory plots and LiDAR outcomes)
Model fit and prediction algorithms written in C with heavy use of
OpenMP for parallelization.
Outcome vector included: AGB, BA, TPH, FC, and P95
AGB, BA, and TPH measured on 1,461 forest inventory plots
FC and P95 measured on 5 million LiDAR pixels
We considered m=15 neighbors for NNGPs
Posterior inference was based on 25k post burn-in MCMC samples
Full GP covariance matrix $K_\theta$ would be 5,001,461×5,001,461!
NNGP run time was ∼12 hours (Intel 18 core machine).
Prediction for TIU takes ∼5 days to deliver pixel level posterior distributions.
37. Prototype for FIA/NASA TIU data products user interface
http://www.globalfiredata.org/temp/tanana.html
39. Thank you!
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111:800-812.
Datta, A., S. Banerjee, A.O. Finley, N.A.S. Hamm, and M. Schaap. (2016) Non-separable Dynamic Nearest Neighbor Gaussian Process Models for Large Spatio-temporal Data with Application to Particulate Matter Analysis. Annals of Applied Statistics, 10:1286-1316.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) On Nearest-Neighbor Gaussian Process Models for Massive Spatial Data. WIREs Computational Statistics, 8:162-171.
Finley, A.O., S. Banerjee, Y. Zhou, and B.D. Cook. (2017) Joint hierarchical models for sparsely sampled high-dimensional LiDAR and forest variables. Remote Sensing of Environment, 190:149-161.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2017) Applying Nearest Neighbor Gaussian Processes to massive spatial data sets: Forest canopy height prediction across Tanana Valley Alaska. https://arxiv.org/abs/1702.00434
Finley, A.O., S. Banerjee, and A.E. Gelfand. (2015) spBayes for large univariate and multivariate point-referenced spatio-temporal data models. Journal of Statistical Software, 63:1-28.
Heaton, M.J., A. Datta, A.O. Finley, R. Furrer, R. Guhaniyogi, F. Gerber, R.B. Gramacy, D. Hammerling, M. Katzfuss, F. Lindgren, D.W. Nychka, F. Sun, and A. Zammit-Mangion. (2017) Methods for analyzing large spatial data: A review and comparison. https://arxiv.org/abs/1710.05013
Other references provided upon request.
40. Concluding remarks: Comparisons
Are low-rank spatial models well and truly beaten?
Certainly do not seem to scale as nicely as NNGP
Have somewhat greater theoretical tractability (e.g. Bayesian
asymptotics)
Can be used to flexibly model smoothness
Can be constructed for other processes—e.g., Spatial Dirichlet
Predictive Process
Compare with scalable multi-resolution frameworks (Katzfuss, 2016)
Highly scalable meta-kriging frameworks (Guhaniyogi, 2016)
Future work: High-dimensional multivariate spatial-temporal variable
selection