CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and Surfaces - Douglas Nychka, Oct 3, 2017
1. Estimating curves and surfaces
Lecture 1.2
Douglas Nychka
National Center for Atmospheric Research
National Science Foundation April, 2017
2. Outline for Lecture 1.2
D. Nychka Spatial statistics 1
• Basis functions and least squares
• Least squares smoothers adding a penalty
• Q and the smoothing parameter, λ
• Choosing the basis
• Cross-validation for λ
• fields
• The Bayesian connection.
3. Estimating a curve or surface.
An additive statistical model:
Given n pairs of observations $(x_i, y_i)$, $i = 1, \dots, n$:
$$y_i = g(x_i) + \epsilon_i$$
The $\epsilon_i$ are random errors and g is an unknown, smooth function.
Goals
• Estimate g based on the observations
• Quantify the uncertainty in the estimate.
5. Air quality
Predict surface ozone where it is not monitored.
Ambient daily ozone in PPB, June 16, 1987, US Midwestern Region.
[Figure: map of ozone values; horizontal axis −92 to −84, vertical axis 38 to 44, scale 0 to 150.]
6. Supercomputer operations
Predict power needed for cooling.
Power for mechanical systems based on temperature and humidity.
NCAR/Wy Supercomputing Center, Yellowstone – 72K cores, 15 Pb
7. Human CO2 emissions
Relationship of economics and CO2 emissions.
CO2 per capita by country for 1999 (Source: World Bank)
[Figure: scatterplot of CO2 emissions per capita against GDP per capita, with the US, China, and Russia labeled.]
9. Representing a curve
Start with your favorite m basis functions $\{b_1(x), b_2(x), \dots, b_m(x)\}$
The estimate has the form
$$\hat{g}(x) = \sum_{k=1}^{m} \beta_k b_k(x)$$
where $\beta = (\beta_1, \dots, \beta_m)$ are the coefficients.
The basis functions are fixed and so the problem is to just
find the coefficients.
Many spatial statistics problems have this general form or can be approximated by it.
10. Example of basis functions
[Figure: a bump-shaped curve, bump(xp), plotted for xp from −3 to 3; vertical axis 0.0 to 0.8.]
• Build a basis by translating and scaling a bump shaped curve
• Not your usual sine/cosine or polynomials!
• Bsplines not required!
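The construction above can be sketched in a few lines of base R. This is a minimal illustration, not the lecture's code: the Gaussian shape of `bump`, the helper `make_basis`, the number of centers, the scale, and the synthetic data are all assumptions chosen for the example.

```r
# Hypothetical bump: a Gaussian-shaped curve (the slides do not give the formula).
bump <- function(u) exp(-u^2 / 2)

# Design matrix X with X[i, k] = b_k(x_i) = bump((x_i - center_k) / scale).
make_basis <- function(x, centers, scale) {
  outer(x, centers, function(xi, ck) bump((xi - ck) / scale))
}

set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)   # synthetic data, for illustration only

centers <- seq(0, 1, length.out = 8)        # m = 8 translated bumps
X <- make_basis(x, centers, scale = 0.15)

# Ordinary least squares for the coefficients beta.
beta_hat <- qr.solve(X, y)
g_hat <- drop(X %*% beta_hat)               # fitted curve at the observations
```

Because the basis functions are fixed, the whole fit reduces to a linear least squares problem in `beta_hat`.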
16. Adding a constraint on parameters
$$g(x) = \sum_{k=1}^{m} \beta_k b_k(x)$$
minimize over β:
Sum of squares(β) + penalty on β
$$\min_{\beta} \sum_{i=1}^{n} (y_i - [X\beta]_i)^2 + \lambda \beta^T Q \beta$$
Fit to the data + penalty for complexity/smoothness
• λ > 0 a smoothing parameter
• Q a nonnegative definite matrix.
17. Solution to penalized problem
$$\hat{\beta} = (X^T X + \lambda Q)^{-1} X^T y$$
Also known as ridge regression (standard ridge regression is the case Q = I)
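The closed form follows from setting the gradient of the penalized criterion to zero. A short derivation, assuming Q is symmetric and $X^T X + \lambda Q$ is invertible:

```latex
\begin{aligned}
S(\beta) &= (y - X\beta)^T (y - X\beta) + \lambda\, \beta^T Q \beta \\
\nabla_\beta S(\beta) &= -2 X^T (y - X\beta) + 2 \lambda Q \beta = 0 \\
(X^T X + \lambda Q)\, \hat{\beta} &= X^T y \\
\hat{\beta} &= (X^T X + \lambda Q)^{-1} X^T y
\end{aligned}
```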
The Hat matrix and effective degrees of freedom
For ordinary least squares
$\mathrm{tr}(X (X^T X)^{-1} X^T)$ = number of parameters
i.e. the column rank of X
For the penalized estimate
$$\hat{g} = X (X^T X + \lambda Q)^{-1} X^T y$$
$\mathrm{tr}(X (X^T X + \lambda Q)^{-1} X^T)$ = effective number of parameters
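The two traces can be compared numerically. The sketch below is only an illustration under stated assumptions: Gaussian bumps for the basis, an identity Q, λ = 0.5, and synthetic data are all choices made for the example.

```r
set.seed(2)
n <- 60
m <- 10
x <- sort(runif(n))
centers <- seq(0, 1, length.out = m)
# Assumed basis: Gaussian bumps of width 0.1 centered on a regular grid.
X <- outer(x, centers, function(xi, ck) exp(-((xi - ck) / 0.1)^2 / 2))
y <- cos(3 * x) + rnorm(n, sd = 0.1)         # synthetic data

Q <- diag(m)       # assumed penalty: identity (ridge)
lambda <- 0.5      # assumed smoothing parameter

# Penalized coefficients: (X'X + lambda Q)^{-1} X'y
beta_hat <- solve(crossprod(X) + lambda * Q, crossprod(X, y))

# Hat matrices for the ordinary and the penalized least squares fits.
H_ols <- X %*% solve(crossprod(X), t(X))
A_pen <- X %*% solve(crossprod(X) + lambda * Q, t(X))

df_ols <- sum(diag(H_ols))   # equals m when X has full column rank
df_pen <- sum(diag(A_pen))   # between 0 and m for lambda > 0
```

The penalty shrinks the fit, so `df_pen` is smaller than the parameter count `m` even though all `m` coefficients are still estimated.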
18. More on Q
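$$\hat{\beta} = (X^T X + \lambda Q)^{-1} X^T y$$
• Q is an identity: $\beta^T Q \beta = \sum_l \beta_l^2$
• Choose Q so that $\beta^T Q \beta = \sum_l (\beta_l - \beta_{l-1})^2$
• Cubic smoothing spline type: $\beta^T Q \beta = \int (g''(x))^2 \, dx$
i.e. $Q_{kl} = \int b_k''(x) \, b_l''(x) \, dx$
• Spatial model: $Q^{-1}_{kl} = \exp(-|x_k - x_l| / \theta)$
• Lasso/Wavelet:
Replace quadratic form by L1 type criteria: $\beta^T Q \beta \to \sum_l |\beta_l|$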
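Three of the quadratic penalties listed above are easy to build explicitly. This sketch uses base R only; the coefficient vector, the locations `xk`, and the range parameter `theta` are made-up values for illustration.

```r
m <- 6
beta <- c(1, 3, 2, 5, 4, 6)       # arbitrary coefficients for the check below

# Identity penalty: sum of squared coefficients.
Q_ridge <- diag(m)

# First-difference penalty: Q = D'D, where D takes successive differences.
D <- diff(diag(m))                # (m - 1) x m difference matrix
Q_diff <- crossprod(D)

# Spatial model: the inverse of Q is an exponential covariance matrix.
xk <- seq(0, 1, length.out = m)   # assumed locations attached to the coefficients
theta <- 0.3                      # assumed range parameter
K <- exp(-abs(outer(xk, xk, "-")) / theta)
Q_spatial <- solve(K)

# Quadratic form beta' Q beta.
quad <- function(Q, b) drop(t(b) %*% Q %*% b)
quad(Q_ridge, beta)   # same value as sum(beta^2)
quad(Q_diff, beta)    # same value as sum(diff(beta)^2)
```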
19. The A matrix
Smoothers in general
If we just predict at the observation locations we have
$$\hat{g} = \begin{pmatrix} \hat{g}(x_1) \\ \hat{g}(x_2) \\ \vdots \\ \hat{g}(x_n) \end{pmatrix} = X (X^T X + \lambda Q)^{-1} X^T y = A(\lambda)\, y \qquad (1)$$
• A(λ) is the smoother matrix
• rows of A are the weights on y to predict g.
• The penalized least squares estimator defines the rows of A. This is
different than a running average or kernel estimator.
• If $X = Q^{-1}$ then $A(\lambda) = (I + \lambda Q)^{-1}$
(So simple: is this even possible?)
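To make the "rows of A are weights" point concrete, the sketch below forms A(λ) for a small example and reads off one row. The Gaussian bumps, the choice to penalize only the bump coefficients, and the value of λ are assumptions for illustration.

```r
set.seed(3)
n <- 40
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)   # synthetic data

# Basis: constant, linear, and 10 bump functions (Gaussian bumps are an assumption).
centers <- seq(0, 1, length.out = 10)
B <- outer(x, centers, function(xi, ck) exp(-((xi - ck) / 0.1)^2 / 2))
X <- cbind(1, x, B)

# Assumed penalty: only the bump coefficients are penalized.
Q <- diag(c(0, 0, rep(1, 10)))
lambda <- 0.1

A <- X %*% solve(crossprod(X) + lambda * Q, t(X))   # smoother matrix A(lambda)
g_hat <- drop(A %*% y)

w20 <- A[20, ]     # weights that the estimate at x[20] places on y[1], ..., y[n]
sum(w20 * y)       # reproduces g_hat[20]
range(rowSums(A))  # each row sums to 1 because the constant term is unpenalized
```

Unlike a running average, these weights are not chosen directly; they are implied by the basis, the penalty, and λ.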
20. λ and eff. degrees of freedom
effective degrees of freedom = tr(A(λ))
[Figure: effective degrees of freedom (0 to 50) plotted against lambda (1e−08 to 1e+04).]
Basis: Constant, linear, and 50 bump-shaped functions.
Second derivative type penalty.
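A curve of this kind can be reproduced in outline. The sketch below is not the code behind the figure: it uses 20 Gaussian bumps instead of 50, an assumed bump width, and a numerically integrated second-derivative penalty on the bump part only, so the constant and linear terms are unpenalized.

```r
n <- 200
x <- seq(0, 1, length.out = n)
m <- 20
centers <- seq(0, 1, length.out = m)
s <- 0.5 * (centers[2] - centers[1])             # assumed bump width

bump  <- function(u) exp(-u^2 / 2)
bump2 <- function(u) (u^2 - 1) * exp(-u^2 / 2)   # second derivative of bump in u

# Basis: constant, linear, and m bumps.
X <- cbind(1, x, outer(x, centers, function(xi, ck) bump((xi - ck) / s)))

# Q_kl = integral of b_k''(x) b_l''(x) dx, approximated by a Riemann sum.
grid <- seq(-0.5, 1.5, length.out = 4001)
dt <- grid[2] - grid[1]
B2 <- outer(grid, centers, function(t, ck) bump2((t - ck) / s) / s^2)
Q <- matrix(0, m + 2, m + 2)
Q[-(1:2), -(1:2)] <- crossprod(B2) * dt

# tr(A(lambda)) = tr((X'X + lambda Q)^{-1} X'X)
XtX <- crossprod(X)
eff_df <- function(lambda) sum(diag(solve(XtX + lambda * Q, XtX)))

lambdas <- 10^seq(-12, 4, by = 2)
eff <- sapply(lambdas, eff_df)
```

In this setup `eff` starts near the number of basis functions for very small λ and falls toward 2, the dimension of the unpenalized constant and linear terms, as λ grows.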
23. Basis selection
Reformulate the estimate in terms of the function, not the basis.
$$\min_g \sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(x)^2 \, dx$$
Solution is a cubic smoothing spline.
Of course in practice we still need to represent g as
$$g(x) = \sum_{k=1}^{m} \beta_k b_k(x)$$
but this is a computational detail – not part of our model for g.
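In R, a cubic smoothing spline can be fit without choosing a basis by hand, using `smooth.spline` from the base stats package. This is a minimal sketch on synthetic data; the target of 8 effective degrees of freedom is an arbitrary choice, and `smooth.spline` scales its λ differently from the criterion written above.

```r
set.seed(4)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)   # synthetic data, for illustration only

# Cubic smoothing spline with about 8 effective degrees of freedom.
fit <- smooth.spline(x, y, df = 8)

fit$df       # effective degrees of freedom, tr(A(lambda))
fit$lambda   # matching smoothing parameter, in smooth.spline's own scaling

# Evaluate the fitted curve on a grid.
x_new <- seq(0, 1, length.out = 200)
g_hat <- predict(fit, x_new)$y
```

The basis used internally never appears in the call, which mirrors the point that the representation is a computational detail.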
24. Splines
• Usually based on derivative penalties
• Basis has closed form and solves the minimization
• Cubic smoothing splines – basis is piecewise cubic functions with the
join points at the observations.
• Three different but equivalent cubic spline bases.
$1,\; x,\; (x - x_i)^3_+$
$1,\; x,\; |x - x_i|^3$
$1,\; x,\; H(x, x_i)$
with
$$H(u, v) = \begin{cases} u^2 v / 2 - u^3 / 6 & \text{for } u \le v \\ u v^2 / 2 - v^3 / 6 & \text{for } u > v \end{cases}$$
25. Thin plate spline
$$\min_g \sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda J(g)$$
J(g) is the integral of squared partial derivatives of g.
E.g. in two dimensions (d = 2), second order (m = 2):
$$J(g) = \int_{\mathbb{R}^2} \left[ \left( \frac{\partial^2 g}{\partial x_1^2} \right)^2 + 2 \left( \frac{\partial^2 g}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial^2 g}{\partial x_2^2} \right)^2 \right] dx_1 \, dx_2$$
Incredibly this minimization over functions has a simple and computable solution!
Basis for two dimensions, m = 2:
$1,\; x_1,\; x_2,\; \|x - u_i\|^2 \log(\|x - u_i\|)$, where the $u_i$ are the data locations.
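The basis above leads to a small linear system. The sketch below is a bare-bones base R illustration, not how `fields` implements `Tps`: the helper names are hypothetical, the data are synthetic, and `lambda` here absorbs a constant factor relative to the λ in the criterion. It solves $(K + \lambda I)c + Td = y$ subject to $T^T c = 0$, the usual form of the thin plate spline equations.

```r
# Radial basis for d = 2, m = 2: r^2 log(r), taken to be 0 at r = 0.
tps_radial <- function(r) ifelse(r > 0, r^2 * log(r), 0)

set.seed(5)
n <- 30
u <- cbind(runif(n), runif(n))           # data locations u_i (synthetic)
y <- sin(3 * u[, 1]) + u[, 2]^2 + rnorm(n, sd = 0.05)

K  <- tps_radial(as.matrix(dist(u)))     # K[i, j] = ||u_i - u_j||^2 log ||u_i - u_j||
Tm <- cbind(1, u)                        # unpenalized part: 1, x1, x2

# Hypothetical helper: solve the block system for the coefficients c and d.
fit_tps <- function(lambda) {
  M <- rbind(cbind(K + lambda * diag(n), Tm),
             cbind(t(Tm), matrix(0, 3, 3)))
  sol <- solve(M, c(y, rep(0, 3)))
  list(c = sol[1:n], d = sol[n + 1:3])
}

# Hypothetical helper: evaluate the fitted surface at the rows of xnew.
predict_tps <- function(fit, xnew) {
  r <- sqrt(outer(xnew[, 1], u[, 1], "-")^2 + outer(xnew[, 2], u[, 2], "-")^2)
  drop(cbind(1, xnew) %*% fit$d + tps_radial(r) %*% fit$c)
}

fit0 <- fit_tps(0)       # lambda = 0 interpolates the data
fit1 <- fit_tps(0.01)    # lambda > 0 smooths
```

With `lambda = 0` the surface passes through every observation; increasing `lambda` pulls it toward a plane in $x_1$ and $x_2$.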
27. Cross-validation – Wahba (1970)
• Leave each observation out. Predict it with the remaining data for a fixed λ.
• Find the λ that minimizes the mean squared prediction error.
• Initially proposed for cubic smoothing splines but works for any smoother.
CV: obvious leave-one-out MSE
GCV: Adjust each CV squared residual for prediction error.
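For a linear smoother of the penalized least squares form, the leave-one-out residual can be computed without refitting, as $(y_i - \hat{g}(x_i)) / (1 - A_{ii})$, and GCV replaces each $A_{ii}$ by the average $\mathrm{tr}(A)/n$. The sketch below evaluates both criteria over a grid of λ; the basis, the identity penalty, the grid, and the data are assumptions for illustration.

```r
set.seed(6)
n <- 80
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)   # synthetic data

centers <- seq(0, 1, length.out = 15)
X <- outer(x, centers, function(xi, ck) exp(-((xi - ck) / 0.07)^2 / 2))
Q <- diag(15)                               # assumed penalty

# CV and GCV criteria for one value of lambda.
cv_scores <- function(lambda) {
  A <- X %*% solve(crossprod(X) + lambda * Q, t(X))
  res <- y - drop(A %*% y)
  c(CV  = mean((res / (1 - diag(A)))^2),
    GCV = mean(res^2) / (1 - sum(diag(A)) / n)^2)
}

lambdas <- 10^seq(-6, 2, length.out = 30)
scores <- t(sapply(lambdas, cv_scores))     # one row per lambda

lambda_cv  <- lambdas[which.min(scores[, "CV"])]
lambda_gcv <- lambdas[which.min(scores[, "GCV"])]
```

The shortcut matters in practice: each λ needs one fit rather than n refits.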
31. Tps: thin plate spline function
# x and y are the synthetic example
library(fields)
obj <- Tps(x, y)
32. In R
> obj
Call:
Tps(x = x, Y = y)
Number of Observations: 150
Number of parameters in the null space 2
Parameters for fixed spatial drift 2
Model degrees of freedom: 11.2
Residual degrees of freedom: 138.8
GCV estimate for sigma: 0.08037
MLE for sigma: 0.08091
MLE for rho: 31.1
lambda 0.00021
User supplied rho NA
User supplied sigma^2 NA
Summary of estimates:
lambda trA GCV shat -lnLike Prof converge
GCV 0.0002105075 11.17840 0.006979633 0.08037097 -147.2260 1
REML 0.0001397338 12.25141 0.006998224 0.08016631 -147.5925 8
>
38. What have we built?
[y|β, σ, ρ, K] [β|ρ, σ, K] [ρ, σ, K]
• y given the function and parameters is Gaussian.
• Coefficients, β, given the parameters is Gaussian (0, ρK)
• With the parameters fixed, y is Gaussian $(0, \sigma^2 I + \rho K)$
Penalized least squares estimator ≡ a Gaussian statistical model for the
data and coefficients.
Minimizing the penalized LS ≡ posterior maximum or a conditional expectation from the Gaussian model.
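The equivalence can be checked in one step. The derivation below is a sketch under the assumption that $y \mid \beta \sim N(X\beta, \sigma^2 I)$ and $\beta \sim N(0, \rho K)$ with K invertible; it identifies $\lambda = \sigma^2 / \rho$ and $Q = K^{-1}$.

```latex
\begin{aligned}
-2 \log\big( [y \mid \beta]\,[\beta] \big)
  &= \frac{1}{\sigma^2} \| y - X\beta \|^2 + \frac{1}{\rho}\, \beta^T K^{-1} \beta + \text{const} \\
\sigma^2 \times \text{(the above)}
  &= \| y - X\beta \|^2 + \frac{\sigma^2}{\rho}\, \beta^T K^{-1} \beta + \text{const} \\
\Rightarrow\quad \lambda &= \frac{\sigma^2}{\rho}, \qquad Q = K^{-1}
\end{aligned}
```

The Tps printout is consistent with this relation: with the MLE values sigma = 0.08091 and rho = 31.1, $0.08091^2 / 31.1 \approx 0.00021$, which matches the reported lambda.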