CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and Surfaces - Douglas Nychka, Oct 3, 2017

Estimating curves and surfaces
Lecture 1.2
Douglas Nychka
National Center for Atmospheric Research
National Science Foundation April, 2017

Outline for Lecture 1.2
D. Nychka Spatial statistics 1
• Basis functions and least squares
• Least squares smoothers adding a penalty
• Q and the smoothing parameter, λ
• Choosing the basis
• Cross-validation for λ
• fields
• The Bayesian connection.

Estimating a curve or surface.
An additive statistical model:
Given n pairs of observations (xi, yi), i = 1, . . . , n
yi = g(xi) + i
i’s are random errors and g is an unknown, smooth func-
tion.
Goals
• Estimate g based
on the observations
• Quantify the uncertainty in the estimate.

Some data examples

Air quality
Predict surface ozone where it is not monitored.
Ambient daily ozone
in PPB June 16,
1987, US Midwestern
Region.
−92 −90 −88 −86 −84
38404244
0
50
100
150

Supercomputer operations
Predict power needed for cooling.
Power for mechanical systems based on temperature and humidity.
NCAR/Wy Supercomputing Center, Yellowstone – 72K cores, 15 Pb

Human CO2 emissions
Relationship of economics and CO2 emissions.
CO2 per capita by country for 1999 ( Source: World Bank )
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
100 500 2000 10000
5e+025e+035e+045e+055e+06
GDP per capita
CO2Emmissionspercapita
US−
China−
−Russia

Basis functions and least squares.

Representing a curve
Start with your favorite m basis functions {b1(x), b2(x), . . . , bm(x)}
The estimate has the form
ˆg(x) =
m
k=1
βkbk(x)
where β = (β1, . . . , βm) are the coefficients.
The basis functions are fixed and so the problem is to just
find the coefficients.
Many spatial statistics problems have this general form or can be ap-
proximated by it.

Example of basis functions
−3 −2 −1 0 1 2 3
0.00.40.8
xp
bump(xp)
• Build a basis by translating and scaling a bump shaped curve
• Not your usual sine/cosine or polynomials!
• Bsplines not required!

Two Bases
10 Functions:
0.0 0.2 0.4 0.6 0.8 1.0
0.00.40.8
20 Functions:
0.0 0.2 0.4 0.6 0.8 1.0
0.00.40.8

Least squares.
Xi,j = bk(xi)
(ˆg(x1), ˆg(x2), . . . , ˆg(xn))T = Xβ and y = Xβ +
minimize over β:
min
β
n
i=1
(y − [Xβ]i)2
Solution:
ˆβ = (XT X)−1XT y
Curve/surface estimate:
ˆg(x) =
m
k=1
ˆβkbk(x)

Some synthetic data: Yk = g(xk) + ek
g(x) = 9x(1 − x)3
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q q
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
X
Y
xk are 150 unequally spaced points in [0, 1]
h(x) = 9x(1 − x)3 , ek ∼ N(0, (.1)2)

Varying basis size
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
basis functions 4
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
basis functions 8
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
basis functions 16
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
basis functions 32
Problem: How to choose basis? How to choose number of basis
functions? Uncertainty of estimate?

Penalized least squares

Adding a constraint on parameters
g(x) = m
k=1 βkbk(x)
minimize over β:
Sum of squares(β) + penalty on β
min
β
n
i=1
(y − [Xβ]i)2 + λβT Qβ
Fit to the data + penalty for complexity/smoothness
• λ > 0 a smoothing parameter
• Q a nonnegative deﬁnite matrix.

Solution to penalized problem
ˆβ = (XT X + λQ)−1XT y
Also known as ridge regression
The Hat matrix and eﬀective degrees of freedom
For ordinary least squares
tr(X(XT X)−1XT ) = number of parameters
i.e. the column rank of X
For the penalized estimate
ˆg = X(XT X + λQ)−1XT y
tr(X(XT X + λQ)−1XT ) = eﬀective number of parameters

More on Q
ˆβ = (XT X + λQ)−1XT y
• Q is an identity βT Qβ = (βl)2
• Choose Q so that βT Qβ = (βl − βl−1)2
• Cubic smoothing spline type: βT Qβ = (g (x))2dx
i.e. Qkl = bk(x)bl (x)dx
• Spatial model: Q−1
kl = exp(−|xk − xl|/θ)
• Lasso/Wavelet:
Replace quadratic form by L1 type criteria: βT Qβ → |βl|

The A matrix
Smoothers in general
If we just predict at the observation locations we have
ˆg =





ˆg(x1)
ˆg(x2)
...
ˆg(xn)





= X(XT X + λQ)−1XT y = A(λ)y (1)
• A(λ) is the smoother matrix
• rows of A are the weights on y to predict g.
• The penalized least squares estimator deﬁnes the rows of A. This is
diﬀerent than a running average or kernel estimator.
• If X = Q−1 then A(λ) = (I + λQ)−1y
( So simple – is this even possible?)

λ and eﬀ. degrees of freedom
eﬀective degrees of freedom = tr(A(λ))
1e−08 1e−05 1e−02 1e+01 1e+04
01020304050
lambda
EffectiveDF
Basis: Constant, linear, and 50 bump-shaped functions.
Second derivative type penalty.

Varying smoothing parameter
Cubic spline-type choice with 50 basis functions
Eﬀective number of parameters (basis functions) depends on λ.
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
q
q
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.40.60.81.0
lambda 0.01 DF 3.6
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
q
q
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.40.60.81.0
lambda 1e−04 DF 12.2
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
q
q
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.40.60.81.0
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
q
q
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.40.60.81.0
Problem: How to choose basis? How to choose λ? Uncertainty of
estimate?

Choosing the basis.

Basis selection
Reformulate the estimate in terms of the function, not the basis.
ming
n
i=1
(yi − g(xi))2 + λ g (x)2dx
Solution is a cubic smoothing spline.
Of course in practice we still need to represent g as
g(x) =
m
k=1
βkbk(x)
but this is a computational detail – not part of our model for g.

Splines
• Usually based on derivative penalities
• Basis has closed form and solves the minimization
• Cubic smoothing splines – basis is piecewise cubic functions with the
join points at the observations.
• Three diﬀerent but equivalent cubic spline bases.
1, x, (x − xi)3
+
1, x, |x − xi|3
1, x, H(x, xi)
with H(u, v) =
u2v/2 − u3/3 for u < v
uv2/2 − v3/3 u > v

Thin plate spline:
min
g
n
i=1
(yi − g(xi)2 + λJ(g)
J(g) integral of squared partial derivatives of g.
E.g. in two dimension (d = 2), second order (m = 2)
J(g) = 2 ( ∂2
∂2x1
g)2 + 2( ∂2
∂x1∂x2
g)2 + ( ∂2
∂2x2
g)2 dx1dx2
Incredibly this minimization over functions has a simple and
computable solution!
Basis for two dimensions, m = 2
1, x1, x2, ||x − ui||2log(||x − ui||) where ui are the data locations.

Estimating λ from data

Cross-validation – Wahba (1970)
• Leave each observation out. Predict it with the remaining data for a
ﬁxed lambda.
• Find λ that the minimizes mean squared error.
• Initially proposed for cubic smoothing splines but works for any smoother.
CV: obvious leave-one-out MSE
GCV: Adjust each CV squared residual for prediction error.

Finding the smoothing parameter
Cross-validation functions Cubic spline at minimum
0 50 100 150
0.010.020.04
Effective Degrees of Freedom
Meansquarederror
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
q
q
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0

Machine learning and λ
Divide data up into 90/10 training/validation and do out-sample-CV
Based on 100 random divisions ...
0 50 100 150
0.0050.0200.050
Effective Degrees of Freedom
Meansquarederror
||| | || ||| || ||||| | | | ||| ||| |||| | | ||| | || ||| ||| ||||| | |||||| | ||||| || | ||| | | ||||| | |||| || ||| ||||||||| || || || |
90/10 CV curves, minima |, median - - - , GCV —

In R

Tps -Thin plate spline function
# x and y are the synthetic example
library( fields)
obj<- Tps( x,y)

In R
> obj
Call:
Tps(x = x, Y = y)
Number of Observations: 150
Number of parameters in the null space 2
Parameters for fixed spatial drift 2
Model degrees of freedom: 11.2
Residual degrees of freedom: 138.8
GCV estimate for sigma: 0.08037
MLE for sigma: 0.08091
MLE for rho: 31.1
lambda 0.00021
User supplied rho NA
User supplied sigma^2 NA
Summary of estimates:
lambda trA GCV shat -lnLike Prof converge
GCV 0.0002105075 11.17840 0.006979633 0.08037097 -147.2260 1
REML 0.0001397338 12.25141 0.006998224 0.08016631 -147.5925 8
>

set.panel( 2,2)
plot( obj)
q
q q
qqq
q q
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
q
qq
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8
−0.20.20.61.0
predicted values
Y
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
0.0 0.2 0.4 0.6 0.8−1012
predicted values
(STD)residuals
q
q
q
q
q
q
q
q
q
q
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
q
q
q
0 20 40 60 80 100 140
0.010.020.030.040.05
Eff. number of parameters
GCVfunction
−150−50050
logprofilelikelihood
GCV−points, solid−model, dots− single
REML dashed
Histogram of std.residuals
std.residuals
−2 −1 0 1 2
051015202530

xgrid<- seq( 0,1, length.out=200)
fhat<- predict( obj, xgrid)
q
qq
qqq
qq
q
q
q
q
q
q
qq
qq
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
qq
q
q
qq
q
q
qq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
qqq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q
q
q
q
q
q
q
q
qqq
qqq
q
q
qq
q
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
X
Y

The Bayesian Connection

Reverse engineering our estimator
A statistical refactoring
minβ
n
i=1(y − [Xβ]i)2 + λβT Qβ ,
Divide up λ = σ2/ρ and switch to ”max”
maxβ − n
i=1
(y−[Xβ]i)2
2σ2 − βT
Qβ
2ρ ,
exponentiate
maxβ e
− n
i=1
(y−[Xβ]i)2
2σ2 e
−
βT
Qβ
2ρ

What have we built?
[y|β, σ, ρ, K] [β|ρ, σ, K] [ρ, σ, K]
• y given the function and parameters is Gaussian.
• Coeﬃcients, β, given the parameters is Gaussian (0, ρK)
• Fixing parameters y is Gaussian (0, σ2I + ρK)
Penalized least squares estimator ≡ a Gaussian statistical model for the
data and coeﬃcients.
Minimizing the penalized LS ≡ posterior maximum or a conditional ex-
pectation from Gaussian.

Thank you!

CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and Surfaces - Douglas Nychka, Oct 3, 2017

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (18)

Similar to CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and Surfaces - Douglas Nychka, Oct 3, 2017

Similar to CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and Surfaces - Douglas Nychka, Oct 3, 2017 (20)

More from The Statistical and Applied Mathematical Sciences Institute

More from The Statistical and Applied Mathematical Sciences Institute (20)

Recently uploaded

Recently uploaded (20)

CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and Surfaces - Douglas Nychka, Oct 3, 2017