Some ideas on how low-rank matrices/tensors can be useful in spatial and environmental statistics, where one usually has to deal with very large data sets.
Possible applications of low-rank tensors in statistics and UQ (my talk in Bonn, Germany)
1. Possible applications of low-rank tensors in statistics and UQ
Alexander Litvinenko,
Extreme Computing Research Center and Uncertainty
Quantification Center, KAUST
(joint work with H.G. Matthies, MIT and KAUST)
Center for Uncertainty
Quantification
http://sri-uq.kaust.edu.sa/
2. Problem 1. Predict temperature, velocity, salinity
Grid: 50 million locations on 50 levels, 4 × (X × Y × Z) = 4 × 500 × 500 × 50 = 50 million.
High-resolution time-dependent data for the Red Sea: zonal velocity and temperature.
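3. Problem 1. Apply low-rank tensors for
1. Kriging estimate: $\hat{s} := C_{sy} C_{yy}^{-1} y$
2. Estimation of the variance $\hat{\sigma}$, which is the diagonal of the conditional covariance matrix: $C_{ss|y} = \operatorname{diag}\left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right)$
3. Geostatistical optimal design: $\varphi_A := n^{-1} \operatorname{trace}\{C_{ss|y}\}$, $\varphi_C := c^T \left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right) c$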
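The three quantities above can be sketched on a tiny dense problem. The exponential covariance, the 1-D locations, the synthetic data and the uniform weight vector `c` are all assumptions made for illustration. The slide targets low-rank formats; this sketch uses plain dense linear algebra only to make the formulas concrete.

```python
import numpy as np

def exp_cov(a, b, ell=0.2, sigma2=1.0):
    """Exponential covariance between 1-D point sets a and b (an assumed model)."""
    return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / ell)

rng = np.random.default_rng(0)
x_obs = np.sort(rng.uniform(0.0, 1.0, 30))   # observation locations (hypothetical)
x_new = np.linspace(0.0, 1.0, 50)            # prediction locations (hypothetical)

C_yy = exp_cov(x_obs, x_obs) + 1e-8 * np.eye(x_obs.size)  # small nugget for stability
C_sy = exp_cov(x_new, x_obs)
C_ss = exp_cov(x_new, x_new)

# Synthetic observations drawn from the assumed model.
y = np.linalg.cholesky(C_yy) @ rng.standard_normal(x_obs.size)

# 1. Kriging estimate: s_hat = C_sy C_yy^{-1} y
s_hat = C_sy @ np.linalg.solve(C_yy, y)

# 2. Conditional covariance C_ss - C_sy C_yy^{-1} C_ys and its diagonal
C_cond = C_ss - C_sy @ np.linalg.solve(C_yy, C_sy.T)
var_hat = np.diag(C_cond)

# 3. Design criteria phi_A and phi_C (c is an illustrative weight vector)
phi_A = np.trace(C_cond) / x_new.size
c = np.ones(x_new.size) / x_new.size
phi_C = c @ C_cond @ c
```

For grids of the size mentioned in Problem 1, the dense matrices here would not fit in memory, which is the motivation for replacing them with low-rank or hierarchical representations.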
4. Problem 2. Stochastic Galerkin Operator
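5. Discretization of stoch. PDE $-\operatorname{div}(\kappa(p, x) \nabla u(p, x)) = f(x, p)$
Pictures 1, 2 (poor and rich discretization of p):
$\Big(\sum_{i=1} \Delta_i \otimes K_i\Big) \cdot (x \otimes e) = (f \otimes e) \quad (1)$
Picture 3:
$\Big(\sum_{i=1} K_i \otimes \Delta_i\Big) \cdot (x \otimes e) = (f \otimes e) \quad (2)$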
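An operator given as a sum of Kronecker products can be applied without ever assembling the large matrix. The sketch below assumes random factors and small, purely illustrative sizes `m`, `n`, `terms`. It relies on the identity (Δ ⊗ K) vec(X) = vec(Δ X Kᵀ) for row-major vectorization, which is what NumPy's `ravel` uses.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, terms = 4, 6, 3   # stochastic size, spatial size, number of terms (illustrative)

Deltas = [rng.standard_normal((m, m)) for _ in range(terms)]
Ks = [rng.standard_normal((n, n)) for _ in range(terms)]
X = rng.standard_normal((m, n))   # vector of unknowns stored as an m-by-n array

def kron_sum_matvec(Deltas, Ks, X):
    """Apply sum_i (Delta_i kron K_i) to X.ravel() without forming the big matrix."""
    return sum(D @ X @ K.T for D, K in zip(Deltas, Ks))

Y_fast = kron_sum_matvec(Deltas, Ks, X)

# Dense reference, feasible only for tiny sizes.
A_dense = sum(np.kron(D, K) for D, K in zip(Deltas, Ks))
Y_dense = (A_dense @ X.ravel()).reshape(m, n)
```

The structured product costs a few small matrix multiplications per term, while the dense matrix has (m·n)² entries. Swapping the roles of the factors, as in equation (2), only changes the ordering of the unknowns.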
6. Problem 3. Predict moisture, estimate covariance parameters
Grid: 1830 × 1329 = 2,432,070 locations with 2,153,888 observations and 278,182 missing values.
[Figure: map of soil moisture. Axes: longitude (−120 to −70), latitude (25 to 50). Color scale: 0.15 to 0.50.]
High-resolution daily soil moisture data at the top layer of the Mississippi
basin, U.S.A., 01.01.2014 (Chaney et al., in review).
Important for agriculture, defense. Moisture is very heterogeneous.
7. Problem 4: Identifying uncertain parameters
Given: a vector of measurements $z = (z_1, ..., z_n)^T$ with a covariance matrix $C(\theta^*) = C(\sigma^2, \nu, \ell)$.
To identify: uncertain parameters $(\sigma^2, \nu, \ell)$.
Plan: Maximize the log-likelihood function
$L(\theta) = -\frac{1}{2}\left(n \log 2\pi + \log\det\{C(\theta)\} + z^T C(\theta)^{-1} z\right)$.
On each iteration $i$ we have a new matrix $C(\theta_i)$.
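A minimal dense sketch of this plan follows. It assumes an exponential covariance (the Matérn family with ν = 1/2), synthetic measurements on random points in the unit square, and a plain grid search over the covariance length in place of a real optimizer. The names `exp_cov_matrix` and `log_likelihood` are hypothetical helpers, not part of any library.

```python
import numpy as np

def exp_cov_matrix(points, ell, sigma2=1.0):
    """Exponential covariance (Matern with nu = 1/2); an assumed example model."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / ell)

def log_likelihood(z, C):
    """-1/2 * (n log(2 pi) + log det C + z^T C^{-1} z), evaluated via Cholesky."""
    n = z.size
    L = np.linalg.cholesky(C)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    w = np.linalg.solve(L, z)   # w = L^{-1} z, so w @ w equals z^T C^{-1} z
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + w @ w)

rng = np.random.default_rng(2)
points = rng.uniform(0.0, 1.0, size=(200, 2))   # locations in [0, 1]^2
ell_true = 0.1                                  # illustrative "true" covariance length
C_true = exp_cov_matrix(points, ell_true)
z = np.linalg.cholesky(C_true) @ rng.standard_normal(points.shape[0])

# A new covariance matrix is assembled and factorized for every candidate value.
ell_grid = np.linspace(0.02, 0.4, 20)
values = np.array([log_likelihood(z, exp_cov_matrix(points, ell)) for ell in ell_grid])
ell_best = ell_grid[np.argmax(values)]
```

Each evaluation needs a determinant and a linear solve with a fresh matrix, which costs O(n³) in dense arithmetic. This is the step where an approximate factorization in a hierarchical or low-rank format would be substituted.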
8. Solution: Estimation of uncertain parameters
[Figure: box plots of the estimated covariance length (vertical axis, 0.02 to 0.06) for H-matrix ranks 3, 7, 9.]
Box plots for ℓ = 0.0334 (domain [0, 1]²) vs. different H-matrix ranks k = {3, 7, 9}.
Which H-matrix rank is sufficient for identification of parameters
of a particular type of cov. matrix?
9. Shape of Log-likelihood(θ)
[Figure: Log-likelihood(θ) plotted against the parameter θ (0 to 40), truth θ* = 12; vertical axis from −4000 to 2000. Curves: log(det(C)), $z^T C^{-1} z$, Log-likelihood.]
Figure: Minimum of the negative log-likelihood (black) is at θ = (·, ·, ℓ) ≈ 12 (σ² and ν are fixed).
10. Problem 5: Multivariate characteristic function
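11. Problem 5: Multivariate characteristic function
The multivariate characteristic function $\varphi_X(t)$ of a d-dimensional random vector $X = (X_1, ..., X_d)$ with $X_1, ..., X_d$ independent, is
$\varphi_X(t) = \int_{\mathbb{R}^d} p_X(y) \exp(i \langle y, t \rangle)\, dy, \quad t = (t_1, ..., t_d) \in \mathbb{R}^d. \quad (1)$
The probability density is
$p_X(y) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \exp(-i \langle y, t \rangle)\, \varphi_X(t)\, dt, \quad y \in \mathbb{R}^d. \quad (2)$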
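12. Elliptically contoured multivariate stable distribution
The characteristic function $\varphi_X(t)$ of the elliptically contoured multivariate stable distribution is defined as follows:
$\varphi_X(t) = \exp\left( i\,(t_1, t_2) \cdot (\mu_1, \mu_2)^T - \left( (t_1, t_2) \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} (t_1, t_2)^T \right)^{\alpha/2} \right) \quad (3)$
Now the question is to find a separation of
$\left( (t_1, t_2) \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} (t_1, t_2)^T \right)^{\alpha/2} \approx \sum_{\nu=1}^{R} \phi_{\nu,1}(t_1) \cdot \phi_{\nu,2}(t_2). \quad (4)$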
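One simple way to explore such a separation numerically is to sample the function on a tensor grid and truncate its singular value decomposition. This is only a sketch: the values of `alpha`, `sigma1`, `sigma2` and the grid are illustrative assumptions, and the SVD is a stand-in for whatever separation method is actually intended in the talk.

```python
import numpy as np

alpha, sigma1, sigma2 = 1.5, 1.0, 2.0        # illustrative parameter values
t1 = np.linspace(-3.0, 3.0, 121)
t2 = np.linspace(-3.0, 3.0, 121)

# Sample g(t1, t2) = (sigma1^2 t1^2 + sigma2^2 t2^2)^(alpha/2) on a tensor grid.
G = (sigma1**2 * t1[:, None]**2 + sigma2**2 * t2[None, :]**2) ** (alpha / 2.0)

U, s, Vt = np.linalg.svd(G, full_matrices=False)

def separated(R):
    """Rank-R approximation: sum over nu of phi_{nu,1}(t1) * phi_{nu,2}(t2)."""
    phi1 = U[:, :R] * s[:R]     # column nu holds phi_{nu,1} on the t1 grid
    phi2 = Vt[:R, :]            # row nu holds phi_{nu,2} on the t2 grid
    return phi1 @ phi2

# Relative Frobenius error for a few separation ranks.
rel_err = {R: np.linalg.norm(G - separated(R)) / np.linalg.norm(G) for R in (1, 2, 4, 8)}
```

For α = 2 the function is exactly of rank 2, since σ₁²t₁² + σ₂²t₂² is already a sum of two separable terms. For other α, the decay of `rel_err` with R indicates how many terms a separated representation needs on this grid.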
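13. Multivariate distribution
Let the characteristic function $\varphi_X(t)$ of some d-dimensional multivariate distribution be approximated as follows:
$\varphi_X(t) \approx \sum_{\ell=1}^{R} \prod_{\mu=1}^{d} \varphi_{X_\ell,\mu}(t_\mu). \quad (5)$
Then
$p_X(y) \approx \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \exp(-i \langle y, t \rangle)\, \varphi_X(t)\, dt \quad (6)$
$\approx \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \exp\Big(-i \sum_{j=1}^{d} y_j t_j\Big) \sum_{\ell=1}^{R} \prod_{\mu=1}^{d} \varphi_{X_\ell,\mu}(t_\mu)\, dt_1 \ldots dt_d \quad (7)$
$\approx \sum_{\ell=1}^{R} \prod_{\mu=1}^{d} \frac{1}{2\pi} \int_{\mathbb{R}} \exp(-i y_\mu t_\mu)\, \varphi_{X_\ell,\mu}(t_\mu)\, dt_\mu \approx \sum_{\ell=1}^{R} \prod_{\mu=1}^{d} p_{X_\ell,\mu}(y_\mu). \quad (8)$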
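The point of the separated form is that one d-dimensional inversion integral turns into R·d one-dimensional integrals. The sketch below checks this for d = 2 and R = 2, assuming a weighted sum of products of Gaussian characteristic functions, for which the exact density is known. The weights, means, standard deviations and the quadrature grid are illustrative assumptions.

```python
import numpy as np

t = np.linspace(-40.0, 40.0, 8001)   # 1-D quadrature grid shared by all factors
dt = t[1] - t[0]

def invert_1d(phi_vals, y):
    """(1 / (2 pi)) * integral of exp(-i y t) phi(t) dt, by a simple Riemann sum."""
    integrand = np.exp(-1j * y * t) * phi_vals
    return (np.sum(integrand) * dt).real / (2.0 * np.pi)

def gauss_cf(mean, std):
    """Characteristic function of a 1-D Gaussian, sampled on the grid t."""
    return np.exp(1j * mean * t - 0.5 * (std * t) ** 2)

def gauss_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# R = 2 rank-one terms in d = 2 dimensions; each term carries a scalar weight.
weights = [0.3, 0.7]
means = [(-1.0, 0.5), (1.0, -0.5)]
stds = [(0.8, 1.2), (1.0, 0.6)]

y = (0.2, -0.1)   # point at which the density is evaluated

# Separated inversion: a sum over terms of products of 1-D inversion integrals.
p_sep = 0.0
for w, m, s in zip(weights, means, stds):
    term = w
    for mu in range(2):
        term *= invert_1d(gauss_cf(m[mu], s[mu]), y[mu])
    p_sep += term

# Closed-form density of the same Gaussian mixture, for comparison.
p_exact = sum(w * gauss_pdf(y[0], m[0], s[0]) * gauss_pdf(y[1], m[1], s[1])
              for w, m, s in zip(weights, means, stds))
```

With N quadrature points per dimension, the separated form costs about R·d·N operations per evaluation point, compared with N^d for a direct tensor-grid quadrature of the d-dimensional integral.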