YSC 2013

Piecewise Gaussian Process Modelling for
Change-Point Detection
Application to Atmospheric Dispersion Problems

Adrien Ickowicz

CMIS
CSIRO

February 2013

Background

Scientific collaboration with University College London,
UNSW and Université Lille 1.
Atmospheric specialists;
Informatics engineer;
Statisticians.

Input
Concentration values of CBRN material at the sensor locations;
Wind field.

Output
Source location, time of release and strength, for firefighters;
Quarantine map for politicians and the MoD.

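Statistical Modelling

Observation modelling:

$Y^{obs(i)}_{t_j} = D^{(i)}_{t_j}(\theta) + \zeta^i_{t_j}$, with $D^{(i)}_{t_j}(\theta) = \int_{\Omega \times T} C_\theta(x, t)\, h(x, t \mid x_i, t_j)\, dx\, dt$ and $\zeta^i_{t_j} \sim \mathcal{N}(0, \sigma^2)$,

where $C_\theta$ is the solution of the PDE

$\frac{\partial C}{\partial t} + u \cdot \nabla C - \nabla \cdot (K \nabla C) = Q(\theta)$ s.t. $\mathbf{n} \cdot \nabla C = 0$ on $\partial\Omega$,

with $u$ the wind field, $K$ the diffusion coefficient and $Q(\theta)$ the source term.

Parameter of interest: $\theta \in \Omega \times T$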
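To make the observation model concrete, the sketch below solves a one-dimensional version of the PDE with an explicit finite-difference scheme and evaluates $D^{(i)}_{t_j}(\theta)$ with a box-shaped sensor kernel $h$. Everything specific here is an assumption for illustration: constant wind $u \geq 0$, constant diffusivity $K$, a continuous point source $Q(\theta) = q\,\delta(x - x_s)\mathbb{1}_{\{t \geq t_s\}}$, and the hypothetical helper names `solve_advection_diffusion` and `observe`.

```python
import numpy as np

def solve_advection_diffusion(x_s, t_s, q, u=1.0, K=0.5, length=50.0, T=20.0, nx=201):
    """1D explicit sketch of dC/dt + u dC/dx - K d2C/dx2 = Q(theta) on [0, length].

    Hypothetical assumptions: constant u >= 0, constant K, and a point source of
    rate q at x_s switched on at t_s, i.e. Q = q * delta(x - x_s) * 1{t >= t_s}.
    """
    x = np.linspace(0.0, length, nx)
    dx = x[1] - x[0]
    # Time step inside the stability (and positivity) region of the scheme.
    dt = 0.4 * min(dx / max(abs(u), 1e-12), dx**2 / (2.0 * K))
    nt = int(np.ceil(T / dt))
    t = np.arange(nt + 1) * dt
    C = np.zeros((nt + 1, nx))
    src = int(np.argmin(np.abs(x - x_s)))  # grid cell carrying the point source
    for n in range(nt):
        c = C[n]
        # Mirrored ghost cells impose a zero normal gradient (n . grad C = 0) at both ends.
        left = np.concatenate(([c[1]], c[:-1]))
        right = np.concatenate((c[1:], [c[-2]]))
        advection = u * (c - left) / dx                  # first-order upwind, valid for u >= 0
        diffusion = K * (right - 2.0 * c + left) / dx**2  # central second difference
        C[n + 1] = c + dt * (diffusion - advection)
        if t[n] >= t_s:
            C[n + 1, src] += dt * q / dx                 # discrete delta source
    return x, t, C

def observe(x, t, C, x_i, t_j, half_width=1.0, half_window=0.5):
    """D_{t_j}^{(i)}(theta): C averaged over a box sensor kernel h centred on (x_i, t_j)."""
    in_x = np.abs(x - x_i) <= half_width
    in_t = np.abs(t - t_j) <= half_window
    return C[np.ix_(in_t, in_x)].mean()

rng = np.random.default_rng(0)
sigma = 0.01  # hypothetical noise standard deviation of zeta
x, t, C = solve_advection_diffusion(x_s=10.0, t_s=0.0, q=1.0)
y_down = observe(x, t, C, x_i=20.0, t_j=15.0) + sigma * rng.standard_normal()
y_up = observe(x, t, C, x_i=5.0, t_j=15.0) + sigma * rng.standard_normal()
```

A sensor downwind of the source records a much larger value than one upwind, which is the asymmetry the wind term $u \cdot \nabla C$ introduces.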

Existing Techniques
Source term estimation

Optimization techniques

Gradient-based methods
(Elbern et al. [2000], Li and Niu [2005], Lushi and Stockie [2010])
Pattern search methods
(Zheng et al. [2008])
Genetic algorithms
(Haupt [2005], Allen et al. [2009])

Bayesian techniques

Forward modelling and MCMC
(Patwardhan and Small [1992])
Backward (adjoint) modelling and MCMC
(Issartel et al. [2002], Hourdin et al. [2006], Yee [2010])

Contribution: Gaussian Process Modelling
Overview

We consider several observations of a stochastic process in space
and time.

Idea: Bayesian non-parametric estimation.
Tool: Gaussian Process (Rasmussen [2006])

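Joint distribution: $y \sim \mathcal{GP}(m(x), \kappa(x, x'))$

where $m \in L^2(\Omega \times T, \mathbb{R})$ is the prior mean function
and $\kappa \in L^2(\Omega^2 \times T^2, \mathbb{R})$ is the prior covariance function¹.

Posterior distribution (with $m = 0$): $\mathcal{L}(y^* \mid x^*, x, y) = \mathcal{N}\left(\kappa(x^*, x)\kappa(x, x)^{-1} y,\; \kappa(x^*, x^*) - \kappa(x^*, x)\kappa(x, x)^{-1}\kappa(x, x^*)\right)$

¹ The associated Gram matrix $K$ should be positive semidefinite.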
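The posterior formula translates directly into a few lines of NumPy. This is a minimal sketch assuming a zero prior mean, noise-free observations and an isotropic squared-exponential kernel; `se_kernel` and `gp_posterior` are illustrative names, and a small jitter is added to $\kappa(x, x)$ only for numerical stability.

```python
import numpy as np

def se_kernel(a, b, alpha1=1.0, alpha2=1.0):
    """Isotropic kernel alpha1 * exp(-(x - x')^2 / (2 alpha2)) on 1D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return alpha1 * np.exp(-d2 / (2.0 * alpha2))

def gp_posterior(x, y, x_star, kernel=se_kernel, jitter=1e-9):
    """Zero-mean GP posterior mean and covariance at x_star given noise-free data (x, y)."""
    K = kernel(x, x) + jitter * np.eye(len(x))        # kappa(x, x)
    K_s = kernel(x_star, x)                           # kappa(x*, x)
    K_ss = kernel(x_star, x_star)                     # kappa(x*, x*)
    L = np.linalg.cholesky(K)                         # K = L L^T
    w = np.linalg.solve(L.T, np.linalg.solve(L, y))   # kappa(x, x)^{-1} y
    mean = K_s @ w
    V = np.linalg.solve(L, K_s.T)                     # L^{-1} kappa(x, x*)
    cov = K_ss - V.T @ V                              # kappa(x*, x*) - kappa(x*, x) kappa(x, x)^{-1} kappa(x, x*)
    return mean, cov

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)
mean, cov = gp_posterior(x, y, np.array([0.5, 10.0]))
```

The Cholesky factor is used instead of an explicit inverse; far from the data (here at 10.0) the posterior falls back to the prior mean 0 and prior variance $\alpha_1$.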

On the Kernel Specification

Complex non-parametric modelling requires particular care in the choice of the kernel shape
and of the kernel hyper-parameters.

Basic kernel (isotropic): $\kappa(x, x') = \alpha_1 \exp\left(-\frac{1}{2\alpha_2}(x - x')^2\right)$

Hyper-parameters: α1 , α2
Figure: Prediction of three Gaussian Process models (and their corresponding 0.95 CI) given 7
noisy observations. Left: $\alpha_2 = 0.1$; middle: $\alpha_2 = 2$; right: $\alpha_2 = 1000$.
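The role of $\alpha_2$ in the figure can be reproduced qualitatively with the sketch below. It assumes $\alpha_1 = 1$, a noise variance $\sigma^2 = 0.01$ and a hypothetical target $\sin(x)$ observed at 7 points; it is not the code behind the figure.

```python
import numpy as np

def se_kernel(a, b, alpha1, alpha2):
    """Isotropic kernel alpha1 * exp(-(x - x')^2 / (2 alpha2))."""
    return alpha1 * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * alpha2))

def gp_predict(x, y, x_star, alpha1, alpha2, noise_var):
    """Posterior mean and 0.95 CI half-width of the latent function under Gaussian noise."""
    A = se_kernel(x, x, alpha1, alpha2) + noise_var * np.eye(len(x))  # K + sigma^2 I
    K_s = se_kernel(x_star, x, alpha1, alpha2)
    mean = K_s @ np.linalg.solve(A, y)
    W = np.linalg.solve(A, K_s.T)                 # (K + sigma^2 I)^{-1} kappa(x, x*)
    var = alpha1 - np.sum(K_s.T * W, axis=0)      # prior variance minus the explained part
    return mean, 1.96 * np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 7)                     # 7 noisy observations
y = np.sin(x) + 0.1 * rng.standard_normal(x.size) # hypothetical target function
x_star = np.linspace(-3.0, 3.0, 61)
preds = {a2: gp_predict(x, y, x_star, 1.0, a2, 0.01) for a2 in (0.1, 2.0, 1000.0)}
```

With $\alpha_2 = 0.1$ the correlation decays within a fraction of the sampling step, so between observations the mean reverts towards the prior and the interval widens; with $\alpha_2 = 1000$ the prior only allows very smooth functions over this range.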

Likelihood and Multiple Kernels

The hyper-parameters are estimated by maximizing the marginal
likelihood,

$\log p(y \mid x) = -\frac{1}{2} y^T (K + \sigma^2 I_n)^{-1} y - \frac{1}{2} \log |K + \sigma^2 I_n| - \frac{n}{2} \log 2\pi$

What if the best-fitted kernel were piecewise,

$\kappa(x, x') = \sum_i \kappa_i(x, x')\, \mathbb{1}_{\{(x, x') \in \Omega_i\}}$

Figure: Synthetic two-phase signal.
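The marginal likelihood and a piecewise kernel can be sketched as follows. Assumptions: one-dimensional inputs, two regions $\Omega_1 = \{x, x' < \tau\}$ and $\Omega_2 = \{x, x' \geq \tau\}$, and zero covariance between points of different regions, which keeps the Gram matrix block diagonal and therefore positive semidefinite. The names are hypothetical, and the log-determinant is read off a Cholesky factor.

```python
import numpy as np

def se_kernel(a, b, alpha1, alpha2):
    """Isotropic kernel alpha1 * exp(-(x - x')^2 / (2 alpha2))."""
    return alpha1 * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * alpha2))

def log_marginal_likelihood(K, y, noise_var):
    """log p(y|x) = -1/2 y^T (K + s^2 I)^{-1} y - 1/2 log|K + s^2 I| - n/2 log(2 pi)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))    # (K + s^2 I)^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))         # log|K + s^2 I|
    return -0.5 * y @ a - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)

def piecewise_kernel(a, b, tau, params):
    """kappa_i applies only when both inputs lie in region i (hypothetical 1D split at tau).

    params = [(alpha1, alpha2) for x < tau, (alpha1, alpha2) for x >= tau].
    """
    regions = [(a < tau, b < tau), (a >= tau, b >= tau)]
    K = np.zeros((len(a), len(b)))
    for (in_a, in_b), (a1, a2) in zip(regions, params):
        K += np.outer(in_a, in_b) * se_kernel(a, b, a1, a2)  # indicator 1{(x, x') in Omega_i}
    return K

x = np.linspace(0.0, 10.0, 20)
y = np.sin(x)
K = piecewise_kernel(x, x, 5.0, [(1.0, 4.0), (1.0, 0.05)])
ll = log_marginal_likelihood(K, y, 0.01)
```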

Change-Point Estimation

A. Parametric Estimation

We assume that there exist $\beta_i$ such that

$(x, x') \in \Omega_i \Leftrightarrow f(x, x', \beta_i) \geq 0$

and $f$ is known. Then $\theta = \{(\alpha_i, \beta_i)_i\}$, and we have

$\hat{\theta} = \arg\max_\theta \log p(y \mid x)$

Limitations:
Knowledge of $f$
Dimension of the parameter space
Non-convexity of the marginal likelihood function
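A grid-search version of the parametric estimator, under a one-dimensional two-region assumption: $f$ splits the inputs at a single threshold $\tau$ (playing the role of $\beta$), $\alpha_1 = 1$ and $\sigma^2$ are fixed, and only $\alpha_2$ is estimated for each segment. Because the Gram matrix is block diagonal, the log-likelihood is the sum of one term per segment, so each $\alpha_2$ can be maximized independently for a given $\tau$. The data are a hypothetical two-phase signal and the grids are arbitrary.

```python
import numpy as np

def se_kernel(a, b, alpha1, alpha2):
    return alpha1 * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * alpha2))

def log_marginal_likelihood(K, y, noise_var):
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)

def fit_change_point(x, y, tau_grid, alpha2_grid, alpha1=1.0, noise_var=0.01):
    """Grid-search MMLE of theta = (tau, alpha2_left, alpha2_right)."""
    best = (-np.inf, None, None)
    for tau in tau_grid:
        total, alphas = 0.0, []
        for mask in (x < tau, x >= tau):          # the two blocks of the Gram matrix
            xs, ys = x[mask], y[mask]
            scores = [log_marginal_likelihood(se_kernel(xs, xs, alpha1, a2), ys, noise_var)
                      for a2 in alpha2_grid]
            k = int(np.argmax(scores))
            total += scores[k]                    # log-likelihood separates over segments
            alphas.append(alpha2_grid[k])
        if total > best[0]:
            best = (total, tau, tuple(alphas))
    return best[1], best[2]

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 100, endpoint=False)
# Hypothetical two-phase signal: smooth before x = 5, rapidly oscillating after.
y = np.where(x < 5.0, np.sin(0.5 * x), np.sin(5.0 * x)) + 0.05 * rng.standard_normal(x.size)
tau_hat, alpha2_hat = fit_change_point(
    x, y, np.linspace(1.0, 9.0, 33), [0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0])
```

Even in this simplest setting the search is exhaustive over $\tau$; with more regions or a richer $f$ the grid grows quickly, which is the dimension limitation listed above.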


B. Adaptive Estimation (1)

Let $X_{kNN \cap B_r}(i)$ be the set of observations associated with $x_i$,

$X_{kNN \cap B_r}(i) = \{x_j \mid x_j \in B_r(x_i)\} \cap \{x_j \mid d_{ji} \leq d_{(i,k)}\}$

where $d_{ji}$ is the distance between $x_j$ and $x_i$ and $d_{(i,k)}$ is the $k$-th smallest of these distances,
k is the number of neighbours considered,
r is the limiting radius.
Justification:
Avoids a lack of observations
Gives an equivalent number of observations to each estimator
Avoids over-parametrizing the likelihood
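The neighbourhood $X_{kNN \cap B_r}(i)$ is simple to compute. A sketch assuming Euclidean distances, counting $x_i$ as its own nearest neighbour, and keeping every point tied at distance $d_{(i,k)}$; `knn_ball_neighbourhood` is a hypothetical name.

```python
import numpy as np

def knn_ball_neighbourhood(X, i, k, r):
    """Indices of points within radius r of x_i AND among its k nearest neighbours."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)  # accept 1D or (n, d) inputs
    d = np.linalg.norm(X - X[i], axis=1)                # d_{ji} for every j
    d_k = np.sort(d)[min(k, len(d)) - 1]                # d_{(i,k)}: k-th smallest distance
    return np.flatnonzero((d <= r) & (d <= d_k))
```

For example, `knn_ball_neighbourhood(np.arange(10.0), 5, 3, 10.0)` selects indices 4, 5 and 6, while shrinking the radius to 0.5 leaves only index 5: the radius caps the neighbourhood when observations are sparse, and $k$ caps it when they are dense.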


B. Adaptive Estimation (2)

Let $x_I = X_{kNN \cap B_r}(i)$ and $y_I$ be the corresponding observations.

$\hat{\alpha}_i = \arg\max_\alpha \log p(y_I \mid x_I)$

Idea 1: cluster on $\hat{\alpha}_i$, but what if $\dim(\hat{\alpha}_i) \geq 2$?

Idea 2: build the Gram matrices $K_i = \kappa(x_I, \hat{\alpha}_i)$, let $\Lambda_{x_i} = \{\lambda_1, \ldots, \lambda_n\}$ be the eigenvalues of $K_i$, and cluster on $\mu_i = \max\{\Lambda_{x_i}\}$.
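Idea 2 can be sketched end to end on a one-dimensional two-phase signal. The simplifications are assumptions: neighbourhoods are plain $k$-nearest neighbours (no radius), only $\alpha_2$ is estimated on a grid with $\alpha_1 = 1$, and the $\mu_i$ are split into two groups with a basic two-means clustering. A long length-scale makes the local Gram matrix close to rank one with a large leading eigenvalue, which is why $\mu_i$ separates the two regimes.

```python
import numpy as np

def se_kernel(a, b, alpha2, alpha1=1.0):
    return alpha1 * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * alpha2))

def log_marginal_likelihood(K, y, noise_var):
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)

def max_eigenvalue_features(x, y, k, alpha2_grid, noise_var=0.01):
    """mu_i = largest eigenvalue of K_i = kappa(x_I, alpha2_hat_i) for every x_i."""
    mu = np.empty(len(x))
    for i in range(len(x)):
        I = np.argsort(np.abs(x - x[i]), kind="stable")[:k]  # k nearest neighbours (1D)
        xI, yI = x[I], y[I]
        # Local ML estimate of alpha2 on the neighbourhood.
        a2 = max(alpha2_grid,
                 key=lambda a: log_marginal_likelihood(se_kernel(xI, xI, a), yI, noise_var))
        mu[i] = np.linalg.eigvalsh(se_kernel(xI, xI, a2))[-1]  # eigenvalues sorted ascending
    return mu

def two_means(v, n_iter=50):
    """1D k-means with two clusters initialised at the extremes; returns 0/1 labels."""
    c = np.array([v.min(), v.max()], dtype=float)
    for _ in range(n_iter):
        labels = np.abs(v[:, None] - c[None, :]).argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                c[j] = v[labels == j].mean()
    return labels

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100, endpoint=False)
y = np.where(x < 5.0, np.sin(0.5 * x), np.sin(5.0 * x)) + 0.05 * rng.standard_normal(x.size)
mu = max_eigenvalue_features(x, y, k=15, alpha2_grid=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0])
labels = two_means(mu)
```

A scalar $\mu_i$ is clustered whatever the dimension of $\hat{\alpha}_i$, which is the point of Idea 2.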

Simulation Results

Figure: Gaussian Process prediction with one classical isotropic kernel (green), and with two isotropic kernels using
eigenvalue-based change-point estimation (yellow), hyper-parameter-based change-point estimation (purple) and parametric estimation (blue).

Figure: Mean of the Gaussian Process for the two-dimensional scenario (both axes from 0 to 50). On the left, the mean is calculated with only one
kernel. On the right, the mean is calculated with two kernels.

Simulation Results


Figure: Evolution of the root MSE (RMSE) of the change-point estimation as the number of observations Ns
increases from 20 to 100, in the 1D case, for the MMLE, JD and MEV methods.

Methods:
MMLE: parametric approach;
MEV: eigenvalue approach;
JD: Est. approach.

Method   2D               2D-donut         3D
JD       0.834 (0.0034)   0.763 (0.0015)   0.666 (0.0016)
MEV      0.825 (0.0053)   0.817 (0.0021)   0.643 (0.0014)
MMLE     0.858 (0.0025)   0.806 (0.0008)   0.666 (0.0002)

Table: The number of observations is equal to 10d, where d is the dimension of the problem. 1000
simulations are performed. The variance is given in brackets.


Application to the Concentration Measurements

We may consider the concentration measurements as observations
of a stochastic process in space and time.

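Idea: apply the approach defined above to estimate $t_0$.

Prior distribution: $C \sim \mathcal{GP}(m, \kappa)$

where $m \in L^2(\Omega \times T, \mathbb{R})$ is the prior mean function
and $\kappa \in L^2(\Omega^2 \times T^2, \mathbb{R})$ is the prior covariance function².

Posterior distribution: $C \mid Y, m = 0 \sim \mathcal{GP}\left(\kappa_{x^* x} \kappa_{xx}^{-1} Y,\; \kappa_{x^* x^*} - \kappa_{x^* x} \kappa_{xx}^{-1} \kappa_{x x^*}\right)$

² The associated Gram matrix $K$ should be positive semidefinite.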

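Kernel Specification

Isotropic kernel:
$\kappa_{iso}(\mathbf{x}, \mathbf{x}') = \frac{1}{\alpha} \exp\left(-\frac{\|x - x'\|^2}{\beta^2}\right)$
where α and β are hyper-parameters.

Drift-dependent kernel:
let $s_{x_0, t_0}(t)$ be the solution of the system $\dot{x} = u(x, t)$, $x(t_0) = x_0$. Then
$\kappa_{dyn}(\mathbf{x}, \mathbf{x}') = \frac{1}{\sigma(t, t')} \exp\left(-\frac{d_s(\mathbf{x}, \mathbf{x}')}{2\sigma(t, t')^2}\right)$
where $\mathbf{x} = (x, t)$, $\mathbf{x}' = (x', t')$ and
$d_s(\mathbf{x}, \mathbf{x}') = (x - s_{x', t'}(t))^2 + (x' - s_{x, t}(t'))^2$
$\sigma(t, t') = \alpha \times (|t_0 - \min(t, t')| + 1)^\beta$

The drift-dependent kernel accounts for:
the influence of the wind field;
the time-decreasing correlation;
the evolution of the process.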
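The two kernels, and the combined kernel $\kappa_f$ used for the two-stage estimation, can be written directly from their definitions. In this sketch, space-time points are pairs `(x, t)` with `x` a NumPy vector, the trajectories $s_{x, t}$ come from explicit Euler integration of the wind field, and $\kappa_f$ is set to zero when one time precedes $t_0$ and the other does not. The step count, the function names and the uniform wind are assumptions, and positive semidefiniteness of $\kappa_{dyn}$ is not checked.

```python
import numpy as np

def trajectory(u, x0, t0, t, n_steps=100):
    """s_{x0,t0}(t): Euler integration of dx/dt = u(x, t) from x(t0) = x0 (backwards if t < t0)."""
    x = np.asarray(x0, dtype=float).copy()
    h = (t - t0) / n_steps
    for m in range(n_steps):
        x = x + h * np.asarray(u(x, t0 + m * h), dtype=float)
    return x

def kappa_iso(p, q, alpha, beta):
    """(1/alpha) exp(-||x - x'||^2 / beta^2), using only the spatial parts of p = (x, t)."""
    (x, _), (xq, _) = p, q
    return np.exp(-np.sum((np.asarray(x) - np.asarray(xq)) ** 2) / beta**2) / alpha

def kappa_dyn(p, q, u, t_rel, alpha, beta):
    """Drift-dependent kernel; t_rel is the release time t0 appearing in sigma(t, t')."""
    (x, t), (xq, tq) = p, q
    d_s = (np.sum((np.asarray(x) - trajectory(u, xq, tq, t)) ** 2)     # x' transported to time t
           + np.sum((np.asarray(xq) - trajectory(u, x, t, tq)) ** 2))  # x transported to time t'
    sigma = alpha * (abs(t_rel - min(t, tq)) + 1.0) ** beta
    return np.exp(-d_s / (2.0 * sigma**2)) / sigma

def kappa_f(p, q, u, t_rel, iso=(1.0, 1.0), dyn=(1.0, 1.0)):
    """kappa_iso if both times precede t_rel, kappa_dyn if both follow it, 0 otherwise (assumed)."""
    t, tq = p[1], q[1]
    if t < t_rel and tq < t_rel:
        return kappa_iso(p, q, *iso)
    if t >= t_rel and tq >= t_rel:
        return kappa_dyn(p, q, u, t_rel, *dyn)
    return 0.0

def wind(x, t):
    return np.array([1.0, 0.0])  # hypothetical uniform wind field

p, q = (np.array([0.0, 0.0]), 0.0), (np.array([2.0, 0.0]), 2.0)
print(kappa_dyn(p, q, wind, t_rel=0.0, alpha=1.0, beta=1.0))
```

Here `q` lies exactly on the wind trajectory through `p`, so $d_s = 0$ and the two points are as correlated as the time-dependent scale $\sigma$ allows, even though they are two units apart in space.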

Two-stage estimation process: Instant of release

The proposed kernel is then complex:

$\kappa_f = \kappa_{iso}\, \mathbb{1}_{\{t, t' < t_0\}} + \kappa_{dyn}\, \mathbb{1}_{\{t, t' \geq t_0\}}$

The likelihood is not convex, so $t_0$ has to be estimated separately.

Method: exhaustive search over $t_0$. For each candidate:
Maximum likelihood estimation of the hyper-parameters;
Calculation of the trace of the Gram matrix.

$\hat{t}_0^{tr} = \arg\max_{t \in T} \mathrm{tr}(K(t))$
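The exhaustive search can be sketched generically. `build_gram(t, h)` is a hypothetical callable returning the Gram matrix of $\kappa_f$ at the sensor points for a candidate release time `t` and hyper-parameters `h`; the hyper-parameter grid, the fixed noise variance and the order of the two steps (ML fit, then trace) are assumptions.

```python
import numpy as np

def log_marginal_likelihood(K, y, noise_var):
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)

def estimate_release_time(candidates, hyper_grid, build_gram, y, noise_var=1e-2):
    """t0_hat = argmax over candidates of tr(K(t)), with ML hyper-parameters for each t."""
    best_t, best_trace = None, -np.inf
    for t in candidates:
        # Step 1: hyper-parameters maximizing the marginal likelihood for this candidate t0.
        hyper = max(hyper_grid,
                    key=lambda h: log_marginal_likelihood(build_gram(t, h), y, noise_var))
        # Step 2: trace of the Gram matrix K(t) built with those hyper-parameters.
        trace = np.trace(build_gram(t, hyper))
        if trace > best_trace:
            best_t, best_trace = t, trace
    return best_t

# Toy usage with a hypothetical Gram builder whose trace peaks at t = 2.
def toy_gram(t, h):
    return (1.0 + h) * np.exp(-(t - 2.0) ** 2) * np.eye(3) + np.eye(3)

t0_hat = estimate_release_time([0.0, 1.0, 2.0, 3.0, 4.0], [0.5, 1.0], toy_gram, np.zeros(3))
```

Each candidate costs one likelihood evaluation per hyper-parameter value, so the grid over $t_0$ should stay coarse when the number of sensors is large.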

Two-stage estimation process: Source location

Given the time of release, we can calculate the location estimate:

$\hat{x}_0 = \arg\max_{x^* \in \Omega} \mathbb{E}[C \mid Y, m = 0](x^*, \hat{t}_0) = \arg\max_{x^* \in \Omega} \tilde{\kappa}_{x^* x} \tilde{\kappa}_{xx}^{-1} Y$, where $\tilde{\kappa} = \kappa(\cdot, \hat{t}_0)$.

Table: Estimation of the source location. Comparison between the estimators (5, 20 and 50 sensors). Target is $x_0 = 115$, $y_0 = 10$.

Kernel   Sensors   x̂0       ŷ0      σ(x̂0)   σ(ŷ0)
κ_iso    5         68.97    62.58   42.82   38.96
κ_iso    20        97.13    26.37   27.64   26.08
κ_iso    50        104.47   21.60   28.94   19.47
κ_f      5         108.94   12.21   42.00   17.05
κ_f      20        120.28   8.28    12.50   4.64
κ_f      50        114.51   9.48    6.37    3.07
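The location estimate is an argmax of the zero-mean posterior mean over a grid of candidate positions. A sketch with a hypothetical isotropic stand-in for $\tilde{\kappa} = \kappa(\cdot, \hat{t}_0)$, a 5 × 5 sensor layout and synthetic concentrations; a small jitter is added to $\tilde{\kappa}_{xx}$, which the formula does not include.

```python
import numpy as np

def kappa_tilde(A, B, alpha=1.0, beta=2.0):
    """Hypothetical stand-in for kappa(., t0_hat): (1/alpha) exp(-||x - x'||^2 / beta^2)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / beta**2) / alpha

def locate_source(sensors, Y, grid, kernel=kappa_tilde, jitter=1e-6):
    """x0_hat = argmax over grid points x* of kappa(x*, x) kappa(x, x)^{-1} Y."""
    K = kernel(sensors, sensors) + jitter * np.eye(len(sensors))
    weights = np.linalg.solve(K, Y)              # kappa_xx^{-1} Y
    mean = kernel(grid, sensors) @ weights       # E[C | Y, m = 0] on the grid
    return grid[int(np.argmax(mean))]

g = np.arange(0.0, 9.0, 2.0)
sensors = np.array([[a, b] for a in g for b in g])           # hypothetical 5 x 5 layout
target = np.array([4.0, 4.0])
Y = np.exp(-np.sum((sensors - target) ** 2, axis=1) / 4.0)   # synthetic concentrations
c = np.arange(0.0, 8.5, 0.5)
grid = np.array([[a, b] for a in c for b in c])              # candidate source positions
x0_hat = locate_source(sensors, Y, grid)
```

Only the mean is needed for the argmax; the posterior covariance could additionally give an uncertainty on $\hat{x}_0$.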

Zero-Inflated Poisson and Dirichlet Process³

We can also consider the concentration as a count of particles:
$Y \sim \mathrm{ZIP}(p, \lambda)$, with $p \sim \mathrm{DP}(H, \alpha)$ and $\log \lambda \sim \mathcal{GP}(m, \kappa)$,

which then defines the mixture distribution

$\Pr(Y = k \mid p, \lambda) = p_{xt}\, \mathbb{1}_{\{k = 0\}} + (1 - p_{xt}) \frac{e^{-\lambda_{xt}} \lambda_{xt}^k}{k!}$

Major issue: the tractability of the likelihood calculation depends on the distributions of
both p and λ.

³ Joint work with Dr. G. Peters and Dr. I. Nevat.
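For fixed $(p, \lambda)$ the zero-inflated Poisson mass function is easy to evaluate. This sketch leaves out the Dirichlet-process and Gaussian-process priors, which are the hard part noted above, and uses `math.lgamma` for a numerically stable $\log k!$; the function names are hypothetical.

```python
import math

def zip_pmf(k, p, lam):
    """Pr(Y = k | p, lambda) = p 1{k = 0} + (1 - p) e^{-lambda} lambda^k / k!."""
    if lam > 0:
        poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    else:
        poisson = float(k == 0)  # degenerate Poisson at 0
    return p * (k == 0) + (1.0 - p) * poisson

def zip_log_likelihood(counts, p, lam):
    """Sum of log Pr(Y = k) over independent counts sharing the same (p, lambda)."""
    return sum(math.log(zip_pmf(k, p, lam)) for k in counts)
```

In the full model, $p_{xt}$ and $\lambda_{xt}$ vary over space and time, so the likelihood requires integrating over both priors rather than plugging in fixed values.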

Contribution: Bibliography

A. Ickowicz, F. Septier, P. Armand, Adaptive Algorithms for the
Estimation of Source Term in a Complex Atmospheric Release.
Submitted to Atmospheric Environment.

A. Ickowicz, F. Septier, P. Armand, Estimating a CBRN atmospheric
release in a complex environment using Gaussian Processes.
15th International Conference on Information Fusion, Singapore,
July 2012.

F. Septier, A. Ickowicz, P. Armand, Méthodes de Monte-Carlo adaptatives
pour la caractérisation de termes de sources [Adaptive Monte Carlo methods for source term characterization].
Technical report, CEA, EOTP A-54300-05-07-AW-26, Mar. 2012.

A. Ickowicz, F. Septier, P. Armand, Statistic Estimation for Particle
Clouds with Lagrangian Stochastic Algorithms.
Technical report, CEA, EOTP A-24300-01-01-AW-20, Nov. 2011.
