Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop, Multi model estimation & information theory for improving probabilistic predictions - Michal Branicki, Aug 24, 2017
Multi Model Ensemble (MME) predictions are a popular ad hoc technique for improving predictions of high-dimensional, multi-scale dynamical systems. The heuristic idea behind the MME framework is simple: given a collection of models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious whether this is a viable strategy, nor which models should be included in the MME forecast in order to achieve the best predictive performance. I will present an information-theoretic approach to this problem which allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi-model ensembles. Time permitting, the role and validity of “fluctuation-dissipation” arguments for improving imperfect predictions of externally perturbed non-autonomous systems - with possible applications to climate change considerations - will also be addressed.
1. Multi model estimation & information theory for
improving probabilistic predictions
Michal Branicki
Department of Mathematics, University of Edinburgh,
& The Alan Turing Institute for Data Science, London, UK
2. Optimisation of (reduced-order) models for best predictions
Tuning: Minimise the lack of information in imperfect predictions by improving the
models in the “training phase” (when lots of data is available)
Time-sequential data assimilation: use real-time data in the “prediction phase”
D(µ_t ‖ ν_t), where µ_t, ν_t are the probability measures of the truth and the model
on a subspace of variables of interest
3. Multi model predictions
Suppose a collection of models is available: use a single model or a mixture of
models for best predictions? (MME predictions are not always the ‘best’.)
Information-based condition guaranteeing the utility of MME?
[Figures: truth vs. model forecasts over time; global mean annual temperature over time]
5. Outline/Summary: Multi-Model Ensemble (MME) predictions
Choose a ‘metric’ on a manifold of probability densities s.t.
D_kl(π ‖ Σ_i α_i π^{m_i}) ≤ Σ_i α_i D_kl(π ‖ π^{m_i})   (convexity)
D_kl(π ‖ π^{m_i}) = D_kl(π ‖ π^l) + D_kl(π^l ‖ π^{m_i})   (π^l the least-biased density)
D_kl(π ‖ π^{l₂}) < D_kl(π ‖ π^{l₁}) for l₂ > l₁
Sufficient condition for using MME, given only the errors of the individual models:
D^I_kl(π ‖ π^{m∗}) > Σ_{i≠∗} [α_i/(1 − α∗)] D^I_kl(π ‖ π^{m_i})
which implies that the MME beats the single model m∗, i.e.
D^I_kl(π ‖ π^{mme}_α) − D^I_kl(π ‖ π^{m∗}) < 0
Necessary condition for using a mixture of models instead of a single model m∗
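The convexity property above, which drives the sufficient condition, can be checked directly in the discrete setting. A minimal numerical sketch (numpy assumed; the truth π and model densities below are invented for illustration, not taken from the talk):

```python
import numpy as np

def kl(p, q):
    """Discrete Kullback-Leibler divergence D_kl(p || q) in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# truth and two imperfect model densities on 3 outcomes
pi = np.array([0.5, 0.3, 0.2])
m1 = np.array([0.6, 0.2, 0.2])
m2 = np.array([0.3, 0.5, 0.2])
alpha = np.array([0.4, 0.6])

mme = alpha[0] * m1 + alpha[1] * m2          # convex superposition of forecasts
lhs = kl(pi, mme)
rhs = alpha[0] * kl(pi, m1) + alpha[1] * kl(pi, m2)
assert lhs <= rhs                            # convexity of D_kl in its second argument
```

The inequality is the standard convexity of the KL divergence in its second argument, applied to the mixture forecast.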
9. Info-theoretic tools for assessing lack of information / model error
φ-entropies (or their rates in path-space): D_φ(x) = E[φ(x)] − φ(E[x])
In particular, the Kullback-Leibler divergence (relative entropy):
D_KL(µ ‖ ν) := E^µ[ln(dµ/dν)]
‘Information’ bounds for specific observables E^µ[f], e.g., in terms of D(µ ‖ ν)
Sensitivity analysis (Fisher information, linear response to perturbations):
D_KL(µ^θ ‖ µ^{θ+δθ}) = ½ δθ† F(µ^θ) δθ + O(δθ³)
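The quadratic (Fisher-information) expansion of the KL divergence can be checked numerically for a Gaussian family with a shifted mean, where F = 1/σ² and the expansion is in fact exact. A sketch using grid quadrature of the defining integral (numpy assumed; parameters illustrative):

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 600001)
dx = x[1] - x[0]
sigma = 1.3

def gauss(m):
    return np.exp(-(x - m)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def kl(p, q):
    """Grid quadrature of the defining integral  int p log(p/q) dx."""
    return np.sum(p * np.log(p / q)) * dx

# Fisher information of the mean parameter for N(theta, sigma^2) is F = 1/sigma^2,
# and D_KL(mu^theta || mu^{theta+dtheta}) should behave as (1/2) F dtheta^2.
F = 1.0 / sigma**2
theta = 0.5
for dtheta in (0.2, 0.05):
    d = kl(gauss(theta), gauss(theta + dtheta))
    assert np.isclose(d, 0.5 * F * dtheta**2, rtol=1e-3)
```

For a pure mean shift the remainder O(δθ³) vanishes identically, so the agreement here is limited only by quadrature error.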
10. Outline of the rest:
More details on ‘info’ tools
Why MME ?
(Information) geometry of MME predictions
When MME ?
Example (attractor tuning): Turbulent transport of a passive tracer
Example (MME): Exactly solvable toy model with information barriers
Information bounds
11. Info-theoretic tools for assessing lack of information / model error
Consider two probability measures on M ⊆ ℝⁿ, n < ∞, with µ(dx) = p(x)dx, ν(dx) = q(x)dx,
µ ≪ ν, p, q > 0, ∫ p(x)dx = ∫ q(x)dx = 1.
The lack of information in p relative to q is given by
D_kl(p ‖ q) = ∫ p(x) log( p(x)/q(x) ) dx = E^µ[ln(dµ/dν)]
with D_kl(p ‖ q) ≥ 0, and D_kl(p ‖ q) = 0 iff p = q.
For p^l the max-ent estimate of p with l ≥ 1 moment constraints:
D_kl(p ‖ q) = D_kl(p ‖ p^l) + D_kl(p^l ‖ q)
Optimised model: D_kl(p^l ‖ q∗) = min_{q∈M} D_kl(p^l ‖ q);
the residual D_kl(p ‖ p^l) is an “information barrier”.
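The decomposition via the max-ent density p^l can be verified numerically for l = 2 moment constraints, where p^l is the Gaussian matching the mean and variance of a non-Gaussian p, and q is any Gaussian model. A sketch with a mixture truth (numpy assumed; all parameters invented):

```python
import numpy as np

x = np.linspace(-25.0, 25.0, 500001)
dx = x[1] - x[0]

def gauss(m, s):
    return np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def kl(p, q):
    return np.sum(p * np.log(p / q)) * dx

# non-Gaussian 'truth': a two-component Gaussian mixture
p = 0.5 * gauss(-1.5, 0.7) + 0.5 * gauss(2.0, 1.0)
m1 = np.sum(x * p) * dx                      # first moment of p
m2 = np.sum((x - m1)**2 * p) * dx            # second central moment of p
p_l = gauss(m1, np.sqrt(m2))                 # max-ent density under l = 2 moment constraints
q = gauss(0.3, 2.5)                          # an arbitrary Gaussian 'model'

total = kl(p, q)
decomp = kl(p, p_l) + kl(p_l, q)
assert abs(total - decomp) < 1e-5            # Pythagorean identity D(p||q) = D(p||p^l) + D(p^l||q)
```

The identity holds exactly here because log(p^l/q) is quadratic in x and p shares its first two moments with p^l; numerically it is reproduced to quadrature accuracy.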
12. Improving imperfect predictions via tuning attractor fidelity
D^F_kl(π^δ ‖ π^{m,δ}) = D^F_kl(π^δ ‖ π^{l,δ}) + D^F_kl(π^{l,δ} ‖ π^{m,δ})
FACT [MB’15]: improving the attractor fidelity of a model improves its predictions:
D^F_kl(π^{l,δ} ‖ π^{m,δ}) ≤ ‖θ_m − θ‖_{L²(T)} Ē^{1/2}_{L²(F)} + O((δĒ)²)
(perturbation of the attractor vs. a general initial value problem)
13. Improving probabilistic predictions by tuning attractor fidelity
π^{mme}_{α,t} = Σ_i α_i π^{m_i}_t
D^F_kl(π^δ ‖ π^{mme,δ}_α) = D^F_kl(π^δ ‖ π^{l,δ}) + D^F_kl(π^{l,δ} ‖ π^{mme,δ}_α)
D^F_kl(π^{l,δ} ‖ π^{mme,δ}_α) ≤ Σ_i α_i ‖θ_{m_i} − θ‖_{L²(T)} Ē^{1/2}_{L²(F)} + O((δĒ)²)
FACT [MB’15]: improving the attractor fidelity of the MME improves its predictions.
14. MME predictions are not always the “best”
π^{mme}_{α,t} = Σ_{i=1}^N α_i π^{m_i}_t,   π^{mme}_t = (1/N) Σ_{i=1}^N π^{m_i}_t
Prediction error: D_kl,t(π ‖ π^{mme}_t)
[Figure: prediction error vs. correlation time for the weighted and equal-weight ensembles]
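That a mixture can degrade a good forecast is immediate in the discrete setting: averaging a nearly perfect model with a strongly biased one increases the information loss. A minimal numpy sketch (densities invented for illustration):

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

pi = np.array([0.4, 0.2, 0.4])                    # truth
good = np.array([0.41, 0.2, 0.39])                # nearly perfect model
bad = np.array([0.8, 0.1, 0.1])                   # strongly biased model

mme = 0.5 * good + 0.5 * bad                      # equal-weight two-model ensemble
err_mix = kl(pi, mme)
err_good = kl(pi, good)
assert err_mix > err_good                         # the mixture degrades the forecast
```

So an MME is only as good as its weights and members; including a poor model with non-negligible weight can push the ensemble forecast past an information barrier that the best single model does not have.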
15. Outline of the rest:
More details on tools
Why MME ?
(Information) geometry of MME predictions
When MME ?
Example (attractor tuning): Turbulent transport of a passive tracer
Example (MME): Exactly solvable toy model with information barriers
Information bounds
16. Improving imperfect predictions via the MME approach
General formulation ( see Branicki & Majda, J. Nonlin. Sci. 2015 )
Thm. [statement shown as an image], where … and … are the least biased densities
maximising the Shannon entropy with …
17. Improving imperfect predictions via the MME approach
Necessary condition (“finite-dim” sketch): MME better than a single model m∗ when
D^I_kl(π ‖ π^{mme}_α) − D^I_kl(π ‖ π^{m∗}) < 0
18. Improving imperfect predictions via the MME approach
Use a single model or a mixture of models for best predictions?
Use the mixture when
D^I_kl(π ‖ π^{m∗}) > Σ_{i≠∗} [α_i/(1 − α∗)] D^I_kl(π ‖ π^{m_i});
otherwise use the single model.
[Figure: D_kl,t(π ‖ π^m) over time, with “use mixture”, “use single model”,
and “information barrier” regions]
20. Improving imperfect predictions via the MME approach
Simplified condition: D^I_kl(π ‖ π^{m∗}) + … > Σ_{i≠∗} α_i D^I_kl(π ‖ π^{m_i})
using
D_kl(π ‖ Σ_i α_i π^{m_i}) ≤ Σ_i α_i D_kl(π ‖ π^{m_i})
& D_kl(π ‖ π^{m_i}) = D_kl(π ‖ π^l) + D_kl(π^l ‖ π^{m_i})
MME better than m∗ when D^I_kl(π ‖ π^{mme}_α) − D^I_kl(π ‖ π^{m∗}) < 0
Two-model illustration: prediction error D_kl,t(π ‖ ½(π^{m∗}_t + π^{m₃}_t))
vs. D_kl,t(π ‖ π^{m∗}_t)
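The chain of reasoning on these slides — bound the mixture error by convexity, then compare with a reference model m∗ — can be sketched numerically. All densities and weights below are invented for illustration (numpy assumed):

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

pi = np.array([0.4, 0.2, 0.4])             # truth
models = [np.array([0.7, 0.2, 0.1]),       # m* : reference model with large error
          np.array([0.45, 0.2, 0.35]),     # m2 : decent model
          np.array([0.35, 0.25, 0.4])]     # m3 : decent model
alpha = np.array([0.2, 0.4, 0.4])          # MME weights

D = np.array([kl(pi, m) for m in models])
# sufficient condition (via convexity): MME beats m* = models[0] whenever
#   D(pi||m*) > sum_{i != *} alpha_i/(1 - alpha_*) D(pi||m_i)
threshold = np.sum(alpha[1:] * D[1:]) / (1.0 - alpha[0])
assert D[0] > threshold

mme = sum(a * m for a, m in zip(alpha, models))
assert kl(pi, mme) < D[0]                  # and the mixture indeed beats m*
```

The condition only needs the individual model errors D(π ‖ π^{m_i}) and the weights, which is what makes it practical: no joint geometry of the ensemble is required.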
21. Outline of the rest:
More details on tools
Why MME ?
(Information) geometry of MME predictions
When MME ?
Example (attractor tuning): Turbulent transport of a passive tracer
Example (MME): Exactly solvable toy model with information barriers
Information bounds
22. Reduced-order model & stochastic parameterisation
Exactly solvable test models for a turbulent tracer with realistic features
(Majda & Branicki, DCDS 2012)
Truth: ∂T/∂t + v(x,t)·∇T = κΔT
Non-Gaussian passive tracer with mean gradient: T = αy + T′(x,t)
Model: ∂T^M/∂t + v̄·∇T^M = (κ + κ_eddy)ΔT^M + σ_T Ẇ
Model improvement: D_kl(π ‖ π^{m∗}) = min_{m∈M} D_kl(π ‖ π^m)
[Figure: tracer spectrum and marginal p(T)]
23. Exactly solvable test models for turbulent tracer with realistic features
∂_t T + v(x,t)·∇T = κΔT,  T = αy + T′(x,t); Fourier modes T̂_k, marginal π(T′)
Identification of mechanisms for intermittency
Rigorous justification/critique of various turbulent closures
Non-local effects due to mean flow-fluctuation interactions
Majda & Branicki, DCDS 2012
[Figures: Fourier domain and physical-space snapshots]
24. Improving reduced-order models for turbulent tracer
Baby configuration: model improvement on attractor by simple noise inflation
Truth: ∂_t T + v·∇T = κΔT; model: ∂_t T^m + v^m·∇T^m = κ̃ΔT^m + σ_T Ẇ̃
Model error on the attractor, D^I_kl,att(π ‖ π^m_att), for models with optimised
noise is greatly reduced
[Figure: time series for modes k = 1 and k = 5, with σ_T = 0 vs. optimal σ_T]
25. Improving reduced-order models for turbulent tracer
Forced response for the attractor-tuned model with optimal noise
[Figure: responses of E[T] and Var(T) to the forcing δf over time]
26. Information-theoretic improvement of predictive skill of GCMs
(joint work with …; stalled for now …)
Linear response via FDT (fluctuation-dissipation relationships):
δū(t) = ∫_{t₀}^t R_ū(t − s) δf(s) ds
“Climate change” error:
D^δ_KL,t(π ‖ π^{m,δ}) ≈ (1/2σ²) ( ∫₀^t [R_ū(t − s) − R^m_ū(t − s)] δf(s) ds )²
  + (1/4σ⁴) ( ∫₀^t [R_{σ²}(t − s) − R^m_{σ²}(t − s)] δf(s) ds )² + O(δ³)
D^δ_KL,t(π ‖ π^{m,δ}_t) = H(π^{g,δ}_t) − H(π^δ_t)
27. Outline of the rest:
More details on tools
Why MME ?
(Information) geometry of MME predictions
When MME ?
Example (attractor tuning): Turbulent transport of a passive tracer
Example (MME): Exactly solvable toy model with information barriers
Information bounds
28. A simple linear Gaussian example
Perfect model:
u̇ = a u + v + F          (‘resolved’ dynamics)
v̇ = q u + A v + σ Ẇ      (‘unresolved’ dynamics)
Gaussian equilibrium if a + A < 0, aA − q > 0.
Imperfect model (Mean Stochastic Model):
u̇_m = −γ_m u_m + F_m + σ_m Ẇ_m
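The Gaussian equilibrium of the perfect model can be computed by solving the stationary Lyapunov equation directly, which also verifies the closed-form mean and variance used on the following slides. A numpy-only sketch (parameter values invented; only the noise on v is non-zero):

```python
import numpy as np

a, A, q, sig, F = -2.0, -1.0, 1.0, 0.8, 1.5   # a + A < 0 and aA - q > 0 hold
B = np.array([[a, 1.0], [q, A]])
Q = np.array([[0.0, 0.0], [0.0, sig**2]])     # noise acts on v only

# stationary covariance C solves the Lyapunov equation  B C + C B^T + Q = 0
I2 = np.eye(2)
M = np.kron(B, I2) + np.kron(I2, B)           # row-major vectorisation of the equation
C = np.linalg.solve(M, -Q.reshape(-1)).reshape(2, 2)

# stationary mean solves  B x + (F, 0)^T = 0
mean = np.linalg.solve(B, -np.array([F, 0.0]))

u_mean = -A * F / (a * A - q)                 # closed-form equilibrium mean of u
u_var = -sig**2 / (2 * (a + A) * (a * A - q)) # closed-form equilibrium variance of u
assert np.isclose(mean[0], u_mean)
assert np.isclose(C[0, 0], u_var)
```

Both closed forms are positive precisely under the stated conditions a + A < 0 and aA − q > 0, which is why they guarantee a well-defined Gaussian equilibrium.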
29. Tuning the marginal statistics on the attractor
Perfect model equilibrium: ū_∞ = −AF/(aA − q),  Var[u] = −σ²/(2(aA − q)(a + A))
Imperfect model: du_m/dt = −γ_m u_m + F_m + σ_m Ẇ_m,
with ū^m_∞ = F_m/γ_m,  Var[u_m] = σ_m²/(2γ_m)
Tuning the imperfect-model equilibrium statistics at the unperturbed equilibrium:
F_{m∗}/γ_{m∗} = −AF/(aA − q),   σ²_{m∗}/(2γ_{m∗}) = −σ²/(2(aA − q)(a + A))
This fixes (F_{m∗}, σ_{m∗}) with γ_m left free.
Infinite-time response to a perturbed forcing F + δF: does the tuned model capture it?
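That equilibrium fidelity leaves the sensitivity undetermined can be made concrete: matching the mean and variance constrains only the ratios F_m/γ_m and σ_m²/(2γ_m), so every damping γ_m > 0 gives a perfectly tuned MSM with a different forced response. A numpy sketch (parameters illustrative; the equilibrium formulas follow from the linear system on the previous slides):

```python
import numpy as np

a, A, q, sig, F = -2.0, -1.0, 1.0, 0.8, 1.5    # a + A < 0, aA - q > 0
u_mean = -A * F / (a * A - q)                   # equilibrium mean of u (truth)
u_var = -sig**2 / (2 * (a + A) * (a * A - q))   # equilibrium variance of u (truth)
resp_perfect = -A / (a * A - q)                 # infinite-time mean response du/dF (truth)

# Each gamma_m below yields an MSM with EXACTLY the right equilibrium statistics,
# yet a different infinite-time response 1/gamma_m.
for gamma_m in (0.5, 3.0):
    F_m = gamma_m * u_mean                      # mean matched: F_m/gamma_m = u_mean
    sig_m = np.sqrt(2 * gamma_m * u_var)        # variance matched: sig_m^2/(2 gamma_m) = u_var
    assert np.isclose(F_m / gamma_m, u_mean)
    assert np.isclose(sig_m**2 / (2 * gamma_m), u_var)
    resp_model = 1.0 / gamma_m
    assert not np.isclose(resp_model, resp_perfect)   # fidelity does not imply sensitivity
```

With these parameters the response would only match at γ_m = −(aA − q)/A = 1, which the loop deliberately skips.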
30. Model error & information barriers
u̇ = a u + v + F,  v̇ = q u + A v + σẆ  (with aA − q > 0);
u̇_m = −γ_m u_m + F_{m∗} + σ_{m∗} Ẇ_m,  γ_m > 0
Model error on the perturbed attractor:
P(π^{δF}, π^{m∗,δF}) ∝ ( A/(aA − q) + 1/γ_{m∗} )² |δF|²
A > 0: no minimum for finite γ_m > 0 — intrinsic barrier to improving sensitivity
A < 0: perturbed attractor fidelity and sensitivity captured for γ_{m∗} = −(aA − q)/A
More details in:
Majda & Branicki, Lessons in Uncertainty Quantification for Turbulent Dynamical
Systems, DCDS 2012
Branicki & Majda, Quantifying uncertainty for predictions with model errors in
non-Gaussian models with intermittency, Nonlinearity, 2012
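The dichotomy on this slide can be seen from the mean-response mismatch alone, assuming the infinite-time mean responses δū/δF = −A/(aA − q) for the truth and 1/γ_m for the MSM (both follow from the equilibrium means of the two linear systems). A numpy sketch with invented parameters:

```python
import numpy as np

def resp_err(a, A, q, gam):
    """Squared mismatch between the perfect infinite-time mean response,
    -A/(aA - q), and the MSM response 1/gam (per unit forcing perturbation)."""
    return (-A / (a * A - q) - 1.0 / gam)**2

# A < 0: no barrier -- gamma* = -(aA - q)/A > 0 removes the mismatch entirely
a, A, q = -2.0, -1.0, 1.0
gam_star = -(a * A - q) / A
assert np.isclose(resp_err(a, A, q, gam_star), 0.0)

# A > 0 (with a + A < 0 and aA - q > 0 still): mismatch bounded away from zero
a, A, q = -3.0, 0.5, -2.0
assert a + A < 0 and a * A - q > 0
gams = np.linspace(0.01, 100.0, 10000)
errs = np.array([resp_err(a, A, q, g) for g in gams])
assert errs.min() > 0.9 * (A / (a * A - q))**2   # floor set by the A-term: intrinsic barrier
```

For A > 0 both terms inside the square are positive for any γ_m > 0, so no overdamped MSM can reproduce the sign of the true response — the barrier is structural, not a tuning failure.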
31. Model error & information barriers in MME prediction
[Figure: model errors D_kl(π ‖ π^{m_i}) for the ensemble members]
33. MME prediction with information barrier (A > 0)
u̇ = a u + v + F,  v̇ = q u + A v + σẆ;  u̇_m = −γ_m u_m + F_{m∗} + σ_{m∗} Ẇ_m
The MME prediction does not reduce the information barrier
The infinite-time response can be improved for any overdamped ensemble
34. Information bounds
Consider sequences of measures {µ_t}_{t∈ℝ₊} and {ν_t}_{t∈ℝ₊} on spaces (M, B(M)),
where M is a separable Banach space (the state space), linked by the Markov
evolutions µ_t = (P^t_{t₀})∗ µ₀ and ν_t = (Q^t_{t₀})∗ ν₀,
with {(P^t_{t₀})∗}_{t,t₀∈I} and {(Q^t_{t₀})∗}_{t,t₀∈I}.
Time-point-wise uncertainty bounds for observables:
|E^{µ_t}[f] − E^{ν_t}[f]| ≤ ‖f‖_∞ √(2 D_KL(µ_t ‖ ν_t)),  f ∈ M_b(Ω)   (Pinsker)
|E^{µ_t}[f] − E^{ν_t}[f]| ≤ √2 ( E^{µ_t}[f²] + E^{ν_t}[f²] )^{1/2} √(D_KL(µ_t ‖ ν_t)),  f ∈ L²(Ω)
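The Pinsker-type bound can be stress-tested on random discrete distributions; Dirichlet draws are just a convenient way to get strictly positive probability vectors (numpy assumed, everything else illustrative):

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
for _ in range(200):
    p = rng.dirichlet(np.ones(5))            # strictly positive probability vectors
    q = rng.dirichlet(np.ones(5))
    f = rng.uniform(-1.0, 1.0, size=5)       # bounded observable
    gap = abs(p @ f - q @ f)
    bound = np.max(np.abs(f)) * np.sqrt(2.0 * kl(p, q))
    assert gap <= bound + 1e-12              # |E_p f - E_q f| <= |f|_inf sqrt(2 D_KL(p||q))
```

The bound combines |E_p f − E_q f| ≤ 2‖f‖_∞ TV(p, q) with Pinsker's inequality TV ≤ √(D_KL/2), so it can never fail, only be loose.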
35. Information bounds (B., Uda ’17)
µ_t, ν_t probability measures on (M, B(M)), where M is the state space;
D^I_kl(µ_t ‖ ν_t) := ∫_I D_kl(µ_t ‖ ν_t) dt.
Uncertainty bounds for observables over a time interval:
‖E^{µ_t}[f] − E^{ν_t}[f]‖_{L²(I)} ≤ ‖f‖_∞ √(2 D^I_kl(µ_t ‖ ν_t))
Uncertainty bounds for path-based observables f(φ^t_{t₀}(u; ω_u)):
|E^{µ_t}[(φ^t_{t₀})∗ f] − E^{µ_t}[f]| ≤ D_kl(µ_t ‖ µ₀)
‖E^{µ_t}[(φ^t_{t₀})∗ f] − E^{µ_t}[f]‖_{L¹(I)} ≤ D^I_kl(µ_t ‖ µ₀)
These utilise the variational definition of the KL-divergence (Donsker & Varadhan):
D_kl(µ ‖ ν) = sup_{f∈M_b(Ω)} ( E^µ[f] − log E^ν[e^f] )
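The Donsker–Varadhan variational formula can be checked directly in the discrete case: every bounded f yields a lower bound on the divergence, with equality at f = log(dµ/dν). A small numpy sketch (densities invented):

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

rng = np.random.default_rng(0)
for _ in range(100):
    f = rng.normal(size=3)                   # arbitrary observable on 3 states
    dv = p @ f - np.log(q @ np.exp(f))       # Donsker-Varadhan functional
    assert dv <= kl(p, q) + 1e-12            # every f gives a lower bound on D_kl

f_star = np.log(p / q)                       # the supremum is attained at log(dp/dq)
assert np.isclose(p @ f_star - np.log(q @ np.exp(f_star)), kl(p, q))
```

This is what makes the variational form useful for bounds: any convenient trial observable f immediately certifies a lower bound on the lack of information.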
36. Information bounds on perturbations
µ_t probability measure on (M, B(M)), where M is the state space.
Assuming that µ^{θ+δθ}_t ≪ µ^θ_t:
‖E^{µ^θ_t}[f] − E^{µ^{θ+δθ}_t}[f]‖_{L²(I)} ≤ ‖f‖_∞ √(2 D^I_kl(µ^θ_t ‖ µ^{θ+δθ}_t))
37. Information bounds
µ_I, ν_I probability measures on (M, B(M)), where M is the path-space over an interval I.
Definition [KL-rate]: Let µ_I and ν_I be measures on path space. Then the KL-rate is
defined as
𝒟_kl(µ_I ‖ ν_I) := lim_{|I|→∞} (1/|I|) D_kl(µ_I ‖ ν_I)
if it exists.
Thm. [Shannon]: Let {X_t}_{t∈ℕ₊} and {Y_t}_{t∈ℕ₊} be two Markov chains with path
measures µ_I and ν_I. If {X_t}_{t∈ℕ₊} is stationary with invariant measure µ, then
the KL-rate exists and
D_kl(µ_I ‖ ν_I) = |I| 𝒟_kl(µ_I ‖ ν_I) + D_kl(µ ‖ ν₀)
for an arbitrary initial measure ν₀ of {Y_t}_{t∈ℕ₊}. Consequently,
|E^{µ_I}[f] − E^{ν_I}[f]| ≤ ‖f‖_∞ √(2 D_kl(µ_I ‖ ν_I))
  ≤ ‖f‖_∞ √(2 (|I| 𝒟_kl(µ_I ‖ ν_I) + D_kl(µ ‖ ν₀)))
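The decomposition in the theorem — path-space KL equals |I| times the KL-rate plus the initial-condition term — can be verified exactly for small two-state Markov chains by enumerating all paths. A sketch (transition matrices invented; stationarity of the "truth" chain is enforced by starting it from its invariant measure):

```python
import itertools
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])   # 'truth' transition matrix
Q = np.array([[0.7, 0.3], [0.4, 0.6]])   # model transition matrix

# invariant measure of P:  mu P = mu  (Perron eigenvector of P^T)
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmax(np.real(w))])
mu /= mu.sum()
nu0 = np.array([0.5, 0.5])               # arbitrary initial law for the model chain

def path_kl(T):
    """Exact KL divergence between the two path measures over t = 0..T."""
    d = 0.0
    for path in itertools.product(range(2), repeat=T + 1):
        pp, qq = mu[path[0]], nu0[path[0]]
        for s, t in zip(path[:-1], path[1:]):
            pp *= P[s, t]
            qq *= Q[s, t]
        d += pp * np.log(pp / qq)
    return d

# KL-rate: sum_x mu(x) D_kl( P(x,.) || Q(x,.) )
rate = sum(mu[x] * P[x, y] * np.log(P[x, y] / Q[x, y])
           for x in range(2) for y in range(2))
d0 = sum(mu[x] * np.log(mu[x] / nu0[x]) for x in range(2))

T = 5
assert np.isclose(path_kl(T), T * rate + d0)   # D(mu_I || nu_I) = |I| rate + D(mu || nu_0)
```

The linear growth in |I| is why path-space bounds on observables inevitably degrade with the length of the prediction window unless the transition kernels themselves agree.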
38. Summary:
Sufficient condition for improving imperfect predictions via the MME approach,
obtained within the information-theoretic framework
Natural synergy between the information-theoretic framework and empirical data
If correctly implemented, the MME framework is useful for improving the forced
response of the unknown truth dynamics based solely on information from its
statistical equilibrium
The information-theoretic framework is useful for UQ on reduced subspaces
of dynamical variables
Systematic framework for dimensionality reduction and ‘information retainment’,
depending on the amount/quality of available data and computational cost
The framework is naturally suited to dealing with model error and partial
observability of the true dynamics
For a general initial value problem, the MME framework has to be combined with
filtering/data assimilation algorithms
Path-space framework in development, including more detailed measures of
predictive fidelity
39. References:
Branicki, Information theory in prediction of complex systems, Enc. Applied Math, 2015
Branicki & Majda, An information-theoretic framework for improving Multi-Model
Ensemble forecasts, J. Nonlin. Sci., 2015
Branicki & Majda, Quantifying Bayesian filter performance for turbulent dynamical
systems via information theory, Comm. Math. Sci., 2014
Branicki & Majda, Quantifying uncertainty for predictions with model errors in
non-Gaussian models with intermittency, Nonlinearity, 2012
Majda & Branicki, Lessons in UQ for Turbulent Dynamical Systems, DCDS, 2012
Branicki, Chen & Majda, Non-Gaussian test models for prediction and state estimation
with model errors, Chinese Ann. Math., 2013
Majda & Gershgorin, The Link Between Statistical Equilibrium Fidelity and Forecasting
Skill for Complex Systems with Model Error, PNAS, 2011
Majda & Gershgorin, Improving Model Fidelity and Sensitivity for Complex Systems
through Empirical Information Theory, PNAS, 2011