Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics Opening Workshop, Bayesian Probabilistic Numerical Methods (Part I) - Chris Oates, Aug 29, 2017
In this work, numerical computation - such as the numerical solution of a PDE - is treated as an inverse problem in its own right. The popular Bayesian approach to inversion is considered, wherein a posterior distribution is induced over the object of interest by conditioning a prior distribution on the same finite information that would be used in a classical numerical method. The main technical consideration is that the data in this context are non-random, so the standard Bayes' theorem does not hold. General conditions will be presented under which such Bayesian probabilistic numerical methods are well-posed, and a sequential Monte Carlo method will be shown to provide consistent estimation of the posterior. The paradigm is extended to computational "pipelines", through which a distributional quantification of numerical error can be propagated. A sufficient condition is presented for when such propagation can be endowed with a globally coherent Bayesian interpretation, based on a novel class of probabilistic graphical models designed to represent a computational workflow. The concepts are illustrated through explicit numerical experiments involving both linear and non-linear PDE models. Full details are available in arXiv:1702.03673.
2. The SAMSI Working Group on Probabilistic Numerics
François-Xavier Briol (Warwick), Oksana Chkrebtii (Ohio State), Jon Cockayne (Warwick), Mark Girolami (Imperial), Philipp Hennig (MPI Tübingen),
Han Cheng Lie (FU Berlin), Houman Owhadi (Caltech), Florian Schaefer (Caltech), Andrew Stuart (Caltech), Tim Sullivan (FU Berlin)
3. Motivation
Consider the task of solving the PDE:
−∆u = f on Ω
u = 0 on ∂Ω.
Given an approximate solution u_n we can obtain an a posteriori error bound: for all β > 0 and all y,
‖∇(u − u_n)‖² ≤ (1 + β) ‖∇u_n − ∇y‖² + ((1 + β)/β) C_Ω² ‖∆y + f‖²,   C_Ω = diam(Ω).
The right-hand side, the "deviation majorant", does not involve u.
Babuška and Rheinboldt (1978) A posteriori error estimates for the finite element method. Cited 1378.
Ainsworth and Oden (2011) A posteriori error estimation in finite element analysis. Cited 2252.
Problem: ‖∆y + f‖ is a quadrature, and in a pipeline our computational budget will be limited.
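The majorant can be checked on a 1D instance. Below is a minimal sketch, not from the talk, under one reading of the slide's notation: y is a scalar field whose gradient plays the role of the flux, C_Ω = diam(Ω) = 1, and norms are L²(0,1). We solve −u″ = f with known u, take a hypothetical perturbed approximation u_n and an arbitrary candidate y, and confirm the bound holds for several β.

```python
import numpy as np

# 1D model problem: -u'' = f on (0,1), u(0) = u(1) = 0.
x = np.linspace(0.0, 1.0, 4001)
u = np.sin(np.pi * x)                      # exact solution (unknown in practice)
f = np.pi**2 * np.sin(np.pi * x)           # forcing term
u_n = u + 0.05 * np.sin(3 * np.pi * x)     # hypothetical approximate solution
y = 0.9 * np.sin(np.pi * x)                # arbitrary candidate field

def sq_norm(g):
    """Squared L2(0,1) norm via the trapezoidal rule."""
    return float(np.sum(0.5 * (g[:-1]**2 + g[1:]**2) * np.diff(x)))

grad = lambda g: np.gradient(g, x)

lhs = sq_norm(grad(u - u_n))               # true energy error
C = 1.0                                    # C_Omega = diam(Omega) = 1
for beta in [0.5, 1.0, 2.0]:
    majorant = (1 + beta) * sq_norm(grad(u_n) - grad(y)) \
             + (1 + beta) / beta * C**2 * sq_norm(grad(grad(y)) + f)
    assert lhs <= majorant                 # the majorant never involves u itself
```

Note that evaluating ‖∆y + f‖² is itself a quadrature over Ω, which is exactly the budget problem the slide raises.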
7. Computational Pipelines
[Diagram: the data f feeds forward into the numerical solution of the PDE −∆u = f.]
Computational pipelines are efficient precisely because "going back is not allowed".
⟹ a posteriori error bounds are precluded.
9. Information-based complexity viewpoint:
A_n(f) = (f(x_1), . . . , f(x_n))ᵀ, based on a size-n computational budget.
Consider a numerical solution u_n based on the information A_n(f).
Problem: it is impossible to get a computable bound on ‖u − u_n‖ based only on A_n(f).
12. Proof: One cannot distinguish between f_1, f_2 ∈ H⁻¹(D) such that
f_1 = 0 on D = [0, 1],
f_2 = (2 / ((b − a)(2 − a − b))) · 1[a < x < b], with (a, b) chosen so that {x_1, . . . , x_n} ∩ (a, b) = ∅,
since A_n(f_1) = A_n(f_2). Yet these yield wildly different solutions:
u_1(x) = 0, so ‖u_1‖ = 0;
u_2(x) = x for 0 < x < a,
u_2(x) = x − (x − a)² / ((b − a)(2 − a − b)) for a < x < b,
u_2(x) = (a + b)(1 − x) / (2 − a − b) for b < x < 1,
so ‖u_2‖ ≥ a^{3/2} / 3^{1/2}.
Moral: a posteriori error analysis requires global information on f, such as ‖f‖.
How to proceed when global information cannot be obtained?
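The counterexample is easy to verify numerically. A sketch with illustrative choices of n, a, and b: the two forcings agree at every evaluation point, yet ‖u_2‖ exceeds the stated lower bound a^{3/2}/√3 (which comes from ‖u_2‖² ≥ ∫₀ᵃ x² dx = a³/3).

```python
import numpy as np

n = 10
nodes = np.linspace(0.0, 1.0, n + 2)[1:-1]         # evaluation points x_1..x_n
a, b = 0.01, 0.04                                   # a "bump" interval avoiding every node
assert not np.any((nodes > a) & (nodes < b))

f1 = lambda x: np.zeros_like(x)
f2 = lambda x: 2.0 / ((b - a) * (2 - a - b)) * ((x > a) & (x < b))

# Identical information: A_n(f1) == A_n(f2).
assert np.array_equal(f1(nodes), f2(nodes))

# Wildly different solutions: u1 = 0, while u2 is given piecewise.
xs = np.linspace(0.0, 1.0, 100001)
u2 = np.where(xs < a, xs,
     np.where(xs < b, xs - (xs - a)**2 / ((b - a) * (2 - a - b)),
              (a + b) * (1 - xs) / (2 - a - b)))
norm_u2 = np.sqrt(np.sum(0.5 * (u2[:-1]**2 + u2[1:]**2) * np.diff(xs)))

# ||u2|| >= a^(3/2) / 3^(1/2), as claimed on the slide.
assert norm_u2 >= a**1.5 / 3**0.5
```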
16. Idea: Exploit domain-specific subjective prior belief.
Indeed, if we model f as a draw from a distribution P_{f,n}, then f has statistical properties which can be leveraged.
⟹ statistical analogue of a posteriori error analysis
⟹ statistical local error indicators, etc.
How to select P_{f,n}? To be useful, P_{f,n} should depend on A_n(f).
A natural (Bayesian) approach:
P_{f,n} ∝ P_f ("prior") × δ_{A_n(f)} ("likelihood").
Randomness is used as an allegorical device to represent epistemic uncertainty.
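When P_f is a Gaussian measure and A_n(f) consists of point evaluations, conditioning on the Dirac "likelihood" δ_{A_n(f)} reduces to noiseless Gaussian process interpolation. A minimal sketch (the squared-exponential kernel, length-scale, and test function are illustrative choices, not from the talk):

```python
import numpy as np

def k(a, b, ell=0.2):
    """Squared-exponential covariance (an illustrative prior choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

f = lambda x: np.sin(2 * np.pi * x)         # hypothetical forcing term
xn = np.linspace(0.0, 1.0, 9)               # the n evaluation points used by A_n
an = f(xn)                                  # the information A_n(f) = a

xs = np.linspace(0.0, 1.0, 201)
K = k(xn, xn) + 1e-10 * np.eye(len(xn))     # jitter for numerical stability
w = np.linalg.solve(K, an)
mean = k(xs, xn) @ w                        # posterior mean of P_{f,n}
var = np.clip(np.diag(k(xs, xs)
              - k(xs, xn) @ np.linalg.solve(K, k(xn, xs))), 0, None)

# The posterior respects the information exactly: it interpolates A_n(f),
# and its pointwise variance quantifies epistemic uncertainty between nodes.
assert np.allclose(k(xn, xn) @ w, an, atol=1e-6)
```

The posterior variance `var` is the quantity that feeds statistical local error indicators.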
23. Bayesian Probabilistic Numerical Methods
In a Bayesian probabilistic numerical method:
a prior measure P_f is placed on f;
a posterior measure P_{f,n} is defined as the "restriction of P_f to those functions f for which A_n(f) = a is satisfied" (this needs to be formalised), e.g. A_n(f) = (f(x_1), . . . , f(x_n))ᵀ = a;
these are equivalent to prior and posterior measures P_u and P_{u,n}[a] on the solution space of the PDE.
⟹ principled and general uncertainty quantification for numerical methods.
⟹ probabilistic quantification of numerical error that can be propagated forward.
28. Example
Consider again the linear PDE
−∆u = f_1 on Ω
u = f_2 on ∂Ω.
Place a Gaussian prior P_u and condition on
A_n(f) = (. . . , f_1(·), . . . , f_2(·), . . . )ᵀ = a.
⟹ a Gaussian conditional distribution P_{u,n}[a].
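This example can be sketched numerically in 1D by discretising. The following assumptions are mine, not from the talk: a squared-exponential prior covariance on a grid, a finite-difference Laplacian standing in for −∆, and a small jitter term; the paper conditions the Gaussian measure through the differential operator analytically rather than on a grid.

```python
import numpy as np

m = 101
xs = np.linspace(0.0, 1.0, m)
h = xs[1] - xs[0]
K = np.exp(-0.5 * (xs[:, None] - xs[None, :])**2 / 0.2**2)   # prior cov of P_u

# Second-difference approximation of the Laplacian (interior rows only).
D2 = np.zeros((m, m))
for i in range(1, m - 1):
    D2[i, i - 1], D2[i, i], D2[i, i + 1] = 1.0, -2.0, 1.0
D2 /= h**2

obs = np.arange(10, m - 10, 10)                    # interior points where f_1 is seen
A = np.vstack([-D2[obs], np.eye(m)[[0, m - 1]]])   # rows: -Delta u at obs; u on boundary
f1 = np.pi**2 * np.sin(np.pi * xs)                 # chosen so the truth is u = sin(pi x)
a = np.concatenate([f1[obs], [0.0, 0.0]])          # the information vector a

S = A @ K @ A.T + 1e-8 * np.eye(len(a))
mean = K @ A.T @ np.linalg.solve(S, a)             # mean of the Gaussian P_{u,n}[a]

# The posterior mean recovers the PDE solution from finite information alone.
assert np.max(np.abs(mean - np.sin(np.pi * xs))) < 0.1
```

The same linear algebra yields the posterior covariance K − K Aᵀ S⁻¹ A K, which quantifies the numerical error distributionally.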
29. Outline of the Research
Bayesian Probabilistic Numerical Methods, Cockayne, Oates, Sullivan, Girolami (2017), arXiv:1702.03673
1. Elicit the Abstract Structure . . .
2. Establish Well-Definedness, Existence and Uniqueness of P_{u,n}[a]
3. Characterise the Optimal Information Operator A_n ← next
4. Algorithms to Sample from P_{u,n}[a]
5. Extend to Pipelines of Computation
30. Optimal Information
Consider an information operator A_n(f) = (f(x_1), . . . , f(x_n))ᵀ.
The aim is to select locations x_1, . . . , x_n that are optimal in the sense that
{x_1, . . . , x_n} ∈ arg inf ∫∫ L(ω, u(f)) dP_{u,n}[A_n(f)](ω) dP_f(f).
⟹ L is a loss function on the solution space of the PDE that must be specified.
⟹ L(u, u′) = |u − u′| corresponds to the Wasserstein metric d(P_{u,n}, δ(u)).
⟹ this is not equivalent to the Bayes risk from decision theory.
⟹ optimal information for Bayesian probabilistic numerics ≠ Bayesian decision theory.
34. An adversary picks a card at random and our goal is to ascertain whether the suit of their card was ♥ under 0-1 loss, i.e. f ∼ uniform({♥, ♦, ♣, ♠}). Consider two possible experiments:

Experiment | Bayes' risk (maximum a posteriori point estimate) | Probabilistic numerics risk (full posterior)
Q: Is it red? | 1/4 | (1/4)(1/2) + (1/4)(1/2) + (1/4)(0) + (1/4)(0) = 1/4
Q: Is it ♠? | 1/4 | (1/4)(2/3) + (1/4)(1/3) + (1/4)(1/3) + (1/4)(0) = 1/3

The Bayes' risk is 1/4 in both cases, since "¬♥" is always a posterior mode; the four probabilistic numerics terms average p(wrong ♥-call | A(c)) over the cards c = ♥, ♦, ♣, ♠.
⟹ optimal information for Bayesian probabilistic numerics ≠ Bayesian decision theory.
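The two risk columns can be reproduced with exact arithmetic. A minimal sketch of the slide's setup: under 0-1 loss, the probabilistic numerics risk scores a draw from the full posterior (the average posterior probability of calling the ♥-status wrongly) rather than the MAP point estimate.

```python
from fractions import Fraction as F

suits = ["H", "D", "C", "S"]    # hearts, diamonds, clubs, spades

def posterior(question, card):
    """Uniform posterior over the suits consistent with the yes/no answer."""
    consistent = [s for s in suits if question(s) == question(card)]
    return {s: F(1, len(consistent)) for s in consistent}

def pn_risk(question):
    """Average posterior probability of the wrong call about 'is it H?'."""
    total = F(0)
    for card in suits:
        p_heart = posterior(question, card).get("H", F(0))
        total += F(1, 4) * ((1 - p_heart) if card == "H" else p_heart)
    return total

is_red = lambda s: s in ("H", "D")
is_spade = lambda s: s == "S"

assert pn_risk(is_red) == F(1, 4)      # first row of the table
assert pn_risk(is_spade) == F(1, 3)    # second row of the table
```

Both questions have Bayes' risk 1/4 (the MAP call "¬♥" errs only when the card is ♥), yet "Is it red?" is strictly better for the full-posterior risk, so the two optimality criteria disagree.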
46. Average Case Analysis ↔(∗) Bayesian Decision Theory ↔? Bayesian Probabilistic Numerical Methods
(∗) Kadane and Wasilkowski (1985) Average Case ε-Complexity in Computer Science: A Bayesian View.
47. [Figure: risk sets and contours of constant average risk, comparing the Bayes rule (decision theory) with the optimal method (Bayesian probabilistic numerics).]
Theorem: Let u(f) be the quantity of interest. Assume that u(f) belongs to an inner-product space with associated norm ‖·‖ and consider the canonical loss L(u, u′) = ‖u − u′‖². Then optimal information for Bayesian probabilistic numerics = Bayesian decision theory (= average case analysis).
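The mechanism behind the theorem can be seen in a toy Gaussian sketch (mine, not from the paper): under squared-error loss, scoring a draw from the posterior instead of the posterior mean adds exactly the expected posterior trace, which for linear-Gaussian information does not depend on the observed data, so both criteria rank designs identically.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 1.0], [1.0, 1.5]])    # prior covariance of u in R^2
N = 100_000

u = rng.multivariate_normal(np.zeros(2), Sigma, size=N)       # prior draws
# Information: observe the first coordinate, A(u) = u[0] (noiseless, linear).
post_mean = np.outer(u[:, 0] / Sigma[0, 0], Sigma[0])         # E[u | u_0] per sample
C = Sigma - np.outer(Sigma[:, 0], Sigma[0, :]) / Sigma[0, 0]  # posterior cov (data-free)
omega = post_mean + rng.multivariate_normal(np.zeros(2), C + 1e-12 * np.eye(2), size=N)

bayes_risk = np.mean(np.sum((post_mean - u)**2, axis=1))  # score the point estimate
pn_risk = np.mean(np.sum((omega - u)**2, axis=1))         # score a posterior draw

# Under the canonical loss: pn_risk = 2 * bayes_risk = 2 * tr(C), and tr(C)
# does not depend on the data, so the same design minimises both risks.
assert abs(bayes_risk - np.trace(C)) < 0.05
assert abs(pn_risk - 2 * np.trace(C)) < 0.1
```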
51. Example
For the linear PDE
−∆u = f_1 on Ω
u = f_2 on ∂Ω
we can consider the loss function L(u, u′) = ‖u − u′‖²_{L²(Ω)}.
Corollary: The interior and boundary point sets are asymptotically optimal iff h_1 ∨ h_2 = O(n^{−1/2}), where
h_1 = max_{x∈Ω} min_i ‖x − x_i‖₂,   h_2 = max_{x∈∂Ω} min_i ‖x − x_i‖₂.
Wendland (2005) Scattered Data Approximation.
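The fill distances h_1, h_2 in the corollary can be estimated numerically. A sketch with illustrative choices (uniform grids in Ω = [0,1]² as the point sets; the fill distance is approximated by searching a dense candidate set): quadrupling n should roughly halve h, consistent with h = O(n^{−1/2}).

```python
import numpy as np

def fill_distance(points, domain_samples):
    """h = max over the domain of the distance to the nearest design point."""
    d = np.linalg.norm(domain_samples[:, None, :] - points[None, :, :], axis=2)
    return d.min(axis=1).max()

g = np.linspace(0.0, 1.0, 101)
dense = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # domain samples

hs = []
for k in [4, 8, 16]:                        # n = k^2 grid points
    t = np.linspace(0.0, 1.0, k)
    pts = np.stack(np.meshgrid(t, t), axis=-1).reshape(-1, 2)
    hs.append(fill_distance(pts, dense))

# h = O(n^{-1/2}): going from n to 4n should roughly halve the fill distance.
for h_n, h_4n in zip(hs, hs[1:]):
    assert 0.3 < h_4n / h_n < 0.7
```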
52. Conclusion
In Part I it has been argued that:
Efficient large-scale computation precludes popular a posteriori methods.
Probabilistic numerical methods provide a principled alternative framework.
Optimal information for Bayesian probabilistic numerical methods is not always equivalent to
optimal information in Bayesian decision theory.
Full details (Parts I and II) can be found in the preprint:
Bayesian Probabilistic Numerical Methods
Cockayne, Oates, Sullivan, Girolami (2017)
arXiv:1702.03673
Thank you for your attention!