Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics Opening Workshop, Bayesian Probabilistic Numerical Methods (Part I) - Chris Oates, Aug 29, 2017
In this work, numerical computation - such as the numerical solution of a PDE - is treated as an inverse problem in its own right. The popular Bayesian approach to inversion is considered, wherein a posterior distribution is induced over the object of interest by conditioning a prior distribution on the same finite information that would be used in a classical numerical method. The main technical consideration is that the data in this context are non-random, so the standard Bayes' theorem does not hold. General conditions will be presented under which such Bayesian probabilistic numerical methods are well-posed, and a sequential Monte Carlo method will be shown to provide consistent estimation of the posterior. The paradigm is extended to computational "pipelines", through which a distributional quantification of numerical error can be propagated. A sufficient condition is presented for when such propagation can be endowed with a globally coherent Bayesian interpretation, based on a novel class of probabilistic graphical models designed to represent a computational workflow. The concepts are illustrated through explicit numerical experiments involving both linear and non-linear PDE models. Full details are available in arXiv:1702.03673.
2. The SAMSI Working Group on Probabilistic Numerics
François-Xavier Briol (Warwick), Oksana Chkrebtii (Ohio State), Jon Cockayne (Warwick), Mark Girolami (Imperial), Philipp Hennig (MPI Tübingen),
Han Cheng Lie (FU Berlin), Houman Owhadi (Caltech), Florian Schaefer (Caltech), Andrew Stuart (Caltech), Tim Sullivan (FU Berlin)
3. Motivation
Consider the task of solving the PDE:
−∆u = f on Ω
u = 0 on ∂Ω.
Given an approximate solution u_n we can obtain an a posteriori error bound: for all β > 0 and all y,
‖∇(u − u_n)‖² ≤ (1 + β) ‖∇u_n − ∇y‖² + ((1 + β)/β) C_Ω² ‖∆y + f‖²,   C_Ω = diam(Ω).
The right-hand side, the "deviation majorant", does not involve u.
Babuška and Rheinboldt (1978) A posteriori error estimates for the finite element method. Cited 1378.
Ainsworth and Oden (2011) A posteriori error estimation in finite element analysis. Cited 2252.
Problem: ‖∆y + f‖ is a quadrature, and in a pipeline our computational budget will be limited.
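The majorant can be checked on a 1D instance. Below is a minimal sketch, not from the talk, under one reading of the slide's notation: y is a scalar field whose gradient plays the role of the flux, C_Ω = diam(Ω) = 1, and norms are L²(0,1). We solve −u″ = f with known u, take a hypothetical perturbed approximation u_n and an arbitrary candidate y, and confirm the bound holds for several β.

```python
import numpy as np

# 1D model problem: -u'' = f on (0,1), u(0) = u(1) = 0.
x = np.linspace(0.0, 1.0, 4001)
u = np.sin(np.pi * x)                      # exact solution (unknown in practice)
f = np.pi**2 * np.sin(np.pi * x)           # forcing term
u_n = u + 0.05 * np.sin(3 * np.pi * x)     # hypothetical approximate solution
y = 0.9 * np.sin(np.pi * x)                # arbitrary candidate field

def sq_norm(g):
    """Squared L2(0,1) norm via the trapezoidal rule."""
    return float(np.sum(0.5 * (g[:-1]**2 + g[1:]**2) * np.diff(x)))

grad = lambda g: np.gradient(g, x)

lhs = sq_norm(grad(u - u_n))               # true energy error
C = 1.0                                    # C_Omega = diam(Omega) = 1
for beta in [0.5, 1.0, 2.0]:
    majorant = (1 + beta) * sq_norm(grad(u_n) - grad(y)) \
             + (1 + beta) / beta * C**2 * sq_norm(grad(grad(y)) + f)
    assert lhs <= majorant                 # the majorant never involves u itself
```

Note that evaluating ‖∆y + f‖² is itself a quadrature over Ω, which is exactly the budget problem the slide raises.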
7. Computational Pipelines
[Diagram: the data f feeds forward into the numerical solution of the PDE −∆u = f.]
Computational pipelines are efficient precisely because "going back is not allowed".
⟹ a posteriori error bounds are precluded.
9. Information-based complexity viewpoint:
A_n(f) = (f(x_1), . . . , f(x_n))ᵀ, based on a size-n computational budget.
Consider a numerical solution u_n based on the information A_n(f).
Problem: it is impossible to get a computable bound on ‖u − u_n‖ based only on A_n(f).
12. Proof: One cannot distinguish between f_1, f_2 ∈ H⁻¹(D) such that
f_1 = 0 on D = [0, 1],
f_2 = (2 / ((b − a)(2 − a − b))) · 1[a < x < b], with (a, b) chosen so that {x_1, . . . , x_n} ∩ (a, b) = ∅,
since A_n(f_1) = A_n(f_2). Yet these yield wildly different solutions:
u_1(x) = 0, so ‖u_1‖ = 0;
u_2(x) = x for 0 < x < a,
u_2(x) = x − (x − a)² / ((b − a)(2 − a − b)) for a < x < b,
u_2(x) = (a + b)(1 − x) / (2 − a − b) for b < x < 1,
so ‖u_2‖ ≥ a^{3/2} / 3^{1/2}.
Moral: a posteriori error analysis requires global information on f, such as ‖f‖.
How to proceed when global information cannot be obtained?
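The counterexample is easy to verify numerically. A sketch with illustrative choices of n, a, and b: the two forcings agree at every evaluation point, yet ‖u_2‖ exceeds the stated lower bound a^{3/2}/√3 (which comes from ‖u_2‖² ≥ ∫₀ᵃ x² dx = a³/3).

```python
import numpy as np

n = 10
nodes = np.linspace(0.0, 1.0, n + 2)[1:-1]         # evaluation points x_1..x_n
a, b = 0.01, 0.04                                   # a "bump" interval avoiding every node
assert not np.any((nodes > a) & (nodes < b))

f1 = lambda x: np.zeros_like(x)
f2 = lambda x: 2.0 / ((b - a) * (2 - a - b)) * ((x > a) & (x < b))

# Identical information: A_n(f1) == A_n(f2).
assert np.array_equal(f1(nodes), f2(nodes))

# Wildly different solutions: u1 = 0, while u2 is given piecewise.
xs = np.linspace(0.0, 1.0, 100001)
u2 = np.where(xs < a, xs,
     np.where(xs < b, xs - (xs - a)**2 / ((b - a) * (2 - a - b)),
              (a + b) * (1 - xs) / (2 - a - b)))
norm_u2 = np.sqrt(np.sum(0.5 * (u2[:-1]**2 + u2[1:]**2) * np.diff(xs)))

# ||u2|| >= a^(3/2) / 3^(1/2), as claimed on the slide.
assert norm_u2 >= a**1.5 / 3**0.5
```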
16. Idea: Exploit domain-specific subjective prior belief.
Indeed, if we model f as a draw from a distribution P_{f,n}, then f has statistical properties which can be leveraged.
⟹ statistical analogue of a posteriori error analysis
⟹ statistical local error indicators, etc.
How to select P_{f,n}? To be useful, P_{f,n} should depend on A_n(f).
A natural (Bayesian) approach:
P_{f,n} ∝ P_f ("prior") × δ_{A_n(f)} ("likelihood").
Randomness is used as an allegorical device to represent epistemic uncertainty.
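When P_f is a Gaussian measure and A_n(f) consists of point evaluations, conditioning on the Dirac "likelihood" δ_{A_n(f)} reduces to noiseless Gaussian process interpolation. A minimal sketch (the squared-exponential kernel, length-scale, and test function are illustrative choices, not from the talk):

```python
import numpy as np

def k(a, b, ell=0.2):
    """Squared-exponential covariance (an illustrative prior choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

f = lambda x: np.sin(2 * np.pi * x)         # hypothetical forcing term
xn = np.linspace(0.0, 1.0, 9)               # the n evaluation points used by A_n
an = f(xn)                                  # the information A_n(f) = a

xs = np.linspace(0.0, 1.0, 201)
K = k(xn, xn) + 1e-10 * np.eye(len(xn))     # jitter for numerical stability
w = np.linalg.solve(K, an)
mean = k(xs, xn) @ w                        # posterior mean of P_{f,n}
var = np.clip(np.diag(k(xs, xs)
              - k(xs, xn) @ np.linalg.solve(K, k(xn, xs))), 0, None)

# The posterior respects the information exactly: it interpolates A_n(f),
# and its pointwise variance quantifies epistemic uncertainty between nodes.
assert np.allclose(k(xn, xn) @ w, an, atol=1e-6)
```

The posterior variance `var` is the quantity that feeds statistical local error indicators.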
23. Bayesian Probabilistic Numerical Methods
In a Bayesian probabilistic numerical method:
a prior measure P_f is placed on f;
a posterior measure P_{f,n} is defined as the "restriction of P_f to those functions f for which A_n(f) = a is satisfied" (this needs to be formalised), e.g. A_n(f) = (f(x_1), . . . , f(x_n))ᵀ = a;
these are equivalent to prior and posterior measures P_u and P_{u,n}[a] on the solution space of the PDE.
⟹ principled and general uncertainty quantification for numerical methods.
⟹ probabilistic quantification of numerical error that can be propagated forward.
28. Example
Consider again the linear PDE
−∆u = f_1 on Ω
u = f_2 on ∂Ω.
Place a Gaussian prior P_u and condition on
A_n(f) = (. . . , f_1(·), . . . , f_2(·), . . . )ᵀ = a.
⟹ a Gaussian conditional distribution P_{u,n}[a].
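This example can be sketched numerically in 1D by discretising. The following assumptions are mine, not from the talk: a squared-exponential prior covariance on a grid, a finite-difference Laplacian standing in for −∆, and a small jitter term; the paper conditions the Gaussian measure through the differential operator analytically rather than on a grid.

```python
import numpy as np

m = 101
xs = np.linspace(0.0, 1.0, m)
h = xs[1] - xs[0]
K = np.exp(-0.5 * (xs[:, None] - xs[None, :])**2 / 0.2**2)   # prior cov of P_u

# Second-difference approximation of the Laplacian (interior rows only).
D2 = np.zeros((m, m))
for i in range(1, m - 1):
    D2[i, i - 1], D2[i, i], D2[i, i + 1] = 1.0, -2.0, 1.0
D2 /= h**2

obs = np.arange(10, m - 10, 10)                    # interior points where f_1 is seen
A = np.vstack([-D2[obs], np.eye(m)[[0, m - 1]]])   # rows: -Delta u at obs; u on boundary
f1 = np.pi**2 * np.sin(np.pi * xs)                 # chosen so the truth is u = sin(pi x)
a = np.concatenate([f1[obs], [0.0, 0.0]])          # the information vector a

S = A @ K @ A.T + 1e-8 * np.eye(len(a))
mean = K @ A.T @ np.linalg.solve(S, a)             # mean of the Gaussian P_{u,n}[a]

# The posterior mean recovers the PDE solution from finite information alone.
assert np.max(np.abs(mean - np.sin(np.pi * xs))) < 0.1
```

The same linear algebra yields the posterior covariance K − K Aᵀ S⁻¹ A K, which quantifies the numerical error distributionally.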
29. Outline of the Research
Bayesian Probabilistic Numerical Methods, Cockayne, Oates, Sullivan, Girolami (2017), arXiv:1702.03673
1. Elicit the Abstract Structure . . .
2. Establish Well-Definedness, Existence and Uniqueness of P_{u,n}[a]
3. Characterise the Optimal Information Operator A_n ← next
4. Algorithms to Sample from P_{u,n}[a]
5. Extend to Pipelines of Computation
30. Optimal Information
Consider an information operator A_n(f) = (f(x_1), . . . , f(x_n))ᵀ.
The aim is to select locations x_1, . . . , x_n that are optimal in the sense that
{x_1, . . . , x_n} ∈ arg inf ∫∫ L(ω, u(f)) dP_{u,n}[A_n(f)](ω) dP_f(f).
⟹ L is a loss function on the solution space of the PDE that must be specified.
⟹ L(u, u′) = |u − u′| corresponds to the Wasserstein metric d(P_{u,n}, δ(u)).
⟹ this is not equivalent to the Bayes risk from decision theory.
⟹ optimal information for Bayesian probabilistic numerics ≠ Bayesian decision theory.
34. An adversary picks a card at random and our goal is to ascertain whether the suit of their card was ♥ under 0-1 loss, i.e. f ∼ uniform({♥, ♦, ♣, ♠}). Consider two possible experiments:

Experiment | Bayes' risk (maximum a posteriori point estimate) | Probabilistic numerics risk (full posterior)
Q: Is it red? | 1/4 | (1/4)(1/2) + (1/4)(1/2) + (1/4)(0) + (1/4)(0) = 1/4
Q: Is it ♠? | 1/4 | (1/4)(2/3) + (1/4)(1/3) + (1/4)(1/3) + (1/4)(0) = 1/3

The Bayes' risk is 1/4 in both cases, since "¬♥" is always a posterior mode; the four probabilistic numerics terms average p(wrong ♥-call | A(c)) over the cards c = ♥, ♦, ♣, ♠.
⟹ optimal information for Bayesian probabilistic numerics ≠ Bayesian decision theory.
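The two risk columns can be reproduced with exact arithmetic. A minimal sketch of the slide's setup: under 0-1 loss, the probabilistic numerics risk scores a draw from the full posterior (the average posterior probability of calling the ♥-status wrongly) rather than the MAP point estimate.

```python
from fractions import Fraction as F

suits = ["H", "D", "C", "S"]    # hearts, diamonds, clubs, spades

def posterior(question, card):
    """Uniform posterior over the suits consistent with the yes/no answer."""
    consistent = [s for s in suits if question(s) == question(card)]
    return {s: F(1, len(consistent)) for s in consistent}

def pn_risk(question):
    """Average posterior probability of the wrong call about 'is it H?'."""
    total = F(0)
    for card in suits:
        p_heart = posterior(question, card).get("H", F(0))
        total += F(1, 4) * ((1 - p_heart) if card == "H" else p_heart)
    return total

is_red = lambda s: s in ("H", "D")
is_spade = lambda s: s == "S"

assert pn_risk(is_red) == F(1, 4)      # first row of the table
assert pn_risk(is_spade) == F(1, 3)    # second row of the table
```

Both questions have Bayes' risk 1/4 (the MAP call "¬♥" errs only when the card is ♥), yet "Is it red?" is strictly better for the full-posterior risk, so the two optimality criteria disagree.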
46. Average Case Analysis ↔(∗) Bayesian Decision Theory ↔? Bayesian Probabilistic Numerical Methods
(∗) Kadane and Wasilkowski (1985) Average Case ε-Complexity in Computer Science: A Bayesian View.
47. [Figure: risk sets and contours of constant average risk, comparing the Bayes rule (decision theory) with the optimal method (Bayesian probabilistic numerics).]
Theorem: Let u(f) be the quantity of interest. Assume that u(f) belongs to an inner-product space with associated norm ‖·‖ and consider the canonical loss L(u, u′) = ‖u − u′‖². Then optimal information for Bayesian probabilistic numerics = Bayesian decision theory (= average case analysis).
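The mechanism behind the theorem can be seen in a toy Gaussian sketch (mine, not from the paper): under squared-error loss, scoring a draw from the posterior instead of the posterior mean adds exactly the expected posterior trace, which for linear-Gaussian information does not depend on the observed data, so both criteria rank designs identically.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 1.0], [1.0, 1.5]])    # prior covariance of u in R^2
N = 100_000

u = rng.multivariate_normal(np.zeros(2), Sigma, size=N)       # prior draws
# Information: observe the first coordinate, A(u) = u[0] (noiseless, linear).
post_mean = np.outer(u[:, 0] / Sigma[0, 0], Sigma[0])         # E[u | u_0] per sample
C = Sigma - np.outer(Sigma[:, 0], Sigma[0, :]) / Sigma[0, 0]  # posterior cov (data-free)
omega = post_mean + rng.multivariate_normal(np.zeros(2), C + 1e-12 * np.eye(2), size=N)

bayes_risk = np.mean(np.sum((post_mean - u)**2, axis=1))  # score the point estimate
pn_risk = np.mean(np.sum((omega - u)**2, axis=1))         # score a posterior draw

# Under the canonical loss: pn_risk = 2 * bayes_risk = 2 * tr(C), and tr(C)
# does not depend on the data, so the same design minimises both risks.
assert abs(bayes_risk - np.trace(C)) < 0.05
assert abs(pn_risk - 2 * np.trace(C)) < 0.1
```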
51. Example
For the linear PDE
−∆u = f_1 on Ω
u = f_2 on ∂Ω
we can consider the loss function L(u, u′) = ‖u − u′‖²_{L²(Ω)}.
Corollary: The interior and boundary point sets are asymptotically optimal iff h_1 ∨ h_2 = O(n^{−1/2}), where
h_1 = max_{x∈Ω} min_i ‖x − x_i‖₂,   h_2 = max_{x∈∂Ω} min_i ‖x − x_i‖₂.
Wendland (2005) Scattered Data Approximation.
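The fill distances h_1, h_2 in the corollary can be estimated numerically. A sketch with illustrative choices (uniform grids in Ω = [0,1]² as the point sets; the fill distance is approximated by searching a dense candidate set): quadrupling n should roughly halve h, consistent with h = O(n^{−1/2}).

```python
import numpy as np

def fill_distance(points, domain_samples):
    """h = max over the domain of the distance to the nearest design point."""
    d = np.linalg.norm(domain_samples[:, None, :] - points[None, :, :], axis=2)
    return d.min(axis=1).max()

g = np.linspace(0.0, 1.0, 101)
dense = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # domain samples

hs = []
for k in [4, 8, 16]:                        # n = k^2 grid points
    t = np.linspace(0.0, 1.0, k)
    pts = np.stack(np.meshgrid(t, t), axis=-1).reshape(-1, 2)
    hs.append(fill_distance(pts, dense))

# h = O(n^{-1/2}): going from n to 4n should roughly halve the fill distance.
for h_n, h_4n in zip(hs, hs[1:]):
    assert 0.3 < h_4n / h_n < 0.7
```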
52. Conclusion
In Part I it has been argued that:
Efficient large-scale computation precludes popular a posteriori methods.
Probabilistic numerical methods provide a principled alternative framework.
Optimal information for Bayesian probabilistic numerical methods is not always equivalent to
optimal information in Bayesian decision theory.
Full details (Parts I and II) can be found in the preprint:
Bayesian Probabilistic Numerical Methods
Cockayne, Oates, Sullivan, Girolami (2017)
arXiv:1702.03673
Thank you for your attention!