Manifold Monte Carlo Methods

            Mark Girolami

      Department of Statistical Science
         University College London

      Joint work with Ben Calderhead


Research Section Ordinary Meeting

   The Royal Statistical Society
        October 13, 2010
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
Girolami, M. & Calderhead, B., J. R. Statist. Soc. B (2011), 73(2), 1-37.

       Advanced Monte Carlo methodology founded on geometric principles
Talk Outline

        Motivation to improve MCMC capability for challenging problems

        Stochastic diffusion as adaptive proposal process

        Exploring geometric concepts in MCMC methodology

        Diffusions across Riemann manifold as proposal mechanism

        Deterministic geodesic flows on manifold form basis of MCMC methods

        Illustrative examples:
            Warped bivariate Gaussian
            Gaussian mixture model
            Log-Gaussian Cox process

        Conclusions
Motivation: Simulation-Based Inference

       The Monte Carlo method employs samples from p(θ) to obtain the estimate

           $\int \varphi(\theta)\, p(\theta)\, d\theta = \frac{1}{N}\sum_n \varphi(\theta_n) + O(N^{-\frac{1}{2}})$

       Draw θ_n from an ergodic Markov process with stationary distribution p(θ)
       Construct the process in two parts:
           Propose a move θ → θ′ with probability p_p(θ′|θ)
           Accept or reject the proposal with probability

               $p_a(\theta'|\theta) = \min\left[1,\; \frac{p(\theta')\, p_p(\theta|\theta')}{p(\theta)\, p_p(\theta'|\theta)}\right]$

       Efficiency depends on the proposal mechanism defined by p_p(θ′|θ)

       Success of MCMC is reliant upon appropriate proposal design
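The two-part construction above can be sketched as a minimal random-walk Metropolis sampler. The target (a standard bivariate Gaussian), step size, and sample count are illustrative choices of mine, not from the talk; with a symmetric proposal the p_p terms cancel in the acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(theta):
    # Illustrative target: standard bivariate Gaussian log-density, up to a constant.
    return -0.5 * theta @ theta

def metropolis_hastings(log_p, theta0, n_samples, step=0.5):
    """Random-walk Metropolis: propose theta' = theta + step * z, then
    accept with probability min[1, p(theta')/p(theta)] (symmetric proposal)."""
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    for n in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        if np.log(rng.uniform()) < log_p(proposal) - log_p(theta):
            theta = proposal
        samples[n] = theta
    return samples

samples = metropolis_hastings(log_p, np.zeros(2), 5000)
print(samples.mean(axis=0))  # close to the target mean [0, 0]
```

Rejected proposals repeat the current state in `samples`, which is what makes the chain leave p(θ) invariant.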
Adaptive Proposal Distributions - Exploit Discretised Diffusion

        For θ ∈ R^D with density p(θ) and L(θ) ≡ log p(θ), define the Langevin diffusion

            $d\theta(t) = \tfrac{1}{2}\nabla_\theta L(\theta(t))\, dt + db(t)$

        First-order Euler-Maruyama discrete integration of the diffusion:

            $\theta(\tau + \epsilon) = \theta(\tau) + \tfrac{\epsilon^2}{2}\nabla_\theta L(\theta(\tau)) + \epsilon\, z(\tau)$

        Proposal

            $p_p(\theta'|\theta) = \mathcal{N}\!\left(\theta'\,\middle|\,\mu(\theta,\epsilon),\; \epsilon^2 I\right)$ with $\mu(\theta,\epsilon) = \theta + \tfrac{\epsilon^2}{2}\nabla_\theta L(\theta)$

        Acceptance probability to correct for the discretisation bias:

            $p_a(\theta'|\theta) = \min\left[1,\; \frac{p(\theta')\, p_p(\theta|\theta')}{p(\theta)\, p_p(\theta'|\theta)}\right]$

        The isotropic diffusion is inefficient, so employ pre-conditioning:

            $\theta' = \theta + \tfrac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \sqrt{M}\,\epsilon z$

        How to set M systematically? Tuning differs in the transient and stationary phases
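The MALA update above (Langevin proposal plus Metropolis-Hastings correction) can be sketched as follows; the standard-Gaussian target and the step size ε are illustrative stand-ins, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.5  # step size (epsilon in the slides)

def log_p(theta):
    return -0.5 * theta @ theta   # illustrative target: standard Gaussian

def grad_log_p(theta):
    return -theta                 # gradient of L(theta) for this target

def mala_step(theta):
    """One MALA transition: Euler-Maruyama Langevin proposal plus the
    Metropolis-Hastings correction for the discretisation bias."""
    def mean(x):                  # mu(x, eps) = x + eps^2/2 * grad L(x)
        return x + 0.5 * eps**2 * grad_log_p(x)
    def log_q(x, m):              # log N(x | m, eps^2 I), up to a constant
        return -0.5 * np.sum((x - m)**2) / eps**2
    prop = mean(theta) + eps * rng.standard_normal(theta.size)
    log_alpha = (log_p(prop) + log_q(theta, mean(prop))
                 - log_p(theta) - log_q(prop, mean(theta)))
    return prop if np.log(rng.uniform()) < log_alpha else theta

theta = np.zeros(2)
samples = np.empty((5000, 2))
for n in range(5000):
    theta = mala_step(theta)
    samples[n] = theta
print(samples.var(axis=0))  # close to the target variances [1, 1]
```

Note the forward and reverse proposal densities are evaluated at different means, since the Langevin drift makes the proposal asymmetric.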
Geometric Concepts in MCMC

       Rao, 1945; Jeffreys, 1948: to first order,

           $\int p(y;\theta+\delta\theta)\, \log\frac{p(y;\theta+\delta\theta)}{p(y;\theta)}\, dy \approx \delta\theta^{\mathsf T} G(\theta)\, \delta\theta$

       where

           $G(\theta) = \mathbb{E}_{y|\theta}\left\{ \frac{\nabla_\theta\, p(y;\theta)}{p(y;\theta)} \, \frac{\nabla_\theta\, p(y;\theta)^{\mathsf T}}{p(y;\theta)} \right\}$

       The Fisher information G(θ) is a positive-definite metric defining a Riemann manifold
       Non-Euclidean geometry for probabilities: distances, metrics, invariants, curvature, geodesics
       Asymptotic statistical analysis: Amari, 1981, 85, 90; Murray & Rice, 1993;
       Critchley et al., 1993; Kass, 1989; Efron, 1975; Dawid, 1975; Lauritzen, 1989

       Statistical shape analysis: Kent et al., 1996; Dryden & Mardia, 1998

       Can geometric structure be employed in MCMC methodology?
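The expectation defining G(θ) can be checked by plain Monte Carlo. For the illustrative model y ~ N(µ, 1) (my example, not from the talk) the score is y − µ and the analytic Fisher information is exactly 1:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fisher information of N(mu, 1) in mu, estimated as the expected
# outer product of the score; the analytic value is exactly 1.
mu = 0.7
y = rng.normal(mu, 1.0, size=200_000)
score = y - mu                 # d/dmu log N(y; mu, 1)
G_hat = np.mean(score**2)
print(G_hat)                   # close to 1.0
```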
Geometric Concepts in MCMC

       Tangent space: the local metric is defined by
       $\delta\theta^{\mathsf T} G(\theta)\, \delta\theta = \sum_{k,l} g_{kl}\, \delta\theta_k\, \delta\theta_l$

       Christoffel symbols characterise the connection on a curved manifold:

           $\Gamma^{i}_{kl} = \frac{1}{2}\sum_m g^{im}\left( \frac{\partial g_{mk}}{\partial \theta^l} + \frac{\partial g_{ml}}{\partial \theta^k} - \frac{\partial g_{kl}}{\partial \theta^m} \right)$

       Geodesics, the shortest paths between two points on the manifold, satisfy

           $\frac{d^2\theta^i}{dt^2} + \sum_{k,l} \Gamma^{i}_{kl} \frac{d\theta^k}{dt}\frac{d\theta^l}{dt} = 0$
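The geodesic equation is an ordinary differential equation that can be integrated numerically once the Christoffel symbols are in hand. A minimal sketch of mine, using the hyperbolic upper half-plane metric g = diag(1/y², 1/y²) as a stand-in (its Christoffel symbols are standard) and plain Euler steps:

```python
import numpy as np

def christoffel(theta):
    """Christoffel symbols Gamma[i, k, l] of the upper half-plane
    metric g = diag(1/y^2, 1/y^2)."""
    _, y = theta
    Gamma = np.zeros((2, 2, 2))
    Gamma[0, 0, 1] = Gamma[0, 1, 0] = -1.0 / y
    Gamma[1, 0, 0] = 1.0 / y
    Gamma[1, 1, 1] = -1.0 / y
    return Gamma

def geodesic(theta, vel, dt=1e-4, steps=10_000):
    """Euler-integrate d^2 theta^i / dt^2 = -sum_kl Gamma^i_kl v^k v^l."""
    for _ in range(steps):
        acc = -np.einsum('ikl,k,l->i', christoffel(theta), vel, vel)
        theta = theta + dt * vel
        vel = vel + dt * acc
    return theta, vel

theta, vel = geodesic(np.array([0.0, 1.0]), np.array([1.0, 0.0]))
# Geodesic flow conserves the metric speed sqrt(v^T G v) = |v| / y.
print(np.hypot(vel[0], vel[1]) / theta[1])  # stays close to 1.0
```

Conservation of the metric speed along the flow gives a cheap correctness check for the integrator.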
Illustration of Geometric Concepts

        Consider the Normal density p(x|µ, σ) = N_x(µ, σ)

        The local inner product on the tangent space is defined by the metric tensor, i.e.
        δθᵀG(θ)δθ, where θ = (µ, σ)ᵀ

        The metric is the Fisher information

            $G(\mu, \sigma) = \begin{bmatrix} \sigma^{-2} & 0 \\ 0 & 2\sigma^{-2} \end{bmatrix}$

        The inner product is σ⁻²(δµ² + 2δσ²), so the densities N(0, 1) & N(1, 1) are further
        apart than the densities N(0, 2) & N(1, 2): the distance is non-Euclidean

        The metric tensor for the univariate Normal defines a hyperbolic space
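A two-line check of the non-Euclidean behaviour quoted above (my own illustration): the metric length of the same step δµ = 1 halves when σ doubles.

```python
import numpy as np

def G(mu, sigma):
    # Fisher metric of the univariate Normal in theta = (mu, sigma).
    return np.diag([sigma**-2.0, 2.0 * sigma**-2.0])

step = np.array([1.0, 0.0])                    # delta mu = 1, delta sigma = 0
len_at_sigma1 = np.sqrt(step @ G(0.0, 1.0) @ step)
len_at_sigma2 = np.sqrt(step @ G(0.0, 2.0) @ step)
print(len_at_sigma1, len_at_sigma2)            # 1.0 and 0.5
```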
M.C. Escher, Heaven and Hell, 1960
Langevin Diffusion on Riemannian Manifold

        The discretised Langevin diffusion on the manifold defines the proposal mechanism

            $\theta'_d = \theta_d + \frac{\epsilon^2}{2}\left(G^{-1}(\theta)\,\nabla_\theta L(\theta)\right)_d - \epsilon^2 \sum_{i,j=1}^{D} G^{-1}(\theta)_{ij}\, \Gamma^{d}_{ij} + \left(\epsilon\sqrt{G^{-1}(\theta)}\, z\right)_d$

        On a manifold with constant curvature the proposal mechanism reduces to

            $\theta' = \theta + \frac{\epsilon^2}{2} G^{-1}(\theta)\,\nabla_\theta L(\theta) + \epsilon\sqrt{G^{-1}(\theta)}\, z$

        Compare the MALA proposal with pre-conditioning:

            $\theta' = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon\sqrt{M}\, z$

        Proposal and acceptance probability:

            $p_p(\theta'|\theta) = \mathcal{N}\!\left(\theta'\,\middle|\,\mu(\theta,\epsilon),\; \epsilon^2 G^{-1}(\theta)\right)$

            $p_a(\theta'|\theta) = \min\left[1,\; \frac{p(\theta')\, p_p(\theta|\theta')}{p(\theta)\, p_p(\theta'|\theta)}\right]$

        The proposal mechanism diffuses approximately along the manifold
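For a Gaussian target the metric can be taken constant (G equal to the target precision), so the constant-curvature simplification above applies with the Christoffel term vanishing. A sketch of that simplified manifold-MALA transition; the target, metric, and step size are illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.8

# Illustrative target: zero-mean Gaussian with covariance Sigma.
# Taking the constant metric G = Sigma^{-1} gives G^{-1} = Sigma.
Sigma = np.array([[2.0, 0.9], [0.9, 1.0]])
G = np.linalg.inv(Sigma)
chol = np.linalg.cholesky(Sigma)        # square root of G^{-1} for the noise term

def log_p(theta):
    return -0.5 * theta @ G @ theta

def grad_log_p(theta):
    return -G @ theta

def manifold_mala_step(theta):
    """Simplified (constant-metric) manifold MALA step with MH correction."""
    def mean(x):                        # drift preconditioned by G^{-1}
        return x + 0.5 * eps**2 * (Sigma @ grad_log_p(x))
    def log_q(x, m):                    # log N(x | m, eps^2 G^{-1}), up to a constant
        d = x - m
        return -0.5 * (d @ G @ d) / eps**2
    prop = mean(theta) + eps * chol @ rng.standard_normal(2)
    log_alpha = (log_p(prop) + log_q(theta, mean(prop))
                 - log_p(theta) - log_q(prop, mean(theta)))
    return prop if np.log(rng.uniform()) < log_alpha else theta

theta = np.zeros(2)
draws = np.empty((5000, 2))
for n in range(5000):
    theta = manifold_mala_step(theta)
    draws[n] = theta
print(np.cov(draws.T))  # close to Sigma
```

Because the proposal noise is shaped by G⁻¹, steps are large along directions where the target is diffuse and small where it is tight, which is the point of the manifold construction.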
Langevin Diffusion on Riemannian Manifold

        [Figure: two slides of two-panel sample-path plots in the (µ, σ) plane]
Geodesic flow as proposal mechanism


       Desirable that proposals follow direct path on manifold - geodesics

       Define geodesic flow on manifold by solving

           d²θ^i/dt² + Σ_{k,l} Γ^i_{kl} (dθ^k/dt)(dθ^l/dt) = 0

       How can this be exploited in the design of a transition operator?

       Need slight detour - first define log-density under model as L(θ)

       Introduce auxiliary variable p ∼ N(0, G(θ))

       Negative joint log-density is

           H(θ, p) = −L(θ) + (1/2) log{(2π)^D |G(θ)|} + (1/2) p^T G(θ)^{-1} p
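As a concrete check of these definitions, H(θ, p) can be evaluated for any supplied log-density L and metric G. The sketch below is illustrative, not from the paper; the function names and the standard-normal example are assumptions.

```python
import numpy as np

def neg_joint_log_density(theta, p, log_density, metric):
    """H(theta, p) = -L(theta) + 0.5*log((2*pi)^D |G(theta)|)
                   + 0.5 * p^T G(theta)^{-1} p."""
    G = metric(theta)
    D = len(theta)
    _, logdet = np.linalg.slogdet(G)              # stable log-determinant
    return (-log_density(theta)
            + 0.5 * (D * np.log(2.0 * np.pi) + logdet)
            + 0.5 * p @ np.linalg.solve(G, p))    # p^T G^{-1} p without forming G^{-1}

# Illustrative example: standard normal target, identity metric
L = lambda th: -0.5 * th @ th - 0.5 * len(th) * np.log(2.0 * np.pi)
G = lambda th: np.eye(len(th))
H = neg_joint_log_density(np.zeros(2), np.ones(2), L, G)
```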
Riemannian Hamiltonian Monte Carlo

       Negative joint log-density ≡ Hamiltonian defined on Riemann manifold

           H(θ, p) = −L(θ) + (1/2) log{(2π)^D |G(θ)|} + (1/2) p^T G(θ)^{-1} p
                     \_______________________________/  \___________________/
                             Potential Energy                Kinetic Energy

       Marginal density follows as required

           p(θ) ∝ exp{L(θ)}/√{(2π)^D |G(θ)|} ∫ exp{−(1/2) p^T G(θ)^{-1} p} dp = exp{L(θ)}

       Obtain samples from marginal p(θ) using Gibbs sampler for p(θ, p)

           p^{n+1} | θ^n     ∼ N(0, G(θ^n))
           θ^{n+1} | p^{n+1} ∼ p(θ^{n+1} | p^{n+1})

       Employ Hamiltonian dynamics to propose samples for p(θ^{n+1} | p^{n+1}),
       Duane et al., 1987; Neal, 2010.

           dθ/dt = ∂H(θ, p)/∂p        dp/dt = −∂H(θ, p)/∂θ
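When the metric does not depend on θ the Hamiltonian is separable and the familiar leapfrog integrator applies; the sketch below assumes a constant metric (plain HMC) and is not the paper's scheme. The position-dependent G(θ) of RMHMC makes the dynamics non-separable and requires the generalised (implicit) leapfrog instead.

```python
import numpy as np

def leapfrog(theta, p, grad_log_density, Ginv, step, n_steps):
    """Integrate dtheta/dt = dH/dp, dp/dt = -dH/dtheta for
    H = -L(theta) + 0.5 p^T Ginv p with a *constant* metric."""
    theta, p = theta.copy(), p.copy()
    for _ in range(n_steps):
        p = p + 0.5 * step * grad_log_density(theta)   # half momentum step
        theta = theta + step * (Ginv @ p)              # full position step
        p = p + 0.5 * step * grad_log_density(theta)   # half momentum step
    return theta, p

# Illustrative run on a standard normal target (grad L(theta) = -theta)
grad = lambda th: -th
theta0, p0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta1, p1 = leapfrog(theta0, p0, grad, np.eye(2), step=0.01, n_steps=200)
```

Leapfrog is volume-preserving and time-reversible, which is what makes the Metropolis acceptance correction exact.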
Riemannian Manifold Hamiltonian Monte Carlo

       Consider the Hamiltonian H̃(θ, p) = (1/2) p^T G̃(θ)^{-1} p

       Hamiltonians with only a quadratic kinetic energy term exactly describe
       geodesic flow on the coordinate space θ with metric G̃

       However our Hamiltonian is H(θ, p) = V(θ) + (1/2) p^T G(θ)^{-1} p

       If we define G̃(θ) = G(θ) × (h − V(θ)), where h is a constant

       Then the Maupertuis principle tells us that the Hamiltonian flows for
       H(θ, p) and H̃(θ, p) are equivalent along the energy level h

       The solution of

           dθ/dt = ∂H(θ, p)/∂p        dp/dt = −∂H(θ, p)/∂θ

       is therefore equivalent to the solution of

           d²θ^i/dt² + Σ_{k,l} Γ̃^i_{kl} (dθ^k/dt)(dθ^l/dt) = 0

       RMHMC proposals are along the manifold geodesics
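The Christoffel symbols in the geodesic equation follow from any metric via the standard formula Γ^i_{kl} = (1/2) g^{im}(∂_k g_{ml} + ∂_l g_{mk} − ∂_m g_{kl}). As a sketch (names and the Euler scheme are illustrative choices, not the paper's integrator), the metric derivatives can be approximated by finite differences and the geodesic ODE stepped forward directly:

```python
import numpy as np

def christoffel(metric, theta, eps=1e-5):
    """Gamma^i_{kl} = 0.5 * g^{im} (d_k g_{ml} + d_l g_{mk} - d_m g_{kl}),
    with metric derivatives taken by central finite differences."""
    D = len(theta)
    dG = np.zeros((D, D, D))                       # dG[m] = dG/dtheta_m
    for m in range(D):
        e = np.zeros(D)
        e[m] = eps
        dG[m] = (metric(theta + e) - metric(theta - e)) / (2.0 * eps)
    Ginv = np.linalg.inv(metric(theta))
    Gamma = np.zeros((D, D, D))
    for k in range(D):
        for l in range(D):
            Gamma[:, k, l] = 0.5 * Ginv @ (dG[k][:, l] + dG[l][:, k] - dG[:, k, l])
    return Gamma

def geodesic(metric, theta0, v0, step, n_steps):
    """Euler integration of d2theta^i/dt2 + Gamma^i_{kl} v^k v^l = 0."""
    theta, v = theta0.astype(float).copy(), v0.astype(float).copy()
    for _ in range(n_steps):
        accel = -np.einsum('ikl,k,l->i', christoffel(metric, theta), v, v)
        theta, v = theta + step * v, v + step * accel
    return theta

# Illustrative check: under a flat (identity) metric geodesics are straight lines
end = geodesic(lambda th: np.eye(2), np.zeros(2), np.array([1.0, 2.0]), 0.01, 100)
```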
Warped Bivariate Gaussian

       p(w1, w2 | y, σx, σy) ∝ ∏_{n=1}^N N(yn | w1 + w2², σy²) N(w1, w2 | 0, σx² I)

       [Figure: sample paths of HMC (left) and RMHMC (right) over the warped
       posterior in the (w1, w2) plane]
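A minimal sketch of this unnormalised log-posterior, assuming σx = σy = 1 (the function name and default values are illustrative): the likelihood informs only the sum w1 + w2², which is what produces the warped, banana-shaped ridge the figure shows.

```python
import numpy as np

def log_posterior(w, y, sigma_x=1.0, sigma_y=1.0):
    """Unnormalised log p(w1, w2 | y): Gaussian likelihood on w1 + w2^2
    plus an isotropic Gaussian prior (sigma_x, sigma_y assumed known)."""
    w1, w2 = w
    resid = y - (w1 + w2 ** 2)                 # likelihood sees only w1 + w2^2
    loglik = -0.5 * np.sum(resid ** 2) / sigma_y ** 2
    logprior = -0.5 * (w1 ** 2 + w2 ** 2) / sigma_x ** 2
    return loglik + logprior

y = np.array([0.5, 1.0, 1.5])
a = log_posterior((0.3, 0.7), y)
b = log_posterior((0.3, -0.7), y)   # the posterior is symmetric in the sign of w2
```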
Gaussian Mixture Model



       Univariate finite mixture model

           p(xi | θ) = Σ_{k=1}^K πk N(xi | µk, σk²)


       FI based metric tensor non-analytic - employ empirical FI

           G(θ) = (1/N) S^T S − (1/N²) s̄ s̄^T  −−−→  cov(∇θ L(θ)) = I(θ)
                                              N→∞

           ∂G(θ)/∂θd = (1/N) { (∂S^T/∂θd) S + S^T (∂S/∂θd) }
                       − (1/N²) { (∂s̄/∂θd) s̄^T + s̄ (∂s̄^T/∂θd) }

       with score matrix S with elements Si,d = ∂ log p(xi | θ)/∂θd
       and s̄ = Σ_{i=1}^N S^T_{i,·}
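The empirical metric above is exactly the (biased) sample covariance of the score vectors, which a short routine makes explicit; the function name is illustrative.

```python
import numpy as np

def empirical_fisher(S):
    """G(theta) = (1/N) S^T S - (1/N^2) sbar sbar^T with sbar = sum_i S[i, :];
    S is the N x D score matrix, S[i, d] = d log p(x_i | theta) / d theta_d."""
    N = S.shape[0]
    sbar = S.sum(axis=0)
    return S.T @ S / N - np.outer(sbar, sbar) / N ** 2

# Illustrative check on an arbitrary score matrix (fixed seed)
rng = np.random.default_rng(0)
S = rng.normal(size=(40, 3))
G = empirical_fisher(S)
```

Being a covariance, G is symmetric positive semi-definite by construction, which is what qualifies it as a metric tensor.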
Gaussian Mixture Model

       Univariate finite mixture model

           p(x | µ, σ²) = 0.7 × N(x | 0, σ²) + 0.3 × N(x | µ, σ²)


       Figure: Arrows correspond to the gradients and ellipses to the inverse metric
       tensor, over the (µ, ln σ) plane. Dashed lines are isocontours of the joint
       log density.
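For reference, the log-density of this two-component mixture can be evaluated stably with a log-sum-exp in the (µ, ln σ) parameterisation used in the figure; this is a sketch with illustrative names, not the paper's code.

```python
import numpy as np

def mixture_logpdf(x, mu, log_sigma):
    """Sum over data of log[0.7 N(x|0, s^2) + 0.3 N(x|mu, s^2)], s = exp(log_sigma)."""
    s2 = np.exp(2.0 * log_sigma)
    norm = -0.5 * np.log(2.0 * np.pi * s2)
    comp0 = norm - 0.5 * x ** 2 / s2               # component centred at 0
    comp1 = norm - 0.5 * (x - mu) ** 2 / s2        # component centred at mu
    m = np.maximum(comp0, comp1)                   # log-sum-exp for stability
    return np.sum(m + np.log(0.7 * np.exp(comp0 - m) + 0.3 * np.exp(comp1 - m)))

x = np.array([-0.5, 0.2, 1.3])
val = mixture_logpdf(x, mu=0.0, log_sigma=0.0)    # collapses to a standard normal
```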
Gaussian Mixture Model


   Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right)
   convergence paths and autocorrelation plots. Autocorrelation plots are from the
   stationary chains, i.e. once the chains have converged to the stationary distribution.
Gaussian Mixture Model

       [Figure: Convergence paths of HMC (left), RMHMC (middle) and Gibbs (right)
       over the (µ, ln σ) posterior, with sample autocorrelation functions for
       each sampler below]



                             −0.2                                                                                                             −0.2                                                                                                                  −0.2
                                    0   2    4        6        8      10      12         14         16     18   20                                   0   2    4        6        8       10      12          14        16         18   20                                   0   2    4        6        8       10       12          14        16         18   20
                                                                     Lag                                                                                                               Lag                                                                                                                   Lag




   Figure: Comparison of HMC (left), RMHMC (middle) and GIBBS (right) convergence
   paths and autocorrelation plots. Autocorrelation plots are from the stationary chains,
   i.e. once the chains have converged to the stationary distribution.
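The autocorrelation plots described in the caption can be reproduced with a few lines of NumPy. This is a minimal sketch, not the authors' code: the two synthetic chains (`fast` and `slow`) are illustrative stand-ins for a well-mixing sampler and a sticky one.

```python
import numpy as np

def sample_autocorrelation(chain, max_lag=20):
    """Sample autocorrelation function of a 1-D MCMC trace, lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                     for k in range(max_lag + 1)])

# A well-mixing chain (near-iid draws) decays almost immediately;
# a sticky chain (AR(1) with coefficient 0.95) decays slowly with lag.
rng = np.random.default_rng(0)
fast = rng.standard_normal(5000)
slow = np.empty(5000)
slow[0] = 0.0
for t in range(1, 5000):
    slow[t] = 0.95 * slow[t - 1] + rng.standard_normal()
```

Plotting `sample_autocorrelation(fast)` and `sample_autocorrelation(slow)` against lag reproduces the qualitative contrast between the RMHMC and Gibbs panels.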
Log-Gaussian Cox Point Process with Latent Field

       The joint density for Poisson counts and latent Gaussian field
       p(y, x \mid \mu, \sigma, \beta) \propto \prod_{i,j=1}^{64} \exp\{ y_{i,j} x_{i,j} - m \exp(x_{i,j}) \} \, \exp\left( -\tfrac{1}{2} (x - \mu 1)^{\mathsf T} \Sigma_\theta^{-1} (x - \mu 1) \right)



       Metric tensors
                       G(\theta)_{i,j} = \frac{1}{2} \operatorname{trace}\left( \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_i} \, \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_j} \right)

                       G(x) = \Lambda + \Sigma_\theta^{-1}

       where Λ is diagonal with elements m exp(µ + (Σθ )i,i )

       The latent field metric tensor, which defines a flat manifold, is 4096 × 4096 and
       is obtained from the parameter conditional at O(N³) cost

       MALA requires a transformation of the latent field in order to sample, with
       separate tuning in the transient and stationary phases of the Markov chain

       The manifold methods require no pilot tuning or additional transformations
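The two metric tensors above can be sketched concretely. This is an illustrative sketch only, assuming a hypothetical squared-exponential covariance Σ_θ on a small 1-D grid with θ = (log σ², log lengthscale); the grid, parameterisation, and all names are assumptions, not the paper's exact setup.

```python
import numpy as np

# Hypothetical squared-exponential covariance over a small 1-D grid.
grid = np.linspace(0.0, 1.0, 16)
D2 = (grid[:, None] - grid[None, :]) ** 2

def Sigma(theta):
    s2, ell = np.exp(theta)  # theta = (log sigma^2, log lengthscale)
    return s2 * np.exp(-0.5 * D2 / ell**2) + 1e-6 * np.eye(len(grid))

def dSigma(theta, i, eps=1e-6):
    # Central finite-difference derivative of Sigma_theta w.r.t. theta_i
    e = np.zeros(2)
    e[i] = eps
    return (Sigma(theta + e) - Sigma(theta - e)) / (2.0 * eps)

def G_theta(theta):
    # G(theta)_ij = (1/2) trace(Sigma^-1 dSigma_i Sigma^-1 dSigma_j)
    S_inv = np.linalg.inv(Sigma(theta))
    A = [S_inv @ dSigma(theta, i) for i in range(2)]
    return 0.5 * np.array([[np.trace(A[i] @ A[j]) for j in range(2)]
                           for i in range(2)])

def G_x(theta, mu, m):
    # G(x) = Lambda + Sigma^-1, Lambda diagonal with entries m exp(mu + Sigma_ii)
    S = Sigma(theta)
    Lam = np.diag(m * np.exp(mu + np.diag(S)))
    return Lam + np.linalg.inv(S)
```

Both tensors are symmetric positive definite, so they can act as position-dependent preconditioners for mMALA and RMHMC; the matrix inverse in `G_x` is the O(N³) operation flagged in the slide.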
RMHMC for Log-Gaussian Cox Point Processes




   Table: Sampling the latent variables of a Log-Gaussian Cox Process - Comparison of
   sampling methods

         Method            Time     ESS (Min, Med, Max)     s/Min ESS    Rel. Speed
     MALA (Transient)     31,577          (3, 8, 50)          10,605         ×1
     MALA (Stationary)    31,118         (4, 16, 80)           7836        ×1.35
        mMALA               634         (26, 84, 174)          24.1        ×440
        RMHMC              2936      (1951, 4545, 5000)         1.5       ×7070
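The table's efficiency measures follow from two simple computations: the effective sample size (ESS) of each marginal chain, and wall-clock time divided by the minimum ESS. A minimal sketch, using the common truncated-autocorrelation ESS estimator (one of several conventions, not necessarily the paper's exact estimator):

```python
import numpy as np

def ess(chain, max_lag=200):
    """ESS = N / (1 + 2 * sum_k rho_k), truncating the autocorrelation
    sum at the first non-positive rho_k."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    n = len(x)
    var = np.dot(x, x) / n
    s = 0.0
    for k in range(1, max_lag):
        rho = np.dot(x[:n - k], x[k:]) / (n * var)
        if rho <= 0.0:
            break
        s += rho
    return n / (1.0 + 2.0 * s)

def seconds_per_min_ess(total_seconds, ess_values):
    # The table's "s/Min ESS" column: wall-clock time over the worst ESS.
    return total_seconds / min(ess_values)
```

An iid chain has ESS close to its length, while a strongly autocorrelated chain has a much smaller ESS, which is what drives RMHMC's large relative speed-up despite its higher per-iteration cost.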
Conclusion and Discussion

       Geometry of statistical models harnessed in Monte Carlo methods
           Diffusions that respect structure and curvature of space - Manifold MALA
           Geodesic flows on model manifold - RMHMC - generalisation of HMC
           Assessed on correlated & high-dimensional latent variable models
           Promising capability of methodology

       Ongoing development
           Potential bottleneck at metric tensor and Christoffel symbols
           Theoretical analysis of convergence
           Investigate alternative manifold structures
           Design and effect of metric
           Optimality of Hamiltonian flows as local geodesics
           Alternative transition kernels


       No silver bullet or cure-all - but a powerful new methodology for the Monte Carlo toolkit
Funding Acknowledgment




      Girolami funded by EPSRC Advanced Research Fellowship and BBSRC
      project grant

      Calderhead supported by Microsoft Research PhD Scholarship

prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 

RSS Read Paper by Mark Girolami

  • 1. Manifold Monte Carlo Methods Mark Girolami Department of Statistical Science University College London Joint work with Ben Calderhead Research Section Ordinary Meeting The Royal Statistical Society October 13, 2010
  • 5. Riemann manifold Langevin and Hamiltonian Monte Carlo Methods. Girolami, M. & Calderhead, B., J. R. Statist. Soc. B (2011), 73, 2, 1-37. Advanced Monte Carlo methodology founded on geometric principles.
  • 14. Talk Outline: Motivation to improve MCMC capability for challenging problems; stochastic diffusion as adaptive proposal process; exploring geometric concepts in MCMC methodology; diffusions across the Riemann manifold as proposal mechanism; deterministic geodesic flows on the manifold form the basis of MCMC methods; illustrative examples (warped bivariate Gaussian, Gaussian mixture model, log-Gaussian Cox process); conclusions.
  • 21. Motivation: Simulation Based Inference. The Monte Carlo method employs samples from p(θ) to obtain the estimate ∫ φ(θ) p(θ) dθ = (1/N) Σ_n φ(θ_n) + O(N^{-1/2}). Draw θ_n from an ergodic Markov process with stationary distribution p(θ). Construct the process in two parts: propose a move θ → θ′ with probability p_p(θ′|θ); accept or reject the proposal with probability p_a(θ′|θ) = min[1, p(θ′) p_p(θ|θ′) / (p(θ) p_p(θ′|θ))]. Efficiency is dependent on the p_p(θ′|θ) defining the proposal mechanism. Success of MCMC is reliant upon appropriate proposal design.
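The propose/accept-reject construction above can be sketched in a few lines. This is a generic Metropolis-Hastings loop with a symmetric random-walk proposal; the standard-normal target, step size, and chain length are illustrative choices, not from the talk.

```python
import numpy as np

def metropolis_hastings(log_p, theta0, step, n_iters, rng):
    """Generic Metropolis-Hastings with a symmetric random-walk proposal.

    For a symmetric proposal p_p(theta'|theta) = p_p(theta|theta'), the
    acceptance ratio reduces to p(theta') / p(theta)."""
    theta = theta0
    samples = np.empty(n_iters)
    for n in range(n_iters):
        proposal = theta + step * rng.standard_normal()  # propose a move
        log_ratio = log_p(proposal) - log_p(theta)       # log p(theta')/p(theta)
        if np.log(rng.uniform()) < log_ratio:            # accept w.p. min(1, ratio)
            theta = proposal
        samples[n] = theta
    return samples

rng = np.random.default_rng(0)
# Standard normal target: log p(theta) = -theta^2/2 up to an additive constant.
samples = metropolis_hastings(lambda t: -0.5 * t**2, 0.0, 2.4, 20000, rng)
```

Note that only the unnormalised density is needed: normalising constants cancel in the acceptance ratio.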
  • 28. Adaptive Proposal Distributions - Exploit Discretised Diffusion. For θ ∈ R^D with density p(θ) and L(θ) ≡ log p(θ), define the Langevin diffusion dθ(t) = ½∇_θ L(θ(t)) dt + db(t). First-order Euler-Maruyama discrete integration of the diffusion gives θ(τ + ε) = θ(τ) + (ε²/2)∇_θ L(θ(τ)) + ε z(τ). Proposal: p_p(θ′|θ) = N(θ′ | µ(θ, ε), ε²I) with µ(θ, ε) = θ + (ε²/2)∇_θ L(θ). Acceptance probability to correct for bias: p_a(θ′|θ) = min[1, p(θ′) p_p(θ|θ′) / (p(θ) p_p(θ′|θ))]. Isotropic diffusion is inefficient - employ pre-conditioning θ′ = θ + ε²M∇_θ L(θ)/2 + ε√M z. How to set M systematically? Tuning in transient & stationary phases.
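The discretised Langevin proposal with its bias-correcting accept step can be written directly from the formulas above. A minimal 1-D MALA sketch with identity preconditioning; the standard-normal target and step size ε are illustrative.

```python
import numpy as np

def mala(log_p, grad_log_p, theta0, eps, n_iters, rng):
    """Metropolis-adjusted Langevin algorithm (1-D, identity preconditioning).

    Proposal: theta' ~ N(theta + (eps^2/2) * grad L(theta), eps^2).
    The Metropolis step corrects the Euler-Maruyama discretisation bias."""
    def log_q(to, frm):                     # log p_p(to | frm), up to a constant
        mu = frm + 0.5 * eps**2 * grad_log_p(frm)
        return -0.5 * (to - mu)**2 / eps**2

    theta = theta0
    samples = np.empty(n_iters)
    for n in range(n_iters):
        prop = theta + 0.5 * eps**2 * grad_log_p(theta) + eps * rng.standard_normal()
        log_a = (log_p(prop) + log_q(theta, prop)
                 - log_p(theta) - log_q(prop, theta))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        samples[n] = theta
    return samples

rng = np.random.default_rng(1)
# Standard normal target: grad log p(theta) = -theta.
samples = mala(lambda t: -0.5 * t**2, lambda t: -t, 0.0, 1.2, 20000, rng)
```

Unlike the random-walk case, the Langevin proposal is asymmetric, so both forward and reverse proposal densities appear in the acceptance ratio.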
  • 36. Geometric Concepts in MCMC. Rao, 1945; Jeffreys, 1948: to first order, ∫ p(y; θ + δθ) log[p(y; θ + δθ)/p(y; θ)] dy ≈ δθᵀG(θ)δθ, where G(θ) = E_{y|θ}{ (∇_θ p(y; θ)/p(y; θ)) (∇_θ p(y; θ)/p(y; θ))ᵀ }. The Fisher Information G(θ) is a p.d. metric defining a Riemann manifold. Non-Euclidean geometry for probabilities - distances, metrics, invariants, curvature, geodesics. Asymptotic statistical analysis: Amari, 1981, 85, 90; Murray & Rice, 1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975; Lauritzen, 1989. Statistical shape analysis: Kent et al, 1996; Dryden & Mardia, 1998. Can geometric structure be employed in MCMC methodology?
  • 42. Geometric Concepts in MCMC. Tangent space - local metric defined by δθᵀG(θ)δθ = Σ_{k,l} g_{kl} δθ_k δθ_l. Christoffel symbols - characterise the connection on a curved manifold: Γ^i_{kl} = ½ Σ_m g^{im} (∂g_{mk}/∂θ^l + ∂g_{ml}/∂θ^k − ∂g_{kl}/∂θ^m). Geodesics - the shortest path between two points on the manifold - satisfy d²θ^i/dt² + Σ_{k,l} Γ^i_{kl} (dθ^k/dt)(dθ^l/dt) = 0.
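As a concrete instance of the geodesic equation, the Christoffel symbols of the univariate-normal Fisher metric G(µ, σ) = diag(σ⁻², 2σ⁻²) (which appears later in the talk) have the closed forms Γ^µ_{µσ} = −1/σ, Γ^σ_{µµ} = 1/(2σ), Γ^σ_{σσ} = −1/σ, so the ODE can be integrated numerically. The initial conditions and step size below are arbitrary; the check exploits a defining property of geodesic flow, that the squared speed g_{ij} θ̇^i θ̇^j is conserved.

```python
import numpy as np

def geodesic_rhs(state):
    """Geodesic ODE for the metric G(mu, sigma) = diag(1/sigma^2, 2/sigma^2).

    Nonzero Christoffel symbols: Gamma^mu_{mu,sigma} = -1/sigma,
    Gamma^sigma_{mu,mu} = 1/(2 sigma), Gamma^sigma_{sigma,sigma} = -1/sigma."""
    mu, sig, dmu, dsig = state
    return np.array([
        dmu,
        dsig,
        2.0 * dmu * dsig / sig,                # mu'' = -2 G^mu_{mu,sig} mu' sig'
        -dmu**2 / (2.0 * sig) + dsig**2 / sig, # sig'' = -G^sig_{mu,mu} mu'^2 - G^sig_{sig,sig} sig'^2
    ])

def speed_sq(state):
    """Squared speed g_ij v^i v^j - constant along an exact geodesic."""
    mu, sig, dmu, dsig = state
    return dmu**2 / sig**2 + 2.0 * dsig**2 / sig**2

def rk4_integrate(state, dt, n_steps):
    for _ in range(n_steps):
        k1 = geodesic_rhs(state)
        k2 = geodesic_rhs(state + 0.5 * dt * k1)
        k3 = geodesic_rhs(state + 0.5 * dt * k2)
        k4 = geodesic_rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

start = np.array([0.0, 1.0, 1.0, 0.0])  # unit speed, initially along the mu axis
end = rk4_integrate(start, 1e-3, 1000)
```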
  • 47. Illustration of Geometric Concepts. Consider the Normal density p(x|µ, σ) = N_x(µ, σ). The local inner product on the tangent space is defined by the metric tensor, i.e. δθᵀG(θ)δθ, where θ = (µ, σ)ᵀ. The metric is the Fisher Information G(µ, σ) = [σ^{-2}, 0; 0, 2σ^{-2}]. The inner product is σ^{-2}(δµ² + 2δσ²), so the densities N(0, 1) & N(1, 1) are further apart than the densities N(0, 2) & N(1, 2) - distance is non-Euclidean. The metric tensor for the univariate Normal defines a Hyperbolic Space.
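The distance comparison above can be verified numerically; a minimal sketch computing the first-order length √(δθᵀG(θ)δθ) of the same Euclidean displacement δµ = 1 at two different scales σ.

```python
import numpy as np

def fisher_metric(sigma):
    """Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def local_distance(delta, sigma):
    """First-order length sqrt(delta^T G delta) of a small displacement."""
    delta = np.asarray(delta, dtype=float)
    return float(np.sqrt(delta @ fisher_metric(sigma) @ delta))

# Same Euclidean displacement (delta mu = 1) at two scales:
d_narrow = local_distance([1.0, 0.0], sigma=1.0)  # N(0,1) vs N(1,1)
d_wide   = local_distance([1.0, 0.0], sigma=2.0)  # N(0,2) vs N(1,2)
```

The narrow densities barely overlap while the wide ones overlap substantially, and the Fisher metric reflects exactly that: the same Euclidean step is twice as long at σ = 1 as at σ = 2.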
  • 48. M.C. Escher, Heaven and Hell, 1960
  • 53. Langevin Diffusion on a Riemannian manifold. The discretised Langevin diffusion on the manifold defines the proposal mechanism θ′_d = θ_d + (ε²/2)(G^{-1}(θ)∇_θ L(θ))_d − ε² Σ_{i,j} (G^{-1}(θ))_{ij} Γ^d_{ij} + ε(√(G^{-1}(θ)) z)_d. On a manifold with constant curvature the proposal mechanism reduces to θ′ = θ + (ε²/2)G^{-1}(θ)∇_θ L(θ) + ε√(G^{-1}(θ)) z - the MALA proposal with preconditioning θ′ = θ + (ε²/2)M∇_θ L(θ) + ε√M z. Proposal and acceptance probability: p_p(θ′|θ) = N(θ′ | µ(θ, ε), ε²G^{-1}(θ)), p_a(θ′|θ) = min[1, p(θ′) p_p(θ|θ′) / (p(θ) p_p(θ′|θ))]. The proposal mechanism diffuses approximately along the manifold.
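When the metric is constant the manifold proposal above reduces to preconditioned MALA, which is easy to demonstrate: for a correlated Gaussian target N(0, Σ) a natural constant metric is G = Σ⁻¹. Everything here (the target Σ, step size ε, and chain length) is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])  # correlated Gaussian target N(0, Sigma)
G = np.linalg.inv(Sigma)                    # constant metric G = Sigma^{-1}
G_inv_sqrt = np.linalg.cholesky(Sigma)      # sqrt(G^{-1}) via Cholesky of Sigma
eps = 1.0

def grad_log_p(theta):                      # grad L(theta) = -Sigma^{-1} theta
    return -G @ theta

def mean(theta):                            # mu(theta, eps) = theta + (eps^2/2) G^{-1} grad L
    return theta + 0.5 * eps**2 * (Sigma @ grad_log_p(theta))

def log_q(to, frm):                         # log N(to | mu(frm), eps^2 G^{-1}), up to const
    r = to - mean(frm)
    return -0.5 * (r @ G @ r) / eps**2

def log_p(theta):
    return -0.5 * theta @ G @ theta

theta = np.zeros(2)
samples = np.empty((20000, 2))
for n in range(samples.shape[0]):
    prop = mean(theta) + eps * G_inv_sqrt @ rng.standard_normal(2)
    log_a = log_p(prop) + log_q(theta, prop) - log_p(theta) - log_q(prop, theta)
    if np.log(rng.uniform()) < log_a:
        theta = prop
    samples[n] = theta

emp_cov = np.cov(samples.T)
```

Preconditioning by G⁻¹ = Σ makes the proposal isotropic in the whitened coordinates, so the strong correlation in the target costs nothing in mixing.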
  • 54. Langevin Diffusion on Riemannian manifold [figure: two panels of sample paths in the (µ, σ) plane]
  • 55. Langevin Diffusion on Riemannian manifold [figure: two panels of sample paths in the (µ, σ) plane]
  • 62. Geodesic flow as proposal mechanism. It is desirable that proposals follow a direct path on the manifold - a geodesic. Define the geodesic flow on the manifold by solving d²θ^i/dt² + Σ_{k,l} Γ^i_{kl} (dθ^k/dt)(dθ^l/dt) = 0. How can this be exploited in the design of a transition operator? A slight detour is needed - first define the log-density under the model as L(θ), and introduce the auxiliary variable p ∼ N(0, G(θ)). The negative joint log-density is H(θ, p) = −L(θ) + ½ log{(2π)^D |G(θ)|} + ½ pᵀG(θ)^{-1} p.
  • 67. Riemannian Hamiltonian Monte Carlo. The negative joint log-density ≡ a Hamiltonian defined on the Riemann manifold: H(θ, p) = −L(θ) + ½ log{(2π)^D |G(θ)|} (potential energy) + ½ pᵀG(θ)^{-1} p (kinetic energy). The marginal density follows as required: p(θ) ∝ (exp{L(θ)}/√((2π)^D |G(θ)|)) ∫ exp{−½ pᵀG(θ)^{-1} p} dp = exp{L(θ)}. Obtain samples from the marginal p(θ) using a Gibbs sampler for p(θ, p): p^{n+1}|θ^n ∼ N(0, G(θ^n)); θ^{n+1}|p^{n+1} ∼ p(θ^{n+1}|p^{n+1}). Employ Hamiltonian dynamics to propose samples for p(θ^{n+1}|p^{n+1}), Duane et al, 1987; Neal, 2010: dθ/dt = ∂H/∂p, dp/dt = −∂H/∂θ.
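The Gibbs scheme above, with Hamiltonian dynamics proposing θ given the resampled momentum, is standard HMC in the special case of a constant identity metric G = I. A minimal leapfrog sketch on a standard-normal target; the step size, trajectory length, and chain length are illustrative tuning choices.

```python
import numpy as np

def hmc(grad_U, U, theta0, eps, n_leap, n_iters, rng):
    """Hamiltonian Monte Carlo with identity metric G = I (1-D).

    H(theta, p) = U(theta) + p^2/2; the leapfrog integrator approximately
    conserves H, and a Metropolis step corrects the integration error."""
    theta = theta0
    samples = np.empty(n_iters)
    for n in range(n_iters):
        p = rng.standard_normal()            # Gibbs step: p | theta ~ N(0, 1)
        th, mom = theta, p
        mom -= 0.5 * eps * grad_U(th)        # half step for momentum
        for _ in range(n_leap - 1):
            th += eps * mom                  # full step for position
            mom -= eps * grad_U(th)          # full step for momentum
        th += eps * mom
        mom -= 0.5 * eps * grad_U(th)        # final half step
        dH = (U(th) + 0.5 * mom**2) - (U(theta) + 0.5 * p**2)
        if np.log(rng.uniform()) < -dH:      # accept with prob min(1, e^{-dH})
            theta = th
        samples[n] = theta
    return samples

rng = np.random.default_rng(3)
# Standard normal target: potential U(theta) = theta^2/2, grad U(theta) = theta.
samples = hmc(lambda t: t, lambda t: 0.5 * t**2, 0.0, 0.3, 10, 10000, rng)
```

With a position-dependent metric G(θ) the kinetic energy couples θ and p, the leapfrog updates become implicit, and a generalised integrator is needed; the sketch above only covers the flat special case.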
  • 75. Riemannian Manifold Hamiltonian Monte Carlo. Consider the Hamiltonian H̃(θ, p) = ½ pᵀG̃(θ)^{-1} p. Hamiltonians with only a quadratic kinetic energy term exactly describe geodesic flow on the coordinate space θ with metric G̃. However our Hamiltonian is H(θ, p) = V(θ) + ½ pᵀG(θ)^{-1} p. If we define G̃(θ) = G(θ) × (h − V(θ)), where h is a constant, then the Maupertuis principle tells us that the Hamiltonian flows for H(θ, p) and H̃(θ, p) are equivalent along the energy level h. The solution of dθ/dt = ∂H/∂p, dp/dt = −∂H/∂θ is therefore equivalent to the solution of d²θ^i/dt² + Σ_{k,l} Γ̃^i_{kl} (dθ^k/dt)(dθ^l/dt) = 0. RMHMC proposals are along the manifold geodesics.
  • 77. Warped Bivariate Gaussian. p(w₁, w₂ | y, σ_x², σ_y²) ∝ Π_{n=1}^N N(y_n | w₁ + w₂², σ_y²) N(w₁, w₂ | 0, σ_x² I) [figure: HMC (left) vs RMHMC (right) samples in the (w₁, w₂) plane]
  • 80. Gaussian Mixture Model. Univariate finite mixture model p(x_i|θ) = Σ_{k=1}^K π_k N(x_i | µ_k, σ_k²). The FI-based metric tensor is non-analytic - employ the empirical FI G(θ) = (1/N) SᵀS − (1/N²) s̄s̄ᵀ → cov(∇_θ L(θ)) = I(θ) as N → ∞, with ∂G(θ)/∂θ_d = (1/N)((∂Sᵀ/∂θ_d) S + Sᵀ (∂S/∂θ_d)) − (1/N²)((∂s̄/∂θ_d) s̄ᵀ + s̄ (∂s̄ᵀ/∂θ_d)), where the score matrix S has elements S_{i,d} = ∂log p(x_i|θ)/∂θ_d and s̄ = Σ_{i=1}^N Sᵀ_{i,·}.
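The empirical-FI estimator can be checked on a model where the exact Fisher information is known: for N(µ, σ²) the scores are available in closed form and I(θ) = diag(σ⁻², 2σ⁻²). The Gaussian check model and sample size below are illustrative; the talk applies the estimator to the mixture, where no closed form exists.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, N = 0.0, 1.0, 100000
x = rng.normal(mu, sigma, N)

# Score matrix S: rows are per-observation gradients of log N(x_i | mu, sigma^2)
# with respect to (mu, sigma).
S = np.column_stack([(x - mu) / sigma**2,
                     ((x - mu)**2 - sigma**2) / sigma**3])
s_bar = S.sum(axis=0)

# Empirical Fisher information: G = (1/N) S^T S - (1/N^2) s_bar s_bar^T,
# i.e. the sample covariance of the scores.
G_hat = S.T @ S / N - np.outer(s_bar, s_bar) / N**2

I_exact = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
```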
  • 82. Gaussian Mixture Model. Univariate finite mixture model p(x|µ, σ²) = 0.7 × N(x|0, σ²) + 0.3 × N(x|µ, σ²). Figure: arrows correspond to the gradients and ellipses to the inverse metric tensor in the (µ, ln σ) plane; dashed lines are isocontours of the joint log density.
  • 83. Gaussian Mixture Model. Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.
  • 84. Gaussian Mixture Model. Figure: Comparison of HMC (left), RMHMC (middle) and Gibbs (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.
• 88. Log-Gaussian Cox Point Process with Latent Field
The joint density for Poisson counts and the latent Gaussian field:
p(y, x|µ, σ, β) ∝ ∏_{i,j=1}^{64} exp{y_{i,j} x_{i,j} − m exp(x_{i,j})} × exp(−(x − µ1)ᵀ Σ_θ⁻¹ (x − µ1)/2)
Metric tensors:
G(θ)_{i,j} = ½ trace(Σ_θ⁻¹ ∂Σ_θ/∂θ_i · Σ_θ⁻¹ ∂Σ_θ/∂θ_j)
G(x) = Λ + Σ_θ⁻¹, where Λ is diagonal with elements m exp(µ + (Σ_θ)_{i,i})
The latent-field metric tensor, defining a flat manifold, is 4096 × 4096 with O(N³) cost, obtained from the parameter conditional
MALA requires a transformation of the latent field to sample, with separate tuning in the transient and stationary phases of the Markov chain
Manifold methods require no pilot tuning or additional transformations
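The two metric tensors on this slide translate directly into code. A minimal sketch (NumPy; `dSigma` is assumed to be a list of precomputed derivative matrices ∂Σ_θ/∂θ_i, and dense inversion is shown for clarity even though the 4096 × 4096 case would use Cholesky factorisations):

```python
import numpy as np

def latent_metric(Sigma_theta, mu, m):
    """G(x) = Lambda + Sigma_theta^{-1}, with Lambda diagonal and
    Lambda_ii = m * exp(mu + (Sigma_theta)_ii)."""
    Lam = np.diag(m * np.exp(mu + np.diag(Sigma_theta)))
    return Lam + np.linalg.inv(Sigma_theta)

def hyper_metric(Sigma_theta, dSigma):
    """G(theta)_ij = 0.5 * tr(Sigma^{-1} dSigma_i Sigma^{-1} dSigma_j),
    the Fisher metric for the covariance hyperparameters."""
    S_inv = np.linalg.inv(Sigma_theta)
    k = len(dSigma)
    G = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            G[i, j] = 0.5 * np.trace(S_inv @ dSigma[i] @ S_inv @ dSigma[j])
    return G
```

The O(N³) bottleneck mentioned on the slide is visible here: both functions invert (or factorise) Σ_θ, which for the 64 × 64 grid is a 4096 × 4096 matrix.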
  • 89. RMHMC for Log-Gaussian Cox Point Processes Table: Sampling the latent variables of a Log-Gaussian Cox Process - Comparison of sampling methods Method Time ESS (Min, Med, Max) s/Min ESS Rel. Speed MALA (Transient) 31,577 (3, 8, 50) 10,605 ×1 MALA (Stationary) 31,118 (4, 16, 80) 7836 ×1.35 mMALA 634 (26, 84, 174) 24.1 ×440 RMHMC 2936 (1951, 4545, 5000) 1.5 ×7070
• 102. Conclusion and Discussion
Geometry of statistical models harnessed in Monte Carlo methods
Diffusions that respect structure and curvature of space - Manifold MALA
Geodesic flows on model manifold - RMHMC - generalisation of HMC
Assessed on correlated & high-dimensional latent variable models
Promising capability of methodology
Ongoing development:
Potential bottleneck at metric tensor and Christoffel symbols
Theoretical analysis of convergence
Investigate alternative manifold structures
Design and effect of metric
Optimality of Hamiltonian flows as local geodesics
Alternative transition kernels
No silver bullet or cure-all - new powerful methodology for the MC toolkit
• 103. Funding Acknowledgment
Girolami funded by EPSRC Advanced Research Fellowship and BBSRC project grant.
Calderhead supported by Microsoft Research PhD Scholarship.