SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Approximate Bayesian Computation on Graphical
              Processing Units




                                           Thomas Thorne, Juliane Liepe, Sarah
                                              Filippi & Michael P.H. Stumpf

                                                Theoretical Systems Biology Group


                                                         06/09/2012




 ABC on GPUs   Thorne, Liepe, Filippi&Stumpf                                        1 of 23
Trends in High-Performance Computing




                        CUDA OpenCL MPI OpenMPI PVM




   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Introduction   2 of 23
(Hidden) Costs of Computing

Prototyping/Development Time is typically reduced for scripting
             languages such as R or Python.
  Run Time on single threads C/C++ (or Fortran) have better
             performance characteristics. But for specialized tasks
             other languages, e.g. Haskell, can show good
             characteristics.
Energy Requirements Every Watt we use for computing we also have
             to extract with air conditioning.
The role of GPUs
• GPUs can be accessed from many different programming
  languages (e.g. PyCUDA).
• GPUs have a comparatively small footprint and relatively modest
  energy requirements compared to clusters of CPUs.
• GPUs were designed for consumer electronics: computer gamers
  have different needs from the HPC community.
     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Introduction     3 of 23
What GPUs are Good At

• GPUs are good for linear threads involving mathematical functions.
• We should avoid loops and branches in the code.
• In an optimal world we should aim for all threads to finish at the
  same time.
                                     Execution cycle




    ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Introduction     4 of 23
Approximate Bayesian Computation
We can define the posterior as
                                      f (x |θi )π(θi )
                                  p(θi |x ) =
                                            p (x )
Here fi (x |θ) is the likelihood which is often hard to evaluate; consider
for example
                                                             dy
y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and
˜                                                                = g (y ; θ).
                                                             dt
But we can still simulate from the data-generating model, whence

                                 1(y = x )f (y |θi )π(θi )
                p(θi |x ) =                                dy
                               X          p (x )
                                 1 (∆(y , x ) < ) f (y |θi )π(θi )
                             ≈                                     dy
                               X              p (x )

Solutions for Complex Problems (?)
Approximate (i) data, (ii) model or (iii) distance.

      ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Approximate Bayesian Computation   5 of 23
Approximate Bayesian Computation

 Prior, π(θ)                   Define set of intermediate distributions, πt , t = 1, ...., T
                                                            1   >    2   > ...... >   T


                             πt −1 (θ|∆(Xs , X ) <          t −1 )



                                                                     πt (θ|∆(Xs , X ) <      t)



                                                                                                   πT (θ|∆(Xs , X ) <   T)




 Sequential importance sampling:
 Sample from proposal, ηt (θt ) and weight
 wt (θt ) = πt (θt )/ηt (θt ) with
 ηt (θt ) = πt −1 (θt −1 )Kt (θt −1 , θt )d θt −1 where
 Kt (θt −1 , θt ) is Markov perturbation kernel


Toni et al., J.Roy.Soc. Interface (2009); Toni & Stumpf, Bioinformatics (2010).

          ABC on GPUs       Thorne, Liepe, Filippi&Stumpf       Approximate Bayesian Computation                        6 of 23
Experimental Design for Dynamical Systems




   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Dynamical Systems   7 of 23
Experimental Design for Dynamical Systems




   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Dynamical Systems   7 of 23
Parameter Estimation for Dynamical Systems




   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Dynamical Systems   8 of 23
Parameter Estimation for Dynamical Systems




   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Dynamical Systems   8 of 23
Parameter Estimation for Dynamical Systems




Sampling parameters from the ABC posterior yields excellent
agreement with provided data (grey dots). The scatter is entirely
explained by the added noise.


     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Dynamical Systems   8 of 23
Network Evolution Models




            (a) Duplication attachment (b) Duplication attachment
                                                   with complimentarity




                                                                     wj
           (c) Linear preferential                                   wi
                                                       (d) General scale-free
           attachment
   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Network Evolution              9 of 23
ABC on Networks

Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network
  evolution, but this does not allow us to
  calculate likelihoods for all but very
  trivial models.
• There is also no sufficient statistic that
  would allow us to summarize networks,
  so ABC approaches require some
  thought.
• Many possible summary statistics of
  networks are expensive to calculate.
    Full likelihood: Wiuf et al., PNAS (2006).
             ABC: Ratman et al., PLoS Comp.Biol. (2008).

                                                                           Stumpf & Wiuf, J. Roy. Soc. Interface (2010).

           ABC on GPUs       Thorne, Liepe, Filippi&Stumpf   Network Evolution                                             10 of 23
Graph Spectrum
                      c
                                                                              a b c d e
                                                                                                 
                                                                              0   1   1   1   0       a
    a                                                 e
                                                                                                 
                                         d                                   1   0   1   1   0   b
                                                                                                 
                                                                A =          1   1   0   0   0   c
                                                                                                 
                                                                   
                                                                             1   1   0   0   1   d
                                                                                                  
                      b                                                       0   0   0   1   0       e

Graph Spectra
Given a graph G comprised of a set of nodes N and edges (i , j ) ∈ E
with i , j ∈ N, the adjacency matrix, A, of the graph is defined by
                                                     1      if (i , j ) ∈ E ,
                                       ai ,j =
                               0 otherwise.
The eigenvalues, λ, of this matrix provide one way of defining the
graph spectrum.
        ABC on GPUs       Thorne, Liepe, Filippi&Stumpf   Network Evolution                               11 of 23
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

                              D (A, B ) =                 (ai ,j − bi ,j )2 .
                                                   i ,j




     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Network Evolution            12 of 23
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

                              D (A, B ) =                 (ai ,j − bi ,j )2 .
                                                   i ,j

However for unlabelled graphs we require some mapping h from
i ∈ NA to i ∈ NB that minimizes the distance

               D (A, B )          Dh (A, B ) =                    (ai ,j − bh(i ),h(j ) )2 ,
                                                           i ,j




     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Network Evolution                           12 of 23
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

                              D (A, B ) =                     (ai ,j − bi ,j )2 .
                                                       i ,j

However for unlabelled graphs we require some mapping h from
i ∈ NA to i ∈ NB that minimizes the distance

               D (A, B )          Dh (A, B ) =                        (ai ,j − bh(i ),h(j ) )2 ,
                                                               i ,j

Given a spectrum (which is relatively cheap to compute) we have

                                                                 (α)          (β) 2
                         D (A, B ) =                          λl        − λl
                                                   l


     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf       Network Evolution                           12 of 23
Protein Interaction Network Data
                               Species                         Proteins        Interactions           Genome size       Sampling fraction
                               S.cerevisiae                       5035                     22118                 6532         0.77
                               D. melanogaster                    7506                     22871                14076         0.53
                               H. pylori                           715                      1423                 1589         0.45
                               E. coli                            1888                      7008                 5416         0.35



                    0.5
                                                                                                Model Selection
                                                                                                 • Inference here was based on all
                    0.4

                                                                                                   the data, not summary
Model probability




                    0.3
                                                                       Organism
                                                                          S.cerevisae              statistics.
                                                                          D.melanogaster
                                                                          H.pylori
                                                                          E.coli                 • Duplication models receive the
                    0.2
                                                                                                   strongest support from the data.
                    0.1                                                                          • Several models receive support
                                                                                                   and no model is chosen
                    0.0
                                                                                                   unambiguously.
                          DA      DAC    LPA       SF   DACL    DACR
                                           Model

                               ABC on GPUs          Thorne, Liepe, Filippi&Stumpf           Network Evolution                               13 of 23
GPUs in Computational Statistics
Performance
• With highly optimized libraries (e.g. BLAST/ATLAS) we can run
  numerically demanding jobs relatively straightforwardly.
• Whenever poor-man’s parallelization is possible, GPUs offer
  considerable advantages over multi-core CPU systems (at
  favourable cost and energy requirements).
• Performance depends crucially on our ability to map tasks onto the
  hardware.

Challenges
• GPU hardware was initially conceived for different purposes —
  computer gamers need fewer random numbers than MCMC or
  SMC procedures. The address space accessible to
  single-precision numbers also suffices to kill zombies etc .
• Combining several GPUs requires additional work, using e.g. MPI.

    ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   GPU and the Alternatives   14 of 23
Alternatives to GPUs: FPGAs
    Field Programmable Gate Arrays are configurable electronic circuits
    that are partly (re)configurable. Here the hardware is adapted to the
2    D.B. Thomas, W. Luk, and M. Stumpf
    problem at hand and encapsulates the programme in its
    programmable logic arrays.
                                                                                               Multiple instances in xc4vlx60
         Performance (MSwap-Compare/s)




                                         100000
                                                                                               Single Instance in Virtex-4
                                                                                               Quad Opteron Software
                                         10000


                                          1000


                                           100


                                            10


                                             1
                                                  4                      9                   14                  19             24
                                                                                       Graph Size
           ABC on GPUs                                Thorne, Liepe, Filippi&Stumpf   GPU and the Alternatives                       15 of 23
Alternatives to GPUs: CPUs (and MPI. . . )

CPUs with multiple cores
                                                       40
are flexible, have large
address spaces and a
wealth of flexible numerical                            30

routines. This makes
implementation of                                                                                            processor




                                                time
numerically demanding                                  20
                                                                                                                  CPU
                                                                                                                  GPU
tasks relatively
straightforward. In
particular there is less of an                         10


incentive to consider how a
problem is best
implemented in software                                     100          500   1000          5000 10000   50000
                                                                                      size
that takes advantage of
hardware features.                                     6-core Xeon vs M2050 (448 cores) programmed in
                                                                                       OpenCL
       ABC on GPUs   Thorne, Liepe, Filippi&Stumpf          GPU and the Alternatives                                    16 of 23
(Pseudo) Random Numbers
The Mersenne-Twister is one of the standard random number
generators for simulation. MT19937 has a period of
219937 − 1 ≈ 4 × 106001 .
But MT does not have cryptographic strength (once 624 iterates have
been observed, all future states are predictable), unlike
Blum-Blum-Shub,
                           xn+1 = xn modM ,
where M = pq with p, q large prime numbers. But it is too slow for
simulations.
Parallel Random Number Generation
• Running RNGs in parallel does not produce reliable sets of random
  numbers.
• We need algorithms to produce large numbers of parallel streams
  of “good” random numbers.
• We also need better algorithms for weighted sampling.
     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Random Numbers on GPUs   17 of 23
Generating RNGs in Parallel
We assume that we have a function xn+1 = f (xn ) which generates a
stream of (pseudo) random numbers

                            x0 , x1 , x2 , . . . , xr , . . . , xs , . . . , xt , . . . , xz



                                                       Parallel Streams
          x0 , x1 , . . .
         x0 , x1 , . . .
         x0 , x1 , . . .


                                                          Sub-Streams
        x0 , . . . , xr −1
        xr , . . . , xs−1
        xs , . . . , xt −1



     ABC on GPUs       Thorne, Liepe, Filippi&Stumpf     Random Numbers on GPUs                18 of 23
Counter-Based RNGs

We can represent RNGs as state-space processes,

                       yn+1 = f (yn )              with         f :Y −→ Y
    xn+1 = gk ,nmodJ (y          n+1/J      )      with        g :Y × K × ZJ −→ X ,

where x are essentially behaving as x ∼ U[0,1] ; here K is the key
space, J the number of random numbers generated from the internal
state of the RNG.
Salmon et al.(SC11) propose to use a simple form for f (·) and

                                         gk ,j = hk ◦ bk

 For a sufficiently complex bijection bk we can use simple updates
and leave the randomization to bk . Here cryptographic routines come
in useful.

     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Random Numbers on GPUs             19 of 23
RNG Performance on CPUs and GPUs
                        Method        Max.    Min.    Output    Intel CPU    Nvidia GPU      AMD GPU
                                      input   state    size    cpB GB/s      cpB    GB/s     cpB GB/s
 Counter-based, Cryptographic
                        AES(sw)    (1+0)×16   11×16     1×16   31.2    0.4      –       –        –       –
                        AES(hw)    (1+0)×16   11×16     1×16    1.7    7.2      –       –        –       –
     Threefish (Threefry-4×64-72)    (4+4)×8       0      4×8    7.3    1.7   51.8    15.3    302.8     4.5
 Counter-based, Crush-resistant
                      ARS-5(hw)    (1+1)×16       0     1×16   0.7    17.8      –        –      –        –
                      ARS-7(hw)    (1+1)×16       0     1×16   1.1    11.1      –        –      –        –
                Threefry-2×64-13    (2+2)×8       0      2×8   2.0     6.3   13.6     58.1   25.6     52.5
                Threefry-2×64-20    (2+2)×8       0      2×8   2.4     5.1   15.3     51.7   30.4     44.5
                Threefry-4×64-12    (4+4)×8       0      4×8   1.1    11.2    9.4     84.1   15.2     90.0
                Threefry-4×64-20    (4+4)×8       0      4×8   1.9     6.4   15.0     52.8   29.2     46.4
                Threefry-4×32-12    (4+4)×4       0      4×4   2.2     5.6    9.5     83.0   12.8    106.2
                Threefry-4×32-20    (4+4)×4       0      4×4   3.9     3.1   15.7     50.4   25.2     53.8
                   Philox2×64-6     (2+1)×8       0      2×8   2.1     5.9    8.8     90.0   37.2     36.4
                  Philox2×64-10     (2+1)×8       0      2×8   4.3     2.8   14.7     53.7   62.8     21.6
                   Philox4×64-7     (4+2)×8       0      4×8   2.0     6.0    8.6     92.4   36.4     37.2
                  Philox4×64-10     (4+2)×8       0      4×8   3.2     3.9   12.9     61.5   54.0     25.1
                   Philox4×32-7     (4+2)×4       0      4×4   2.4     5.0    3.9    201.6   12.0    113.1
                  Philox4×32-10     (4+2)×4       0      4×4   3.6     3.4    5.4   145.3    17.2    79.1
 Conventional, Crush-resistant
                     MRG32k3a             0    6×4    1000×4    3.8    3.2      –       –       –       –
                     MRG32k3a             0    6×4       4×4   20.3    0.6      –       –       –       –
                      MRGk5-93            0    5×4       1×4    7.6    1.6    9.2    85.5       –       –
 Conventional, Crushable
                Mersenne Twister          0   312×8      1×8    2.0    6.1   43.3    18.3        –       –
                      XORWOW              0     6×4      1×4    1.6    7.7    5.8   136.7     16.8    81.1

Table 2: Memory and performance characteristics for a variety of counter-based and conventional PRNGs.
Taken from Salmon et al.(see References). a counter type of width c×w bytes and a key type of width
Maximum input is written as (c+k)×w, indicating
k×w bytes. Minimum state and output size are c×w bytes. Counter-based PRNG performance is reported
with the minimal number of Liepe, Filippi&Stumpf
       ABC on GPUs    Thorne, rounds for Crush-resistance,Numbers onwith extra rounds for “safety margin.” 23
                                                  Random and also GPUs                                20 of
Performance is shown in bold for recommended PRNGs that have the best platform-specific performance with
Conclusions and Challenges

• ABC is perhaps a method best left for cases when all else fails.
• Its use as an inferential framework in its own right is certainly worth
  further investigation.
• Memory and address space are potential issues: using single
  precision we are prone to run out of numbers for challenging SMC
  applications very quickly.
• Bandwidth is an issue — coordination between GPU and CPU is
  much more challenging in statistical applications than e.g. for
  texture mapping or graphics rendering tasks.
• Programming has to be much more hardware aware than for CPUs
  — more precisely, non-hardware adapted programming will be
  more obviously less efficient than for CPUs.
• There are some differences from conventional ANSI standards for
  mathematics (e.g. rounding).
• Knowing how to harness computational power does allow us to
  tackle real-world problems.
     ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Conclusions          21 of 23
References

GPUs and ABC
•   Zhou, Liepe, Sheng, Stumpf, Barnes (2011) GPU accelerated biochemical network
    simulation. Bioinformatics 27:874-876.
•   Liepe, Barnes, Cule, Erguler, Kirk, Toni, Stumpf (2010) ABC-SysBioapproximate Bayesian
    computation in Python with GPU support. Bioinformatics 26:1797-1799.
•   Thorne, Stumpf (2012) Graph spectral analysis of protein interaction network evolution.
    J.Roy.Soc. Interface 9:2653-2666.

GPUs and Random Numbers
•   Salmon, Moraes, Dror, Shaw (2011) Parallel Random Numbers: As Easy as 1, 2, 3. in
    Proceedings of 2011 International Conference for High Performance Computing,
    Networking, Storage and Analysis, ACM, New York;
    http://doi.acm.org/10.1145/2063384.2063405.
•   Howes, Thomas (2007) Efficient Random Number Generation and Application Using CUDA
    in GPU Gems 3, Addison-Wesley Professional, Boston;
    http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html.


       ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Conclusions                              22 of 23
Acknowledgements




   ABC on GPUs   Thorne, Liepe, Filippi&Stumpf   Conclusions   23 of 23

Weitere ähnliche Inhalte

Was ist angesagt?

Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm
Optimal Budget Allocation: Theoretical Guarantee and Efficient AlgorithmOptimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm
Optimal Budget Allocation: Theoretical Guarantee and Efficient AlgorithmTasuku Soma
 
Discussion of Faming Liang's talk
Discussion of Faming Liang's talkDiscussion of Faming Liang's talk
Discussion of Faming Liang's talkChristian Robert
 
Optimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-peripheryOptimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-peripheryFrancesco Tudisco
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodTasuku Soma
 
Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011BigMC
 
Global illumination techniques for the computation of hight quality images in...
Global illumination techniques for the computation of hight quality images in...Global illumination techniques for the computation of hight quality images in...
Global illumination techniques for the computation of hight quality images in...Frederic Perez
 
IGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdfIGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdfgrssieee
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeTasuku Soma
 
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...Francesco Tudisco
 
Spectral Learning Methods for Finite State Machines with Applications to Na...
  Spectral Learning Methods for Finite State Machines with Applications to Na...  Spectral Learning Methods for Finite State Machines with Applications to Na...
Spectral Learning Methods for Finite State Machines with Applications to Na...LARCA UPC
 
My PhD defence
My PhD defenceMy PhD defence
My PhD defenceJialin LIU
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsBigMC
 
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionA Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionKonkuk University, Korea
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsFrank Nielsen
 
Regret Minimization in Multi-objective Submodular Function Maximization
Regret Minimization in Multi-objective Submodular Function MaximizationRegret Minimization in Multi-objective Submodular Function Maximization
Regret Minimization in Multi-objective Submodular Function MaximizationTasuku Soma
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 

Was ist angesagt? (19)

Deblurring in ct
Deblurring in ctDeblurring in ct
Deblurring in ct
 
Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm
Optimal Budget Allocation: Theoretical Guarantee and Efficient AlgorithmOptimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm
Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
Discussion of Faming Liang's talk
Discussion of Faming Liang's talkDiscussion of Faming Liang's talk
Discussion of Faming Liang's talk
 
Optimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-peripheryOptimal L-shaped matrix reordering, aka graph's core-periphery
Optimal L-shaped matrix reordering, aka graph's core-periphery
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares Method
 
Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011
 
Global illumination techniques for the computation of hight quality images in...
Global illumination techniques for the computation of hight quality images in...Global illumination techniques for the computation of hight quality images in...
Global illumination techniques for the computation of hight quality images in...
 
IGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdfIGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdf
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer Lattice
 
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C...
 
Spectral Learning Methods for Finite State Machines with Applications to Na...
  Spectral Learning Methods for Finite State Machines with Applications to Na...  Spectral Learning Methods for Finite State Machines with Applications to Na...
Spectral Learning Methods for Finite State Machines with Applications to Na...
 
My PhD defence
My PhD defenceMy PhD defence
My PhD defence
 
16 fft
16 fft16 fft
16 fft
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering models
 
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionA Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
 
Regret Minimization in Multi-objective Submodular Function Maximization
Regret Minimization in Multi-objective Submodular Function MaximizationRegret Minimization in Multi-objective Submodular Function Maximization
Regret Minimization in Multi-objective Submodular Function Maximization
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 

Ähnlich wie Approximate Bayesian Computation on Graphical Processing Units

Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Michael Stumpf
 
MATHEON Center Days: Index determination and structural analysis using Algori...
MATHEON Center Days: Index determination and structural analysis using Algori...MATHEON Center Days: Index determination and structural analysis using Algori...
MATHEON Center Days: Index determination and structural analysis using Algori...Dagmar Monett
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesisValentin De Bortoli
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...Vissarion Fisikopoulos
 
A new approach in specifying the inverse quadratic matrix in modulo-2 for con...
A new approach in specifying the inverse quadratic matrix in modulo-2 for con...A new approach in specifying the inverse quadratic matrix in modulo-2 for con...
A new approach in specifying the inverse quadratic matrix in modulo-2 for con...Anax Fotopoulos
 
A Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR FiltersA Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR FiltersIDES Editor
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues listsJames Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
StacksqueueslistsFraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsTony Nguyen
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsHarry Potter
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsYoung Alista
 
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...Koh Takeuchi
 
Index Determination in DAEs using the Library indexdet and the ADOL-C Package...
Index Determination in DAEs using the Library indexdet and the ADOL-C Package...Index Determination in DAEs using the Library indexdet and the ADOL-C Package...
Index Determination in DAEs using the Library indexdet and the ADOL-C Package...Dagmar Monett
 
Passive network-redesign-ntua
Passive network-redesign-ntuaPassive network-redesign-ntua
Passive network-redesign-ntuaIEEE NTUA SB
 
Lec09- AI
Lec09- AILec09- AI
Lec09- AIdrmbalu
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionMichael Stumpf
 
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...AIRCC Publishing Corporation
 
Improving initial generations in pso algorithm for transportation network des...
Improving initial generations in pso algorithm for transportation network des...Improving initial generations in pso algorithm for transportation network des...
Improving initial generations in pso algorithm for transportation network des...ijcsit
 

Ähnlich wie Approximate Bayesian Computation on Graphical Processing Units (20)

Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...
 
MATHEON Center Days: Index determination and structural analysis using Algori...
MATHEON Center Days: Index determination and structural analysis using Algori...MATHEON Center Days: Index determination and structural analysis using Algori...
MATHEON Center Days: Index determination and structural analysis using Algori...
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesis
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...
 
A new approach in specifying the inverse quadratic matrix in modulo-2 for con...
A new approach in specifying the inverse quadratic matrix in modulo-2 for con...A new approach in specifying the inverse quadratic matrix in modulo-2 for con...
A new approach in specifying the inverse quadratic matrix in modulo-2 for con...
 
A Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR FiltersA Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR Filters
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
 
Index Determination in DAEs using the Library indexdet and the ADOL-C Package...
Index Determination in DAEs using the Library indexdet and the ADOL-C Package...Index Determination in DAEs using the Library indexdet and the ADOL-C Package...
Index Determination in DAEs using the Library indexdet and the ADOL-C Package...
 
Passive network-redesign-ntua
Passive network-redesign-ntuaPassive network-redesign-ntua
Passive network-redesign-ntua
 
Lec09- AI
Lec09- AILec09- AI
Lec09- AI
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model Selection
 
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
 
Improving initial generations in pso algorithm for transportation network des...
Improving initial generations in pso algorithm for transportation network des...Improving initial generations in pso algorithm for transportation network des...
Improving initial generations in pso algorithm for transportation network des...
 

Mehr von Michael Stumpf

Gaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksGaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksMichael Stumpf
 
Beyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric ApproachBeyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric ApproachMichael Stumpf
 
Multi-Scale Models in Immunobiology
Multi-Scale Models in ImmunobiologyMulti-Scale Models in Immunobiology
Multi-Scale Models in ImmunobiologyMichael Stumpf
 
Time-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida GlabrataTime-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida GlabrataMichael Stumpf
 
Noisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networksNoisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networksMichael Stumpf
 

Mehr von Michael Stumpf (6)

Gaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksGaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory Networks
 
Beyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric ApproachBeyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric Approach
 
Are Powerlaws Useful?
Are Powerlaws Useful?Are Powerlaws Useful?
Are Powerlaws Useful?
 
Multi-Scale Models in Immunobiology
Multi-Scale Models in ImmunobiologyMulti-Scale Models in Immunobiology
Multi-Scale Models in Immunobiology
 
Time-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida GlabrataTime-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida Glabrata
 
Noisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networksNoisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networks
 

Kürzlich hochgeladen

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Kürzlich hochgeladen (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Approximate Bayesian Computation on Graphical Processing Units

  • 1. Approximate Bayesian Computation on Graphical Processing Units Thomas Thorne, Juliane Liepe, Sarah Filippi & Michael P.H. Stumpf Theoretical Systems Biology Group 06/09/2012 ABC on GPUs Thorne, Liepe, Filippi&Stumpf 1 of 23
  • 2. Trends in High-Performance Computing CUDA OpenCL MPI OpenMPI PVM ABC on GPUs Thorne, Liepe, Filippi&Stumpf Introduction 2 of 23
  • 3. (Hidden) Costs of Computing Prototyping/Development Time is typically reduced for scripting languages such as R or Python. Run Time on single threads C/C++ (or Fortran) have better performance characteristics. But for specialized tasks other languages, e.g. Haskell, can show good characteristics. Energy Requirements Every Watt we use for computing we also have to extract with air conditioning. The role of GPUs • GPUs can be accessed from many different programming languages (e.g. PyCUDA). • GPUs have a comparatively small footprint and relatively modest energy requirements compared to clusters of CPUs. • GPUs were designed for consumer electronics: computer gamers have different needs from the HPC community. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Introduction 3 of 23
  • 4. What GPUs are Good At • GPUs are good for linear threads involving mathematical functions. • We should avoid loops and branches in the code. • In an optimal world we should aim for all threads to finish at the same time. Execution cycle ABC on GPUs Thorne, Liepe, Filippi&Stumpf Introduction 4 of 23
  • 5. Approximate Bayesian Computation We can define the posterior as f (x |θi )π(θi ) p(θi |x ) = p (x ) Here fi (x |θ) is the likelihood which is often hard to evaluate; consider for example dy y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and ˜ = g (y ; θ). dt But we can still simulate from the data-generating model, whence 1(y = x )f (y |θi )π(θi ) p(θi |x ) = dy X p (x ) 1 (∆(y , x ) < ) f (y |θi )π(θi ) ≈ dy X p (x ) Solutions for Complex Problems (?) Approximate (i) data, (ii) model or (iii) distance. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Approximate Bayesian Computation 5 of 23
  • 6. Approximate Bayesian Computation Prior, π(θ) Define set of intermediate distributions, πt , t = 1, ...., T 1 > 2 > ...... > T πt −1 (θ|∆(Xs , X ) < t −1 ) πt (θ|∆(Xs , X ) < t) πT (θ|∆(Xs , X ) < T) Sequential importance sampling: Sample from proposal, ηt (θt ) and weight wt (θt ) = πt (θt )/ηt (θt ) with ηt (θt ) = πt −1 (θt −1 )Kt (θt −1 , θt )d θt −1 where Kt (θt −1 , θt ) is Markov perturbation kernel Toni et al., J.Roy.Soc. Interface (2009); Toni & Stumpf, Bioinformatics (2010). ABC on GPUs Thorne, Liepe, Filippi&Stumpf Approximate Bayesian Computation 6 of 23
  • 7. Experimental Design for Dynamical Systems ABC on GPUs Thorne, Liepe, Filippi&Stumpf Dynamical Systems 7 of 23
  • 8. Experimental Design for Dynamical Systems ABC on GPUs Thorne, Liepe, Filippi&Stumpf Dynamical Systems 7 of 23
  • 9. Parameter Estimation for Dynamical Systems ABC on GPUs Thorne, Liepe, Filippi&Stumpf Dynamical Systems 8 of 23
  • 10. Parameter Estimation for Dynamical Systems ABC on GPUs Thorne, Liepe, Filippi&Stumpf Dynamical Systems 8 of 23
  • 11. Parameter Estimation for Dynamical Systems Sampling parameters from the ABC posterior yields excellent agreement with provided data (grey dots). The scatter is entirely explained by the added noise. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Dynamical Systems 8 of 23
  • 12. Network Evolution Models (a) Duplication attachment (b) Duplication attachment with complimentarity wj (c) Linear preferential wi (d) General scale-free attachment ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 9 of 23
  • 13. ABC on Networks Summarizing Networks • Data are noisy and incomplete. • We can simulate models of network evolution, but this does not allow us to calculate likelihoods for all but very trivial models. • There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought. • Many possible summary statistics of networks are expensive to calculate. Full likelihood: Wiuf et al., PNAS (2006). ABC: Ratman et al., PLoS Comp.Biol. (2008). Stumpf & Wiuf, J. Roy. Soc. Interface (2010). ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 10 of 23
  • 14. Graph Spectrum c a b c d e   0 1 1 1 0 a a e   d  1 0 1 1 0 b   A = 1 1 0 0 0 c     1 1 0 0 1 d  b 0 0 0 1 0 e Graph Spectra Given a graph G comprised of a set of nodes N and edges (i , j ) ∈ E with i , j ∈ N, the adjacency matrix, A, of the graph is defined by 1 if (i , j ) ∈ E , ai ,j = 0 otherwise. The eigenvalues, λ, of this matrix provide one way of defining the graph spectrum. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 11 of 23
  • 15. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D (A, B ) = (ai ,j − bi ,j )2 . i ,j ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 12 of 23
  • 16. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D (A, B ) = (ai ,j − bi ,j )2 . i ,j However for unlabelled graphs we require some mapping h from i ∈ NA to i ∈ NB that minimizes the distance D (A, B ) Dh (A, B ) = (ai ,j − bh(i ),h(j ) )2 , i ,j ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 12 of 23
  • 17. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D (A, B ) = (ai ,j − bi ,j )2 . i ,j However for unlabelled graphs we require some mapping h from i ∈ NA to i ∈ NB that minimizes the distance D (A, B ) Dh (A, B ) = (ai ,j − bh(i ),h(j ) )2 , i ,j Given a spectrum (which is relatively cheap to compute) we have (α) (β) 2 D (A, B ) = λl − λl l ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 12 of 23
  • 18. Protein Interaction Network Data Species Proteins Interactions Genome size Sampling fraction S.cerevisiae 5035 22118 6532 0.77 D. melanogaster 7506 22871 14076 0.53 H. pylori 715 1423 1589 0.45 E. coli 1888 7008 5416 0.35 0.5 Model Selection • Inference here was based on all 0.4 the data, not summary Model probability 0.3 Organism S.cerevisae statistics. D.melanogaster H.pylori E.coli • Duplication models receive the 0.2 strongest support from the data. 0.1 • Several models receive support and no model is chosen 0.0 unambiguously. DA DAC LPA SF DACL DACR Model ABC on GPUs Thorne, Liepe, Filippi&Stumpf Network Evolution 13 of 23
  • 19. GPUs in Computational Statistics Performance • With highly optimized libraries (e.g. BLAST/ATLAS) we can run numerically demanding jobs relatively straightforwardly. • Whenever poor-man’s parallelization is possible, GPUs offer considerable advantages over multi-core CPU systems (at favourable cost and energy requirements). • Performance depends crucially on our ability to map tasks onto the hardware. Challenges • GPU hardware was initially conceived for different purposes — computer gamers need fewer random numbers than MCMC or SMC procedures. The address space accessible to single-precision numbers also suffices to kill zombies etc . • Combining several GPUs requires additional work, using e.g. MPI. ABC on GPUs Thorne, Liepe, Filippi&Stumpf GPU and the Alternatives 14 of 23
  • 20. Alternatives to GPUs: FPGAs Field Programmable Gate Arrays are configurable electronic circuits that are partly (re)configurable. Here the hardware is adapted to the 2 D.B. Thomas, W. Luk, and M. Stumpf problem at hand and encapsulates the programme in its programmable logic arrays. Multiple instances in xc4vlx60 Performance (MSwap-Compare/s) 100000 Single Instance in Virtex-4 Quad Opteron Software 10000 1000 100 10 1 4 9 14 19 24 Graph Size ABC on GPUs Thorne, Liepe, Filippi&Stumpf GPU and the Alternatives 15 of 23
  • 21. Alternatives to GPUs: CPUs (and MPI. . . ) CPUs with multiple cores 40 are flexible, have large address spaces and a wealth of flexible numerical 30 routines. This makes implementation of processor time numerically demanding 20 CPU GPU tasks relatively straightforward. In particular there is less of an 10 incentive to consider how a problem is best implemented in software 100 500 1000 5000 10000 50000 size that takes advantage of hardware features. 6-core Xeon vs M2050 (448 cores) programmed in OpenCL ABC on GPUs Thorne, Liepe, Filippi&Stumpf GPU and the Alternatives 16 of 23
  • 22. (Pseudo) Random Numbers The Mersenne-Twister is one of the standard random number generators for simulation. MT19937 has a period of 219937 − 1 ≈ 4 × 106001 . But MT does not have cryptographic strength (once 624 iterates have been observed, all future states are predictable), unlike Blum-Blum-Shub, xn+1 = xn modM , where M = pq with p, q large prime numbers. But it is too slow for simulations. Parallel Random Number Generation • Running RNGs in parallel does not produce reliable sets of random numbers. • We need algorithms to produce large numbers of parallel streams of “good” random numbers. • We also need better algorithms for weighted sampling. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Random Numbers on GPUs 17 of 23
  • 23. Generating RNGs in Parallel We assume that we have a function xn+1 = f (xn ) which generates a stream of (pseudo) random numbers x0 , x1 , x2 , . . . , xr , . . . , xs , . . . , xt , . . . , xz Parallel Streams x0 , x1 , . . . x0 , x1 , . . . x0 , x1 , . . . Sub-Streams x0 , . . . , xr −1 xr , . . . , xs−1 xs , . . . , xt −1 ABC on GPUs Thorne, Liepe, Filippi&Stumpf Random Numbers on GPUs 18 of 23
  • 24. Counter-Based RNGs We can represent RNGs as state-space processes, yn+1 = f (yn ) with f :Y −→ Y xn+1 = gk ,nmodJ (y n+1/J ) with g :Y × K × ZJ −→ X , where x are essentially behaving as x ∼ U[0,1] ; here K is the key space, J the number of random numbers generated from the internal state of the RNG. Salmon et al.(SC11) propose to use a simple form for f (·) and gk ,j = hk ◦ bk For a sufficiently complex bijection bk we can use simple updates and leave the randomization to bk . Here cryptographic routines come in useful. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Random Numbers on GPUs 19 of 23
  • 25. RNG Performance on CPUs and GPUs Method Max. Min. Output Intel CPU Nvidia GPU AMD GPU input state size cpB GB/s cpB GB/s cpB GB/s Counter-based, Cryptographic AES(sw) (1+0)×16 11×16 1×16 31.2 0.4 – – – – AES(hw) (1+0)×16 11×16 1×16 1.7 7.2 – – – – Threefish (Threefry-4×64-72) (4+4)×8 0 4×8 7.3 1.7 51.8 15.3 302.8 4.5 Counter-based, Crush-resistant ARS-5(hw) (1+1)×16 0 1×16 0.7 17.8 – – – – ARS-7(hw) (1+1)×16 0 1×16 1.1 11.1 – – – – Threefry-2×64-13 (2+2)×8 0 2×8 2.0 6.3 13.6 58.1 25.6 52.5 Threefry-2×64-20 (2+2)×8 0 2×8 2.4 5.1 15.3 51.7 30.4 44.5 Threefry-4×64-12 (4+4)×8 0 4×8 1.1 11.2 9.4 84.1 15.2 90.0 Threefry-4×64-20 (4+4)×8 0 4×8 1.9 6.4 15.0 52.8 29.2 46.4 Threefry-4×32-12 (4+4)×4 0 4×4 2.2 5.6 9.5 83.0 12.8 106.2 Threefry-4×32-20 (4+4)×4 0 4×4 3.9 3.1 15.7 50.4 25.2 53.8 Philox2×64-6 (2+1)×8 0 2×8 2.1 5.9 8.8 90.0 37.2 36.4 Philox2×64-10 (2+1)×8 0 2×8 4.3 2.8 14.7 53.7 62.8 21.6 Philox4×64-7 (4+2)×8 0 4×8 2.0 6.0 8.6 92.4 36.4 37.2 Philox4×64-10 (4+2)×8 0 4×8 3.2 3.9 12.9 61.5 54.0 25.1 Philox4×32-7 (4+2)×4 0 4×4 2.4 5.0 3.9 201.6 12.0 113.1 Philox4×32-10 (4+2)×4 0 4×4 3.6 3.4 5.4 145.3 17.2 79.1 Conventional, Crush-resistant MRG32k3a 0 6×4 1000×4 3.8 3.2 – – – – MRG32k3a 0 6×4 4×4 20.3 0.6 – – – – MRGk5-93 0 5×4 1×4 7.6 1.6 9.2 85.5 – – Conventional, Crushable Mersenne Twister 0 312×8 1×8 2.0 6.1 43.3 18.3 – – XORWOW 0 6×4 1×4 1.6 7.7 5.8 136.7 16.8 81.1 Table 2: Memory and performance characteristics for a variety of counter-based and conventional PRNGs. Taken from Salmon et al.(see References). a counter type of width c×w bytes and a key type of width Maximum input is written as (c+k)×w, indicating k×w bytes. Minimum state and output size are c×w bytes. Counter-based PRNG performance is reported with the minimal number of Liepe, Filippi&Stumpf ABC on GPUs Thorne, rounds for Crush-resistance,Numbers onwith extra rounds for “safety margin.” 23 Random and also GPUs 20 of Performance is shown in bold for recommended PRNGs that have the best platform-specific performance with
  • 26. Conclusions and Challenges • ABC is perhaps a method best left for cases when all else fails. • Its use as an inferential framework in its own right is certainly worth further investigation. • Memory and address space are potential issues: using single precision we are prone to run out of numbers for challenging SMC applications very quickly. • Bandwidth is an issue — coordination between GPU and CPU is much more challenging in statistical applications than e.g. for texture mapping or graphics rendering tasks. • Programming has to be much more hardware aware than for CPUs — more precisely, non-hardware adapted programming will be more obviously less efficient than for CPUs. • There are some differences from conventional ANSI standards for mathematics (e.g. rounding). • Knowing how to harness computational power does allow us to tackle real-world problems. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Conclusions 21 of 23
  • 27. References GPUs and ABC • Zhou, Liepe, Sheng, Stumpf, Barnes (2011) GPU accelerated biochemical network simulation. Bioinformatics 27:874-876. • Liepe, Barnes, Cule, Erguler, Kirk, Toni, Stumpf (2010) ABC-SysBioapproximate Bayesian computation in Python with GPU support. Bioinformatics 26:1797-1799. • Thorne, Stumpf (2012) Graph spectral analysis of protein interaction network evolution. J.Roy.Soc. Interface 9:2653-2666. GPUs and Random Numbers • Salmon, Moraes, Dror, Shaw (2011) Parallel Random Numbers: As Easy as 1, 2, 3. in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, New York; http://doi.acm.org/10.1145/2063384.2063405. • Howes, Thomas (2007) Efficient Random Number Generation and Application Using CUDA in GPU Gems 3, Addison-Wesley Professional, Boston; http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html. ABC on GPUs Thorne, Liepe, Filippi&Stumpf Conclusions 22 of 23
  • 28. Acknowledgements ABC on GPUs Thorne, Liepe, Filippi&Stumpf Conclusions 23 of 23