SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Downloaden Sie, um offline zu lesen
Janus: FPGA Based System
  for Scientific Computing
           Filippo Mantovani
              Physics Department
        Università degli Studi di Ferrara




       Milano, December 19th 2008
Overview:


   1.       The physical problem:
            - Ising model and Spin Glass in short;

   2.       The computational challenge:
            - Monte Carlo “tools”;
            - Computational features of Spin Glass problems;

   3.       The Janus system:
            - an exotic attempt to simulate spin systems;

   4.       Conclusions

Filippo Mantovani, December 19th 2008                          - 2 of 31 -
Statistical mechanics...
    Statistical mechanics tries to describe the
    macroscopic behavior of matter in terms of
     average values of microscopic structure.
For instance, try to explain why
magnets have a transition
temperature (beyond which they
lose their magnetic state)...

                                ...in terms of
                                elementary
                                magnetic dipoles
                                attached to
                                each atom in the material (that we call spins).

Filippo Mantovani, December 19th 2008                                     - 3 of 31 -
The Ising model
Each spin takes only the values +1 (up)
or -1 (down) and interacts only with its
nearest neighbours in a discrete
D-dimensional mesh (e.g. 2D).
 Energy of the system for a
                                                        ∑〈ij 〉 s i s j
                                        U {S i }=−J
 given configuration {Si}:

Note that:
 A) Each configuration {Si} has a probability given
                                         −U  {S } / kT    − U {S }
     by the Boltzmann factor: P{S i }∝e                ≡e              i                   i




  B) Macroscopic observables have averages obtained
     by summing on each configurations, each weighted
     by its probability.
                                            M ≡∑i s i        〈 M 〉 ∝ ∑{S } M {S i } P{S i }
                       Magnetization:
                                                                             i


Filippo Mantovani, December 19th 2008                                                     - 4 of 31 -
The Ising model

   What about the “J”?!?        U {S i }=−J ∑〈ij 〉 s i s j
   In the Ising model we
   assume that J are constant (=+1 or -1).
   The energy profile is therefore smooth (and
   in this case symmetric):




Filippo Mantovani, December 19th 2008                    - 5 of 31 -
The Spin Glass model
   Spin glasses are a generalization of Ising systems.
   An apparently trivial change in the energy function:
              U{Si }=−∑〈ij 〉 Jij si s j , s={1,−1}, J={1,−1 }

   makes spin glass models much more interesting and
   complex than Ising systems: if we leave J NOT fixed,
   we obtain 2 important consequences:
    1)      frustration:                2)   corrugation in
                                             energy function:




Filippo Mantovani, December 19th 2008                           - 6 of 31 -
Problems around that...

     Problems like finding the minimum energy
      configuration at a given temperature or
         finding the critical temperature are
        absolutely not trivial and represent a
             computational nightmare!!!

   HINT:
    If we consider a 3D lattice, with 48
    spins per side, we obtain that the
                                  483!!!
    possible configurations are 2


Filippo Mantovani, December 19th 2008       - 7 of 31 -
Overview:


   1.       The physical problem:
            - Spin Glass and Ising model in short;

   2.       The computational challenge:
            - Monte Carlo “tools”;
            - Computational features of Spin Glass problems;

   3.       The Janus system:
            - an exotic attempt to simulate spin systems;

   4.       Conclusions

Filippo Mantovani, December 19th 2008                          - 8 of 31 -
Monte Carlo algorithms
As seen before, relative small lattices produce a huge
number of possible configurations.
        Monte Carlo algorithms are tools allowing us to
        visit configuration according to its probability.

 The Markov chains theory
 assures us that is valid the
 property of ergodicity:


 So the probability of the
 configuration i, after a time
 long enough, tends to a
 probability
 (equilibrium condition).

Filippo Mantovani, December 19th 2008                - 9 of 31 -
Monte Carlo algorithms

 A direct consequence of this Markov chains property is
 that we can produce configurations that are distributed
 following a given distribution.
                                        C MC ;      i=1 n
                                          i


 The “good news” is that observables are trivially
 computed as unweighted sums of Monte Carlo
 generated configurations*:

                           〈 f 〉 ∝ ∑ {C } f C P C=∑i f Ci 
                                                                      MC




 *we do not need any more to sum the observable weighted with the probability.

Filippo Mantovani, December 19th 2008                                            - 10 of 31 -
The Metropolis algorithm
  Assuming a 3D lattice composed of spins, we can
  update each lattice site following these steps:
    ✔ consider one spin s0 and compute his
      local energy E: E=∑〈 0j〉 J ij s j s0
    ✔ flip the value of the spin: s0' = -s0
    ✔ compute the new local energy E': E '=∑〈 0j〉 J ij s j s0 '
    ✔ if E'<E then the spin flip is accepted;
           else we compute X =exp−E '−E/T  and generate a
           pseudo-random number R (0≤R≤1):
                 if R<X then the spin flip is accepted as well
                  else remains s.
  NOTE: If we replace the spin lattice with the topology of a random graph the algorithm
        keeps working (e.g. this is exactly what we used to implement graph coloring).

Filippo Mantovani, December 19th 2008                                              - 11 of 31 -
Or easier...




Filippo Mantovani, December 19th 2008   - 12 of 31 -
Computational features

   • Operations on spins ≡ bit-manipulation.
   • Computation of a limited set of exponential.
       (We can store them in LUTs).

   • Random numbers should be of “good quality”
     and with a long period.
   • A huge degree of available parallelism.
   • Regular program flow.
       (orderly loops on the grid sites)

   • Regular, predictable memory access pattern.
   • Information-exchange (processor ‹› memory)
     is huge, however the size of the data-base is tiny.
       (On-chip memory seems to be ideal...)

Filippo Mantovani, December 19th 2008                 - 13 of 31 -
The PC approach...
   Using standard architectures as Spin Glass engines two
   different algorithms are commonly used:

   Synchronous Multi-Spin Coding (SMSC):
      one CPU handles one single system;
      replicas are handled on a farm of several CPU (128 - 256);
      at every step each CPU update in parallel N spins
       (N=2 or 4);
   NOTE: bottleneck is the number of floating point random
   numbers can be generated in parallel.

   Asynchronous Multi-Spin Coding (AMSC):
      each CPU handles several (64 - 128) systems in parallel;
      replicas are handled on CPU farm (smaller than previous);
      at every step each CPU update in parallel 64 - 128 spins
       of different systems;
      on each single CPU random number is shared among
       all systems.
Filippo Mantovani, December 19th 2008                        - 14 of 31 -
Overview:


   1.       The physical problem:
            - Spin Glass and Ising model in short;

   2.       The computational challenge:
            - Monte Carlo “tools”;
            - Computational features of Spin Glass problems;

   3.       The Janus system:
            - an exotic attempt to simulate spin systems;

   4.       Conclusions

Filippo Mantovani, December 19th 2008                          - 15 of 31 -
The Janus project
A collaboration of:
 Universities of Rome “La Sapienza” and Ferrara
 Universities of Madrid, Zaragoza, Extremadura
 BIFI (Zaragoza)
 Eurotech




Filippo Mantovani, December 19th 2008              - 16 of 31 -
The Janus approach
Preamble: We split the 3D lattice in 2 parts,
“black” spins and “white” spins
(slice = checkerboard). We cannot update in
the same time black spins and white spins.



    We store the lattice within the embedded
     memories of the FPGA.
    We use a set of registers to have no latency
     access to a “small set” of spins (and couplings)
    We house a set (as large as possible) of
     Update Engines...


Filippo Mantovani, December 19th 2008             - 17 of 31 -
The Janus approach

   Each Update Engine:
     computes the local contribution to U:
                 U=−∑〈 ij〉 si Jij s j
     addresses a probability table
       according with U;
     compares with a freshly generated;
       random number;
     sets the new spin value;


Filippo Mantovani, December 19th 2008     - 18 of 31 -
The Janus approach
   A whole picture:




Filippo Mantovani, December 19th 2008   - 19 of 31 -
The toy...
   The basic hardware elements are:
    ▸ a 2-D grid of 4 x 4 FPGA-based processors (SPs);
    ▸ data links among nearest neighbours
      on the FPGA grid;
    ▸ one control processor on each board (IOP)
      with 2 gbit-ethernet ports;
    ▸ a standard PC (Janus host) for each 2 Janus boards.




Filippo Mantovani, December 19th 2008               - 20 of 31 -
Janus summary:
    256 Xilinx Virtex4-LX200 (+16 IOPs);
    1024 update engines on each processor;
    pipelineable to one spin update per
     clock cycle;
    88% of available logic resources;
    using a bandwidth of ~12000 read bits
     + ~1000 written bits per clock cycle
     ⇒ 47% of available on-chip memory;
    system clock at 62.5 MHz
     ⇒ 16 ps average spin update time.
    1 SP ~40W, 16 SP ~640W, rack ~11KW
Filippo Mantovani, December 19th 2008         - 21 of 31 -
Performance
 (for physicists)

           The relevant performance measure is the
             average update time per spin (R)
                measured in pico-sec per spin.

   For each FPGA in the system we have:
     R = 1024 spin updates for each clock cycle
        = 1024 * 16 ns
        ~ 16 ps / spin

   For one complete 16 processors board:
     R = 1/16 * 1024 spin updates for each clock cycle
        = 1/16 * 1024 * 16 ns
        ~ 1 ps / spin

Filippo Mantovani, December 19th 2008                - 22 of 31 -
Performance
 (for computer scientists)
   Each update-core performs 11 + 2 sustained pipelined
   operations* per clock cycle (@ 62.5 MHz)




                                                                 Operations
   1024 update engines perform:
                1024 × 13 ops / 16 ns = 832 Gops
                1024 × 3 ops / 16 ns = 192 Gops**

   * these operations are on very short data words!!!
   ** submitted to Gordon Bell Prize 2008.

   Sustaining these performance requires a huge bandwidth




                                                                 Memory bandwidth
   to/from (fine-grained) memory:
    ▸ processing just one bit implies getting 12 more bits
       from memory: 6 neighbours + 6 “J” = 12 bits;
    ▸ generating pseudo-randoms requires 3 32-bit data;
    ▸ 1 more 32-bit data (from the LUT) is necessary
       for MC step;
   All in all the memory bandwidth sustained is:
                    140 bit × 1024/16 ns ~ 1 TB/s
Filippo Mantovani, December 19th 2008                        - 23 of 31 -
Janus vs PC
    Assuming to perform 1012 Monte Carlo steps on a lattice size of 643 we
    obtain the following performances:

                   JANUS                          AMSC      SMSC
   processor         1 SP                          1 CPU    1 CPU
   statistic        1 (16)                        1 (128)     1
   wall clock time 50 days                       770 years 25 years
   energy           2.7 GJ                         2.3 TJ   77 GJ
   processor       256 SPs                        2 CPUs 256 CPUs
   statistic         256                            256      256
   wall clock time 50 days                       770 years 25 years
   energy           43 GJ                          4.6 TJ   20 TJ
  NOTE: 256 replicas are “physically/statistically interesting”: this is the
        reason of this choice!!!
Filippo Mantovani, December 19th 2008                                          - 24 of 31 -
Janus gallery: SP




Filippo Mantovani, December 19th 2008   - 25 of 31 -
Janus gallery: IOP




Filippo Mantovani, December 19th 2008   - 26 of 31 -
Janus gallery: board




Filippo Mantovani, December 19th 2008   - 27 of 31 -
Janus gallery: rack




Filippo Mantovani, December 19th 2008   - 28 of 31 -
Overview:


   1.       The physical problem:
            - Spin Glass and Ising model in short;
            - LQCD vs statistical mechanics;

   2.       The computational challenge:
            - Monte Carlo “tools”;
            - Computational features of Spin Glass problems;

   3.       The Janus system:
            - an exotic attempt to simulate spin systems;

   4.       Conclusions

Filippo Mantovani, December 19th 2008                          - 29 of 31 -
Conclusions
  ✔ the Janus system is an heterogeneous massively
    parallel systems;
  ✔ made by highly parallel processors implemented
    on FPGA;
  ✔ delivers impressive performance: 1 SP is equivalent
    to ~100 PC and ~30-50 new multi-core architectures;
  ✔ used for scientific applications like Spin Glass,
    but in principle available for any kind of problems
    fitting the topology of the machine and the
    resources of the FPGAs.
  ✔ uses NON-challenge technology...
    (plug the LEGO bricks and play)
Filippo Mantovani, December 19th 2008                 - 30 of 31 -
A slightly broader view...
 The study of these systems:
  ✰ for a physicist means a better understanding
    of condensed matter;
  ✰ for a computer scientist means developing
    new algorithms and improving old ones;
  ✰ for a “computer mason” (like me) means
    developing a massively parallel high
    performance system.

 Moreover:
 All these systems are
 surprisingly strong related
 with other areas,
 like e.g. graph coloring.

Filippo Mantovani, December 19th 2008              - 31 of 31 -
Filippo Mantovani, December 19th 2008   - 32 of 31 -
About the random numbers

   The pseudo random numbers are generated
    using a shift-register of N words of 32 bits
      and can generate up to ~100 random
             numbers per clock cycle.




  NOTE: 1 shift-register generating 32 numbers
        uses ~2% of the FPGA resources (~2500 Slices).
Filippo Mantovani, December 19th 2008               - 33 of 31 -

Weitere ähnliche Inhalte

Mehr von Marco Santambrogio (20)

RCIM 2008 - - hArtes Atmel
RCIM 2008 - - hArtes AtmelRCIM 2008 - - hArtes Atmel
RCIM 2008 - - hArtes Atmel
 
RCIM 2008 - - UniCal
RCIM 2008 - - UniCalRCIM 2008 - - UniCal
RCIM 2008 - - UniCal
 
RCIM 2008 - - ALTERA
RCIM 2008 - - ALTERARCIM 2008 - - ALTERA
RCIM 2008 - - ALTERA
 
DHow2 - L6 VHDL
DHow2 - L6 VHDLDHow2 - L6 VHDL
DHow2 - L6 VHDL
 
DHow2 - L6 Ant
DHow2 - L6 AntDHow2 - L6 Ant
DHow2 - L6 Ant
 
DHow2 - L5
DHow2 - L5DHow2 - L5
DHow2 - L5
 
RCIM 2008 - - ALaRI
RCIM 2008 - - ALaRIRCIM 2008 - - ALaRI
RCIM 2008 - - ALaRI
 
RCIM 2008 - Modello Scheduling
RCIM 2008 - Modello SchedulingRCIM 2008 - Modello Scheduling
RCIM 2008 - Modello Scheduling
 
RCIM 2008 - HLR
RCIM 2008 - HLRRCIM 2008 - HLR
RCIM 2008 - HLR
 
RCIM 2008 -- EHW
RCIM 2008 -- EHWRCIM 2008 -- EHW
RCIM 2008 -- EHW
 
RCIM 2008 - Modello Generale
RCIM 2008 - Modello GeneraleRCIM 2008 - Modello Generale
RCIM 2008 - Modello Generale
 
RCIM 2008 - Allocation Relocation
RCIM 2008 - Allocation RelocationRCIM 2008 - Allocation Relocation
RCIM 2008 - Allocation Relocation
 
RCIM 2008 - - hArtes_Ferrara
RCIM 2008 - - hArtes_FerraraRCIM 2008 - - hArtes_Ferrara
RCIM 2008 - - hArtes_Ferrara
 
RCIM 2008 - Intro
RCIM 2008 - IntroRCIM 2008 - Intro
RCIM 2008 - Intro
 
DHow2 - L2
DHow2 - L2DHow2 - L2
DHow2 - L2
 
DHow2 - L4
DHow2 - L4DHow2 - L4
DHow2 - L4
 
DHow2 - L1
DHow2 - L1DHow2 - L1
DHow2 - L1
 
RCW@DEI - Treasure hunt
RCW@DEI - Treasure huntRCW@DEI - Treasure hunt
RCW@DEI - Treasure hunt
 
RCW@DEI - ADL
RCW@DEI - ADLRCW@DEI - ADL
RCW@DEI - ADL
 
RCW@DEI - Design Flow 4 SoPc
RCW@DEI - Design Flow 4 SoPcRCW@DEI - Design Flow 4 SoPc
RCW@DEI - Design Flow 4 SoPc
 

Kürzlich hochgeladen

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Kürzlich hochgeladen (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

RCIM 2008 - Janus

  • 1. Janus: FPGA Based System for Scientific Computing Filippo Mantovani Physics Department Università degli Studi di Ferrara Milano, December 19th 2008
  • 2. Overview: 1. The physical problem: - Ising model and Spin Glass in short; 2. The computational challenge: - Monte Carlo “tools”; - Computational features of Spin Glass problems; 3. The Janus system: - an exotic attempt to simulate spin systems; 4. Conclusions Filippo Mantovani, December 19th 2008 - 2 of 31 -
  • 3. Statistical mechanics... Statistical mechanics tries to describe the macroscopic behavior of matter in terms of average values of microscopic structure. For instance, try to explain why magnets have a transition temperature (beyond which they lose their magnetic state)... ...in terms of elementary magnetic dipoles attached to each atom in the material (that we call spins). Filippo Mantovani, December 19th 2008 - 3 of 31 -
  • 4. The Ising model Each spin takes only the values +1 (up) or -1 (down) and interacts only with its nearest neighbours in a discrete D-dimensional mesh (e.g. 2D). Energy of the system for a ∑〈ij 〉 s i s j U {S i }=−J given configuration {Si}: Note that: A) Each configuration {Si} has a probability given −U  {S } / kT − U {S } by the Boltzmann factor: P{S i }∝e ≡e i i B) Macroscopic observables have averages obtained by summing on each configurations, each weighted by its probability. M ≡∑i s i 〈 M 〉 ∝ ∑{S } M {S i } P{S i } Magnetization: i Filippo Mantovani, December 19th 2008 - 4 of 31 -
  • 5. The Ising model What about the “J”?!? U {S i }=−J ∑〈ij 〉 s i s j In the Ising model we assume that J are constant (=+1 or -1). The energy profile is therefore smooth (and in this case symmetric): Filippo Mantovani, December 19th 2008 - 5 of 31 -
  • 6. The Spin Glass model Spin glasses are a generalization of Ising systems. An apparently trivial change in the energy function: U{Si }=−∑〈ij 〉 Jij si s j , s={1,−1}, J={1,−1 } makes spin glass models much more interesting and complex than Ising systems: if we leave J NOT fixed, we obtain 2 important consequences: 1) frustration: 2) corrugation in energy function: Filippo Mantovani, December 19th 2008 - 6 of 31 -
  • 7. Problems around that... Problems like finding the minimum energy configuration at a given temperature or finding the critical temperature are absolutely not trivial and represent a computational nightmare!!! HINT: If we consider a 3D lattice, with 48 spins per side, we obtain that the 483!!! possible configurations are 2 Filippo Mantovani, December 19th 2008 - 7 of 31 -
  • 8. Overview: 1. The physical problem: - Spin Glass and Ising model in short; 2. The computational challenge: - Monte Carlo “tools”; - Computational features of Spin Glass problems; 3. The Janus system: - an exotic attempt to simulate spin systems; 4. Conclusions Filippo Mantovani, December 19th 2008 - 8 of 31 -
  • 9. Monte Carlo algorithms As seen before, relative small lattices produce a huge number of possible configurations. Monte Carlo algorithms are tools allowing us to visit configuration according to its probability. The Markov chains theory assures us that is valid the property of ergodicity: So the probability of the configuration i, after a time long enough, tends to a probability (equilibrium condition). Filippo Mantovani, December 19th 2008 - 9 of 31 -
  • 10. Monte Carlo algorithms A direct consequence of this Markov chains property is that we can produce configurations that are distributed following a given distribution. C MC ; i=1 n i The “good news” is that observables are trivially computed as unweighted sums of Monte Carlo generated configurations*: 〈 f 〉 ∝ ∑ {C } f C P C=∑i f Ci  MC *we do not need any more to sum the observable weighted with the probability. Filippo Mantovani, December 19th 2008 - 10 of 31 -
  • 11. The Metropolis algorithm Assuming a 3D lattice composed of spins, we can update each lattice site following these steps: ✔ consider one spin s0 and compute his local energy E: E=∑〈 0j〉 J ij s j s0 ✔ flip the value of the spin: s0' = -s0 ✔ compute the new local energy E': E '=∑〈 0j〉 J ij s j s0 ' ✔ if E'<E then the spin flip is accepted; else we compute X =exp−E '−E/T  and generate a pseudo-random number R (0≤R≤1):  if R<X then the spin flip is accepted as well else remains s. NOTE: If we replace the spin lattice with the topology of a random graph the algorithm keeps working (e.g. this is exactly what we used to implement graph coloring). Filippo Mantovani, December 19th 2008 - 11 of 31 -
  • 12. Or easier... Filippo Mantovani, December 19th 2008 - 12 of 31 -
  • 13. Computational features • Operations on spins ≡ bit-manipulation. • Computation of a limited set of exponential. (We can store them in LUTs). • Random numbers should be of “good quality” and with a long period. • A huge degree of available parallelism. • Regular program flow. (orderly loops on the grid sites) • Regular, predictable memory access pattern. • Information-exchange (processor ‹› memory) is huge, however the size of the data-base is tiny. (On-chip memory seems to be ideal...) Filippo Mantovani, December 19th 2008 - 13 of 31 -
  • 14. The PC approach... Using standard architectures as Spin Glass engines two different algorithms are commonly used: Synchronous Multi-Spin Coding (SMSC):  one CPU handles one single system;  replicas are handled on a farm of several CPU (128 - 256);  at every step each CPU update in parallel N spins (N=2 or 4); NOTE: bottleneck is the number of floating point random numbers can be generated in parallel. Asynchronous Multi-Spin Coding (AMSC):  each CPU handles several (64 - 128) systems in parallel;  replicas are handled on CPU farm (smaller than previous);  at every step each CPU update in parallel 64 - 128 spins of different systems;  on each single CPU random number is shared among all systems. Filippo Mantovani, December 19th 2008 - 14 of 31 -
  • 15. Overview: 1. The physical problem: - Spin Glass and Ising model in short; 2. The computational challenge: - Monte Carlo “tools”; - Computational features of Spin Glass problems; 3. The Janus system: - an exotic attempt to simulate spin systems; 4. Conclusions Filippo Mantovani, December 19th 2008 - 15 of 31 -
  • 16. The Janus project A collaboration of:  Universities of Rome “La Sapienza” and Ferrara  Universities of Madrid, Zaragoza, Extremadura  BIFI (Zaragoza)  Eurotech Filippo Mantovani, December 19th 2008 - 16 of 31 -
  • 17. The Janus approach Preamble: We split the 3D lattice in 2 parts, “black” spins and “white” spins (slice = checkerboard). We cannot update in the same time black spins and white spins.  We store the lattice within the embedded memories of the FPGA.  We use a set of registers to have no latency access to a “small set” of spins (and couplings)  We house a set (as large as possible) of Update Engines... Filippo Mantovani, December 19th 2008 - 17 of 31 -
  • 18. The Janus approach Each Update Engine:  computes the local contribution to U: U=−∑〈 ij〉 si Jij s j  addresses a probability table according with U;  compares with a freshly generated; random number;  sets the new spin value; Filippo Mantovani, December 19th 2008 - 18 of 31 -
  • 19. The Janus approach A whole picture: Filippo Mantovani, December 19th 2008 - 19 of 31 -
  • 20. The toy... The basic hardware elements are: ▸ a 2-D grid of 4 x 4 FPGA-based processors (SPs); ▸ data links among nearest neighbours on the FPGA grid; ▸ one control processor on each board (IOP) with 2 gbit-ethernet ports; ▸ a standard PC (Janus host) for each 2 Janus boards. Filippo Mantovani, December 19th 2008 - 20 of 31 -
  • 21. Janus summary:  256 Xilinx Virtex4-LX200 (+16 IOPs);  1024 update engines on each processor;  pipelineable to one spin update per clock cycle;  88% of available logic resources;  using a bandwidth of ~12000 read bits + ~1000 written bits per clock cycle ⇒ 47% of available on-chip memory;  system clock at 62.5 MHz ⇒ 16 ps average spin update time.  1 SP ~40W, 16 SP ~640W, rack ~11KW Filippo Mantovani, December 19th 2008 - 21 of 31 -
  • 22. Performance (for physicists) The relevant performance measure is the average update time per spin (R) measured in pico-sec per spin. For each FPGA in the system we have: R = 1024 spin updates for each clock cycle = 1024 * 16 ns ~ 16 ps / spin For one complete 16 processors board: R = 1/16 * 1024 spin updates for each clock cycle = 1/16 * 1024 * 16 ns ~ 1 ps / spin Filippo Mantovani, December 19th 2008 - 22 of 31 -
  • 23. Performance (for computer scientists) Each update-core performs 11 + 2 sustained pipelined operations* per clock cycle (@ 62.5 MHz) Operations 1024 update engines perform: 1024 × 13 ops / 16 ns = 832 Gops 1024 × 3 ops / 16 ns = 192 Gops** * these operations are on very short data words!!! ** submitted to Gordon Bell Prize 2008. Sustaining these performance requires a huge bandwidth Memory bandwidth to/from (fine-grained) memory: ▸ processing just one bit implies getting 12 more bits from memory: 6 neighbours + 6 “J” = 12 bits; ▸ generating pseudo-randoms requires 3 32-bit data; ▸ 1 more 32-bit data (from the LUT) is necessary for MC step; All in all the memory bandwidth sustained is: 140 bit × 1024/16 ns ~ 1 TB/s Filippo Mantovani, December 19th 2008 - 23 of 31 -
  • 24. Janus vs PC Assuming to perform 1012 Monte Carlo steps on a lattice size of 643 we obtain the following performances: JANUS AMSC SMSC processor 1 SP 1 CPU 1 CPU statistic 1 (16) 1 (128) 1 wall clock time 50 days 770 years 25 years energy 2.7 GJ 2.3 TJ 77 GJ processor 256 SPs 2 CPUs 256 CPUs statistic 256 256 256 wall clock time 50 days 770 years 25 years energy 43 GJ 4.6 TJ 20 TJ NOTE: 256 replicas are “physically/statistically interesting”: this is the reason of this choice!!! Filippo Mantovani, December 19th 2008 - 24 of 31 -
  • 25. Janus gallery: SP Filippo Mantovani, December 19th 2008 - 25 of 31 -
  • 26. Janus gallery: IOP Filippo Mantovani, December 19th 2008 - 26 of 31 -
  • 27. Janus gallery: board Filippo Mantovani, December 19th 2008 - 27 of 31 -
  • 28. Janus gallery: rack Filippo Mantovani, December 19th 2008 - 28 of 31 -
  • 29. Overview: 1. The physical problem: - Spin Glass and Ising model in short; - LQCD vs statistical mechanics; 2. The computational challenge: - Monte Carlo “tools”; - Computational features of Spin Glass problems; 3. The Janus system: - an exotic attempt to simulate spin systems; 4. Conclusions Filippo Mantovani, December 19th 2008 - 29 of 31 -
  • 30. Conclusions ✔ the Janus system is an heterogeneous massively parallel systems; ✔ made by highly parallel processors implemented on FPGA; ✔ delivers impressive performance: 1 SP is equivalent to ~100 PC and ~30-50 new multi-core architectures; ✔ used for scientific applications like Spin Glass, but in principle available for any kind of problems fitting the topology of the machine and the resources of the FPGAs. ✔ uses NON-challenge technology... (plug the LEGO bricks and play) Filippo Mantovani, December 19th 2008 - 30 of 31 -
  • 31. A slightly broader view... The study of these systems: ✰ for a physicist means a better understanding of condensed matter; ✰ for a computer scientist means developing new algorithms and improving old ones; ✰ for a “computer mason” (like me) means developing a massively parallel high performance system. Moreover: All these systems are surprisingly strong related with other areas, like e.g. graph coloring. Filippo Mantovani, December 19th 2008 - 31 of 31 -
  • 32. Filippo Mantovani, December 19th 2008 - 32 of 31 -
  • 33. About the random numbers The pseudo random numbers are generated using a shift-register of N words of 32 bits and can generate up to ~100 random numbers per clock cycle. NOTE: 1 shift-register generating 32 numbers uses ~2% of the FPGA resources (~2500 Slices). Filippo Mantovani, December 19th 2008 - 33 of 31 -