Heiko J Schick – IBM Deutschland R&D GmbH
January 2011




Experiences in Application Specific
Supercomputer Design
Reasons, Challenges and Lessons Learned




Agenda



 The Road to Exascale


 Reasons for Application Specific Supercomputers


 Example: QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)



 Challenges and Lessons Learned




Where are we now? Blue Gene/Q !!!



 BG/Q Overview
   – 16384 cores per compute rack
   – Water-cooled, 42U compute rack
   – PowerPC-compliant 64-bit microprocessor
   – Double-precision, quad-pipe floating-point acceleration on each core
   – Scalable to 1024 compute racks, each with 1024 compute node ASICs


 BG/Q Compute Node (a quick check of these figures follows below)
   – 16 PowerPC processing cores per node, 4-way multithreaded, 1.6 GHz
   – 16 GB DDR3 SDRAM memory per node with 40 GB/s DRAM access
   – 3 GF/W
       • 200 GF/chip
       • 60 W/node (all-inclusive: DRAM, power conversion, etc.)
   – Integrated Network
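
The headline numbers above are mutually consistent; here is a minimal back-of-the-envelope check, assuming (this is not stated on the slide) that each quad-pipe FPU retires a fused multiply-add per pipe per cycle:

```python
# Back-of-the-envelope check of the BG/Q figures above.
# Assumption (not stated on the slide): each of the 4 FPU pipes retires a
# fused multiply-add per cycle, i.e. 8 flops per core per cycle.
cores_per_node = 16
flops_per_core_per_cycle = 4 * 2          # 4 pipes x (multiply + add)
clock_hz = 1.6e9
watts_per_node = 60

peak_per_node_gf = cores_per_node * flops_per_core_per_cycle * clock_hz / 1e9
print(f"peak per node : {peak_per_node_gf:.0f} GF")                     # ~205 GF ("200 GF/chip")
print(f"efficiency    : {peak_per_node_gf / watts_per_node:.1f} GF/W")  # ~3.4 GF/W ("3 GF/W")
print(f"cores per rack: {cores_per_node * 1024}")                       # 16384, as in the overview
```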




Projected Performance Development




                   Almost a doubling every year !!!


The Big Leap from Petaflops to Exaflops



 We will hit 20 petaflops in 2011/2012; research for ~2018 exascale is now beginning.


 The IT/CMOS industry is trying to double performance every 2 years.
  The HPC industry is trying to double performance every year.


 Technology disruptions in many areas.


    – BAD NEWS: Scalability of current technologies?
       • Silicon Power, Interconnect, Memory, Packaging.

    – GOOD NEWS: Emerging technologies?
       • Memory technologies (e.g. storage class memory)


 Exploiting exascale machines.
   – Want to maximize science output per €.
   – Need multiple partner applications to evaluate HW trade-offs.
Extrapolating an Exaflop in 2018
     Standard technology scaling will not get us there in 2018
Metric | BlueGene/L (2005) | Exaflop, directly scaled | Exaflop compromise (traditional technology) | Assumption for “compromise guess”
Node peak performance | 5.6 GF | 20 TF | 20 TF | Same node count (64k)
Hardware concurrency per node | 2 | 8,000 | 1,600 | Assume 3.5 GHz
System power in compute chip | 1 MW | 3.5 GW | 25 MW | Expected from technology improvement through 4 technology generations (only compute-chip power scaling; I/Os scaled the same way)
Link bandwidth (each unidirectional 3-D link) | 1.4 Gbps | 5 Tbps | 1 Tbps | Not possible to maintain the bandwidth ratio
Wires per unidirectional 3-D link | 2 | 400 wires | 80 wires | A large wire count would eliminate high density and drive links onto cables, where they are 100x more expensive; assume 20 Gbps signaling
Pins in network on node | 24 pins | 5,000 pins | 1,000 pins | 20 Gbps differential assumed; 20 Gbps over copper is limited to 12 inches, so optics are needed for in-rack interconnects (10 Gbps is possible today in both copper and optics)
Power in network | 100 kW | 20 MW | 4 MW | 10 mW/Gbps assumed. Today: 25 mW/Gbps for long distance (greater than 2 feet on copper), both ends, one direction; 45 mW/Gbps for optics, both ends, one direction, plus 15 mW/Gbps electrical. In future: links separately optimized for power.
Memory bandwidth per node | 5.6 GB/s | 20 TB/s | 1 TB/s | Not possible to maintain external bandwidth per flop
L2 cache per node | 4 MB | 16 GB | 500 MB | About 6-7 technology generations with expected eDRAM density improvements
Data pins for memory per node | 128 data pins | 40,000 pins | 2,000 pins | 3.2 Gbps per pin
Power in memory I/O (not DRAM) | 12.8 kW | 80 MW | 4 MW | 10 mW/Gbps assumed; most current power is in the address bus. Future: probably about 15 mW/Gbps, maybe down to 10 mW/Gbps (2.5 mW/Gbps is C·V²·f for random data on data pins); address power is higher.
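
A few of the "compromise" entries can be re-derived from the stated assumptions. The sketch below reflects my reading of the table (64k nodes, six unidirectional torus links per node, 10 mW/Gbps, 3.2 Gbps memory pins), not the original spreadsheet:

```python
# Rough consistency checks for the "Exaflop compromise" column above.
# Assumptions (my reading of the table): 64k nodes, 6 unidirectional torus
# links per node at 1 Tbps each, 10 mW/Gbps link energy, 3.2 Gbps memory pins.
nodes = 64 * 1024

# Network power: 6 links/node x 1000 Gbps/link x 10 mW/Gbps
network_power_w = nodes * 6 * 1000 * 10e-3
print(f"network power    : {network_power_w / 1e6:.1f} MW")     # ~3.9 MW (table: 4 MW)

# Memory bandwidth and memory-I/O power: 2000 data pins x 3.2 Gbps/pin
mem_bw_gbps = 2000 * 3.2
print(f"memory bandwidth : {mem_bw_gbps / 8 / 1000:.1f} TB/s")  # ~0.8 TB/s (table: 1 TB/s)
mem_io_power_w = nodes * mem_bw_gbps * 10e-3
print(f"memory I/O power : {mem_io_power_w / 1e6:.1f} MW")      # ~4.2 MW (table: 4 MW)
```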

Building Blocks of Matter




 QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)


 Quarks are the constituents of matter; they interact strongly by exchanging gluons.


 Particular phenomena
    – Confinement
    – Asymptotic freedom (Nobel Prize 2004)


 Theory of strong interactions = Quantum Chromodynamics (QCD)

Balanced Hardware



 Example caxpy:




    Processor         FPU throughput     Memory bandwidth    Balance
                      [FLOPS / cycle]    [words / cycle]     [FLOPS / word]

    apeNEXT                 8                  2                  4
    QCDOC (MM)              2                0.63                3.2
    QCDOC (LS)              2                  2                  1
    Xeon                    2                0.29                  7
    GPU                  128 x 2            17.3 (*)            14.8
    Cell/B.E. (MM)         8 x 4               1                 32
    Cell/B.E. (LS)         8 x 4             8 x 4                1
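
The last column is simply the ratio of the first two; a minimal sketch with the values transcribed from the table:

```python
# Machine balance for caxpy: FPU throughput / memory bandwidth (flops per word).
# Values transcribed from the table above.
machines = {
    "apeNEXT":        (8,       2.0),
    "QCDOC (MM)":     (2,       0.63),
    "QCDOC (LS)":     (2,       2.0),
    "Xeon":           (2,       0.29),
    "GPU":            (128 * 2, 17.3),
    "Cell/B.E. (MM)": (8 * 4,   1.0),
    "Cell/B.E. (LS)": (8 * 4,   8 * 4),
}
for name, (flops_per_cycle, words_per_cycle) in machines.items():
    print(f"{name:15s} {flops_per_cycle / words_per_cycle:5.1f} flops/word")
```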


Balanced Systems ?!?




… but are they Reliable, Available and Serviceable ?!?




Collaboration and Credits



 QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)


 Academic Partners
     –   University Regensburg       S. Heybrock, D. Hierl, T. Maurer, N. Meyer, A. Nobile, A. Schaefer, S. Solbrig, T. Streuer, T. Wettig
     –   University Wuppertal        Z. Fodor, A. Frommer, M. Huesken
     –   University Ferrara          M. Pivanti, F. Schifano, R. Tripiccione
     –   University Milano           H. Simma
     –   DESY Zeuthen                D.Pleiter, K.-H. Sulanke, F. Winter
     –   Research Lab Juelich        M. Drochner, N. Eicker, T. Lippert

 Industrial Partner
     – IBM   (DE, US, FR)            H. Baier, H. Boettiger, A. Castellane, J.-F. Fauh, U. Fischer, G. Goldrian, C. Gomez, T. Huth, B. Krill,
                                     J. Lauritsen, J. McFadden, I. Ouda, M. Ries, H.J. Schick, J.-S. Vogt

 Main Funding
   – DFG (SFB TR55), IBM
 Support by Others
     – Eurotech (IT) , Knuerr (DE), Xilinx (US)


Production Chain




Major steps
     – Pre-integration at University Regensburg
     – Integration at IBM / Boeblingen
     – Installation at FZ Juelich and University Wuppertal




Concept



 System
     – Node card with IBM® PowerXCell™ 8i processor and network processor (NWP)
          • Important feature: fast double-precision arithmetic
     – Commodity processor interconnected by a custom network
     – Custom system design
     – Liquid cooling system


 Rack parameters (a quick consistency check follows below)
      – 256 node cards
          • 26 TFLOPS peak (double precision)
          • 1 TB memory
      – O(35) kW power consumption
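
The per-node double-precision peak used below is my assumption (8 SPEs x 4 DP flops/cycle x 3.2 GHz for the PowerXCell 8i); the node count and per-node memory come from the slides:

```python
# Consistency check of the rack parameters above.
# Assumption: PowerXCell 8i DP peak = 8 SPEs x 4 flops/cycle x 3.2 GHz ~ 102.4 GF.
node_dp_peak_gf = 8 * 4 * 3.2
nodes_per_rack = 256
memory_per_node_gb = 4                     # from the node card slide

print(f"rack DP peak: {nodes_per_rack * node_dp_peak_gf / 1000:.1f} TFLOPS")  # ~26.2 (slide: 26)
print(f"rack memory : {nodes_per_rack * memory_per_node_gb} GB")              # 1024 GB = 1 TB
```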


 Applications
     – Target sustained performance of 20-30%
     – Optimized for calculations in theoretical particle physics:
       Simulation of Quantum Chromodynamics




Networks



 Torus network
     – Nearest-neighbor communication, 3-dimensional torus topology
     – Aggregate bandwidth 6 GByte/s per node and direction
     – Remote DMA communication (local store to local store)



 Interrupt tree network
     – Evaluation of global conditions and synchronization
     – Global Exceptions
     – 2 signals per direction



 Ethernet network
     –   1 Gigabit Ethernet link per node card to rack-level switches (switched network)
     –   I/O to parallel file system (user input / output)
     –   Linux network boot
      –   Aim of O(10) GB/s bandwidth per rack




 Rack components
      – Node cards (256 per rack)
      – Backplanes (8 per rack)
      – Root cards (16 per rack)
      – Power supplies and power adapter cards (24 per rack)


Node Card



 Components
      –   IBM PowerXCell 8i processor, 3.2 GHz
      –   4 GB DDR2 memory, 800 MHz, with ECC
      –   Network processor (NWP): Xilinx Virtex-5 LX110T FPGA
      –   Ethernet PHY
      –   6 x 1 GB/s external links using the PCI Express physical layer
      –   Service processor (SP): Freescale MCF52211
     –   FLASH (firmware and FPGA configuration)
     –   Power subsystem
     –   Clocking


 Network Processor (a quick bandwidth check follows below)
      –   FlexIO interface to the PowerXCell 8i processor, 2 bytes at 3 GHz bit rate
      –   Gigabit Ethernet
      –   UART for firmware / Linux console
      –   UART for SP communication
     –   SPI Master (boot flash)
     –   SPI Slave for training and configuration
     –   GPIO
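
The listed widths and rates line up with the 6 GB/s figures used elsewhere in the deck; a small check, assuming "2 bytes at 3 GHz bit rate" means a 2-byte-wide interface per direction:

```python
# Bandwidth sanity check for the node card, from the figures listed above.
# Assumption: the FlexIO interface is 2 bytes wide per direction at a 3 GHz bit rate.
flexio_gb_per_s = 2 * 3.0      # 6 GB/s per direction, as in the block diagram
torus_gb_per_s = 6 * 1.0       # six 1 GB/s external links
print(f"FlexIO per direction: {flexio_gb_per_s:.0f} GB/s")
print(f"Torus aggregate     : {torus_gb_per_s:.0f} GB/s per node and direction")
```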


Node Card

Node card photo: memory, PowerXCell 8i processor, network processor (FPGA) and network PHYs.




Node Card

Block diagram: DDR2 memory (800 MHz) attaches to the PowerXCell 8i processor; two FlexIO interfaces of 6 GB/s each connect the processor to the Virtex-5 FPGA; the FPGA drives the Gigabit Ethernet PHY, six 1 GB/s PHYs towards the compute network (4*8*2*6 = 384 I/Os at 250 MHz, out of 680 available on the LX110T), the SPI boot flash and an RS232 console; the service processor (Freescale MCF52211) attaches via SPI, UART and I2C; a power subsystem and clocking complete the card.




Network Processor

Block diagram: FlexIO interface, network logic (routing, arbitration, FIFOs, configuration), six torus link interfaces (x+, x-, y+, y-, z+, z-) with external PHYs, an Ethernet interface, global signals, serial interfaces and the SPI flash.

 FPGA resource utilization
      – Slices          92 %
      – Pins            86 %
      – LUT-FF pairs    73 %
      – Flip-flops      55 %
      – LUTs            53 %
      – BRAM / FIFOs    35 %

 Utilization by block (flip-flops / LUTs)
      – Processor interface   53 % / 46 %
      – Torus                 36 % / 39 %
      – Ethernet               4 % /  2 %



Torus Network Architecture



 2-sided communication
     – Node A initiates send, node B initiates receive
     – Send and receive commands have to match
     – Multiple use of same link by virtual channels


 Send / receive from / to local store or main memory (a toy model follows after this list)
     – CPU → NWP
        • CPU moves data and control info to NWP
        • Back-pressure controlled

     – NWP → NWP
        • Independent of processor
        • Each datagram has to be acknowledged

     – NWP → CPU
        • CPU provides credits to NWP
        • NWP writes data into processor
        • Completion indicated by notification
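
The flow above can be summarised in a toy model. The class and method names below are hypothetical; the real flow runs in the NWP hardware and in firmware, so treat this only as an illustration of the credit and acknowledgement scheme:

```python
# Toy model of QPACE's 2-sided torus communication (hypothetical names; the
# real flow is implemented in the NWP hardware and SPE/PPE firmware).
from collections import deque

class NWP:
    def __init__(self):
        self.tx_fifo = deque()   # CPU -> NWP: data and control info (back-pressure controlled)
        self.rx_fifo = deque()   # NWP -> NWP: datagrams arrived from a neighbour
        self.credits = 0         # NWP -> CPU: receive credits granted by the local CPU

    def cpu_send(self, payload):
        """CPU -> NWP: the CPU moves data into the NWP."""
        self.tx_fifo.append(payload)

    def link_transfer(self, remote):
        """NWP -> NWP: runs independently of the processors; every datagram is acknowledged."""
        while self.tx_fifo:
            remote.rx_fifo.append(self.tx_fifo.popleft())
            acknowledged = True            # model the per-datagram acknowledgement
            assert acknowledged

    def cpu_receive(self, buffer):
        """NWP -> CPU: data is written into the processor only while credits are
        available; completion is indicated by a notification."""
        notifications = 0
        while self.rx_fifo and self.credits > 0:
            buffer.append(self.rx_fifo.popleft())
            self.credits -= 1
            notifications += 1
        return notifications

node_a, node_b = NWP(), NWP()
node_a.cpu_send("x+ datagram")   # node A initiates the send
node_b.credits = 1               # node B initiates the matching receive (grants a credit)
node_a.link_transfer(node_b)
inbox = []
print(node_b.cpu_receive(inbox), inbox)   # 1 notification, ['x+ datagram']
```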



Torus Network Reconfiguration



 Torus network PHYs provide 2 interfaces
      – Used for network reconfiguration by selecting the primary or secondary interface



 Example
     – 1x8 or 2x4 node-cards




 Partition sizes (1, 2, 2N) * (1, 2, 4, 8, 16) * (1, 2, 4, 8), enumerated in the sketch below
      – N ... number of racks connected via cables
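
A tiny sketch enumerating the admissible partition sizes from the formula above, here for the hypothetical case N = 2:

```python
# Enumerate the admissible partition sizes (1,2,2N) x (1,2,4,8,16) x (1,2,4,8),
# here for N = 2 racks connected via cables.
from itertools import product

N = 2
shapes = sorted(set(product((1, 2, 2 * N), (1, 2, 4, 8, 16), (1, 2, 4, 8))))
print(len(shapes), "partition shapes, e.g.", shapes[:4])
print("largest:", max(shapes, key=lambda s: s[0] * s[1] * s[2]))
```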

Cooling



 Concept
     – Node card mounted in housing = heat conductor
     – Housing connected to liquid cooled cold plate
     – Critical thermal interfaces
         • Processor – thermal box
         • Thermal box – cold plate
     – Dry connection between node card and cooling circuit



 Node card housing
     – Closed node card housing acts as heat conductor.
     – Heat conductor is linked with liquid-cooled “cold plate”
     – Cold Plate is placed between two rows of node cards.


 Simulation results for one cold plate (a rough sanity check follows below)
      – Ambient temperature: 12 °C
      – Water flow: 10 L/min
      – Load: 4224 W (2112 W per side)
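
A rough sanity check, not the original thermal simulation: at 10 L/min, removing 4224 W raises the water temperature by only a few kelvin, using the standard heat capacity of water:

```python
# Rough estimate of the water temperature rise across one cold plate
# (a sanity check, not the original thermal simulation).
load_w = 4224.0                    # heat load per cold plate
flow_l_per_min = 10.0
c_water = 4186.0                   # J/(kg*K), with ~1 kg per litre

mass_flow_kg_per_s = flow_l_per_min / 60.0
delta_t = load_w / (mass_flow_kg_per_s * c_water)
print(f"water temperature rise: {delta_t:.1f} K")   # ~6 K
```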

Project Review



 Hardware design
     – Almost all critical problems solved in time
     – Network Processor implementation was a challenge
     – No serious problems due to wrong design decisions



 Hardware status
      – Good manufacturing quality: small bone pile, few defects during operation.



 Time schedule
     – Essentially stayed within planned schedule
     – Implementation of system / application software delayed




Summary



 QPACE is a new, scalable LQCD machine based on the PowerXCell 8i processor.


 Design highlights
      –   FPGA directly attached to processor
      –   LQCD optimized, low latency torus network
      –   Novel, cost-efficient liquid cooling system
      –   High packaging density
      –   Very power efficient architecture



 O(20-30%) sustained performance for key LQCD kernels is reached / feasible

     → O(10-16) TFLOPS / rack (SP)




Power Efficiency




Challenge #1: Data Ordering



 InfiniBand test failed on a cluster with 14 blade servers

      –   Nodes were connected via InfiniBand DDR adapters.
      –   IMB (Pallas) stresses MPI traffic over the InfiniBand network.
      –   The system fails after a couple of minutes, waiting endlessly for an event.
      –   The system runs stably if the global setting for strong ordering is set (the default is relaxed).
      –   The problem has meanwhile been recreated with the same symptoms on InfiniBand SDR hardware.
      –   Changing from relaxed to strong ordering changes performance significantly!



 First indications point to a DMA ordering issue

      – The InfiniBand adapter does consecutive writes to memory, sending out data followed by status.
      – The InfiniBand software stack polls regularly on the status.
      – If the status is updated before the data arrives, we clearly have an issue.




Challenge #1: Data Ordering (continued)



 Ordering of device-initiated write transactions

      –   The device (InfiniBand, GbE, ...) writes data to two different locations in host memory
      –   The first transaction writes the data block (multiple writes)
      –   The second transaction writes the status (data ready)
      –   It must be ensured that the status does not reach host memory before the complete data
             • If not, software may consume the data before it is valid! (a toy model of this hazard follows below)


 Solution 1:
      – Always use strong ordering, i.e. every node on the path from device to host sends data out in the order received

      – Challenge: I/O bandwidth impact, which can be significant.

 Solution 2:
      – Provide a means to enforce ordering of the second write behind the first, but leave all other writes unordered
      – Better performance

      – Challenge: might need device firmware and/or software support
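
A toy illustration of the hazard (plain Python, not driver code): the device writes a data block and then a status word; if the writes can be reordered, the host may see the status before the data and consume a stale value:

```python
# Toy model of the DMA write-ordering hazard (illustration only, not driver code).
import random

def device_writes(relaxed_ordering: bool):
    """Sequence of (location, value) writes as they reach host memory."""
    writes = [("data", 42), ("status", 1)]
    if relaxed_ordering:
        random.shuffle(writes)      # the status write may overtake the data write
    return writes

host_memory = {"data": 0, "status": 0}
for location, value in device_writes(relaxed_ordering=True):
    host_memory[location] = value
    if location == "status":
        break                       # the polling host wakes up as soon as the status lands

if host_memory["status"] == 1:
    print("host consumed:", host_memory["data"])   # may print 0 (stale) under relaxed ordering
```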




Challenge #2: Data is Everything



 BAD NEWS: There are many ways in which an application can be accelerated.


     – An inline accelerator is an accelerator that runs sequentially with the main compute
       engine.

     – A core accelerator is a mechanism that accelerates the performance of a single core.
       A core may run multiple hardware threads in an SMT implementation.

     – A chip accelerator is an off-chip mechanism that boosts the performance of the
       primary compute chip. Graphics accelerators are typically of this type.

     – A system accelerator is a network-attached appliance that boosts the performance of a
       primary multinode system. Azul is an example of a system accelerator.




Challenge #2: Data is Everything (continued)



 GOOD NEWS: Application acceleration is possible!


     – It is all about data:

         • Who owns it?

         • Where is it now?

         • Where is it needed next?

         • How much does it cost to send it from now to next?

      – Scientists, computer architects, application developers and system administrators need
        to work together closely.




Challenge #3: Traffic Pattern



 IDEA: Open QPACE to a broader range of HPC applications (e.g. High Performance LINPACK).


 High-speed point-to-point interconnect with a 3D torus topology
 Direct SPE-to-SPE communication between neighboring nodes for good nearest neighbor
  performance




Challenge #3: Traffic Pattern (continued)



 BAD NEWS: High Performance LINPACK Requirements


     – Matrix stored in main memory
        • Experiments show: Performance gain with increasing memory size

     – MPI communications:
        • Between processes in same row/column of process grid
        • Message sizes: 1 kB … 30 MB

     – Efficient Level 3 BLAS routines (DGEMM, DTRSM, …)

      – Space trade-offs and complexity lead to a PPE-centric programming model!




Challenge #3: Traffic Pattern (continued)



 GOOD NEWS: We have an FPGA. ;-)


      – A DMA engine was added to the NWP design on the FPGA and can fetch data from main
        memory.

      – The PPE is responsible for MM-to-MM message transfers.

      – The SPEs are only used for computation offload.




Challenge #4: Algorithmic Performance
 BAD NEWS: Algorithmic performance doesn’t necessarily reflect machine performance.

      – Numerical solution of a sparse linear system via an iterative method.
      – If the residual is adequately small, the algorithm has converged and the calculation is finished.
      – The difference between the algorithms is the number of auxiliary vectors used (number in brackets); a minimal solver sketch follows below.




                                                   Source: Andrea Nobile (University of Regensburg)
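
For illustration, a minimal conjugate-gradient solver with a residual-based stopping criterion in the spirit of the description above. The solvers compared in the plot differ in how many auxiliary vectors they keep; this textbook sketch is not the QPACE LQCD solver:

```python
# Minimal conjugate gradient with a residual-based stopping criterion
# (illustrative textbook CG, not the QPACE LQCD solver).
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x                      # residual
    p = r.copy()                       # auxiliary vector: the search direction
    rr = r @ r
    for iteration in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:      # residual adequately small -> converged
            return x, iteration
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter

# Small symmetric positive-definite test problem.
rng = np.random.default_rng(0)
M = rng.standard_normal((100, 100))
A = M @ M.T + 100.0 * np.eye(100)
b = rng.standard_normal(100)
x, iterations = cg(A, b)
print(iterations, "iterations, residual", np.linalg.norm(A @ x - b))
```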

Thank you very much for your attention.
Disclaimer



 IBM®, DB2®, MVS/ESA, AIX®, S/390®, AS/400®, OS/390®, OS/400®, iSeries, pSeries, xSeries,
  zSeries, z/OS, AFP, Intelligent Miner, WebSphere®, Netfinity®, Tivoli®, Informix and
  Informix® Dynamic Server™, IBM, BladeCenter and POWER and others are trademarks of
  the IBM Corporation in the US and/or other countries.


 Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United
  States, other countries, or both and is used under license therefrom. Linux is a trademark of
  Linus Torvalds in the United States, other countries or both.


 Other company, product, or service names may be trademarks or service marks of others.
  The information and materials are provided on an "as is" basis and are subject to change.





Data Cloud, More than a CDP by Matt Robison
 

Experiences in Application Specific Supercomputer Design - Reasons, Challenges and Lessons Learned

  • 6. Extrapolating an Exaflop in 2018 Standard technology scaling will not get us there in 2018 BlueGene/L Exaflop Exaflop compromise Assumption for “compromise guess” (2005) Directly using traditional scaled technology Node Peak Perf 5.6GF 20TF 20TF Same node count (64k) hardware 2 8000 1600 Assume 3.5GHz concurrency/node System Power in 1 MW 3.5 GW 25 MW Expected based on technology improvement through 4 technology generations. (Only Compute Chip compute chip power scaling, I/Os also scaled same way) Link Bandwidth 1.4Gbps 5 Tbps 1 Tbps Not possible to maintain bandwidth ratio. (Each unidirectional 3-D link) Wires per 2 400 wires 80 wires Large wire count will eliminate high density and drive links onto cables where they are unidirectional 3-D 100x more expensive. Assume 20 Gbps signaling link Pins in network on 24 pins 5,000 pins 1,000 pins 20 Gbps differential assumed. 20 Gbps over copper will be limited to 12 inches. Will need node optics for in rack interconnects. 10Gbps now possible in both copper and optics. Power in network 100 KW 20 MW 4 MW 10 mW/Gbps assumed. Now: 25 mW/Gbps for long distance (greater than 2 feet on copper) for both ends one direction. 45mW/Gbps optics both ends one direction. + 15mW/Gbps of electrical Electrical power in future: separately optimized links for power. Memory 5.6GB/s 20TB/s 1 TB/s Not possible to maintain external bandwidth/Flop Bandwidth/node L2 cache/node 4 MB 16 GB 500 MB About 6-7 technology generations with expected eDRAM density improvements Data pins associated 128 data 40,000 pins 2000 pins 3.2 Gbps per pin with memory/node pins Power in memory I/O 12.8 KW 80 MW 4 MW 10 mW/Gbps assumed. Most current power in address bus. (not DRAM) Future probably about 15mW/Gbps maybe get to 10mW/Gbps (2.5mW/Gbps is c*v^2*f for random data on data pins) Address power is higher. 6 © 2009 IBM Corporation
• 7. Building Blocks of Matter
 QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
 Quarks are the constituents of matter, which interact strongly by exchanging gluons.
 Particular phenomena
– Confinement
– Asymptotic freedom (Nobel Prize 2004)
 Theory of the strong interaction = Quantum Chromodynamics (QCD)
7 © 2009 IBM Corporation
• 8. Balanced Hardware
 Example caxpy (see the sketch below):

   Processor         FPU throughput    Memory bandwidth   Balance
                     [FLOPS / cycle]   [words / cycle]    [FLOPS / word]
   apeNEXT           8                 2                  4
   QCDOC (MM)        2                 0.63               3.2
   QCDOC (LS)        2                 2                  1
   Xeon              2                 0.29               7
   GPU               128 x 2           17.3 (*)           14.8
   Cell/B.E. (MM)    8 x 4             1                  32
   Cell/B.E. (LS)    8 x 4             8 x 4              1

9 © 2009 IBM Corporation
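A minimal C sketch of the caxpy kernel used in this comparison (the loop, types and names are illustrative, not the QPACE implementation). Per element it performs 8 FLOPS while moving three complex numbers (6 real words), so its arithmetic intensity is only about 1.3 FLOPS per word; on a processor whose balance is 32 FLOPS/word, such as the Cell/B.E. running from main memory, the kernel is therefore memory-bound, while from local store (1 FLOPS/word) it can be compute-bound.

#include <complex.h>
#include <stddef.h>

/* y <- a*x + y on complex vectors ("caxpy").
 * Per element: 1 complex multiply (6 FLOPS) + 1 complex add (2 FLOPS) = 8 FLOPS,
 * while 3 complex numbers (load x[i], load y[i], store y[i]) = 6 real words
 * cross the memory interface, i.e. roughly 1.3 FLOPS per word. */
static void caxpy(size_t n, float complex a,
                  const float complex *x, float complex *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}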
  • 9. Balanced Systems ?!? 10 © 2009 IBM Corporation
  • 10. … but are they Reliable, Available and Serviceable ?!? 11 © 2009 IBM Corporation
  • 11. Collaboration and Credits  QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)  Academic Partners – University Regensburg S. Heybrock, D. Hierl, T. Maurer, N. Meyer, A. Nobile, A. Schaefer, S. Solbrig, T. Streuer, T. Wettig – University Wuppertal Z. Fodor, A. Frommer, M. Huesken – University Ferrara M. Pivanti, F. Schifano, R. Tripiccione – University Milano H. Simma – DESY Zeuthen D.Pleiter, K.-H. Sulanke, F. Winter – Research Lab Juelich M. Drochner, N. Eicker, T. Lippert  Industrial Partner – IBM (DE, US, FR) H. Baier, H. Boettiger, A. Castellane, J.-F. Fauh, U. Fischer, G. Goldrian, C. Gomez, T. Huth, B. Krill, J. Lauritsen, J. McFadden, I. Ouda, M. Ries, H.J. Schick, J.-S. Vogt  Main Funding – DFG (SFB TR55), IBM  Support by Others – Eurotech (IT) , Knuerr (DE), Xilinx (US) 13 © 2009 IBM Corporation
  • 12. Production Chain Major steps – Pre-integration at University Regensburg – Integration at IBM / Boeblingen – Installation at FZ Juelich and University Wuppertal 14 © 2009 IBM Corporation
• 13. Concept
 System
– Node card with IBM® PowerXCell™ 8i processor and network processor (NWP)
• Important feature: fast double precision arithmetic
– Commodity processor interconnected by a custom network
– Custom system design
– Liquid cooling system
 Rack parameters (see the cross-check below)
– 256 node cards
• 26 TFLOPS peak (double precision)
• 1 TB memory
– O(35) kWatt power consumption
 Applications
– Target sustained performance of 20-30%
– Optimized for calculations in theoretical particle physics: simulation of Quantum Chromodynamics
15 © 2009 IBM Corporation
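A quick plausibility check of these rack parameters, assuming the published PowerXCell 8i figures (8 SPEs per chip, 4 double-precision FLOPS per SPE per cycle at 3.2 GHz) and 4 GB of memory per node card:

$$ 256 \times 8 \times 4 \times 3.2\,\mathrm{GHz} \approx 26.2\,\mathrm{TFLOPS}\ \text{(peak, double precision)}, \qquad 256 \times 4\,\mathrm{GB} = 1\,\mathrm{TB}. $$

At the targeted 20-30 % sustained efficiency this corresponds to roughly 5-8 TFLOPS of application performance per rack.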
  • 14. Networks  Torus network – Nearest-neighbor communication, 3-dimensional torus topology – Aggregate bandwidth 6 GByte/s per node and direction – Remote DMA communication (local store to local store)  Interrupt tree network – Evaluation of global conditions and synchronization – Global Exceptions – 2 signals per direction  Ethernet network – 1 Gigabit Ethernet link per node card to rack-level switches (switched network) – I/O to parallel file system (user input / output) – Linux network boot – Aim of O(10) GB bandwidth per rack 16 © 2009 IBM Corporation
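To make the nearest-neighbor addressing concrete, here is a small, self-contained C sketch (purely illustrative, not QPACE firmware) that computes a node's six nearest neighbours on a 3-dimensional torus with periodic wrap-around, i.e. the communication pattern the torus network is built for:

#include <stdio.h>

/* Coordinates of a node on an Lx x Ly x Lz torus. */
struct coord { int x, y, z; };

/* Return the neighbour of c at distance +1/-1 in dimension dim (0..2),
 * wrapping around at the boundary (periodic / torus topology). */
static struct coord torus_neighbor(struct coord c, int dim, int dir,
                                   int Lx, int Ly, int Lz)
{
    int L[3] = { Lx, Ly, Lz };
    int *p[3] = { &c.x, &c.y, &c.z };
    *p[dim] = (*p[dim] + dir + L[dim]) % L[dim];
    return c;
}

int main(void)
{
    struct coord c = { 0, 3, 7 };          /* some node in an 8x8x8 partition */
    for (int dim = 0; dim < 3; dim++)
        for (int dir = -1; dir <= 1; dir += 2) {
            struct coord n = torus_neighbor(c, dim, dir, 8, 8, 8);
            printf("neighbour: (%d,%d,%d)\n", n.x, n.y, n.z);
        }
    return 0;
}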
• 15. Rack packaging (figure): Root Card (16 per rack), Backplane (8 per rack), Node Card (256 per rack), Power Supply and Power Adapter Card (24 per rack). 17 © 2009 IBM Corporation
• 16. Node Card
 Components
– IBM PowerXCell 8i processor, 3.2 GHz
– 4 Gigabyte DDR2 memory, 800 MHz, with ECC
– Network processor (NWP): Xilinx Virtex-5 LX110T FPGA
– Ethernet PHY
– 6 x 1 GB/s external links using the PCI Express physical layer
– Service processor (SP): Freescale MCF52211
– Flash (firmware and FPGA configuration)
– Power subsystem
– Clocking
 Network Processor
– FlexIO interface to the PowerXCell 8i processor, 2 bytes wide at 3 GHz bit rate
– Gigabit Ethernet
– UART: firmware / Linux console
– UART: SP communication
– SPI master (boot flash)
– SPI slave for training and configuration
– GPIO
18 © 2009 IBM Corporation
• 17. Node Card (annotated photo): network processor (FPGA), network PHYs, PowerXCell 8i processor and memory. 19 © 2009 IBM Corporation
• 18. Node Card (block diagram): PowerXCell 8i with four DDR2 interfaces at 800 MHz; two FlexIO links of 6 GB/s each to the Virtex-5 FPGA; the FPGA connects to the GigE PHY, the SPI boot flash and the six 1 GB/s PHYs of the compute network (4*8*2*6 = 384 I/Os at 250 MHz, out of 680 available on the LX110T); service processor (Freescale MCF52211) attached via RS232, SPI and I2C; plus power subsystem (with I2C/SPI debug access) and clocking. 20 © 2009 IBM Corporation
• 19. Network Processor FPGA utilization (figure): overall Slices 92 %, pins 86 %, LUT-FF pairs 73 %, flip-flops 55 %, LUTs 53 %, BRAM/FIFOs 35 %. By block (flip-flops / LUTs): processor interface 53 % / 46 %, torus links 36 % / 39 %, SPI flash and Ethernet only 4 % and 2 %. The design contains the FlexIO processor interface, six torus link PHY interfaces (x+, x-, ..., z-), routing and arbitration FIFOs, Ethernet, SPI flash, serial interfaces, configuration logic and global signals. 21 © 2009 IBM Corporation
• 20. Torus Network Architecture (see the sketch below)
 2-sided communication
– Node A initiates send, node B initiates receive
– Send and receive commands have to match
– Multiple use of same link by virtual channels
 Send / receive from / to local store or main memory
– CPU → NWP
• CPU moves data and control info to NWP
• Back-pressure controlled
– NWP → NWP
• Independent of processor
• Each datagram has to be acknowledged
– NWP → CPU
• CPU provides credits to NWP
• NWP writes data into processor
• Completion indicated by notification
23 © 2009 IBM Corporation
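The credit/notification handshake can be sketched in plain C. Everything below is a hypothetical software stand-in (names, structure and a single in-process "channel") used only to make the matching of send and receive, the credit, and the completion notification concrete; it is not the QPACE NWP interface.

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

/* One software stand-in for an NWP link/channel. */
struct channel {
    void *credit_dst;            /* buffer named by the receiver's credit   */
    size_t credit_len;
    volatile uint32_t *notify;   /* completion flag written "by the NWP"    */
};

/* Receiver side: provide a credit that names the destination buffer. */
static void nwp_post_credit(struct channel *ch, void *dst, size_t len,
                            volatile uint32_t *notify)
{
    ch->credit_dst = dst;
    ch->credit_len = len;
    ch->notify = notify;
}

/* Sender side: move the payload to the NWP; the NWP may only write into the
 * receiver once a matching credit exists, then signals completion. */
static void nwp_put(struct channel *ch, const void *src, size_t len)
{
    if (ch->credit_dst == NULL || len > ch->credit_len)
        return;                          /* no matching credit: back-pressure */
    memcpy(ch->credit_dst, src, len);    /* NWP writes data into the receiver */
    *ch->notify = 1;                     /* completion notification           */
}

int main(void)
{
    struct channel ch = { 0 };
    char rx[16] = { 0 };
    volatile uint32_t done = 0;

    nwp_post_credit(&ch, rx, sizeof rx, &done);   /* node B: receive posted    */
    nwp_put(&ch, "hello torus", 12);              /* node A: matching send     */
    while (!done) ;                               /* node B: poll notification */
    printf("received: %s\n", rx);
    return 0;
}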
• 21. Torus Network Reconfiguration
 Torus network PHYs provide 2 interfaces
– Used for network reconfiguration by selecting the primary or secondary interface
 Example
– 1x8 or 2x4 node cards
 Partition sizes: (1,2,2N) * (1,2,4,8,16) * (1,2,4,8)
– N ... number of racks connected via cables (enumerated in the sketch below)
24 © 2009 IBM Corporation
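The partition rule can be made concrete with a few lines of C that enumerate all torus sizes reachable by reconfiguration. N, the number of cable-connected racks, is a parameter; this is only an illustration of the combinatorics, not a configuration tool.

#include <stdio.h>

int main(void)
{
    const int N = 2;                       /* example: two racks connected via cables */
    const int xs[] = { 1, 2, 2 * N };      /* (1, 2, 2N)       */
    const int ys[] = { 1, 2, 4, 8, 16 };   /* (1, 2, 4, 8, 16) */
    const int zs[] = { 1, 2, 4, 8 };       /* (1, 2, 4, 8)     */

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 5; j++)
            for (int k = 0; k < 4; k++)
                printf("%2d x %2d x %d torus = %4d node cards\n",
                       xs[i], ys[j], zs[k], xs[i] * ys[j] * zs[k]);
    return 0;
}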
• 22. Cooling
 Concept
– Node card mounted in housing = heat conductor
– Housing connected to liquid-cooled cold plate
– Critical thermal interfaces
• Processor – thermal box
• Thermal box – cold plate
– Dry connection between node card and cooling circuit
 Node card housing
– Closed node card housing acts as heat conductor
– Heat conductor is linked with the liquid-cooled “cold plate”
– Cold plate is placed between two rows of node cards
 Simulation results for one cold plate
– Ambient 12 °C
– Water 10 L / min
– Load 4224 Watt (2112 Watt per side)
25 © 2009 IBM Corporation
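A quick plausibility check of the cold-plate figures, assuming ordinary water properties ($c_p \approx 4.19\,\mathrm{kJ/(kg\,K)}$, $10\,\mathrm{L/min} \approx 0.167\,\mathrm{kg/s}$): the coolant temperature rise across one fully loaded cold plate is

$$ \Delta T = \frac{P}{\dot m \, c_p} = \frac{4224\,\mathrm{W}}{0.167\,\mathrm{kg/s} \times 4190\,\mathrm{J/(kg\,K)}} \approx 6\,\mathrm{K}, $$

so with an inlet near the 12 °C ambient the water leaves the plate at roughly 18 °C.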
  • 23. Project Review  Hardware design – Almost all critical problems solved in time – Network Processor implementation was a challenge – No serious problems due to wrong design decisions  Hardware status – Manufacturing quality good: Small bone pile, few defects during operation.  Time schedule – Essentially stayed within planned schedule – Implementation of system / application software delayed 26 © 2009 IBM Corporation
  • 24. Summary  QPACE is a new, scalable LQCD machine based on the PowerXCell 8i processor.  Design highlights – FPGA directly attached to processor – LQCD optimized, low latency torus network – Novel, cost-efficient liquid cooling system – High packaging density – Very power efficient architecture  O(20-30%) sustained performance for key LQCD kernels is reached / feasible → O(10-16) TFLOPS / rack (SP) 27 © 2009 IBM Corporation
  • 25. Power Efficiency 28 © 2009 IBM Corporation
  • 26. 29 © 2009 IBM Corporation
  • 27. 30 © 2009 IBM Corporation
  • 28. 31 © 2009 IBM Corporation
  • 29. 33 © 2009 IBM Corporation
• 30. Challenge #1: Data Ordering
 InfiniBand test failed on a cluster with 14 blade servers
– Nodes were connected via InfiniBand DDR adapters.
– IMB (Pallas) stresses MPI traffic over the InfiniBand network.
– The system fails after a couple of minutes, waiting endlessly for an event.
– The system runs stable if the global setting for strong ordering is set (the default is relaxed).
– The problem was later recreated with the same symptoms on InfiniBand SDR hardware.
– Changing from relaxed to strong ordering changes performance significantly !!!
 First indication points to a DMA ordering issue
– The InfiniBand adapter does consecutive writes to memory, sending out the data, followed by the status.
– The InfiniBand software stack polls regularly on the status.
– If the status is updated before the data arrives, we clearly have an issue.
34 © 2009 IBM Corporation
• 31. Challenge #1: Data Ordering (continued)
 Ordering of device-initiated write transactions (see the sketch below)
– A device (InfiniBand, GbE, ...) writes data to two different memory locations in host memory
– The first transaction writes the data block (multiple writes)
– The second transaction writes the status (data ready)
– It must be ensured that the status does not reach host memory before the complete data
• If not, software may consume the data before it is valid !!!
 Solution 1:
– Always use strong ordering, i.e. every node in the path from device to host sends data out in the order received
– Challenge: I/O bandwidth impact, which can be significant.
 Solution 2:
– Provide means to enforce ordering of the second write behind the first, but leave all other writes unordered
– Better performance
– Challenge: Might need device firmware and/or software support
35 © 2009 IBM Corporation
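A host-side C sketch of the pattern that breaks under relaxed ordering, and of solution 2 in software form: leave the bulk data writes unordered, but order the single status ("publish") write behind them. The fence intrinsics shown are the generic GCC/Clang builtins and stand in for whatever mechanism the device, firmware or bus actually provides; this illustrates the hazard, it is not the InfiniBand stack's code.

#include <stdint.h>
#include <string.h>

#define MSG_SIZE 4096

struct rx_slot {
    uint8_t data[MSG_SIZE];          /* written first, as multiple writes     */
    volatile uint32_t status;        /* written last: "data ready"            */
};

/* Consumer (host software): polls status, then consumes the payload.
 * This is only safe if the data writes are globally visible before the
 * status write, which is exactly the guarantee relaxed ordering drops. */
static void consume(struct rx_slot *slot, uint8_t *out)
{
    while (slot->status == 0)
        ;                                    /* spin on "data ready"          */
    __atomic_thread_fence(__ATOMIC_ACQUIRE); /* don't read data too early     */
    memcpy(out, slot->data, MSG_SIZE);
}

/* Producer as seen from the host (in reality the device / DMA engine):
 * bulk writes may stay unordered, one fence before publishing the status. */
static void produce(struct rx_slot *slot, const uint8_t *in)
{
    memcpy(slot->data, in, MSG_SIZE);         /* bulk data, order irrelevant  */
    __atomic_thread_fence(__ATOMIC_RELEASE);  /* order data before status     */
    slot->status = 1;                         /* single ordered publish write */
}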
• 32. Challenge #2: Data is Everything
 BAD NEWS: There are many ways in which an application can be accelerated.
– An inline accelerator is an accelerator that runs sequentially with the main compute engine.
– A core accelerator is a mechanism that accelerates the performance of a single core. A core may run multiple hardware threads in an SMT implementation.
– A chip accelerator is an off-chip mechanism that boosts the performance of the primary compute chip. Graphics accelerators are typically of this type.
– A system accelerator is a network-attached appliance that boosts the performance of a primary multinode system. Azul is an example of a system accelerator.
36 © 2009 IBM Corporation
• 33. Challenge #2: Data is Everything (continued)
 GOOD NEWS: Application acceleration is possible!
– It is all about data:
• Who owns it?
• Where is it now?
• Where is it needed next?
• How much does it cost to send it from now to next?
– Scientists, computer architects, application developers and system administrators need to work together closely.
37 © 2009 IBM Corporation
• 34. Challenge #3: Traffic Pattern
 IDEA: Open QPACE up to a broader range of HPC applications (e.g. High Performance LINPACK).
 High-speed point-to-point interconnect with a 3D torus topology
 Direct SPE-to-SPE communication between neighboring nodes for good nearest-neighbor performance
38 © 2009 IBM Corporation
• 35. Challenge #3: Traffic Pattern (continued)
 BAD NEWS: High Performance LINPACK requirements
– Matrix stored in main memory
• Experiments show: performance gains with increasing memory size
– MPI communication:
• Between processes in the same row/column of the process grid
• Message sizes: 1 kB … 30 MB
– Efficient Level 3 BLAS routines (DGEMM, DTRSM, …), see the sketch below
– Space trade-offs and complexity lead to a PPE-centric programming model!
39 © 2009 IBM Corporation
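The reason Level 3 BLAS is the workhorse here is its arithmetic intensity: a blocked matrix multiply performs O(b^3) FLOPS on O(b^2) data, so the FLOPS-per-word ratio grows with the block size b. A deliberately simple C sketch of the blocking idea (not an optimized DGEMM, and not the HPL code):

#include <stddef.h>

/* C += A*B for n x n matrices in row-major order, processed in b x b tiles.
 * Each tile triple does 2*b^3 FLOPS while touching roughly 3*b^2 words,
 * giving an arithmetic intensity of about 2b/3 FLOPS per word. That is why
 * Level 3 BLAS can run close to peak even on machines with a high
 * FLOPS-per-word balance, unlike the caxpy kernel earlier in the deck. */
static void dgemm_blocked(size_t n, size_t b,
                          const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += b)
        for (size_t kk = 0; kk < n; kk += b)
            for (size_t jj = 0; jj < n; jj += b)
                for (size_t i = ii; i < ii + b && i < n; i++)
                    for (size_t k = kk; k < kk + b && k < n; k++) {
                        double aik = A[i * n + k];
                        for (size_t j = jj; j < jj + b && j < n; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}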
  • 36. Challenge #3: Traffic Pattern (continued)  GOOD NEWS: We have an FPGA. ;-) – DMA Engine was added to the NWP design on the FPGA and can fetch data from main memory. – PPE is responsible for MM-to-MM message transfers. – SPE is only used for computation offload. 40 © 2009 IBM Corporation
• 37. Challenge #4: Algorithmic Performance
 BAD NEWS: Algorithmic performance doesn’t necessarily reflect machine performance.
– Numerical solution of a sparse matrix problem via an iterative method (see the sketch below).
– If the residual is sufficiently small, the algorithm has converged and the calculation is finished.
– The difference between the algorithms is the number of auxiliary vectors used (number in brackets).
Source: Andrea Nobile (University of Regensburg) 41 © 2009 IBM Corporation
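To make the distinction concrete: "algorithmic performance" is the number of iterations a solver needs to push the residual below a tolerance, independent of how fast each iteration runs on the machine; variants that keep more auxiliary vectors trade memory for fewer iterations. A generic, self-contained C sketch of such a residual-driven loop, using a simple Jacobi iteration on a small test matrix as a stand-in for the actual sparse LQCD operator:

#include <math.h>
#include <stdio.h>

#define N   64
#define TOL 1e-10

/* y = A*x for a simple diagonally dominant test matrix
 * (placeholder for the sparse operator of the real problem). */
static void apply_A(const double x[N], double y[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 4.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < N - 1) y[i] -= x[i + 1];
    }
}

int main(void)
{
    double b[N], x[N] = { 0 }, r[N], Ax[N];
    for (int i = 0; i < N; i++) b[i] = 1.0;

    /* Iterate until the residual ||b - A*x|| is sufficiently small:
     * that iteration count is the "algorithmic performance". */
    for (int it = 0; it < 100000; it++) {
        apply_A(x, Ax);
        double rnorm = 0.0;
        for (int i = 0; i < N; i++) {
            r[i] = b[i] - Ax[i];
            rnorm += r[i] * r[i];
        }
        if (sqrt(rnorm) < TOL) {
            printf("converged after %d iterations\n", it);
            break;
        }
        for (int i = 0; i < N; i++)
            x[i] += r[i] / 4.0;          /* Jacobi update: x += D^-1 * r */
    }
    return 0;
}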
  • 38. Thank you very much for your attention. 42 © 2009 IBM Corporation
• 39. Disclaimer
 IBM®, DB2®, MVS/ESA, AIX®, S/390®, AS/400®, OS/390®, OS/400®, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere®, Netfinity®, Tivoli®, Informix and Informix® Dynamic ServerTM, IBM, BladeCenter and POWER and others are trademarks of the IBM Corporation in the US and/or other countries.
 Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linux is a trademark of Linus Torvalds in the United States, other countries or both.
 Other company, product, or service names may be trademarks or service marks of others. The information and materials are provided on an "as is" basis and are subject to change.
43 © 2009 IBM Corporation

Editor's notes

  1. If all links were optics, the cost of the optics for an Exaflop machine would be about $1B. Power is the biggest problem. Integration will effectively include main memory. Latency is one of the biggest challenges, because software will be too slow to control messaging, as cores will be only slightly faster than today.