IBM Power Systems




    SMT Verification of the POWER5 and POWER6
    High-Performance Processors

                         John Ludden
                         Senior Technical Staff Member
                         Hardware Verification
                         IBM Systems & Technology Group




© 2008 IBM Corporation
IBM System p
         SMT Verification of the POWER5 and POWER6 High-Performance Processors


    Introduction to Simultaneous Multi-Threading
    (SMT)
     1. What is a multi-threaded processor?
        •     Essentially a processor core that executes multiple
              instruction streams simultaneously
        •     Each thread appears to software as a “virtual” processor core
     2. What are the advantages of SMT?
        •     More efficient utilization of silicon real estate and power: small
              die size increase compared to adding another core
        •     Increased system throughput by utilizing processor resources
              that would otherwise be idle
     3. What are the disadvantages of SMT?
        •     Increased complexity -> Makes verification state space MUCH
              larger
                •       SMT verification much harder than SMP
        •     Possibly degrades performance of some applications
2                                                                                             IBM Systems
              © 2006 IBM Corporation
        IBM Systems & Technology                             DRAFT: IBM Confidential   © 2008 IBM Corporation




    Examples of SMT microprocessors
     1. Video Game Systems
        •     Sony PlayStation 3: IBM Cell processor
        •     Xbox 360: IBM Xenon processor
     2. Personal Computers:
        •     Intel Pentium 4 Hyper-Threading (HT) processors
     3. Servers:
        •     Sun UltraSPARC systems: T1 (4 threads per core) and T2 (8 threads per core)
        •     HP Superdome Systems: Intel Itanium 2
        •     IBM Power Systems: POWER5 and POWER6 processors






Overview

    1. Context : POWER5 vs. POWER6 Microarchitecture Comparison


    2. Verification methodology: In the beginning…


    3. The times they are a changing: SMT arrives in POWER5


    4. POWER6: An in-order design should be simpler, but…


    5. Future directions?






IBM POWER systems

    Consistent predictable delivery

      – 2001: POWER4
      – 2003: POWER4+
      – 2004: POWER5
      – 2006: POWER5+
      – 2007: POWER6





POWER5 chip vs. POWER6 chip

    [Figure: side-by-side chip block diagrams]

    POWER5 chip
      – Two high-frequency POWER5 SMT2 cores
      – ~2 MB L2
      – 36 MB L3 controller on chip; 36 MB L3 on a separate chip
      – SMP interconnect fabric
      – Memory controller, attached to buffer chips

    POWER6 chip
      – Two ultra-frequency POWER6 SMT2 cores
      – Two 4 MB L2 caches
      – 32 MB L3 controller on chip; 32 MB L3 on separate chip(s)
      – SMP interconnect fabric
      – Two memory controllers, each attached to buffer chips






POWER5 Pipeline

    [Figure: POWER5 pipeline stages and core block diagram]

    Pipeline stages
      – Instruction fetch: IF, IC, BP
      – Group formation and instruction decode: D0, D1, D2, D3, Xfer, GD
      – Out-of-order processing:
            • BR: MP, ISS, RF, EX, WB, Xfer
            • LD/ST: MP, ISS, RF, EA, DC, Fmt, WB, Xfer
            • FX: MP, ISS, RF, EX, WB, Xfer
            • FP: MP, ISS, RF, F6 (six stages), WB, Xfer
      – Completion: CP
      – Feedback paths: branch redirects; interrupts & flushes

    Block diagram
      – Front end: Program Counter, Alternate, Instruction Cache, Instruction Translation
      – Branch prediction: Branch History Tables, Return Stack, Target Cache
      – Instruction Buffer 0, Instruction Buffer 1, Thread Priority
      – Group Formation, Instruction Decode, Dispatch
      – Shared Register Mappers, Shared Issue Queues, Dynamic Instruction Selection
      – Read Shared Register Files, Write Shared Register Files
      – Shared Execution Units: LSU0, FXU0, LSU1, FXU1, FPU0, FPU1, BXU, CRL
      – Group Completion, Store Queue
      – Data Translation, Data Cache, L2 Cache

    Legend: shared by two threads; resource used by thread 0; resource used by thread 1



High-end server: New POWER6 microprocessor
    Topology
      – Two cores on chip, a 2-way SMP
      – Core private L1s (64KB I, 64KB D)
      – Superscalar, SMT cores
      – Chip private 8 MB L2 cache
      – L3 32 MB off chip
      – Two-tier SMP fabric

    Technology
      – 65 nm SOI
      – 341 mm² die size
      – 10 layers of metal
      – 790 million transistors on chip
      – Frequency: 3.5, 4.2, 4.7, 5.0 GHz


    Custom & semi-custom design style
      – High frequency constraints
      – 3.3 M lines of VHDL



 POWER6 core pipeline

    [Figure: POWER6 core pipeline diagram, rows labeled P1 to P4]

    Instruction fetch pipeline: IFAR, IC0, IC1, BHT
    Instruction dispatch pipeline: ROT, IB0, IB1, PD, DISP
    BR/FX/Load pipeline:
      – BR/CR: RF
      – FX: RF, EX
      – LOAD: RF, AG, DC0, DC1, FMT
    Floating Point Pipeline: ISS, RF, EX1, EX2, EX3, EX4, EX5, EX6, EX7
    Check Point Recovery Pipeline: ECC

    Legend (stage types): Pre-decode; Ifetch/Branch; Delayed/Transmit; Instruction Decode;
      Instruction Dispatch/Issue; Operand access/execution; Write back; Completion;
      Check Point; Cache access
    Legend (bypass paths): FX result bypass; Load result bypass; Float result bypass




POWER6 core
       POWER6 processor is ~2X frequency of POWER5 (4 – 5 GHz)

       POWER6 instruction pipeline depth equivalent to POWER5
        – Minimize power
        – Scale performance with frequency

     [Figure: pipeline phases Instruction Fetch, Instruction Buffer/Decode, Instruction Dispatch/Issue,
      and Data Fetch/Execute; annotations: ~3 ns/instr, ~6 ns/instr, FXU Dependent execution,
      Load Dependent execution]


       POWER6 extends functionality of POWER5 core
        –    64K I cache, 64K D cache, 2 FXU, 2 Binary FPU, 1 branch execution unit
        –    Two-way SMT with 7-instruction dispatch from 2 threads (maximum of 5 instructions per thread)
        –    Decimal Floating Point Unit
        –    VMX Unit (PowerPC’s SIMD ISA)
        –    Recovery Unit





Bullet-proof computing
     System reliability with recovery unit
     – Every measure possible taken to preserve application execution
     – Retry soft errors
     – Change hardware for hard errors

     Recovery flow
     1. Processor architected state is checkpointed every cycle
     2. ECC and non-ECC protected circuitry is checked every cycle
         – No error found: execution continues
         – Error found: processor restarts from the last saved checkpoint
     3. After the restart
         – No error found: soft error case, execution continues
         – Error found: hard error case, processor workload is moved to another CPU
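
The recovery flow on this slide can be sketched as a small software model. This is only an illustration under stated assumptions, not the POWER6 recovery unit design: the class name `RecoveryModel`, the callbacks `execute` and `error_detected`, and the single retry before declaring a hard error are all hypothetical choices made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RecoveryModel:
    """Toy model of checkpoint, retry, and failover (hypothetical names)."""
    state: Dict[str, int] = field(default_factory=dict)
    checkpoint: Dict[str, int] = field(default_factory=dict)

    def run_cycle(self,
                  execute: Callable[[Dict[str, int]], None],
                  error_detected: Callable[[int], bool]) -> str:
        """Run one unit of work and classify the outcome.

        execute(state) mutates the architected state.
        error_detected(attempt) reports whether the checkers fired on
        attempt 0 (first try) or attempt 1 (retry after restore).
        Returns "ok", "soft", or "hard".
        """
        # Save the known-good architected state before doing any work.
        self.checkpoint = dict(self.state)
        execute(self.state)
        if not error_detected(0):
            return "ok"

        # Error found: restart from the last saved checkpoint and retry.
        self.state = dict(self.checkpoint)
        execute(self.state)
        if not error_detected(1):
            return "soft"   # the retry succeeded, so the error was transient

        # Error persists: restore the checkpoint so the workload can be
        # moved to another CPU from a consistent state.
        self.state = dict(self.checkpoint)
        return "hard"
```

A verification environment could drive a model like this with injected errors and confirm that the architected state after a "soft" outcome matches an error-free run.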


Overview

     1. Context : POWER5 vs. POWER6 microarchitecture comparison


     2. Verification methodology: In the beginning…


     3. The times they are a changing: SMT arrives in POWER5


     4. POWER6: An in-order design should be simpler, but…


     5. Future directions?






POWER4/5/6 RTL verification technology

     [Figure: verification tool flow; blocks shown]

       – RTL (VHDL, Verilog)
       – Driver/checker assertions (PSL et al.)
       – Language compile / model build
       – Cycle-based model
       – Physical VLSI design tools / custom design
       – Formal verification: Boolean equivalence check (Verity)
       – (Semi) formal verification (SixthSense, RuleBase)
       – Software simulator (MESA)
       – Hardware accelerator (Awan)
       – Test program generator (GPRO, X-Gen)
       – C++ testbench
       – Constraint random unit testbench






Single threaded uniprocessor verification for POWER4

     Unit level: methodology inherited from POWER4
       – Driven by a combination of instruction level test cases (AVPs) created by Genesys-
         Pro (GPRO) pseudo-random test generator and random C++ driven irritation
       – Instruction-By-Instruction (IBI) checking against AVP results
       – Low level microarchitecture checkers written in C++


     Processor core (aka “core”) level
       – Mixture of GPRO pseudo-random and directed random instruction level test cases
       – IBI checking against AVP results
       – Low level microarchitecture checkers written in C++
       – Irritation from random C++ drivers
       – Highly deterministic and architected state easily verifiable against test
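
Instruction-by-instruction (IBI) checking can be illustrated with a minimal sketch. It assumes the test case carries the expected register values after each instruction and that the simulation exposes the observed values in the same order; the function name `ibi_check` and the data layout are hypothetical, not the format used by GPRO or the C++ testbench.

```python
from typing import Dict, List, Optional, Tuple

RegState = Dict[str, int]
Mismatch = Tuple[int, str, int, Optional[int]]


def ibi_check(expected: List[RegState],
              observed: List[RegState]) -> Optional[Mismatch]:
    """Compare architected state after every instruction.

    Returns (instruction index, register, expected, observed) for the
    first mismatch, or None when every checked value agrees.
    """
    for index, (exp, obs) in enumerate(zip(expected, observed)):
        for reg, exp_val in exp.items():
            obs_val = obs.get(reg)
            if obs_val != exp_val:
                return (index, reg, exp_val, obs_val)
    if len(expected) != len(observed):
        # One side ran more instructions than the other.
        shorter = min(len(expected), len(observed))
        return (shorter, "<length>", len(expected), len(observed))
    return None
```

Checking after every instruction, rather than only at the end of the test, points at the first failing instruction, which is what makes a deterministic single-thread test straightforward to debug.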






Symmetric multi-processor (SMP) verification for POWER4

     Chip (dual-core) level
        – Test generation similar to uniprocessor via GPRO for false-sharing
          or non-sharing tests
              • IBI checking against AVP results for two-independent instruction streams
                contained within single test
              • Low level microarchitecture checkers written in C++
              • L1/L2 interactions primary focus

        – True-sharing scenarios, lock testing and storage access (“weak”)
          ordering checked
              • GPRO employed, but…
                   – IBI checking of these accesses is limited or not possible:
                         › Non-unique or non-deterministic results
                         › CML (architecture level coherency monitor) employed to detect
                           the “right answer” as a post-simulation rule check
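
When results are non-unique, a post-simulation rule check replaces exact-value comparison. The sketch below shows one deliberately weak rule of that kind: every load must return either the initial memory value or a value that some store to the same address wrote earlier in the trace. It is not the CML rule set; the trace format and the name `check_loads` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (thread id, "load" or "store", address, value), in observed global order
TraceEntry = Tuple[int, str, int, int]


def check_loads(trace: List[TraceEntry], initial: Dict[int, int]) -> List[int]:
    """Return the trace indices of loads whose value has no possible source."""
    legal_values = {addr: {value} for addr, value in initial.items()}
    violations = []
    for index, (_thread, op, addr, value) in enumerate(trace):
        if op == "store":
            legal_values.setdefault(addr, set()).add(value)
        elif op == "load":
            # Weak rule: any previously written value is acceptable.
            if value not in legal_values.get(addr, set()):
                violations.append(index)
    return violations
```

A rule like this accepts many legal orderings instead of one predicted answer, which is why it suits true-sharing tests where IBI checking is limited.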





Overview

     1. Context : POWER5 vs. POWER6 microarchitecture comparison


     2. Verification methodology: In the beginning…


     3. The times they are a changing: SMT arrives in POWER5


     4. POWER6: An in-order design should be simpler, but…


     5. Future directions?






POWER5 SMT verification methodology

     Evolutionary: builds on the single-thread uniprocessor and SMP
     approaches
       – Traditional SMP scenarios now self-contained in a single core simulation model
             • Downward migration of dual-core methodology to single core model

     New SMT verification scenario categories
       – Shared resource and priority conflicts:
             • SMT resource types:
                  – Equally shared between threads: Queue full conditions easier to hit
                  – Dynamically shared / tagged: Either thread can consume most/all of the
                    resource
                  – Replicated: Not shared…same as single thread
       – Dynamic thread mode switching: SMT->ST; ST->SMT
             • Some applications attain better performance in ST mode
             • Shared resources re-allocated on each mode switch
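
The three resource types, and the effect of a mode switch on them, can be summarized in a short sketch. The function name `free_entries`, the `kind` strings, and the even split of an equally shared resource are assumptions for illustration; capacities passed in are made-up numbers, not POWER5 queue sizes.

```python
from typing import Tuple


def free_entries(kind: str, capacity: int, used: Tuple[int, int],
                 thread: int, smt: bool = True) -> int:
    """Entries still available to `thread` for one resource.

    kind:     "split" (equally shared), "dynamic" (shared / tagged),
              or "replicated" (one private copy per thread)
    capacity: total entries (per copy for a replicated resource)
    used:     entries currently held by thread 0 and thread 1
    smt:      False models ST mode, where the running thread owns the
              whole of a split resource
    """
    if kind == "replicated":
        return capacity - used[thread]
    if kind == "dynamic":
        return capacity - used[0] - used[1]
    if kind == "split":
        limit = capacity // 2 if smt else capacity
        return limit - used[thread]
    raise ValueError(f"unknown resource kind: {kind}")
```

With a 10-entry split queue, a thread holding 5 entries is already full in SMT mode, which is why queue-full conditions are easier to hit; the same thread has 5 entries left after a switch to ST mode.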




Traditional SMP approach applied to SMT verification

     [Figure: test generation flow]

     1. SMP.def (test template) is the input to test generation
     2. Test generation outputs the test case SMT.tst, which contains:
          – Core level registers common to both threads
          – t0 registers and t1 registers
          – Random t0 and random t1 instruction streams
     3. Real memory is common to both threads, with the test generator
        managing some potential overlap
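
A test case with this shape can be sketched as follows. This is not how GPRO works internally; the function `generate_smt_test`, the address ranges, the 128-byte line size, and the dictionary layout are all assumptions chosen to show per-thread state, core-level state, and a memory pool with controlled overlap.

```python
import random
from typing import Dict

LINE_BYTES = 128  # assumed cache-line size for this sketch


def generate_smt_test(seed: int, n_instr: int = 10,
                      private_lines: int = 8, shared_lines: int = 2) -> Dict:
    """Build a two-thread test case with managed memory overlap."""
    rng = random.Random(seed)  # seeded so a failing test can be regenerated
    private_base = {0: 0x10000, 1: 0x20000}
    shared = [0x30000 + i * LINE_BYTES for i in range(shared_lines)]
    test = {
        "core_regs": {"core_reg0": rng.getrandbits(32)},  # common to both threads
        "shared_lines": shared,
        "threads": {},
    }
    for t in (0, 1):
        private = [private_base[t] + i * LINE_BYTES for i in range(private_lines)]
        # Each thread touches its own lines plus the deliberately shared ones.
        pool = private + shared
        stream = [(rng.choice(("load", "store")), rng.choice(pool))
                  for _ in range(n_instr)]
        test["threads"][t] = {
            "regs": {f"r{i}": rng.getrandbits(64) for i in range(4)},
            "private_lines": private,
            "stream": stream,
        }
    return test
```

Setting `shared_lines=0` gives a non-sharing test, while raising it increases the chance that both threads hit the same line.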



Shared resource and priority conflicts

     Approach was similar to SMP verification

       – Testing largely consisted of “symmetric” instruction streams
         on each thread
             • A particular resource targeted (e.g., GPR rename registers)
                     – 100 load instructions on each thread

       – Coverage and lab feedback validated this approach
             • Good enough: “Got the job done”






POWER5 dynamic thread mode switching

     [Figure: flowchart with one column per thread]

     Initial state (both threads)
       – All architected states initialized
       – Thread enabled

     Run state, thread 0
       1. Random instructions
       2. Thread kills itself: save architected state, then thread 0 terminates itself
       3. Shared resources reallocated
       4. Wake up thread on an interrupt from the other thread or the sim driver
       5. Partition resources
       6. Restore architected state

     Run state, thread 1
       1. Random instructions
       2. Restart thread 0
       3. Random instructions

     Final state (both threads)
       – Normal finish
       – Thread enabled
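
The mode-switch sequence in this flowchart can be modeled in a few lines. The sketch below is an assumption-laden illustration: `CoreModel`, its method names, and the 20-entry resource are hypothetical, and a real thread saves and restores its state through software rather than through a Python dictionary.

```python
from typing import Dict


class CoreModel:
    """Toy two-thread core with one equally split resource."""

    def __init__(self, total_entries: int = 20) -> None:
        self.total = total_entries          # illustrative size only
        self.enabled = {0: True, 1: True}
        self.regs: Dict[int, Dict[str, int]] = {0: {}, 1: {}}
        self.saved: Dict[int, Dict[str, int]] = {}

    def partition(self) -> Dict[int, int]:
        """Entries owned by each thread in the current mode."""
        active = [t for t in (0, 1) if self.enabled[t]]
        share = self.total // len(active)
        return {t: (share if self.enabled[t] else 0) for t in (0, 1)}

    def kill(self, thread: int) -> None:
        """SMT->ST: save architected state, then disable the thread."""
        if not self.enabled[1 - thread]:
            raise RuntimeError("at least one thread must stay enabled")
        self.saved[thread] = dict(self.regs[thread])
        self.regs[thread] = {}              # state is lost while dormant
        self.enabled[thread] = False

    def wake(self, thread: int) -> None:
        """ST->SMT: re-enable the thread and restore its saved state."""
        self.enabled[thread] = True
        self.regs[thread] = self.saved.pop(thread)
```

A test built on this pattern checks two things after the wake-up: the restored state equals the saved state, and the resource is partitioned again as it was before the switch.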


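POWER5 shared resource re-allocation on mode switch

     [Figure: four bar charts comparing per-thread resources in SMT mode and ST mode;
      bar values are not recoverable from the extracted text]

       – Rename registers per thread (GPR, FPR): axis 0 to 200; series SMT mode, Max, ST mode
       – Load Miss Queue entries per thread: axis 0 to 10; split in half in SMT mode
       – Branch Queue (BIQ) entries per thread: axis 0 to 20; split in half in SMT mode
       – Max LRQ/SRQ entries per thread: axis 0 to 40; dynamically shared; series SMT mode, Max, ST mode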




Overview

     1. Context : POWER5 vs. POWER6 microarchitecture comparison


     2. Verification methodology: In the beginning…


     3. The times they are a changing: SMT arrives in POWER5


     4. POWER6: An in-order design should be simpler, but…


     5. Future directions?






POWER5: centralized complexity

     [Figure: block diagram with the ISU at the center, surrounded by IFU, FXU, LSU, and FPU]

     POWER5

       – Out-of-order design: Even in single thread mode,
         complex events naturally occur simultaneously

       – Started from POWER4+: Known working
         design that was modified incrementally

       – 23 FO4 design: Isolated complexity in
         Instruction Sequencing Unit (ISU):
             • Every unit communicated back to ISU
             • ISU resolved all exceptions and
               out-of-order conflicts

       – ST and SMT modes both supported:
             • Alternating dispatch cycles per thread
             • Resources re-allocated on mode switch




POWER6: distributed complexity

     POWER6
        – From-scratch mostly in-order design
              • Normally, design is well behaved
              • Cross-thread interaction necessary for “tough bugs”

        – 13 FO4 design: Distributed complexity needed to
          achieve high performance goals

        – Recovery unit (RU):
              • Must resolve out-of-order FP with in-order
                pipelines
              • Checkpoints machine state
              • Recovers processor from soft errors

        – Design is inherently in SMT mode all the time (almost)
              • Dispatch to both threads in same cycle
              • Most resources dynamically shared / tagged
              • No resource reallocation on mode switch

     [Diagram: core unit blocks IFU, IDU, FXU, FPU, LSU, and RU]
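
The slide says only that the recovery unit checkpoints machine state and recovers the processor from soft errors. As a rough illustration of that checkpoint-and-roll-back idea, here is a minimal Python sketch. `ToyCore`, its four-register state, and the `error_detected` flag are hypothetical names invented for this example; nothing here models the actual POWER6 RU.

```python
import copy


class ToyCore:
    """Hypothetical stand-in for architected state; not the POWER6 RU."""

    def __init__(self):
        self.regs = [0] * 4
        # Last known-good copy of the state.
        self._checkpoint = copy.deepcopy(self.regs)

    def execute(self, reg, value, error_detected=False):
        """Apply one register update; commit it only if no error was flagged."""
        self.regs[reg] = value
        if error_detected:
            # Soft error: discard the update and restore the checkpoint.
            self.regs = copy.deepcopy(self._checkpoint)
            return False
        # Clean completion: the new state becomes the checkpoint.
        self._checkpoint = copy.deepcopy(self.regs)
        return True


core = ToyCore()
core.execute(0, 5)                        # committed
core.execute(1, 9, error_detected=True)   # rolled back
print(core.regs)
```

The point for verification is that every update has two outcomes (commit or roll back), so each checked instruction doubles the paths a test has to exercise.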





POWER6 verification process
Each verification engine has strengths suited to different verification
tasks

      Software simulation
         –   Slow, but low penalty for highly intrusive checking of model internals. Total model visibility.
         –   Hundreds of AIX workstations running 24x7x365
         –   New enhancements helped keep pace with design complexity
         –   2x the number of simulation cycles run on the POWER5 design

      Hardware-accelerated simulation
         –   10x to 1,000x faster than software simulation, but needs less intrusive driving/checking to avoid slowing down the hardware box
         –   New usage: mainline functional verification
         –   Yields an additional 3x simulation cycle advantage over POWER5 (5x cycle advantage overall)

      (Semi-)formal verification
         –   High to exhaustive coverage, but more skill needed to drive. Scaling problems with model size.
         –   Extensively used: proved extremely valuable for complex SMT bugs

      Hardware bring-up
         –   Ideal speed, very limited visibility/controllability





Software simulation enhancements

     Random command-driven unit simulation for most core units
       – Yielded >1 million lines of C++ code
       – More control over generation for low-level events
       – More efficient test generation


     Irritator threads at “core model” level
       – “Symmetric” instruction stream approach employed on POWER5 proved inadequate
             • “S” in SMT is for “Simultaneous”, not “Symmetric”
       – Target cross-thread interactions at the microarchitecture level
       – ~2x test generation efficiency
       – Ensures both threads run for the same length (self-adjusting)







Irritator thread example



          Test generation flow: SMT_Irritator.def (test template) -> Test Generation -> SMT_Irritator.tst (output test case)

          Contents of SMT_Irritator.tst
             • Core-level registers common to both threads
             • t0 registers: long random thread (t0)
             • t1 registers: short irritator thread (t1)
             • Real memory, with the test generator managing some potential overlap

          Irritator thread restrictions
             • Cannot cause unexpected exceptions
             • Cannot modify memory read by random thread
             • Cannot modify registers shared with other threads
             • Architected results may be undefined
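
Two of the restrictions above (no writes to memory the random thread reads, no writes to shared registers) can be pictured as set-intersection checks over each thread's footprint. The sketch below is only an illustration under that assumption. `ThreadFootprint`, `irritator_violations`, and the register names are hypothetical and are not taken from the IBM test generator.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadFootprint:
    """Hypothetical summary of what one generated thread touches."""
    mem_reads: set = field(default_factory=set)
    mem_writes: set = field(default_factory=set)
    reg_writes: set = field(default_factory=set)


def irritator_violations(irritator, random_thread, shared_regs):
    """Return a list of restriction violations for an irritator thread."""
    problems = []
    # Restriction: cannot modify memory read by the random thread.
    clobbered = irritator.mem_writes & random_thread.mem_reads
    if clobbered:
        problems.append(f"writes memory read by random thread: {sorted(clobbered)}")
    # Restriction: cannot modify registers shared with other threads.
    shared = irritator.reg_writes & shared_regs
    if shared:
        problems.append(f"writes shared registers: {sorted(shared)}")
    return problems


random_t0 = ThreadFootprint(mem_reads={0x200})
irritator_t1 = ThreadFootprint(mem_writes={0x200}, reg_writes={"core.shared0"})
for problem in irritator_violations(irritator_t1, random_t0, {"core.shared0"}):
    print(problem)
```

A generator that enforces these checks can leave the irritator's own architected results unchecked, which matches the last restriction on the slide.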




Irritator thread example


           Long Random Thread

             SEQUENCE
                  REPEAT 100
                      SELECT
                           Group_All
                  stw nop, A          <- Kill Irritator Thread

             Generated Instr: 101
             Simulated Instr: 101

           Irritator Thread

             SEQUENCE
                  LB0: fdiv
                  A:   b to LB0

             Generated Instr: 2
             Simulated Instr: Infinite
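
One reading of this example is that the irritator loops on two generated instructions until the long random thread stores a nop over the branch at label A, which lets the irritator fall out of its loop. The toy Python model below sketches that reading. It assumes one instruction per thread per step, which is a simplification rather than the POWER6 dispatch behavior, and the tuple opcodes and the function name are hypothetical.

```python
import random

NOP = ("nop",)


def run_irritator_demo(main_len=100, seed=0):
    """Toy lock-step model of a long random thread and a short irritator."""
    rng = random.Random(seed)
    # Irritator code:  LB0: fdiv ;  A: b to LB0
    irritator = [("fdiv",), ("b", 0)]
    label_a = 1  # index of the branch that the main thread overwrites
    # Main thread: main_len random picks, then the store that ends the loop.
    main = [("rand", rng.choice(["add", "ld", "st", "cmp"])) for _ in range(main_len)]
    main.append(("stw_nop", label_a))

    main_pc = irr_pc = 0
    main_done = irr_done = 0
    while main_pc < len(main) or irr_pc < len(irritator):
        if main_pc < len(main):
            op = main[main_pc]
            if op[0] == "stw_nop":
                irritator[op[1]] = NOP  # kill the irritator's backward branch
            main_pc += 1
            main_done += 1
        if irr_pc < len(irritator):
            op = irritator[irr_pc]
            irr_pc = op[1] if op[0] == "b" else irr_pc + 1
            irr_done += 1
    return main_done, irr_done


main_done, irr_done = run_irritator_demo()
print(f"main thread simulated {main_done}, irritator simulated {irr_done}")
```

In this model the irritator's simulated length tracks the main thread's length regardless of `main_len`, which is the self-adjusting property the earlier slide describes. Without the final store, the irritator loop would never exit.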






Simulation acceleration usage on POWER6

     Extensively used on POWER6
       – Run lab exercisers prior to tape-out
             • Found additional bugs missed by software simulation
             • Debug new exerciser functionality prior to lab
             • Error injection and recovery testing
             • Reproducibility of lab bugs in “simulation-like” environment for
               rapid debug of root cause
             • Rapid testing of bug fixes and collateral damage testing

       – Linux boot prior to tape-out

       – Not employed on POWER5 for “mainline” functional
         verification



(Semi) Formal methods

     Formal methods are a vital complement to
     the simulation flow

       – Lab bring-up bug re-creation
           • Often faster reproduction than simulation-based
             approaches
           • Aids in root cause analysis
           • High-coverage / proof of side-effect-free fixes






Biggest challenge on POWER6

     Error detection and soft error recovery

        – Why so hard?
              • Myriad injection points coupled with large SMT state space
                  – Often needed multiple “rare” combinations of “asymmetric”
                    events on both threads while a specific error was injected
              • End-to-end recovery testing difficult at unit level
                  – Really a “core” effort
        – Verification strategy:
              • Error injection and recovery on hardware-accelerated simulation
                platform
              • Dynamic on-the-fly error injection combined with “irritator threads”
                needed to cover large SMT recovery state space
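
To give a feel for why randomized injection is needed, the sketch below counts how many combinations of thread phase and injection point a random campaign reaches. It is a toy model under stated assumptions: the two threads are reduced to repeating phases of different lengths (standing in for "asymmetric" activity), and the injection point names, periods, and function name are all hypothetical.

```python
import random


def injection_coverage(points, runs, t0_period=7, t1_period=2, seed=0):
    """Return the set of (t0 phase, t1 phase, injection point) combinations hit."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(runs):
        cycle = rng.randrange(1000)   # injection cycle chosen on the fly, per run
        point = rng.choice(points)    # hypothetical injection point name
        covered.add((cycle % t0_period, cycle % t1_period, point))
    return covered


points = ["gpr_parity", "fpr_parity", "icache_tag"]
covered = injection_coverage(points, runs=500)
total = 7 * 2 * len(points)
print(f"{len(covered)} of {total} combinations covered")
```

Even this tiny model has 42 combinations for three injection points. A real core multiplies the number of injection points by a far larger cross-thread state space, which is why the slide pairs injection with irritator threads on the accelerator.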



Summary

     1. SMT verification has four key pieces
         –   Traditional SMP-like effort
         –   Thread starvation and priority
         –   Starting and stopping threads
         –   Asymmetric “irritator thread” approach to verify often unforeseen cross-thread interactions at
             the microarchitecture level

     2. “From-scratch in-order” SMT design was more difficult to verify than the
        “out-of-order retrofitted” SMT design
         –   Complex events occurred only through cross-thread interaction
         –   True even though the team had experience
         –   Required more “weapons” in the arsenal

     3. High frequency design drove distributed complexity
         –   Makes verification job harder
         –   Increased dependency on formal verification for difficult bugs

     4. “Mainframe”-like RAS on POWER6 drove a huge amount of work that was
        difficult to attack at the unit level









Future directions

     Predictions
          – RAS features will be increasingly important in server
            systems
                • POWER6 design has raised the bar to a new standard that future
                  processors will have to meet
                    - Power Systems Revenue up 29% in 2Q08 (from 2Q07)
                • Verification methods employed on POWER6 to attack the nearly infinite state
                  space created by the combination of SMT and processor recovery features will
                  become standard practice

          – A migration of “pre-silicon” verification techniques into the “post-silicon”
            hardware lab verification effort
                • Hardware is the fastest “simulator” available, and the state space is getting
                  bigger with SMT





Ludden q3 2008_boston

  • 1. IBM Power Systems SMT Verification of the POWER5 and POWER6 High-Performance Processors John Ludden Senior Technical Staff Member Hardware Verification IBM Systems & Technology Group © 2008 IBM Corporation
  • 2. IBM System p SMT Verification of the POWER5 and POWER6 High-Performance Processors Introduction to Simultaneous Multi-Threading (SMT) 1. What is a multi-threaded processor? • Essentially a processor core that executes multiple instruction streams simultaneously • Each thread appears to software as a “virtual” processor core 2. What are the advantages of SMT? • More efficient utilization of silicon real estate and power: small die size increase compared to adding another core • Increased system throughput by utilizing processor resources that would otherwise be idle 3. What are the disadvantages of SMT? • Increased complexity -> Makes verification state space MUCH larger • SMT verification much harder than SMP • Possibly degrades performance of some applications 2 IBM Systems © 2006 IBM Corporation IBM Systems & Technology DRAFT: IBM Confidential © 2008 IBM Corporation
  • 3. IBM System p SMT Verification of the POWER5 and POWER6 High-Performance Processors Examples of SMT microprocessors 1. Video Game Systems • Sony Playstation 3: IBM CELL processor • Xbox 360: IBM Xenon processor 2. Personal Computers: • Intel Pentium 4 Hyper-Threading (HT) processors 3. Servers: • SUN UltraSparc Systems: T1 (4 threads) and T2 (8 threads) • HP Superdome Systems: Intel Itanium 2 • IBM Power Systems: POWER5 and POWER6 processors 3 IBM Systems © 2006 IBM Corporation IBM Systems & Technology DRAFT: IBM Confidential © 2008 IBM Corporation
  • 4. IBM System p SMT Verification of the POWER5 and POWER6 High-Performance Processors Overview 1. Context : POWER5 vs. POWER6 Microarchitecture Comparison 2. Verification methodology: In the beginning… 3. The times they are a changing: SMT arrives in POWER5 4. POWER6: An in-order design should be simpler, but… 5. Future directions? 4 IBM Systems © 2006 IBM Corporation IBM Systems & Technology DRAFT: IBM Confidential © 2008 IBM Corporation
  • 5. IBM System p SMT Verification of the POWER5 and POWER6 High-Performance Processors IBM POWER systems Consistent predictable delivery 2007 2006 POWER6 2004 POWER5+ 2003 POWER5 2001 POWER4+ POWER4 5 IBM Systems © 2006 IBM Corporation IBM Systems & Technology DRAFT: IBM Confidential © 2008 IBM Corporation
  • 6. POWER5 chip vs. POWER6 chip
      [Figure: side-by-side chip block diagrams. POWER5 chip: two "High Freq" POWER5 SMT2 cores, ~2 MB L2, L3 controller with 36 MB L3 chip, SMP interconnect fabric, memory controller. POWER6 chip: two "Ultra Freq" POWER6 SMT2 cores, 4 MB L2 per core, L3 controller with 32 MB L3 chip(s), SMP interconnect fabric, memory controllers. Memory attaches through buffer chips.]
  • 7. POWER5 Pipeline
      [Figure: POWER5 out-of-order pipeline. Instruction fetch (IF, IC, BP) feeds group formation and instruction decode (D0, D1, D2, D3, Xfer, GD), followed by per-unit pipelines: branch (MP, ISS, RF, EX, WB, Xfer), load/store (MP, ISS, RF, EA, DC, Fmt, WB, Xfer), fixed point (MP, ISS, RF, EX, WB, Xfer) and floating point (MP, ISS, RF, six F6 stages, WB, Xfer), ending at completion (CP). Branch redirects and interrupts & flushes feed back to instruction fetch.]
      [Figure: POWER5 SMT resource diagram. Blocks: program counter, alternate, branch prediction (branch history tables, return stack, target cache), instruction cache, instruction translation, instruction buffers 0 and 1, thread priority, group formation / instruction decode / instruction dispatch, shared register mappers, dynamic instruction selection, shared issue queues, read and write shared register files, shared execution units (LSU0, LSU1, FXU0, FXU1, FPU0, FPU1, BXU, CRL), group completion, store queue, data translation, data cache, L2 cache. Legend: shared by two threads; resource used by thread 0; resource used by thread 1.]
  • 8. High-end server: New POWER6 microprocessor
      Topology
      – Two cores on chip, a 2-way SMP
      – Core private L1s (64 KB I, 64 KB D)
      – Superscalar, SMT cores
      – Chip private 8 MB L2 cache
      – L3 32 MB off chip
      – Two-tier SMP fabric
      Technology
      – 65 nm SOI
      – 341 mm² die size
      – 10 layers of metal
      – 790 million transistors on chip
      – Frequency: 3.5, 4.2, 4.7, 5.0 GHz
      Custom & semi-custom design style
      – High frequency constraints
      3.3 M lines of VHDL
  • 9. POWER6 core pipeline
      [Figure: POWER6 core pipeline diagram showing the instruction fetch pipeline, instruction dispatch pipeline, BR/FX/load pipeline, floating point pipeline and check point recovery pipeline. Stage labels include IFAR, BHT, IC0, IC1, ROT, IB0, IB1, PD, P1 to P4, DISP, ISS, RF, AG, EX, DC0, DC1, FMT, EX1 to EX7, ECC. Legend: pre-decode stage, instruction decode stage, write back stage, cache access stage, ifetch/branch stage, instruction dispatch/issue stage, completion stage, delayed/transmit stage, operand access/execution stage, check point stage; FX, load and float result bypasses.]
  • 10. POWER6 core
      POWER6 processor is ~2X frequency of POWER5 (4 to 5 GHz)
      POWER6 instruction pipeline depth equivalent to POWER5
      – Minimize power
      – Scale performance with frequency
      [Figure: pipeline depth comparison. Stage groups: Instruction Fetch, Instruction Buffer/Decode, Instruction Dispatch/Issue, Data Fetch/Execute. Annotations: ~6ns/instr, ~3ns/instr, FXU dependent execution, load dependent execution.]
      POWER6 extends functionality of POWER5 core
      – 64K I cache, 64K D cache, 2 FXU, 2 Binary FPU, 1 branch execution unit
      – Two-way SMT with 7-instruction dispatch from 2 threads (maximum of 5 instructions per thread)
      – Decimal Floating Point Unit
      – VMX Unit (PowerPC’s SIMD ISA)
      – Recovery Unit
  • 11. Bullet-proof computing
      System reliability with recovery unit
      – Every measure possible taken to preserve application execution
      – Retry soft errors
      – Change hardware for hard errors
      Recovery flow (from the slide's flowchart)
      1. Processor architected state is checkpointed every cycle
      2. ECC and non-ECC protected circuitry is checked every cycle
         • No error found: execution continues
         • Error found: processor restarts from the last saved checkpoint
      3. After the restart
         • No error found: soft error case
         • Error found: processor workload is moved to another CPU (hard error case)
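The checkpoint-and-retry flow on this slide can be sketched as a small model. This is only an illustration, not IBM's recovery unit logic: the `Core` class, the `error_detected` hook and the single-retry limit (`max_retries=1`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    state: int = 0        # stand-in for the architected state
    checkpoint: int = 0   # last state known to be good
    log: list = field(default_factory=list)


def run(core, n_cycles, error_detected, max_retries=1):
    """Run n_cycles with checkpoint/retry.

    error_detected(cycle, attempt) is a hypothetical checker hook that
    returns True when the checkers flag an error on that attempt.
    """
    for cycle in range(n_cycles):
        attempt = 0
        while True:
            core.state = core.checkpoint + 1       # one cycle of work
            if not error_detected(cycle, attempt):
                if attempt > 0:
                    core.log.append((cycle, "soft"))   # retry succeeded
                core.checkpoint = core.state           # checkpoint every cycle
                break
            core.state = core.checkpoint           # restart from last checkpoint
            if attempt >= max_retries:
                core.log.append((cycle, "hard"))   # error persists after retry
                return "moved"                     # workload goes to another CPU
            attempt += 1
    return "completed"
```

A transient fault (error on the first attempt only) is logged as a soft error and the run completes; a fault that repeats after the restart is logged as a hard error and the run stops with the last good checkpoint intact.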
  • 12. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 13. POWER4/5/6 RTL verification technology
      [Figure: verification tool flow. Inputs: RTL (VHDL, Verilog), PSL et al. assertions, driver/checker. Language compile / model build produces a cycle-based model. Stimulus: test program generators (GPRO, X-Gen), constraint random testbench and unit testbench (C++). Engines: software simulator (MESA), hardware accelerator (Awan), (semi) formal verification (SixthSense, RuleBase). Boolean equivalence check (Verity) against the physical VLSI design tools / custom design.]
  • 14. Single threaded uniprocessor verification for POWER4
      Unit level: methodology inherited from POWER4
      – Driven by a combination of instruction level test cases (AVPs) created by the Genesys-Pro (GPRO) pseudo-random test generator and random C++ driven irritation
      – Instruction-By-Instruction (IBI) checking against AVP results
      – Low level microarchitecture checkers written in C++
      Processor core (aka “core”) level
      – Mixture of GPRO pseudo-random and directed random instruction level test cases
      – IBI checking against AVP results
      – Low level microarchitecture checkers written in C++
      – Irritation from random C++ drivers
      – Highly deterministic, and architected state easily verifiable against test
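Instruction-by-instruction checking, as described on this slide, amounts to comparing the simulated architected-state updates against the results predicted in the test case, one instruction at a time. A minimal sketch, assuming a trace format of `(instruction address, {register: value})` tuples that is invented here for illustration and is not the AVP format:

```python
def ibi_check(expected, simulated):
    """Compare two per-instruction traces.

    Returns None when they match, otherwise a tuple
    (index, expected_entry, simulated_entry) for the first difference.
    """
    for i, (exp, got) in enumerate(zip(expected, simulated)):
        if exp != got:
            return (i, exp, got)
    if len(expected) != len(simulated):
        # One trace ended early: report the first missing entry.
        i = min(len(expected), len(simulated))
        exp = expected[i] if i < len(expected) else None
        got = simulated[i] if i < len(simulated) else None
        return (i, exp, got)
    return None


# Hypothetical traces: the second instruction writes the wrong value to r4.
expected = [(0x100, {"r3": 5}), (0x104, {"r4": 7}), (0x108, {"r3": 12})]
bad = [(0x100, {"r3": 5}), (0x104, {"r4": 8}), (0x108, {"r3": 12})]
mismatch = ibi_check(expected, bad)
```

This works only because a single-threaded test has one deterministic expected result per instruction, which is the property the slide calls out.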
  • 15. Symmetric multi-processor (SMP) verification for POWER4
      Chip (dual-core) level
      – Test generation similar to uniprocessor via GPRO for false-sharing or non-sharing tests
         • IBI checking against AVP results for two independent instruction streams contained within a single test
         • Low level microarchitecture checkers written in C++
         • L1/L2 interactions primary focus
      – True-sharing scenarios, lock testing and storage access (“weak”) ordering checked
         • GPRO employed, but IBI checking of these accesses is limited or not possible:
            › Non-unique or non-deterministic results
            › CML (architecture level coherency monitor) employed to detect the “right answer” as a post-simulation rule check
  • 16. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 17. POWER5 SMT verification methodology
      Evolutionary: based on single thread uniprocessor and SMP approaches
      – Traditional SMP scenarios now self-contained in a single core simulation model
         • Downward migration of dual-core methodology to single core model
      New SMT verification scenario categories
      – Shared resource and priority conflicts
         • SMT resource types:
            › Equally shared between threads: queue full conditions easier to hit
            › Dynamically shared / tagged: either thread can consume most/all of the resource
            › Replicated: not shared, same as single thread
      – Dynamic thread mode switching: SMT->ST; ST->SMT
         • Some applications attain better performance in ST mode
         • Shared resources re-allocated on each mode switch
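The three resource types above differ in how many entries one thread may occupy in each mode. A minimal sketch of that rule follows; the `Sharing` enum, the function name and the queue size used in the example are hypothetical and are not POWER5 values.

```python
from enum import Enum


class Sharing(Enum):
    SPLIT = "equally shared between threads"
    DYNAMIC = "dynamically shared / tagged"
    REPLICATED = "replicated per thread"


def per_thread_limit(total, sharing, smt_mode, n_threads=2):
    """Maximum entries a single thread can occupy.

    For REPLICATED resources, `total` is the size of each thread's own copy.
    """
    if sharing is Sharing.REPLICATED:
        return total                  # private copy, same as single thread
    if not smt_mode:
        return total                  # ST mode: the one thread gets everything
    if sharing is Sharing.SPLIT:
        return total // n_threads     # SMT mode: fixed partition
    return total                      # DYNAMIC: either thread may take it all


# A hypothetical 20-entry queue fills twice as fast per thread once it is split.
smt_limit = per_thread_limit(20, Sharing.SPLIT, smt_mode=True)
st_limit = per_thread_limit(20, Sharing.SPLIT, smt_mode=False)
```

The halved limit in SMT mode is why queue-full conditions are easier to hit for equally shared resources, and the change in limit between modes is what has to be re-allocated on every SMT/ST switch.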
  • 18. Traditional SMP approach applied to SMT verification
      [Figure: test template SMP.def feeds test generation, which outputs test case SMT.tst. The test case holds core level registers common to both threads, t0 registers, t1 registers, and random instruction streams for t0 and t1. Real memory is common to both threads, with the test generator managing some potential overlap.]
  • 19. Shared resource and priority conflicts
      Approach was similar to SMP verification
      – Testing largely consisted of “symmetric” instruction streams on each thread
         • A particular resource targeted (e.g., GPR rename registers)
            › 100 load instructions on each thread
      – Coverage and lab feedback validated this approach
         • Good enough: “Got the job done”
  • 20. POWER5 dynamic thread mode switching
      [Figure: flow chart with one column per thread (thread 0, thread 1), running from initial state to final state. Initial state: all architected states initialized, thread enabled. Both threads run random instructions. A thread saves its architected state and terminates itself ("thread kills itself"), and shared resources are reallocated while the other thread keeps running random instructions. An interrupt or the sim driver wakes the thread up: resources are partitioned, architected state is restored and the thread returns to the run state. Both threads reach a normal finish. Final state: thread enabled.]
  • 21. POWER5 shared resource re-allocation on mode switch
      [Figure: four bar charts of per-thread resource limits in SMT mode vs. ST mode: rename registers per thread (GPR, FPR; axis 0 to 200), load miss queue entries per thread (axis 0 to 10), branch queue (BIQ) entries per thread (axis 0 to 20), LRQ/SRQ entries per thread (axis 0 to 40). Chart annotations: "Split in half", "Dynamically Shared", "Max".]
  • 22. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 23. POWER5: centralized complexity
      [Figure: unit diagram with blocks IFU, FXU, ISU, LSU, FPU]
      – Out-of-order design: even in single thread mode, complex events naturally occur simultaneously
      – Started from POWER4+: known working design that was modified incrementally
      – 23 FO4 design: isolated complexity in the Instruction Sequencing Unit (ISU)
         • Every unit communicated back to ISU
         • ISU resolved all exceptions and out-of-order conflicts
      – ST and SMT modes both supported
         • Alternating dispatch cycles per thread
         • Resources re-allocated on mode switch
  • 24. POWER6 distributed complexity
      [Figure: unit diagram with blocks IFU, FXU, IDU, RU, FPU, LSU]
      – From-scratch, mostly in-order design
         • Normally, design is well behaved
         • Cross-thread interaction necessary for “tough bugs”
      – 13 FO4 design: distributed complexity needed to achieve high performance goals
      – Recovery unit (RU)
         • Must resolve out-of-order FP with in-order pipelines
         • Checkpoints machine state
         • Recovers processor from soft errors
      – Design is inherently in SMT mode all the time (almost)
         • Dispatch to both threads in same cycle
         • Most resources dynamically shared / tagged
         • No resource reallocation on mode switch
  • 25. POWER6 verification process
      The different verification engines have different strengths for different verification tasks
      Software simulation
      – Slow, but low penalty for highly intrusive checking of model internals. Total model visibility.
      – Hundreds of AIX workstations running 24x7x365
      – New enhancements helped keep pace with design complexity
      – 2x the number of simulation cycles of the POWER5 design
      Hardware-accelerated simulation
      – 10x to 1000x faster than software simulation, but needs less intrusive driving/checking to avoid slowing down the hardware box
      – New usage: mainline function verification
      – Yields an additional 3x simulation cycle advantage over POWER5 (5x cycle advantage overall)
      (Semi)-formal verification
      – (High to) exhaustive coverage, but higher skill needed to drive. Scaling problems with model size.
      – Extensively used: proved extremely valuable for complex SMT bugs
      Hardware bring-up
      – Ideal speed, very limited visibility/controllability
  • 26. Software simulation enhancements
      Random command driven unit simulation for most core units
      – Yielded >1 million lines of C++ code
      – More control over generation for low level events
      – More efficient test generation
      Irritator threads at “core model” level
      – “Symmetric” instruction stream approach employed on POWER5 proved inadequate
         • “S” in SMT is for “Simultaneous”, not “Symmetric”
      – Target cross-thread interactions at the microarchitecture level
      – ~2x test generation efficiency
      – Ensures both threads run for the same length (self-adjusting)
  • 27. Irritator thread example
      [Figure: test template SMT_Irritator.def feeds test generation, which outputs test case SMT_Irritator.tst. The test case holds core level registers common to both threads, t0 registers, t1 registers, a long random instruction stream for t0 and a short irritator stream for t1. Real memory, with the test generator managing some potential overlap.]
      Irritator thread restrictions
      • Cannot cause unexpected exceptions
      • Cannot modify memory read by random thread
      • Cannot modify registers shared with other threads
      • Architected results may be undefined
  • 28. Irritator thread example
      Long Random Thread
         SEQUENCE
            REPEAT 100
               SELECT Group_All
            Kill Irritator Thread: stw nop, A
         Generated Instr: 101
         Simulated Instr: 101
      Irritator Thread
         SEQUENCE
            LB0: fdiv
            A: b to LB0
         Generated Instr: 2
         Simulated Instr: Infinite
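One reading of this template is that the irritator loops on its two generated instructions until the long thread's final `stw` overwrites the branch at label `A` with a `nop`, letting the irritator fall through and end. The sketch below models that reading only. The round-robin interleaving, the instruction names other than `fdiv`, `b` and `nop`, and the `"kill"` marker standing in for `stw nop, A` are assumptions for illustration.

```python
import random


def simulate(seed=0, main_len=100):
    """Interleave a long random thread with a two-instruction irritator loop.

    Returns (instructions simulated on the long thread,
             instructions simulated on the irritator thread).
    """
    rng = random.Random(seed)
    irritator = ["fdiv", "b LB0"]            # LB0: fdiv ; A: b to LB0
    main = [rng.choice(["add", "lwz", "stw", "fadd"]) for _ in range(main_len)]
    main.append("kill")                      # stands for "stw nop, A"

    main_pc = irr_pc = 0
    main_count = irr_count = 0
    irr_running = True
    while main_pc < len(main) or irr_running:
        if main_pc < len(main):              # long thread: one instruction
            if main[main_pc] == "kill":
                irritator[1] = "nop"         # overwrite the branch at label A
            main_pc += 1
            main_count += 1
        if irr_running:                      # irritator: one instruction
            instr = irritator[irr_pc]
            irr_count += 1
            if instr == "b LB0":
                irr_pc = 0                   # loop back to LB0
            else:
                irr_pc += 1
                if irr_pc == len(irritator):
                    irr_running = False      # fell through past label A
    return main_count, irr_count


main_count, irr_count = simulate()
```

The long thread simulates exactly the 101 instructions that were generated, while the irritator simulates far more than its 2 generated instructions and stops shortly after the long thread does, which is the self-adjusting behaviour the previous slide describes.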
  • 29. Simulation acceleration usage on POWER6
      Extensively used on POWER6
      – Run lab exercisers prior to tape-out
         • Found additional bugs missed by software simulation
         • Debug new exerciser functionality prior to lab
         • Error injection and recovery testing
         • Reproducibility of lab bugs in “simulation-like” environment for rapid debug of root cause
         • Rapid testing of bug fixes and collateral damage testing
      – Linux boot prior to tape-out
      – Not employed on POWER5 for “mainline” functional verification
  • 30. (Semi) Formal methods
      Formal methods are a vital complement to simulation flow
      – Lab bring-up bug re-creation
         • Often faster reproduction than simulation based approaches
         • Aids in root cause analysis
         • High-coverage / proof of side-effect-free fixes
  • 31. Biggest challenge on POWER6
      Error detection and soft error recovery
      – Why so hard?
         • Myriad injection points coupled with large SMT state space
            › Often needed multiple “rare” combinations of “asymmetric” events on both threads while a specific error was injected
         • End-to-end recovery testing difficult at unit level
            › Really a “core” effort
      – Verification strategy:
         • Error injection and recovery on hardware accelerated simulation platform
         • Dynamic on-the-fly error injection combined with “irritator threads” needed to cover large SMT recovery state space
  • 32. Summary
      1. SMT verification has four key pieces
         – Traditional SMP-like effort
         – Thread starvation and priority
         – Starting and stopping threads
         – Asymmetric “irritator thread” approach to verify often unforeseen cross-thread interactions at the microarchitecture level
      2. “From-scratch in-order” SMT design was more difficult to verify than the “out-of-order retrofitted” SMT design
         – Complex events only occurred due to cross-thread interaction
         – Even though team had experience
         – Required more “weapons” in the arsenal
      3. High frequency design drove distributed complexity
         – Makes verification job harder
         – Increased dependency on formal verification for difficult bugs
      4. “Mainframe”-like RAS on POWER6 drove a huge amount of work that was difficult to attack at the unit level
  • 33. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 34. Future directions
      Predictions
      – RAS will be an increasingly important feature of server systems
         • POWER6 design has set the “bar” to a new high standard to which future processors will have to measure up
            › Power Systems revenue up 29% in 2Q08 (from 2Q07)
         • Verification methods employed on POWER6 to attack the nearly infinite state space created by the combination of SMT and processor recovery features will become standard practice
      – A migration of “pre-silicon” verification techniques into the “post-silicon” hardware lab verification effort
         • Hardware is the fastest “simulator” available and the state space is getting bigger with SMT