IBM Power Systems
© 2008 IBM Corporation
SMT Verification of the POWER5 and POWER6
High-Performance Processors
John Ludden
Senior Technical Staff Member
Hardware Verification
IBM Systems & Technology Group
IBM System p
2 © 2006 IBM Corporation IBM SystemsDRAFT: IBM Confidential © 2008 IBM CorporationIBM Systems & Technology
Introduction to Simultaneous Multi-Threading (SMT)
1. What is a multi-threaded processor?
• Essentially a processor core that executes multiple instruction streams simultaneously
• Each thread appears to software as a “virtual” processor core
2. What are the advantages of SMT?
• More efficient use of silicon real estate and power: small die size increase compared with adding another core
• Increased system throughput by using processor resources that would otherwise be idle
3. What are the disadvantages of SMT?
• Increased complexity: makes the verification state space MUCH larger
• SMT verification is much harder than SMP verification
• Possibly degrades performance of some applications
Examples of SMT microprocessors
1. Video Game Systems
• Sony PlayStation 3: IBM CELL processor
• Xbox 360: IBM Xenon processor
2. Personal Computers:
• Intel Pentium 4 Hyper-Threading (HT) processors
3. Servers:
• Sun UltraSPARC Systems: T1 (4 threads) and T2 (8 threads)
• HP Superdome Systems: Intel Itanium 2
• IBM Power Systems: POWER5 and POWER6 processors
Overview
1. Context: POWER5 vs. POWER6 microarchitecture comparison
2. Verification methodology: In the beginning…
3. The times they are a changing: SMT arrives in POWER5
4. POWER6: An in-order design should be simpler, but…
5. Future directions?
Consistent predictable delivery
[Figure: IBM POWER systems roadmap showing POWER4, POWER4+, POWER5, POWER5+ and POWER6 on a timeline marked 2001, 2003, 2004, 2006 and 2007]
[Figure: POWER5 and POWER6 chip block diagrams]
POWER5 Chip
– Two high-frequency POWER5 SMT2 cores
– ~2 MB L2
– 36 MB L3 controller; 36 MB L3 chip
– SMP interconnect fabric
– Memory controller; buffer chips
POWER6 Chip
– Two ultra-frequency POWER6 SMT2 cores, each with 4 MB L2
– 32 MB L3 controller; 32 MB L3 chip(s)
– SMP interconnect fabric
– Two memory controllers; buffer chips
POWER5 Pipeline
[Figure: POWER5 pipeline stages]
– Instruction fetch: IF, IC, BP
– Group formation and instruction decode: D0, D1, D2, D3, Xfer, GD
– Out-of-order processing, by pipeline:
• BR: MP, ISS, RF, EX, WB, Xfer
• LD/ST: MP, ISS, RF, EA, DC, Fmt, WB, Xfer
• FX: MP, ISS, RF, EX, WB, Xfer
• FP: MP, ISS, RF, six FP execution stages, WB, Xfer
– CP; branch redirects; interrupts & flushes
[Figure: POWER5 core resources, color coded as shared by two threads, used by thread 0, or used by thread 1]
– Front end: program counter, branch prediction (branch history tables, return stack, target cache), alternate, instruction cache, instruction translation, instruction buffer 0, instruction buffer 1
– Thread priority; group formation, instruction decode, dispatch; shared register mappers
– Dynamic instruction selection; shared issue queues; read shared register files
– Shared execution units: LSU0, FXU0, LSU1, FXU1, FPU0, FPU1, BXU, CRL
– Write shared register files; group completion
– Store queue, data cache, data translation, L2 cache
High-end server: New POWER6 microprocessor
Topology
– Two cores on chip, a 2-way SMP
– Core private L1s (64KB I, 64KB D)
– Superscalar, SMT cores
– Chip private 8 MB L2 cache
– L3 32 MB off chip
– Two-tier SMP fabric
Technology
– 65 nm SOI
– 341 mm² die size
– 10 layers of metal
– 790 million transistors on chip
– Frequency: 3.5, 4.2, 4.7, 5.0 GHz
Custom & semi-custom design style
– High frequency constraints
3.3 M Lines of VHDL
POWER6 core pipeline
[Figure: POWER6 core pipeline diagram]
– Pipelines shown: instruction fetch pipeline, instruction dispatch pipeline, BR/FX/Load pipeline (BR/CR, FX, LOAD), floating point pipeline, check point recovery pipeline
– Stage labels: P1, P2, P3, P4, IC0, IC1, ROT, IFAR, BHT, IB0, IB1, PD, DISP, AG, FMT, ISS, RF, EX, EX1 to EX7, DC0, DC1, ECC
– Legend: pre-decode stage, ifetch/branch stage, delayed/transmit stage, instruction decode stage, instruction dispatch/issue stage, operand access/execution stage, cache access stage, write back stage, completion stage, check point stage, FX result bypass, load result bypass, float result bypass
POWER6 core
POWER6 processor runs at ~2X the frequency of POWER5 (4 – 5 GHz)
POWER6 instruction pipeline depth is equivalent to POWER5
– Minimize power
– Scale performance with frequency
[Figure: pipeline depth comparison. Stages: Instruction Fetch, Instruction Buffer/Decode, Instruction Dispatch/Issue, Data Fetch/Execute, with FXU dependent execution and load dependent execution marked. Annotations: ~6 ns/instr and ~3 ns/instr]
POWER6 extends functionality of POWER5 core
– 64K I cache, 64K D cache, 2 FXU, 2 Binary FPU, 1 branch execution unit
– Two-way SMT with 7-instruction dispatch from 2 threads (maximum of 5 instructions per thread)
– Decimal Floating Point Unit
– VMX Unit (PowerPC’s SIMD ISA)
– Recovery Unit
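The dispatch limits on this slide (7 instructions per cycle in total, at most 5 from one thread) can be illustrated with a small enumeration. This is only a sketch of the stated limits, not the POWER6 dispatch logic; the function names `valid_dispatch_splits` and `can_dispatch` are hypothetical.

```python
def can_dispatch(t0, t1, total_max=7, per_thread_max=5):
    """Return True if dispatching t0 and t1 instructions in one cycle
    respects both the per-thread limit and the combined limit."""
    if t0 < 0 or t1 < 0:
        return False
    if t0 > per_thread_max or t1 > per_thread_max:
        return False
    return t0 + t1 <= total_max


def valid_dispatch_splits(total_max=7, per_thread_max=5):
    """List every (t0, t1) pair that is allowed in a single cycle."""
    return [
        (t0, t1)
        for t0 in range(per_thread_max + 1)
        for t1 in range(per_thread_max + 1)
        if can_dispatch(t0, t1, total_max, per_thread_max)
    ]


if __name__ == "__main__":
    for split in valid_dispatch_splits():
        print(split)
```

A test generator could use such a list to aim for asymmetric corners such as (5, 2) and (2, 5), where one thread takes its maximum share.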
Bullet-proof computing
System reliability with recovery unit
– Every measure possible taken to preserve application execution
– Retry soft errors
– Change hardware for hard errors
[Figure: recovery flow]
– Processor architected state is checkpointed every cycle
– ECC and non-ECC protected circuitry is checked every cycle
– No error found: execution continues
– Error found, soft error case: processor restarts from last saved checkpoint
– Error found, hard error case: processor workload moved to another CPU
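The checkpoint and recovery flow on this slide can be sketched as a toy state machine. This is a simplified model under stated assumptions, not the recovery unit's behavior: the class `RecoveryModel`, its `step` method, and the representation of architected state as a dictionary are all hypothetical.

```python
class RecoveryModel:
    """Toy model: checkpoint after every clean cycle, roll back on error."""

    def __init__(self, initial_state):
        self.state = dict(initial_state)
        self.checkpoint = dict(initial_state)
        self.migrated = False  # set when the workload is moved to another CPU

    def step(self, update, error=None):
        """Run one cycle.

        update: register writes this cycle would make.
        error:  None, "soft" or "hard" (assumed to be detected this cycle).
        """
        if error is None:
            self.state.update(update)
            self.checkpoint = dict(self.state)  # new checkpoint
            return "ok"
        # An error was detected: discard this cycle and restore the checkpoint.
        self.state = dict(self.checkpoint)
        if error == "soft":
            return "retry"      # restart from last saved checkpoint
        self.migrated = True
        return "migrate"        # hard error: move workload elsewhere


if __name__ == "__main__":
    model = RecoveryModel({"r1": 0})
    print(model.step({"r1": 1}))
    print(model.step({"r1": 99}, error="soft"), model.state)
```

In a verification setting, the interesting check is that the state after a retried soft error matches an error-free run of the same instruction stream.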
POWER4/5/6 RTL verification technology
[Figure: verification tool flow; boxes listed below]
– RTL (VHDL, Verilog); Language Compile; Model Build; Cycle-based Model
– Physical VLSI Design Tools / Custom Design
– Formal Verification: Boolean Equivalence Check (Verity)
– Software Simulator (MESA)
– Hardware Accelerator (Awan)
– Driver/Checker: Test Program Generator (GPRO, X-Gen); C++ Testbench; Constraint Random Unit Testbench
– Assertions: PSL et al.
– (Semi) Formal Verification (SixthSense, RuleBase)
Single threaded uniprocessor verification for POWER4
Unit level: methodology inherited from POWER4
– Driven by a combination of instruction level test cases (AVPs) created by the Genesys-Pro (GPRO) pseudo-random test generator and random C++ driven irritation
– Instruction-By-Instruction (IBI) checking against AVP results
– Low level microarchitecture checkers written in C++
Processor core (aka “core”) level
– Mixture of GPRO pseudo-random and directed random instruction level test cases
– IBI checking against AVP results
– Low level microarchitecture checkers written in C++
– Irritation from random C++ drivers
– Highly deterministic; architected state easily verifiable against test
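Instruction-by-instruction checking can be sketched as a comparison of two traces: the expected architected results carried by the test case and the results observed in simulation. This is a minimal illustration; the trace format (a list of dictionaries with `pc` and `writes` keys) and the function name `ibi_check` are assumptions, not the AVP format or the C++ checker used by the team.

```python
def ibi_check(expected_trace, observed_trace):
    """Compare two per-instruction traces.

    Returns None when the traces match, otherwise a tuple
    (index, expected_entry, observed_entry) for the first difference.
    A length mismatch is reported at the end of the shorter trace
    with None for the missing entry.
    """
    for index, (expected, observed) in enumerate(
        zip(expected_trace, observed_trace)
    ):
        if expected != observed:
            return (index, expected, observed)
    if len(expected_trace) != len(observed_trace):
        index = min(len(expected_trace), len(observed_trace))
        expected = expected_trace[index] if index < len(expected_trace) else None
        observed = observed_trace[index] if index < len(observed_trace) else None
        return (index, expected, observed)
    return None


if __name__ == "__main__":
    golden = [{"pc": 0x100, "writes": {"r3": 5}},
              {"pc": 0x104, "writes": {"r4": 7}}]
    simulated = [{"pc": 0x100, "writes": {"r3": 5}},
                 {"pc": 0x104, "writes": {"r4": 8}}]
    print(ibi_check(golden, simulated))
```

This style of check works because a single-threaded stream is deterministic: each instruction has exactly one correct architected result.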
Symmetric multi-processor (SMP) verification for POWER4
Chip (dual-core) level
– Test generation similar to uniprocessor via GPRO for false-sharing or non-sharing tests
• IBI checking against AVP results for two independent instruction streams contained within a single test
• Low level microarchitecture checkers written in C++
• L1/L2 interactions are the primary focus
– True-sharing scenarios, lock testing and storage access (“weak”) ordering checked
• GPRO employed, but:
– IBI checking of these accesses is limited or not possible:
› Non-unique or non-deterministic results
› CML (architecture level coherency monitor) employed to detect the “right answer” as a post-simulation rule check
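When results are non-unique, a checker cannot compare against one expected value; it can only test whether each observed value is legal. The sketch below shows one deliberately simple post-simulation rule of that kind: every value loaded from an address must be either its initial value or a value some thread stored there. It is a stand-in for illustration only and is far weaker than a real coherency monitor such as CML, which also checks ordering; the event format and the name `check_observed_values` are hypothetical.

```python
def check_observed_values(initial_memory, events):
    """Post-simulation rule check over a log of memory events.

    initial_memory: dict mapping address -> initial value.
    events: iterable of (thread, op, address, value), op is "load" or "store".
    Returns a list of (thread, address, value) for loads that saw a value
    nobody could have produced.
    """
    events = list(events)
    legal = {addr: {value} for addr, value in initial_memory.items()}
    for _thread, op, addr, value in events:
        if op == "store":
            legal.setdefault(addr, set()).add(value)

    violations = []
    for thread, op, addr, value in events:
        if op == "load" and value not in legal.get(addr, set()):
            violations.append((thread, addr, value))
    return violations


if __name__ == "__main__":
    log = [
        (0, "store", 0x100, 1),
        (1, "store", 0x100, 2),
        (0, "load", 0x100, 2),   # legal: thread 1 stored 2
        (1, "load", 0x100, 7),   # illegal: nobody stored 7
    ]
    print(check_observed_values({0x100: 0}, log))
```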
POWER5 SMT verification methodology
Evolutionary: based on single thread uniprocessor and SMP approaches
– Traditional SMP scenarios now self-contained in a single core simulation model
• Downward migration of dual-core methodology to single core model
New SMT verification scenario categories
– Shared resource and priority conflicts:
• SMT resource types:
– Equally shared between threads: Queue full conditions easier to hit
– Dynamically shared / tagged: Either thread can consume most/all of the resource
– Replicated: Not shared; same as single thread
– Dynamic thread mode switching: SMT->ST; ST->SMT
• Some applications attain better performance in ST mode
• Shared resources re-allocated on each mode switch
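The three resource types listed on this slide differ in when a thread is refused a new entry. The sketch below models that difference; it is an illustration under assumed semantics, and the policy names and the function `can_allocate` are hypothetical rather than taken from the design.

```python
def can_allocate(policy, entries, used, thread):
    """Decide whether `thread` (0 or 1) may take one more entry.

    policy:  "equal"      each thread owns half of `entries`
             "dynamic"    both threads draw from one pool of `entries`
             "replicated" each thread has its own private copy of `entries`
    used:    [entries held by thread 0, entries held by thread 1]
    """
    if policy == "equal":
        return used[thread] < entries // 2
    if policy == "dynamic":
        return used[0] + used[1] < entries
    if policy == "replicated":
        return used[thread] < entries
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # With an equal split, thread 0 sees "queue full" at half occupancy.
    print(can_allocate("equal", 8, [4, 0], thread=0))
    # With dynamic sharing, thread 0 can starve thread 1 completely.
    print(can_allocate("dynamic", 8, [8, 0], thread=1))
```

The two printed cases correspond to the two verification concerns named on the slide: full conditions that are easier to hit, and one thread consuming most or all of a resource.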
Traditional SMP approach applied to SMT verification
[Figure: test generation flow]
– SMP.def (test template) → Test Generation → output test case SMT.tst
– SMT.tst contains random instruction streams for t0 and t1
– t0 registers and t1 registers are separate; core level registers are common to both threads
– Real memory is common to both threads, with the test generator managing some potential overlap
Shared resource and priority conflicts
Approach was similar to SMP verification
– Testing largely consisted of “symmetric” instruction streams
on each thread
• A particular resource targeted (e.g., GPR rename registers)
– 100 load instructions on each thread
– Coverage and lab feedback validated this approach
• Good enough: “Got the job done”
POWER5 dynamic thread mode switching
[Figure: flowchart with columns for Thread 0 and Thread 1 and rows for Initial State, Run State and Final State]
– Initial State (both threads): all architected states initialized; thread enabled
– Run State, Thread 0: random instructions; save architected state; thread kills itself (Thread 0 terminates itself; shared resources reallocated)
– Restart thread 0: wake up thread; partition resources; restore architected state; random instructions
– Wake-up sources shown: Sim Driver, other thread, interrupt
– Run State, Thread 1: random instructions
– Final State: normal finish; thread enabled
POWER5 shared resource re-allocation on mode switch
[Figure: four bar charts comparing per-thread resource limits in SMT mode and ST mode]
– GPR / FPR rename registers per thread (axis 0 to 200); annotation: Max
– Load Miss Queue entries per thread (axis 0 to 10); annotation: split in half
– Branch Queue (BIQ) entries per thread (axis 0 to 20); annotation: split in half
– LRQ/SRQ entries per thread (axis 0 to 40); annotations: dynamically shared, Max
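The effect of a mode switch on per-thread limits can be sketched as a lookup: resources that are split in half in SMT mode give a thread the full size in ST mode, while dynamically shared resources keep the same maximum in both modes. The entry counts below are hypothetical values chosen only for illustration, and `per_thread_limits` is not a name from the source.

```python
# Hypothetical sizes for illustration only; the slide does not state them.
RESOURCES = {
    "load_miss_queue": {"entries": 8, "policy": "split"},
    "branch_queue": {"entries": 16, "policy": "split"},
    "lrq_srq": {"entries": 32, "policy": "dynamic"},
}


def per_thread_limits(mode, resources=RESOURCES):
    """Return the maximum entries one thread may hold in the given mode.

    mode: "SMT" (two threads active) or "ST" (single thread).
    """
    if mode not in ("SMT", "ST"):
        raise ValueError(f"unknown mode: {mode}")
    limits = {}
    for name, spec in resources.items():
        if mode == "SMT" and spec["policy"] == "split":
            limits[name] = spec["entries"] // 2   # split in half
        else:
            limits[name] = spec["entries"]        # full size available
    return limits


if __name__ == "__main__":
    print(per_thread_limits("SMT"))
    print(per_thread_limits("ST"))
```

A mode-switch test would compare occupancy before the switch with the limits after it, since a thread running in ST mode may hold more entries than its SMT share allows.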
POWER5: centralized complexity
POWER5
– Out-of-order design: Even in single thread mode,
complex events naturally occur simultaneously
– Started from POWER4+: Known working
design that was modified incrementally
– 23 FO4 design: Isolated complexity in
Instruction Sequencing Unit (ISU):
• Every unit communicated back to ISU
• ISU resolved all exceptions and
out-of-order conflicts
– ST and SMT modes both supported:
• Alternating dispatch cycles per thread
• Resources re-allocated on mode switch
[Figure: core units FXU, FPU, LSU and IFU arranged around the central ISU]
POWER6 distributed complexity
POWER6
– From-scratch mostly in-order design
• Normally, design is well behaved
• Cross-thread interaction necessary for “tough
bugs”
– 13 FO4 design: Distributed complexity needed to
achieve high performance goals
– Recovery unit (RU):
• Must resolve out-of-order FP with in-order
pipelines
• Checkpoints machine state
• Recovers processor from soft errors
– Design is inherently in SMT mode (almost) all the time
• Dispatch to both threads in same cycle
• Most resources dynamically shared / tagged
• No resource reallocation on mode switch
[Figure: core units IFU, IDU, FPU, LSU, RU and FXU]
POWER6 verification process
The different verification engines have different strengths related to the verification tasks
Software simulation
– Slow, but low penalty for highly intrusive checking of model internals. Total model visibility.
– Hundreds of AIX workstations running 24x7x365
– New enhancements helped keep pace with design complexity
– 2x the number of simulation cycles of the POWER5 design
Hardware-accelerated simulation
– 10x to 1,000x faster than SW sim, but needs less intrusive driving/checking to avoid slowing down the hardware box
– New usage: Mainline function verification
– Yields additional 3x simulation cycle advantage over POWER5 (5x cycle advantage overall)
(Semi)-formal verification
– (High to) exhaustive coverage, but higher skill needed to drive. Scaling problems with model size.
– Extensively used: Proved extremely valuable for complex SMT bugs
Hardware bring-up
– Ideal speed, very limited visibility/controllability
Software simulation enhancements
Random command driven unit simulation for most core units
– Yielded >1 Million lines of C++ code
– More control over generation for low level events
– More efficient test generation
Irritator threads at “core model” level
– “Symmetric” instruction stream approach employed on POWER5 proved inadequate
“S” in SMT is for “Simultaneous”, not “Symmetric”
– Target cross-thread interactions at the microarchitecture level
– ~2x test generation efficiency
– Ensures both threads run for the same length (self-adjusting)
Irritator thread example
[Figure: irritator test generation flow]
– SMT_Irritator.def (test template) → Test Generation → output test case SMT_Irritator.tst
– SMT_Irritator.tst contains a long random instruction stream for t0 and a short irritator stream for t1
– t0 registers and t1 registers are separate; core level registers are common to both threads
– Real memory, with test generator managing some potential overlap
Irritator thread restrictions
• Cannot cause unexpected exceptions
• Cannot modify memory read by random thread
• Cannot modify registers shared with other threads
• Architected results may be undefined
Irritator thread example
Long Random Thread
  SEQUENCE
    REPEAT 100
      SELECT Group_All
    stw nop, A
  Generated Instr: 101
  Simulated Instr: 101
Irritator Thread
  SEQUENCE
    LB0: fdiv
    A:   b to LB0
  Generated Instr: 2
  Simulated Instr: Infinite
Kill Irritator Thread: stw nop, A (the store replaces the branch at A with a nop)
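The irritator template on this slide can be mimicked with a toy two-thread simulation: the irritator loops over two generated instructions until the random thread's final store overwrites the branch at label A with a nop. This is a sketch under simplifying assumptions (one instruction per thread per cycle, no real instruction semantics); the function `run_irritator_test` and the instruction mnemonics chosen for the random stream are hypothetical.

```python
import random


def run_irritator_test(random_length=100, seed=0):
    """Simulate a long random thread and a short looping irritator thread."""
    rng = random.Random(seed)
    # Irritator code in shared memory: index 0 is LB0, index 1 is label A.
    irritator_code = ["fdiv", "b LB0"]
    generated_irritator = len(irritator_code)

    # Random thread: N random instructions, then the store that kills the loop.
    main_stream = [rng.choice(["add", "lwz", "fadd", "cmp"])
                   for _ in range(random_length)]
    main_stream.append("stw nop, A")

    main_pc = 0
    irritator_pc = 0
    irritator_running = True
    main_executed = 0
    irritator_executed = 0

    while main_pc < len(main_stream) or irritator_running:
        if main_pc < len(main_stream):
            if main_stream[main_pc] == "stw nop, A":
                irritator_code[1] = "nop"   # overwrite the branch at A
            main_pc += 1
            main_executed += 1
        if irritator_running:
            instr = irritator_code[irritator_pc]
            irritator_executed += 1
            if instr == "b LB0":
                irritator_pc = 0            # loop back
            else:
                irritator_pc += 1
                if irritator_pc == len(irritator_code):
                    irritator_running = False   # fell through the nop

    return {
        "main_generated": len(main_stream),
        "main_executed": main_executed,
        "irritator_generated": generated_irritator,
        "irritator_executed": irritator_executed,
    }


if __name__ == "__main__":
    print(run_irritator_test())
```

The point of the design is visible in the counts: only two irritator instructions are generated, yet the irritator stays active for as long as the random thread runs, so the two threads always overlap for the full test.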
Simulation acceleration usage on POWER6
Extensively used on POWER6
– Run lab exercisers prior to tape-out
• Found additional bugs missed by software simulation
• Debug new exerciser functionality prior to lab
• Error injection and recovery testing
• Reproducibility of lab bugs in “simulation-like” environment for rapid debug of root cause
• Rapid testing of bug fixes and collateral damage testing
– Linux boot prior to tape-out
– Not employed on POWER5 for “mainline” functional
verification
(Semi) Formal methods
Formal methods are a vital complement to simulation flow
– Lab bring-up bug re-creation
• Often faster reproduction than simulation based approaches
• Aids in root cause analysis
• High-coverage / proof of side-effect-free fixes
Error detection and soft error recovery
Biggest challenge on POWER6
– Why so hard?
• Myriad injection points coupled with large SMT state space
– Often needed multiple “rare” combinations of “asymmetric” events on both threads while a specific error was injected
• End-to-end recovery testing difficult at unit level
– Really a “core” effort
– Verification strategy:
• Error injection and recovery on hardware accelerated simulation platform
• Dynamic on-the-fly error injection combined with “irritator threads” needed to cover large SMT recovery state space
Summary
1. SMT verification has four key pieces
– Traditional SMP-like effort
– Thread starvation and priority
– Starting and stopping threads
– Asymmetric “irritator thread” approach to verify often unforeseen cross-thread interactions at
the microarchitecture level
2. “From-scratch in-order” SMT design was more difficult to verify than the
“out-of-order retrofitted” SMT design
– Complex events only occurred due to cross thread interaction
– Even though team had experience
– Required more “weapons” in the arsenal
3. High frequency design drove distributed complexity
– Makes verification job harder
– Increased dependency on formal verification for difficult bugs
4. “Mainframe”-like RAS on POWER6 drove a huge amount of work that was
difficult to attack at the unit level
Future directions
Predictions
– RAS features will be an increasingly important feature of server
systems
• POWER6 design has set the “bar” to a new high standard to which future
processors will have to measure up
- Power Systems Revenue up 29% in 2Q08 (from 2Q07)
• Verification methods employed on POWER6 to attack nearly infinite state
space created by the combination of SMT and processor recovery features will
become standard practice
– A migration of “pre-silicon” verification techniques into “post-silicon”
hardware lab verification effort
• Hardware is the fastest “simulator” available and the state space is getting
bigger with SMT

More Related Content

What's hot

Kernel Features for Reducing Power Consumption on Embedded Devices
Kernel Features for Reducing Power Consumption on Embedded DevicesKernel Features for Reducing Power Consumption on Embedded Devices
Kernel Features for Reducing Power Consumption on Embedded Devices
Ryo Jin
 
Ciscointro
CiscointroCiscointro
Ciscointro
97148881557
 
Symm configuration management
Symm configuration managementSymm configuration management
Symm configuration management
.GastĂłn. .Bx.
 
Universal FlashStorage Assocation Slide Deck 06 jul2012
Universal FlashStorage Assocation Slide Deck 06 jul2012Universal FlashStorage Assocation Slide Deck 06 jul2012
Universal FlashStorage Assocation Slide Deck 06 jul2012
UniversalFlash
 

What's hot (15)

Arm Processor
Arm ProcessorArm Processor
Arm Processor
 
A z/OS System Programmer’s Guide to Migrating to a New IBM System z9 EC or z9...
A z/OS System Programmer’s Guide to Migrating to a New IBM System z9 EC or z9...A z/OS System Programmer’s Guide to Migrating to a New IBM System z9 EC or z9...
A z/OS System Programmer’s Guide to Migrating to a New IBM System z9 EC or z9...
 
Kernel Features for Reducing Power Consumption on Embedded Devices
Kernel Features for Reducing Power Consumption on Embedded DevicesKernel Features for Reducing Power Consumption on Embedded Devices
Kernel Features for Reducing Power Consumption on Embedded Devices
 
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...
 
Ciscointro
CiscointroCiscointro
Ciscointro
 
Presentation vmax hardware deep dive
Presentation   vmax hardware deep divePresentation   vmax hardware deep dive
Presentation vmax hardware deep dive
 
Presentation power vm common 2012
Presentation   power vm common 2012Presentation   power vm common 2012
Presentation power vm common 2012
 
Symm configuration management
Symm configuration managementSymm configuration management
Symm configuration management
 
Universal Flash Storage
Universal Flash StorageUniversal Flash Storage
Universal Flash Storage
 
Int 1010 Tcp Offload
Int 1010 Tcp OffloadInt 1010 Tcp Offload
Int 1010 Tcp Offload
 
Blackfin system services
Blackfin system servicesBlackfin system services
Blackfin system services
 
Open Hardware and Future Computing
Open Hardware and Future ComputingOpen Hardware and Future Computing
Open Hardware and Future Computing
 
EMCSymmetrix vmax-10
EMCSymmetrix vmax-10EMCSymmetrix vmax-10
EMCSymmetrix vmax-10
 
Universal FlashStorage Assocation Slide Deck 06 jul2012
Universal FlashStorage Assocation Slide Deck 06 jul2012Universal FlashStorage Assocation Slide Deck 06 jul2012
Universal FlashStorage Assocation Slide Deck 06 jul2012
 
Chapter5
Chapter5Chapter5
Chapter5
 

Viewers also liked (7)

Armageddon gen quiz 2014 by J Ramanand at Mind Palace-Prelims with answers
Armageddon gen quiz 2014 by J Ramanand at Mind Palace-Prelims with answersArmageddon gen quiz 2014 by J Ramanand at Mind Palace-Prelims with answers
Armageddon gen quiz 2014 by J Ramanand at Mind Palace-Prelims with answers
 
Mind Palace Biz-Tech Quiz Finals
Mind Palace Biz-Tech Quiz FinalsMind Palace Biz-Tech Quiz Finals
Mind Palace Biz-Tech Quiz Finals
 
XIMB - Inter Section Quiz 2017 - Finals
XIMB - Inter Section Quiz 2017 - FinalsXIMB - Inter Section Quiz 2017 - Finals
XIMB - Inter Section Quiz 2017 - Finals
 
Cinema Quiz 2017 prelims
Cinema Quiz 2017 prelims Cinema Quiz 2017 prelims
Cinema Quiz 2017 prelims
 
Bollywood Quiz 2017 | Finals
Bollywood Quiz 2017 | FinalsBollywood Quiz 2017 | Finals
Bollywood Quiz 2017 | Finals
 
Cinema quiz 2017 finals
Cinema quiz 2017 finalsCinema quiz 2017 finals
Cinema quiz 2017 finals
 
Chaos 2017 India Quiz Finals
Chaos 2017 India Quiz FinalsChaos 2017 India Quiz Finals
Chaos 2017 India Quiz Finals
 

Similar to SMT Verification of the POWER5 and POWER6 High-Performance Processors

The Cortex-A15 Verification Story
The Cortex-A15 Verification StoryThe Cortex-A15 Verification Story
The Cortex-A15 Verification Story
DVClub
 
emips_overview_apr08
emips_overview_apr08emips_overview_apr08
emips_overview_apr08
Neil Pittman
 
Intel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceIntel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification Experience
DVClub
 

Similar to SMT Verification of the POWER5 and POWER6 High-Performance Processors (20)

Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
Enterprise power systems transition to power7 technology
Enterprise power systems transition to power7 technologyEnterprise power systems transition to power7 technology
Enterprise power systems transition to power7 technology
 
OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...
OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...
OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...
 
The Cortex-A15 Verification Story
The Cortex-A15 Verification StoryThe Cortex-A15 Verification Story
The Cortex-A15 Verification Story
 
Smart logic
Smart logicSmart logic
Smart logic
 
VisĂŁo geral do hardware do servidor System z e Linux on z - Concurso Mainframe
VisĂŁo geral do hardware do servidor System z e Linux on z - Concurso MainframeVisĂŁo geral do hardware do servidor System z e Linux on z - Concurso Mainframe
VisĂŁo geral do hardware do servidor System z e Linux on z - Concurso Mainframe
 
IBM POWER8 as an HPC platform
IBM POWER8 as an HPC platformIBM POWER8 as an HPC platform
IBM POWER8 as an HPC platform
 
FPGA MeetUp
FPGA MeetUpFPGA MeetUp
FPGA MeetUp
 
Getting to Know the R8C/2A, 2B Group MCUs
Getting to Know the R8C/2A, 2B Group MCUs Getting to Know the R8C/2A, 2B Group MCUs
Getting to Know the R8C/2A, 2B Group MCUs
 
Ibm cell
Ibm cell Ibm cell
Ibm cell
 
17 october embedded seminar
17 october embedded seminar17 october embedded seminar
17 october embedded seminar
 
IyCnet_Soluciones_Rockwell_CompactLogix_para_Maquinaria-min.pptx
IyCnet_Soluciones_Rockwell_CompactLogix_para_Maquinaria-min.pptxIyCnet_Soluciones_Rockwell_CompactLogix_para_Maquinaria-min.pptx
IyCnet_Soluciones_Rockwell_CompactLogix_para_Maquinaria-min.pptx
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...
RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...
RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...
 
emips_overview_apr08
emips_overview_apr08emips_overview_apr08
emips_overview_apr08
 
MC9S08MP16: 8-bit MCU For BLDC Motor Control
MC9S08MP16: 8-bit MCU For BLDC Motor ControlMC9S08MP16: 8-bit MCU For BLDC Motor Control
MC9S08MP16: 8-bit MCU For BLDC Motor Control
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
 
IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014
 
Processors selection
Processors selectionProcessors selection
Processors selection
 
Intel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceIntel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification Experience
 

SMT Verification of the POWER5 and POWER6 High-Performance Processors

  • 1. SMT Verification of the POWER5 and POWER6 High-Performance Processors
      John Ludden, Senior Technical Staff Member, Hardware Verification, IBM Systems & Technology Group
      IBM Power Systems, © 2008 IBM Corporation
  • 2. Introduction to Simultaneous Multi-Threading (SMT)
      1. What is a multi-threaded processor?
         • Essentially a processor core that executes multiple instruction streams simultaneously
         • Each thread appears to software as a “virtual” processor core
      2. What are the advantages of SMT?
         • More efficient utilization of silicon real estate and power: small die size increase compared to adding another core
         • Increased system throughput by utilizing processor resources that would otherwise be idle
      3. What are the disadvantages of SMT?
         • Increased complexity -> makes verification state space MUCH larger
         • SMT verification much harder than SMP
         • Possibly degrades performance of some applications
  • 3. Examples of SMT microprocessors
      1. Video Game Systems
         • Sony Playstation 3: IBM CELL processor
         • Xbox 360: IBM Xenon processor
      2. Personal Computers
         • Intel Pentium 4 Hyper-Threading (HT) processors
      3. Servers
         • SUN UltraSparc Systems: T1 (4 threads) and T2 (8 threads)
         • HP Superdome Systems: Intel Itanium 2
         • IBM Power Systems: POWER5 and POWER6 processors
  • 4. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 5. IBM POWER systems: consistent, predictable delivery
      Timeline figure: POWER4 (2001), POWER4+ (2003), POWER5 (2004), POWER5+ (2006), POWER6 (2007)
  • 6. POWER5 and POWER6 chip block diagrams (figure)
      POWER5 Chip
         • Two High Freq POWER5 SMT2 Cores
         • ~2 MB L2
         • 36 MB L3 Controller, 36 MB L3 Chip
         • SMP Interconnect Fabric
         • Memory Controller, Buffer Chips
      POWER6 Chip
         • Two Ultra Freq POWER6 SMT2 Cores
         • 4 MB L2 per core
         • 32 MB L3 Controller, 32 MB L3 Chip(s)
         • SMP Interconnect Fabric
         • Two Memory Controllers, each with Buffer Chips
  • 7. POWER5 Pipeline (figure)
      Front end: Instruction Fetch (IF, IC, BP); Group Formation and Instruction Decode (D0, D1, D2, D3, Xfer, GD); Branch Redirects; Interrupts & Flushes
      Out-of-Order Processing pipes:
         • BR: MP, ISS, RF, EX, WB, Xfer
         • LD/ST: MP, ISS, RF, EA, DC, Fmt, WB, Xfer
         • FX: MP, ISS, RF, EX, WB, Xfer
         • FP: MP, ISS, RF, F6 (six stages), WB, Xfer
         • CP (completion)
      Legend: Shared by two threads; Resource used by thread 0; Resource used by thread 1
      Blocks shown in the SMT resource diagram:
         • Program Counter, Branch Prediction (Branch History Tables, Return Stack, Alternate Target Cache), Instruction Cache, Instruction Translation
         • Instruction Buffer 0, Instruction Buffer 1, Thread Priority
         • Group Formation, Instruction Decode, Dispatch
         • Shared Register Mappers, Dynamic Instruction Selection, Shared Issue Queues
         • Read Shared Register Files, Shared Execution Units (LSU0, FXU0, LSU1, FXU1, FPU0, FPU1, BXU, CRL), Write Shared Register Files
         • Group Completion, Store Queue, Data Cache, Data Translation, L2 Cache
  • 8. High-end server: New POWER6 microprocessor
      Topology
         – Two cores on chip, a 2-way SMP
         – Core private L1s (64 KB I, 64 KB D)
         – Superscalar, SMT cores
         – Chip private 8 MB L2 cache
         – L3 32 MB off chip
         – Two-tier SMP fabric
      Technology
         – 65 nm SOI
         – 341 mm² die size
         – 10 layers of metal
         – 790 million transistors on chip
         – Frequency: 3.5, 4.2, 4.7, 5.0 GHz
      Custom & semi-custom design style
         – High frequency constraints
      3.3 M lines of VHDL
  • 9. POWER6 core pipeline (figure)
      Pipelines: Instruction fetch pipeline; Instruction dispatch pipeline; BR/FX/Load pipeline (BR/CR, FX, LOAD); Floating Point Pipeline; Check Point Recovery Pipeline
      Legend: Pre-decode stage; Ifetch/Branch stage; Delayed/Transmit stage; Instruction Decode stage; Instruction Dispatch/Issue stage; Operand access/execution stage; Cache access stage; Write back stage; Completion stage; Check Point stage; FX result bypass; Load result bypass; Float result bypass
      Stage labels: P1, P2, P3, P4, IFAR, BHT, IC0, IC1, ROT, IB0, IB1, PD, DISP, ISS, RF, AG, DC0, DC1, FMT, EX, EX1 to EX7, ECC
  • 10. POWER6 core
      POWER6 processor is ~2X frequency of POWER5 (4 – 5 GHz)
      POWER6 instruction pipeline depth equivalent to POWER5
         – Minimize power
         – Scale performance with frequency
      Pipeline figure: Instruction Fetch; Instruction Buffer/Decode; Instruction Dispatch/Issue; Data Fetch/Execute; FXU dependent execution; Load dependent execution (annotations: ~6 ns/instr, ~3 ns/instr)
      POWER6 extends functionality of POWER5 core
         – 64K I cache, 64K D cache, 2 FXU, 2 Binary FPU, 1 branch execution unit
         – Two-way SMT with 7-instruction dispatch from 2 threads (maximum of 5 instructions per thread)
         – Decimal Floating Point Unit
         – VMX Unit (PowerPC’s SIMD ISA)
         – Recovery Unit
  • 11. Bullet-proof computing
      System reliability with recovery unit
         – Every measure possible taken to preserve application execution
         – Retry soft errors
         – Change hardware for hard errors
      Flow chart:
         • Processor architected state checkpointed every cycle
         • ECC & non-ECC protected circuitry checked every cycle
         • No error found: execution continues
         • Error found (soft error case): processor restarts from last saved checkpoint
         • Error found again (hard error case): processor workload moved to another CPU
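
The checkpoint-and-retry flow on this slide can be sketched as a small decision loop. This is only an illustration: the function `run_with_recovery`, the callback `error_in_cycle`, and the retry threshold `MAX_RETRIES` that separates a soft error from a hard error are all assumptions made for the sketch, not details taken from the presentation.

```python
from typing import Callable, List

MAX_RETRIES = 3  # assumed threshold: an error that persists this long is treated as hard


def run_with_recovery(error_in_cycle: Callable[[int, int], bool], cycles: int) -> List[str]:
    """Walk through `cycles` cycles and log the recovery action taken in each.

    error_in_cycle(cycle, attempt) is a hypothetical stand-in for the hardware
    checkers; it returns True when an error is detected on that attempt.
    """
    log: List[str] = []
    for cycle in range(cycles):
        attempt = 0
        while error_in_cycle(cycle, attempt):
            attempt += 1
            if attempt >= MAX_RETRIES:
                # Hard error case: retrying does not help, so move the workload.
                log.append(f"cycle {cycle}: hard error, move workload to another CPU")
                return log
            # Soft error case: restart from the last saved checkpoint and retry.
            log.append(f"cycle {cycle}: soft error, restart from checkpoint")
        log.append(f"cycle {cycle}: checkpoint saved")
    return log


if __name__ == "__main__":
    # A transient fault in cycle 1 that disappears on the first retry.
    for line in run_with_recovery(lambda c, a: c == 1 and a == 0, 3):
        print(line)
```

A fault that clears on retry leaves the run intact, while a fault that persists across all retries ends the run with the workload handed to another CPU.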
  • 12. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 13. POWER4/5/6 RTL verification technology (figure)
      Design flow: RTL (VHDL, Verilog); Language Compile; Model Build; Cycle-based Model; Physical VLSI Design Tools / Custom Design
      Verification engines:
         • Software Simulator (MESA)
         • Hardware Accelerator (Awan)
         • Formal Verification: Boolean Equivalence Check (Verity)
         • (Semi) Formal Verification (SixthSense, RuleBase)
      Stimulus and checking:
         • Test Program Generator (GPRO, X-Gen)
         • C++ Testbench: Constraint Random Unit Testbench, Driver/Checker
         • Assertions (PSL et al.)
  • 14. Single threaded uniprocessor verification for POWER4
      Unit level: methodology inherited from POWER4
         – Driven by a combination of instruction level test cases (AVPs) created by the Genesys-Pro (GPRO) pseudo-random test generator and random C++ driven irritation
         – Instruction-By-Instruction (IBI) checking against AVP results
         – Low level microarchitecture checkers written in C++
      Processor core (aka “core”) level
         – Mixture of GPRO pseudo-random and directed random instruction level test cases
         – IBI checking against AVP results
         – Low level microarchitecture checkers written in C++
         – Irritation from random C++ drivers
         – Highly deterministic and architected state easily verifiable against test
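
Instruction-By-Instruction (IBI) checking, as described above, compares the architected state after each completed instruction with the result predicted by the test case. The sketch below shows the idea in a few lines. The data layout (one dictionary of register values per instruction) and the name `ibi_check` are assumptions chosen for illustration and do not describe the actual IBM tooling.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, int]  # assumed representation: register name -> value


def ibi_check(expected: List[State], observed: List[State]) -> Optional[Tuple[int, str]]:
    """Return (instruction index, register) of the first mismatch, or None.

    `expected` holds the per-instruction results predicted by the test case;
    `observed` holds what the simulated design produced.
    """
    for index, (exp, obs) in enumerate(zip(expected, observed)):
        for reg, value in exp.items():
            if obs.get(reg) != value:
                return index, reg
    if len(expected) != len(observed):
        # One side completed more instructions than the other.
        return min(len(expected), len(observed)), "<instruction count>"
    return None


if __name__ == "__main__":
    predicted = [{"r1": 1}, {"r1": 1, "r2": 5}]
    simulated = [{"r1": 1}, {"r1": 1, "r2": 6}]
    print(ibi_check(predicted, simulated))
```

A check of this kind relies on every instruction having a single predictable result, which is why it fits the deterministic single-thread case.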
  • 15. Symmetric multi-processor (SMP) verification for POWER4
      Chip (dual-core) level
         – Test generation similar to uniprocessor via GPRO for false-sharing or non-sharing tests
            • IBI checking against AVP results for two independent instruction streams contained within a single test
            • Low level microarchitecture checkers written in C++
            • L1/L2 interactions primary focus
         – True-sharing scenarios, lock testing and storage access (“weak”) ordering checked
            • GPRO employed but…
               – IBI checking of these accesses is limited or not possible:
                  › Non-unique or non-deterministic results
                  › CML (architecture level coherency monitor) employed to detect the “right answer” as a post-simulation rule check
  • 16. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 17. POWER5 SMT verification methodology
      Evolutionary, based on single thread uniprocessor and SMP approaches
         – Traditional SMP scenarios now self-contained in a single core simulation model
            • Downward migration of dual-core methodology to single core model
      New SMT verification scenario categories
         – Shared resource and priority conflicts:
            • SMT resource types:
               – Equally shared between threads: Queue full conditions easier to hit
               – Dynamically shared / tagged: Either thread can consume most/all of the resource
               – Replicated: Not shared…same as single thread
         – Dynamic thread mode switching: SMT->ST; ST->SMT
            • Some applications attain better performance in ST mode
            • Shared resources re-allocated on each mode switch
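
The three SMT resource types listed above differ in how many entries a single thread can hold, which is what changes on a mode switch. The following is a minimal sketch of that distinction. The enum `Sharing`, the function `max_entries_per_thread`, and the queue sizes in the usage example are hypothetical and are not POWER5 figures.

```python
from enum import Enum


class Sharing(Enum):
    EQUAL = "equally shared"        # split in half between threads in SMT mode
    DYNAMIC = "dynamically shared"  # either thread may take most or all entries
    REPLICATED = "replicated"       # each thread has its own private copy


def max_entries_per_thread(total: int, sharing: Sharing, smt_mode: bool) -> int:
    """Upper bound on the entries one thread can hold.

    For REPLICATED resources, `total` is the size of one thread's private copy.
    """
    if sharing is Sharing.EQUAL and smt_mode:
        return total // 2
    return total


if __name__ == "__main__":
    # Hypothetical 8-entry queue: a thread sees 4 entries in SMT mode, 8 in ST mode.
    print(max_entries_per_thread(8, Sharing.EQUAL, smt_mode=True))
    print(max_entries_per_thread(8, Sharing.EQUAL, smt_mode=False))
```

Under this model an equally shared queue fills at half the depth in SMT mode, which matches the note that queue-full conditions are easier to hit, while a dynamically shared resource lets one thread starve the other.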
  • 18. Traditional SMP approach applied to SMT verification (figure)
      Flow: SMP.def (test template) -> Test Generation -> output test case SMT.tst
      Contents of SMT.tst:
         • Random t0 and Random t1 instruction streams
         • t0 Registers and t1 Registers
         • Core Level Registers common to both threads
         • Real memory common to both threads, with the test generator managing some potential overlap
  • 19. Shared resource and priority conflicts
      Approach was similar to SMP verification
         – Testing largely consisted of “symmetric” instruction streams on each thread
            • A particular resource targeted (e.g., GPR rename registers)
               – 100 load instructions on each thread
         – Coverage and lab feedback validated this approach
            • Good enough: “Got the job done”
  • 20. POWER5 dynamic thread mode switching (flow chart)
      Actors: Thread 0, Thread 1, Sim Driver
      Initial State: all architected states initialized; both threads enabled
      Run State:
         • Thread 0 runs random instructions, saves its architected state, then terminates (kills) itself
         • Shared resources reallocated
         • Thread 1 continues running random instructions
         • Interrupt from the other thread or the Sim Driver wakes up thread 0 (restart thread 0)
         • Resources partitioned; thread 0 restores its architected state and resumes random instructions
      Final State: normal finish on both threads
  • 21. POWER5 shared resource re-allocation on mode switch (four bar charts)
         • GPR and FPR rename registers per thread: SMT mode vs. max in ST mode (axis 0 to 200)
         • Load Miss Queue entries per thread: split in half in SMT mode vs. ST mode (axis 0 to 10)
         • Branch Queue (BIQ) entries per thread: split in half in SMT mode vs. ST mode (axis 0 to 20)
         • LRQ/SRQ entries per thread: dynamically shared, max in SMT mode vs. max in ST mode (axis 0 to 40)
  • 22. Overview
      1. Context: POWER5 vs. POWER6 microarchitecture comparison
      2. Verification methodology: In the beginning…
      3. The times they are a changing: SMT arrives in POWER5
      4. POWER6: An in-order design should be simpler, but…
      5. Future directions?
  • 23. POWER5: centralized complexity
      POWER5
         – Out-of-order design: Even in single thread mode, complex events naturally occur simultaneously
         – Started from POWER4+: Known working design that was modified incrementally
         – 23 FO4 design: Isolated complexity in Instruction Sequencing Unit (ISU):
            • Every unit communicated back to ISU
            • ISU resolved all exceptions and out-of-order conflicts
         – ST and SMT modes both supported:
            • Alternating dispatch cycles per thread
            • Resources re-allocated on mode switch
      Block diagram: FXU, FPU, LSU, and IFU arranged around the ISU
  • 24. POWER6: distributed complexity
      POWER6
         – From-scratch, mostly in-order design
            • Normally, design is well behaved
            • Cross-thread interaction necessary for “tough bugs”
         – 13 FO4 design: Distributed complexity needed to achieve high performance goals
         – Recovery unit (RU):
            • Must resolve out-of-order FP with in-order pipelines
            • Checkpoints machine state
            • Recovers processor from soft errors
         – Design is inherently in SMT mode all the time (almost)
            • Dispatch to both threads in same cycle
            • Most resources dynamically shared / tagged
            • No resource reallocation on mode switch
      Block diagram: IFU, IDU, FPU, LSU, RU, FXU
  • 25. POWER6 verification process
      The different verification engines have different strengths related to the verification tasks
      Software simulation
         – Slow, but low penalty for highly intrusive checking of model internals. Total model visibility.
         – Hundreds of AIX workstations running 24x7x365
         – New enhancements helped keep pace with design complexity
         – 2x number of simulation cycles of POWER5 design
      Hardware-accelerated simulation
         – 10x to 1,000x faster than SW sim, but needs less intrusive driving/checking to not slow down the hardware box
         – New usage: Mainline function verification
         – Yields additional 3x simulation cycle advantage over POWER5 (5x cycle advantage overall)
      (Semi)-formal verification
         – (High to) exhaustive coverage, but higher skill needed to drive. Scaling problems w/ model size.
         – Extensively used: Proved extremely valuable for complex SMT bugs
      Hardware bring-up
         – Ideal speed, very limited visibility/controllability
  • 26. Software simulation enhancements
      Random command driven unit simulation for most core units
         – Yielded >1 million lines of C++ code
         – More control over generation for low level events
         – More efficient test generation
      Irritator threads at “core model” level
         – “Symmetric” instruction stream approach employed on POWER5 proved inadequate
         – Target cross-thread interactions at the microarchitecture level
         – ~2x test generation efficiency
         – Ensures both threads run the same length (self adjusting)
      “S” in SMT is for “Simultaneous”, not “Symmetric”
  • 27. Irritator thread example (figure)
      Flow: SMT_Irritator.def (test template) -> Test Generation -> output test case SMT_Irritator.tst
      Contents of SMT_Irritator.tst:
         • Long Random t0 and Short Irritator t1 instruction streams
         • t0 Registers and t1 Registers
         • Core Level Registers common to both threads
         • Real memory, with the test generator managing some potential overlap
      Irritator thread restrictions
         • Cannot cause unexpected exceptions
         • Cannot modify memory read by random thread
         • Cannot modify registers shared with other threads
         • Architected results may be undefined
  • 28. Irritator thread example
      Long Random Thread (Generated Instr: 101; Simulated Instr: 101)
         SEQUENCE
            REPEAT 100
               SELECT Group_All
            stw nop, A      (Kill Irritator Thread)
      Irritator Thread (Generated Instr: 2; Simulated Instr: Infinite)
         SEQUENCE
            LB0: fdiv
            A:   b to LB0
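
One reading of this example is that the long thread's final `stw` stores a `nop` over the branch at label A, so the irritator's two-instruction loop runs for exactly as long as the long thread needs and then falls through. The toy model below illustrates that reading. The function `simulate`, the one-instruction-per-step timing, and the string encoding of instructions are simplifying assumptions for the sketch, not the behavior of the real test generator or simulator.

```python
from typing import List, Tuple


def simulate(main_length: int, irritator_body: List[str]) -> Tuple[int, int]:
    """Return (main instructions simulated, irritator instructions simulated).

    Simplifying assumption: each thread completes one instruction per step.
    The irritator body is expected to end with the loop branch "b LB0".
    """
    if main_length < 1:
        raise ValueError("main thread needs at least one instruction to end the loop")
    program = list(irritator_body)  # irritator code, modifiable by the main thread
    main_done = 0
    irritator_done = 0
    pc = 0
    while pc < len(program):
        # Irritator thread executes one instruction.
        instr = program[pc]
        irritator_done += 1
        pc = 0 if instr == "b LB0" else pc + 1
        # Main thread executes one instruction; its last one kills the irritator.
        if main_done < main_length:
            main_done += 1
            if main_done == main_length:
                program[-1] = "nop"  # models the final store that overwrites the branch
    return main_done, irritator_done


if __name__ == "__main__":
    main_count, irritator_count = simulate(101, ["fdiv", "b LB0"])
    print(main_count, irritator_count)  # prints "101 102"
```

Only two irritator instructions are generated, yet the number simulated tracks the length of the long thread, which is the self-adjusting property described for irritator threads.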
  • 29. Simulation acceleration usage on POWER6
      Extensively used on POWER6
         – Run lab exercisers prior to tape-out
            • Found additional bugs missed by software simulation
            • Debug new exerciser functionality prior to lab
            • Error injection and recovery testing
            • Reproducibility of lab bugs in “simulation-like” environment for rapid debug of root cause
            • Rapid testing of bug fixes and collateral damage testing
         – Linux boot prior to tape-out
         – Not employed on POWER5 for “mainline” functional verification
  • 30. (Semi) Formal methods
      Formal methods are a vital complement to simulation flow
         – Lab bring-up bug re-creation
            • Often faster reproduction than simulation based approaches
            • Aids in root cause analysis
            • High-coverage / proof of side-effect-free fixes
  • 31. Error detection and soft error recovery
      Biggest challenge on POWER6
         – Why so hard?
            • Myriads of injection points coupled with large SMT state space
               – Often needed multiple “rare” combinations of “asymmetric” events on both threads while specific error was injected
            • End-to-end recovery testing difficult at unit level
               – Really a “core” effort
         – Verification strategy:
            • Error injection and recovery on hardware accelerated simulation platform
            • Dynamic on-the-fly error injection combined with “irritator threads” needed to cover large SMT recovery state space
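
Dynamic error injection over a large state space is commonly driven by a seeded random schedule, so that a failing run can be replayed exactly. The sketch below shows one way such a schedule could be drawn. It is an assumption about how this might be organized, not a description of the POWER6 environment; the function `injection_plan` and the injection point names in the usage example are hypothetical.

```python
import random
from typing import List, Tuple


def injection_plan(points: List[str], run_cycles: int, count: int, seed: int) -> List[Tuple[int, str]]:
    """Pick `count` (cycle, injection point) pairs, ordered by cycle.

    A fixed seed reproduces the same plan, which allows a failure to be replayed.
    """
    rng = random.Random(seed)
    plan = [(rng.randrange(run_cycles), rng.choice(points)) for _ in range(count)]
    return sorted(plan)


if __name__ == "__main__":
    # Hypothetical injection point names, for illustration only.
    points = ["gpr_parity", "icache_ecc", "fpu_residue"]
    for cycle, point in injection_plan(points, run_cycles=1000, count=5, seed=7):
        print(cycle, point)
```

Pairing a plan like this with an irritator thread on the second hardware thread is one way to reach the rare asymmetric combinations the slide describes.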
  • 32. Summary. 1. SMT verification has four key pieces – Traditional SMP-like effort – Thread starvation and priority – Starting and stopping threads – Asymmetric “irritator thread” approach to verify often unforeseen cross-thread interactions at the microarchitecture level 2. “From-scratch in-order” SMT design was more difficult to verify than the “out-of-order retrofitted” SMT design – Complex events only occurred due to cross-thread interaction – Even though team had experience – Required more “weapons” in the arsenal 3. High-frequency design drove distributed complexity – Makes verification job harder – Increased dependency on formal verification for difficult bugs 4. “Mainframe”-like RAS on POWER6 drove a huge amount of work that was difficult to attack at the unit level
  • 33. Overview. 1. Context: POWER5 vs. POWER6 microarchitecture comparison 2. Verification methodology: In the beginning… 3. The times they are a changing: SMT arrives in POWER5 4. POWER6: An in-order design should be simpler, but… 5. Future directions?
  • 34. Future directions. Predictions – RAS will be an increasingly important feature of server systems • POWER6 design has set the “bar” to a new high standard to which future processors will have to measure up - Power Systems revenue up 29% in 2Q08 (from 2Q07) • Verification methods employed on POWER6 to attack the nearly infinite state space created by the combination of SMT and processor recovery features will become standard practice – A migration of “pre-silicon” verification techniques into the “post-silicon” hardware lab verification effort • Hardware is the fastest “simulator” available and the state space is getting bigger with SMT