POWER7 will deliver new features and functions to the Power family of processors.
The enhancements include:
Additional core density
On chip cache using the energy saving technology developed in IBM Research
Energy efficient core
New on-chip memory controller technology providing support for DDR3 memory. The memory subsystem will deliver more than 3X the memory bandwidth of the POWER6 chip
Support for both single- and double-precision SIMD processing
Support for additional Storage Protection Keys
Let’s take a closer look now at the POWER7 Chip.
POWER7 is fabricated in IBM's 45nm silicon-on-insulator technology using copper interconnect and embedded DRAM (eDRAM) for the L3 cache.
The chip measures 567 square millimeters and contains 1.2B transistors.
However, if you consider that each eDRAM cell has the function of a 6T SRAM cell, the chip actually has the equivalent function of a 2.7B-transistor chip.
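As a rough check on that equivalence, the arithmetic can be sketched as follows. This assumes an eDRAM cell uses a single transistor (plus a capacitor) where an SRAM cell would use six; the talk gives only the two transistor totals, so the cell count below is implied by those numbers, not a quoted figure.

```python
# Figures quoted in the talk.
ACTUAL_TRANSISTORS = 1_200_000_000
EQUIVALENT_TRANSISTORS = 2_700_000_000

# Assumption (not stated in the talk): 1 transistor per eDRAM cell,
# versus 6 transistors for the SRAM cell it stands in for.
EDRAM_T_PER_CELL = 1
SRAM_T_PER_CELL = 6


def implied_edram_cells(actual, equivalent):
    """Number of eDRAM cells that would explain the gap between the totals."""
    saved_per_cell = SRAM_T_PER_CELL - EDRAM_T_PER_CELL
    return (equivalent - actual) / saved_per_cell


cells = implied_edram_cells(ACTUAL_TRANSISTORS, EQUIVALENT_TRANSISTORS)
print(f"Implied eDRAM cells: {cells / 1e6:.0f} million")
print(f"Implied raw capacity: {cells / 8 / 1e6:.1f} MB")
```

Under that assumption the gap works out to about 300 million cells, or roughly 37.5 MB of raw bits. How those bits divide between data and overhead is not covered here.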
As you can see, the chip has 8 processor cores, each with 12 execution units and capable of running 4-way SMT. I’ll share some core details in a few slides.
To feed the processor cores, we have two memory controllers, one on each side of the chip. Each memory controller supports 4 channels of DDR3 memory. Combined, these 8 channels provide 100 GB/s of sustained memory bandwidth.
On the top and bottom of the chip are our seven 8-byte multi-processor links, providing 360 GB/s of bandwidth to make balanced SMP systems scalable to 32 sockets.
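To put the memory numbers in per-channel and per-core terms, here is a small sketch using only the figures quoted above (two controllers, four DDR3 channels each, 100 GB/s sustained, 8 cores). It assumes the bandwidth is shared evenly, which is a simplification for illustration.

```python
# Figures quoted in the talk.
MEMORY_CONTROLLERS = 2
CHANNELS_PER_CONTROLLER = 4
SUSTAINED_GB_PER_S = 100.0
CORES = 8

channels = MEMORY_CONTROLLERS * CHANNELS_PER_CONTROLLER

# Assumption: bandwidth divides evenly across channels and across cores.
per_channel_gb_per_s = SUSTAINED_GB_PER_S / channels
per_core_gb_per_s = SUSTAINED_GB_PER_S / CORES

print(f"{channels} channels, {per_channel_gb_per_s} GB/s sustained per channel")
print(f"{per_core_gb_per_s} GB/s sustained per core")
```

With 8 channels and 8 cores, both figures come to 12.5 GB/s.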
Next let’s take a closer look at the POWER7 core.
The smaller POWER7 core provides additional performance over our previous-generation POWER6 core by:
Having a shorter, wider pipeline with better utilization, leveraging SMT4 and out-of-order execution.
The net is higher performance even with a smaller core in equivalent technology, thus saving power.
Taking a look at the chip floor plan, you can see:
Two fixed-point pipelines.
Two load/store unit (LSU) pipelines, which are also capable of executing simple fixed-point instructions.
FPU pipelines capable of 4 double-precision multiply-add operations per cycle, or 8 flops/cycle. This unit also handles vector instructions.
The instruction fetch unit, which also executes branch and condition register instructions.
The decimal floating-point unit.
A widened instruction sequencing unit capable of dispatching 6 instructions per cycle, including 2 branches, and issuing up to 8 instructions per cycle.
<click> In POWER7 we took advantage of out-of-order execution to switch from a dedicated recovery unit to a distributed one, using the flush and refetch capability in the out-of-order machine.
The core caches on POWER7 have been improved. The L1 instruction and data caches are 32KB each, and their access time has been reduced from 4 cycles to 2. They are backed by a 256KB L2 cache integrated with the core so that it is only 8 cycles away.
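The floating-point figure above (4 double-precision multiply-adds per cycle, counted as 8 flops) can be turned into a peak number for the whole chip. This is a minimal sketch: the core count is the one quoted earlier, but the clock frequency is not given in this talk, so `clock_ghz` is a hypothetical parameter and the 3.0 GHz used in the example is a placeholder, not a product specification.

```python
# Figures quoted in the talk.
FMA_PER_CYCLE_PER_CORE = 4   # double-precision multiply-adds per cycle
FLOPS_PER_FMA = 2            # one multiply plus one add
CORES_PER_CHIP = 8

FLOPS_PER_CYCLE_PER_CORE = FMA_PER_CYCLE_PER_CORE * FLOPS_PER_FMA
flops_per_cycle_chip = FLOPS_PER_CYCLE_PER_CORE * CORES_PER_CHIP


def peak_gflops(clock_ghz):
    """Theoretical peak double-precision GFLOPS for one chip.

    clock_ghz is an assumed input; the talk does not state a frequency.
    """
    return flops_per_cycle_chip * clock_ghz


# Placeholder clock, for illustration only.
print(f"{flops_per_cycle_chip} flops/cycle per chip")
print(f"{peak_gflops(3.0):.0f} GFLOPS peak at an assumed 3.0 GHz")
```

This is a theoretical ceiling that assumes every FPU pipeline issues a multiply-add on every cycle. Sustained application performance will be lower.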
EP – cut backs
EX – enterprise
ITF will die
Itanic
Our RAS results are better because we start with a full systems view. We have very challenging targets for each element of RAS, and we measure our systems' performance against them. As we approach a new generation of Power, we attack those elements which have had the greatest impact on reliability, availability, or serviceability. We can do this because we design the hardware, firmware and OS together.
Just as an example, look at the way we address processor execution errors. Before an instruction executes, we save status information about the processor. If the instruction fails for any reason, we reload that status and retry the instruction using Processor Instruction Retry. Most of the time, the instruction will work, because most problems are intermittent, caused by events like bombardment of the chip by alpha particles flipping a bit. This kind of event becomes more common as we make technology denser and the size of the alpha particles becomes larger relative to the distance between bits in execution registers. Some of the time, retry doesn’t work on the processor because it has a hard failure. In that case, if another processor is available, we use Alternate Processor Recovery: we load the status into the other processor and try there, to avoid any application outage. Processor Instruction Retry requires cooperation between hardware and firmware. Alternate Processor Recovery requires the additional cooperation of the OS. We develop all of them, so we include that cooperation. Itanium and x86 systems have neither.
We have similar features throughout the system. As you can see in the chart above, if Xeon does get all the RAS features Itanium has, it will be an improvement, but it will still leave Xeon-based systems well behind Power systems.
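The save, retry, and fail-over sequence just described can be illustrated with a toy software model. This is only a sketch of the control flow: the real mechanism lives in hardware, firmware and the OS, and every name here (`ToyProcessor`, `run_with_recovery`, `increment`) is hypothetical and invented for the illustration.

```python
from dataclasses import dataclass, field


class ExecutionError(Exception):
    """Raised by the toy processor when an instruction fails."""


@dataclass
class ToyProcessor:
    name: str
    transient_faults: int = 0      # how many upcoming executions fail once
    hard_failed: bool = False      # a hard failure never recovers
    state: dict = field(default_factory=dict)

    def execute(self, instruction):
        if self.hard_failed:
            raise ExecutionError(f"{self.name}: hard failure")
        if self.transient_faults > 0:
            self.transient_faults -= 1
            raise ExecutionError(f"{self.name}: transient fault")
        instruction(self.state)


def run_with_recovery(instruction, primary, spare=None, max_retries=1):
    """Return the processor that completed the instruction."""
    checkpoint = dict(primary.state)          # save status before executing
    for _ in range(1 + max_retries):
        try:
            primary.execute(instruction)
            return primary
        except ExecutionError:
            primary.state = dict(checkpoint)  # reload status, then retry
    if spare is not None:                     # retry did not help: move over
        spare.state = dict(checkpoint)
        spare.execute(instruction)
        return spare
    raise ExecutionError("no recovery possible")


def increment(state):
    state["r1"] = state.get("r1", 0) + 1


# Intermittent fault: the retry on the same processor succeeds.
flaky = ToyProcessor("cpu0", transient_faults=1, state={"r1": 41})
used = run_with_recovery(increment, flaky)
print(used.name, used.state)

# Hard failure: the saved status is loaded into a spare processor.
dead = ToyProcessor("cpu1", hard_failed=True, state={"r1": 41})
backup = ToyProcessor("cpu2")
used = run_with_recovery(increment, dead, spare=backup)
print(used.name, used.state)
```

In the first case the work completes on `cpu0` after one retry. In the second it completes on `cpu2`, and the caller never sees the failure, which is the point of doing the recovery below the application.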
Now let’s look at reliability, availability, and serviceability. A recent survey (independent, not vendor funded) of 400 IT users worldwide by ITIC showed that the combination of AIX and Power Systems provides the best result in each of these categories.
Our availability is 99.997%, which is 2½ times better than the next-best UNIX alternative and 10 times better than Windows on x86.
54% of IT execs surveyed say they need 99.99% availability or better. With these kinds of results, it is no wonder that more and more of them are choosing Power systems. Note that Solaris on SPARC has better availability than Linux on x86. If your client moves to x86, they will be taking a step backward.
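Availability percentages are easier to compare as downtime per year. Here is a minimal sketch of the conversion, using the two percentages quoted above and assuming a 365-day year of continuous operation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year, running 24x7


def downtime_minutes_per_year(availability_percent):
    """Expected minutes of downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("Quoted AIX on Power", 99.997),
                   ("Threshold 54% of execs need", 99.99)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label}: {pct}% is about {minutes:.1f} minutes per year")
```

On that basis 99.997% is a little under 16 minutes of downtime a year, while 99.99% allows just over 52 minutes.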
To really put balanced performance into perspective, we have placed four of the leading performance results on this one chart. The telling point is that POWER7 technology performs regardless of workload. This means that POWER7 systems can deliver industry-leading performance for your business whatever you run. This means that you no longer have to buy specialized systems for different workloads. This means that you can feel safe in consolidating multiple workloads onto the p770, knowing that you will get the best possible performance for each of your applications.
Virtualization without Limits increases flexibility and reduces costs:
Expanded system capability teamed with PowerVM’s performance, scalability and flexibility
Workload-optimizing systems improve service levels with assured performance:
PowerVM, Intelligent Threads and TurboCore mode enable you to optimize the performance of your workloads in a virtualized environment
Consolidation that delivers exponential ROI:
Industry’s leading performance, scalability and virtualization now unbounded with DB2 pureScale
Dynamic Energy Optimization that balances performance and efficiency:
>3X increased performance per watt, new EnergyScale features integrated with Active Energy Manager
Resiliency without Downtime:
Non-disruptive application upgrades from POWER6 and improvements on the road to Continuous Availability
Good morning. This morning I am going to take you through a presentation that covers the POWER7 Express rack and tower rollout, with a focus on the products being announced in February. I will talk about how they're positioned and how they compare to legacy Power products as well as competitive servers.
There will be a session with Patrick O’Rourke tomorrow morning that will go into technical detail on all POWER7 offerings and features.
Ian…FAST flash
The Power 755 is a 4-socket, 4U rack-optimized server supporting 8-core POWER7 processors and up to 256GB of memory. The Power 755 is a high-performance compute node targeted at small to mid-size clusters. It delivers a better than 3X improvement in power over current Power offerings.
POWER7 processors support the AltiVec™ instruction set and extended VSX SIMD (single instruction, multiple data) acceleration, which can execute up to eight single-precision or double-precision floating-point operations per clock cycle per core to improve fine-grained parallelism and accelerate data processing.
The IBM HPC software stack has the development tools, libraries, file systems and system management software necessary to administer a Power 755 server cluster.
There will be an HPC technical session on the Power 755 on Wed.