22. Dual Core Performance Refresh
Data center performance optimization with Intel® Xeon® 5500 (Nehalem-EP)
2006: 1,000 servers, dual-core Intel® Xeon® 5160 processor (WDC)
2009: 1,000 servers, new Intel® Xeon® 5500 series (NHM)
Business benefits:
Performance: up to 4X the performance (4.16x benefit over WDC)
Power: up to 14% less power
[Chart: SPECfp_rate_base2006, WDC vs. NHM; y-axis 0 to 200]
Source: Intel estimates and measurements as of Nov 2008. Performance comparison using SPECfp_rate_base2006. Use this slide in conjunction with backup slide.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations.
Source: Intel internal measurements. Test configurations in backup.
For notes and disclaimers, see legal information slide at end of this presentation.
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
Intel Confidential
23. Dual Core Performance Refresh
Data center performance optimization with Intel® Xeon® 5500 (Nehalem-EP)
2006: 1,000 servers, dual-core Intel® Xeon® 5160 processor (WDC)
2009: 1,215 servers, new Intel® Xeon® 5500 series with SSDs (NHM)
Up to 5X performance* (5.06x benefit over WDC)
Same power envelope
(*without any benefit from SSDs)
[Chart: SPECfp_rate_base2006, WDC vs. NHM; y-axis 0 to 200]
Communicate how HPC and workstations work together. Technical computing is a combination of workstations and high-performance computing (HPC) clusters. The technical computing industry is driven to deliver results fast. Workstations are required to create, and HPC clusters are needed to simulate and analyze. After you analyze the data, you can visualize the results to enable faster innovation and discovery.
This slide is the springboard into the rest of the presentation.
Performance – Users want to maximize performance per square meter and performance per watt to reduce TCO.
Versatility – Customers want to see immediate results when they port their software to a new architecture. Intel needs to provide the tools to ensure versatility.
Ease of Deployment – Customers want to purchase and easily deploy a cluster. They want to maximize their ROI by seeing their assets go to work immediately. No one wants to see an asset sitting in a datacenter waiting to be installed, or to spend time debugging why it isn't running as promised.
The Intel Xeon processor 5500 series delivers up to 3x the performance of previous generation processors in HPC. New technology available in the processor allows users to optimize the processor for their environment, and a more efficient processor provides an even lower TCO than was achievable with previous generation processors.
The slide identifies the key features of the Intel® Xeon® processor 5500 series that make it the ideal solution for the customer's HPC environment. By optimizing for your environment you can achieve lower TCO.
Nehalem processor performance is about more than just frequency. QPI speed, memory speed, and Turbo and HT support also need to be considered. To maximize performance, customers need to consider the advanced SKUs. They offer the highest frequency, the fastest QPI, support for the fastest memory, more Turbo (up to 400 MHz), and HT.
The graph shows the positive performance results advanced SKUs have in HPC.
Y axis – SPEC scores
X axis – Intel® Xeon® processor 5500 series SKUs
SSDs
Extreme Performance - >100x IOPS performance gains vs. 15k HDD
Power Efficient - >5x lower power vs. 15k HDD
Increased Reliability - 2.0M hrs MTBF vs. 1.20M hrs MTBF for 7.2K WD RE2
Reduce System Cost - Replace HDD and memory with SSDs
10GbE
Extreme Performance - iWARP provides low latency over 10GbE; low overhead and high bandwidth
Increased Reliability - Over 25 years delivering leading Ethernet products; broad OS support; designed for multi-core
Power Efficient - Low power design, <3.5W
Lower TCO - Consolidated fabric through industry-standardized technology
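To put the SSD and HDD MTBF figures above into more familiar terms, an MTBF can be converted into an approximate annualized failure rate (AFR). The sketch below is illustrative only and is not from the slides: it assumes a constant failure rate (an exponential lifetime model) and drives powered on around the clock, and the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # assumes 24x7 operation

def annualized_failure_rate(mtbf_hours):
    """Approximate AFR from MTBF, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# MTBF figures quoted in the notes.
ssd_afr = annualized_failure_rate(2.0e6)
hdd_afr = annualized_failure_rate(1.2e6)

print(f"SSD AFR: {ssd_afr:.2%}")
print(f"HDD AFR: {hdd_afr:.2%}")
```

Under these assumptions the higher MTBF translates into a proportionally lower expected failure rate per drive per year; real-world rates depend on workload and environment.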
Pulling together everything we just talked about enables a data center to increase its performance by up to 7.8x while staying within the same footprint. In 2006 the datacenter was using 5160 series processors with HDDs and standard 1U rack servers, at a PUE (power usage effectiveness) of 2.0. Today we can refresh the datacenter with Intel Xeon processor 5500 series, solid state drives, and half-size 1U motherboards, and improve the PUE from 2.0 to 1.3. By doing all of this we are able to achieve a performance increase of 7.8x. In the example above, the only benefit we are gaining from the SSDs is lower power. Depending on your environment, you may also achieve a performance benefit with SSDs. The PUE improvement from 2.0 to 1.3 will require an investment in your datacenter; 1.3 is what current datacenters are being designed to.
If you wanted to do a 1,000-to-1,000 refresh (using HDDs and full-size boards), you can achieve a 4x performance gain and a power savings of 14%. Using the same PUE with SSDs, you can increase performance by greater than 5x and add 215 nodes to the datacenter.
By keeping the same footprint as Woodcrest, 1,000 servers, you can get a performance increase of >4x and a power savings of 14%, or 863K KW.
5x5x5 = 5x more power states, 5x lower idle power, 5x faster transitions between power states.
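The refresh figures in these notes can be cross-checked with simple arithmetic. The sketch below is one reading that comes close to the quoted numbers, not Intel's stated method. It assumes that the 4.16x gain is the aggregate for 1,000 servers, that performance scales linearly with server count, and that lowering PUE from 2.0 to 1.3 at constant facility power frees IT power in the ratio 2.0/1.3. All function names are hypothetical.

```python
def scale_by_servers(gain_at_baseline, servers, baseline_servers=1000):
    """Scale an aggregate performance gain linearly with server count (assumption)."""
    return gain_at_baseline * servers / baseline_servers

def it_power_headroom(old_pue, new_pue):
    """PUE = total facility power / IT power, so at fixed facility
    power the IT power budget grows by old_pue / new_pue."""
    return old_pue / new_pue

# 1,000 -> 1,215 servers in the same power envelope.
gain_1215 = scale_by_servers(4.16, 1215)

# PUE improvement from 2.0 to 1.3.
headroom = it_power_headroom(2.0, 1.3)

# Apply the PUE headroom to the slide's 5.06x figure.
gain_with_pue = 5.06 * headroom

print(f"1,215 servers: {gain_1215:.2f}x")       # 5.05x, close to the slide's 5.06x
print(f"IT power headroom: {headroom:.2f}x")    # 1.54x
print(f"With PUE 1.3: {gain_with_pue:.2f}x")    # 7.78x, i.e. about 7.8x
```

The small gap between 5.05x and the slide's 5.06x is consistent with rounding in the 4.16x input.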
The Intel architecture (IA) is easy to use and flexible. IA enables software to scale from one generation to another while achieving increased performance. By optimizing code you can achieve even greater performance. Intel software tools enable an easy transition from one generation to another and help prepare you for the future.
We now must look at the applications supporting HPC and ensure they are taking advantage of the technology designed into Nehalem. Is the code parallelized? Is it optimized for NHM? For many years applications have been able to take advantage of increased frequency to improve performance. Now we are offering more cores to gain performance. ISVs are now taking their serial code and parallelizing it. This is a challenge Intel is trying to make as simple as possible.
Debug and tune become equally important to carry forward to many-core. This is the heterogeneous tool set now, as many-core applications scale to terascale on clients, and these terascale nodes make clusters of petascale machines.
Better performance, multi-core advancements and support for Intel® Core™ i7 processors. New versions of SW tools released in Nov. 08.
The first step in the cycle is to gain insight into your code by analyzing it with tools such as VTune Performance Analyzer and/or Thread Checker.
Next, you parallelize your code with Intel tools such as Intel® Threading Building Blocks, compilers, and performance libraries.
After you parallelize your code, you review the results for correctness/confidence. If you do not achieve the results you expect, you can begin the cycle again with insight. Once you have achieved the desired results, you then perform a final optimization to ensure peak performance with Intel® VTune Performance Analyzer and Thread Profiler.
Intel understands users want to quickly deploy their cluster to ensure they are maximizing their investment. To quickly deploy the cluster, Intel has developed a specification called Intel Cluster Ready. The specification enables OEMs to create recipes. Recipes can be combined with certified software to create a certified cluster configuration. The configurations can be validated with Intel Cluster Checker to quickly ensure the cluster has been properly configured. This allows for a simple way to install and launch a cluster. The end result is an awesome out-of-box experience. Let's talk a bit more about ICR…
ICR enables users to simplify the purchasing process: identify the certified software and certified cluster to ensure compatibility.
Simplify manufacturing – Enables manufacturers to build a particular configuration over and over.
Simplify deployment – Deploy the cluster, run Intel Cluster Checker, and the system should run. If it does not run as it should, Intel Cluster Checker should identify the issue for quick resolution.
Simplify management – Easier to manage and ensure uptime with a certified cluster. Intel Cluster Checker can be run at any time to ensure the system has all key components operating properly.
We continue to track to the tick-tock strategy. Our future is bright as we transition to new process technology and new technologies. Having a strong future is important, but how we will meet the challenges of today is what I would like to focus today's presentation on. Let's get started.
Intelligent performance helps deliver a lower TCO as well as ~3x the performance of previous generation processors.
Intel software tools enable users to easily optimize their software to maximize performance on current and future generation IA hardware.
Intel Cluster Ready makes deploying a cluster easy.
Hard to get to 95%
Hard to get to 200X
Nehalem delivers ~3X vs. 18.26X (6X delta)
Most apps will resemble Amdahl's
Nehalem ~3X vs. accelerator increase of 3.25X
Is the pain worth the glory?
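The 18.26X figure in these notes is consistent with Amdahl's law. As a minimal sketch, reading "95%" as the fraction of the application that can be offloaded and "200X" as the accelerator's speedup on that fraction (both readings are assumptions, as is the function name) reproduces the number. The notes do not state the inputs behind the 3.25X figure, so it is not reconstructed here.

```python
def amdahl_speedup(parallel_fraction, accelerated_by):
    """Overall speedup when `parallel_fraction` of the runtime is sped up
    by a factor of `accelerated_by` and the rest is unchanged."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accelerated_by <= 0:
        raise ValueError("accelerated_by must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accelerated_by)

# 95% of the work accelerated 200X (assumed reading of the notes).
print(round(amdahl_speedup(0.95, 200), 2))  # 18.26
```

Even with an unlimited accelerator speedup, the remaining 5% serial portion caps the overall gain at 20X, which is why the achievable fraction matters more than the peak accelerator number.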